
Conversation

@richm (Contributor) commented Apr 8, 2025

There is a new QEMU-based test which uses the qemu/kvm capability of
GitHub Actions runners. This is the basis for new bootc/image mode tests
which we will be rolling out in the near future.
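
For reference, a minimal sketch of how qemu/kvm access is typically enabled for the unprivileged runner user on GitHub-hosted Ubuntu runners (a udev rule for /dev/kvm); whether the workflow added here does exactly this is an assumption:

```yaml
# Sketch only: the step name and rule file name are illustrative.
- name: Enable KVM group perms for the runner user
  run: |
    echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' \
      | sudo tee /etc/udev/rules.d/99-kvm4all.rules
    sudo udevadm control --reload-rules
    sudo udevadm trigger --name-match=kvm
```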

ansible-lint requires that the collection path is set so that the requirements
it installs end up in the correct place.
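
A minimal sketch of one way to pin the collection path, assuming the `ANSIBLE_COLLECTIONS_PATH` environment variable and the usual `meta/collection-requirements.yml` file; the actual mechanism in the workflow may differ:

```yaml
# Sketch only; the path and requirements file name are assumptions.
- name: Run ansible-lint with a fixed collection path
  env:
    ANSIBLE_COLLECTIONS_PATH: ${{ github.workspace }}/.collections
  run: |
    ansible-galaxy collection install -p "$ANSIBLE_COLLECTIONS_PATH" \
      -r meta/collection-requirements.yml
    ansible-lint
```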

There has been some general GitHub Actions deprecation of Python versions and
Ubuntu versions that we have had to fix.

Remove `CONTRIBUTOR` from the list of users who can trigger citest.

For more information, see

* linux-system-roles/.github#98
* linux-system-roles/.github#94
* linux-system-roles/.github#93
* linux-system-roles/.github#92
* linux-system-roles/.github#91
@richm self-assigned this Apr 8, 2025
@martinpitt (Contributor)

Uh, qemu tests run into ENOSPC on the host. I suppose we can tone down the space requirements in the test and make smaller images etc., but I haven't looked yet. I can if you want me to.

@richm force-pushed the changes-20250408 branch 4 times, most recently from 5a76e5b to 8b2088f on April 11, 2025 00:09
@martinpitt marked this pull request as draft on April 11, 2025 07:50
@martinpitt self-assigned this Apr 11, 2025
@martinpitt (Contributor) commented Apr 11, 2025

On F42, the runner did not hit ENOSPC; it just failed the tests. The first failed test on CI is tests/tests_luks.yml. `tox -e qemu-ansible-core-2.17 -- --image-name fedora-42 --log-level=debug --debug tests/tests_luks.yml` by itself passes, so this is some complication in batch mode.

In F42, I see a large `/dev/dm-1 4.0G 110M 3.9G 3% /opt/test1` which is being created during the test in the VM (PV on /dev/sda), but it does get cleaned up again. I'm running `watch -n1 du -hsc /tmp/tmp*` on the host to check the usage of the sparse backend files, but curiously they always stay at 0 bytes, i.e. they aren't even being used; they are also cleaned up properly after the test ends.

I cannot reproduce the "Unable to find enough unused disks. Exiting playbook." either.

For the ENOSPC on C9, I ran `tox -e qemu-ansible-core-2.16 -- --image-name centos-9 --log-level=debug --debug tests/tests_filesystem_one_disk.yml` locally, and that also passed and didn't leave cruft behind.

Moving this to draft; most likely we'll just drop these tests on GH for storage, as they are just too big. But I want to experiment with this a bit on my fork, getting an interactive shell on a GH runner. I'll see if running a subset of tests works, or what is actually hitting ENOSPC.

My current idea is to generalize `--skip-tags tests::infiniband` to `--skip-tags tests::no-github`, and mark tests accordingly. An initial run with most tests skipped looks quite promising (the c10s failure is a VM download glitch, unrelated).

Next attempt: a run skipping `tests::nvme,tests::scsi`.
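
A hypothetical illustration of the tagging scheme; the play and its tasks are made up, and whether the repo uses play-level tags exactly like this is an assumption. Tests carrying `tests::no-github` would then be excluded in CI with `--skip-tags tests::no-github`:

```yaml
# Illustration only: the play and its tasks are placeholders.
- name: Test behaviour that needs real NVMe hardware
  hosts: all
  tags:
    - tests::nvme       # existing hardware-specific tag
    - tests::no-github  # generalized "skip on GitHub runners" tag
  tasks:
    - name: Placeholder for the actual test steps
      ansible.builtin.debug:
        msg: This only runs when tests::no-github is not skipped.
```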

…rsions, ubuntu versions

There is a new QEMU-based test which uses the qemu/kvm capability of
GitHub Actions runners.  This is the basis for new bootc/image mode tests
which we will be rolling out in the near future.

ansible-lint requires that the collection path is set so that the requirements
it installs end up in the correct place.

There has been some general GitHub Actions deprecation of Python versions and
Ubuntu versions that we have had to fix.

Remove `CONTRIBUTOR` from the list of users who can trigger citest.

For more information, see

* linux-system-roles/.github#98
* linux-system-roles/.github#94
* linux-system-roles/.github#93
* linux-system-roles/.github#92
* linux-system-roles/.github#91

Signed-off-by: Rich Megginson <rmeggins@redhat.com>
@martinpitt (Contributor)

@richm the run on my fork is pretty okay -- some "honest" failures on f42 which I haven't yet looked into. But C9/C10 pass now, and it's a sane set of tests now. This includes the change from linux-system-roles/.github#99 but if you prefer I'm happy to remove that here. Or feel free to remove it yourself if you'd like to land this soon otherwise.

@martinpitt marked this pull request as ready for review on April 14, 2025 17:20
@martinpitt (Contributor) commented Apr 15, 2025

f42 failures:

✅ Unused disk

Many many tests fail with

Unable to find enough unused disks. Exiting playbook

This must be some leakage in batch mode. Running these individually works fine. I already investigated this a bit above, but no result yet. This will be my task today.

Fixed in "tests: Fix stratis block device clean up" commit.

tests_stratis

This has an "honest" failure:

The conditional check 'storage_test_pool.name in _stratis_pool_info.pools[0]['blockdevs']['datadevs'][0]['key_description']' failed. The error was: error while evaluating conditional (storage_test_pool.name in _stratis_pool_info.pools[0]['blockdevs']['datadevs'][0]['key_description']): 'dict object' has no attribute 'key_description'

This could be related to stratis-storage/stratisd#3805 and be due to an API change. This reproduces locally.

tests_resize

failure:

TASK [linux-system-roles.storage : Manage the pools and volumes to match the specified state] ***
volume 'foo-test1' cannot be resized from 9 GiB to 5 GiB: new size same as old size

This does not reproduce in isolation with `tox -e qemu-ansible-core-2.17 -- --image-name fedora-42 --log-level=debug tests/tests_resize.yml`.

tests_lvm_pool_pv_grow.yml

failure

/dev/foo: already exists in filesystem

This smells like another cleanup failure. Now, just about every test uses "foo" as the pool name, but I figure it's left over from one of the other failures. I'll go through the "Clean up" actions once again with a fine-toothed comb and see which ones need fixing.

When I run it locally in isolation, it fails differently:

TASK [Verify each PV size] ***************************************************************************************************************************
task path: /var/home/martin/upstream/lsr/storage/tests/verify-pool-member-pvsize.yml:18
Tuesday 15 April 2025  10:45:41 +0200 (0:00:00.413)       0:00:36.427 ********* 
fatal: [/var/home/martin/.cache/linux-system-roles/fedora-42.qcow2]: FAILED! => {
    "assertion": "(dev_size.bytes - actual_pv_size.stdout | int) <= (4194304 + pv_pe_start.stdout | int)",
    "changed": false,
    "evaluated_to": false
}

MSG:

Unexpected difference between PV size and block device size: (device size: 1099511627776) (actual PV size:   8585740288)

#519 should fix the CI run to fail "correctly".
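
For context, the assertion allows the PV to be smaller than its backing block device by at most one 4 MiB extent (4194304 bytes) plus the PE start offset; here the device is 1 TiB (1099511627776 bytes) while the PV is only about 8 GiB, so the check fails by a huge margin. A reconstruction of the check as a task, based only on the assertion string and message quoted above (the tasks that register dev_size, actual_pv_size, and pv_pe_start are not shown):

```yaml
# Reconstruction for illustration; not copied from the test file.
- name: Verify each PV size
  ansible.builtin.assert:
    that:
      # PV size may lag the device size by one 4 MiB extent plus the PE start.
      - (dev_size.bytes - actual_pv_size.stdout | int) <= (4194304 + pv_pe_start.stdout | int)
    msg: >-
      Unexpected difference between PV size and block device size:
      (device size: {{ dev_size.bytes }})
      (actual PV size: {{ actual_pv_size.stdout }})
```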

✅ tests_remove_mount.yml

failure

cannot remove members 'sda' from pool 'foo' in safe mode

This also smells like a cleanup failure; it doesn't fail in isolation. Fixed by #519.

✅ tests_existing_lvm_pool.yml

failure

cannot remove members 'sda' from pool 'foo' in safe mode

Exactly the same failure as tests_remove_mount.yml, so a cleanup failure. Also passes locally in isolation.

Fixing the tests_resize cleanup didn't fix the follow-ups above. Testing a run without stratis, and a run without the resize tests, to see which one breaks stuff. Update: neither change helped at all; the failures are independent.

Fixed by #519.

Always run the "Clean up" tasks, so that the volumes, pool, and most
importantly the block devices are also removed when a failure occurs.
This unbreaks subsequent tests.
Comment on lines -147 to +71
- name: Create encrypted Stratis pool
- name: Create one Stratis pool with one volume
Contributor

For reviewers: git and GitHub make an utter mess of the diff view of "tests: Fix stratis block device clean up". What this does is wrap the steps of a test into `block: ... always: Cleanup`. This changes the indentation, and due to the repetitive nature of the test the diff view "locks onto" the wrong context. I didn't change any logic.
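
Roughly, the resulting shape of each test looks like the sketch below; the task bodies are placeholders rather than the actual diff, and the real tests set many more parameters:

```yaml
# Sketch of the block/always restructuring; placeholder parameters only.
- name: Run the test steps and always clean up
  block:
    - name: Create one Stratis pool with one volume
      ansible.builtin.include_role:
        name: linux-system-roles.storage
      vars:
        storage_pools:
          - name: foo
            type: stratis
            disks: "{{ unused_disks }}"  # placeholder; found by a helper in the real tests
            volumes:
              - name: test1
                mount_point: /opt/test1

    - name: Verify role results
      ansible.builtin.include_tasks: verify-role-results.yml

  always:
    - name: Clean up
      ansible.builtin.include_role:
        name: linux-system-roles.storage
      vars:
        storage_pools:
          - name: foo
            state: absent  # placeholder cleanup invocation
```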

@martinpitt (Contributor)

OK, this run is down to 5 failures, from the previous 26, i.e. the "Unable to find enough unused disks" failures are gone with the stratis cleanup fix. This may be good enough to land now, also to avoid regressing the c9/c10 tests?

@vojtechtrefny (Collaborator)

> tests_stratis
>
> This has an "honest" failure:
>
> The conditional check 'storage_test_pool.name in _stratis_pool_info.pools[0]['blockdevs']['datadevs'][0]['key_description']' failed. The error was: error while evaluating conditional (storage_test_pool.name in _stratis_pool_info.pools[0]['blockdevs']['datadevs'][0]['key_description']): 'dict object' has no attribute 'key_description'
>
> This could be related to stratis-storage/stratisd#3805 and be due to an API change. This reproduces locally.

This is caused by the changes to Stratis metadata and encryption in Stratis 3.8.0, but the problem should only be in the tests in verify-pool-stratis, not in the role itself (or blivet). I'll try to find a way to verify the test results that works with both old and new Stratis.
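
A hypothetical sketch of what a tolerant check could look like (the actual fix may differ); it only asserts on `key_description` when the field is present, since Stratis 3.8 apparently no longer reports it per blockdev:

```yaml
# Hypothetical sketch; variable names are taken from the failing conditional.
- name: Check the key description against the pool name (old Stratis only)
  ansible.builtin.assert:
    that:
      - storage_test_pool.name in
        _stratis_pool_info.pools[0].blockdevs.datadevs[0].key_description
  when: "'key_description' in _stratis_pool_info.pools[0].blockdevs.datadevs[0]"
```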

@richm (Contributor, Author) commented Apr 15, 2025

> OK, this run is down to 5 failures, from the previous 26, i.e. the "Unable to find enough unused disks" failures are gone with the stratis cleanup fix. This may be good enough to land now, also to avoid regressing the c9/c10 tests?

Yes. Once you push the ansible-lint fix, we can merge this.

    - name: Verify role results
      include_tasks: verify-role-results.yml

  always:
Collaborator

I guess we should add this style of cleanup to all tests, but that's definitely out of scope for this PR.

Always run the "Clean up" tasks, even on failure, so that the used safe
mode disks don't leak into the following tests.

Some roles produce ansible/test logs which are too large to
sensibly/comfortably view in the main workflow log view. Show only the
interesting failures there (the lines around `fatal:`) and upload the
full logs as an artifact.
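
A sketch of the idea as workflow steps; the log path, artifact name, and the amount of grep context are assumptions, not necessarily what the workflow here does:

```yaml
# Sketch only; paths and names are illustrative.
- name: Show only the interesting failures in the workflow log
  if: always()
  run: grep -B 5 -A 20 'fatal:' tests/*.log || true

- name: Upload the full test logs as an artifact
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: test-logs
    path: tests/*.log
```
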
Running all tests is too much for the GitHub runner capacity, and many
of them fail.
@martinpitt (Contributor)

Pushed the ansible-lint fix. I was trying to make some more progress on the other failures, but they are fighting back well. So yeah, let's address these in follow-ups.

@richm merged commit ae35218 into main on Apr 15, 2025
18 of 19 checks passed
@richm deleted the changes-20250408 branch on April 15, 2025 16:48