Skip to content

Conversation

@martinpitt
Copy link
Contributor

@martinpitt martinpitt commented Apr 2, 2025

GitHub workflows recently grew the ability to set up nested QEMU kvm, but without much fanfare (see [1]). It just needs to create the /dev/kvm device node, but the host has it enabled and the kernel supports it.

With that we can run the QEMU tox tests. They are not covered anywhere else, and currently have some bugs, so let's cover them in CI and keep them green.

[1] https://github.com/orgs/community/discussions/8305


I know that this should eventually go to https://github.com/linux-system-roles/.github , but I want to test the run against an actual project. What is the official workflow for that? In recent .github project PRs there are zero test runs -- is there some magic to say "Run this PR against two roles, to prove it works"?

I developed this in my fork, and you can see the results in a recent run. Fedora 42 and CentOS 9 work. CentOS 10 has a broken image ref, I sent linux-system-roles/linux-system-roles.github.io#120 to fix that. Fedora 41 shows an ansible bug where it tries to use libdnf5 to query for an installed package instead of dnf4. (This reproduces fine locally, and I confirmed that 40 and 42 are ok -- should we just ignore that? This is tracked in https://issues.redhat.com/browse/RHELMISC-10110)

This was an initial step towards https://issues.redhat.com/browse/RHEL-85668 to figure out /dev/kvm in a GitHub workflow, but I believe it is useful in its own right. I don't see any CI coverage of the tox tests, neither in workflows nor in tmt. Of course I may have missed something, but given the above two errors there is clearly some coverage missing.

@richm @spetrosi Do you think this is useful, or does it duplicate something?

@martinpitt

This comment was marked as resolved.

@martinpitt
Copy link
Contributor Author

martinpitt commented Apr 2, 2025

Hmm, given how broken linux-system-roles/cockpit#200 is -- ISTM that we should either spend some time to actually fix them and get them into CI (I'm happy to help there of course), or drop them. I'm not yet sure what the primary upstream way of testing the roles is -- is that through tmt/TF?

Update: At least some of the failures also happen in existing tests, see linux-system-roles/cockpit#200 (comment) -- they just lag behind a bit, they don't test F42

Update 2: Discussed in chat, tox tests are current, valid, and desired. Let's fix them.

@martinpitt martinpitt marked this pull request as ready for review April 2, 2025 13:39
@codecov
Copy link

codecov bot commented Apr 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@3ffcc73). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #46   +/-   ##
=======================================
  Coverage        ?   52.31%           
=======================================
  Files           ?        1           
  Lines           ?      346           
  Branches        ?        0           
=======================================
  Hits            ?      181           
  Misses          ?      165           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.


- name: Run tox integration tests
run: |
if ! tox -e ${{ env.TOX_ENV }} -- --image-name ${{ matrix.image }} --make-batch --log-level=debug --
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The challenge here will be to publish the logs for each test in a way so that you can easily see from a high level which test failed, then examine that log for that particular test to find the error that caused the test to fail.

Looking at the artifacts/ansible.log is much more difficult and confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you do --make-batch, by default, the log for each test is written to tests/tests_name_of_test.yml.log

You can control the location of the log file, but then you cannot use --make-batch - you will need to create your own batch file https://github.com/linux-system-roles/tox-lsr?tab=readme-ov-file#batch-file-batch-report-batch-id and add a --log-file .... for each test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point -- we can parse the results and only cat the failing ones, with some easy-to-search separator like ====== , WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to think about this some more.

Maybe there is a way to upload logs to https://dl.fedoraproject.org/pub/alt/linuxsystemroles/logs/ like we do for the testing farm logs. If not, then maybe there is a way to have separate log files in some sort of github action result artifact. https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/storing-and-sharing-data-from-a-workflow If not, then we can use github action collapseable sections https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#grouping-log-lines

and at the top, have a summary e.g.

  • tests_foo.yml - PASS
  • tests_bar.yml - FAILED
  • tests_xxx.yml - PASS
  • ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gh action artifacts are a great thing to transport files from one workflow/job to another, but they are unwieldy for human consumption: they always need a token, and are hard to find.

Uploading logs to dl.f.o would be possible, but makes it harder for contributors as that wouldn't work on forks -- you'd have to change the workflow.

Summary and grouping sounds nice! I re-enabled Fedora-41 for the time being to get a failed test run (which is more interesting for the logging). Of course I'll re-disable this for the final version.

You can see a successful c10s and a failing f41 run here.

@martinpitt martinpitt requested a review from Copilot April 4, 2025 14:23

This comment was marked as outdated.

@martinpitt martinpitt force-pushed the gh-tox-kvm branch 4 times, most recently from 7124555 to bacf7c6 Compare April 7, 2025 08:45
@martinpitt martinpitt requested review from Copilot and richm April 7, 2025 08:50
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

.github/workflows/ansible-tox.yml:57

  • The command line for running tox includes an extra trailing '--' which might lead to unintended parameter parsing. Consider reviewing and removing the extra '--' if it is not required.
tox -e ${{ env.TOX_ENV }} -- --image-name ${{ matrix.image }} --make-batch --log-level=debug --

@martinpitt
Copy link
Contributor Author

[citest]

@martinpitt
Copy link
Contributor Author

Ack, thanks @richm for review and your suggestions! I put back the previous Fedora 41 skipping so that the PR goes green. I updated linux-system-roles/cockpit#200 with the exact same commit/workflow to prove it's not specific to sudo, and also sent it to linux-system-roles/.github#94

@martinpitt
Copy link
Contributor Author

Renamed workflow as per linux-system-roles/.github#94 (review)

@martinpitt

This comment was marked as outdated.

@martinpitt
Copy link
Contributor Author

martinpitt commented Apr 8, 2025

... which fails in CI with

ERROR: Could not find a version that satisfies the requirement fmf (from versions: none)

This must be a regression in tox-lsr itself, investigating. Running tox locally works fine, but I do have fmf installed. But that's a dependency of standard-test-roles-inventory-qemu, and is also present in .tox/qemu-ansible-core-2.16/lib/python3.13/site-packages/. I can locally uninstall both rpms, and it still works (via tox env).

There are only a handful of commits between 3.4.0 and 3.5.0. With the topmost commit 51d7adc7b8f7c9b8b84f982a3e7b63d1d21f5264 (which is tagged 3.5.0) it still works, i.e. with

pip3 install "git+https://github.com/linux-system-roles/tox-lsr@51d7adc7b8f7c9b8b84f982a3e7b63d1d21f5264"

but with

pip3 install "git+https://github.com/linux-system-roles/tox-lsr@3.5.0"

it fails on this fmf error. WTH?? That is the very commit referenced by the release: https://github.com/linux-system-roles/tox-lsr/releases/tag/3.5.0 . I could at most imagine that the @3.5.0 pulls in the auto-generated tarball instead of doing an actual git clone.

Going back to 3.4.0 for the time being, but I also pushed some optimizations which reduced the installation of test deps from 1.5 to 2 mins to ~ 20 seconds.

@martinpitt martinpitt marked this pull request as draft April 8, 2025 10:33
@martinpitt martinpitt removed the request for review from spetrosi April 8, 2025 10:34
@martinpitt martinpitt force-pushed the gh-tox-kvm branch 4 times, most recently from 5bcebba to da6f8de Compare April 8, 2025 10:50
@martinpitt martinpitt marked this pull request as ready for review April 8, 2025 10:54
GitHub workflows recently grew the ability to set up nested QEMU kvm,
but without much fanfare (see [1]). It just needs to create the
`/dev/kvm` device node, but the host has it enabled and the kernel
supports it.

With that we can run the QEMU tox tests. They are not covered anywhere
else, and currently have some bugs, so let's cover them in CI and keep
them green.

[1] https://github.com/orgs/community/discussions/8305
@martinpitt
Copy link
Contributor Author

Bumped tox-lsr to 3.5.1.

@martinpitt
Copy link
Contributor Author

linux-system-roles/.github#94 landed, so this can land as well.

@martinpitt martinpitt merged commit 79b7472 into linux-system-roles:main Apr 8, 2025
21 checks passed
@martinpitt martinpitt deleted the gh-tox-kvm branch April 8, 2025 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants