Remove dangling layers. #2686
Conversation
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #2686 +/- ##
==========================================
+ Coverage 91.31% 92.07% +0.76%
==========================================
Files 248 248
Lines 49387 49387
Branches 4355 4355
==========================================
+ Hits 45096 45473 +377
+ Misses 3559 3212 -347
+ Partials 732 702 -30
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Pull Request Overview
This PR removes dangling Docker layers and improves CI pipeline efficiency by adding cleanup mechanisms and updating build processes. The changes focus on better resource management in GPU-enabled workflows.
- Removes unused environment variables and simplifies matrix configuration
- Switches from `docker build` to `docker buildx` for enhanced build capabilities
- Adds comprehensive cleanup steps to prevent accumulation of dangling Docker layers (see the sketch after this list)
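As a rough sketch of the kind of cleanup step this adds, a post-build step could prune dangling layers and stale build cache along these lines; the exact commands and their placement in the workflow are assumptions here:

```bash
# Remove dangling images, i.e. layers that no tag references anymore.
docker image prune --force

# Remove build cache entries no longer referenced by any image.
docker builder prune --force
```

Running something like this after each job keeps the self-hosted runners from slowly filling their disks with orphaned layers.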
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| .github/workflows/pytest-gpu.yml | Major refactoring to use buildx, remove unused variables, add cleanup steps, and enhance test configuration with better logging |
| .github/workflows/docker-devito.yml | Removes conditional check that was preventing cleanup on nvidia GPU runners |
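For context, a buildx-based build step might look like the sketch below; the tag, Dockerfile path, and flags are placeholders rather than the values actually used in pytest-gpu.yml:

```bash
# Build with BuildKit via buildx instead of the legacy `docker build`,
# loading the result into the local image store so later steps can run it.
docker buildx build \
  --file docker/Dockerfile \
  --tag devito-gpu-tests:ci \
  --load \
  .
```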
…tainer
Background
----------
Each self-hosted runner is pinned to a specific GPU via a host-level
`CUDA_VISIBLE_DEVICES`, and we forward that mask to Docker:
docker run --gpus "device=$CUDA_VISIBLE_DEVICES" …
That flag alone is sufficient: Docker restricts the visible devices for the
container.
Problem
-------
We also injected the same variable into the container’s environment
(`-e CUDA_VISIBLE_DEVICES`).
Inside the container the CUDA/OpenACC runtime renumbers the visible GPUs
to 0…N-1, so a value like “1” or “2,3” is suddenly invalid and the first
kernel call aborts (`exit 1`) when multiple runners share the host.
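To make the failure mode concrete, with hypothetical values (the image name is a placeholder and the runner is assumed to be pinned to physical GPU 1):

```bash
# Host-level pinning for this runner.
export CUDA_VISIBLE_DEVICES=1

# Docker exposes only physical GPU 1, which the container sees renumbered as GPU 0.
docker run --rm --gpus "device=$CUDA_VISIBLE_DEVICES" cuda-base-image nvidia-smi -L
# -> "GPU 0: Tesla V100 ..." (a single device, index 0)

# With -e CUDA_VISIBLE_DEVICES the container also inherits the host value "1",
# so the runtime tries to select a device that does not exist inside the
# container and the first kernel launch aborts.
```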
Fix
---
* Drop the `-e CUDA_VISIBLE_DEVICES` export from `${{ matrix.flags }}`.
The device list is still enforced by `--gpus`, but the runtime now
starts counting at 0 as expected.
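A minimal before/after of the container invocation (image name and trailing arguments are placeholders):

```bash
# Before: the host mask is also exported into the container, where it is invalid.
docker run --gpus "device=$CUDA_VISIBLE_DEVICES" -e CUDA_VISIBLE_DEVICES devito-gpu-tests:ci …

# After: --gpus alone restricts the devices; inside the container they are numbered from 0.
docker run --gpus "device=$CUDA_VISIBLE_DEVICES" devito-gpu-tests:ci …
```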
Verified on:
* Two concurrent nvidiagpu runners on a 4-V100 host – full test suite passes.
* amdgpu runner – unchanged.
…may contain spaces.
ec0f3b7 to 6e3ad7e