Skip to content

Conversation

@ggorman
Copy link
Contributor

@ggorman ggorman commented Jul 25, 2025

No description provided.

@ggorman ggorman requested a review from Copilot July 25, 2025 10:36

This comment was marked as outdated.

@codecov
Copy link

codecov bot commented Jul 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.07%. Comparing base (5f399da) to head (6e3ad7e).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2686      +/-   ##
==========================================
+ Coverage   91.31%   92.07%   +0.76%     
==========================================
  Files         248      248              
  Lines       49387    49387              
  Branches     4355     4355              
==========================================
+ Hits        45096    45473     +377     
+ Misses       3559     3212     -347     
+ Partials      732      702      -30     
Flag Coverage Δ
pytest-gpu-aomp-amdgpuX 72.87% <ø> (+0.49%) ⬆️
pytest-gpu-nvc-nvidiaX 73.99% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mloubout mloubout added the CI continuous integration label Jul 25, 2025
@ggorman ggorman requested a review from Copilot August 1, 2025 09:29
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR removes dangling Docker layers and improves CI pipeline efficiency by adding cleanup mechanisms and updating build processes. The changes focus on better resource management in GPU-enabled workflows.

  • Removes unused environment variables and simplifies matrix configuration
  • Switches from docker build to docker buildx for enhanced build capabilities
  • Adds comprehensive cleanup steps to prevent accumulation of dangling Docker layers

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
.github/workflows/pytest-gpu.yml Major refactoring to use buildx, remove unused variables, add cleanup steps, and enhance test configuration with better logging
.github/workflows/docker-devito.yml Removes conditional check that was preventing cleanup on nvidia GPU runners

ggorman added 4 commits August 2, 2025 14:46
…tainer

Background
----------
Each self-hosted runner is pinned to a specific GPU via a host-level
CUDA_VISIBLE_DEVICES and we forward that mask to Docker:

  docker run --gpus "device=$CUDA_VISIBLE_DEVICES" …

That flag alone is sufficient—Docker restricts the visible devices for the
container.

Problem
-------
We also injected the same variable into the container’s environment
(-e CUDA_VISIBLE_DEVICES).
Inside the container the CUDA/OpenACC runtime renumbers the visible GPUs
to 0…N-1, so a value like “1” or “2,3” is suddenly invalid and the first
kernel call aborts (`exit 1`) when multiple runners share the host.

Fix
---
* Drop the `-e CUDA_VISIBLE_DEVICES` export from `${{ matrix.flags }}`.
  The device list is still enforced by `--gpus`, but the runtime now
  starts counting at 0 as expected.

Verified on:
* Two concurrent nvidiagpu runners on a 4-V100 host – full test suite passes.
* amdgpu runner – unchanged.
@mloubout mloubout merged commit 1c6040a into main Aug 3, 2025
36 checks passed
@mloubout mloubout deleted the prune-dangling branch August 3, 2025 01:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI continuous integration

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants