Shared Device Group is a Kubernetes scheduler plugin and controller that enables multiple pods to share the same set of GPU devices on a single node. It provides a declarative way to manage GPU sharing for workloads that need coordinated access to specific GPUs.
Shared Device Group allows you to:
- Define GPU groups: Create `SharedDeviceGroup` resources that claim specific GPUs on a node
- Share GPUs across pods: Multiple pods can reference the same group and share access to those GPUs
- Consistent device allocation: All pods in a group see the same `NVIDIA_VISIBLE_DEVICES` environment variable
- Automatic scheduling: A custom scheduler ensures pods using the same group are placed on the same node
- Resource protection: Prevents other groups from claiming already-allocated GPUs
- Single-node GPU sharing: Groups are bound to a single node, ensuring all pods share devices on the same machine
- Declarative configuration: Kubernetes-native CRD for defining device groups
- Automatic device injection: A webhook injects `NVIDIA_VISIBLE_DEVICES` into pods based on the group's allocation
- Cache-aware scheduling: In-memory device tracker for fast scheduling decisions
- State recovery: The scheduler can recover device allocations after restarts by inspecting running pods (sketched below)
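To make the state-recovery point concrete, here is a minimal Go sketch of how a scheduler could rebuild its view after a restart by listing running pods and reading the `deviceshare.io/group` annotation. The function name `rebuildAllocations` and the returned group-to-node map are illustrative assumptions, not the project's actual code; a fuller version would also recover each group's device indices from its status.

```go
package recovery

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// rebuildAllocations is a hypothetical sketch of restart recovery:
// list pods in all namespaces, and for each pod annotated with
// deviceshare.io/group, record that its group is active on the
// pod's node. The real scheduler's recovery logic may differ.
func rebuildAllocations(ctx context.Context, client kubernetes.Interface) (map[string]string, error) {
	// group name -> node name
	groupNode := make(map[string]string)
	pods, err := client.CoreV1().Pods("").List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	for _, pod := range pods.Items {
		group, ok := pod.Annotations["deviceshare.io/group"]
		if !ok || pod.Spec.NodeName == "" {
			continue // not a group member, or not yet scheduled
		}
		groupNode[group] = pod.Spec.NodeName
	}
	return groupNode, nil
}
```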
What Shared Device Group is NOT:

- NOT for multi-tenant isolation: There is no resource quota or access control between different groups. Any user who can create pods can access any SharedDeviceGroup.
- NOT for GPU virtualization: Does not provide GPU partitioning, time-sharing, or MPS (Multi-Process Service). All pods see the full GPU.
- NOT for dynamic rebalancing: Once a group is bound to a node, it cannot be moved. You must delete and recreate to change nodes.
- NOT for single-pod GPU allocation: If you just need to allocate GPUs to individual pods, use Kubernetes' native GPU device plugin instead.
- NOT for cross-node GPU access: All pods in a group must run on the same node where the group is bound.
It works well for:

- Personal Development Environments
  - Individual developers working on multi-GPU training jobs on their own machines
  - Running multiple Jupyter notebooks that need to coordinate on specific GPUs
  - Development and testing of distributed ML workloads on a single machine
- All-in-One Workstations
  - A single powerful workstation with multiple GPUs
  - Multiple related workloads (training, inference, preprocessing) that need to share GPUs
  - CI/CD pipelines testing multi-GPU applications on a single node
- Coordinated GPU Access
  - Multiple containers in a workflow that need to see the same GPUs
  - Sidecar patterns where the main container and sidecars need shared GPU access
  - Multi-process applications split across containers
It is a poor fit for:

- Multi-tenant production clusters
  - No tenant isolation or resource quotas
  - Any user can access any group
  - No billing or accounting per user
- Large-scale GPU clusters
  - Groups are node-local only
  - No support for GPU pooling across nodes
  - Better served by dedicated GPU cluster management solutions
- Dynamic GPU scaling
  - Groups cannot be resized or moved after binding
  - Not suitable for autoscaling GPU resources
Shared Device Group consists of three components (a sketch of the scheduler's device tracker follows this list):

- Scheduler Plugin (`deviceshare-scheduler`)
  - Custom Kubernetes scheduler plugin
  - Implements the Filter and Score extension points
  - Maintains an in-memory device tracker for fast lookups
  - Handles group binding and device allocation
- Controller (`deviceshare-controller`)
  - Watches pods with the `deviceshare.io/group` annotation
  - Updates SharedDeviceGroup status with allocated pods
  - Cleans up when pods are deleted
- Webhook (`deviceshare-webhook`)
  - Validates SharedDeviceGroup resources
  - Ensures resource specifications are valid
  - Prevents deletion of groups with active pods
  - Injects the `NVIDIA_VISIBLE_DEVICES` environment variable into pods
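As a rough illustration of the in-memory device tracker mentioned above, the sketch below keeps per-node GPU allocations behind a mutex so scheduling decisions can be made without API-server round trips. All names here (`DeviceTracker`, `Allocate`, `Free`) are hypothetical; the real plugin's structures may differ.

```go
package tracker

import (
	"fmt"
	"sync"
)

// DeviceTracker is a hypothetical in-memory view of GPU allocations,
// keyed by node name, of the kind a scheduler plugin could consult
// from its Filter and Score extensions.
type DeviceTracker struct {
	mu sync.RWMutex
	// allocated maps node name -> device index -> owning group name.
	allocated map[string]map[int]string
	// capacity maps node name -> total GPU count.
	capacity map[string]int
}

func New() *DeviceTracker {
	return &DeviceTracker{
		allocated: make(map[string]map[int]string),
		capacity:  make(map[string]int),
	}
}

// SetCapacity records how many GPUs a node advertises.
func (t *DeviceTracker) SetCapacity(node string, gpus int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.capacity[node] = gpus
	if t.allocated[node] == nil {
		t.allocated[node] = make(map[int]string)
	}
}

// Free returns the number of unallocated GPUs on a node.
func (t *DeviceTracker) Free(node string) int {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return t.capacity[node] - len(t.allocated[node])
}

// Allocate claims n devices on a node for a group and returns their
// indices, or an error if not enough devices are free.
func (t *DeviceTracker) Allocate(node, group string, n int) ([]int, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	devs := t.allocated[node]
	var picked []int
	for i := 0; i < t.capacity[node] && len(picked) < n; i++ {
		if _, taken := devs[i]; !taken {
			picked = append(picked, i)
		}
	}
	if len(picked) < n {
		return nil, fmt.Errorf("node %s: want %d GPUs, available: %d", node, n, len(picked))
	}
	for _, i := range picked {
		devs[i] = group
	}
	return picked, nil
}
```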
Prerequisites:

- Kubernetes cluster (v1.20+)
- Nodes with NVIDIA (or AMD or Ascend) GPUs and the matching container runtime installed (nvidia-container-runtime or ascend-docker-runtime)
- cert-manager (for webhook TLS certificates); see the cert-manager installation docs for more options:

  ```bash
  # Install cert-manager if not already installed
  kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml

  # Verify cert-manager is running
  kubectl get pods -n cert-manager
  ```

- Helm 3
To install:

- Label GPU nodes:

  ```bash
  kubectl label node <node-name> deviceshare.io/mode=shared
  ```

- Install with Helm:

  ```bash
  helm install shared-device-group deploy/helm/shared-device-group \
    --namespace deviceshare-system \
    --create-namespace \
    --set scheduler.image.repository=ghcr.io/sceneryback/deviceshare/scheduler \
    --set controller.image.repository=ghcr.io/sceneryback/deviceshare/controller \
    --set webhook.image.repository=ghcr.io/sceneryback/deviceshare/webhook
  ```

- Verify the installation:

  ```bash
  kubectl get pods -n deviceshare-system
  ```

  You should see:

  - `deviceshare-scheduler-*` (scheduler)
  - `deviceshare-controller-*` (controller)
  - `deviceshare-webhook-*` (webhook)
Create a SharedDeviceGroup:

```yaml
apiVersion: deviceshare.io/v1alpha1
kind: SharedDeviceGroup
metadata:
  name: my-gpu-group
spec:
  resources:
    nvidia.com/gpu: 2          # Claim 2 GPUs
  schedulingStrategy: binpack  # or "spread"
```

Reference the group from your pods:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: workload-1
  annotations:
    deviceshare.io/group: my-gpu-group  # Reference the group
spec:
  schedulerName: deviceshare-scheduler  # Use the custom scheduler
  containers:
  - name: cuda-app
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    # NVIDIA_VISIBLE_DEVICES will be injected automatically (see the sketch below)
```

Check that the group is bound:

```bash
kubectl get shareddevicegroups
```

Output:

```
NAME           PHASE   NODE         AGE
my-gpu-group   Bound   gpu-node-1   5m
```
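The automatic `NVIDIA_VISIBLE_DEVICES` injection noted in the pod example above is performed by the mutating webhook. Below is a simplified Go sketch of that idea using the standard `k8s.io/api/core/v1` types; the function name `injectDevices` and the skip-if-already-set behavior are assumptions, not the actual webhook code.

```go
package webhook

import (
	corev1 "k8s.io/api/core/v1"
)

// injectDevices sketches what the mutating webhook conceptually does:
// set NVIDIA_VISIBLE_DEVICES on every container so all pods in the
// group see the same devices (e.g. "0,1").
func injectDevices(pod *corev1.Pod, devices string) {
	for i := range pod.Spec.Containers {
		c := &pod.Spec.Containers[i]
		// Assumption: containers that already set the variable are skipped.
		exists := false
		for _, env := range c.Env {
			if env.Name == "NVIDIA_VISIBLE_DEVICES" {
				exists = true
				break
			}
		}
		if !exists {
			c.Env = append(c.Env, corev1.EnvVar{
				Name:  "NVIDIA_VISIBLE_DEVICES",
				Value: devices,
			})
		}
	}
}
```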
Inspect the allocation details:

```bash
kubectl get shareddevicegroups my-gpu-group -o yaml
```

```yaml
status:
  allocatedPods:
  - default/workload-1
  nodeName: gpu-node-1
  phase: Bound
  selectedDevices:
    nvidia.com/gpu: "0,1"  # GPUs 0 and 1 allocated
```

Two scheduling strategies are supported (a scoring sketch follows the list):

- `binpack`: Prefer nodes with fewer available GPUs (pack workloads together)
- `spread`: Prefer nodes with more available GPUs (spread workloads out)
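To illustrate how the two strategies could translate into node scores, here is a hypothetical Go sketch; the 0..100 score range and the linear normalization are assumptions, not the plugin's documented Score logic.

```go
package strategy

// Score is a hypothetical sketch of mapping free-GPU counts to
// scheduler scores in the 0..100 range.
func Score(strategy string, freeGPUs, totalGPUs int) int {
	const maxScore = 100
	if totalGPUs == 0 {
		return 0
	}
	switch strategy {
	case "binpack":
		// Fewer free GPUs -> higher score: pack groups onto busy nodes.
		return maxScore * (totalGPUs - freeGPUs) / totalGPUs
	case "spread":
		// More free GPUs -> higher score: spread groups across idle nodes.
		return maxScore * freeGPUs / totalGPUs
	default:
		return 0
	}
}
```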
You can constrain which nodes a group can use:
apiVersion: deviceshare.io/v1alpha1
kind: SharedDeviceGroup
metadata:
name: my-gpu-group
spec:
resources:
nvidia.com/gpu: 2
nodeSelector:
gpu-type: a100 # Only bind to nodes with this labelCheck if the group is bound:
```bash
kubectl get shareddevicegroups
```

If the group shows no NODE, check the scheduler logs:

```bash
kubectl logs -n deviceshare-system -l app=deviceshare-scheduler
```

Common issues (see the filter sketch after this list):

- No nodes have the `deviceshare.io/mode=shared` label
- All GPUs on the available nodes are already allocated to other groups
- The node selector doesn't match any nodes
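These three failure cases map naturally onto per-node checks a Filter extension would run. The Go sketch below is illustrative only; `NodeView` and `filterNode` are hypothetical names, not the project's actual types.

```go
package filter

import "fmt"

// NodeView is an illustrative snapshot of what the scheduler knows
// about a node when filtering; the field names are assumptions.
type NodeView struct {
	Labels   map[string]string
	FreeGPUs int
}

// filterNode mirrors the three common failure cases above: a missing
// shared-mode label, too few free GPUs, or a non-matching node selector.
func filterNode(node NodeView, wantGPUs int, selector map[string]string) error {
	if node.Labels["deviceshare.io/mode"] != "shared" {
		return fmt.Errorf("node is not labeled deviceshare.io/mode=shared")
	}
	if node.FreeGPUs < wantGPUs {
		return fmt.Errorf("want %d GPUs, available: %d", wantGPUs, node.FreeGPUs)
	}
	for k, v := range selector {
		if node.Labels[k] != v {
			return fmt.Errorf("node selector %s=%s does not match", k, v)
		}
	}
	return nil
}
```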
The webhook prevents deleting groups with active pods:

```bash
# List pods using the group
kubectl get shareddevicegroups <group-name> -o jsonpath='{.status.allocatedPods}'

# Delete the pods first
kubectl delete pod <pod-name>

# Then delete the group
kubectl delete shareddevicegroups <group-name>
```

If you see "available: 0" errors but know GPUs should be free:

```bash
# Restart the scheduler to clear its cache
kubectl rollout restart deployment deviceshare-scheduler -n deviceshare-system
```

See the `examples/` directory for more examples:
- `multi-gpu-group.yaml`: Multiple pods sharing 2 GPUs
- `single-gpu-group.yaml`: A single GPU shared across pods
For development:

```bash
# Build all components
make build

# Build a specific component
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o bin/scheduler cmd/scheduler/main.go

# Run unit tests
go test ./...

# Build and deploy to a local cluster
make docker-build
make deploy
```

Security considerations:

- No RBAC restrictions on SharedDeviceGroup access
- No resource quotas or limits per namespace/user
- Any pod can reference any group
- No audit logging of GPU access
Recommended for:
- Single-user development environments
- Trusted internal clusters
- Personal workstations
NOT recommended for:
- Production multi-tenant clusters
- Environments with untrusted users
- Compliance-sensitive workloads
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
Apache License 2.0
