
Shared Device Group (SDG)

中文版 | English

Overview

[Diagram: Shared Device Group Architecture]

Shared Device Group is a Kubernetes scheduler plugin and controller that enables multiple pods to share the same set of GPU devices on a single node. It provides a declarative way to manage GPU sharing for workloads that need coordinated access to specific GPUs.

What It Is

Shared Device Group allows you to:

  • Define GPU groups: Create SharedDeviceGroup resources that claim specific GPUs on a node
  • Share GPUs across pods: Multiple pods can reference the same group and share access to those GPUs
  • Get consistent device allocation: All pods in a group see the same NVIDIA_VISIBLE_DEVICES environment variable
  • Schedule automatically: The custom scheduler ensures pods using the same group are placed on the same node
  • Protect resources: GPUs allocated to a bound group cannot be claimed by other groups

Key Features

  • Single-node GPU sharing: Groups are bound to a single node, ensuring all pods share devices on the same machine
  • Declarative configuration: Kubernetes-native CRD for defining device groups
  • Automatic device injection: Webhook injects NVIDIA_VISIBLE_DEVICES into pods based on group allocation
  • Cache-aware scheduling: In-memory device tracker for fast scheduling decisions
  • State recovery: Scheduler can recover device allocations after restarts by inspecting running pods

What It Is NOT

⚠️ Important Limitations:

  • NOT for multi-tenant isolation: There is no resource quota or access control between different groups. Any user who can create pods can access any SharedDeviceGroup.
  • NOT for GPU virtualization: Does not provide GPU partitioning, time-sharing, or MPS (Multi-Process Service). All pods see the full GPU.
  • NOT for dynamic rebalancing: Once a group is bound to a node, it cannot be moved. You must delete and recreate to change nodes.
  • NOT for single-pod GPU allocation: If you just need to allocate GPUs to individual pods, use Kubernetes' native GPU device plugin instead.
  • NOT for cross-node GPU access: All pods in a group must run on the same node where the group is bound.
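For the single-pod case mentioned above, the standard extended-resource request (served by a device plugin such as NVIDIA's) is sufficient; pod name and image below are placeholders:

```yaml
# Plain Kubernetes GPU allocation for one pod -- no SharedDeviceGroup needed.
apiVersion: v1
kind: Pod
metadata:
  name: single-gpu-pod
spec:
  containers:
    - name: cuda-app
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1  # default scheduler allocates one exclusive GPU
```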

Use Cases

✅ Ideal Scenarios

  1. Personal Development Environment

    • Individual developers working on multi-GPU training jobs on their own machines
    • Running multiple Jupyter notebooks that need to coordinate on specific GPUs
    • Development and testing of distributed ML workloads on a single machine
  2. All-in-One Workstations

    • Single powerful workstation with multiple GPUs
    • Multiple related workloads (training, inference, preprocessing) that need to share GPUs
    • CI/CD pipelines testing multi-GPU applications on a single node
  3. Coordinated GPU Access

    • Multiple containers in a workflow that need to see the same GPUs
    • Sidecar patterns where main container and sidecars need shared GPU access
    • Multi-process applications split across containers

❌ NOT Suitable For

  1. Multi-tenant production clusters

    • No tenant isolation or resource quotas
    • Any user can access any group
    • No billing or accounting per user
  2. Large-scale GPU clusters

    • Groups are node-local only
    • No support for GPU pooling across nodes
    • Better suited for dedicated GPU cluster management solutions
  3. Dynamic GPU scaling

    • Groups cannot be resized or moved after binding
    • Not suitable for autoscaling GPU resources

Architecture

Components

  1. Scheduler Plugin (deviceshare-scheduler)

    • Custom Kubernetes scheduler plugin
    • Implements Filter and Score extensions
    • Maintains in-memory device tracker for fast lookups
    • Handles group binding and device allocation
  2. Controller (deviceshare-controller)

    • Watches pods with deviceshare.io/group annotation
    • Updates SharedDeviceGroup status with allocated pods
    • Cleans up when pods are deleted
  3. Webhook (deviceshare-webhook)

    • Validates SharedDeviceGroup resources
    • Ensures resource specifications are valid
    • Prevents deletion of groups with active pods
    • Injects NVIDIA_VISIBLE_DEVICES environment variable into pods
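Putting the three components together: after the webhook mutates a pod that references a group, the container spec ends up roughly like the sketch below. The device indices are illustrative; the actual value depends on which devices the scheduler allocated to the group.

```yaml
# Sketch of a container after webhook mutation (illustrative values).
containers:
  - name: cuda-app
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    env:
      - name: NVIDIA_VISIBLE_DEVICES
        value: "0,1"  # injected: the GPUs allocated to the referenced group
```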

Installation

Prerequisites

  • Kubernetes cluster (v1.20+)
  • Nodes with NVIDIA, AMD, or Ascend GPUs and the matching container runtime installed (e.g. nvidia-container-runtime or ascend-docker-runtime)
  • cert-manager (for webhook TLS certificates)
  • Helm 3

If cert-manager is not already installed:

# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml

# Verify cert-manager is running
kubectl get pods -n cert-manager

See the cert-manager installation docs for more options.

Quick Start

  1. Label GPU nodes:
kubectl label node <node-name> deviceshare.io/mode=shared
  2. Install with Helm:
helm install shared-device-group deploy/helm/shared-device-group \
  --namespace deviceshare-system \
  --create-namespace \
  --set scheduler.image.repository=ghcr.io/sceneryback/deviceshare/scheduler \
  --set controller.image.repository=ghcr.io/sceneryback/deviceshare/controller \
  --set webhook.image.repository=ghcr.io/sceneryback/deviceshare/webhook
  3. Verify installation:
kubectl get pods -n deviceshare-system

You should see:

  • deviceshare-scheduler-* (scheduler)
  • deviceshare-controller-* (controller)
  • deviceshare-webhook-* (webhook)

Usage

1. Create a SharedDeviceGroup

apiVersion: deviceshare.io/v1alpha1
kind: SharedDeviceGroup
metadata:
  name: my-gpu-group
spec:
  resources:
    nvidia.com/gpu: 2  # Claim 2 GPUs
  schedulingStrategy: binpack  # or "spread"

2. Create pods that use the group

apiVersion: v1
kind: Pod
metadata:
  name: workload-1
  annotations:
    deviceshare.io/group: my-gpu-group  # Reference the group
spec:
  schedulerName: deviceshare-scheduler  # Use custom scheduler
  containers:
    - name: cuda-app
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      # NVIDIA_VISIBLE_DEVICES will be injected automatically

3. Check group status

kubectl get shareddevicegroups

Output:

NAME           PHASE   NODE          AGE
my-gpu-group   Bound   gpu-node-1    5m

4. Verify device allocation

kubectl get shareddevicegroups my-gpu-group -o yaml

Output (status excerpt):

status:
  allocatedPods:
    - default/workload-1
  nodeName: gpu-node-1
  phase: Bound
  selectedDevices:
    nvidia.com/gpu: "0,1"  # GPUs 0 and 1 allocated

Configuration

Scheduling Strategies

  • binpack: Prefer nodes with fewer available GPUs (pack workloads together)
  • spread: Prefer nodes with more available GPUs (spread workloads out)
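The two strategies are mirror images of each other: binpack scores a node higher the fewer free GPUs it has, spread does the opposite. A minimal sketch of such a Score function (not the project's actual code; all names are illustrative):

```go
package main

import "fmt"

// score ranks a node on a 0-100 scale from its free-GPU ratio.
// binpack favors nearly full nodes; spread favors nearly empty ones.
func score(strategy string, freeGPUs, totalGPUs int) int {
	if totalGPUs == 0 {
		return 0
	}
	spreadScore := 100 * freeGPUs / totalGPUs
	if strategy == "binpack" {
		return 100 - spreadScore
	}
	return spreadScore
}

func main() {
	// A node with 1 of 4 GPUs free: attractive for binpack, not for spread.
	fmt.Println(score("binpack", 1, 4)) // 75
	fmt.Println(score("spread", 1, 4))  // 25
}
```

With binpack, groups are packed onto already-busy nodes, keeping other nodes fully free for large groups; spread trades that for lower contention per node.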

Node Selector

You can constrain which nodes a group can use:

apiVersion: deviceshare.io/v1alpha1
kind: SharedDeviceGroup
metadata:
  name: my-gpu-group
spec:
  resources:
    nvidia.com/gpu: 2
  nodeSelector:
    gpu-type: a100  # Only bind to nodes with this label

Troubleshooting

Pods stuck in Pending

Check if the group is bound:

kubectl get shareddevicegroups

If the group shows no NODE, check scheduler logs:

kubectl logs -n deviceshare-system -l app=deviceshare-scheduler

Common issues:

  • No nodes have the deviceshare.io/mode=shared label
  • All GPUs on available nodes are already allocated to other groups
  • Node selector doesn't match any nodes
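The first two causes above can be checked directly with kubectl; these commands use only the label and status fields described in this README:

```shell
# Nodes opted in to shared mode (should include your GPU node)
kubectl get nodes -l deviceshare.io/mode=shared

# Existing groups and the devices they already hold
kubectl get shareddevicegroups \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.selectedDevices}{"\n"}{end}'
```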

Group won't delete

The webhook prevents deleting groups with active pods:

# List pods using the group
kubectl get shareddevicegroups <group-name> -o jsonpath='{.status.allocatedPods}'

# Delete the pods first
kubectl delete pod <pod-name>

# Then delete the group
kubectl delete shareddevicegroups <group-name>

Stale device allocations

If you see "available: 0" errors but know GPUs should be free:

# Restart the scheduler to clear cache
kubectl rollout restart deployment deviceshare-scheduler -n deviceshare-system

Examples

See the examples/ directory for ready-to-run manifests:

  • multi-gpu-group.yaml - Multiple pods sharing 2 GPUs
  • single-gpu-group.yaml - Single GPU shared across pods

Development

Building

# Build all components
make build

# Build specific component
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o bin/scheduler cmd/scheduler/main.go

Testing

# Run unit tests
go test ./...

# Build and deploy to local cluster
make docker-build
make deploy

Security Considerations

⚠️ This project is NOT designed for multi-tenant environments:

  • No RBAC restrictions on SharedDeviceGroup access
  • No resource quotas or limits per namespace/user
  • Any pod can reference any group
  • No audit logging of GPU access

Recommended for:

  • Single-user development environments
  • Trusted internal clusters
  • Personal workstations

NOT recommended for:

  • Production multi-tenant clusters
  • Environments with untrusted users
  • Compliance-sensitive workloads

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

Apache License 2.0
