Scheduling GPUs for RISC-V Nodes: Kubernetes Patterns for Heterogeneous Hardware

opensoftware
2026-02-07

Practical kubelet, device-plugin, and scheduling patterns to run GPU workloads on RISC-V nodes with NVLink in 2026.

Why scheduling GPUs on RISC-V nodes is suddenly urgent for platform teams

Platform and DevOps teams face a new wave of complexity in 2026: RISC-V servers with NVLink-attached GPUs are moving from research labs into production designs (SiFive announced NVLink Fusion integration in late 2025). That promises lower-cost, open-stack AI infrastructure — but it also introduces heterogeneity that breaks many out-of-the-box Kubernetes GPU workflows. If you run AI/ML or HPC workloads, you need concrete kubelet, device plugin, and scheduling patterns to get predictable performance and safe operations.

What you'll get from this guide

  • Actionable kubelet and node configuration for RISC-V + NVLink GPU nodes
  • Device plugin deployment recipes (DaemonSet + RBAC) for heterogeneous clusters
  • Scheduling patterns using node affinity, taints, topology hints, and resource requests
  • Validation, monitoring, and operational checklists for performance and reliability

By late 2025 and into 2026, two trends reshaped the landscape:

  • SiFive and NVLink Fusion announcements signaled vendor investment in RISC-V CPU + Nvidia GPU interconnects. That means NVLink-attached GPUs can appear on nodes with a RISC-V architecture rather than x86_64—introducing architecture-level scheduling considerations (Forbes coverage, Jan 2026).
  • Device plugin and topology improvements in Kubernetes have matured. Topology Manager, CPU Manager, and device plugins now provide the primitives needed for NUMA- and interconnect-aware allocation if they are configured and the plugin exposes topology hints properly.

High-level strategy

Follow this multi-layered approach:

  1. Expose GPUs and NVLink topology via a vendor device plugin running as a DaemonSet.
  2. Configure kubelet policies (CPU manager, Topology Manager) on RISC-V GPU nodes so resource alignment happens at allocation time.
  3. Schedule pods using node labels/affinity, taints/tolerations, and topology-aware resource requests (MIG-like or GPU-count requests).
  4. Validate with microbenchmarks, GPU peer-to-peer tests, and observability to detect mis-scheduling or NUMA mismatches.

1) Device plugin: expose GPUs and NVLink topology

The Kubernetes device plugin framework is architecture-agnostic, but practical deployment requires a vendor runtime and a plugin that:

  • Registers extended resource names (for example nvidia.com/gpu or vendor-specific names like riscv.nvidia.com/gpu).
  • Exposes topology hints (NUMA node, NVLink group IDs) so the Kubelet's Topology Manager can make informed allocations.
  • Supports per-device isolation features such as MIG or partial GPU sharing if available.

DaemonSet template (practical example)

Deploy your device plugin as a DaemonSet. This example is a compact pattern — adapt the container image and mounts to your vendor runtime. The plugin must create a socket under /var/lib/kubelet/device-plugins.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: riscv-gpu-device-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: riscv-gpu-device-plugin
  template:
    metadata:
      labels:
        name: riscv-gpu-device-plugin
    spec:
      hostNetwork: false
      tolerations:
        - key: "node.kubernetes.io/not-ready"
          operator: "Exists"
          effect: "NoSchedule"
      containers:
        - name: device-plugin
          image: registry.example.com/vendor/riscv-gpu-plugin:2026-01
          securityContext:
            privileged: true
          env:
            - name: KUBELET_SOCKET
              value: "/var/lib/kubelet/device-plugins"
          volumeMounts:
            - name: dev
              mountPath: /dev
            - name: dp-dir
              mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: dev
          hostPath:
            path: /dev
        - name: dp-dir
          hostPath:
            path: /var/lib/kubelet/device-plugins

Key points:

  • Run as privileged if driver/userland needs direct device access.
  • Mount the kubelet device plugin directory; the plugin must register via a UNIX socket there.

2) Kubelet configuration: enable topology-aware allocation

RISC-V + NVLink GPU nodes often have complex device topologies (GPUs grouped by NVLink bridges, NUMA domains). Configure kubelet flags to align CPU and device placement:

  • --cpu-manager-policy=static — isolates CPUs for guaranteed pods.
  • --topology-manager-policy=single-numa-node (or best-effort / restricted) — match devices and CPUs on the same NUMA domain when possible.
  • --topology-manager-scope=container (the default); switch to pod scope only if vendor documentation recommends aligning all containers in a pod together.
  • Reserve system and kubelet CPUs using --kube-reserved and --system-reserved and pin the remainder for workloads.

Example systemd drop-in for kubelet

[Service]
Environment="KUBELET_EXTRA_ARGS=--cpu-manager-policy=static \
--topology-manager-policy=single-numa-node \
--kube-reserved=cpu=500m,memory=1Gi \
--system-reserved=cpu=500m,memory=1Gi"

Tune the reserved values to match your node sizing. The key is enabling the Topology Manager so the kubelet can align CPU allocations with the device allocations reported by the plugin.
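If you manage the kubelet through a KubeletConfiguration file instead of flags (upstream is steadily moving flags into config fields), the equivalent settings look roughly like this. Field names follow the kubelet.config.k8s.io/v1beta1 API; verify them against the kubelet version you actually ship:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Pin CPUs for Guaranteed pods and align them with device allocations
cpuManagerPolicy: static
topologyManagerPolicy: single-numa-node
topologyManagerScope: container
# Reserve capacity for kubelet and system daemons (tune to your node size)
kubeReserved:
  cpu: 500m
  memory: 1Gi
systemReserved:
  cpu: 500m
  memory: 1Gi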

3) Naming and resource model: extended resources and compatibility

Choose resource names carefully:

  • Use established names like nvidia.com/gpu if the vendor plugin and tooling (container runtimes, metrics exporters) recognize them.
  • For vendor-specific stacks, expose names like riscv.vendor.io/gpu, but provide mappings in your platform's admission controllers or scheduler extensions to simplify pod specs.

Example pod request (2 GPUs):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: trainer
    image: registry.example.com/my-ml:latest
    resources:
      limits:
        nvidia.com/gpu: 2

If you use a vendor-specific resource name, substitute accordingly.

4) Scheduling patterns: keep GPUs within NVLink and NUMA domains

NVLink groups GPUs into fast interconnect domains. To get the expected performance, place pods so that the requested GPUs reside within the same NVLink or NUMA domain. Use these patterns:

Node labels + required NodeAffinity

The kubelet sets kubernetes.io/arch=riscv64 automatically at node registration; add the NVLink capability labels yourself:

kubectl label node riscv-01 gpu.vendor/nvlink=true
kubectl label node riscv-01 gpu.vendor/nvlink-groups=2

Then require those labels via nodeAffinity in the pod spec.
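A minimal pod-spec fragment, assuming the labels above (gpu.vendor/nvlink is a placeholder key; substitute whatever your vendor stack uses):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values: ["riscv64"]
            - key: gpu.vendor/nvlink
              operator: In
              values: ["true"]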

Taints and tolerations for exclusive pools

If some RISC-V nodes have NVLink and others do not, isolate them with a taint and allow only specific workloads to run there:

kubectl taint nodes riscv-01 nvlink=required:NoSchedule

Pods that need NVLink include a toleration and nodeAffinity. This prevents accidental scheduling of non-NVLink jobs on precious NVLink hardware.
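The matching pod-spec fragment for the taint above looks like this; pair it with the nodeAffinity shown earlier so NVLink jobs both tolerate the taint and actively target NVLink nodes:

tolerations:
  - key: "nvlink"
    operator: "Equal"
    value: "required"
    effect: "NoSchedule"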

Topology Manager + plugin-provided hints

Make sure the device plugin returns topology hints. The Kubelet Topology Manager will attempt to satisfy those hints and align CPUs. This is the most robust mechanism to keep GPUs from being scattered across NVLink groups.
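To confirm hints are actually being consumed, inspect kubelet admission behavior on the node. The exact log wording varies by kubelet version, and pods rejected under the restricted or single-numa-node policies report a TopologyAffinityError:

# Topology Manager decisions on the node (log wording varies by version)
journalctl -u kubelet | grep -i topology

# Pods rejected for alignment surface the error in their status and events
kubectl describe pod <pod-name> | grep -i topologyaffinity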

PodGroups and gang-scheduling for multi-GPU multi-pod jobs

For distributed training that spans multiple pods but requires NVLink adjacency within each node, use gang scheduling (via Volcano, the successor to kube-batch) together with nodeSelectors or the affinity above so the gang's pods land on appropriately provisioned nodes.
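As one sketch with Volcano, a PodGroup declares the gang and each worker pod opts into it. The API group and fields below follow Volcano's scheduling.volcano.sh/v1beta1 PodGroup; the names, namespace, and sizes are placeholders, and you should check your Volcano release's docs for the exact pod-to-PodGroup binding annotation:

apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: trainer-gang
  namespace: ml-team
spec:
  # Do not start any pod until all four workers can be placed
  minMember: 4
  minResources:
    nvidia.com/gpu: "8"

Worker pods then set schedulerName: volcano, reference the PodGroup, and carry the NVLink nodeAffinity from earlier.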

5) Container runtime and userland compatibility (practical notes)

GPU runtimes historically were x86-focused. By 2026 expect vendor toolchains to provide RISC-V-compatible runtimes, but validate these components:

  • GPU driver kernel modules for your RISC-V kernel tree.
  • Userland libraries and container runtime hooks (nvidia-container-toolkit equivalents) compiled for riscv64.
  • Containerd configuration: ensure that unprivileged containers can access /dev entries and the device nodes the plugin allocates for them.

If a vendor plugin requires a custom runtime class, create a RuntimeClass and document it in your platform templates.
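A minimal sketch, assuming your containerd config on the GPU nodes defines a handler named riscv-gpu (both names here are placeholders):

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: riscv-gpu
# Must match a runtime handler configured in containerd on the GPU nodes
handler: riscv-gpu

Pods opt in by setting runtimeClassName: riscv-gpu in their spec.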

6) Validation and testing checklist

Before letting production workloads loose, validate along these axes:

  1. Registration: Check the device plugin socket is registered in the kubelet device plugin directory and that kubectl describe node shows the extended resource.
  2. Topology hints: Use plugin logs to verify the topology hints returned for each GPU device (NUMA IDs and NVLink group IDs).
  3. Allocation behavior: Schedule a pod that requests multiple GPUs and verify they are on the same NVLink group. Use vendor tools or /proc accesses to confirm peer-to-peer access is active.
  4. Performance: Run a small training job or microbench (e.g., NCCL bandwidth test or vendor-provided P2P test) to compare intra-node NVLink bandwidth vs. non-NVLink placements.
  5. NUMA alignment: Confirm CPU pinning and memory locality by inspecting top, numactl, and process CPU maps.

Example validation commands

# Check plugin registration
ls -l /var/lib/kubelet/device-plugins/

# Confirm node reports resource
kubectl describe node riscv-01 | grep -A3 -i "Allocatable"

# Run GPU peer-to-peer check (vendor tool)
kubectl run -it --rm --restart=Never p2p-test --image=registry.example.com/tools/gpu-test:2026 -- /usr/local/bin/p2p_test
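For checklist item 4, a hedged pod sketch for an intra-node NCCL all-reduce microbenchmark. The image and binary path are placeholders for whatever nccl-tests build your vendor stack provides; -b/-e/-f set the message-size sweep and -g the GPU count:

apiVersion: v1
kind: Pod
metadata:
  name: nccl-bench
spec:
  restartPolicy: Never
  nodeSelector:
    gpu.vendor/nvlink: "true"
  containers:
    - name: bench
      image: registry.example.com/tools/nccl-tests:2026
      command: ["/usr/local/bin/all_reduce_perf", "-b", "8M", "-e", "1G", "-f", "2", "-g", "2"]
      resources:
        limits:
          nvidia.com/gpu: 2

Compare the reported bus bandwidth when both GPUs share an NVLink group versus a placement that spans groups; a large gap confirms the topology hints are doing their job.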

7) Observability: what to monitor

Monitor these signals closely:

  • Device plugin health: restart counts, socket liveness (a sample alert rule follows this list)
  • GPU utilization and memory: analogous to nvidia-smi; vendor exporters should populate Prometheus metrics
  • Pod placement failures: scheduling events and insufficient resources
  • Topology Manager conflicts: kubelet logs will show conflicts where it cannot meet device + CPU alignment
  • Peer-to-peer errors: driver logs if NVLink is not configured or devices are isolated
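A hedged starting point for the first signal, assuming kube-state-metrics is installed; the pod name matches the DaemonSet above and the threshold is arbitrary, so tune both to your environment:

groups:
  - name: gpu-device-plugin
    rules:
      - alert: RiscvGpuDevicePluginRestarting
        expr: increase(kube_pod_container_status_restarts_total{namespace="kube-system", pod=~"riscv-gpu-device-plugin.*"}[1h]) > 3
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Device plugin pod {{ $labels.pod }} is restarting repeatedly"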

8) Common failure modes and fixes

Failure: Pods get scheduled but performance is poor

Likely cause: GPUs not in the same NVLink domain or CPU pinned to remote NUMA. Fixes:

  • Ensure the device plugin provides topology hints and that kubelet Topology Manager is enabled.
  • Set --cpu-manager-policy=static and reserve system CPUs.
  • Prefer nodeAffinity to force placement on NVLink-capable nodes when fine-grained placement is required.

Failure: Device plugin doesn't register on RISC-V nodes

Likely cause: missing runtime/libraries or plugin binary not built for riscv64. Fixes:

  • Confirm plugin image architecture matches the node architecture (riscv64). Use multi-arch manifests or rebuild for riscv64.
  • Verify kernel driver modules and /dev entries exist on the host.

9) Migration & hybrid clusters: handling mixed x86 and RISC-V fleets

Most clusters in 2026 will be heterogeneous. Use these tactics:

  • Label and taint nodes by architecture: the kubelet-set kubernetes.io/arch label distinguishes riscv64 from amd64 nodes (a soft-preference affinity sketch follows this list).
  • Admission hooks to translate generic GPU requests into architecture-specific resource names when scheduling.
  • Capacity-aware autoscaling: Cluster Autoscaler must recognize extended resources on the RISC-V node groups so scale-up respects GPU resource types.
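As a sketch for the first tactic (and assuming multi-arch workload images), a soft preference keeps GPU jobs on RISC-V capacity when it is available but allows fallback to amd64 GPU nodes:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values: ["riscv64"]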

10) Security and compliance considerations

GPU nodes often run privileged components (device plugin, drivers). Reduce risk:

  • Run device plugins with minimal privileges; only escalate if absolutely required.
  • Use PodSecurity and SELinux/AppArmor where supported on RISC-V kernels.
  • Control who can request GPUs with RBAC, LimitRanges, and admission policies to avoid resource exhaustion.
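For the last point, a minimal sketch that caps GPU requests per namespace; the namespace name is a placeholder, and extended-resource quotas use the requests.<resource-name> form:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: ml-team
spec:
  hard:
    # At most 8 GPUs requested across all pods in this namespace
    requests.nvidia.com/gpu: "8"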

Also consider regional and data residency constraints when choosing where to run sensitive workloads — see guidance on compliance for cloud teams.

Advanced patterns and future-proofing

Looking toward late 2026 and beyond:

  • Scheduler Extenders / Multi-scheduler can implement fine-grained NVLink-aware placement algorithms if vendor plugins don't expose adequate topology hints.
  • MIG-like virtualization of GPUs will be increasingly supported on RISC-V GPUs — expose fractional resources via the plugin.
  • Hardware-aware autoscaling will let you scale GPU node pools with NVLink capacity constraints in mind (scale nodes that provide contiguous NVLink groups).

Quick-start checklist

  1. Install vendor kernel modules and userland on the RISC-V node image.
  2. Deploy the vendor device plugin DaemonSet (riscv-built image) and confirm registration.
  3. Configure kubelet flags: cpu-manager=static, topology-manager=single-numa, reserve system CPUs.
  4. Label nodes with NVLink capability: gpu.vendor/nvlink=true.
  5. Deploy pod with nodeAffinity and resource limit nvidia.com/gpu: 2 (or vendor-specific name).
  6. Run vendor P2P tests to validate NVLink connectivity and measure baseline performance.

References and further reading

SiFive announced NVLink Fusion integration for RISC-V in late 2025 — this is driving vendor stacks and requires updated device plugin and kubelet approaches (Forbes, Jan 2026).

Final takeaways (actionable)

  • Don't treat RISC-V GPU nodes like regular nodes. They need device plugins built for riscv64, kubelet topology configuration, and explicit scheduling controls.
  • Enable Topology Manager + CPU Manager on GPU nodes to get predictable NVLink performance.
  • Expose topology hints from your device plugin. This is how kubelet ensures GPUs and CPUs are colocated within NUMA/NVLink domains.
  • Use labels, taints, and affinity to protect NVLink resources and control placement.

Call to action

Ready to deploy RISC-V + NVLink GPU nodes in your cluster? Download our checklist, example DaemonSet manifest, and kubelet drop-in (riscv-ready) from the opensoftware.cloud reference repo. If you need hands-on help, contact our engineering team for a targeted review and a 90-minute migration plan tailored to your topology and workloads.
