Conversation

Contributor

@mresvanis mresvanis commented Jan 15, 2026

Description

This PR enables Fabric Manager (FM) configuration for vm-passthrough workloads using the Shared NVSwitch virtualization model.

It enables users to configure the Fabric Manager mode (FABRIC_MODE=[0,1,2]: 0 = full passthrough, 1 = shared NVSwitch, 2 = vGPU) through the ClusterPolicy CRD, providing better support for NVIDIA multi-GPU systems in virtualized environments.

In the FM shared NVSwitch virtualization model, the NVIDIA driver on the host manages the NVSwitch devices, while the GPU devices are bound to the vfio-pci driver. The goal is for the GPU devices to be passed through to KubeVirt VMs, while the respective fabric is managed on the host.

Depends on / relates to: NVIDIA/gpu-driver-container#538

Changes

ClusterPolicy API

  • add FabricManagerSpec to the ClusterPolicy CRD with support for two modes:
    • full-passthrough (FABRIC_MODE=0) - default mode.
    • shared-nvswitch (FABRIC_MODE=1) - shared NVSwitch virtualization mode.
  • update all CRD manifests across bundle, config, and deployment directories to include the new Fabric Manager configuration fields.
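For illustration, a ClusterPolicy fragment selecting the shared NVSwitch mode might look like the sketch below. The `fabricManager` field name and its `mode` values are assumptions based on this PR's description (a `FabricManagerSpec` with `full-passthrough` and `shared-nvswitch` modes), not the final merged API:

```yaml
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  sandboxWorkloads:
    enabled: true
    defaultWorkload: vm-passthrough
  fabricManager:          # field name assumed from the PR description
    mode: shared-nvswitch # maps to FABRIC_MODE=1 in the driver container
```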

Controller logic

  • enable driver installation when using vm-passthrough with FM shared NVSwitch mode, and pass an env var to the driver container indicating the selected fabric mode (the driver container is responsible for configuring and starting FM).
  • integrate FM configuration checks into the state manager workflow.
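The env-var plumbing itself isn't shown in this description; a minimal, hypothetical sketch of the mode-to-FABRIC_MODE mapping it implies (the function name and error handling are my assumptions, the mode names and numeric values come from the PR text) could look like:

```go
package main

import "fmt"

// fabricModeEnv translates a ClusterPolicy fabricManager.mode value into
// the FABRIC_MODE env var the driver container consumes. Hypothetical
// helper; mode strings and numbers are taken from this PR's description.
func fabricModeEnv(mode string) (string, error) {
	modes := map[string]string{
		"full-passthrough": "0",
		"shared-nvswitch":  "1",
	}
	v, ok := modes[mode]
	if !ok {
		return "", fmt.Errorf("unsupported fabric manager mode %q", mode)
	}
	return "FABRIC_MODE=" + v, nil
}

func main() {
	e, _ := fabricModeEnv("shared-nvswitch")
	fmt.Println(e) // FABRIC_MODE=1
}
```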

Driver state management

  • add logic to detect and handle Fabric Manager shared NVSwitch mode.
  • update driver startup probe behavior for vm-passthrough and FM shared NVSwitch mode case.
  • adjust the driver startup probe to accommodate Fabric Manager requirements in vm-passthrough with shared NVSwitch mode.

Sandbox validator

  • add driver validation as the first init container when FM shared NVSwitch mode is enabled.
  • add a wait flow to the vfio-pci validation.

VFIO manager

  • wait for the driver to be ready when FM shared NVSwitch mode is enabled - this step is required because we need a mapping of each GPU's physical module ID to its PCIe address, as FM identifies GPUs by their physical module ID. That mapping can be obtained via nvidia-smi, which requires the driver to be loaded and bound to the GPU devices. Once that's done, we can bind the GPU devices to vfio-pci.
  • replace the driver uninstall init container with vfio-manage unbind --all when FM shared NVSwitch mode is enabled.
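The module-ID-to-PCIe-address mapping described above could be derived by parsing `nvidia-smi -q`-style output. The sketch below is illustrative only: the input layout is a hand-written approximation of nvidia-smi output (the exact "Module ID" field name and indentation can vary by driver version), and the parser is not part of this PR:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseModuleIDMap builds a GPU-module-ID -> PCIe-address map from
// nvidia-smi -q style text. Illustrative sketch; real output may differ.
func parseModuleIDMap(out string) map[string]string {
	m := map[string]string{}
	var addr string
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "GPU ") {
			// Section header carries the PCIe address, e.g. "GPU 00000000:07:00.0".
			addr = strings.TrimSpace(strings.TrimPrefix(line, "GPU "))
		} else if strings.Contains(line, "Module ID") {
			if parts := strings.SplitN(line, ":", 2); len(parts) == 2 && addr != "" {
				m[strings.TrimSpace(parts[1])] = addr
			}
		}
	}
	return m
}

func main() {
	sample := "GPU 00000000:07:00.0\n    Module ID : 1\n" +
		"GPU 00000000:0A:00.0\n    Module ID : 2\n"
	fmt.Println(parseModuleIDMap(sample)["1"]) // 00000000:07:00.0
}
```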

Checklist

  • No secrets, sensitive information, or unrelated changes
  • Lint checks passing (make lint)
  • Generated assets in-sync (make validate-generated-assets)
  • Go mod artifacts in-sync (make validate-modules)
  • Test cases are added for new code paths

Testing

TBD

Signed-off-by: Michail Resvanis <mresvani@redhat.com>
@copy-pr-bot

copy-pr-bot bot commented Jan 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@LandonTClipp

How coincidental that I resolved to implement something like this and 2 hours ago you submitted this draft!

I want to ask what the plan is for the CDI side. The ideal scenario is that the Fabric Manager can be spawned as a Kata container, which means we need to inject the NVSwitch VFIO cdevs just like how we do for passthrough GPUs. When I tried to use GPU Operator a few months ago, this was simply not possible at the time so I used libvirt instead. Does the GPU Operator CDI already expose the NVSwitches to k8s now? I apologize if my knowledge is a little out of date.

…-passthrough

Signed-off-by: Michail Resvanis <mresvani@redhat.com>
When clusterPolicy.fabricManager.mode=shared-nvswitch and
workload=vm-passthrough, the vfio-manager now preserves the
NVIDIA driver for fabric management while enabling GPU device
passthrough to VMs.

Changes:
- Modify TransformVFIOManager to detect shared-nvswitch mode.
- Replace driver uninstall init container with device unbind init
  container.
- Use vfio-manage unbind --all to detach devices from nvidia driver.
- Keep nvidia driver loaded for fabric management functionality.
- Add comprehensive unit tests for both normal and shared-nvswitch
  modes.

The new flow for shared-nvswitch mode for the vfio-manager:
1. InitContainer: vfio-manage unbind --all (unbind from nvidia driver)
2. Container: vfio-manage bind --all (bind to vfio-pci)

This enables simultaneous fabric management and VM passthrough capabilities.
Signed-off-by: Michail Resvanis <mresvani@redhat.com>
@mresvanis mresvanis force-pushed the fabric-manager-configuration branch 2 times, most recently from c53ceaa to 70c5d78 Compare January 22, 2026 12:00
Signed-off-by: Michail Resvanis <mresvani@redhat.com>