
Commit d456835

Authored by: kumara1-intel, dependabot[bot], pchand20
Enable gpu passthrough (#182)
Signed-off-by: Kumar, Anand <anand.kumar@intel.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: pchandra <prakash1.chandra@intel.com>
1 parent b270497 commit d456835

File tree: 12 files changed, +564 −1 lines

baremetal/config-v3.toml.tmpl

Lines changed: 10 additions & 1 deletion
```diff
@@ -6,4 +6,13 @@
     privileged_without_host_devices = true
     pod_annotations = ["io.katacontainers.*"]
   [plugins."io.containerd.cri.v1.runtime".containerd.runtimes.kata-qemu.options]
-    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu.toml"
+    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu.toml"
+
+  [plugins."io.containerd.nri.v1.nri"]
+    disable = false
+    disable_connections = false
+    plugin_config_path = "/etc/nri/conf.d"
+    plugin_path = "/opt/nri/plugins"
+    plugin_registration_timeout = "5s"
+    plugin_request_timeout = "2s"
+    socket_path = "/var/run/nri/nri.sock"
```
Lines changed: 267 additions & 0 deletions
@@ -0,0 +1,267 @@
# GPU Enablement in Trusted Compute Environment

This document provides a step-by-step guide to enabling GPU passthrough in a Trusted Compute environment using Kata Containers. It includes instructions for GPU driver management and for running a sample application.

## Table of Contents

- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [GPU Driver Management](#gpu-driver-management)
- [Install Trusted Compute Package](#install-trusted-compute-package)
- [Deploy sample application](#deploy-sample-application)
- [Pod Deployment and Verification](#pod-deployment-and-verification)
- [Revert GPU Binding](#revert-gpu-binding)
## Overview

### What is GPU Passthrough?

GPU passthrough allows a virtual machine to directly access a physical GPU, bypassing the host operating system's drivers. By leveraging VFIO (Virtual Function I/O), the GPU is securely assigned to the VM, enabling workloads inside Kata Containers to use the GPU with minimal overhead and near-native performance. This is essential for compute-intensive applications, such as AI/ML inference or graphics workloads, in a trusted and isolated environment.
## Prerequisites

### Hardware Requirements

- Intel CPU with virtualization support (VT-x and VT-d)
- Intel integrated GPU (tested on an Asus PE3000G EN)
- IOMMU enabled in BIOS/UEFI

### Software Requirements

- Kubernetes or K3s cluster
- Trusted Compute version 1.4.7 or newer
- Linux kernel with:
  - IOMMU enabled
  - VFIO support
  - DRM/i915 driver support
## GPU Driver Management

To enable GPU passthrough, unbind the GPU from its host driver and bind it to the `vfio-pci` driver.

#### 1. Find GPU PCI Address and Device ID

Identify your GPU's PCI address and device ID:

```bash
$ lspci -nn | grep VGA
# Example output:
# 0000:00:02.0 VGA compatible controller [0300]: Intel Corporation Raptor Lake-P [Iris Xe Graphics] [8086:a7a0] (rev 04)
```
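The two values needed in the later steps can be pulled out of that `lspci` line with standard text tools. A minimal sketch (the sample line is hard-coded for illustration, and the `GPU_PCI`/`GPU_DID` variable names follow this guide's examples; adapt the parsing to your own output):

```bash
# Parse an `lspci -nn` line into the PCI address and the vendor/device ID.
# In practice: line=$(lspci -nn | grep VGA)
line='0000:00:02.0 VGA compatible controller [0300]: Intel Corporation Raptor Lake-P [Iris Xe Graphics] [8086:a7a0] (rev 04)'

# The PCI address is the first whitespace-delimited field
GPU_PCI=$(printf '%s\n' "$line" | awk '{print $1}')

# The vendor:device ID is the last [hhhh:hhhh] bracket pair; vfio-pci's
# new_id interface expects the two halves separated by a space
GPU_DID=$(printf '%s\n' "$line" \
  | grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}\]' \
  | tail -n 1 | tr -d '[]' | tr ':' ' ')

echo "GPU_PCI=$GPU_PCI"   # 0000:00:02.0
echo "GPU_DID=$GPU_DID"   # 8086 a7a0
```

The class code `[0300]` is skipped automatically because the pattern requires a colon inside the brackets.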
#### 2. Record GPU Render Device Paths

Before unbinding the GPU, list the available DRI devices and record their major and minor numbers. These details are required later for pod annotations.

```bash
$ ls /dev/dri/
# Example output:
# card0  renderD128
```

Take note of the device paths `/dev/dri/card0` and `/dev/dri/renderD128`.

For each device, record the major and minor numbers. This information is required when specifying device access in your pod specification.

```bash
$ ls -l /dev/dri/card0
$ ls -l /dev/dri/renderD128
# Example output:
# crw-rw---- 1 root video 226, 0   ... /dev/dri/card0
# crw-rw---- 1 root video 226, 128 ... /dev/dri/renderD128
```

Here, `226` is the major number, and `0` and `128` are the minor numbers of the two devices, respectively.
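If parsing `ls -l` output feels fragile, `stat` can report the numbers directly. A small sketch (the `dev_numbers` helper is hypothetical; GNU coreutils' `%t`/`%T` format specifiers print major/minor in hexadecimal, hence the conversion, and it is demonstrated on `/dev/null`, which is major 1, minor 3 on Linux, since the DRI nodes are machine-specific):

```bash
# Hypothetical helper: print a device node's major and minor numbers in
# decimal. GNU stat's %t and %T give them in hex, so convert with printf.
dev_numbers() {
    maj_hex=$(stat -c '%t' "$1")
    min_hex=$(stat -c '%T' "$1")
    echo "$(printf '%d' "0x$maj_hex") $(printf '%d' "0x$min_hex")"
}

# On the example machine above, `dev_numbers /dev/dri/card0` would print
# "226 0". Demonstrated here on /dev/null:
dev_numbers /dev/null   # 1 3
```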
#### 3. Unbind GPU and Bind to the vfio-pci Driver

Replace the variables with your actual GPU PCI address and device ID if they differ from the example.

```bash
$ export GPU_PCI="0000:00:02.0"
$ export GPU_DID="8086 a7a0"

# Unbind the GPU from the i915 driver
$ echo "$GPU_PCI" | sudo tee /sys/bus/pci/drivers/i915/unbind

# Load the vfio-pci module
$ sudo modprobe vfio-pci

# Register the GPU device ID with vfio-pci
$ echo "$GPU_DID" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id

# Bind the GPU to vfio-pci
$ echo "$GPU_PCI" | sudo tee /sys/bus/pci/drivers/vfio-pci/bind
```
#### 4. Verify VFIO Binding

After binding, verify that the VFIO device files exist:

```bash
$ ls -l /dev/vfio
```

You should see at least two entries: a control file (usually `vfio`) and a file named with a group number (e.g., `1`). The group number should correspond to the IOMMU group of your GPU. If these files are present, the GPU is successfully bound to VFIO and ready for passthrough.

```bash
# Verify driver binding
$ lspci -nnk -d 8086:a7a0
# Output should show:
#   Kernel driver in use: vfio-pci
```
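This verification can also be scripted, for example as a pre-flight check before deploying pods. A sketch (the `lspci -nnk` output is hard-coded for illustration; in practice substitute `lspci_out=$(lspci -nnk -d 8086:a7a0)` with your own vendor:device ID):

```bash
# Hypothetical check: extract the bound driver from `lspci -nnk` output
# and confirm it is vfio-pci before attempting passthrough.
lspci_out='00:02.0 VGA compatible controller [0300]: Intel Corporation Raptor Lake-P [Iris Xe Graphics] [8086:a7a0] (rev 04)
	Kernel driver in use: vfio-pci
	Kernel modules: i915'

driver=$(printf '%s\n' "$lspci_out" | sed -n 's/.*Kernel driver in use: *//p')

if [ "$driver" = "vfio-pci" ]; then
    echo "OK: GPU is bound to vfio-pci"
else
    echo "WARNING: GPU is bound to '$driver', not vfio-pci" >&2
fi
```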
## Install Trusted Compute Package

#### 1. Install on Edge Manageability Framework (EMF) Cluster

1. **Set up the EMF cluster:**
   Follow the [Edge Infrastructure Setup Guide](https://docs.openedgeplatform.intel.com/edge-manage-docs/3.1/user_guide/set_up_edge_infra/index.html) to prepare your EMF cluster.

2. **Deploy the Trusted Compute package:**
   Refer to the [Trusted Compute Package Deployment Guide](https://docs.openedgeplatform.intel.com/edge-manage-docs/3.1/user_guide/package_software/extensions/trusted_compute_package.html#deploy-trusted-compute-package) for deployment instructions.

3. **Access the EMF cluster from your local machine:**
   Use the `kubeconfig.yaml` file downloaded from the EMF cluster to configure access from your local environment.
   For detailed steps, refer to [Organize Cluster Access with a Kubeconfig File](https://docs.openedgeplatform.intel.com/edge-manage-docs/3.1/user_guide/set_up_edge_infra/clusters/accessing_clusters.html).

#### 2. Install on Standalone System

Follow the [Trusted Compute k3s Installation on Standalone Ubuntu Edge Node](https://github.com/open-edge-platform/trusted-compute/blob/main/docs/trusted_compute_baremetal.md) guide for instructions on installing Trusted Compute on a standalone system.
## Deploy sample application

GPU passthrough can be verified by deploying [an OpenCL image](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/v0.34.0/demo/intel-opencl-icd) that runs **clinfo**, which outputs the GPU capabilities detected by the driver installed in the image.

#### 1. Make the image available to the cluster

Clone and build the image:

```bash
# Use the desired release tag or main
$ git clone https://github.com/intel/intel-device-plugins-for-kubernetes -b v0.34.0
$ cd intel-device-plugins-for-kubernetes
$ make intel-opencl-icd

# Tag and push the intel-opencl-icd image to a repository available in the cluster
$ docker tag intel/intel-opencl-icd:devel <repository>/intel/intel-opencl-icd:latest
$ docker push <repository>/intel/intel-opencl-icd:latest
```
#### 2. Prepare NRI Annotations

To enable GPU passthrough with Trusted Compute, you must specify device annotations in your pod specification.

**Steps to Prepare NRI Annotations:**

1. **Identify the required device files.** List the VFIO devices and the IOMMU group:

   ```bash
   $ ls /dev/vfio
   # Output will show `vfio` and one or more numbers (e.g., `1`).
   # The number is your IOMMU group (`n`).
   ```

2. **Get the device major and minor numbers.** For each device, run:

   ```bash
   $ ls -l /dev/vfio/vfio
   $ ls -l /dev/vfio/<n>   # Replace <n> with your IOMMU group number

   # Example output:
   # crw-rw-rw- 1 root root 10, 196 ... /dev/vfio/vfio   # Major 10, minor 196
   # crw------- 1 root root 510, 0  ... /dev/vfio/1      # Major 510, minor 0
   ```
3. **Add devices to the pod annotation.** In your pod spec, under `metadata.annotations`, add each device with its `path`, `type` (`c` for character device), `major`, `minor`, and `file_mode` (e.g., `666` for read/write).

4. **Add the following required annotations:**
   - `default_memory`: Specifies the minimum memory (in MB) allocated to the Kata VM. The sample application requires at least 4096 MB (4 GB), but you can increase this value based on your application's requirements.
   - `pcie_root_port`: Sets the number of PCIe root ports for device hot-plug support. The GPU is hot-plugged into the VM on a PCIe root port, so the minimum required value for `pcie_root_port` is 1.
   - `hot_plug_vfio`: Enables hot-plugging of VFIO devices on the specified PCIe root port.

Refer to the sample application pod specification below for a complete example.
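Writing the annotation body by hand is error-prone, so it can instead be generated from the recorded `path major minor` triples. A minimal sketch (the `gen_annotation` helper is hypothetical and the triples are hard-coded from the example numbers above; real values come from `ls -l` or `stat` on your node):

```bash
# Hypothetical generator: emit device-injector annotation entries for a
# list of "path major minor" triples read from stdin.
gen_annotation() {
    while read -r path major minor; do
        [ -n "$path" ] || continue
        printf -- '- path: %s\n  type: c\n  major: %s\n  minor: %s\n  file_mode: 666\n' \
            "$path" "$major" "$minor"
    done
}

annotation=$(gen_annotation <<'EOF'
/dev/vfio/vfio 10 196
/dev/vfio/1 510 0
/dev/dri/card0 226 0
/dev/dri/renderD128 226 128
EOF
)
printf '%s\n' "$annotation"
```

The result can be pasted as the value of the `devices.noderesource.dev/container.<name>` annotation shown in the pod specification below.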
## Pod Specification for sample application

Create a pod specification with the proper GPU device annotations:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: intel-opencl-icd
  annotations:
    io.katacontainers.config.hypervisor.default_memory: "4096" # minimum default_memory requirement is 4 GB
    io.katacontainers.config.hypervisor.pcie_root_port: "1" # minimum pcie_root_port is 1
    io.katacontainers.config.hypervisor.hot_plug_vfio: "root-port" # GPU is hot-plugged to the root port
    devices.noderesource.dev/container.intel-opencl-icd: | # intel-opencl-icd is the container name
      - path: /dev/vfio/vfio
        type: c
        major: 10
        minor: 196
        file_mode: 666
      - path: /dev/vfio/1
        type: c
        major: 510
        minor: 0
        file_mode: 666
      - path: /dev/dri/card0
        type: c
        major: 226
        minor: 0
        file_mode: 666
      - path: /dev/dri/renderD128
        type: c
        major: 226
        minor: 128
        file_mode: 666
spec:
  runtimeClassName: kata-qemu
  restartPolicy: Never
  containers:
    - name: intel-opencl-icd
      image: <repository>/intel/intel-opencl-icd:latest # update based on your repository
      imagePullPolicy: Always
      resources: # modify as per your requirement
        requests:
          cpu: 1
          memory: "4Gi"
        limits:
          cpu: 2
          memory: "8Gi"
```
## Pod Deployment and Verification

After deploying the `intel-opencl-icd` pod, verify GPU access by checking the pod logs:

```bash
$ kubectl logs intel-opencl-icd
```

If GPU passthrough is successful, the output displays information about the detected GPU and its capabilities, as reported by the `clinfo` tool inside the container.

If you encounter errors or the GPU is not detected, review the previous steps to ensure that the device files are correctly annotated and the GPU is properly bound to the `vfio-pci` driver.
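The log check can be automated, for example in CI. A sketch (the sample `clinfo` lines are hard-coded for illustration, not actual output from this setup; in practice feed it `kubectl logs intel-opencl-icd`, and note that exact field names may vary between clinfo versions):

```bash
# Hypothetical verification: count the OpenCL devices reported in the
# pod log. A sample log excerpt is hard-coded for illustration.
log='Number of platforms                               1
  Platform Name                                   Intel(R) OpenCL Graphics
Number of devices                                 1'

devices=$(printf '%s\n' "$log" | sed -n 's/^Number of devices *//p' | head -n 1)

if [ "${devices:-0}" -ge 1 ] 2>/dev/null; then
    echo "GPU passthrough verified: $devices OpenCL device(s) detected"
else
    echo "No OpenCL devices detected; GPU passthrough likely failed" >&2
fi
```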
## Revert GPU Binding

After you are done using GPU passthrough, you may want to revert the GPU binding from `vfio-pci` back to the default GPU driver so the host regains access to the GPU.

To restore the GPU to the host, perform the following steps:

```bash
# 1. Unbind the GPU from vfio-pci
echo "$GPU_PCI" | sudo tee /sys/bus/pci/drivers/vfio-pci/unbind

# 2. Remove the GPU device ID from vfio-pci
echo "$GPU_DID" | sudo tee /sys/bus/pci/drivers/vfio-pci/remove_id

# 3. Rebind the GPU to the i915 driver
echo "$GPU_PCI" | sudo tee /sys/bus/pci/drivers/i915/bind

# 4. Verify the GPU is back on i915. Note that lspci -d expects
#    vendor:device (e.g. 8086:a7a0), while $GPU_DID is space-separated.
lspci -nnk -d "${GPU_DID// /:}"
# Output should show:
#   Kernel driver in use: i915
```

This restores the GPU to its original state for use by the host system.

helm/trusted-workload/Chart.yaml

Lines changed: 3 additions & 0 deletions
```diff
@@ -33,3 +33,6 @@ dependencies:
   - name: cc-runtimeclass
     version: 0.1.0
     repository: file://./charts/cc-runtimeclass
+  - name: nri-device-injector
+    version: 0.1.0
+    repository: file://./charts/nri-device-injector
```
Lines changed: 2 additions & 0 deletions
```diff
@@ -0,0 +1,2 @@
+# Patterns to ignore when building packages.
+/tests
```
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@

```yaml
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: BSD-3-Clause
---
apiVersion: v2
name: nri-device-injector
description: A Helm chart for Kubernetes

# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.1.0

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "1.16.0"
```
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@

```yaml
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: BSD-3-Clause
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app.kubernetes.io/name: nri-plugin-device-injector
  name: nri-plugin-device-injector
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: nri-plugin-device-injector
  template:
    metadata:
      labels:
        app.kubernetes.io/name: nri-plugin-device-injector
    spec:
      securityContext:
        runAsNonRoot: true
      containers:
        - args:
            - -idx
            - "10"
          image: ghcr.io/containerd/nri/plugins/device-injector:v0.10.0
          imagePullPolicy: IfNotPresent
          name: plugin
          resources:
            requests:
              cpu: 2m
              memory: 5Mi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: true
          volumeMounts:
            - mountPath: /var/run/nri/nri.sock
              name: nri-socket
      priorityClassName: system-node-critical
      volumes:
        - hostPath:
            path: /var/run/nri/nri.sock
            type: Socket
          name: nri-socket
```
