Commit ef4f40c
Device Plugin Resource Naming Strategy
resource_naming_strategy is a new flag that can be passed to the
device plugin daemonset. The supported values for the flag are
"single" and "mixed".
Terms to understand before viewing the changes in this commit:
Homogeneous Node:
If all GPUs on a node use the same compute and memory partition
style, the node is considered homogeneous.
Heterogeneous Node:
If the GPUs on a node have different compute and memory partition
styles, the node is considered heterogeneous (put simply, the node
is not homogeneous).
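The two definitions above can be sketched in Go as a small classification
helper. This is only an illustration: the gpuDevice type, the helper names
and the "-" separator are assumptions, and only the computePartition and
memoryPartition fields come from the commit message.

```go
package main

// gpuDevice is a hypothetical stand-in for the plugin's device object.
// Only the two partition fields named in the commit message are modelled.
type gpuDevice struct {
	computePartition string // e.g. "spx" or "cpx"
	memoryPartition  string // e.g. "nps1"
}

// partitionStyle joins both fields into one label such as "cpx-nps1".
// The separator is an assumption made for this sketch.
func partitionStyle(d gpuDevice) string {
	return d.computePartition + "-" + d.memoryPartition
}

// isHomogeneous reports whether every device shares the partition style
// of the first device. A node with zero or one device counts as
// homogeneous here, which is a choice made for this sketch.
func isHomogeneous(devs []gpuDevice) bool {
	for i := 1; i < len(devs); i++ {
		if partitionStyle(devs[i]) != partitionStyle(devs[0]) {
			return false
		}
	}
	return true
}
```

A node is heterogeneous exactly when isHomogeneous returns false.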
Behaviour of Resource Naming Strategy in different node types:
Homogeneous Node:
-> If node is homogeneous and resource naming strategy is "single",
one plugin is started using the DevicePluginManager with the last
name "gpu".
-> If node is homogeneous and resource naming strategy is "mixed",
one plugin is started using the DevicePluginManager with the
partition style present on the node as the last name.
-> The ListAndWatch function remains almost the same as before. It
reports resources under a single resource name (either "gpu" or the
partition style present on the node, e.g. cpx_nps1, depending on the
strategy).
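For a homogeneous node, the choice of last name described above reduces to a
switch on the strategy. The following Go sketch shows that mapping; the
function name and the error for unknown values are assumptions and not taken
from the plugin source.

```go
package main

import "fmt"

// homogeneousLastName returns the plugin last name for a homogeneous node:
// "gpu" for the "single" strategy, or the node's partition style for
// "mixed". The helper itself is hypothetical.
func homogeneousLastName(strategy, partitionStyle string) (string, error) {
	switch strategy {
	case "single":
		return "gpu", nil
	case "mixed":
		return partitionStyle, nil
	default:
		return "", fmt.Errorf("unknown resource naming strategy %q", strategy)
	}
}
```

Either way only one plugin is started, so ListAndWatch still reports every
device under one resource name.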
Heterogeneous Node:
-> If node is heterogeneous and resource naming strategy is "mixed", we
invoke the DevicePluginManager to start multiple plugins, one per
partitionType, under the names "spx-nps1", "cpx-nps1", etc.
We use the devicesCount map to start plugins only for the
partitionTypes that are present in the map.
-> ListAndWatch sends each device to the plugin for its resource type,
depending on its partitionType. Each device has computePartition and
memoryPartition fields in its object as shown before, which are used
to identify which plugin to report the resource under
(amd.com/spx-nps1, amd.com/cpx-nps1, etc.).
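The per-partition reporting on a heterogeneous node can be pictured as
bucketing devices by a resource name built from their two partition fields.
The Go sketch below illustrates that idea only. The gpuDevice type, the id
field, the helper names and the "-" separator are assumptions; the real
plugin uses its own device objects and the devicesCount map.

```go
package main

import "sort"

// gpuDevice is a hypothetical device record for this sketch.
type gpuDevice struct {
	id               string
	computePartition string
	memoryPartition  string
}

// resourceFor builds the resource name a device would be reported under
// with the "mixed" strategy, e.g. "amd.com/cpx-nps1".
func resourceFor(d gpuDevice) string {
	return "amd.com/" + d.computePartition + "-" + d.memoryPartition
}

// groupByResource buckets device IDs by resource name, so each bucket
// corresponds to the devices one plugin would report.
func groupByResource(devs []gpuDevice) map[string][]string {
	groups := make(map[string][]string)
	for _, d := range devs {
		name := resourceFor(d)
		groups[name] = append(groups[name], d.id)
	}
	return groups
}

// resourceNames returns the resource names present on the node in sorted
// order. Only names with at least one device appear, mirroring the rule
// that plugins start only for partition types that are present.
func resourceNames(groups map[string][]string) []string {
	names := make([]string, 0, len(groups))
	for n := range groups {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}
```

With one spx-nps1 device and two cpx-nps1 devices, this yields two resource
names, and the cpx-nps1 bucket holds two device IDs.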
Note:
-> If node is heterogeneous, the "single" strategy is not supported,
because reporting multiple resource types under a single resource
name would not accurately reflect how many GPUs of each type are
available.
-> For nodes where partitioning is not supported (MI200), irrespective
of strategy, the resources will get reported under "amd.com/gpu".
-> If the flag is not set by the user, the default value is "single".
This maintains backwards compatibility with the resource name used
before the strategy was introduced (amd.com/gpu).
1 parent 4e0e694 commit ef4f40c
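The flag default and the rules in the notes above could be sketched in Go as
follows. This is a sketch under assumptions: the nodeKind type, both helper
names and the choice to return an error for the unsupported combination are
hypothetical. The commit message only states that "single" is not supported
on a heterogeneous node, not how the plugin reacts.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// nodeKind is a hypothetical classification of a node for this sketch.
type nodeKind int

const (
	nonPartitionable nodeKind = iota // partitioning unsupported, e.g. MI200
	homogeneous
	heterogeneous
)

// parseStrategy reads -resource_naming_strategy from args and falls back
// to "single" when the flag is not set.
func parseStrategy(args []string) (string, error) {
	fs := flag.NewFlagSet("k8s-device-plugin", flag.ContinueOnError)
	s := fs.String("resource_naming_strategy", "single",
		`resource naming strategy: "single" or "mixed"`)
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *s, nil
}

// reportsAsGPU returns true when every device is reported under
// "amd.com/gpu", and an error for unknown strategies or for "single"
// on a heterogeneous node (the error is an assumption of this sketch).
func reportsAsGPU(strategy string, kind nodeKind) (bool, error) {
	switch {
	case strategy != "single" && strategy != "mixed":
		return false, fmt.Errorf("unknown resource naming strategy %q", strategy)
	case kind == nonPartitionable:
		// Irrespective of strategy, resources go under amd.com/gpu.
		return true, nil
	case kind == heterogeneous && strategy == "single":
		return false, errors.New(`"single" is not supported on a heterogeneous node`)
	default:
		return strategy == "single", nil
	}
}
```

Calling parseStrategy with no arguments returns "single", which keeps
existing deployments on the amd.com/gpu resource name.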
File tree
2 files changed, +120 -14 lines changed
- cmd/k8s-device-plugin
- docs/user-guide