@@ -71,32 +71,30 @@ GAS supports certain node labels as a means to allow telemetry based GPU selecti
 descheduling of PODs using a certain GPU. You can create node labels with the
 [Telemetry Aware Scheduling](../../telemetry-aware-scheduling/README.md) labeling strategy,
 which puts them in its own namespace. In practice the supported labels need to be in the
-`telemetry.aware.scheduling.POLICYNAME/` namespace, where the POLICYNAME may be anything.
+`telemetry.aware.scheduling.POLICYNAME/`[^1] namespace.

76- The node label ` gas-deschedule-pods-GPUNAME ` where the GPUNAME can be e.g. card0, card1, card2...
77- which corresponds to the gpu names under /dev/dri, will result in GAS labeling the PODs which
78- use the named GPU with the ` gpu.aware.scheduling/deschedule-pod=gpu ` label. You may then
79- use with a kubernetes descheduler to pick the pods for descheduling. So TAS labels the node, and
80- based on the node label GAS finds and labels the PODs. Descheduler can be configured to
81- deschedule the pods based on pod labels.
76+ The node label ` gas-deschedule-pods-GPUNAME ` [ ^ 2 ] will result in GAS labeling the PODs which
77+ use the named GPU with the ` gpu.aware.scheduling/deschedule-pod=gpu ` label. So TAS labels the node,
78+ and based on the node label GAS finds and labels the PODs. You may then use a kubernetes descheduler
79+ to pick the pods for descheduling via their labels.
8280
83- The node label ` gas-disable-GPUNAME ` where the GPUNAME can be e.g. card0, card1, card2... which
84- corresponds to the gpu names under /dev/dri, will result in GAS stopping the use of the named
85- GPU for new allocations.
81+ The node label ` gas-disable-GPUNAME ` [ ^ 2 ] will result in GAS stopping the use of the named GPU for new
82+ allocations.
8683
87- The node label ` gas-prefer-gpu=GPUNAME ` where the GPUNAME can be e.g. card0, card1, card2...
88- which corresponds to the gpu names under /dev/dri, will result in GAS trying to use the named
84+ The node label ` gas-prefer-gpu=GPUNAME ` [ ^ 2 ] will result in GAS trying to use the named
8985GPU for new allocations before other GPUs of the same node.
9086
-Note that the value of the labels starting with gas-deschedule-pods-GPUNAME and
-gas-disable-GPUNAME doesn't matter. You may use e.g. "true" as the value. The only exception to
+Note that the value of the labels starting with `gas-deschedule-pods-GPUNAME`[^2] and
+`gas-disable-GPUNAME`[^2] doesn't matter. You may use e.g. "true" as the value. The only exception to
 the rule is `PCI_GROUP` which has a special meaning, explained separately. Example:
 `gas-disable-card0=PCI_GROUP`.

+[^1]: POLICYNAME is defined by the name of the TASPolicy. It can vary.
+[^2]: GPUNAME can be e.g. card0, card1, card2… which corresponds to the gpu names under `/dev/dri`.
+
 ### PCI Groups

-If GAS finds a node label `gas-disable-GPUNAME=PCI_GROUP` where the GPUNAME can be e.g. card0,
-card1, card2... which corresponds to the gpu names under /dev/dri, the disabling will impact a
+If GAS finds a node label `gas-disable-GPUNAME=PCI_GROUP`[^2] the disabling will impact a
 group of GPUs which is defined in the node label `gpu.intel.com/pci-groups`. The syntax of the
 pci group node label is easiest to explain with an example: `gpu.intel.com/pci-groups=0.1_2.3.4`
 would indicate there are two pci-groups in the node separated with an underscore, in which card0
@@ -105,7 +103,7 @@ find the node label `gas-disable-card3=PCI_GROUP` in a node with the previous ex
 label, GAS would stop using card2, card3 and card4 for new allocations, as card3 belongs in that
 group.

-`gas-deschedule-pods-GPUNAME` supports the PCI-GROUP value similarly, the whole group in which
+`gas-deschedule-pods-GPUNAME`[^2] supports the PCI_GROUP value similarly, the whole group in which
 the named gpu belongs, will end up descheduled.

 The PCI group feature allows for e.g. having a telemetry action to operate on all GPUs which