Today I found that the nvidia-smi program doesn’t display the GPU devices in the order I expected:
$ nvidia-smi -L
GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-...)
GPU 1: Tesla P100-PCIE-16GB (UUID: GPU-...)
GPU 2: Tesla P100-PCIE-16GB (UUID: GPU-...)
GPU 3: Tesla V100-PCIE-16GB (UUID: GPU-...)
The above output shows the V100’s device ID as 3, while CUDA-Z shows the V100’s device ID as 0.
My own lscuda program also displays it as device 0:
$ ./lscuda
CUDA Runtime Version: 9.1
CUDA Driver Version: 9.1
GPU(s): 4
GPU Device ID: 0
Name: Tesla V100-PCIE-16GB
......
That is deliberate. nvidia-smi uses the order in which the GPUs were registered with the driver at boot time. CUDA, on the other hand, uses an order (for historical reasons) in which ID 0 is the best compute GPU in the system. The two orderings can therefore differ. If you want a consistent order in CUDA as well, you can, for example, do “export CUDA_DEVICE_ORDER=PCI_BUS_ID”, or use the device UUIDs to match devices between the nvidia-smi and CUDA orderings. See here for more discussion: https://stackoverflow.com/questions/13781738/how-does-cuda-assign-device-ids-to-gpus
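A minimal sketch of both suggestions, assuming a bash-compatible shell. smi_uuid_map is a hypothetical helper name: it reads nvidia-smi -L output (in the format shown in the question) and prints each nvidia-smi index next to its UUID, so those UUIDs can be compared with the ones a CUDA program reports (where your CUDA version exposes them). The export line switches the CUDA runtime from its default FASTEST_FIRST ordering to PCI_BUS_ID for programs launched from that shell afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvidia-smi -L` output on stdin and print
# "<nvidia-smi index> <UUID>" for every "GPU N: ... (UUID: GPU-...)" line.
smi_uuid_map() {
  sed -n 's/^GPU \([0-9][0-9]*\): .*(UUID: \(GPU-[^)]*\))$/\1 \2/p'
}

# Ask the CUDA runtime to number devices by PCI bus ID instead of the
# default FASTEST_FIRST. Affects CUDA programs started from this shell.
export CUDA_DEVICE_ORDER=PCI_BUS_ID

# Only query real hardware when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi -L | smi_uuid_map
fi
```

On the machine in the question, nvidia-smi -L | smi_uuid_map would list the V100 under index 3. With CUDA_DEVICE_ORDER=PCI_BUS_ID exported, running ./lscuda again should usually report the V100 under the same index as nvidia-smi, assuming nvidia-smi’s boot-time registration order matches PCI bus order, which is the common case.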
Hi Chris,
Got it! Thanks very much for your comments!
Best Regards
Nan Xiao