First, take a look at this explanation from Acceleware:
If you are writing GPU enabled code, you would typically use a device query to select the desired GPUs. However, a quick and easy solution for testing is to use the environment variable CUDA_VISIBLE_DEVICES to restrict the devices that your CUDA application sees. This can be useful if you are attempting to share resources on a node or you want your GPU enabled executable to target a specific GPU.
| Environment Variable Syntax | Results |
| --- | --- |
| CUDA_VISIBLE_DEVICES=1 | Only device 1 will be seen |
| CUDA_VISIBLE_DEVICES=0,1 | Devices 0 and 1 will be visible |
| CUDA_VISIBLE_DEVICES="0,1" | Same as above, quotation marks are optional |
| CUDA_VISIBLE_DEVICES=0,2,3 | Devices 0, 2, 3 will be visible; device 1 is masked |
CUDA will enumerate the visible devices starting at zero. In the last case, devices 0, 2, 3 will appear as devices 0, 1, 2. If you change the order of the string to “2,3,0”, devices 2,3,0 will be enumerated as 0,1,2 respectively. If CUDA_VISIBLE_DEVICES is set to a device that does not exist, all devices will be masked. You can specify a mix of valid and invalid device numbers. All devices before the invalid value will be enumerated, while all devices after the invalid value will be masked.
To determine the device ID for the available hardware in your system, you can run NVIDIA’s deviceQuery executable included in the CUDA SDK.
What does this mean? The CUDA_VISIBLE_DEVICES environment variable restricts which GPU devices a CUDA program can use. When a CUDA application runs, CUDA enumerates the currently visible devices and numbers them starting from zero. If CUDA_VISIBLE_DEVICES is set to a device that does not exist, all physical devices are hidden and the CUDA application cannot use any GPU. If the list mixes existing and non-existing devices, all existing devices before the first non-existing one are renumbered and remain visible, while everything after it is masked. The devices visible in the current context (after renumbering) can be inspected with the deviceQuery program shipped with the CUDA SDK.
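To make the renumbering concrete, here is a minimal Python sketch (PyTorch is assumed only because it offers a convenient device query; any CUDA-based library behaves the same way). The essential point is that the variable must be set before the CUDA runtime is initialized in the process:

```python
import os

# Must happen before anything initializes CUDA in this process,
# e.g. before the first torch.cuda call.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,0"

import torch

# CUDA renumbers the visible devices starting at zero:
# logical device 0 is now physical device 1, and logical 1 is physical 0.
print(torch.cuda.device_count())  # -> 2 on the two-GPU machine below
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```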
Let's look at some examples.
Example 1. Output of deviceQuery when CUDA_VISIBLE_DEVICES is not set:
```
Detected 2 CUDA Capable device(s)
Device 0: "Tesla K20c"
CUDA Driver Version / Runtime Version 9.0 / 8.0
CUDA Capability Major/Minor version number: 3.5
...
...
Device PCI Domain ID / Bus ID / location ID: 0 / 3 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 1: "Tesla K20c"
CUDA Driver Version / Runtime Version 9.0 / 8.0
CUDA Capability Major/Minor version number: 3.5
...
...
Device PCI Domain ID / Bus ID / location ID: 0 / 4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from Tesla K20c (GPU0) -> Tesla K20c (GPU1) : Yes
> Peer access from Tesla K20c (GPU1) -> Tesla K20c (GPU0) : Yes
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 2, Device0 = Tesla K20c, Device1 = Tesla K20c
Result = PASS
```
This shows that my machine has two Tesla K20c cards, located at PCI 0/3/0 and 0/4/0 respectively, with device IDs 0 and 1.
Example 2. Output with CUDA_VISIBLE_DEVICES=0:
```
Detected 1 CUDA Capable device(s)
Device 0: "Tesla K20c"
CUDA Driver Version / Runtime Version 9.0 / 8.0
CUDA Capability Major/Minor version number: 3.5
...
...
Device PCI Domain ID / Bus ID / location ID: 0 / 3 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = Tesla K20c
Result = PASS
```
Example 3. Output with CUDA_VISIBLE_DEVICES=1:
```
Detected 1 CUDA Capable device(s)
Device 0: "Tesla K20c"
CUDA Driver Version / Runtime Version 9.0 / 8.0
CUDA Capability Major/Minor version number: 3.5
...
...
Device PCI Domain ID / Bus ID / location ID: 0 / 4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = Tesla K20c
Result = PASS
```
Example 4. Output with CUDA_VISIBLE_DEVICES=1,0:
```
Detected 2 CUDA Capable device(s)
Device 0: "Tesla K20c"
CUDA Driver Version / Runtime Version 9.0 / 8.0
CUDA Capability Major/Minor version number: 3.5
...
...
Device PCI Domain ID / Bus ID / location ID: 0 / 4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 1: "Tesla K20c"
CUDA Driver Version / Runtime Version 9.0 / 8.0
CUDA Capability Major/Minor version number: 3.5
...
...
Device PCI Domain ID / Bus ID / location ID: 0 / 3 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from Tesla K20c (GPU0) -> Tesla K20c (GPU1) : Yes
> Peer access from Tesla K20c (GPU1) -> Tesla K20c (GPU0) : Yes
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 2, Device0 = Tesla K20c, Device1 = Tesla K20c
Result = PASS
```
Example 5. Output with CUDA_VISIBLE_DEVICES=2:
```
cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL
```
Example 6. Output with CUDA_VISIBLE_DEVICES=2,1,0:
```
cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL
```
Example 7. Output with CUDA_VISIBLE_DEVICES=1,2,0:
```
Detected 1 CUDA Capable device(s)
Device 0: "Tesla K20c"
CUDA Driver Version / Runtime Version 9.0 / 8.0
CUDA Capability Major/Minor version number: 3.5
...
...
Device PCI Domain ID / Bus ID / location ID: 0 / 4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = Tesla K20c
Result = PASS
```
Pay attention to the PCI Bus ID of each device in the outputs above.
As the examples show, setting the CUDA_VISIBLE_DEVICES environment variable does make exactly the specified devices visible to the CUDA application, and the order in which devices are listed directly determines the order in which they are reindexed (you can verify this from the Device PCI Domain ID / Bus ID / location ID lines). Specifying only non-existing devices hides all devices. When existing and non-existing devices are mixed, the existing devices before the first non-existing one are reindexed and remain visible, while those after it are hidden.
In practice, setting CUDA_VISIBLE_DEVICES flexibly lets you allocate the appropriate hardware resources to each CUDA application.
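For example, a scheduler-style script might give each worker process its own GPU by masking the rest (worker.py here is a hypothetical script name):

```python
import os
import subprocess

# Launch one worker per physical GPU; inside each worker, its assigned
# card appears as logical device 0.
for gpu_id in ("0", "1"):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_id)
    subprocess.Popen(["python", "worker.py"], env=env)
```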
Note: the device numbers used by CUDA_VISIBLE_DEVICES may not match those reported by nvidia-smi; see the documentation for the CUDA_DEVICE_ORDER environment variable. Because of this mismatch, you might intend to assign an idle device (according to nvidia-smi) to a CUDA application but actually end up assigning a device that is already in use.
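A common safeguard is to force CUDA to enumerate devices in the same PCI-bus order that nvidia-smi uses, so that the two numbering schemes agree:

```python
import os

# PCI_BUS_ID makes CUDA sort devices by PCI bus ID, matching nvidia-smi.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # now the same card nvidia-smi calls GPU 1
```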
Note: if you modify the visible devices from within your code, setting CUDA_VISIBLE_DEVICES only takes effect when it is done before the code first accesses a GPU. For example, if a model was not moved to the CPU before being saved, it will be loaded straight onto a GPU when it is reloaded; exactly which device depends on the model's device attribute, usually cuda:0 by default, i.e. the system's first GPU. Setting CUDA_VISIBLE_DEVICES before reloading the model therefore masks devices as intended, whereas setting it after the model has been reloaded has no effect, because a GPU device has already been accessed.
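A minimal PyTorch sketch of both approaches (model.pt is a hypothetical checkpoint path; map_location is the standard alternative when the mask would come too late):

```python
import os

# Option 1: mask devices BEFORE the first GPU access, so the checkpoint's
# recorded device ("cuda:0") resolves to the physical card we want.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

model = torch.load("model.pt")  # loads onto logical cuda:0 = physical GPU 1

# Option 2: if CUDA has already been initialized, remap explicitly instead;
# map_location overrides the device stored in the checkpoint.
model = torch.load("model.pt", map_location="cpu")     # load to CPU first
model = torch.load("model.pt", map_location="cuda:0")  # or to a chosen GPU
```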