Notes on Installing TensorFlow on a GPU Machine

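GPU-enabled TensorFlow only works if the correct versions of CUDA and cuDNN are installed.

Installing CUDA and cuDNN is covered by the NVIDIA website and many online tutorials, so it is not repeated here. The point this article stresses is the order of decisions: install a CUDA version that supports your GPU, then the cuDNN version that matches that CUDA version, and finally the TensorFlow version that matches both. Otherwise TensorFlow installs but cannot use the GPU, and programs silently run on the CPU only.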

1.1 CUDA:
tensorflow-gpu 1.5 and later (through 1.12) require CUDA 9.0.
To check the CUDA version on the machine:

$ cat /usr/local/cuda/version.txt
>> CUDA Version 8.0.61
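
The same check can be scripted. The following is a minimal sketch, assuming version.txt contains a single line in the format shown above; the function names are illustrative and not part of any library.

```python
import re
from pathlib import Path


def parse_cuda_version(text):
    """Return (major, minor, patch) from text like 'CUDA Version 8.0.61'."""
    match = re.search(r"CUDA Version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no CUDA version found in: %r" % text)
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0.
    return int(major), int(minor), int(patch or 0)


def read_cuda_version(path="/usr/local/cuda/version.txt"):
    # Assumes the file location used in the command above.
    return parse_cuda_version(Path(path).read_text())


print(parse_cuda_version("CUDA Version 8.0.61"))  # (8, 0, 61)
```

On a machine with CUDA installed at the default location, read_cuda_version() returns the same tuple without shelling out to cat.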

1.2 cuDNN:
tensorflow-gpu 1.3 and later require cuDNN v6 or newer.
To check the cuDNN version on the machine:

$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
>>> #define CUDNN_MAJOR      6
#define CUDNN_MINOR      0
#define CUDNN_PATCHLEVEL 21
--
#define CUDNN_VERSION    (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
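
The three #define lines can also be read programmatically. This is a minimal sketch, assuming the header uses the macro names shown in the output above; parse_cudnn_version is an illustrative name.

```python
import re


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of cudnn.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % name, header_text)
        if match is None:
            raise ValueError("%s not found" % name)
        parts.append(int(match.group(1)))
    return tuple(parts)


# Sample taken from the grep output above.
sample = """#define CUDNN_MAJOR      6
#define CUDNN_MINOR      0
#define CUDNN_PATCHLEVEL 21
"""
major, minor, patch = parse_cudnn_version(sample)
# Same arithmetic as the CUDNN_VERSION macro.
print(major * 1000 + minor * 100 + patch)  # 6021
```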

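1.3 TensorFlow
The output above shows that this machine has CUDA 8 and cuDNN v6 installed. Based on these two versions, TensorFlow 1.2 was chosen here (TensorFlow's tested configurations list cuDNN 5.1 for 1.2 and cuDNN 6 for 1.3 and 1.4, so 1.3 or 1.4 is also worth trying with this setup). Install it with pip: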

sudo pip install tensorflow-gpu==1.2

If a newer version of TensorFlow was installed previously, remove it completely with pip first:

sudo pip uninstall tensorflow

To test whether TensorFlow can use the GPU:

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

If the output contains GPU information like the following, the GPU is usable:

2018-07-18 11:56:40.180612: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-07-18 11:56:40.180702: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-07-18 11:56:40.180721: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-07-18 11:56:40.180736: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-07-18 11:56:40.180749: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2018-07-18 11:56:40.406153: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-07-18 11:56:40.406783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: Tesla K20m
major: 3 minor: 5 memoryClockRate (GHz) 0.7055
pciBusID 0000:02:00.0
Total memory: 4.94GiB
Free memory: 4.87GiB
2018-07-18 11:56:40.538537: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x43d07e0 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2018-07-18 11:56:40.538896: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-07-18 11:56:40.539341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties: 
name: Quadro K620
major: 5 minor: 0 memoryClockRate (GHz) 1.124
pciBusID 0000:03:00.0
Total memory: 1.95GiB
Free memory: 1.34GiB
2018-07-18 11:56:40.539420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 0 and 1
2018-07-18 11:56:40.539441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 1 and 0
2018-07-18 11:56:40.539462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1 
2018-07-18 11:56:40.539477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y N 
2018-07-18 11:56:40.539491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1:   N Y 
2018-07-18 11:56:40.539538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0)
2018-07-18 11:56:40.539562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1017] Ignoring gpu device (device: 1, name: Quadro K620, pci bus id: 0000:03:00.0) with Cuda multiprocessor count: 3. The minimum required count is 8. You can adjust this requirement with the env var TF_MIN_GPU_MULTIPROCESSOR_COUNT.
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0
2018-07-18 11:56:40.614997: I tensorflow/core/common_runtime/direct_session.cc:265] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0
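
In a long log the two lines that matter are "Creating TensorFlow device" and "Ignoring gpu device". The sketch below pulls them out with regular expressions. It assumes the message format of the log above (which can differ between TensorFlow versions), and summarize_gpu_log is an illustrative name.

```python
import re

# Patterns written against the log format shown above.
CREATED = re.compile(
    r"Creating TensorFlow device \((/gpu:\d+)\) -> \(device: (\d+), name: ([^,]+),")
IGNORED = re.compile(r"Ignoring gpu device \(device: (\d+), name: ([^,]+),")


def summarize_gpu_log(log_text):
    """Return (used, ignored) lists of GPUs mentioned in a TensorFlow log."""
    used, ignored = [], []
    for line in log_text.splitlines():
        match = CREATED.search(line)
        if match:
            used.append((match.group(1), int(match.group(2)), match.group(3)))
            continue
        match = IGNORED.search(line)
        if match:
            ignored.append((int(match.group(1)), match.group(2)))
    return used, ignored


sample_log = (
    "Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K20m, "
    "pci bus id: 0000:02:00.0)\n"
    "Ignoring gpu device (device: 1, name: Quadro K620, pci bus id: "
    "0000:03:00.0) with Cuda multiprocessor count: 3.\n"
)
used, ignored = summarize_gpu_log(sample_log)
print(used)     # [('/gpu:0', 0, 'Tesla K20m')]
print(ignored)  # [(1, 'Quadro K620')]
```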

2 Viewing NVIDIA GPU information and usage
2.1 Viewing graphics card information on Ubuntu:

lspci | grep -i vga

Output:

03:00.0 VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K620] (rev a2)

2.2 Viewing NVIDIA GPUs on Ubuntu:

lspci | grep -i nvidia

Output:

02:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20m] (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K620] (rev a2)
03:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)

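The values 02:00.0, 03:00.0 and 03:00.1 in the output are the PCI addresses (bus:device.function) of the devices. The first two are the graphics cards; 03:00.1 is the audio function of the Quadro K620.
To see detailed information for one device, pass its address to the following command. The first card is used as an example: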

lspci -v -s 02:00.0

Output:

02:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20m] (rev a1)
        Subsystem: NVIDIA Corporation Device 1015
        Physical Slot: 2
        Flags: bus master, fast devsel, latency 0, IRQ 100
        Memory at f4000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=32M]
        Capabilities: <access denied>
        Kernel driver in use: nvidia

2.3 Viewing NVIDIA GPU information and usage on Ubuntu
The NVIDIA driver ships with a command-line tool, nvidia-smi, which shows GPU memory usage among other things:

nvidia-smi

Output:

Wed Jul 18 12:12:07 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20m          Off  | 00000000:02:00.0 Off |                  Off |
| N/A   47C    P8    16W / 225W |      1MiB /  5061MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K620         Off  | 00000000:03:00.0  On |                  N/A |
| 34%   44C    P8     1W /  30W |    598MiB /  1995MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1      1785      G   /usr/bin/X                                   336MiB |
|    1      3321      G   compiz                                       252MiB |
|    1     79153      G   /usr/lib/firefox/firefox                       1MiB |
|    1    119338      G   /usr/lib/firefox/firefox                       1MiB |
|    1    120313      G   /usr/lib/firefox/firefox                       1MiB |
+-----------------------------------------------------------------------------+

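Column meanings:

Fan: fan speed as a percentage between 0 and 100%. This is the speed the device is requesting; if the card is not fan-cooled or the fan is broken, N/A is shown;
Temp: GPU temperature in degrees Celsius;
Perf: performance state, from P0 (maximum performance) to P12 (minimum performance);
Pwr: power usage, shown as current draw / power cap;
Bus-Id: the PCI bus address of the GPU;
Disp.A: Display Active, whether a display is initialized on the GPU;
Memory-Usage: GPU memory used / total GPU memory;
Volatile GPU-Util: GPU utilization, which fluctuates over time;
Compute M.: compute mode;
The Processes table underneath shows how much GPU memory each process uses on each GPU.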
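
For scripts, the boxed table is awkward to parse. nvidia-smi can emit CSV instead through its --query-gpu and --format options. The sketch below is one way to read that output; the chosen field list and the function names are assumptions for illustration, and some GPUs report non-numeric placeholders for unsupported fields, which this simple parser does not handle.

```python
import csv
import io
import subprocess

QUERY_FIELDS = "index,name,memory.used,memory.total,utilization.gpu"


def parse_gpu_csv(csv_text):
    """Parse 'csv,noheader,nounits' output into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not record:  # skip blank lines
            continue
        index, name, used, total, util = record
        rows.append({
            "index": int(index),
            "name": name,
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
            "gpu_util_percent": int(util),
        })
    return rows


def query_gpus():
    """Run nvidia-smi and return one dict per GPU (requires the NVIDIA driver)."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=" + QUERY_FIELDS,
         "--format=csv,noheader,nounits"],
        universal_newlines=True)
    return parse_gpu_csv(output)


# Sample text mirroring the values in the table above.
sample = "0, Tesla K20m, 1, 5061, 0\n1, Quadro K620, 598, 1995, 0\n"
for gpu in parse_gpu_csv(sample):
    print(gpu["index"], gpu["name"], gpu["memory_used_mib"], "MiB used")
```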

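2.4 Displaying GPU usage periodically
A single snapshot is often not enough: to keep track of GPU usage over time, the display has to refresh periodically, for example every 10 s. The watch command does this by running nvidia-smi at a fixed interval.

To see what watch does: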

whatis watch

Output:

watch (1)            - execute a program periodically, showing output fullscreen

Purpose: run a command periodically and display its output.

The basic usage of watch is:

watch [options]  command

The most commonly used option is -n, followed by the interval in seconds between runs.

Monitoring GPU memory: here the display is refreshed every 5 s:

watch -n 5 nvidia-smi

Output:

Every 5,0s: nvidia-smi                                                                                                                              Wed Jul 18 12:20:24 2018

Wed Jul 18 12:20:24 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20m          Off  | 00000000:02:00.0 Off |                  Off |
| N/A   43C    P8    16W / 225W |      1MiB /  5061MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K620         Off  | 00000000:03:00.0  On |                  N/A |
| 34%   44C    P8     1W /  30W |    595MiB /  1995MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1      1785      G   /usr/bin/X                                   336MiB |
|    1      3321      G   compiz                                       248MiB |
|    1     79153      G   /usr/lib/firefox/firefox                       1MiB |
|    1    119338      G   /usr/lib/firefox/firefox                       1MiB |
|    1    120313      G   /usr/lib/firefox/firefox                       1MiB |
+-----------------------------------------------------------------------------+
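
When the readings need to be collected rather than just displayed, a small polling loop can play the role of watch. This is a minimal sketch, not a replacement for watch: poll is an illustrative helper, it runs a bounded number of times, and the demo uses a harmless command so it runs on machines without a GPU.

```python
import subprocess
import sys
import time


def poll(command, interval_s, count):
    """Run command `count` times, sleeping interval_s between runs."""
    outputs = []
    for i in range(count):
        outputs.append(
            subprocess.check_output(command, universal_newlines=True))
        if i < count - 1:  # no need to sleep after the last run
            time.sleep(interval_s)
    return outputs


# Demo command; on a GPU machine pass ["nvidia-smi"] and interval_s=5 instead.
demo = [sys.executable, "-c", "print('tick')"]
for text in poll(demo, interval_s=0.1, count=3):
    print(text.strip())
```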

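3 Using a specific GPU
3.1 Using a specific GPU in TensorFlow ("CUDA_VISIBLE_DEVICES")
3.1.1 Specifying the GPU when running a Python program from the command line
If the machine has several GPUs, TensorFlow uses all of them by default. To use only some of them, set CUDA_VISIBLE_DEVICES. When launching a Python program this can be done as follows: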

CUDA_VISIBLE_DEVICES=1 python example.py

Some usage examples:

Environment Variable Syntax      Results

CUDA_VISIBLE_DEVICES=1           Only device 1 will be seen
CUDA_VISIBLE_DEVICES=0,1         Devices 0 and 1 will be visible
CUDA_VISIBLE_DEVICES="0,1"       Same as above, quotation marks are optional
CUDA_VISIBLE_DEVICES=0,2,3       Devices 0, 2, 3 will be visible; device 1 is masked
CUDA_VISIBLE_DEVICES=""          No GPU will be visible
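
One consequence of the table is easy to miss: inside the process, the visible devices are renumbered starting from 0. The sketch below illustrates that renumbering for purely numeric values; visible_device_map is a hypothetical helper, not a CUDA or TensorFlow function.

```python
def visible_device_map(value):
    """Map logical index -> physical index for a numeric CUDA_VISIBLE_DEVICES."""
    ids = [int(part) for part in value.split(",") if part.strip()]
    # Visible devices are renumbered 0, 1, 2, ... in the order listed.
    return dict(enumerate(ids))


print(visible_device_map("0,2,3"))  # {0: 0, 1: 2, 2: 3}
print(visible_device_map("1"))      # {0: 1}
print(visible_device_map(""))       # {}
```

With CUDA_VISIBLE_DEVICES=1, for example, the program addresses physical device 1 as device 0. Which physical card a number refers to also depends on CUDA_DEVICE_ORDER; setting it to PCI_BUS_ID makes the numbering follow PCI bus order.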

3.1.2 Specifying the GPU in Python code
Add the following to the Python code, before TensorFlow is imported and initializes CUDA:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
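
Because the variable only takes effect if it is set before CUDA is initialized, an alternative is a small launcher that sets it for a child process, so the ordering inside the training script no longer matters. This is a minimal sketch; run_with_gpus is an illustrative name, and the child command here only echoes the variable instead of running a real script.

```python
import os
import subprocess
import sys


def run_with_gpus(args, gpu_ids):
    """Run a command with CUDA_VISIBLE_DEVICES restricted to gpu_ids."""
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return subprocess.check_output(args, env=env, universal_newlines=True)


# Stand-in for [sys.executable, "example.py"]: print what the child sees.
child = [sys.executable, "-c",
         "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
print(run_with_gpus(child, [1]).strip())  # 1
```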

3.1.3 Setting how much GPU memory TensorFlow uses
3.1.3.1 Fixed fraction of GPU memory
By default TensorFlow claims as much GPU memory as it can. The amount can be limited as follows:

import tensorflow as tf
# Limit this process to 70% of each visible GPU's memory
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

With this setting, the GPU memory available to TensorFlow is the GPU's total memory * 0.7.
Choose a different value as needed to change the allocation.
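
To make the arithmetic concrete, the sketch below computes the cap for the Tesla K20m above (5061 MiB as reported by nvidia-smi) and, going the other way, the fraction needed for a target amount. Both helpers are illustrative names; the actual usable amount may differ slightly from this simple product.

```python
def memory_cap_mib(total_mib, fraction):
    """GPU memory available to the process for a given fraction."""
    return total_mib * fraction


def fraction_for(target_mib, total_mib):
    """Fraction to pass as per_process_gpu_memory_fraction for target_mib."""
    if not 0 < target_mib <= total_mib:
        raise ValueError("target must be in (0, total_mib]")
    return target_mib / float(total_mib)


print(round(memory_cap_mib(5061, 0.7), 1))  # 3542.7
print(fraction_for(2048, 5061))             # fraction for a 2 GiB cap
```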

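3.1.3.2 Allocating GPU memory on demand
The setting above only fixes an upper limit. To have memory allocated as it is needed, use the allow_growth option: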

import tensorflow as tf
# Start small and grow the allocation as the program needs more memory
gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

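3.2 Using a specific GPU in PyTorch
PyTorch uses GPU 0 by default. If GPU 0 is already running another program, a different GPU has to be specified.

There are two ways to specify the GPU to use.

3.2.1 Using CUDA_VISIBLE_DEVICES (as with TensorFlow)
3.2.1.1 Specifying the GPU when running a Python program from the command line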

CUDA_VISIBLE_DEVICES=1 python example.py

3.2.1.2 Specifying the GPU in Python code
Add the following to the Python code:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

3.2.2 Using torch.cuda.set_device

import torch
torch.cuda.set_device(1)  # 1 is the index of the GPU to use

This function is defined in pytorch-master\torch\cuda\__init__.py.

However, the official documentation recommends CUDA_VISIBLE_DEVICES and discourages using set_device.
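
Whichever method is used, an index that does not exist on the current machine leads to an error at run time, so it can help to validate it first. This is a minimal sketch: pick_device is a hypothetical helper, and the string it returns is in the form accepted by torch.device.

```python
def pick_device(requested_index, available_count):
    """Return 'cuda:N' if GPU N exists, otherwise fall back to 'cpu'."""
    if 0 <= requested_index < available_count:
        return "cuda:%d" % requested_index
    return "cpu"


print(pick_device(1, 2))  # cuda:1
print(pick_device(1, 1))  # cpu
print(pick_device(0, 0))  # cpu

# With PyTorch installed, the result would be used like this:
#   import torch
#   device = torch.device(pick_device(1, torch.cuda.device_count()))
#   tensor = torch.zeros(3).to(device)
```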

4 Open issue
The Python code was set up to use both GPUs at once:

os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0, 1"

However, when the program ran, only one GPU was used and the other was ignored, as shown below:

Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0)
Ignoring gpu device (device: 1, name: Quadro K620, pci bus id: 0000:03:00.0) with Cuda multiprocessor count: 3. The minimum required count is 8. You can adjust this requirement with the env var TF_MIN_GPU_MULTIPROCESSOR_COUNT.

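At first glance this looks like a multithreading problem, but the log message points elsewhere: "multiprocessor" here refers to the GPU's CUDA multiprocessors. TensorFlow skips GPUs with fewer than 8 of them, and the Quadro K620 has only 3. According to the message, the threshold can be changed with the environment variable TF_MIN_GPU_MULTIPROCESSOR_COUNT. This has not been tried here yet.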
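
Following the hint in the log message, one thing to try is lowering the threshold before TensorFlow is imported. This is an untested sketch: the variable name comes from the log above, and the value 3 is simply the multiprocessor count reported for the Quadro K620.

```python
import os

# These must be set before `import tensorflow`, because TensorFlow reads
# them when it initializes its GPU devices.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
# Assumption: 3 matches the count reported for the Quadro K620 in the log.
os.environ["TF_MIN_GPU_MULTIPROCESSOR_COUNT"] = "3"

# import tensorflow as tf  # import only after the variables are set
print(os.environ["TF_MIN_GPU_MULTIPROCESSOR_COUNT"])  # 3
```

Note that a much smaller second card may slow down work that is split evenly across both GPUs, so whether enabling it helps depends on the workload.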
