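Machine configuration:
CPU: E3 1230 V5+
GPU: EVGA GTX1080 8G SC ACX 3.0
Memory: DDR4 2133 8G x 2
Motherboard: Gigabyte X150M-PRO-ECC
I chose Ubuntu 16.04 LTS because it is a long-term support release, and since my hardware is fairly new, its driver support and compatibility are likely to be better.
One reason I picked this motherboard was its M.2 SSD slot, but the compatibility problems turned out to be severe. Many people online complain about a problem on the second boot, and I ran into it too. In the end I gave up on M.2 and went back to plain SATA 3.0.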
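Overview
- Install Ubuntu 16.04
- Configure the build environment
- Build and install TensorFlow
Install Ubuntu 16.04
I have two hard drives and wanted a dual-boot setup. I first installed Windows 10 on the first drive, then wrote the downloaded Ubuntu ISO to a USB stick, changed the BIOS boot order to put the USB stick first, and rebooted. After that, just follow the installer prompts step by step. When choosing the system language, I recommend English; it saves trouble later.
Problems you may run into:
- Black screen on reboot after installation
- Cannot reach the desktop after logging in
Black screen on reboot after installation
Since I dual-boot, the GRUB menu appears at power-on. Press e on the Ubuntu entry to edit it, find quiet splash, and change it to quiet splash nomodeset (that is, append nomodeset at the end), then press F10 to boot. Once the login screen appears, press Ctrl+Alt+F1 to switch to a text console, log in with your username and password, open the GRUB defaults file with sudo vi /etc/default/grub, and find this line: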
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
Change it to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
Save the file, regenerate the GRUB configuration so the change takes effect, and reboot:
sudo update-grub
sudo reboot
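The manual edit above can also be scripted. The following is only a sketch: add_nomodeset is a hypothetical helper name (not part of GRUB or Ubuntu), and it assumes the file contains a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line like the one shown above.

```shell
# Sketch: append "nomodeset" to GRUB_CMDLINE_LINUX_DEFAULT in a GRUB defaults file.
# add_nomodeset is a hypothetical helper name used for illustration only.
add_nomodeset() (
    file="$1"
    # Do nothing if the option is already on the line, so reruns are safe.
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*nomodeset' "$file"; then
        exit 0
    fi
    # Keep a backup copy (FILE.bak), then insert the option before the closing quote.
    sed -i.bak 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 nomodeset"/' "$file"
)
```

It is safer to try it on a copy first, for example cp /etc/default/grub /tmp/grub.test followed by add_nomodeset /tmp/grub.test, and to inspect the result before copying it back with sudo and running sudo update-grub.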
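Cannot reach the desktop after installing Ubuntu 16.04
Don't panic. Press Ctrl+Alt+F1 to switch to a text console and log in. The first thing to do is update the package sources, because the default servers are slow from mainland China: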
sudo vi /etc/apt/sources.list
If you are not comfortable with vi, or are new to Linux, you can use the nano editor instead:
sudo nano /etc/apt/sources.list
Add the mirrors.163.com sources. The codename of Ubuntu 16.04 is xenial:
deb http://mirrors.163.com/ubuntu/ xenial main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ xenial-updates main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ xenial-security main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ xenial-proposed main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ xenial-backports main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ xenial main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ xenial-updates main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ xenial-security main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ xenial-proposed main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ xenial-backports main restricted universe multiverse
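In my experience the 163 mirror is a bit faster on a China Telecom connection. If you are on the education network, you can use the USTC mirror instead.
Save the file, then run apt update followed by upgrade: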
sudo apt-get update
sudo apt-get upgrade
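The ten source lines above differ only in the package type and the suffix after the codename, so they can be generated for any mirror. This is a minimal sketch; gen_sources is a hypothetical helper name, and the mirror URL and codename are passed in as arguments.

```shell
# Sketch: print the ten sources.list lines for one mirror and one release.
# gen_sources is a hypothetical helper name used for illustration only.
gen_sources() (
    mirror="$1"      # e.g. http://mirrors.163.com/ubuntu/
    codename="$2"    # e.g. xenial for Ubuntu 16.04
    for kind in deb deb-src; do
        for pocket in "" -updates -security -proposed -backports; do
            printf '%s %s %s%s main restricted universe multiverse\n' \
                "$kind" "$mirror" "$codename" "$pocket"
        done
    done
)
```

For example, gen_sources http://mirrors.163.com/ubuntu/ xenial reproduces the list above. To switch mirrors, pass a different URL and review the output before writing it to /etc/apt/sources.list.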
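Next, upgrade the kernel. A fresh install ships 4.4, and I suggest upgrading to 4.6.7. This step is optional.
First check the current kernel version: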
uname -r
Then download the three 4.6.7 mainline packages, install them, and reboot:
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6.7/linux-headers-4.6.7-040607_4.6.7-040607.201608160432_all.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6.7/linux-headers-4.6.7-040607-generic_4.6.7-040607.201608160432_amd64.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6.7/linux-image-4.6.7-040607-generic_4.6.7-040607.201608160432_amd64.deb
sudo dpkg -i linux-*.deb
sudo update-grub
sudo reboot
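Since the kernel upgrade is optional, it can help to check whether the running kernel is already at or above the target before downloading anything. The sketch below assumes GNU sort with the -V (version sort) option; version_lt is a hypothetical helper name.

```shell
# Sketch: decide whether the running kernel is older than a target version.
# version_lt is a hypothetical helper; it relies on GNU sort's -V option.
version_lt() {
    # True when $1 sorts strictly before $2 in version order.
    [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Strip the "-NN-generic" style suffix from uname -r before comparing.
running="$(uname -r | cut -d- -f1)"
if version_lt "$running" "4.6.7"; then
    echo "kernel $running is older than 4.6.7"
else
    echo "kernel $running is 4.6.7 or newer"
fi
```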
After the reboot, install the graphics driver. (My card is a GTX 1080, so I chose the nvidia-367 driver.)
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-367
sudo apt-get install mesa-common-dev
sudo apt-get install freeglut3-dev
sudo reboot
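After the reboot you should be able to reach the desktop, and the screen resolution should be correct as well (mine is a 2560x1080 ultrawide monitor).
Configure the build environment
Download and install CUDA 8.0.44 (from the NVIDIA download page or from Baidu Netdisk):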
sudo sh cuda_8.0.44_linux.run
The installer asks a series of questions about what to install. Pay close attention to this one:
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.**?
(y)es/(n)o/(q)uit: n
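You must answer no here, otherwise the newer graphics driver installed earlier is overwritten. (If you do fall into this trap, it is not a disaster: uninstall the driver, then repeat the driver installation steps above.)
When the installer finishes, read its summary carefully. If something went wrong, this blog post may help (I did not hit any problems).
Configure the environment variables by opening ~/.bashrc and appending the two export lines: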
nano ~/.bashrc
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
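If you set the variables in a desktop Terminal, close it and open a new one (or run source ~/.bashrc) so they take effect. On the text console you can simply run the two export lines above by hand.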
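The ${PATH:+:${PATH}} form in the two export lines is worth a closer look: it adds the colon and the old value only when the variable is already set and non-empty, which avoids a dangling colon (an empty entry is treated as the current directory). A small demonstration, where prepend_dir is a hypothetical helper used only to show the expansion:

```shell
# Sketch: how ${VAR:+:${VAR}} behaves when prepending a directory.
# prepend_dir is a hypothetical helper used only for this demonstration.
prepend_dir() (
    dir="$1"
    current="$2"
    # ${current:+:${current}} expands to ":$current" only if current is non-empty.
    printf '%s\n' "${dir}${current:+:${current}}"
)

prepend_dir /usr/local/cuda-8.0/bin "/usr/bin:/bin"   # existing value: joined with a colon
prepend_dir /usr/local/cuda-8.0/lib64 ""              # empty value: no trailing colon
```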
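Next, install cuDNN 5.1 (from the official download page or from Baidu Netdisk).
After downloading, extract the archive and copy the files into the CUDA directory. (If CUDA 8.0 is installed in its default location, the paths below need no changes.)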
tar xvf cudnn-8.0-linux-x64-v5.1.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
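To confirm which cuDNN version was actually copied, the version macros in the header can be read back. This sketch assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL on separate #define lines, as the cuDNN 5.x cudnn.h does; cudnn_version is a hypothetical helper name.

```shell
# Sketch: read the cuDNN version from the macros in cudnn.h.
# cudnn_version is a hypothetical helper; the header path is its only argument.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$1"
}
```

With the default paths used above, the call would be cudnn_version /usr/local/cuda/include/cudnn.h.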
Now check that the GPU information is displayed correctly:
nvidia-smi
Then go to the CUDA samples directory created during installation (by default under ~/), run make, and once the build finishes run:
./NVIDIA_CUDA-8.0_Samples/bin/x86_64/linux/release/deviceQuery
It should print the detailed device information:
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1080"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 8110 MBytes (8504279040 bytes)
(20) Multiprocessors, (128) CUDA Cores/MP: 2560 CUDA Cores
GPU Max Clock rate: 1848 MHz (1.85 GHz)
Memory Clock rate: 5005 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1080
Result = PASS
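Two values in this output matter later: the final Result line, and the compute capability (6.1 here), which the TensorFlow configure step asks for. A rough sketch for extracting both from the text above; the helper names are hypothetical, and both read the deviceQuery output on standard input.

```shell
# Sketch: pull the compute capability and the final result out of deviceQuery output.
# Both helpers are hypothetical names; they read the output on standard input.
compute_capability() {
    # The value is the last field of the "CUDA Capability Major/Minor" line
    # (only the first device is reported).
    awk '/CUDA Capability Major\/Minor version number:/ { print $NF; exit }'
}

device_query_passed() {
    # Succeeds only if the output contains the PASS result line.
    grep -q 'Result = PASS'
}
```

For example, piping ./NVIDIA_CUDA-8.0_Samples/bin/x86_64/linux/release/deviceQuery into compute_capability prints the value to enter at the compute capabilities prompt.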
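Build and install TensorFlow
Now set up the TensorFlow build environment. If you do not want to build it yourself, you can download the whl I built here.
This post is already long, so for installing and configuring Bazel please see the official manual (link here).
The official download is rather slow; you can get Bazel 0.3.1 here.
Continue with the dependencies.
If you use Python 2.7: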
sudo apt-get install python-numpy swig python-dev python-wheel python-pip
Or for Python 3.x:
sudo apt-get install python3-numpy swig python3-dev python3-wheel python3-pip
Fetch the TensorFlow source:
git clone https://github.com/tensorflow/tensorflow
Enter the repository and switch to r0.11, the latest release branch at the time of writing:
cd tensorflow
git checkout r0.11
Start the configuration:
$ ./configure
Please specify the location of python. [Default is /usr/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
No Hadoop File System support will be enabled for TensorFlow
Found possible Python library paths:
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]
/usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5
Please specify the location where cuDNN 5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 6.1
...
...
...
When configuration finishes, build the GPU whl. If you would rather not build it, you can download my prebuilt whl here.
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
sudo pip install /tmp/tensorflow_pkg/tensorflow-0.11.0rc0-py2-none-any.whl
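The wheel file name depends on the TensorFlow version and the Python version, so the name in the pip command above may not match what your build produced. As a small sketch, the newest wheel in the output directory can be looked up instead of typed by hand; newest_wheel is a hypothetical helper name.

```shell
# Sketch: find the most recently built TensorFlow wheel in a directory.
# newest_wheel is a hypothetical helper; it prints nothing if no wheel exists.
newest_wheel() {
    # List matching wheels newest first and keep the first one.
    ls -t "$1"/tensorflow-*.whl 2>/dev/null | head -n 1
}
```

With the directory used above, the install command becomes sudo pip install "$(newest_wheel /tmp/tensorflow_pkg)".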
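That completes the setup. Afterwards you can run Google's test set to verify the installation (link here).
If you have any questions about the installation, feel free to leave me a comment :)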