Installing TensorFlow GPU (CUDA 10.0) on Ubuntu 18.04

 

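Note: this post mainly records my own installation process. I don't recommend it for production use; to save yourself trouble, use one of the well-established installation methods.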

 

First, make sure you have a 64-bit version of Python installed; TensorFlow does not support 32-bit versions.
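One quick way to confirm that the interpreter is 64-bit is to ask Python for its pointer size. This is a minimal sketch using only the standard library; the helper name `python_bits` is my own choice for illustration and not part of any tool mentioned in this post.

```python
import struct


def python_bits():
    """Return the pointer width of the running interpreter, in bits."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    bits = python_bits()
    print("Python is %d-bit" % bits)
    if bits != 64:
        print("TensorFlow needs a 64-bit Python")
```

Run it with the same interpreter you plan to build against (for example /usr/bin/python3), since different interpreters on one machine can differ.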

 

1. Update and upgrade the system

sudo apt-get update 
sudo apt-get upgrade

2. Check that your GPU supports CUDA

lspci | grep -i nvidia

Compare the output against the list of CUDA-capable GPUs on the NVIDIA website.

3. Check that your Linux version is supported

uname -m && cat /etc/*release

Look for x86_64 in the output; together with a supported release (Ubuntu 18.04 here), it means CUDA 10.0 is supported.
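The same check can be sketched in Python by reading the architecture and the release file. This is only an illustration: the function names are hypothetical, and the "supported" rule below is my simplification (x86_64 plus the two Ubuntu releases this post covers), not NVIDIA's full support matrix.

```python
import os
import platform


def parse_os_release(text):
    """Parse KEY=value lines in os-release format into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def looks_supported(machine, info):
    """Rough check (assumption): x86_64 and Ubuntu 16.04 or 18.04."""
    return (machine == "x86_64"
            and info.get("ID") == "ubuntu"
            and info.get("VERSION_ID") in ("16.04", "18.04"))


if __name__ == "__main__":
    path = "/etc/os-release"
    if os.path.exists(path):
        with open(path) as f:
            info = parse_os_release(f.read())
        print(platform.machine(), info.get("PRETTY_NAME"))
        print("looks supported:", looks_supported(platform.machine(), info))
    else:
        print("no " + path + " on this system")
```

Treat a negative result as a prompt to consult the official support table rather than as a final answer.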

4. Install the build dependencies

sudo apt-get install build-essential 
sudo apt-get install cmake git unzip zip
sudo apt-get install python-dev python3-dev python-pip python3-pip

5. Install the Linux kernel headers

sudo apt-get install linux-headers-$(uname -r)

6. Install CUDA 10.0

If CUDA was installed before, uninstall it first:

sudo apt-get purge 'nvidia*'
sudo apt-get autoremove
sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*

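Then install it.

For installing CUDA, I recommend following the installation instructions on the NVIDIA website.

Official installation: web page (the deb (network) method is recommended, but you can choose another).

If you'd rather not read the official page:

For Ubuntu 16.04: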

sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list

For Ubuntu 18.04:

sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list

Either one works, though the entry that matches your release is the safer choice. Then run:

sudo apt-get update 
sudo apt-get -o Dpkg::Options::="--force-overwrite" install cuda-10-0 cuda-drivers

 

7. Reboot the system to load the driver

8. Open a terminal and run:

echo 'export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
source ~/.bashrc
sudo ldconfig
nvidia-smi

Check the driver version in the output. CUDA 10.0 requires driver 410.48 or newer.

 

If nvidia-smi does not work, your kernel may not be supported.
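Before blaming the kernel, it may be worth ruling out that the two exports never took effect, for example because a line was appended to the wrong file. The sketch below checks whether the CUDA directories are present in the current environment. The helper `has_entry` is a hypothetical name, and the paths assume the /usr/local/cuda-10.0 prefix used in this post.

```python
import os


def has_entry(path_value, directory):
    """Return True if directory is one of the os.pathsep-separated entries."""
    wanted = os.path.normpath(directory)
    entries = [e for e in path_value.split(os.pathsep) if e]
    return any(os.path.normpath(e) == wanted for e in entries)


if __name__ == "__main__":
    # Assumed install prefix: /usr/local/cuda-10.0, as in the exports above.
    checks = [
        ("PATH", "/usr/local/cuda-10.0/bin"),
        ("LD_LIBRARY_PATH", "/usr/local/cuda-10.0/lib64"),
    ]
    for var, directory in checks:
        ok = has_entry(os.environ.get(var, ""), directory)
        print("%s contains %s: %s" % (var, directory, ok))
```

Run it from a freshly opened terminal so that it sees what ~/.bashrc actually exports.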

You can check the CUDA installation by building and running one of the samples:

cuda-install-samples-10.0.sh ~
cd ~/NVIDIA_CUDA-10.0_Samples/5_Simulations/nbody
make
./nbody

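9. Install cuDNN

Go to https://developer.nvidia.com/cudnn, log in, and download the matching version. At the time of writing, the version I downloaded was:

cuDNN v7.4.2 Library for Linux [CUDA 10.0]

Open a terminal and change to the download directory: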

tar -xf cudnn-10.0-linux-x64-v7.4.2.20.tgz # adjust the file name to match your download
sudo cp -R cuda/include/* /usr/local/cuda-10.0/include
sudo cp -R cuda/lib64/* /usr/local/cuda-10.0/lib64
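To confirm which cuDNN version ended up in the CUDA directory, one option is to read the version macros from the copied header. This is a minimal sketch: it assumes the header is at /usr/local/cuda-10.0/include/cudnn.h and defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL, as cuDNN 7 does; `cudnn_version` is a name I made up for the example.

```python
import re


def cudnn_version(header_text):
    """Return (major, minor, patch) from the text of cudnn.h, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+%s\s+(\d+)" % name
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    # Assumed location, matching the cp commands above.
    path = "/usr/local/cuda-10.0/include/cudnn.h"
    try:
        with open(path, errors="replace") as f:
            print("cuDNN version:", cudnn_version(f.read()))
    except OSError:
        print("could not read " + path)
```

The version it reports is the one to give to ./configure when it asks for the cuDNN version.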

10. Install NCCL 2.3.5

Go to https://developer.nvidia.com/nccl/nccl-download, log in, and download the matching version.

Open a terminal and change to the download directory:

tar -xf nccl_2.3.5-2+cuda10.0_x86_64.txz
cd nccl_2.3.5-2+cuda10.0_x86_64
sudo cp -R * /usr/local/cuda-10.0/targets/x86_64-linux/
sudo ldconfig

11. Install the Python dependencies

Without a virtual environment:

pip install -U --user pip six numpy wheel mock
pip install -U --user keras_applications==1.0.5 --no-deps
pip install -U --user keras_preprocessing==1.0.3 --no-deps

pip3 install -U --user pip six numpy wheel mock
pip3 install -U --user keras_applications==1.0.5 --no-deps
pip3 install -U --user keras_preprocessing==1.0.3 --no-deps

Inside a virtual environment:

pip install -U pip six numpy wheel mock
pip install -U keras_applications==1.0.5 --no-deps
pip install -U keras_preprocessing==1.0.3 --no-deps

12. Configure TensorFlow

Download bazel:

cd ~/
wget https://github.com/bazelbuild/bazel/releases/download/0.17.2/bazel-0.17.2-installer-linux-x86_64.sh
chmod +x bazel-0.17.2-installer-linux-x86_64.sh
./bazel-0.17.2-installer-linux-x86_64.sh --user
echo 'export PATH="$PATH:$HOME/bin"' >> ~/.bashrc

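A pitfall to watch for: in my environment only bazel 0.17.2 worked; newer versions failed, so choose your version accordingly.

Reload the environment variables:

source ~/.bashrc
sudo ldconfig

Get the TensorFlow source and run the configure script:
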
cd ~/
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout r1.12
./configure

Enter the Python path, then answer the remaining prompts as follows:
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Do you wish to build TensorFlow with Apache Ignite support? [Y/n]: Y

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: Y

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N

Do you wish to build TensorFlow with ROCm support? [y/N]: N

Do you wish to build TensorFlow with CUDA support? [y/N]: Y

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 10.0
Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-10.0
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.4.2
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]: /usr/local/cuda-10.0
Do you wish to build TensorFlow with TensorRT support? [y/N]: N
Please specify the NCCL version you want to use. If NCCL 2.2 is not installed, then you can use version 1.3 that can be fetched automatically but it may have worse performance with multiple GPUs. [Default is 2.2]: 2.3.5
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 5.0] 5.0
Do you want to use clang as CUDA compiler? [y/N]: N
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /usr/bin/gcc
Do you wish to build TensorFlow with MPI support? [y/N]: N
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: -march=native
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:N

13. Build TensorFlow with bazel

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

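This takes quite a long time, roughly 3-4 hours. The early part of the build also downloads dependencies, so it needs a working network connection; sort that out yourself and retry if it fails.

Build the pip package:

bazel-bin/tensorflow/tools/pip_package/build_pip_package tensorflow_pkg

Install TensorFlow:

cd tensorflow_pkg

If you are inside a virtual environment:

pip install tensorflow*.whl

Otherwise, create a new virtual environment first. Make sure its Python version matches the one used for the build.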

14. Verify the TensorFlow installation

python

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
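The hello-world snippet shows that TensorFlow imports and runs, but not that it actually sees the GPU. A small additional check, sketched here under the assumption that the TensorFlow 1.x API built in this post is installed, is to call tf.test.is_gpu_available(). The wrapper `gpu_available` is my own helper, and it returns None when TensorFlow cannot be imported at all.

```python
def gpu_available():
    """Return True/False from TensorFlow, or None if it is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # tf.test.is_gpu_available() is part of the TensorFlow 1.x API.
    return bool(tf.test.is_gpu_available())


if __name__ == "__main__":
    result = gpu_available()
    if result is None:
        print("TensorFlow is not installed in this environment")
    elif result:
        print("TensorFlow can see a GPU")
    else:
        print("TensorFlow is installed but no GPU is visible")
```

If it reports that no GPU is visible, the driver, LD_LIBRARY_PATH, and the cuDNN copy step are the first things I would recheck.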
Reference: How to install Tensorflow GPU with CUDA 10.0 for python on Ubuntu