Notes on Upgrading to TensorFlow-gpu 1.5 on Ubuntu 16.04

While upgrading TensorFlow with pip, I found that TF had already moved to version 1.5. The upgrade steps are recorded below.

System information:

Ubuntu 16.04 LTS x86_64
Python 3.5.4 :: Anaconda custom (64-bit)
1. Update TensorFlow

Upgrading directly with pip install -U fails with a futures error:
Collecting futures>=3.1.1 (from tensorflow-tensorboard<1.6.0,>=1.5.0->tensorflow)
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/1f/9e/7b2ff7e965fc654592269f2906ade1c7d705f1bf25b7d469fa153f7d19eb/futures-3.2.0.tar.gz
Unknown requires Python '>=2.6, <3' but the running Python is 3.5.4
The fix is to install futures 3.1.1 first and then install TensorFlow 1.5. Note that the '-U' flag must not be used here:
pip install futures==3.1.1
pip install tensorflow-gpu==1.5.0
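
The log shows why the direct upgrade fails: futures 3.2.0 declares a Python requirement of '>=2.6, <3', which Python 3.5.4 does not meet. The check can be illustrated roughly as follows. This is a simplified sketch, not pip's actual implementation; the helper name satisfies is hypothetical, and only plain numeric versions with the six basic comparison operators are handled.

```python
import operator

# Comparison operators supported by this simplified sketch.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def _parse_version(text):
    """Turn '3.5.4' into (3, 5, 4)."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(python_version, specifier):
    """Return True if python_version meets every clause in specifier."""
    current = _parse_version(python_version)
    for clause in specifier.split(","):
        clause = clause.strip()
        # Try two-character operators before one-character ones.
        for symbol in (">=", "<=", "==", "!=", ">", "<"):
            if clause.startswith(symbol):
                wanted = _parse_version(clause[len(symbol):])
                # Pad to equal length so (3,) compares like (3, 0, 0).
                width = max(len(current), len(wanted))
                left = current + (0,) * (width - len(current))
                right = wanted + (0,) * (width - len(wanted))
                if not _OPS[symbol](left, right):
                    return False
                break
        else:
            raise ValueError("unsupported clause: " + clause)
    return True


print(satisfies("3.5.4", ">=2.6, <3"))   # False: the case from the log
print(satisfies("2.7.14", ">=2.6, <3"))  # True
```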
After installation, running the following test statements fails with an error saying that libcublas.so.9.0 cannot be found:
import tensorflow as tf
print(tf.__version__)
Error:
ImportError: libcublas.so.9.0: cannot open shared object file: 
No such file or directory
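
The ImportError means the dynamic loader cannot find the CUDA 9.0 cuBLAS library. One way to check for it without importing TensorFlow is to try loading the library directly. The sketch below assumes a Linux loader; the helper name can_load is hypothetical, and the second name, libcudnn.so.7, is an assumption based on the cuDNN 7 step later in these notes.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the file is missing or cannot be loaded.
        return False
    return True


for name in ("libcublas.so.9.0", "libcudnn.so.7"):
    print(name, "found" if can_load(name) else "missing")
```

Running it again after the CUDA and cuDNN steps below, with LD_LIBRARY_PATH set, should show whether the loader now sees both libraries.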

2. Update to CUDA 9 and cuDNN 7

(1) Download the following two files locally:

cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
7fa2af80.pub

(2) Run the following two commands:

sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo apt-key add ./7fa2af80.pub

The NVIDIA documentation gives the general form as:

sudo dpkg -i cuda-repo-<distro>_<version>_<architecture>.deb
sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub

(3) Configure a proxy for apt-get:

http://developer.download.nvidia.com/ cannot be reached over IPv6, so set up a proxy:

sudo vi /etc/apt/apt.conf
Add the proxy server settings and save the file:
Acquire::http::Proxy "http://127.0.0.1:8122";
Acquire::https::Proxy "http://127.0.0.1:8122";
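
The two proxy lines follow a fixed pattern, so they can be generated from a host and port. The sketch below only builds the text and does not touch any file; the helper name render_apt_proxy is hypothetical, and the address is the one used above.

```python
def render_apt_proxy(host, port):
    """Build the Acquire proxy lines for an apt configuration file."""
    proxy_url = "http://{}:{}".format(host, port)
    lines = [
        'Acquire::http::Proxy "{}";'.format(proxy_url),
        'Acquire::https::Proxy "{}";'.format(proxy_url),
    ]
    return "\n".join(lines) + "\n"


print(render_apt_proxy("127.0.0.1", 8122), end="")
```

A possible alternative to editing /etc/apt/apt.conf is to write this text to its own file under /etc/apt/apt.conf.d/, so the proxy can later be removed by deleting that single file.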

(4) Refresh the package index

sudo apt-get update

(5) List the available versions of the package to update

sudo apt-cache policy cuda

Command format:

sudo apt-cache policy <package name>
Output:
cuda:
Installed: 8.0.61-1
Candidate: 9.1.85-1
Version table:
9.1.85-1 500
500 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 Packages
9.0.176-1 500
500 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 Packages
100 /var/lib/dpkg/status
*** 8.0.61-1 500
500 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 Packages
8.0.44-1 500
500 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 Packages
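
Note that the candidate version is 9.1.85-1, while the ImportError earlier asked for libcublas.so.9.0, so the 9.0 series is the one to pick. The sketch below pulls the version list out of apt-cache policy output and selects the 9.0 entry. The helper names are hypothetical, the sample text is abbreviated from the output above, and the pattern assumes versions of the form <version>-<revision>.

```python
import re

# Abbreviated from the apt-cache policy output above.
SAMPLE = """\
cuda:
     9.1.85-1 500
        500 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 Packages
     9.0.176-1 500
 *** 8.0.61-1 500
     8.0.44-1 500
"""

# A version line looks like "[***] <version>-<revision> <priority>".
VERSION_LINE = re.compile(r"^\s*(\*\*\*\s+)?(\d\S*-\S+)\s+\d+\s*$")


def parse_policy(text):
    """Return (installed_version_or_None, list_of_versions)."""
    installed = None
    versions = []
    for line in text.splitlines():
        match = VERSION_LINE.match(line)
        if not match:
            continue  # repository lines and labels are skipped
        versions.append(match.group(2))
        if match.group(1):
            # The "***" marker flags the currently installed version.
            installed = match.group(2)
    return installed, versions


def pick_series(versions, series):
    """Return the first version in the given series, e.g. '9.0'."""
    prefix = series + "."
    for version in versions:
        if version.startswith(prefix):
            return version
    return None


installed, versions = parse_policy(SAMPLE)
print("installed:", installed)
print("install target: cuda=" + str(pick_series(versions, "9.0")))
```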

(6) Install the chosen version

sudo apt-get install cuda=9.0.176-1

(7) Create the symlink and verify the installation

Create the symlink:

cd /usr/local/
sudo ln -s cuda-9.0 cuda
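Once installation is complete, verify it with the commands further below. Note that the following environment variables must be set first (they can be added to ~/.bashrc):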
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
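
To confirm that a new shell has actually picked these settings up, the environment can be inspected from Python. This is a minimal sketch assuming the exact paths exported above; the helper name missing_entries is hypothetical.

```python
import os

# Directories each variable is expected to contain, per the exports above.
REQUIRED = {
    "PATH": ["/usr/local/cuda/bin"],
    "LD_LIBRARY_PATH": [
        "/usr/local/cuda/lib64",
        "/usr/local/cuda/extras/CUPTI/lib64",
    ],
}


def missing_entries(env):
    """Return (variable, value) pairs that are absent from env."""
    missing = []
    for name, directories in REQUIRED.items():
        # Search paths are colon-separated on Linux.
        present = env.get(name, "").split(":")
        for directory in directories:
            if directory not in present:
                missing.append((name, directory))
    if env.get("CUDA_HOME") != "/usr/local/cuda":
        missing.append(("CUDA_HOME", "/usr/local/cuda"))
    return missing


for name, value in missing_entries(os.environ):
    print("{} is missing {}".format(name, value))
```

No output means all three variables are set as expected.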

To show basic driver information, run nvidia-smi. Output:
Sun Feb  4 11:36:36 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.12 Driver Version: 390.12 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:03:00.0 Off | N/A |
| 0% 19C P5 26W / 250W | 0MiB / 11176MiB | 2% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Run nvcc -V. Output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
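
The last line is the one that matters: it should report release 9.0 to match the libcublas.so.9.0 that TensorFlow asked for. A small sketch for extracting it follows; the helper name parse_nvcc_release is hypothetical, and the sample is the output shown above.

```python
import re

# The nvcc -V output shown above.
SAMPLE = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
"""


def parse_nvcc_release(text):
    """Return (release, full_version), or None if no release line is found."""
    match = re.search(r"release (\d+\.\d+), V(\S+)", text)
    if match is None:
        return None
    return match.group(1), match.group(2)


release, full_version = parse_nvcc_release(SAMPLE)
print(release, full_version)  # 9.0 9.0.176
```

On a live system the text could come from subprocess.check_output(["nvcc", "-V"]).decode() instead of the sample string.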
(8) Install cuDNN

Change to a directory of your choice and run the following commands:
wget -c http://developer.download.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.5/prod/9.0_20171129/cudnn-9.0-linux-x64-v7.tgz
tar -zxvf cudnn-9.0-linux-x64-v7.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/ -d
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
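
To confirm which cuDNN version was copied, the version macros in cudnn.h can be read back. The sketch below assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL on separate #define lines; the sample excerpt is illustrative rather than copied from the real file, and the helper name cudnn_version is hypothetical.

```python
import re

# Illustrative excerpt; the real header contains much more.
HEADER_SAMPLE = """\
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
"""


def cudnn_version(header_text):
    """Return 'major.minor.patch', or None if a macro is missing."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^#define\s+" + macro + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


print(cudnn_version(HEADER_SAMPLE))  # 7.0.5
```

For the installed copy, pass in the contents of /usr/local/cuda/include/cudnn.h read with open().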

(9) After the update, restore the original configuration

Remove the apt-get proxy:

sudo mv /etc/apt/apt.conf /etc/apt/apt.conf.with_proxy
Disable the NVIDIA package source by commenting out the contents of cuda.list:
sudo vi /etc/apt/sources.list.d/cuda.list

(10) To update again later, simply undo the changes made in step (9).
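
Commenting out cuda.list and restoring it later are mirror operations on the file's text. The sketch below shows both on a string, without touching any file. The helper names are hypothetical and the sample line is a placeholder, not the real contents of cuda.list.

```python
def comment_out(text):
    """Prefix every active line with '# ' so apt ignores it."""
    result = []
    for line in text.splitlines():
        # Leave blank lines and existing comments untouched.
        if line.strip() and not line.lstrip().startswith("#"):
            line = "# " + line
        result.append(line)
    return "\n".join(result) + "\n"


def uncomment(text):
    """Strip one leading '# ' from each line that has it."""
    result = []
    for line in text.splitlines():
        if line.startswith("# "):
            line = line[2:]
        result.append(line)
    return "\n".join(result) + "\n"


sample = "deb http://example.invalid/repo /\n"  # placeholder entry
disabled = comment_out(sample)
print(disabled, end="")
print(uncomment(disabled), end="")
```

One limitation: uncomment also strips comments that were in the file before it was disabled, so it is only a clean inverse for files that held no comments of their own.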
