Linux配置cuda + cudnn深度学习环境

1009人浏览 2024-04-13 17:37:30

1.查看Linux系统版本信息

cat /etc/issue

2.查看显卡型号

lspci |grep -i nvidia

# 如果出现的信息看不出是什么型号，可能是由于驱动问题无法显示显卡的具体型号，如出现类似如下信息：

01:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)

# 当出现这种情况，需要将出现的数字（2204）输入如下网站，可以查询出对应的硬件型号

http://pci-ids.ucw.cz/mods/PC/10de?action=help?help=pci

3.根据1，2得到的信息，查找合适的驱动版本，并下载。网站如下：

https://www.nvidia.cn/Download/index.aspx?lang=cn#

4.卸载当前显卡驱动，安装新的显卡驱动。

1. 屏蔽开源驱动 nouveau
sudo vim /etc/modprobe.d/blacklist.conf
在文件最后部分插入以下两行内容：
blacklist nouveau
options nouveau modeset=0
2. 删除旧NVIDIA驱动
sudo nvidia-uninstall
sudo apt-get purge nvidia-*
sudo apt-get remove --purge nvidia-\*
3. 重启电脑
sudo reboot
4. 关闭图形界面
sudo service lightdm stop
如果提示 unit lightdm.service not loaded
则先安装LightDm： sudo apt install lightdm
安装完毕后跳出一个界面，选择lightdm，再sudo service lightdm stop
5. 执行安装驱动
sudo chmod a+x NVIDIA....(自行下载的驱动名)
sudo ./NVIDIA.....(自行下载的驱动名)
遇到Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later
选择No
遇到The CC version check failed: ......
则退出安装，选择正确的gcc版本
sudo update-alternatives --config gcc
遇到Nvidia's 32-bit compatibility libraries?
选择No
遇到Would you like to run the nvidia-xconfigutility to automatically update your x configuration so that the NVIDIA x driver will be used when you restart x? Any pre-existing x confile will be backed up
选择Yes
6.安装完毕
nvidia-smi

5.查看cuda版本与显卡驱动版本的对应关系

https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

6.下载安装cuda。在选择版本时，可以参考用的深度学习框架（如pytorch）支持到cuda哪个版本。

选择runfile格式。

cuda版本选择

runfile安装过程中会提醒是否继续(Existing package manager installation of the driver found. It is strongly recommended that you remove this before continuing.)，选择继续continue，在后续选择安装内容时去除驱动选择。

添加到系统环境变量中：

echo '# CUDA Soft Link' >> ~/.bashrc
echo 'export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
source ~/.bashrc

后测试一下CUDA是否成功安装，运行nvcc -V：

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0

7.下载安装cudnn

https://developer.nvidia.com/cudnn

下载之后解压：

tar zxvf + 文件名

然后执行（版本号需要替换）：

sudo cp cuda/include/cudnn.h /usr/local/cuda-10.1/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-10.1/lib64
sudo chmod a+r /usr/local/cuda-10.1/include/cudnn.h
sudo chmod a+r /usr/local/cuda-10.1/lib64/libcudnn*

8.测试cuda和cudnn是否可用的代码

import torch
print(2.0)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Assume that we are on a CUDA machine, then this should print a CUDA device:
print(device)
x = torch.Tensor([2.1])
xx = x.cuda()
print(xx)
# CUDNN TEST
from torch.backends import cudnn
print('cudann is ' + str(cudnn.is_acceptable(xx)))

本文分类：Linux
本文链接：https://m.fuwuqi.cn/linux/j7NkpGjygE.html

上一篇 > Linux下memcpy性能测试
下一篇 > 超详细的Linux守护进程介绍和工作原理说明

导航栏目

服务器系统

服务器技术

编程语言

数据库

Linux配置cuda + cudnn深度学习环境

相关推荐