GPU Linux Virtual Host (GN7) Installation and Configuration Guide (Final)

  The virtual host I have configured so far lacks a GPU, so some GPU-based algorithms cannot be demonstrated online, which is a pity. After searching around, I found that Tencent Cloud is running a promotion: for a small fee you can set up a GPU instance for experiments. It is fairly cheap, whereas the offerings from some other vendors are beyond an individual's budget, so I bought an instance on Tencent Cloud with a one-month trial to complete the configuration and testing.

  I have never used Ubuntu before, so it will probably take many reinstalls to get everything right. Progress is hard to estimate, so I will start with the one-month trial and write down every step in detail to make reinstalling easy. I will install TensorFlow 2.6, PyTorch 1.11.0 and HanLP 2.1; their versions do not conflict. The matching versions are CUDA 11.2 and cuDNN 8.5 (cuDNN 8.5 supports CUDA 11.x; in the end I switched back to cuDNN 8.1), with Python 3.9. They will then be called from RStudio through the reticulate, tensorflow and keras packages. If time permits, I will also test the R torch package, which provides PyTorch-like functionality by calling libtorch directly.
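  Once the stack is installed, a quick cross-check (a sketch, not part of the original steps) can print the CUDA and cuDNN versions the TensorFlow wheel was built against and compare them with the versions above; tf.sysconfig.get_build_info() is a public TensorFlow 2.x API.

import tensorflow as tf

# Print the TensorFlow version plus the CUDA/cuDNN versions the wheel was built against.
info = tf.sysconfig.get_build_info()
print(tf.__version__, info.get('cuda_version'), info.get('cudnn_version'))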

  Tencent Cloud GPU compute instance GN7, equipped with an NVIDIA T4 GPU: 8 vCPUs + 32 GB RAM + 100 GB SSD + 1× T4, 5 Mbps bandwidth, ¥80 for a one-month trial under the GPU Lab trial plan (see the getting-started tutorial).

I. Install the operating system from an image

  Different GPU driver versions support different CUDA versions; select driver version 460.106.00.

Public image: Ubuntu Server 18.04.1 LTS 64-bit

GPU driver installed automatically in the background

GPU driver version: 460.106.00

CUDA version: 11.2.2

cuDNN version: 8.2.1

Username: ubuntu

Addresses: 172.16.XX.XX (private), 106.52.XX.XX (public)

  After the installation finishes, connect with SecureCRT or PuTTY. The SSH server enables newer key-exchange algorithms, so SecureCRT needs to be upgraded to version 9.0 or later.

1. After logging in to the machine, first enable the root account (see the reference). Set the root password:

$sudo passwd root

Switching accounts:

$su root
#su ubuntu

To allow root to log in over SSH, see reference 1 and reference 2:

# vi /etc/ssh/sshd_config

Find this section:

# Authentication:
#LoginGraceTime 2m
#PermitRootLogin prohibit-password
#StrictModes yes
#MaxAuthTries 6
#MaxSessions 10

Change it to:

# Authentication:
#LoginGraceTime 2m
#PermitRootLogin prohibit-password
PermitRootLogin yes
StrictModes yes
#MaxAuthTries 6
#MaxSessions 10

Restart the SSH service:

# systemctl restart sshd.service

To make installing software later easier, disable sudo's secure_path restriction (see the reference); since the file is read-only, save with wq!:

# vi /etc/sudoers
Defaults        env_reset
Defaults        mail_badpass
# Defaults      secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"

Then running sudo with the -E option preserves the current user's environment variables, so software can be installed without logging in as root, for example when installing Python packages later with conda:

(gpu) ubuntu@VM-0-14-ubuntu:~$ sudo -E conda list hanlp
# packages in environment at /usr/local/anaconda3/envs/gpu:
#
# Name                    Version                   Build  Channel
hanlp                     2.1.0b42                 pypi_0    pypi
hanlp-common              0.0.18                   pypi_0    pypi
hanlp-downloader          0.0.25                   pypi_0    pypi
hanlp-trie                0.0.5                    pypi_0    pypi

2. The automatic installation takes about 10-15 minutes; the current installer processes can be checked with:

root@VM-0-14-ubuntu:~# ps aux | grep -i install
root      8158  0.0  0.0  13776  1156 pts/0    S+   08:50   0:00 grep --color=auto -i install

As shown above, if neither nv_driver_install.sh nor nv_cuda_install.sh appears in the list, the driver installation has completed.

3. Verify that the GPU driver installed successfully.

root@VM-0-14-ubuntu:~# nvidia-smi
Sat Oct 29 08:52:11 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00   Driver Version: 460.106.00   CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:08.0 Off |                    0 |
| N/A   28C    P8     8W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

4. Verify that CUDA installed successfully. The check in the getting-started tutorial mentioned above does not apply to this configuration combination; /usr/local/cuda is a symlink to /usr/local/cuda-11.2.

root@VM-0-14-ubuntu:~# cat  /usr/local/cuda/version.txt
cat: /usr/local/cuda/version.txt: No such file or directory
root@VM-0-14-ubuntu:~# find / -name cuda
/usr/local/cuda-11.2/targets/x86_64-linux/include/cuda
/usr/local/cuda-11.2/targets/x86_64-linux/include/thrust/system/cuda
/usr/local/cuda
root@VM-0-14-ubuntu:~# cd /usr/local/cuda
root@VM-0-14-ubuntu:/usr/local/cuda# ls
bin                DOCS      extras   lib64    nsight-compute-2020.3.1  nsight-systems-2020.4.3  nvvm       README   share  targets  version.json
compute-sanitizer  EULA.txt  include  libnvvp  nsightee_plugins         nvml                     nvvm-prev  samples  src    tools
root@VM-0-14-ubuntu:/usr/local/cuda# cd bin
root@VM-0-14-ubuntu:/usr/local/cuda/bin# ./nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0

5. Verify the cuDNN installation. The tutorial's check does not apply here either; installing cuDNN from the image did not succeed.

root@VM-0-14-ubuntu:/usr/local/cuda/bin# cat /usr/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
cat: /usr/include/cudnn_version.h: No such file or directory

II. Install cuDNN manually (see the reference)

Downloading cuDNN requires logging in to NVIDIA's website, so the following command does not work:

wget https://developer.nvidia.com/compute/cudnn/secure/8.5.0/local_installers/11.7/cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz

1. Download it on a laptop, transfer it to the server over SSH with SecureFX, then extract and install it. For the CUDA/cuDNN combinations verified on Linux, see this reference:

TensorFlow / CUDA / cuDNN / Python version compatibility table
# tar -xvf cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
# cd cudnn-linux-x86_64-8.5.0.96_cuda11-archive
# cp lib/* /usr/local/cuda/lib64/
# cp include/* /usr/local/cuda/include/
# chmod a+r /usr/local/cuda/lib64/*
# chmod a+r /usr/local/cuda/include/*

2. Add the CUDA directories to the global environment variables:

# vi /etc/profile
export PATH=/usr/local/cuda-11.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda-11.2

3. Run source /etc/profile to apply the changes (or log out and log back in), then verify the cuDNN installation (a runtime check via ctypes is sketched after the output below):

root@VM-0-14-ubuntu:/usr/local/cuda/bin# source /etc/profile
root@VM-0-14-ubuntu:/usr/local/cuda/bin# echo $PATH
/usr/local/cuda-11.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
root@VM-0-14-ubuntu:/usr/local/cuda/bin# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
root@VM-0-14-ubuntu:/usr/local/cuda/bin# cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#endif /* CUDNN_VERSION_H */
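  As an extra runtime check (a sketch, assuming libcudnn.so.8 is on the LD_LIBRARY_PATH configured above), the library itself can report which version the dynamic loader actually resolves:

import ctypes

# cudnnGetVersion() returns an integer such as 8500 for cuDNN 8.5.0.
libcudnn = ctypes.CDLL('libcudnn.so.8')
print(libcudnn.cudnnGetVersion())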

III. Install Anaconda

1. Download and install Anaconda into /usr/local/anaconda3.

$ wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2022.10-Linux-x86_64.sh
$ sudo bash Anaconda3-2022.10-Linux-x86_64.sh

When the installer finishes, choose to run conda init:

done
installation finished.
Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no]
[no] >>> yes
modified      /usr/local/anaconda3/condabin/conda
modified      /usr/local/anaconda3/bin/conda
modified      /usr/local/anaconda3/bin/conda-env
no change     /usr/local/anaconda3/bin/activate
no change     /usr/local/anaconda3/bin/deactivate
no change     /usr/local/anaconda3/etc/profile.d/conda.sh
no change     /usr/local/anaconda3/etc/fish/conf.d/conda.fish
no change     /usr/local/anaconda3/shell/condabin/Conda.psm1
no change     /usr/local/anaconda3/shell/condabin/conda-hook.ps1
no change     /usr/local/anaconda3/lib/python3.9/site-packages/xontrib/conda.xsh
no change     /usr/local/anaconda3/etc/profile.d/conda.csh
modified      /root/.bashrc

==> For changes to take effect, close and re-open your current shell. <==

If you'd prefer that conda's base environment not be activated on startup, 
   set the auto_activate_base parameter to false: 

conda config --set auto_activate_base false

Thank you for installing Anaconda3!

===========================================================================

Working with Python and Jupyter is a breeze in DataSpell. It is an IDE
designed for exploratory data analysis and ML. Get better data insights
with DataSpell.

DataSpell for Anaconda is available at: https://www.anaconda.com/dataspell

Edit the global profile script and append the conda initialization block to the end so that it is available to all users.

# vi /etc/profile
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/usr/local/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/usr/local/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/usr/local/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/usr/local/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<

Source ~/.bashrc to activate the conda base environment, or log out and log back in.

# source ~/.bashrc

2. Install tensorflow-gpu 2.6 as root.

# conda create --name gpu python=3.9
# pip install ipykernel
# python -m ipykernel install --user --name gpu
# conda activate gpu
# pip install tensorflow-gpu==2.6

3. Test the installation as the ubuntu user (a slightly fuller check is sketched after the transcript).

(base) ubuntu@VM-0-14-ubuntu:~$ conda activate gpu
(gpu) ubuntu@VM-0-14-ubuntu:~$ python
Python 3.9.13 (main, Oct 13 2022, 21:15:33) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.test.is_built_with_cuda() 
True
>>> a = tf.constant(1.)
2022-10-29 18:14:29.577429: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:29.585025: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:29.585898: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:29.587034: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-29 18:14:29.587744: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:29.588624: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:29.589442: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:30.245462: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:30.246301: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:30.247122: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:30.247901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13803 MB memory:  -> device: 0, name: Tesla T4, pci bus id: 0000:00:08.0, compute capability: 7.5
>>> b = tf.constant(2.)
>>> print(a+b)
tf.Tensor(3.0, shape=(), dtype=float32)
>>> 
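  A slightly fuller check can be run in the same session; this sketch lists the physical GPUs and runs a matrix multiplication, which is placed on the T4 when the CUDA libraries load correctly:

import tensorflow as tf

# List visible GPUs and run a small matmul to confirm GPU execution.
print(tf.config.list_physical_devices('GPU'))
x = tf.random.normal([1024, 1024])
print(tf.reduce_sum(tf.matmul(x, x)).numpy())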

IV. Configure Jupyter Notebook

  Jupyter Notebook is simpler to install and configure, so set it up first to verify the GPU environment (see the reference).

1. The base environment created by the Anaconda3 installer already contains Jupyter Notebook, but the "gpu" virtual environment created above does not, so install it there; activate the environment with conda activate first.

(base) root@VM-0-14-ubuntu:~# conda activate gpu
(gpu) root@VM-0-14-ubuntu:~# conda list jupyter
# packages in environment at /usr/local/anaconda3/envs/gpu:
#
# Name                    Version                   Build  Channel
(gpu) root@VM-0-14-ubuntu:~# conda install  jupyter notebook
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /usr/local/anaconda3/envs/gpu

  added / updated specs:
    - jupyter
    - notebook


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    asttokens-2.0.5            |     pyhd3eb1b0_0          20 KB
......
Proceed ([y]/n)? y


Downloading and Extracting Packages
soupsieve-2.3.2.post | 65 KB     | ################################################################################################################################################## | 100% 
......
asttokens-2.0.5      | 20 KB     | ################################################################################################################################################## | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Retrieving notices: ...working... done
 

2. Configure Jupyter Notebook for the ubuntu user.

1) Generate the configuration file.

(base) ubuntu@VM-0-14-ubuntu:~$ jupyter notebook --generate-config
Writing default config to: /home/ubuntu/.jupyter/jupyter_notebook_config.py

2) Generate the hash of the login password.

(base) ubuntu@VM-0-14-ubuntu:~$ python
Python 3.9.13 (main, Aug 25 2022, 23:26:10) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from notebook.auth import passwd
>>> passwd()
Enter password: 
Verify password: 
'argon2:$argon2id$v=19$m=10240,t=10,p=xxxxxxxxxxxxxxxxxxx'
>>> 

3. Edit the configuration file and paste in the password hash generated above.

$ vi ~/.jupyter/jupyter_notebook_config.py
c.NotebookApp.ip='*'                     # allow access from any IP
c.NotebookApp.password = 'argon2:$argon2id$v=19$m=10240,t=10,p=xxxxxxxxxxxxxxxxxxx'  # the hash copied above
c.NotebookApp.open_browser = False       # do not open a browser automatically
c.NotebookApp.port =8888                 # port to listen on
c.NotebookApp.notebook_dir = '/home/ubuntu/jupyternotebook'  # directory Notebook starts in

4. Start Jupyter Notebook. Note that the "gpu" environment must be activated first, since that is the one being used.

(base) ubuntu@VM-0-14-ubuntu:~$ conda activate gpu
(gpu) ubuntu@VM-0-14-ubuntu:~$ conda list jupyter
# packages in environment at /usr/local/anaconda3/envs/gpu:
#
# Name                    Version                   Build  Channel
jupyter                   1.0.0            py39h06a4308_8  
jupyter_client            7.3.5            py39h06a4308_0  
jupyter_console           6.4.3              pyhd3eb1b0_0  
jupyter_core              4.11.1           py39h06a4308_0  
jupyter_server            1.18.1           py39h06a4308_0  
jupyterlab                3.4.4            py39h06a4308_0  
jupyterlab_pygments       0.1.2                      py_0  
jupyterlab_server         2.15.2           py39h06a4308_0  
jupyterlab_widgets        1.0.0              pyhd3eb1b0_1  
(gpu) ubuntu@VM-0-14-ubuntu:~$ jupyter notebook &
[1] 16510
(gpu) ubuntu@VM-0-14-ubuntu:~$ [W 07:53:21.094 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[W 2022-10-30 07:53:21.326 LabApp] 'ip' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2022-10-30 07:53:21.326 LabApp] 'password' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2022-10-30 07:53:21.326 LabApp] 'password' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2022-10-30 07:53:21.326 LabApp] 'port' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2022-10-30 07:53:21.326 LabApp] 'notebook_dir' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2022-10-30 07:53:21.326 LabApp] 'notebook_dir' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[I 2022-10-30 07:53:21.333 LabApp] JupyterLab extension loaded from /usr/local/anaconda3/envs/gpu/lib/python3.9/site-packages/jupyterlab
[I 2022-10-30 07:53:21.333 LabApp] JupyterLab application directory is /usr/local/anaconda3/envs/gpu/share/jupyter/lab
[I 07:53:21.337 NotebookApp] Serving notebooks from local directory: /home/ubuntu/jupyternotebook
[I 07:53:21.337 NotebookApp] Jupyter Notebook 6.4.12 is running at:
[I 07:53:21.337 NotebookApp] http://VM-0-14-ubuntu:8888/
[I 07:53:21.337 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

5. Access it from a browser, log in with the password set above, then create a new test notebook to verify the GPU environment.

import tensorflow as tf
tf.test.is_built_with_cuda() 
a = tf.constant(1.)
b = tf.constant(2.)
print(a+b)
Screenshot: testing the tensorflow-gpu installation in Jupyter Notebook

6. Create another test notebook to test Keras and cuDNN.

import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers,optimizers, datasets
from tensorflow.keras.models import load_model
from matplotlib import pyplot as plt
import numpy as np

# 1. Dataset preparation
 
# Load the MNIST dataset
(x_train_raw, y_train_raw),(x_test_raw,y_test_raw) = datasets.mnist.load_data()
print(y_train_raw[0])                                         # 5
print(x_train_raw.shape, y_train_raw.shape)                   # (60000,28,28): 60,000 training images
print(x_test_raw.shape, y_test_raw.shape)                     # (10000,28,28): 10,000 test images
 
num_classes = 10
y_train= keras.utils.to_categorical(y_train_raw,num_classes)  # convert class labels to one-hot vectors
y_test = keras.utils.to_categorical(y_test_raw,num_classes)
print(y_train[0])                                             # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 
# Visualize a few of the raw images
plt.figure()
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.imshow(x_train_raw[i])
    plt.axis('off')
plt.show()

# 2. Build and compile the fully connected (DNN) network
 
# Prepare the input for the fully connected layers
x_train = x_train_raw.reshape(60000,784)                     # flatten each 28*28 image into a 784-dim vector
x_test = x_test_raw.reshape(10000,784)                       # pixel values are normalized to 0~1 below
x_train= x_train.astype('float32')/255
x_test = x_test.astype('float32')/255                        
    
model = keras.Sequential([                                   # create the model: three ReLU hidden layers and a softmax output
    layers.Dense(512,activation='relu', input_dim = 784),    # first hidden layer (784 -> 512)
    layers.Dense(256,activation='relu'),
    layers.Dense(124,activation='relu'),
    layers.Dense(num_classes,activation='softmax')
])

# 3. Train the network
 
Optimizer = optimizers.Adam(0.001)
model.compile(loss=keras.losses.categorical_crossentropy,
    optimizer=Optimizer,                                     # Adam optimizer
    metrics=['accuracy']
)
model.fit(x_train,y_train,                                   # training data and labels
    batch_size=128,                                          # batch size
    epochs=10,                                               # number of epochs
    verbose=1)                                               # print progress logs

# 4. Evaluate the model
 
score = model.evaluate(x_test,y_test,verbose=0)
print('Test loss:', score[0])                                # loss: 0.0853068439
print('Test accuracy:', score[1])                             # accuracy: 0.9767
 
test_loss,test_acc = model.evaluate(x=x_test,y=y_test)
print("Test Accuracy %.2f"%test_acc)                         # accuracy: 0.9

# 5. Save the model
 
model.save('./final_DNN_mode1.h5')                 # save the DNN model

# 6. Load the saved model
new_model = load_model('./final_DNN_mode1.h5')
new_model.summary()

# 7. CNN model test -----------------------------------------------------------------------------------------------------

# Add a channel dimension so the data fits the CNN model
X_train=x_train.reshape(60000,28,28,1)
X_test=x_test.reshape(10000,28,28,1)

# Define the convolutional neural network
model=keras.Sequential([                                   # create the layer sequence
    layers.Conv2D(filters=32,kernel_size = 5,strides = (1,1), padding ='same',activation = tf.nn.relu,input_shape = (28,28,1)),
                                                             # first convolution + pooling layer
    layers.MaxPool2D(pool_size=(2,2),strides = (2,2),padding = 'valid'),
                                                             # second convolution + pooling layer
    layers.Conv2D(filters=64, kernel_size = 3, strides=(1, 1),padding='same', activation = tf.nn.relu),
    layers.MaxPool2D(pool_size=(2,2),strides = (2,2),padding = 'valid'),
                                                             # dropout layer to reduce overfitting
    layers.Dropout(0.25),                     # fraction of neurons randomly dropped
    layers.Flatten(),
                                                             # two fully connected layers
    layers.Dense(units=128,activation = tf.nn.relu),
    layers.Dropout(0.5),
    layers.Dense(units=10,activation = tf.nn.softmax)
])  

# Compile and train the model
Optimizer = optimizers.Adam(0.001)
model.compile(Optimizer,loss="categorical_crossentropy",metrics=['accuracy'])
model.fit(x=X_train,y=y_train,epochs=5,batch_size=128)       # 5 epochs

# Save the CNN model
model.save('./final_CNN_model.h5')                  
# Load the saved model
new_model = load_model('./final_CNN_model.h5')

# 8. Visualize predictions on the test data
 
# @matplotlib.inline
def res_Visual(n):
    # See https://blog.csdn.net/yiyihuazi/article/details/122323349
    # keras 2.6 removed the predict_classes() function
    # final_opt_a=new_model.predict_classes(X_test[0:n])        # predict the test set with the model
    # use the statement below instead
    predicts = new_model.predict(X_test[0:n])
    final_opt_a = np.argmax(predicts, axis=1)
    
    fig, ax = plt.subplots(nrows=int(n/5), ncols=5)
    ax = ax.flatten()
    print('Predictions for the first {} images:'.format(n))
    for i in range(n): 
        print(final_opt_a[i],end='.')
        if int((i+1)%5)==0:
            print('\t')
 
        # display each image
        img = X_test[i].reshape((28,28))                       # take each sample as a 28x28 ndarray
        plt.axis("off")
        ax[i].imshow(img,cmap='Greys',interpolation='nearest') # render
        ax[i].axis("off")
    print('The first {} test images:'.format(n))
    
    
res_Visual(20) 

Keras must be downgraded to 2.6.0, otherwise the following error occurs (see the reference); a version cross-check is sketched after the commands below:

ImportError: cannot import name 'dtensor' from 'tensorflow.compat.v2.experimental' 

(gpu) root@VM-0-14-ubuntu:~# conda list keras
# packages in environment at /usr/local/anaconda3/envs/gpu:
#
# Name                    Version                   Build  Channel
keras                     2.10.0                   pypi_0    pypi
keras-preprocessing       1.1.2                    pypi_0    pypi
(gpu) root@VM-0-14-ubuntu:~# pip install keras==2.6
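  After the downgrade, a quick sketch to confirm that the standalone keras version matches the TensorFlow 2.6.x release (the mismatch is what triggers the dtensor import error above):

import tensorflow as tf
import keras

# Both should report 2.6.x after the downgrade.
print(tf.__version__, keras.__version__)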

  The DNN (fully connected) part of the test program passed, but the CNN part, which uses cuDNN, did not. cuDNN 8.5 is probably too new (see the reference); it needs to be downgraded to the tested and verified 8.1. The error was:

OP_REQUIRES failed at conv_ops.cc:1276 : Not found: No algorithm worked!

7. Downgrade cuDNN to 8.1. Download it on a laptop, transfer it to the server over SSH with SecureFX, then copy the files over, replacing the cuDNN 8.5 ones.

# tar -xvf cudnn-11.2-linux-x64-v8.1.1.33.tgz
# cd cuda
# cp -f lib64/* /usr/local/cuda/lib64/
# cp -f include/* /usr/local/cuda/include/
# chmod a+r /usr/local/cuda/lib64/*
# chmod a+r /usr/local/cuda/include/*

  Add the following setting to the global environment variables, otherwise the CNN test may fail with an error about requesting too much memory (a per-script alternative is sketched after it):

# vi /etc/profile
export TF_GPU_ALLOCATOR=cuda_malloc_async
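  A complementary, per-script alternative (a sketch using TensorFlow's public API, not a replacement for the setting above) is to let TensorFlow grow GPU memory on demand instead of reserving it all up front:

import tensorflow as tf

# Enable on-demand GPU memory growth; must run before any GPU op executes.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)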

  Refresh the dynamic linker cache (otherwise the wrong libraries get linked) and reboot the system:

# ldconfig -X
# reboot now

  Log in as the ubuntu user, activate the "gpu" environment and start Jupyter Notebook:

$ conda activate gpu
$ jupyter notebook &

8. Re-run the earlier notebook to test the GPU environment; it now passes.

1. Load the TensorFlow handwritten-digit (MNIST) example dataset
2. Build and compile the DNN network
3. Train the network
4. Evaluate the model
5. Test the CNN model
6. Visualize the test data

V. Install PyTorch and HanLP

  I install TensorFlow, PyTorch and HanLP in the same virtual environment "gpu", because the goal is to run HanLP 2.1, which supports both backends.
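  A minimal sketch to confirm that both backends in the shared "gpu" environment can see the T4:

import tensorflow as tf
import torch

# Each framework reports the GPU through its own API.
print('TensorFlow GPUs:', tf.config.list_physical_devices('GPU'))
print('PyTorch CUDA   :', torch.cuda.is_available())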

1. Install PyTorch.

(gpu) root@VM-0-14-ubuntu:~# conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /usr/local/anaconda3/envs/gpu

  added / updated specs:
    - cudatoolkit=11.3
    - pytorch==1.11.0
    - torchaudio==0.11.0
    - torchvision==0.12.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    cudatoolkit-11.3.1         |       h2bc3f7f_2       549.3 MB
......
  torchvision        pytorch/linux-64::torchvision-0.12.0-py39_cu113 None


Proceed ([y]/n)? y


Downloading and Extracting Packages
lame-3.100           | 323 KB    | ################################################################################################################################################## | 100% 
......
######################################################################## | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: | By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html

done
Retrieving notices: ...working... done

2. Install HanLP.

(gpu) root@VM-0-14-ubuntu:~# pip install hanlp
Looking in indexes: http://mirrors.tencentyun.com/pypi/simple
Collecting hanlp
......
Successfully built hanlp-common hanlp-trie hanlp-downloader phrasetree
Installing collected packages: toposort, tokenizers, phrasetree, tqdm, regex, pyyaml, pynvml, hanlp-common, filelock, huggingface-hub, hanlp-trie, hanlp-downloader, transformers, hanlp
Successfully installed filelock-3.8.0 hanlp-2.1.0b42 hanlp-common-0.0.18 hanlp-downloader-0.0.25 hanlp-trie-0.0.5 huggingface-hub-0.10.1 phrasetree-0.0.8 pynvml-11.4.1 pyyaml-6.0 regex-2022.9.13 tokenizers-0.11.6 toposort-1.5 tqdm-4.64.1 transformers-4.23.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

Install fasttext, which some of HanLP's TensorFlow pretrained models require:

(gpu) root@VM-0-14-ubuntu:~# pip install fasttext
Looking in indexes: http://mirrors.tencentyun.com/pypi/simple
Collecting fasttext
  Downloading http://mirrors.tencentyun.com/pypi/packages/f8/85/e2b368ab6d3528827b147fdb814f8189acc981a4bc2f99ab894650e05c40/fasttext-0.9.2.tar.gz (68 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 68.8/68.8 kB 332.3 kB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting pybind11>=2.2
  Using cached http://mirrors.tencentyun.com/pypi/packages/1d/53/e6b27f3596278f9dd1d28ef1ddb344fd0cd5db98ef2179d69a2044e11897/pybind11-2.10.1-py3-none-any.whl (216 kB)
Requirement already satisfied: setuptools>=0.7.0 in /usr/local/anaconda3/envs/gpu/lib/python3.9/site-packages (from fasttext) (65.5.0)
Requirement already satisfied: numpy in /usr/local/anaconda3/envs/gpu/lib/python3.9/site-packages (from fasttext) (1.23.3)
Building wheels for collected packages: fasttext
  Building wheel for fasttext (setup.py) ... done
  Created wheel for fasttext: filename=fasttext-0.9.2-cp39-cp39-linux_x86_64.whl size=299146 sha256=4dee6f6dc5fb53404fb5cbb69c2cc3a2faef7f3af0500567ad49dc01f26d89d7
  Stored in directory: /root/.cache/pip/wheels/ca/08/ee/d0dd871c6c089c4c3971722067bd577f8827c9b4d5d6f2477a
Successfully built fasttext
Installing collected packages: pybind11, fasttext

3. Test PyTorch and HanLP.

  A quick test for now; more thorough testing will follow.

import torch

print(torch.__version__)
print(torch.cuda.is_available())
Screenshot: PyTorch detects the GPU
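Beyond checking availability, a small sketch that actually places work on the card:

import torch

# Run a matrix multiplication on the GPU and report the device used.
if torch.cuda.is_available():
    x = torch.rand(1024, 1024, device='cuda')
    y = x @ x
    print(y.device, torch.cuda.get_device_name(0))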
# Running the TensorFlow model first and then the PyTorch models works; if a PyTorch model has already been run first, this step fails.
import hanlp
tokenizer = hanlp.load(hanlp.pretrained.tok.LARGE_ALBERT_BASE)
text = 'NLP统计模型没有加规则,聪明人知道自己加。英文、数字、自定义词典统统都是规则。'
print(tokenizer(text))

# The tests below are not affected by execution order

import hanlp
HanLP = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH) # trained on the world's largest Chinese corpus
HanLP(['2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。', '阿婆主来到北京立方庭参观自然语义科技公司。'])

import hanlp
HanLP = hanlp.pipeline() \
    .append(hanlp.utils.rules.split_sentence, output_key='sentences') \
    .append(hanlp.load('FINE_ELECTRA_SMALL_ZH'), output_key='tok') \
    .append(hanlp.load('CTB9_POS_ELECTRA_SMALL'), output_key='pos') \
    .append(hanlp.load('MSRA_NER_ELECTRA_SMALL_ZH'), output_key='ner', input_key='tok') \
    .append(hanlp.load('CTB9_DEP_ELECTRA_SMALL', conll=0), output_key='dep', input_key='tok')\
    .append(hanlp.load('CTB9_CON_ELECTRA_SMALL'), output_key='con', input_key='tok')
HanLP('2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。阿婆主来到北京立方庭参观自然语义科技公司。')

HanLP('2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。').pretty_print()

import hanlp

tok = hanlp.load(hanlp.pretrained.tok.COARSE_ELECTRA_SMALL_ZH)
tok(['商品和服务。', '阿婆主来到北京立方庭参观自然语义科技公司。'])

tok_fine = hanlp.load(hanlp.pretrained.tok.FINE_ELECTRA_SMALL_ZH)
tok_fine('阿婆主来到北京立方庭参观自然语义科技公司')

pos = hanlp.load(hanlp.pretrained.pos.CTB9_POS_ELECTRA_SMALL)
pos(["我", "的", "希望", "是", "希望", "张晚霞", "的", "背影", "被", "晚霞", "映红", "。"])
Screenshots: tokenization and POS tagging; pipeline operation; printing the parse tree; tokenization with various pretrained models
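  The multi-task model can also be restricted to a subset of tasks; the snippet below is a sketch that assumes the tasks parameter of the MTL model's call and simply prints the returned Document:

import hanlp

# Load the same multi-task model and run only tokenization and POS tagging.
HanLP = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH)
doc = HanLP('2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。', tasks=['tok', 'pos'])
print(doc)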

VI. Install and configure JupyterHub

  A Linux GPU host used for research, development, testing or production naturally serves multiple users. Jupyter Notebook is single-user; JupyterHub adds a multi-user proxy layer on top, letting everyone log in through it and use their own Jupyter Notebook or JupyterLab (the latter being the next-generation version of the former).

Diagram: JupyterHub proxies each user's JupyterLab to provide multi-user service

  According to that post, if Jupyter Notebook has ever been run, its configuration files under $HOME/.jupyter conflict with the per-user Jupyter Lab / Jupyter Notebook server that JupyterHub spawns, so the server process fails to start and the proxy forwarding fails. Is this a bug? In any case, if Jupyter Notebook has been run before (as above), delete that directory first. This problem cost me two days and nearly drove me crazy; Stack Overflow came to the rescue.

  See reference 1, reference 2, reference 3 and reference 4.

1. Install and upgrade Node.js and npm.

# # Refresh the package lists and upgrade installed packages
# apt-get update 
# apt-get upgrade
# # Install the dependencies
# apt install -y npm nodejs

Upgrade Node.js. Do not install the latest version 18; it has compatibility problems and throws errors (see the reference). JupyterHub requires version 10 or later, while Ubuntu 18.04 ships version 8.

##----- first clear the npm cache
# npm cache clean -f 
##----- install the n module
# npm install -g n

Upgrade Node.js:

root@VM-0-14-ubuntu:~# n 16.18.0    # pin version 16.18.0
  installing : node-v16.18.0
       mkdir : /usr/local/n/versions/node/16.18.0
       fetch : https://nodejs.org/dist/v16.18.0/node-v16.18.0-linux-x64.tar.xz
     copying : node/16.18.0
   installed : v16.18.0 (with npm 8.19.2)

Note: the node command changed location and the old location may be remembered in your current shell.
         old : /usr/bin/node
         new : /usr/local/bin/node
If "node --version" shows the old version then start a new shell, or reset the location hash with:
hash -r  (for bash, zsh, ash, dash, and ksh)
rehash   (for csh and tcsh)

root@VM-0-14-ubuntu:~# hash -r
root@VM-0-14-ubuntu:~# node -v
v16.18.0
root@VM-0-14-ubuntu:~# npm -v
8.19.2

2. Install configurable-http-proxy.

It can be installed with npm:

npm install -g configurable-http-proxy

However, installing with conda is recommended, since it pulls in the other dependencies as well; it also installs its own Node.js (version 11), which works too. Be sure to switch to and install into the appropriate virtual environment, here "gpu".

(gpu) root@VM-0-14-ubuntu:~# conda install configurable-http-proxy
(gpu) root@VM-0-14-ubuntu:~# conda list configurable-http-proxy
# packages in environment at /usr/local/anaconda3/envs/gpu:
#
# Name                    Version                   Build  Channel
configurable-http-proxy   4.0.1                   node6_0  
(gpu) root@VM-0-14-ubuntu:~# configurable-http-proxy -V
4.0.1
(gpu) root@VM-0-14-ubuntu:~# 

3. Install JupyterHub and related packages in the virtual environment.

(gpu) root@VM-0-14-ubuntu:~# conda install jupyter jupyterlab jupyterhub
(gpu) root@VM-0-14-ubuntu:~# conda list jupyter
# packages in environment at /usr/local/anaconda3/envs/gpu:
#
# Name                    Version                   Build  Channel
jupyter                   1.0.0            py39h06a4308_8  
jupyter_client            7.3.5            py39h06a4308_0  
jupyter_console           6.4.3              pyhd3eb1b0_0  
jupyter_core              4.11.1           py39h06a4308_0  
jupyter_server            1.18.1           py39h06a4308_0  
jupyter_telemetry         0.1.0                      py_0  
jupyterhub                2.0.0              pyhd3eb1b0_0  
jupyterlab                3.4.4            py39h06a4308_0  
jupyterlab_pygments       0.1.2                      py_0  
jupyterlab_server         2.15.2           py39h06a4308_0  
jupyterlab_widgets        1.0.0              pyhd3eb1b0_1  

4. Configure JupyterHub.

Create the directory /etc/jupyterhub, generate a configuration file in it, then edit the file.

(gpu) root@VM-0-14-ubuntu:~#  mkdir /etc/jupyterhub
(gpu) root@VM-0-14-ubuntu:~# cd /etc/jupyterhub
(gpu) root@VM-0-14-ubuntu:/etc/jupyterhub# jupyterhub --generate-config
Writing default config to: jupyterhub_config.py
(gpu) root@VM-0-14-ubuntu:/etc/jupyterhub# vi  jupyterhub_config.py

The contents are as follows:

# Added by Jean 2022/10/31
c.Authenticator.whitelist = {'ubuntu'}   # users allowed to use JupyterHub, comma-separated
c.Authenticator.admin_users = {'ubuntu'}  # JupyterHub administrator users
c.Spawner.notebook_dir = '/home/{username}'  # land in the user's home directory after logging in from the browser
c.Spawner.default_url = '/lab'    # use JupyterLab instead of Notebook
c.JupyterHub.extra_log_file = '/var/log/jupyterhub.log'

5. Start JupyterHub in the background as root.

(gpu) root@VM-0-14-ubuntu:/etc/jupyterhub# jupyterhub  -f /etc/jupyterhub/jupyterhub_config.py  &

6. Access it in a browser at http://ip:8000 and log in with an existing Linux username; SSL encryption is configured later.

Screenshot: Jupyter Lab running inside JupyterHub

A terminal window can be opened inside JupyterHub to run arbitrary commands under the identity of the logged-in user. If the SSH port is blocked, this effectively provides a tunnel over the HTTP port. Running su then gives root access.

(base) ubuntu@VM-0-14-ubuntu:~$ su --help
Usage: su [options] [LOGIN]

Options:
  -c, --command COMMAND         pass COMMAND to the invoked shell
  -h, --help                    display this help message and exit
  -, -l, --login                make the shell a login shell
  -m, -p,
  --preserve-environment        do not reset environment variables, and
                                keep the same shell
  -s, --shell SHELL             use SHELL instead of the default in passwd

(base) ubuntu@VM-0-14-ubuntu:~$ su --preserve-environment
Password: 
(base) root@VM-0-14-ubuntu:~# 
Screenshot: opening a terminal window in JupyterHub

7. Configure SSL encryption.

  This is a screenshot of logging in over the SSL-encrypted connection once it is configured; clicking the lock icon next to the URL shows the certificate chain. As the earlier screenshots show, an unencrypted connection is marked "Not secure" next to the URL. The self-signed certificate here is issued to an IP address, because no domain name has been registered for this host yet.

Screenshot: an SSL-encrypted channel to JupyterHub using a self-signed certificate

1) First, the JupyterHub configuration. Simply add two lines to the configuration file pointing to the server key file and certificate file; building a private CA with openssl and issuing the certificate is covered afterwards. Since this runs as root, server.key is not protected with a passphrase.

# Added by Jean for SSL 2022/03/19
c.JupyterHub.ssl_key = '/root/cert/server.key'
c.JupyterHub.ssl_cert = '/root/cert/server.crt'

After restarting JupyterHub, copy out the private CA's root certificate and import it into the browser (described below), then access https://ip:8000, as shown in the screenshot above.

2) Build a private CA and issue a self-signed server certificate.

See the reference.

(gpu) root@VM-0-14-ubuntu:~# cd /root
(gpu) root@VM-0-14-ubuntu:~# mkdir cert
(gpu) root@VM-0-14-ubuntu:~# cd cert
(gpu) root@VM-0-14-ubuntu:~/cert# mkdir demoCA && cd demoCA
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA# mkdir private newcerts
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA# touch index.txt
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA# echo '01' > serial
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA# cd private
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA/private# openssl genrsa -out cakey.pem 2048
Generating RSA private key, 2048 bit long modulus (2 primes)
...............................................................................+++++
....................+++++
e is 65537 (0x010001)
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA/private# openssl req -sha256 -new -x509 -days 3650 -key cakey.pem -out cacert.pem \
>              -subj "/C=CN/ST=GD/L=ZhuHai/O=Jean/OU=Study/CN=RootCA"
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA/private# ls
cacert.pem  cakey.pem
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA/private# cd .. && mv ./private/cacert.pem ./
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA# ls
cacert.pem  index.txt  newcerts  private  serial

The commands above perform a series of steps:

A. Create the /root/cert directory under root's home directory /root.

B. Under it, build the private CA's directory structure ./demoCA; in openssl's default configuration the CA lives in ./demoCA relative to the current directory.

C. Generate the CA's private key cakey.pem.

D. Issue the CA's self-signed certificate cacert.pem and move it into ./demoCA; when the private CA later signs the server certificate, openssl looks for the CA root certificate there by default.

E. Finally, list the demoCA directory structure.

You can locate openssl's default configuration file and check that the private CA is expected in ./demoCA under the current directory:

(gpu) root@VM-0-14-ubuntu:~# find / -name openssl.cnf
/usr/lib/ssl/openssl.cnf
/usr/local/anaconda3/pkgs/openssl-1.1.1q-h7f8727e_0/ssl/openssl.cnf
/usr/local/anaconda3/ssl/openssl.cnf
/usr/local/anaconda3/envs/gpu/ssl/openssl.cnf
/usr/local/anaconda3/envs/hub/ssl/openssl.cnf
/etc/ssl/openssl.cnf
(gpu) root@VM-0-14-ubuntu:~# vi /usr/lib/ssl/openssl.cnf
####################################################################
[ ca ]
default_ca      = CA_default            # The default ca section

####################################################################
[ CA_default ]

dir             = ./demoCA              # Where everything is kept
certs           = $dir/certs            # Where the issued certs are kept
crl_dir         = $dir/crl              # Where the issued crl are kept
database        = $dir/index.txt        # database index file.
#unique_subject = no                    # Set to 'no' to allow creation of
                                        # several certs with same subject.
new_certs_dir   = $dir/newcerts         # default place for new certs.

certificate     = $dir/cacert.pem       # The CA certificate
serial          = $dir/serial           # The current serial number
crlnumber       = $dir/crlnumber        # the current crl number
                                        # must be commented out to leave a V1 CRL
crl             = $dir/crl.pem          # The current CRL
private_key     = $dir/private/cakey.pem# The private key
RANDFILE        = $dir/private/.rand    # private random number file

x509_extensions = usr_cert              # The extensions to add to the cert

F. Generate the server key and certificate signing request.

See post 1 and post 2: first run the following command to create /root/.rnd, otherwise generating the server key fails.

openssl rand -out /root/.rnd -hex 256

  Change to /root/cert, the parent directory of ./demoCA, then run the commands below to generate the server key and certificate signing request. The request uses the configuration file /usr/lib/ssl/openssl.cnf plus an extra Subject Alternative Name (SAN); Chrome uses the SAN to check that the certificate matches the URL. Since access is via https://ip, the SAN here is IP.1:106.52.33.185, i.e. the first IP certified by this certificate; there could also be IP.2 and so on. To certify a domain name instead, use DNS.1 = jeanye.cn and so forth. This produces the request file server.csr.

(gpu) root@VM-0-14-ubuntu:~/cert# openssl genrsa -out server.key 2048
(gpu) root@VM-0-14-ubuntu:~/cert# openssl req -new \
>     -sha256 \
>     -key server.key \
>     -subj "/C=CN/ST=GD/L=ZhuHai/O=Jean/OU=Study/CN=106.52.33.185" \
>     -reqexts SAN \
>     -config <(cat /usr/lib/ssl/openssl.cnf \
>         <(printf "[SAN]\nsubjectAltName=IP.1:106.52.33.185")) \
>     -out server.csr

G. Sign the server certificate.

  openssl finds cakey.pem and cacert.pem in the default ./demoCA subdirectory and signs the certificate according to the request in server.csr, using /usr/lib/ssl/openssl.cnf and the same certificate extensions (the SAN) as the request, writing the result to server.crt.

(gpu) root@VM-0-14-ubuntu:~/cert# openssl ca -in server.csr \
>         -md sha256 \
>     -extensions SAN \
>     -config <(cat /usr/lib/ssl/openssl.cnf \
>         <(printf "[SAN]\nsubjectAltName=IP.1:106.52.33.185")) \
>      -out server.crt
Using configuration from /dev/fd/63
Check that the request matches the signature
Signature ok
Certificate Details:
        Serial Number: 1 (0x1)
        Validity
            Not Before: Nov  2 09:47:58 2022 GMT
            Not After : Nov  2 09:47:58 2023 GMT
        Subject:
            countryName               = CN
            stateOrProvinceName       = GD
            organizationName          = Jean
            organizationalUnitName    = Study
            commonName                = 106.52.33.185
        X509v3 extensions:
            X509v3 Subject Alternative Name: 
                IP Address:106.52.33.185
Certificate is to be certified until Nov  2 09:47:58 2023 GMT (365 days)
Sign the certificate? [y/n]:y


1 out of 1 certificate requests certified, commit? [y/n]y
Write out database with 1 new entries
Data Base Updated
(gpu) root@VM-0-14-ubuntu:~/cert# ls
demoCA  server.crt  server.csr  server.key

H. Import the private CA's root certificate into the browser.

  Download the private CA's root certificate /root/cert/demoCA/cacert.pem to the client (for example Windows 10) and import it into the browser's (for example Chrome's) Trusted Root Certification Authorities store.

In Chrome:

  Settings -> Privacy and security -> Security -> Advanced -> Manage certificates -> Trusted Root Certification Authorities -> Import -> Next -> Browse -> All Files (*.*)

Screenshot: importing the private CA root certificate into the browser's trusted root CA list

I. Open https://106.52.33.185:8000 in the browser and log in with a username and password.

Screenshot: logging in with a username/password and starting one's own Jupyter Lab instance
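  Besides the browser, the encrypted endpoint can also be checked from the server itself; this is a sketch using the requests library (bundled with Anaconda), with the IP and certificate path from the setup above:

import requests

# Request the JupyterHub login page over HTTPS, validating against the private CA root certificate.
r = requests.get('https://106.52.33.185:8000', verify='/root/cert/demoCA/cacert.pem')
print(r.status_code)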

8. Configure JupyterHub as a service that starts at boot.

1) Create the service unit file.

First look at the PATH of the conda virtual environment "gpu":

(gpu) root@VM-0-14-ubuntu:~# echo $PATH
/usr/local/anaconda3/envs/gpu/bin:/usr/local/anaconda3/condabin:/usr/local/cuda-11.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
(gpu) root@VM-0-14-ubuntu:~# 

Then create a new systemd unit file for the daemon:

(gpu) root@VM-0-14-ubuntu:~# vi /etc/systemd/system/jupyterhub.service

The contents are as follows; a few key points:

A. Run as root.

B. Set PATH explicitly: a process started at boot does not go through a login, so /etc/profile and the like are never sourced; copy in the PATH shown above.

C. Invoke jupyterhub by its full path.

[Unit]
Description=Jupyterhub service
After=syslog.target network.target

[Service]
User=root
Environment="PATH=/usr/local/anaconda3/envs/gpu/bin:/usr/local/anaconda3/condabin:/usr/local/cuda-11.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
ExecStart=/usr/local/anaconda3/envs/gpu/bin/jupyterhub -f /etc/jupyterhub/jupyterhub_config.py

[Install]
WantedBy=multi-user.target

Then enable the service:

(gpu) root@VM-0-14-ubuntu:~# systemctl enable jupyterhub.service

The service can then be managed with these commands:

# systemctl status jupyterhub.service
# systemctl start jupyterhub.service
# systemctl stop jupyterhub.service

View the service log with:

(gpu) root@VM-0-14-ubuntu:~# journalctl -u jupyterhub.service -f

In the JupyterHub configuration above, the log is also written to this file:

c.JupyterHub.extra_log_file = '/var/log/jupyterhub.log'

So the log file can be viewed directly as well.

With this in place, JupyterHub starts automatically whenever the server reboots.

That concludes this article: the parts of the Linux GPU virtual host related to the GPU and the Python deep learning runtime and development environment are now configured. RStudio, Shiny and the other pieces will be covered in separate articles.
