The following documents the complete workflow for training VGGNet-16 on a custom image-classification dataset under Ubuntu 16.04.
Step 1: Prepare your own image-classification dataset
For the preparation process, see: Caffe dataset format conversion — from image files to LMDB/LEVELDB.
This produces two folders, vgg_train_lmdb and vgg_val_lmdb.
Create a new folder named vggnet under caffe/models/, and move the vgg_train_lmdb and vgg_val_lmdb folders into caffe/models/vggnet/.
The dataset is now ready.
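Before the LMDB conversion, Caffe's convert_imageset tool needs a text file listing "relative/path label" pairs. If your images are organized one folder per class, a small script can generate that file; this is a sketch under that assumed layout (the function name and file layout are my own, not from the original post):

```python
import os

def make_label_file(image_root, out_path):
    """Walk one-folder-per-class subdirectories under image_root and write
    'relative/path label' lines, the format Caffe's convert_imageset expects."""
    classes = sorted(d for d in os.listdir(image_root)
                     if os.path.isdir(os.path.join(image_root, d)))
    with open(out_path, "w") as f:
        for label, cls in enumerate(classes):
            cls_dir = os.path.join(image_root, cls)
            for name in sorted(os.listdir(cls_dir)):
                f.write("%s/%s %d\n" % (cls, name, label))
    return classes
```

Labels are assigned by the alphabetical order of the class folders, so keep a copy of the returned class list to interpret predictions later.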
Step 2: Prepare the VGGNet-16 model file vggnet_train_val.prototxt
The VGG16 deploy file, VGG_ILSVRC_16_layers_deploy.prototxt, can be found in the Model Zoo on the official Caffe site.
That file was probably written quite early, so much of its naming is no longer idiomatic: its layer types are in all capitals, for example, whereas newer versions of Caffe generally capitalize only the first letter of each layer type. Its style also differs in other ways from the models shipped under /caffe/models/. So, based on the VGGNet paper and my own dataset, I rewrote vggnet_train_val.prototxt from scratch:
vggnet_train_val.prototxt:
name: "VGGNet"
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
crop_size: 224
mean_value: 104
mean_value: 117
mean_value: 123
mirror: true
}
data_param {
source: "models/vggnet/vgg_train_lmdb" #note: path to the training LMDB
batch_size: 32 #set according to your GPU memory; 64 gave me an out-of-memory error, so I reduced it to 32
backend: LMDB
}
}
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
crop_size: 224
mean_value: 104
mean_value: 117
mean_value: 123
mirror: false
}
data_param {
source: "models/vggnet/vgg_val_lmdb" #note: path to the validation LMDB
batch_size: 32
backend: LMDB
}
}
layer {
name: "conv1_1"
type: "Convolution"
bottom: "data"
top: "conv1_1"
param {
lr_mult: 1
}
param {
lr_mult: 1
}
convolution_param {
num_output: 64
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1_1"
type: "ReLU"
bottom: "conv1_1"
top: "conv1_1"
}
layer {
name: "conv1_2"
type: "Convolution"
bottom: "conv1_1"
top: "conv1_2"
param {
lr_mult: 1
}
param {
lr_mult: 1
}
convolution_param {
num_output: 64
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1_2"
type: "ReLU"
bottom: "conv1_2"
top: "conv1_2"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1_2"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2_1"
type: "Convolution"
bottom: "pool1"
top: "conv2_1"
param {
lr_mult: 1
}
param {
lr_mult: 1
}
convolution_param {
num_output: 128
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu2_1"
type: "ReLU"
bottom: "conv2_1"
top: "conv2_1"
}
layer {
name: "conv2_2"
type: "Convolution"
bottom: "conv2_1"
top: "conv2_2"
param {
lr_mult: 1
}
param {
lr_mult: 1
}
convolution_param {
num_output: 128
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu2_2"
type: "ReLU"
bottom: "conv2_2"
top: "conv2_2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2_2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv3_1"
type: "Convolution"
bottom: "pool2"
top: "conv3_1"
param {
lr_mult: 1
}
param {
lr_mult: 1
}
convolution_param {
num_output: 256
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu3_1"
type: "ReLU"
bottom: "conv3_1"
top: "conv3_1"
}
layer {
name: "conv3_2"
type: "Convolution"
bottom: "conv3_1"
top: "conv3_2"
param {
lr_mult: 1
}
param {
lr_mult: 1
}
convolution_param {
num_output: 256
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu3_2"
type: "ReLU"
bottom: "conv3_2"
top: "conv3_2"
}
layer {
name: "conv3_3"
type: "Convolution"
bottom: "conv3_2"
top: "conv3_3"
param {
lr_mult: 1
}
param {
lr_mult: 1
}
convolution_param {
num_output: 256
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu3_3"
type: "ReLU"
bottom: "conv3_3"
top: "conv3_3"
}
layer {
name: "pool3"
type: "Pooling"
bottom: "conv3_3"
top: "pool3"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv4_1"
type: "Convolution"
bottom: "pool3"
top: "conv4_1"
param {
lr_mult: 1
}
param {
lr_mult: 1
}
convolution_param {
num_output: 512
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu4_1"
type: "ReLU"
bottom: "conv4_1"
top: "conv4_1"
}
layer {
name: "conv4_2"
type: "Convolution"
bottom: "conv4_1"
top: "conv4_2"
param {
lr_mult: 1
}
param {
lr_mult: 1
}
convolution_param {
num_output: 512
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu4_2"
type: "ReLU"
bottom: "conv4_2"
top: "conv4_2"
}
layer {
name: "conv4_3"
type: "Convolution"
bottom: "conv4_2"
top: "conv4_3"
param {
lr_mult: 1
}
param {
lr_mult: 1
}
convolution_param {
num_output: 512
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu4_3"
type: "ReLU"
bottom: "conv4_3"
top: "conv4_3"
}
layer {
name: "pool4"
type: "Pooling"
bottom: "conv4_3"
top: "pool4"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv5_1"
type: "Convolution"
bottom: "pool4"
top: "conv5_1"
param {
lr_mult: 1
}
param {
lr_mult: 1
}
convolution_param {
num_output: 512
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu5_1"
type: "ReLU"
bottom: "conv5_1"
top: "conv5_1"
}
layer {
name: "conv5_2"
type: "Convolution"
bottom: "conv5_1"
top: "conv5_2"
param {
lr_mult: 1
}
param {
lr_mult: 1
}
convolution_param {
num_output: 512
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu5_2"
type: "ReLU"
bottom: "conv5_2"
top: "conv5_2"
}
layer {
name: "conv5_3"
type: "Convolution"
bottom: "conv5_2"
top: "conv5_3"
param {
lr_mult: 1
}
param {
lr_mult: 1
}
convolution_param {
num_output: 512
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu5_3"
type: "ReLU"
bottom: "conv5_3"
top: "conv5_3"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5_3"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
}
param {
lr_mult: 1
}
inner_product_param {
num_output: 4096
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1
}
param {
lr_mult: 1
}
inner_product_param {
num_output: 4096
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc8"
type: "InnerProduct"
bottom: "fc7"
top: "fc8"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 10 #change fc8's num_output to your own number of classes
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc8"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc8"
bottom: "label"
top: "loss"
}
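As a sanity check on the architecture above, the layer shapes and parameter count of VGG-16 can be computed by hand: each 3x3 conv with pad 1 and stride 1 preserves the spatial size, each 2x2 max pool halves it, so pool5 outputs 512 x 7 x 7 = 25088 features into fc6. The sketch below (plain Python, no Caffe required) reproduces the well-known total of roughly 138 million parameters for the 1000-class ImageNet version:

```python
# Per-block conv configs for VGG-16: (number of 3x3 conv layers, output channels).
vgg16_blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def vgg16_param_count(num_classes=1000):
    """Count weights + biases of VGG-16: 13 conv layers, then fc6/fc7/fc8."""
    params, in_ch, spatial = 0, 3, 224
    for n_convs, out_ch in vgg16_blocks:
        for _ in range(n_convs):
            # 3x3 kernel, pad 1, stride 1: spatial size is unchanged.
            params += 3 * 3 * in_ch * out_ch + out_ch
            in_ch = out_ch
        spatial //= 2                      # 2x2 max pool, stride 2
    fc_in = in_ch * spatial * spatial      # 512 * 7 * 7 = 25088
    for fc_out in (4096, 4096, num_classes):
        params += fc_in * fc_out + fc_out
        fc_in = fc_out
    return params

print(vgg16_param_count())      # 138357544 for the 1000-class ImageNet version
print(vgg16_param_count(10))    # 134301514 with the 10-class fc8 used here
```

Note that almost three quarters of the parameters sit in fc6 alone (25088 x 4096 weights), which is why the fully connected layers dominate VGG's memory footprint.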
Step 3: Prepare the VGGNet-16 solver file: vggnet_solver.prototxt
This step is straightforward; just refer to the solver files of the other models:
vggnet_solver.prototxt:
net: "models/vggnet/vggnet_train_val.prototxt"
test_iter: 10
test_interval: 500 #run validation and report accuracy every 500 training iterations
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 1000
display: 20
max_iter: 2000 #this is just an exercise, so 2000 iterations are enough
momentum: 0.9
weight_decay: 0.0005
snapshot: 1000 #save a snapshot every 1000 training iterations
snapshot_prefix: "models/vggnet/vggnet_train"
solver_mode: GPU
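With lr_policy: "step", Caffe computes the learning rate as base_lr * gamma ^ floor(iter / stepsize). A quick sketch of what the settings above produce over the 2000-iteration run:

```python
def step_lr(iteration, base_lr=0.01, gamma=0.1, stepsize=1000):
    """Caffe's "step" learning-rate policy: base_lr * gamma ^ floor(iter / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)

for it in (0, 500, 1000, 1500, 2000):
    print(it, step_lr(it))
# Iterations 0-999 train at 0.01 and iterations 1000-1999 at 0.001;
# the run stops at max_iter 2000, where the rate would drop again to 0.0001.
```

So with stepsize: 1000 and max_iter: 2000, the schedule decays exactly once during training.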
Step 4: Prepare the shell script for training VGGNet-16
Write the script file train_vggnet.sh. The trailing $@ passes any extra command-line flags through to the caffe binary (for example, --snapshot to resume from a saved solver state).
train_vggnet.sh:
#!/usr/bin/env sh
set -e
./build/tools/caffe train \
--solver=models/vggnet/vggnet_solver.prototxt $@
All the preparation work is done (you do, of course, need a working GPU build of Caffe first). The /caffe/models/vggnet/ directory now contains all of the files above.
Step 5: Run the training
Open a bash shell in the caffe/ directory and run the following command:
bash ./models/vggnet/train_vggnet.sh
Here are the validation-set accuracy readings printed during training:
I0316 22:12:22.387151 12448 solver.cpp:351] Iteration 0, Testing net (#0)
I0316 22:12:23.484271 12448 solver.cpp:418] Test net output #0: accuracy = 0.10625
...
I0316 22:15:08.977102 12448 solver.cpp:351] Iteration 500, Testing net (#0)
I0316 22:15:10.174433 12448 solver.cpp:418] Test net output #0: accuracy = 0.340625
...
I0316 22:17:58.450613 12448 solver.cpp:351] Iteration 1000, Testing net (#0)
I0316 22:17:59.427794 12448 solver.cpp:418] Test net output #0: accuracy = 0.359375
...
I0316 22:20:45.280406 12448 solver.cpp:351] Iteration 1500, Testing net (#0)
I0316 22:20:46.432967 12448 solver.cpp:418] Test net output #0: accuracy = 0.459375
...
I0316 22:23:34.968350 12448 solver.cpp:351] Iteration 2000, Testing net (#0)
I0316 22:23:35.955927 12448 solver.cpp:418] Test net output #0: accuracy = 0.51875
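To pull these accuracy numbers out of a full training log (e.g. to plot an accuracy curve), a small regex script is handy. This is a sketch; the two patterns are inferred from the log lines shown above:

```python
import re

# Matches lines like: "... solver.cpp:351] Iteration 500, Testing net (#0)"
ITER_RE = re.compile(r"Iteration (\d+), Testing net")
# Matches lines like: "... solver.cpp:418] Test net output #0: accuracy = 0.340625"
ACC_RE = re.compile(r"Test net output #0: accuracy = ([0-9.]+)")

def parse_accuracies(log_text):
    """Return a list of (iteration, accuracy) pairs from a Caffe training log."""
    pairs, current_iter = [], None
    for line in log_text.splitlines():
        m = ITER_RE.search(line)
        if m:
            current_iter = int(m.group(1))
            continue
        m = ACC_RE.search(line)
        if m and current_iter is not None:
            pairs.append((current_iter, float(m.group(1))))
    return pairs
```

Redirect training output to a file (bash ./models/vggnet/train_vggnet.sh 2>&1 | tee train.log) and feed its contents to this function.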
Finally: evaluating the results and wrapping up
Judging from the printed accuracy values, the model starts training smoothly, and its top-1 accuracy rises steadily as the number of iterations grows. This shows that the whole preparation pipeline for training a classification model works.
In fact, the training data here is tiny, only about 1/100 the size of the ImageNet 2012 classification dataset, while VGGNet-16 has roughly 138 million parameters, so severe overfitting is all but guaranteed. That is fine: the goal of this exercise is to master the complete workflow of training an image-classification model on your own dataset.
To summarize, the whole classification-model workflow consists of the following steps:
- Step 1: Prepare your own image-classification dataset. The dataset is laid out in the same format as the dataset for the ImageNet competition's image-classification task.
- Step 2: Prepare the VGGNet-16 model file vggnet_train_val.prototxt. If your GPU memory is limited, you can just as well pick one of the official classification models under /caffe/models/, such as AlexNet or GoogLeNet, or even LeNet; just remember to adjust the parameters of the data layers and the final classification layer.
- Step 3: Prepare the VGGNet-16 solver file vggnet_solver.prototxt. The solver parameters can be scaled down a bit; this is only practice, so there is no need to spend much time training the model.
- Step 4: Prepare the shell script that launches training. This step can be skipped entirely, depending on personal habit; you can run training directly from bash with the following command:
./build/tools/caffe train \
--solver=models/vggnet/vggnet_solver.prototxt
- Step 5: Run the training.