TensorFlow Model Training

Setting Up the TensorFlow Environment

Installing the GPU Version on Windows

Dependencies

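TensorFlow 1.5.0/1.6.0
CUDA v9.0
cuDNN v7.0.5 for CUDA 9.0

After extracting cuDNN v7.0.5, copy the contents of its bin, include and lib folders into the matching folders of the CUDA installation directory (NVIDIA GPU Computing Toolkit/CUDA/v9.0).

The versions of these packages must match one another; otherwise you will run into version-mismatch errors. Also make sure you download the builds for the correct operating system version.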
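To confirm that the cuDNN files actually landed in the CUDA directory, a quick check like the one below can help. It is a minimal sketch: the file names assume the usual cuDNN 7 Windows archive layout (bin/cudnn64_7.dll, include/cudnn.h, lib/x64/cudnn.lib) and the path assumes a default CUDA 9.0 install, so adjust both to your setup.

```python
from pathlib import Path

# Files a cuDNN 7.x Windows archive (for CUDA 9.0) is assumed to contain,
# relative to the CUDA installation directory after copying.
CUDNN7_WINDOWS_FILES = [
    Path("bin") / "cudnn64_7.dll",
    Path("include") / "cudnn.h",
    Path("lib") / "x64" / "cudnn.lib",
]


def missing_cudnn_files(cuda_dir):
    """Return the expected cuDNN files that are not present under cuda_dir."""
    root = Path(cuda_dir)
    return [str(rel) for rel in CUDNN7_WINDOWS_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumed default install path; change it if CUDA lives elsewhere.
    cuda_dir = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0"
    missing = missing_cudnn_files(cuda_dir)
    if missing:
        print("cuDNN files missing:", missing)
    else:
        print("cuDNN files found in", cuda_dir)
```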

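Python environment setup (training / development environment)

For the training environment, installing Anaconda is recommended. It is a popular Python platform for data science that ships with many preinstalled libraries and makes it easy to manage several Python environments and switch between them.

Under the hood TensorFlow uses the gRPC framework and the Protocol Buffers data-interchange format, and the Object Detection API defines its configuration files as Protocol Buffers messages. The protoc tool is a compiler that turns .proto definition files into code for several languages.

This guide uses protoc 3.4.0. Newer versions may need a different compile command (on Windows, for example, they may not expand the *.proto wildcard), so to avoid errors later on, simply use version 3.4.0.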

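Installation

Download and install Anaconda.
Add <install dir>\Anaconda3;<install dir>\Anaconda3\Scripts;<install dir>\Anaconda3\Library\bin; to the Path system environment variable.
Configure a mirror inside China:

  # Add the TUNA mirror of the Anaconda repository
  conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
  # Show channel URLs in search results
  conda config --set show_channel_urls yes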

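Install a Python environment

# List the Python environments that already exist
conda info --envs
# Create an environment with a specific Python version
conda create --name py35 python=3.5
# Switch to that environment (newer conda versions: conda activate py35)
activate py35
# Switch back to the original environment (newer conda versions: conda deactivate)
deactivate
# Delete the environment
conda remove --name py35 --all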

Installing TensorFlow under Python 3

Install TensorFlow:

# For CPU
pip install tensorflow==1.6
# For GPU
pip install tensorflow-gpu==1.6

Install the remaining dependencies with pip:

pip install Cython
pip install pillow
pip install lxml
pip install jupyter
pip install matplotlib
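After installing, a short script can report which of these packages the active environment still cannot import. This is a hypothetical helper built on importlib.util.find_spec; note that it checks import names, which differ from the pip names for pillow (imported as PIL) and tensorflow-gpu (imported as tensorflow).

```python
import importlib.util

# Top-level import names of the packages installed above.
REQUIRED_MODULES = ["tensorflow", "Cython", "PIL", "lxml", "jupyter", "matplotlib"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", missing if missing else "none")
```

Run it with the interpreter of the environment you are checking (for example, after activate py35).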

Preparing the training project (compilation)

Protobuf Compilation

# From tensorflow/models/research/
protoc object_detection/protos/*.proto --python_out=.
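If your protoc build does not expand the *.proto wildcard, one workaround is to invoke protoc once per file. The sketch below uses hypothetical helpers (protoc_commands, compile_protos) that build the same command as above for each .proto file and run it from models/research; it assumes protoc is on PATH.

```python
import glob
import os
import subprocess


def protoc_commands(research_dir, protoc="protoc"):
    """Build one protoc command per .proto file under object_detection/protos.

    Paths stay relative to research_dir, as in the command above, so each
    generated *_pb2.py file lands next to its .proto file.
    """
    pattern = os.path.join(research_dir, "object_detection", "protos", "*.proto")
    commands = []
    for proto in sorted(glob.glob(pattern)):
        rel = os.path.relpath(proto, research_dir).replace(os.sep, "/")
        commands.append([protoc, rel, "--python_out=."])
    return commands


def compile_protos(research_dir, protoc="protoc"):
    """Run every protoc command with research_dir as the working directory."""
    for cmd in protoc_commands(research_dir, protoc):
        subprocess.run(cmd, cwd=research_dir, check=True)
```

For example, compile_protos(r"F:\project\project") compiles each file in turn and stops with an error on the first protoc failure (check=True).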

Add Libraries to PYTHONPATH

1. Create a new .txt file under <Anaconda3 install path>/Anaconda3/Lib/site-packages
(my installation path is C:\ProgramData\Anaconda3\Lib\site-packages). If you use a separate Python environment, create the file in that environment's directory instead (Anaconda3\envs\py35\Lib\site-packages).

2. In the new txt file, write the directory paths of your TensorFlow object_detection project:
F:\project\project
F:\project\project\slim

3. Rename the file to tensorflow_model.pth (the extension must be .pth).
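The same .pth file can also be written from Python. This is a minimal sketch with a hypothetical write_pth helper; at startup Python's site module reads the .pth files in site-packages and appends each listed directory that exists to sys.path. Run it with the python.exe of the environment you want to configure; sysconfig.get_paths()["purelib"] gives that environment's site-packages directory.

```python
import os
import sysconfig


def write_pth(site_packages_dir, paths, name="tensorflow_model.pth"):
    """Write one directory per line into a .pth file and return the file's path."""
    pth_path = os.path.join(site_packages_dir, name)
    with open(pth_path, "w", encoding="utf-8") as f:
        f.write("\n".join(paths) + "\n")
    return pth_path


# site-packages directory of the interpreter running this script
print(sysconfig.get_paths()["purelib"])
```

Usage: write_pth(sysconfig.get_paths()["purelib"], [r"F:\project\project", r"F:\project\project\slim"]). Writing under C:\ProgramData may require an administrator prompt.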

Testing the Installation

#From tensorflow/models/research/
python object_detection/builders/model_builder_test.py

Model Training

Sample annotation

Label the images with the label_images tool, which generates annotation files in Pascal VOC format.
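For reference, this is roughly what a Pascal VOC annotation looks like and how its boxes map to the CSV columns used by the script below. The sample XML is illustrative (made-up file name and coordinates), and parse_voc is a hypothetical helper that looks fields up by tag name rather than by position.

```python
import xml.etree.ElementTree as ET

# Illustrative Pascal VOC annotation with one object.
SAMPLE_VOC = """<annotation>
  <filename>shelf_001.jpg</filename>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>a_hn_101</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>180</xmax><ymax>420</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return one (filename, width, height, class, xmin, ymin, xmax, ymax) row per object."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = [int(float(box.find(tag).text)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        rows.append((root.find("filename").text, width, height, obj.find("name").text, *coords))
    return rows


print(parse_voc(SAMPLE_VOC))
# [('shelf_001.jpg', 1920, 1080, 'a_hn_101', 100, 200, 180, 420)]
```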

Generating the TFRecord files TensorFlow consumes

Working directory layout

|- template
|  |- annotations (annotation files)
|  |- images (sample images)
|  |- label_maps
|  |  |- *.pbtxt (label map file; ids start at 1)
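The label map is a small file in protobuf text format with one item per class. A minimal sketch that generates it from a list of class names is shown below; make_label_map is a hypothetical helper and "another_sku" is a placeholder name. Keeping ids starting at 1 also matters because the script below treats id 0 as an unknown class.

```python
def make_label_map(class_names):
    """Return label map text with ids 1..N in the order of class_names."""
    items = []
    for class_id, name in enumerate(class_names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(items)


print(make_label_map(["a_hn_101", "another_sku"]))
```

Write the result to label_maps/*.pbtxt. The name values must match the <name> tags in the XML annotations exactly, because the script looks classes up by name.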

Script: tfrecord_util.py (Python 3 environment)


import os
import io
import tensorflow as tf

from PIL import Image

from object_detection.utils import dataset_util
from object_detection.utils import label_map_util
from collections import namedtuple
import glob
import pandas as pd
import xml.etree.ElementTree as ET


current_path = 'path/to/parent/of/template'  # placeholder: the directory that contains "template"
train_path = os.path.join(current_path, "template")
# Directory of the image annotation files (Pascal VOC XML)
annotations_dir = os.path.join(train_path, "annotations")
# Directory of the images
images_path = os.path.join(train_path, "images")
# Label map file
labels_path = os.path.join(train_path, "label_maps")
labels_file = os.path.join(labels_path, "mscoco_label_map.pbtxt")
# Intermediate CSV file (full path)
csv_file = os.path.join(train_path, "temp_csv_name.csv")
# Output TFRecord file (full path)
tf_record_file = os.path.join(train_path, "tf_record_file.record")
# ---------------------------------------------------------------------- xml operator

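def xml_to_csv(path):
    """Collect every bounding box from the Pascal VOC XML files in `path` into a DataFrame."""
    xml_list = []
    for xml_file in glob.glob(os.path.join(path, '*.xml')):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        # Keep only the file name, whether <path> uses "/" or "\" separators
        file_path = root.find('path').text
        filename = file_path.split("/")[-1].split("\\")[-1]
        size = root.find('size')
        width = int(size.find('width').text)
        height = int(size.find('height').text)
        for member in root.findall('object'):
            # if member.find('name').text != 'a_hn_101':
            #     continue
            # Look fields up by tag name instead of relying on child order
            bndbox = member.find('bndbox')
            value = (filename,
                     width,
                     height,
                     member.find('name').text,
                     int(float(bndbox.find('xmin').text)),
                     int(float(bndbox.find('ymin').text)),
                     int(float(bndbox.find('xmax').text)),
                     int(float(bndbox.find('ymax').text))
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df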

# ---------------------------------------------------------------------- tfrecord operator


classes_num = 100

label_map = label_map_util.load_labelmap(labels_file)
print("success loading label map file["+str(labels_file)+"]")
# print('\n-------------label_map------------------\n')
# print(label_map)
# categories array [{'id':id,'name':name},···]
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=classes_num, use_display_name=True)
# category_index  dic  {id : {'id':id,'name':name}, ···}
# category_index = label_map_util.create_category_index(categories)

# category_index  dic  {name : {'id':id,'name':name}, ···}
category_index = {}
for cat in categories:
    category_index[cat['name']] = cat
print(category_index)
print("success generating categories dic")


def class_text_to_int(row_label):
    """Map a class name to its label map id; 0 means the name is not in the label map."""
    if row_label in category_index:
        return category_index[row_label]['id']
    else:
        return 0


def split(df, group):
    """Group the CSV rows by image so that each image yields one tf.train.Example."""
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(filename)) for filename in gb.groups]


def create_tf_example(group, path):

    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    # image_format = b'jpg'
    if image.format != 'JPEG':
        print(group.filename)
        raise ValueError('Image format not JPEG')
    else:
        image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        class_id = class_text_to_int(row['class'])
        if class_id == 0:
            # Skip boxes whose class name is not in the label map
            print('unknown class {!r} in {}'.format(row['class'], group.filename))
            continue
        # Normalize pixel coordinates to [0, 1]
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_id)

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example

# ----------------------------------------------------------------------


def generate_tf_record_file(recreate=True):
    """
    Generate a TFRecord file from the annotation XML files of the sample images.
    :param recreate: whether to create a new record file
    :return: path of the TFRecord file, or None when recreate is False
    """
    if recreate:
        # 1. Read every XML file in the annotation directory and convert them to a CSV file
        xml_df = xml_to_csv(annotations_dir)
        xml_df.to_csv(csv_file, index=False)
        print('Successfully converted xml['+str(annotations_dir)+'] to csv['+str(csv_file)+'].')

        print(csv_file)
        # 2. Convert the CSV file to a TFRecord file
        examples = pd.read_csv(csv_file)
        grouped = split(examples, 'filename')

        writer = tf.python_io.TFRecordWriter(tf_record_file)
        for group in grouped:
            try:
                tf_example = create_tf_example(group, images_path)
            except Exception as e:
                # Skip images that cannot be read or are not JPEG, but report why
                print('skipping {}: {}'.format(group.filename, e))
                continue
            writer.write(tf_example.SerializeToString())
        writer.close()
        print('Successfully created the TFRecords: {}'.format(tf_record_file))
        return tf_record_file

    else:
        # TODO - look up the existing file
        return None

def main(_):
    my_tf_record_file = generate_tf_record_file()
    print(my_tf_record_file)

if __name__ == '__main__':
    tf.app.run()
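The configuration below reads separate training and evaluation record files, while the script above produces a single one. One way to get both, sketched here under the assumption that a random split (90/10 by default) is acceptable, is to split the annotation files into two directories and run the script once per directory, changing annotations_dir and tf_record_file each time. split_annotations and copy_split are hypothetical helpers.

```python
import glob
import os
import random
import shutil


def split_annotations(annotations_dir, eval_ratio=0.1, seed=42):
    """Randomly split the XML files into (train, eval) lists of paths."""
    files = sorted(glob.glob(os.path.join(annotations_dir, "*.xml")))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(files)
    n_eval = int(len(files) * eval_ratio)
    return files[n_eval:], files[:n_eval]


def copy_split(files, dest_dir):
    """Copy the given annotation files into dest_dir (created if needed)."""
    os.makedirs(dest_dir, exist_ok=True)
    for path in files:
        shutil.copy2(path, dest_dir)
```

For example, split template/annotations, copy the two lists into template/annotations_train and template/annotations_eval, then point annotations_dir at each directory in turn. The images directory can stay shared, since records are matched to images by file name.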

Training configuration

Config file faster_rcnn_resnet101.config

# Faster R-CNN with Resnet-101 (v1) configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  faster_rcnn {
    num_classes: 23
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 1024
        max_dimension: 1280  
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.6
    first_stage_max_proposals: 400
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.7
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  # fine_tune_checkpoint: "F:/project/project/faster_rcnn_resnet101_coco_2018_01_28/model.ckpt"
  # from_detection_checkpoint: true
  # Note: uncommenting the num_steps line below limits training to that many
  # steps. If it is smaller than the schedule steps above, this effectively
  # bypasses the learning rate schedule (the learning rate will never decay).
  # Leave it commented out to train indefinitely.
  #num_steps: 10000
  data_augmentation_options {
    random_adjust_brightness {
      max_delta: 0.1
    }
  }
  data_augmentation_options {
    random_image_scale {
      min_scale_ratio: 0.8
      max_scale_ratio: 1.2
    }
  }
  #data_augmentation_options {
  #  random_crop_to_aspect_ratio {
  #  }
  #}

  #data_augmentation_options {
  #  random_adjust_contrast {
  #      min_delta: 0.5
  #      max_delta: 1.5
  #  }
  #}
  #data_augmentation_options {
  #  random_adjust_saturation {
  #    min_delta: 0.5
  #    max_delta: 1.5
  #  }
  #}
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "D:/Workspace/train_dir/all/tf_record_file_23_3035_20180724.record"
  }
  label_map_path: "D:/Workspace/train_dir/all/mscoco_label_map_23.pbtxt"
  shuffle: true
}

eval_config: {
  # num_examples: 1
  num_visualizations: 200
  # Note: The below line limits the evaluation process to 2 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 2
  visualization_export_dir: "D:/Workspace/train_dir/all/20180724/eval/exportfrcnn"
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "D:/Workspace/train_dir/all/tf_record_file_23_3035_20180724_eval.record"
  }
  label_map_path: "D:/Workspace/train_dir/all/mscoco_label_map_23.pbtxt"
  shuffle: true
  num_readers: 5
  num_epochs: 1
}

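The configuration file has five main parts:

model: defines the network architecture and its hyperparameters
train_config: training settings
train_input_reader: training input settings
eval_config: evaluation settings
eval_input_reader: evaluation input settings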

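The model section

num_classes: the number of object classes to detect, i.e. the number of entries in the label map (not the number of annotated samples)
keep_aspect_ratio_resizer.min_dimension, keep_aspect_ratio_resizer.max_dimension: control the size each input image is resized to (the aspect ratio is preserved)
feature_extractor.first_stage_features_stride: output stride of the first-stage feature extractor. 16 usually works; if the SKUs in your samples are densely packed and small (for example, shot from a distance) and 16 gives poor results, consider lowering it to 8
grid_anchor_generator.height_stride, grid_anchor_generator.width_stride: the step between anchor positions when generating candidate boxes. 16 usually works; for dense, small, distant SKUs consider lowering it to 8, typically keeping it equal to first_stage_features_stride
first_stage_nms_iou_threshold: IoU threshold for first-stage non-maximum suppression, range 0~1. A proposal that overlaps a higher-scoring proposal by more than this value is discarded, so raising it keeps more overlapping boxes (which can improve recall for densely packed objects) at the risk of lower precision
first_stage_max_proposals: number of proposals kept from the first stage; increasing it can improve recall, possibly at the cost of precision and speed
batch_non_max_suppression.iou_threshold: IoU threshold for second-stage NMS, range 0~1; it behaves like first_stage_nms_iou_threshold
batch_non_max_suppression.max_detections_per_class: maximum number of detections per class
batch_non_max_suppression.max_total_detections: maximum number of detections across all classes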
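The sketch below approximates how some of these parameters interact; it is an illustration, not the Object Detection API's own code. It assumes keep_aspect_ratio_resizer scales the short side to min_dimension unless the long side would then exceed max_dimension, that the first-stage grid has ceil(size / stride) cells per side with one anchor per scale and aspect ratio (4 × 3 = 12 in the config above), and it uses a plain greedy NMS to show the effect of the IoU threshold.

```python
import math


def keep_aspect_ratio_size(height, width, min_dimension=1024, max_dimension=1280):
    """Approximate output size of keep_aspect_ratio_resizer (assumed behaviour)."""
    scale = min_dimension / min(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    if max(new_h, new_w) > max_dimension:
        # Scaling the short side made the long side too big: cap the long side instead.
        scale = max_dimension / max(height, width)
        new_h, new_w = round(height * scale), round(width * scale)
    return new_h, new_w


def num_anchors(height, width, stride=16, num_scales=4, num_aspect_ratios=3):
    """First-stage anchors: one per (scale, aspect ratio) in every grid cell."""
    return math.ceil(height / stride) * math.ceil(width / stride) * num_scales * num_aspect_ratios


def iou(a, b):
    """Intersection over union of two boxes given as (ymin, xmin, ymax, xmax)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes, scores, iou_threshold):
    """Keep boxes in score order, dropping any that overlap a kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


h, w = keep_aspect_ratio_size(1080, 1920)
print(h, w)                          # 720 1280
print(num_anchors(h, w, stride=16))  # 43200
print(num_anchors(h, w, stride=8))   # 172800

# Two neighbouring products whose boxes overlap with an IoU of about 0.43
boxes = [(0, 0, 10, 10), (0, 4, 10, 14)]
scores = [0.9, 0.8]
print(greedy_nms(boxes, scores, iou_threshold=0.3))  # [0] (second box suppressed)
print(greedy_nms(boxes, scores, iou_threshold=0.6))  # [0, 1] (both kept)
```

Halving the stride quadruples the number of anchor positions, which is why stride 8 tends to help with small, densely packed objects at a noticeable cost in memory and speed.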

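The train_config section

initial_learning_rate: the initial learning rate; 0.0003 or 0.0002 both work
data_augmentation_options: data augmentation options
- random_adjust_brightness: randomly adjusts the brightness
- random_image_scale: randomly rescales the image
- random_crop_to_aspect_ratio: randomly crops the image to a given aspect ratio
  These augmentations are the most commonly used.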

The train_input_reader section

tf_record_input_reader.input_path: path of the TFRecord file
label_map_path: path of the label map file
shuffle: whether to shuffle the samples so they are fed to training in random order

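The eval_config section

num_visualizations: number of evaluation images to export; choose it according to the number of evaluation samples. It need not be large, since it is mainly used to visualize the evaluation results
visualization_export_dir: directory where the evaluation images are saved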
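To choose num_visualizations (or eval_config.num_examples) you need to know how many samples the evaluation record holds. The sketch below counts records by walking the TFRecord framing (an 8-byte little-endian length, a 4-byte masked CRC, the payload, then another 4-byte masked CRC) without verifying the CRCs; count_tfrecords is a hypothetical helper. In TensorFlow 1.x, sum(1 for _ in tf.python_io.tf_record_iterator(path)) gives the same count.

```python
import struct


def count_tfrecords(path):
    """Count the records in a TFRecord file (CRCs are skipped, not verified)."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) != 8:
                raise ValueError("truncated length field in %s" % path)
            (length,) = struct.unpack("<Q", header)
            f.read(4)                 # masked CRC32C of the length
            payload = f.read(length)  # serialized tf.train.Example
            crc = f.read(4)           # masked CRC32C of the payload
            if len(payload) != length or len(crc) != 4:
                raise ValueError("truncated record in %s" % path)
            count += 1
    return count
```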

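The eval_input_reader section

tf_record_input_reader.input_path: path of the TFRecord file
label_map_path: path of the label map file
shuffle: whether to shuffle the order of the evaluation samples
num_epochs: how many passes to make over the evaluation samples; usually left unchanged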

Training the model

Training:

# Run the following from the directory that contains the object_detection project
python object_detection/train.py  --logtostderr --pipeline_config_path=F:/Workspaces/hongniu3sku/train/faster_rcnn_resnet101_20180530.config  --train_dir=F:/Workspaces/hongniu3sku/train/train_data/train/20180530

# pipeline_config_path: path of the training config file
# train_dir: directory where the intermediate files (checkpoints) produced by training are saved

Evaluation:

# Run the following from the directory that contains the object_detection project
python object_detection/eval.py --logtostderr  --pipeline_config_path=F:/Workspaces/hongniu3sku/train/faster_rcnn_resnet101_20180530.config  --checkpoint_dir=F:/Workspaces/hongniu3sku/train/train_data/train/20180530  --eval_dir=F:/Workspaces/hongniu3sku/train/train_data/eval/20180530

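# pipeline_config_path: path of the training config file
# checkpoint_dir: the directory where training saved its intermediate files
# eval_dir: directory where the intermediate files produced by evaluation are saved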

Exporting the model:

# Run the following from the directory that contains the object_detection project
python object_detection/export_inference_graph.py --input_type image_tensor --pipeline_config_path=F:/Workspaces/hongniu3sku/train/faster_rcnn_resnet101_20180530.config  --trained_checkpoint_prefix=F:/Workspaces/hongniu3sku/train/train_data/train/20180530/model.ckpt-157978  --output_directory=F:/Workspaces/hongniu3sku/train/train_data/export/20180530

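# pipeline_config_path: path of the training config file
# trained_checkpoint_prefix: the checkpoint to export; the number in model.ckpt-<step> selects which training step's parameters go into the final model
# output_directory: directory the final model is exported to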
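Rather than typing the step number by hand, the newest checkpoint prefix can be looked up from the .index files in train_dir. This is a sketch with a hypothetical helper; it compares steps numerically, so model.ckpt-157978 wins over model.ckpt-99999 even though it sorts earlier as text. TensorFlow's own tf.train.latest_checkpoint(train_dir) instead returns the prefix recorded in the checkpoint file.

```python
import glob
import os
import re


def latest_checkpoint_prefix(train_dir):
    """Return the model.ckpt-<step> prefix with the highest step in train_dir, or None."""
    best_step, best_prefix = -1, None
    for index_file in glob.glob(os.path.join(train_dir, "model.ckpt-*.index")):
        match = re.search(r"model\.ckpt-(\d+)\.index$", index_file)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            best_prefix = index_file[: -len(".index")]  # strip the extension
    return best_prefix
```

The returned string can be passed directly as --trained_checkpoint_prefix.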

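The exported files are:

|- saved_model
|  |- variables
|  |- saved_model.pb   (model file used by TensorFlow Serving)
|- checkpoint (checkpoint state file)
|- frozen_inference_graph.pb  (inference graph with frozen parameters)
|- model.ckpt.*  (model data: parameters, structure, etc.)

After each training run, it is recommended to keep checkpoint, frozen_inference_graph.pb and model.ckpt.*, so the model can be further optimized later.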
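Following that recommendation, a small helper can copy those files into a backup directory (for example one named after the training date). backup_export is hypothetical, and its file patterns simply mirror the list above.

```python
import glob
import os
import shutil

# Files recommended above for keeping after each export.
BACKUP_PATTERNS = ["checkpoint", "frozen_inference_graph.pb", "model.ckpt.*"]


def backup_export(export_dir, backup_dir):
    """Copy the recommended export files into backup_dir; return the copied file names."""
    os.makedirs(backup_dir, exist_ok=True)
    copied = []
    for pattern in BACKUP_PATTERNS:
        for src in sorted(glob.glob(os.path.join(export_dir, pattern))):
            if os.path.isfile(src):
                shutil.copy2(src, backup_dir)  # copy2 also preserves timestamps
                copied.append(os.path.basename(src))
    return copied
```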
