Common Tricks for Faster R-CNN and SSD (Part 1)

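I plan to summarize a series of tricks for Faster R-CNN and SSD in the near future. They fall into three groups: tricks in the framework architecture, tricks in parameter settings, and tricks specific to text detection.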

First, here are some improvements to the original Faster R-CNN and SSD frameworks.
They include:

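  1. Adding context information in various ways to aid detection. Examples include introducing context through a spatial RNN [80], enlarging the Faster R-CNN candidate boxes to capture context [81][82][83], using dilated convolution to capture context [84][85], using global pooling to capture context [86], and adding the global (whole-image) classification result to every candidate box [87]. DSSD [88] uses deconvolution to enhance each layer's features with the features at the same position in higher layers of the SSD framework, which have larger receptive fields, and thereby adds context.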
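  2. Improving the classification loss. Sheng Tang et al. [85] propose adding a sink class to handle certain background categories that are easily misclassified. Tsung-Yi Lin et al. [89] propose focal loss to address the imbalance in the number of examples per class in one-stage frameworks (in practice, far more background than foreground examples).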
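  3. Training strategy and data augmentation. OHEM [90] trains a more discriminative classification branch through online hard example mining. A-Fast-RCNN [91] uses adversarial training in the style of a generative adversarial network to generate hard-to-train occluded or deformed examples online. SSD [68] uses extensive data augmentation, including mirroring, color distortion, scale changes, and aspect-ratio changes, which greatly improves detection performance.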
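  4. Enhancing features. HyperNet [92] fuses multiple feature maps from high layers to low layers and then applies RoI pooling, which yields higher accuracy. FPN [93] builds a feature pyramid through a top-down pathway with upsampling and lateral connections, so that every level carries equally strong features and objects at multiple scales are handled well. Jiannan Li et al. [94] propose GAN-style training that pushes the RoI-pooled features of small objects toward the RoI-pooled features of large objects.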
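  5. Improving proposal generation. J. Hosang et al. [95] show experimentally that the recall of the proposal method is one of the decisive factors for detector performance. CRAFT [98] regresses better object proposals with a two-stage model.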
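  6. Improving box regression. Spyros Gidaris et al. [97] propose box refinement with multi-box voting: the box regression step of Fast R-CNN [64]-style frameworks is first applied repeatedly, and then all boxes vote to determine the final object class and location.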
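
The voting step in item 6 can be illustrated with a minimal sketch. It covers only the final vote, not the repeated regression passes. The function names, the score-weighted averaging, and the thresholds (`nms_thresh=0.3`, `vote_thresh=0.5`) are assumptions made for illustration and are not taken from the text above.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, thresh):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep


def box_voting(boxes, scores, nms_thresh=0.3, vote_thresh=0.5):
    """For each box kept by NMS, average the coordinates of all boxes that
    overlap it strongly, weighting each voter by its (non-negative) score.
    Thresholds are illustrative assumptions."""
    results = []
    for k in nms(boxes, scores, nms_thresh):
        voters = [i for i in range(len(boxes))
                  if iou(boxes[k], boxes[i]) >= vote_thresh]
        weights = [max(scores[i], 0.0) for i in voters]
        total = sum(weights)
        if total == 0:
            voted = tuple(float(c) for c in boxes[k])  # nothing to average
        else:
            voted = tuple(
                sum(w * boxes[i][c] for w, i in zip(weights, voters)) / total
                for c in range(4)
            )
        results.append((voted, scores[k]))
    return results


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.6, 0.8]
    for box, score in box_voting(boxes, scores):
        print(box, score)
```

In this sketch the second box is suppressed by NMS but still contributes to the location of the first box, which is the point of voting: boxes discarded as duplicates still carry localization evidence.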

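For the focal loss in item 2, a minimal per-example sketch of the binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), is given here. The function name and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration, and the code uses plain Python rather than any particular deep learning framework.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss for one example.

    p: predicted foreground probability in [0, 1]
    y: 1 for foreground, 0 for background
    gamma, alpha: illustrative defaults (assumptions)
    """
    p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y, eps=1e-7):
    """Plain binary cross-entropy for one example, for comparison."""
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p if y == 1 else 1.0 - p)


if __name__ == "__main__":
    # An easy background example (confidently rejected) versus a hard one.
    for p in (0.01, 0.9):
        print(p, cross_entropy(p, 0), focal_loss(p, 0))
```

The factor (1 - p_t)^gamma shrinks the loss of well-classified examples, so the many easy background boxes in a one-stage detector contribute far less to the total than the few hard ones. With `gamma=0` and `alpha=1` the expression reduces to ordinary cross-entropy for foreground examples.
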
[44] Guo Huiwen. Research on Fast Multi-Object Detection and Tracking in Surveillance Video[D]. [S.l.]: Hunan University, 2013.
[45] Viola P, Jones M. Robust real-time object detection[J]. International Journal of Computer Vision, 2001,4:34–47.
[46] Dalal N, Triggs B. Histograms of Oriented Gradients for Human Detection[C]//Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. 2005,1:886–893.
[47] Dollár P, Tu Z, Perona P, et al. Integral Channel Features.[C]//BMVC. 2009,2:5.
[48] Dollár P, Appel R, Belongie S, et al. Fast Feature Pyramids for Object Detection[J].
[49] Wang X, Han T X, Yan S. An HOG-LBP Human Detector with Partial Occlusion Handling[C]//Computer Vision, 2009 IEEE 12th International Conference on. 2009:32–39.
[50] Wang X, Han T X, Yan S. An HOG-LBP Human Detector with Partial Occlusion Handling[C]//Computer Vision, 2009 IEEE 12th International Conference on. 2009.
[51] Zhang S, Bauckhage C, Cremers A. Informed Haar-like Features Improve Pedestrian Detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2014:947–954.
[52] Zhang S, Benenson R, Schiele B. Filtered channel features for pedestrian detection[C]//Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on. 2015:1751–1760.
[53] Benenson R, Mathias M, Timofte R, et al. Pedestrian Detection at 100 Frames per Second[C]//Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. 2012:2903–2910.
[54] Felzenszwalb P F, Girshick R B, McAllester D, et al. Object Detection with Discriminatively Trained Part Based Models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010,32(9):1627–1645.
[55] Girshick R B, Felzenszwalb P F, McAllester D. Discriminatively Trained Deformable Part Models, Release 5[E].
[56] Bourdev L, Brandt J. Robust Object Detection via Soft Cascade[C]//Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. 2005,2:236–243.
[57] Zhang C, Viola P A. Multiple-Instance Pruning for Learning Efficient Cascade Detectors[C]//Advances in Neural Information Processing Systems. 2008:1681–1688.
[58] Dollár P, Appel R, Kienzle W. Crosstalk Cascades for Frame-rate Pedestrian Detection[M]//Computer Vision–ECCV 2012.[S.l.]: Springer, 2012:645–659.
[59] Krizhevsky A, Sutskever I, Hinton G E. Imagenet Classification with Deep Convolutional Neural Networks[C]//Advances in neural information processing systems. 2012:1097–1105.
[60] Yang B, Yan J, Lei Z, et al. Convolutional Channel Features[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015:82–90.
[61] Li J, Liang X, Shen S, et al. Scale-aware Fast R-CNN for Pedestrian Detection[J]. CoRR, 2015, abs/1510.08160. http://arxiv.org/abs/1510.08160.
[62] Girshick R, Donahue J, Darrell T, et al. Region-based Convolutional Networks for Accurate Object Detection and Segmentation[J]. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2016, 38(1):142–158.
[63] Girshick R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015:1440–1448.
[64] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks[C]//Advances in Neural Information Processing Systems. 2015:91–99.
[65] Sermanet P, Eigen D, Zhang X, et al. Overfeat: Integrated Recognition, Localization and Detection Using Convolutional Networks[J]. arXiv preprint arXiv:1312.6229, 2013.
[66] Huang L, Yang Y, Deng Y, et al. DenseBox: Unifying Landmark Localization with End to End Object Detection[J]. arXiv preprint arXiv:1509.04874, 2015.
[67] Redmon J, Divvala S, Girshick R, et al. You Only Look Once: Unified, Real-time Object Detection[J]. arXiv preprint arXiv:1506.02640, 2015.
[68] Liu W, Anguelov D, Erhan D, et al. SSD: Single Shot MultiBox Detector[J]. arXiv preprint arXiv:1512.02325, 2015.
[69] He K, Zhang X, Ren S, et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015, 37(9):1904-1916.
[70] Dai J, Li Y, He K, et al. R-FCN: Object Detection via Region-based Fully Convolutional Networks[J]. 2016.
[80] Bell S, Zitnick C L, Bala K, et al. Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks[C]// IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2016:2874-2883.
[81] Zeng X, Ouyang W, Yang B, et al. Gated Bi-directional CNN for Object Detection[C]// European Conference on Computer Vision. Springer, Cham, 2016:354-369.
[82] Gidaris S, Komodakis N. Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model[C]// IEEE International Conference on Computer Vision. IEEE Computer Society, 2015:1134-1142.
[83] Zagoruyko S, Lerer A, Lin T Y, et al. A MultiPath Network for Object Detection[J]. 2016.
[84] Najibi M, Samangouei P, Chellappa R, et al. SSH: Single Stage Headless Face Detector[J]. 2017.
[85] http://image-net.org/challenges/talks/2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf
[86] He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition[C]// Computer Vision and Pattern Recognition. IEEE, 2016:770-778.
[87] Ouyang W, Loy C C, Tang X, et al. DeepID-Net: Deformable deep convolutional neural networks for object detection[C]// Computer Vision and Pattern Recognition. IEEE, 2015:2403-2412.
[88] Fu C Y, Liu W, Ranga A, et al. DSSD: Deconvolutional Single Shot Detector[J]. 2017.
[89] Lin T Y, Goyal P, Girshick R, et al. Focal Loss for Dense Object Detection[J]. 2017.
[90] Shrivastava A, Gupta A, Girshick R. Training Region-Based Object Detectors with Online Hard Example Mining[J]. 2016:761-769.
[91] Wang X, Shrivastava A, Gupta A. A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection[J]. 2017.
[92] Kong T, Yao A, Chen Y, et al. HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection[J]. 2016:845-853.
[93] Lin T Y, Dollár P, Girshick R, et al. Feature Pyramid Networks for Object Detection[J]. 2016.
[94] Li J, Liang X, Wei Y, et al. Perceptual Generative Adversarial Networks for Small Object Detection[J]. 2017.
[95] Hosang J, Benenson R, Dollár P, et al. What Makes for Effective Detection Proposals?[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2016, 38(4):814.
[96] Qiaoyong Zhong, Chao Li, Yingying Zhang, et al. "Towards Good Practices for Recognition & Detection" (Slides), the Second ImageNet and COCO Visual Recognition Challenges Joint Workshop in conjunction with ECCV 2016, http://image-net.org/challenges/talks/2016/Hikvision_at_ImageNet_2016.pdf
[97] Gidaris S, Komodakis N. Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model[C]// IEEE International Conference on Computer Vision. IEEE Computer Society, 2015:1134-1142.
[98] Zeng X, Ouyang W, Yan J, et al. Crafting GBD-Net for Object Detection[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, PP(99):1-1.
