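I plan to summarize a series of tricks for Faster R-CNN and SSD. They fall into three groups: tricks in the framework structure, tricks in parameter settings, and tricks specific to text detection.
First, here are improvements to the original Faster R-CNN and SSD frameworks.
These include: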
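- Adding context information in various ways to aid detection. Examples: introducing context through a spatial RNN [80]; enlarging the Faster R-CNN candidate boxes to capture context [81][82][83]; using dilated convolution to capture context [84][85]; using global pooling to capture context [86]; and attaching the global (whole-image) classification result to every candidate box [87]. DSSD [88] uses deconvolution so that, within the SSD framework, higher-layer features at the same location, which have a larger receptive field, enhance the features of the current layer and thereby add context.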
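- Improving the classification loss. Sheng Tang et al. [85] propose adding a "sink" class to handle certain background categories that are easily misclassified. Tsung-Yi Lin et al. [89] propose focal loss to address the class imbalance between foreground and background examples in single-stage frameworks.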
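- Training schemes and data augmentation. OHEM [90] trains a more discriminative classification branch through online hard example mining. A-Fast-RCNN [91] uses GAN-style training to generate, online, hard training examples with occlusion or deformation. SSD [68] uses rich data augmentation, including mirroring, color distortion, scale changes, and aspect-ratio changes, which greatly improves detection performance.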
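- Enhancing features. HyperNet [92] fuses features from multiple layers, from high to low, and then applies RoI pooling, obtaining higher accuracy. FPN [93] uses a top-down pathway with upsampling and lateral connections to build a feature pyramid in which every level carries comparably strong features, so it handles objects at multiple scales well. Jiannan Li et al. [94] propose GAN-style training that pushes the RoI-pooled features of small objects toward the RoI-pooled features of large objects.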
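- Improving proposal generation. J. Hosang et al. [95] show experimentally that the recall of the proposal method is one of the decisive factors in detector performance. CRAFT [98] uses a two-stage model to regress better object proposals.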
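- Improving regression. Spyros Gidaris et al. [97] propose window refinement and multi-window voting: the box regression step of the Fast R-CNN [63] family of frameworks is first applied iteratively, and then all windows vote to decide the final object class and location.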
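To make the focal loss [89] from the list above more concrete, here is a minimal sketch for a single binary prediction. It follows the form FL(p_t) = -α_t (1 - p_t)^γ log(p_t). The function names `focal_loss` and `cross_entropy` are made up for this example, and γ = 2, α = 0.25 are the defaults reported in [89], not values taken from this post.

```python
import math


def cross_entropy(p, y):
    """Plain binary cross-entropy for one prediction.

    p: predicted probability of the foreground class, in (0, 1)
    y: ground-truth label, 1 for foreground, 0 for background
    """
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction (illustrative sketch).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    examples; alpha_t re-balances foreground against background.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background box (p = 0.01) versus a hard one (p = 0.9).
easy_ratio = focal_loss(0.01, 0) / cross_entropy(0.01, 0)
hard_ratio = focal_loss(0.9, 0) / cross_entropy(0.9, 0)
print(easy_ratio, hard_ratio)
```

Relative to plain cross-entropy, the easy background box is down-weighted far more strongly than the hard one, which is how the loss keeps the large number of easy negatives from dominating training in a single-stage detector.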
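The multi-window voting step described for [97] in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, the helper names `iou` and `vote_box` are hypothetical, and the IoU threshold of 0.5 and the score-weighted average are choices made for this example rather than details given in this post.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def vote_box(kept, candidates, scores, iou_thresh=0.5):
    """Refine one kept box by a score-weighted average of its neighbours.

    kept:       a box that survived NMS
    candidates: all scored boxes of the same class (including `kept`)
    scores:     one confidence score per candidate
    """
    total = 0.0
    acc = [0.0, 0.0, 0.0, 0.0]
    for box, score in zip(candidates, scores):
        weight = max(0.0, score)  # ignore non-positive scores
        if weight > 0 and iou(kept, box) >= iou_thresh:
            total += weight
            for i in range(4):
                acc[i] += weight * box[i]
    if total == 0:
        return list(kept)  # nothing voted, keep the box unchanged
    return [v / total for v in acc]


boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (50, 50, 60, 60)]
scores = [0.5, 0.5, 0.9]
print(vote_box(boxes[0], boxes, scores))
```

In this toy input the far-away third box has no overlap with the kept box, so it does not vote, while the two overlapping boxes are averaged with equal weight.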
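The core selection step behind online hard example mining (OHEM [90], mentioned in the list above) can be illustrated with a small sketch: compute a loss for every RoI in the forward pass, then backpropagate only through the RoIs with the largest loss. The function name `select_hard_rois` and its parameters are hypothetical, and the sketch deliberately leaves out any de-duplication of heavily overlapping RoIs.

```python
def select_hard_rois(losses, num_keep):
    """Return the indices of the `num_keep` RoIs with the highest loss.

    losses:   per-RoI loss values from a forward pass over all RoIs
    num_keep: how many hard examples to keep for the backward pass
    """
    # Rank RoI indices from highest loss to lowest.
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    # Keep the hardest ones; return them in their original order.
    return sorted(order[:num_keep])


roi_losses = [0.1, 2.0, 0.5, 1.5]
print(select_hard_rois(roi_losses, 2))
```

Only the selected indices would then contribute to the gradient, so training effort concentrates on the examples the current model gets most wrong.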
[44] Guo Huiwen. Research on Fast Multi-Object Detection and Tracking in Surveillance Video (in Chinese)[D].[S.l.]: Hunan University, 2013.
[45] Viola P, Jones M. Robust real-time object detection[J]. International Journal of Computer Vision, 2001,4:34–47.
[46] Dalal N, Triggs B. Histograms of Oriented Gradients for Human Detection[C]//Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. 2005,1:886–893.
[47] Dollár P, Tu Z, Perona P, et al. Integral Channel Features.[C]//BMVC. 2009,2:5.
[48] Dollár P, Appel R, Belongie S, et al. Fast Feature Pyramids for Object Detection[J].
[49] Wang X, Han T X, Yan S. An HOG-LBP Human Detector with Partial Occlusion Handling[C]//Computer Vision, 2009 IEEE 12th International Conference on. 2009:32–39.
[50] Wang X, Han T X, Yan S. An HOG-LBP Human Detector with Partial Occlusion Handling[C]//Computer Vision, 2009 IEEE 12th International Conference on. 2009.
[51] Zhang S, Bauckhage C, Cremers A. Informed Haar-like Features Improve Pedestrian Detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2014:947–954.
[52] Zhang S, Benenson R, Schiele B. Filtered channel features for pedestrian detection[C]//Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on. 2015:1751–1760.
[53] Benenson R, Mathias M, Timofte R, et al. Pedestrian Detection at 100 Frames per Second[C]//Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. 2012:2903–2910.
[54] Felzenszwalb P F, Girshick R B, McAllester D, et al. Object Detection with Discriminatively Trained Part Based Models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010,32(9):1627–1645.
[55] Girshick R B, Felzenszwalb P F, McAllester D. Discriminatively Trained Deformable Part Models, Release 5[E].
[56] Bourdev L, Brandt J. Robust Object Detection via Soft Cascade[C]//Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. 2005,2:236–243.
[57] Zhang C, Viola P A. Multiple-Instance Pruning for Learning Efficient Cascade Detectors[C]//Advances in Neural Information Processing Systems. 2008:1681–1688.
[58] Dollár P, Appel R, Kienzle W. Crosstalk Cascades for Frame-rate Pedestrian Detection[M]//Computer Vision–ECCV 2012.[S.l.]: Springer, 2012:645–659.
[59] Krizhevsky A, Sutskever I, Hinton G E. Imagenet Classification with Deep Convolutional Neural Networks[C]//Advances in neural information processing systems. 2012:1097–1105.
[60] Yang B, Yan J, Lei Z, et al. Convolutional Channel Features[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015:82–90.
[61] Li J, Liang X, Shen S, et al. Scale-aware Fast R-CNN for Pedestrian Detection[J]. CoRR, 2015, abs/1510.08160. http://arxiv.org/abs/1510.08160.
[62] Girshick R, Donahue J, Darrell T, et al. Region-based Convolutional Networks for Accurate Object Detection and Segmentation[J]. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2016, 38(1):142–158.
[63] Girshick R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015:1440–1448.
[64] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks[C]//Advances in Neural Information Processing Systems. 2015:91–99.
[65] Sermanet P, Eigen D, Zhang X, et al. Overfeat: Integrated Recognition, Localization and Detection Using Convolutional Networks[J]. arXiv preprint arXiv:1312.6229, 2013.
[66] Huang L, Yang Y, Deng Y, et al. DenseBox: Unifying Landmark Localization with End to End Object Detection[J]. arXiv preprint arXiv:1509.04874, 2015.
[67] Redmon J, Divvala S, Girshick R, et al. You Only Look Once: Unified, Real-time Object Detection[J]. arXiv preprint arXiv:1506.02640, 2015.
[68] Liu W, Anguelov D, Erhan D, et al. SSD: Single Shot MultiBox Detector[J]. arXiv preprint arXiv:1512.02325, 2015.
[69] He K, Zhang X, Ren S, et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015, 37(9):1904-1916.
[70] Dai J, Li Y, He K, et al. R-FCN: Object Detection via Region-based Fully Convolutional Networks[J]. 2016.
[80] Bell S, Zitnick C L, Bala K, et al. Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks[C]// IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2016:2874-2883.
[81] Zeng X, Ouyang W, Yang B, et al. Gated Bi-directional CNN for Object Detection[C]// European Conference on Computer Vision. Springer, Cham, 2016:354-369.
[82] Gidaris S, Komodakis N. Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model[C]// IEEE International Conference on Computer Vision. IEEE Computer Society, 2015:1134-1142.
[83] Zagoruyko S, Lerer A, Lin T Y, et al. A MultiPath Network for Object Detection[J]. 2016.
[84] Najibi M, Samangouei P, Chellappa R, et al. SSH: Single Stage Headless Face Detector[J]. 2017.
[85] http://image-net.org/challenges/talks/2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf
[86] He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition[C]// Computer Vision and Pattern Recognition. IEEE, 2016:770-778.
[87] Ouyang W, Loy C C, Tang X, et al. DeepID-Net: Deformable deep convolutional neural networks for object detection[C]// Computer Vision and Pattern Recognition. IEEE, 2015:2403-2412.
[88] Fu C Y, Liu W, Ranga A, et al. DSSD: Deconvolutional Single Shot Detector[J]. 2017.
[89] Lin T Y, Goyal P, Girshick R, et al. Focal Loss for Dense Object Detection[J]. 2017.
[90] Shrivastava A, Gupta A, Girshick R. Training Region-Based Object Detectors with Online Hard Example Mining[J]. 2016:761-769.
[91] Wang X, Shrivastava A, Gupta A. A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection[J]. 2017.
[92] Kong T, Yao A, Chen Y, et al. HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection[J]. 2016:845-853.
[93] Lin T Y, Dollár P, Girshick R, et al. Feature Pyramid Networks for Object Detection[J]. 2016.
[94] Li J, Liang X, Wei Y, et al. Perceptual Generative Adversarial Networks for Small Object Detection[J]. 2017.
[95] Hosang J, Benenson R, Dollár P, et al. What Makes for Effective Detection Proposals?[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2016, 38(4):814.
[96] Zhong Q, Li C, Zhang Y, et al. Towards Good Practices for Recognition & Detection (slides). The Second ImageNet and COCO Visual Recognition Challenges Joint Workshop, in conjunction with ECCV 2016. http://image-net.org/challenges/talks/2016/Hikvision_at_ImageNet_2016.pdf
[97] Gidaris S, Komodakis N. Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model[C]// IEEE International Conference on Computer Vision. IEEE Computer Society, 2015:1134-1142.
[98] Zeng X, Ouyang W, Yan J, et al. Crafting GBD-Net for Object Detection[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, PP(99):1-1.