introduction
The paper: "we formulate a method for joint learning of local and global feature selection losses designed to optimise person re-id when using only generic matching metrics such as the L2 distance." That is, local and global features are learned jointly. The authors argue that "learning any matching distance metric is intrinsically learning a global feature transformation across domains", so a simple metric such as the L2 distance is enough for matching, and the real focus should be on feature extraction and representation.
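As a concrete illustration of what "generic matching" means at test time, here is my own minimal sketch (not code from the paper; `probe_feat` and `gallery_feats` are assumed to be descriptors produced by the learned network):

```python
import torch

def rank_gallery(probe_feat: torch.Tensor, gallery_feats: torch.Tensor) -> torch.Tensor:
    """Rank gallery images by plain L2 distance to the probe.

    probe_feat:    (D,)   descriptor of the query person image
    gallery_feats: (N, D) descriptors of the gallery images
    Returns gallery indices sorted from best match to worst;
    no metric is learned -- matching is just Euclidean distance.
    """
    dists = torch.cdist(probe_feat.unsqueeze(0), gallery_feats).squeeze(0)
    return torch.argsort(dists)
```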
Traditional hand-crafted features are mostly local, e.g. the image is cut into horizontal stripes and each stripe is processed separately. Deep learning (DL) methods, by contrast, mostly extract global features of the whole image. The authors argue that neither kind of feature alone is optimal and that the two should be combined, since the human visual system processes both kinds of information at once (global (contextual) and local (saliency) information). Thinking about it, this does make some sense.
The network design follows directly from this view: there are two branches that extract local and global features respectively, but the two branches are not independent; they influence each other and are learned jointly. The benefit of such a network is that it not only extracts local and global features at the same time, but also learns the relationship between the two, so the two complement each other and address typical re-ID problems such as local misalignment.
In addition, the authors "introduce a structured sparsity based feature selection learning mechanism for improving multi-loss joint feature learning robustness w.r.t. noise and data covariance between local and global representations." Roughly speaking, this is a structured-sparsity-based regularisation used to counter the influence of noise.
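As I understand it, this is a group-sparsity (L2,1-style) penalty on the feature-selection weights. A minimal sketch of such a regulariser (my own reconstruction; the exact grouping and the layer it is applied to in JLML may differ):

```python
import torch

def l21_regularizer(weight: torch.Tensor) -> torch.Tensor:
    """Structured-sparsity (L2,1 / group-lasso style) penalty.

    Treats each row of `weight` as one group: L2 norm within a
    group, L1 (sum) across groups, which pushes whole groups of
    weights -- i.e. whole feature dimensions -- towards zero.
    """
    return weight.norm(p=2, dim=1).sum()

# usage (coefficient is a placeholder, not from the paper):
# loss = cls_loss + 5e-4 * l21_regularizer(fc.weight)
```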
related work
1. Saliency learning based models. These methods ignore global features and mainly model localised part importance. "However, these existing methods consider only the patch appearance statistics within individual locations but no global feature representation learning, let alone the correlation and complementary information discovery between local and global features as modelled by the JLML."
2. The Spatially Constrained Similarity (SCS) model and the Multi-Channel Parts (MCP) network. These two do consider global features as well. SCS focuses on supervised metric learning, but does not consider the relationship between hand-crafted local and global features. MCP is optimised mainly with a triplet ranking loss (which I don't fully understand), whereas JLML uses multiple classification losses; the former has a drawback: "Critically, this one-loss model learning is likely to impose negative influence on the discriminative feature learning behaviour for both branches due to potential over-low per-branch independence and over-high inter-branch correlation. This may lead to sub-optimal joint learning of local and global feature selections in model optimisation, as suggested by our evaluation in Section 4.3."
3. The HER model. It mainly uses a regression loss, whereas JLML mainly uses a classification loss.
4. DGD. I have read this paper carefully; it also uses a classification loss. The difference from JLML is that DGD uses one-loss classification while JLML uses multi-loss classification.
model design
(Note that the ReLU rectification non-linearity [Krizhevsky et al., 2012] after each conv layer is omitted for brevity.)
The two branches extract local and global features respectively. The joint learning shows up in two ways (see the sketch after this list):
1. Shared low-level features. This has two benefits: first, the features themselves are shared; second, it reduces the number of parameters and guards against overfitting, which matters especially for re-ID because re-ID datasets are fairly small.
2. At the end, the two 512-d feature vectors (local and global) are concatenated.
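A minimal PyTorch sketch of this layout (my own toy version: layer sizes, the stripe count, and the branch definitions are placeholders; the real JLML branches are much deeper ResNet-style stacks):

```python
import torch
import torch.nn as nn

class TwoBranchNet(nn.Module):
    """Sketch of the shared-stem, two-branch design.

    `stem` holds the shared low-level conv layers; the global
    branch pools over the whole feature map while the local
    branch pools each horizontal stripe separately.
    """
    def __init__(self, num_ids: int, num_stripes: int = 4):
        super().__init__()
        self.stem = nn.Sequential(            # shared low-level conv layers
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.global_branch = nn.Sequential(   # whole-image context
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 512),
        )
        self.num_stripes = num_stripes
        self.local_conv = nn.Sequential(      # shared conv for all stripes
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        self.local_fc = nn.Linear(128 * num_stripes, 512)
        # one classifier per branch -> two classification losses
        self.global_cls = nn.Linear(512, num_ids)
        self.local_cls = nn.Linear(512, num_ids)

    def forward(self, x):
        shared = self.stem(x)
        g = self.global_branch(shared)                   # 512-d global feature
        l = self.local_conv(shared)
        # split the feature map into horizontal stripes, pool each one
        stripes = torch.chunk(l, self.num_stripes, dim=2)
        stripes = [s.mean(dim=(2, 3)) for s in stripes]  # one vector per stripe
        l = self.local_fc(torch.cat(stripes, dim=1))     # 512-d local feature
        feat = torch.cat([g, l], dim=1)                  # 1024-d, used for L2 matching
        return self.global_cls(g), self.local_cls(l), feat
```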
loss function
Their choice of loss function differs from most existing deep re-ID methods: they mainly use the cross-entropy classification loss function. Existing deep re-ID methods mostly use a contrastive loss, "designed to exploit pairwise re-id labels defined by both positive and negative pairs, such as the pairwise verification" loss. A representative example is "An improved deep learning architecture for person re-identification", CVPR 2015.
Their reasons for this choice are as follows (quoted in full, as the argument is convincing): "The motivations for our JLML classification loss based learning are: (i) Significantly simplified training data batch construction, e.g. random sampling with no notorious tricks required, as shown by other deep classification methods [Krizhevsky et al., 2012]. This makes our JLML model more scalable in real-world applications with very large training population sizes when available. This also eliminates the undesirable need for carefully forming pairs and/or triplets in preparing re-id training splits, as in most existing methods, due to the inherent imbalanced negative and positive pair size distributions. (ii) Visual psychophysical findings suggest that representations optimised for classification tasks generalise well to novel categories [Edelman, 1998]. We consider that re-id tasks are about model generalisation to unseen test identity classes given training data on independent seen identity classes. Our JLML model learning exploits this general classification learning principle beyond the strict pair-wise relative verification loss in existing re-id models." The gist: do not train on positive/negative pairs; train directly on identity labels. The DGD paper follows the same idea.
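Training then reduces to ordinary classification, with one cross-entropy loss per branch over the identity labels, summed (a sketch assuming the `TwoBranchNet` above; the structured-sparsity term from earlier would be added to this loss as well):

```python
import torch.nn.functional as F

def jlml_step(model, images, identity_labels):
    """One training step: one classification loss per branch."""
    global_logits, local_logits, _ = model(images)
    loss_g = F.cross_entropy(global_logits, identity_labels)  # global branch loss
    loss_l = F.cross_entropy(local_logits, identity_labels)   # local branch loss
    return loss_g + loss_l                                    # multi-loss classification
```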
others
Finally, there are training details plus ablations comparing the model with and without each component, showing that each one helps. The clearest gain comes from joining the global and local features.
Another finding: the two branches learn better when each is trained separately with its own loss than when they are trained together under a single loss.
The other components, such as the shared low-level features, the choice of metric learning, and the selective feature learning (that regularisation I didn't fully understand), contribute only marginally.