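Background: Semantic segmentation is a classic problem in computer vision and a popular topic in segmentation research. The goal is to assign every pixel in an image its corresponding class label, as opposed to Object Detection and Localization. The main difference between semantic segmentation and Instance Segmentation is that semantic segmentation only cares about which class each pixel belongs to, not which object instance it belongs to. For example, in the figure above, semantic segmentation labels all four cows on the grass as the class "Cow" and does not distinguish the individual animals.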
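A toy illustration of the difference follows. It is pure Python, and the 4x6 grid, instance ids and class names are made up for illustration. An instance map gives each cow its own id. Collapsing that map to a semantic map keeps only the class, so the two cows become indistinguishable.

```python
# Hypothetical toy image: two cows on grass.
# Instance ids: 0 = grass (background), 1 and 2 = two different cows.
INSTANCE_MAP = [
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
INSTANCE_TO_CLASS = {0: "grass", 1: "cow", 2: "cow"}


def semantic_from_instance(instance_map, instance_to_class):
    """Replace every instance id by its class label (instance identity is lost)."""
    return [[instance_to_class[k] for k in row] for row in instance_map]


semantic = semantic_from_instance(INSTANCE_MAP, INSTANCE_TO_CLASS)
# Instance segmentation sees two distinct cows ...
cow_instances = {k for row in INSTANCE_MAP for k in row if INSTANCE_TO_CLASS[k] == "cow"}
# ... semantic segmentation only sees the single class "cow".
cow_labels = {lab for row in semantic for lab in row if lab == "cow"}

print(semantic[0])         # ['grass', 'cow', 'cow', 'grass', 'cow', 'cow']
print(len(cow_instances))  # 2
print(len(cow_labels))     # 1
```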
CRFs: structured models.
Key idea: the ability to explicitly model the dependencies between output variables (via CRFs) while exploiting the power of CNNs.
Main contribution: a joint, maximum-likelihood-based learning procedure for all model parameters. Previous approaches used either training-in-pieces or joint learning of restricted model families, such as Gaussian CRFs or CRFs with only a few variables.
Introduction:
CNNs are used for computer vision tasks such as segmentation, stereo, and more complex problems like scene understanding. Deep models have some drawbacks. The main one is that they are highly data-oriented: for example, it is hard to incorporate prior knowledge into deep learning frameworks.
Graphical models such as CRFs offer considerably more possibilities. They can capture, for example, geometric properties, spatial relations between objects, and global properties such as connectivity and shape, among many others.
The most important advantage of combining CNNs and CRFs is the ability to explicitly model the dependencies between output variables (via CRFs) while exploiting the power of CNNs.
Three previous related works (which try to jointly learn CNNs with CRFs on top):
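(Lin et al., 2015): The key assumption is that the potentials can be approximated by logarithms of the corresponding marginals. This assumption is very restrictive, and for some models it is simply wrong. It is exact when the underlying graph is a chain. In that case, the pairwise potentials can be set to the logarithms of the corresponding marginal pair probabilities, and the unary potentials of the interior nodes can be set to the negative logarithms of the corresponding marginal label probabilities. For arbitrary graphs, the relation between the potentials and the marginals is not known, and computing the marginals is NP-hard. In contrast, the proposed method makes no model approximation, only an algorithmic one. In other words: "we do not want to compute wrong quantities exactly, our aim is to compute right quantities approximately."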
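The sketch below shows why the assumption is exact on a chain. It is pure Python and uses a toy 3-node binary chain with random positive potentials; all names are hypothetical. It brute-forces the joint distribution, computes its marginals, and checks that log p(y) = log p(y1, y2) + log p(y2, y3) - log p(y2) for every labelling.

```python
import itertools
import math
import random

LABELS = (0, 1)


def random_chain_distribution(seed=0):
    """Joint p(y1, y2, y3) of a 3-node chain MRF with random positive potentials."""
    rng = random.Random(seed)
    phi12 = {(a, b): rng.uniform(0.1, 2.0) for a in LABELS for b in LABELS}
    phi23 = {(b, c): rng.uniform(0.1, 2.0) for b in LABELS for c in LABELS}
    unnorm = {y: phi12[y[0], y[1]] * phi23[y[1], y[2]]
              for y in itertools.product(LABELS, repeat=3)}
    z = sum(unnorm.values())
    return {y: v / z for y, v in unnorm.items()}


def marginal(p, idx):
    """Marginal distribution over the variables at positions idx (a tuple)."""
    out = {}
    for y, v in p.items():
        key = tuple(y[i] for i in idx)
        out[key] = out.get(key, 0.0) + v
    return out


p = random_chain_distribution()
p12, p23, p2 = marginal(p, (0, 1)), marginal(p, (1, 2)), marginal(p, (1,))


def log_p_from_marginals(y):
    # Pairwise terms: log of the pair marginals.
    # Unary term: minus log of the marginal of the interior node y2
    # (degree 2, so it is counted degree - 1 = 1 times; end nodes 0 times).
    return (math.log(p12[y[0], y[1]]) + math.log(p23[y[1], y[2]])
            - math.log(p2[(y[1],)]))


max_err = max(abs(math.log(p[y]) - log_p_from_marginals(y)) for y in p)
print(max_err < 1e-12)  # True
```

On a graph with a cycle no such closed-form rewrite of the distribution in terms of its marginals is available in general, which is the point made above.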
(Zheng et al., 2015): Mean Field approximation is employed for inference and implemented as an RNN, which allows joint end-to-end training. This approach still relies on a model assumption, because Mean Field is indeed an approximation of the maximum marginal decision. Another problem is that it works only with a particular class of pairwise potentials. The proposed method, by contrast, can also learn repulsive pairwise potentials.
(Chen et al., 2015): The parameters are learned by maximum likelihood. It is proposed to substitute the true marginals by local beliefs obtained by Loopy Belief Propagation. (I could not follow this part yet; I will fill it in if I revisit it.)
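For concreteness, here is a minimal sketch of naive (sequential) mean-field inference for a small pairwise CRF, under the following assumptions. It uses the convention p(y) ∝ exp(-E(y)) and a single pairwise energy table shared by all edges, and all function names are hypothetical. It does not reproduce the Gaussian-kernel, filtering-based RNN of Zheng et al. The usage example plugs in a repulsive table (equal neighbouring labels are penalised).

```python
import math


def softmin_normalize(energies):
    """Turn per-label energies into probabilities q(l) ∝ exp(-energy(l))."""
    m = min(energies)  # subtract the minimum for numerical stability
    w = [math.exp(-(e - m)) for e in energies]
    s = sum(w)
    return [v / s for v in w]


def mean_field(unary, edges, pairwise, n_iters=50):
    """Naive mean field: approximate p(y) by a product of per-node q_i(l).

    unary:    unary[i][l] = energy of node i taking label l
    edges:    list of (i, j) node pairs
    pairwise: pairwise[a][b] = energy for (y_i = a, y_j = b) on every edge (i, j)
    """
    n, k = len(unary), len(unary[0])
    q = [softmin_normalize(u) for u in unary]  # initialise from the unaries
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append((j, False))  # node i is the first endpoint of (i, j)
        nbrs[j].append((i, True))   # node j is the second endpoint: transpose table
    for _ in range(n_iters):
        for i in range(n):
            energies = []
            for a in range(k):
                # Unary energy plus expected pairwise energy under the neighbours' q.
                e = unary[i][a]
                for j, transposed in nbrs[i]:
                    for b in range(k):
                        p_ab = pairwise[b][a] if transposed else pairwise[a][b]
                        e += q[j][b] * p_ab
                energies.append(e)
            q[i] = softmin_normalize(energies)
    return q


unary = [[0.0, 3.0], [0.0, 0.0], [0.0, 0.0]]  # node 0 strongly prefers label 0
edges = [(0, 1), (1, 2)]                       # a 3-node chain
repulsive = [[2.0, 0.0], [0.0, 2.0]]           # equal labels cost 2: neighbours disagree
q = mean_field(unary, edges, repulsive)
print([max(range(2), key=lambda l: qi[l]) for qi in q])  # [0, 1, 0]
```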
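Marginals matter here because of how maximum-likelihood learning works. With p(y | x) ∝ exp(-E(y)), the negative log-likelihood of a ground-truth labelling y* is E(y*) + log Z. Its gradient with respect to a unary energy of node i and label l is therefore [y*_i = l] - P(y_i = l). Every gradient step thus needs the model marginals. On loopy graphs these are intractable in general, and that is the quantity Chen et al. replace by loopy-BP beliefs. The toy sketch below (hypothetical names) computes the exact marginals by brute-force enumeration over all |L|^|R| labellings of a 3-node loop. It then checks the gradient formula against a finite difference.

```python
import itertools
import math


def energy(y, unary, edges, pairwise):
    """E(y) = sum of unary energies + sum of pairwise energies over edges."""
    e = sum(unary[i][y[i]] for i in range(len(y)))
    e += sum(pairwise[y[i]][y[j]] for i, j in edges)
    return e


def exact_marginals(unary, edges, pairwise):
    """Brute-force node marginals P(y_i = l) and partition function Z. Tiny graphs only."""
    n, k = len(unary), len(unary[0])
    labellings = list(itertools.product(range(k), repeat=n))
    weights = [math.exp(-energy(y, unary, edges, pairwise)) for y in labellings]
    z = sum(weights)
    marg = [[0.0] * k for _ in range(n)]
    for y, w in zip(labellings, weights):
        for i, l in enumerate(y):
            marg[i][l] += w / z
    return marg, z


def nll(y_true, unary, edges, pairwise):
    """Negative log-likelihood -log p(y_true) = E(y_true) + log Z."""
    _, z = exact_marginals(unary, edges, pairwise)
    return energy(y_true, unary, edges, pairwise) + math.log(z)


def nll_grad_unary(y_true, unary, edges, pairwise):
    """d NLL / d unary[i][l] = [y_true_i == l] - P(y_i = l)."""
    marg, _ = exact_marginals(unary, edges, pairwise)
    return [[(1.0 if y_true[i] == l else 0.0) - marg[i][l] for l in range(len(marg[i]))]
            for i in range(len(marg))]


unary = [[0.2, 1.0], [0.5, -0.3], [1.2, 0.1]]
edges = [(0, 1), (1, 2), (0, 2)]  # a loop: exact marginals need enumeration
pairwise = [[0.0, 0.7], [0.7, 0.0]]
y_true = (0, 1, 1)
g = nll_grad_unary(y_true, unary, edges, pairwise)

# Finite-difference check of one gradient entry.
eps = 1e-6
bumped = [row[:] for row in unary]
bumped[1][0] += eps
fd = (nll(y_true, bumped, edges, pairwise) - nll(y_true, unary, edges, pairwise)) / eps
print(abs(fd - g[1][0]) < 1e-4)  # True
```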
Model:
A CRF with unary potentials that depend on the image through a Convolutional Neural Network (CNN).
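G = (R, E) is a graph, where R is the node set and E is the edge set. Nodes are associated with image pixels. Each pixel is labelled by a label l from a predefined finite discrete label set L; for example, body parts are labelled "head", "left hand", "torso", etc. The task is to assign a label to every pixel, i.e. to find a mapping y: R → L. Let y_i ∈ L denote the label of node i ∈ R. More generally, y_A denotes the restriction of the labelling y to a subset of nodes A ⊆ R. To model the posterior probability distribution of labellings y given an image x, a pairwise CRF is used. Its energy, equation (1), is the sum of the unary and pairwise terms described below.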
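A generic pairwise CRF energy matching this description can be written as follows. The symbols θ_i and θ_c, the CNN weights w, the edge classes E_c (with E = ⋃_c E_c), and the convention p(y | x) ∝ exp(-E) are assumptions made for illustration. They are not a transcription of the paper's equation (1).

```latex
E(y; x, w) = \sum_{i \in R} \theta_i(y_i; x, w)
           + \sum_{c} \sum_{ij \in E_c} \theta_c(y_i, y_j),
\qquad
p(y \mid x) = \frac{\exp\bigl(-E(y; x, w)\bigr)}{\sum_{y'} \exp\bigl(-E(y'; x, w)\bigr)}
```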
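Unary terms: one per node i ∈ R; they depend on the image x through the CNN.
Pairwise terms: one per edge in E; the edges are grouped into classes, referred to by the index c in equation (1).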
For example, one class contains all edges connecting pixels that are horizontal neighbours in the image grid. Another is the set of edges in which one node is 2 pixels to the left of and 3 pixels above the other. Further classes are defined in the same way.
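The sketch below shows how such offset-defined edge classes could be enumerated on a pixel grid and used to evaluate a labelling's energy. It is pure Python, and the function names are hypothetical. It assumes that all edges in one class share a single |L|x|L| pairwise table and that p(y | x) ∝ exp(-E). The unary energies, which in the model would come from the CNN, are simply given as numbers here. The two offsets used are the examples from the text: horizontal neighbours, and "2 pixels left and 3 pixels above".

```python
def edges_for_offset(height, width, dr, dc):
    """All pixel pairs ((r, c), (r + dr, c + dc)) that fall inside the grid."""
    return [((r, c), (r + dr, c + dc))
            for r in range(height) for c in range(width)
            if 0 <= r + dr < height and 0 <= c + dc < width]


def grid_energy(labels, unary, pairwise_by_offset):
    """E(y) = sum_i unary_i(y_i) + sum_c sum_{ij in E_c} theta_c(y_i, y_j).

    labels:             labels[r][c] in {0, ..., K-1}
    unary:              unary[r][c][l], e.g. energies produced by a CNN (assumed)
    pairwise_by_offset: {(dr, dc): KxK table}, one shared table per edge class
    """
    h, w = len(labels), len(labels[0])
    e = sum(unary[r][c][labels[r][c]] for r in range(h) for c in range(w))
    for (dr, dc), table in pairwise_by_offset.items():
        for (r1, c1), (r2, c2) in edges_for_offset(h, w, dr, dc):
            e += table[labels[r1][c1]][labels[r2][c2]]
    return e


labels = [[0, 0, 1, 1] for _ in range(4)]                   # left half 0, right half 1
unary = [[[0.0, 0.0] for _ in range(4)] for _ in range(4)]  # placeholder unaries
pairwise_by_offset = {
    (0, 1): [[0.0, 1.0], [1.0, 0.0]],  # horizontal neighbours: penalise disagreement
    (3, 2): [[0.0, 0.5], [0.5, 0.0]],  # first node is 3 rows above, 2 columns left
}
print(len(edges_for_offset(4, 4, 0, 1)), len(edges_for_offset(4, 4, 3, 2)))  # 12 2
print(grid_energy(labels, unary, pairwise_by_offset))  # 5.0
```

Sharing one table per offset makes the pairwise part translation-invariant. Its size then grows with the number of edge classes rather than with the number of pixels.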