Overview of Visual Tracking + Deep-Learning-Based Tracking + Context-Based Tracking

Visual Tracking With Deep Learning And The Context

I. An Overview of Visual Tracking

1. What is visual tracking?


These three pictures are frames 1, 40, and 80 of the same video. Given the bounding box of the running woman in the first frame, the tracker keeps the bounding box on the same woman in the later frames.

Given the initial state (e.g., position and size) of a target object in one frame of a video, the goal of tracking is to estimate the state of the target in the subsequent frames.

Although object tracking has been studied for several decades and much progress has been made in recent years, it remains a very challenging problem.

Numerous factors affect the performance of a tracking algorithm, such as illumination variation, occlusion, and background clutter, and no single tracking approach can successfully handle all scenarios.

2. Difficulties of visual tracking

Many factors limit object tracking in video. Both the theory and the methods of target tracking face great challenges.

The diversity of the target

  • Multiple moving targets make it difficult to describe them with a unified model.

  • The motion patterns of the targets are very complex.

  • The movement of a target can change its appearance.

  • Multiple moving objects may occlude one another.

The complexity of the scene

  • Changes in lighting and atmospheric conditions in the scene can cause serious interference.

  • The scene may contain regions whose appearance is similar to the target's.

  • The target may be occluded by objects in the scene.

In a dilemma

  • Fast but Fallible

  • Robust but Slow

  • The contradiction between real-time and accuracy

Dilemma

3. Recent algorithms for visual tracking

Based on model matching

-----Global model matching

  • Create a target appearance model online or offline.
  • Search the image for the region most similar to the model.
  • Advantage: works well for tracking rigid targets.
  • Disadvantage: fails when the target's appearance changes.
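Global model matching in its simplest form can be sketched as exhaustive template matching. This is a minimal illustration, assuming the appearance model is just the target patch cut from the first frame and that similarity is measured by the sum of squared differences (SSD); `match_template_ssd` is a hypothetical helper, not a library function.

```python
import numpy as np

def match_template_ssd(frame: np.ndarray, template: np.ndarray) -> tuple[int, int]:
    """Return the (row, col) of the top-left corner where the template fits best.

    Exhaustive search using the sum of squared differences (lower = more similar).
    """
    fh, fw = frame.shape
    th, tw = template.shape
    best_score, best_pos = np.inf, (0, 0)
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            window = frame[r:r + th, c:c + tw]
            score = np.sum((window - template) ** 2)
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

# Toy example: a bright 3x3 "target" moves from (2, 2) to (5, 6) between frames.
frame0 = np.zeros((12, 12)); frame0[2:5, 2:5] = 1.0
frame1 = np.zeros((12, 12)); frame1[5:8, 6:9] = 1.0
template = frame0[2:5, 2:5]                  # appearance model built from the first frame
print(match_template_ssd(frame1, template))  # (5, 6)
```

Because the template is never updated, the same search fails once the target's appearance changes, which is exactly the disadvantage listed above.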

-----Local model matching

  • The target is divided into components, and a separate model is built for each component.
  • For example, a human body is divided into head, limbs, and torso.
  • Advantage: stable tracking, especially under occlusion.
  • Disadvantage: matching between components is difficult and time-consuming.

-----Feature matching

  • Extract features that are invariant to translation, rotation, and scaling.
  • Match these features in the current frame.
  • Advantage: insensitive to changes in the target's shape, scale, and so on.
  • Disadvantage: most image features are sensitive to ambient conditions such as changes in lighting.
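The matching step can be sketched as nearest-neighbor matching of feature descriptors. This assumes the invariant descriptors have already been extracted (the toy 2-D vectors below stand in for real ones) and uses Lowe's ratio test to drop ambiguous matches; the 0.8 ratio and the helper name `match_descriptors` are assumptions.

```python
import numpy as np

def match_descriptors(desc_a: np.ndarray, desc_b: np.ndarray, ratio: float = 0.8):
    """Match each row of desc_a to its nearest row in desc_b (Euclidean distance).

    A match is kept only if the nearest neighbour is clearly closer than the
    second nearest (ratio test). Returns a list of (index_in_a, index_in_b).
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

# Toy descriptors of target features in the previous frame and the current frame.
prev_desc = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
curr_desc = np.array([[0.0, 1.1], [1.0, 0.1], [5.0, 5.0]])
print(match_descriptors(prev_desc, curr_desc))  # [(0, 1), (1, 0)]
```

The third feature lies almost equally close to two candidates, so the ratio test rejects it instead of risking a wrong match.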

Based on classification

  • Treat tracking as an online classification problem.
  • One class is the target; the other is the background.
  • Train a target/background classifier.
  • Update the classifier with each new frame.
  • Advantage: some adaptability to changes in the target.
  • Disadvantage: classification accuracy often depends on how well the features represent the target.
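A minimal sketch of tracking as online classification, assuming each candidate patch is already described by a feature vector and the classifier is a plain logistic regression refreshed with a few gradient steps per frame. `OnlineClassifier`, its learning rate, and the toy features are hypothetical.

```python
import numpy as np

class OnlineClassifier:
    """Logistic-regression target/background classifier updated frame by frame."""

    def __init__(self, dim: int, lr: float = 0.5):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def score(self, x: np.ndarray) -> np.ndarray:
        # Probability that each row of x is the target.
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))

    def update(self, x: np.ndarray, y: np.ndarray, steps: int = 50) -> None:
        # A few gradient-descent steps on the logistic loss for the newest frame.
        for _ in range(steps):
            err = self.score(x) - y
            self.w -= self.lr * x.T @ err / len(y)
            self.b -= self.lr * err.mean()

rng = np.random.default_rng(0)
clf = OnlineClassifier(dim=2)
# Frame 1: samples near (1, 1) are the target, samples near (-1, -1) are background.
pos = rng.normal(1.0, 0.2, size=(20, 2))
neg = rng.normal(-1.0, 0.2, size=(20, 2))
clf.update(np.vstack([pos, neg]), np.r_[np.ones(20), np.zeros(20)])
# Frame 2: pick the candidate with the highest target probability.
candidates = np.array([[-0.9, -1.1], [0.95, 1.05], [0.0, -0.2]])
best = int(np.argmax(clf.score(candidates)))
print(best)  # 1
```

In a full tracker, samples around the chosen candidate in frame 2 would be fed back through `update`, which is where both the adaptability and the risk of learning from wrong samples come from.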

Based on Bayesian filtering

  • Combine prior information with current observations.
  • The state of the target in the current frame is estimated optimally using the prior information from the frames before it.
  • Typical algorithms include the Kalman filter and the particle filter.
  • Advantage: wide range of applications and few constraints.
  • Disadvantage: to reach high filtering precision, particle filters often need a large number of particles, and the more particles are required, the higher the computational complexity.
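As an example of Bayesian filtering, a constant-velocity Kalman filter for the target center shows the cycle described above: the prediction carries the prior from earlier frames, and the update fuses it with the current observation. The linear motion model and the noise levels `q` and `r` are illustrative assumptions.

```python
import numpy as np

def kalman_step(x, P, z, dt=1.0, q=1e-2, r=1.0):
    """One predict/update cycle for state x = [px, py, vx, vy] and measurement z = [px, py]."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)   # constant-velocity motion model
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)   # only the position is observed
    Q = q * np.eye(4)                            # process noise (assumed)
    R = r * np.eye(2)                            # measurement noise (assumed)

    # Predict: propagate the prior from the previous frame.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: combine the prediction with the current observation.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Noisy detections of a target moving 2 px/frame to the right.
x, P = np.zeros(4), 10.0 * np.eye(4)
rng = np.random.default_rng(1)
for t in range(1, 31):
    z = np.array([2.0 * t, 5.0]) + rng.normal(0, 1.0, size=2)
    x, P = kalman_step(x, P, z)
print(np.round(x[2:], 1))  # estimated (vx, vy); should be close to (2, 0)
```

A particle filter follows the same predict/update logic but represents the posterior with samples, which is why its cost grows with the number of particles.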

Based on deep learning (mostly after 2015)

Applying deep learning to target tracking has not been smooth sailing. The main problem is the lack of training data: much of the power of deep models comes from training on large amounts of labeled data, whereas tracking provides only the bounding box in the first frame. It is therefore difficult to train a deep model for the current target at the start of tracking.

Several ideas:
  • Pre-train the deep model on auxiliary image data and fine-tune it during online tracking. (DLT NIPS13, SO-DLT)
  • Use a CNN classification network pre-trained on an existing large-scale classification dataset to extract features. (FCNT, HCFT ICCV15)
  • Pre-train on tracking sequences. (MDNet CVPR16)
  • Use an RNN. (RTT CVPR16)

4. Deep Learning for visual tracking

DLT: Learning a Deep Compact Image Representation for Visual Tracking (NIPS 2013)

DLT

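Pre-training: SDAE + Tiny Images dataset + unsupervised training, giving a generic object representation;
Online tracking architecture: the SDAE encoder (generic feature representation) + a sigmoid classification layer (tracking as binary classification), separating the target from the background;
Fine-tuning: positive and negative samples taken from the first frame yield a network that classifies the current target and background more specifically;
Tracking in subsequent frames: a particle filter extracts patches from the current frame, each patch is fed through the classification network in turn, and each receives a confidence;
Model update: governed by a threshold;
Advantage: pre-training + fine-tuning addresses the shortage of training data;
Disadvantage: it is questionable whether a 32*32 autoencoder suits classification-based tracking, and a 4-layer network has limited feature representation power.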
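The per-frame loop of DLT (sample with a particle filter, score each patch, keep the most confident one, update only when needed) can be sketched as follows. This is an illustration under assumptions, not the paper's code: `score_patch` is a hypothetical stand-in for the fine-tuned SDAE encoder plus sigmoid layer, and the particle count, noise level, and 0.9 threshold are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_POS = np.array([52.0, 40.0])   # hidden ground truth, used only by the toy scorer

def score_patch(center: np.ndarray) -> float:
    """Hypothetical stand-in for the network's confidence that the patch at `center` is the target."""
    return float(np.exp(-np.sum((center - TRUE_POS) ** 2) / 50.0))

def track_frame(prev_center, n_particles=500, sigma=4.0, update_thresh=0.9):
    # 1. Particle-filter-style sampling: candidate centers around the previous estimate.
    particles = prev_center + rng.normal(0.0, sigma, size=(n_particles, 2))
    # 2. Feed every candidate patch through the classifier to obtain a confidence.
    conf = np.array([score_patch(p) for p in particles])
    best = int(np.argmax(conf))
    # 3. Assumed update rule: re-tune the network only if even the best confidence is low.
    needs_update = bool(conf[best] < update_thresh)
    return particles[best], float(conf[best]), needs_update

center, conf, needs_update = track_frame(np.array([50.0, 40.0]))
print(center, conf, needs_update)
```

Scoring every particle with a deep network is the main runtime cost, which ties back to the particle-count disadvantage noted under Bayesian filtering.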

SO-DLT:Transferring Rich Feature Hierarchies for Robust Visual Tracking(ICCV 2015)

SO-DLT

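Online tracking: when processing frame t, center the search on the position predicted in frame t-1; sample regions of different scales from small to large and feed them into the network one by one; once the probability map output by the CNN exceeds a threshold, stop sampling and take the current probability map as the best region; finally, determine the size and position of the bounding box within that region.
Model update: CNN_S responds promptly to target changes; CNN_L is robust to noise;
Takeaway: the ensemble idea reduces the sensitivity of model updates, and it is a killer trick for raising tracker scores.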
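The small-to-large search can be sketched as a loop over crop sizes that stops at the first confident probability map. `prob_map` here is a hypothetical callable standing in for the CNN, and the scale list and the 0.5 stopping threshold are assumptions chosen for illustration.

```python
import numpy as np

def search_scales(frame, prev_center, prob_map, base_size=32,
                  scales=(1.0, 1.5, 2.0, 3.0), stop_thresh=0.5):
    """Try crops from small to large around the previous center.

    Stop at the first crop whose probability map is confident enough and
    return (crop_size, probability_map); otherwise fall back to the largest crop.
    """
    pm, size = None, base_size
    for s in scales:
        size = int(base_size * s)
        pm = prob_map(frame, prev_center, size)
        if pm.max() > stop_thresh:
            break
    return size, pm

def toy_prob_map(frame, center, size):
    """Toy stand-in: the frame already holds per-pixel target probabilities."""
    r, c = center
    h = size // 2
    return frame[max(r - h, 0):r + h, max(c - h, 0):c + h]

frame = np.zeros((200, 200)); frame[100:110, 125:135] = 0.9   # target moved to the right
size, pm = search_scales(frame, (105, 100), toy_prob_map)
print(size)  # 64
```

Within the returned crop, the bounding box would then be derived from where the probability map is high.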

FCNT: Visual Tracking with Fully Convolutional Networks (ICCV 2015)

FCNT

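Pre-training: VGGNet on the labeled ImageNet classification dataset;
Core idea: feature maps can be used directly to localize the tracked target;
Higher-level features: good at distinguishing different categories (highly abstract);
Lower-level features: good at distinguishing objects of the same category (focus on local details);
Two convolutional layers: conv4-3 distinguishes similar objects, i.e. distractors (SNet); conv5-3 captures category information (GNet);
Online tracking: crop a region centered on the previous frame's location and feed it into SNet and GNet separately, generating two complementary heatmaps;
SNet: the distractor is suppressed;
GNet: the target stands out more clearly;
Summary: effectively suppresses drift but is not robust to occlusion; opens a new line of thinking for tracking (how many layers, and which ones, to use).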
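How the two complementary heatmaps might be combined can be sketched as follows, assuming both maps are already computed and normalized to [0, 1]. The rule here (trust GNet's peak unless a second strong GNet peak suggests a same-category distractor, then fall back to SNet) is a simplification for illustration, not FCNT's exact decision procedure; `locate`, `radius`, and the 0.6 threshold are hypothetical.

```python
import numpy as np

def locate(gnet_map, snet_map, distractor_thresh=0.6, radius=3):
    """Return (row, col) of the target from two heatmaps of equal shape."""
    r, c = np.unravel_index(np.argmax(gnet_map), gnet_map.shape)
    # Mask out the neighbourhood of GNet's peak and look for a competing peak.
    masked = gnet_map.copy()
    masked[max(r - radius, 0):r + radius + 1, max(c - radius, 0):c + radius + 1] = 0.0
    if masked.max() > distractor_thresh:
        # A same-category distractor is likely present: use the more discriminative SNet.
        r, c = np.unravel_index(np.argmax(snet_map), snet_map.shape)
    return int(r), int(c)

g = np.zeros((20, 20)); g[5, 5] = 1.0; g[14, 14] = 0.95   # GNet fires on target and distractor
s = np.zeros((20, 20)); s[14, 14] = 0.8                    # SNet fires only on the true target
print(locate(g, s))  # (14, 14)
```

When GNet shows a single clear peak, its location is used directly; SNet only takes over when the category-level map is ambiguous.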

MDNet:Learning Multi-Domain Convolutional Neural Networks for Visual Tracking(CVPR 2016)

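There is a large gap between image classification and actual tracking:
Image classification: targets and backgrounds appear in arbitrary combinations, and a target must be detected against any background;
Actual tracking: once the foreground and background of the first frame are given, the foreground and background of subsequent frames are very similar to those of the first frame;
Pre-train the CNN directly on video sequences. The catch: an object class that is the target in one sequence may be background in another;

MDNet

Shared layers: the CNN learns a generic feature representation of targets;
Domain-specific layers: each training sequence ---> a separate domain ---> a separate binary classification layer ---> separates foreground from background within that sequence (this resolves the inconsistency of targets across sequences);
Bounding box: R-CNN-style region proposals, with 256 proposals sampled around the previous frame's location, followed by bounding-box regression;
Summary: precision reaches 94.8%; real-time performance: is the region-proposal approach from object detection suitable for online tracking? (256 proposals, 89 domains)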
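The multi-domain structure can be sketched with NumPy arrays standing in for network layers. Assumptions: one random linear layer plus ReLU plays the role of the shared CNN, each domain (training sequence) owns its own 2-way head, and candidates are drawn from a Gaussian around the previous box; only the 256 candidates and 89 domains come from the text above, everything else is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

class MultiDomainNet:
    """Shared feature layer + one binary (background/target) head per domain."""

    def __init__(self, in_dim: int, feat_dim: int, n_domains: int):
        self.shared = rng.normal(0, 0.1, size=(in_dim, feat_dim))          # shared layers
        self.heads = [rng.normal(0, 0.1, size=(feat_dim, 2)) for _ in range(n_domains)]

    def features(self, x):
        return np.maximum(x @ self.shared, 0.0)          # ReLU features shared by all domains

    def scores(self, x, domain: int):
        logits = self.features(x) @ self.heads[domain]   # only this domain's head is used
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)          # softmax over {background, target}

def sample_candidates(prev_box, n=256, trans_std=0.1, scale_std=0.05):
    """Draw n candidate boxes (cx, cy, w, h) around the previous box (assumed Gaussian sampling)."""
    cx, cy, w, h = prev_box
    dx = rng.normal(0, trans_std * w, n)
    dy = rng.normal(0, trans_std * h, n)
    s = np.exp(rng.normal(0, scale_std, n))
    return np.stack([cx + dx, cy + dy, w * s, h * s], axis=1)

net = MultiDomainNet(in_dim=16, feat_dim=8, n_domains=89)
boxes = sample_candidates((100.0, 80.0, 40.0, 60.0))
patch_feats = rng.normal(size=(len(boxes), 16))   # stand-in for cropped, resized patches
target_prob = net.scores(patch_feats, domain=0)[:, 1]
best_box = boxes[np.argmax(target_prob)]
print(boxes.shape, target_prob.shape)  # (256, 4) (256,)
```

During pre-training, each sequence only updates the shared layers and its own head; at test time a fresh head would replace the per-domain heads for the new sequence.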

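Using an RNN?

These are frames 1, 10, and 20 of a video. As the car moves forward at constant speed, the video sequence shows clear temporal correlation.
Tracking is special: it works on a time series in which consecutive frames are correlated.
Can a multi-directional recurrent neural network (RNN) learn the temporal correlations in a tracking video sequence?

What is an RNN?

RNN neuron
RNN unrolled over time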
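The RNN neuron and its unrolled form amount to the recurrence h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b). A minimal NumPy sketch with random weights and toy per-frame features; it is a generic single-direction RNN, not the multi-directional RNN used by the trackers below.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h, h0):
    """Unroll a vanilla RNN over time: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h = h0
    hs = []
    for x in xs:                                   # one step per frame
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)     # the same weights are reused at every step
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
T, in_dim, hid = 20, 4, 8
xs = rng.normal(size=(T, in_dim))                  # e.g. per-frame features of the target
W_xh = rng.normal(0, 0.3, size=(hid, in_dim))
W_hh = rng.normal(0, 0.3, size=(hid, hid))
hs = rnn_forward(xs, W_xh, W_hh, np.zeros(hid), np.zeros(hid))
print(hs.shape)  # (20, 8)
```

The hidden state h_t is what carries information from earlier frames forward, which is the property the tracking question above hopes to exploit.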

RNN Tracker

CVPR2016

AAAI2016

5. Visual Tracking With The Context

Context information is also very important for tracking.
Recently, some approaches have been proposed that mine auxiliary objects or local visual information surrounding the target to assist tracking.
Context information is especially helpful when the target is fully occluded or leaves the image region.
To improve the tracking performance, some tracker fusion methods have been proposed recently.

Context-Aware Visual Tracking

The environment can also be advantageous to the tracker if it contains objects that are correlated with the target.

Question: is the object being followed by the tracker really the target?
Answer: use the dynamic environment!


How to track a face in a crowd?

  • It is almost impossible to learn a discriminative model that distinguishes the face of interest from the rest of the crowd.

Why do we have to focus our attention only on the target?

  • If the person (with that face) is wearing a distinctive shirt (or a hat), then including the shirt (or the hat) in matching will surely make tracking much easier and more robust.
  • If another face always accompanies the target face, treat the two faces as a geometric structure and track them as a group.

It seems that:

  • A target is seldom isolated from, and independent of, the entire scene.
  • There may exist objects whose motion is correlated with the target's in the short or long term.

So why not track the target and auxiliary objects as a group?

What are auxiliary objects?

  • They co-occur frequently with the target.
  • Their motion is consistently correlated with the target's.
  • They are suitable for tracking.

This definition may cover a large variety of image regions or features:

  • Simple, generic, and low-level is better.
  • Choose color regions rather than features.
  • Color regions can be tracked reliably and efficiently.
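How the first two criteria could be checked for one candidate color region is sketched below, assuming the region and the target have already been tracked to give per-frame centers. The co-occurrence ratio, the velocity correlation, and both thresholds are illustrative choices, not the paper's formulation; the third criterion (suitable for tracking) is assumed to be met by restricting candidates to color regions.

```python
import numpy as np

def auxiliary_score(target_traj, region_traj, min_cooccur=0.8, min_corr=0.7):
    """Decide whether a candidate region qualifies as an auxiliary object.

    target_traj, region_traj: arrays of shape (T, 2) with per-frame centers;
    NaN marks frames where the region was not found.
    Returns (is_auxiliary, cooccurrence_ratio, motion_correlation).
    """
    seen = ~np.isnan(region_traj).any(axis=1)
    cooccur = seen.mean()                       # fraction of frames it appears with the target
    both = seen[1:] & seen[:-1]                 # frame pairs where the region was seen twice
    v_t = np.diff(target_traj, axis=0)[both].ravel()
    v_r = np.diff(region_traj, axis=0)[both].ravel()
    corr = np.corrcoef(v_t, v_r)[0, 1] if len(v_t) > 2 else 0.0
    return bool(cooccur >= min_cooccur and corr >= min_corr), float(cooccur), float(corr)

rng = np.random.default_rng(0)
T = 50
target = np.cumsum(rng.normal(0, 2.0, size=(T, 2)), axis=0)          # random-walk target
hat = target + np.array([0.0, -15.0]) + rng.normal(0, 0.3, size=(T, 2))
hat[10:13] = np.nan                                                   # briefly lost
static = np.tile([200.0, 50.0], (T, 1)) + rng.normal(0, 0.5, size=(T, 2))
print(auxiliary_score(target, hat)[0], auxiliary_score(target, static)[0])  # True False
```

The "hat" moves with the target and is kept; the static background region co-occurs in every frame but its motion is unrelated, so it is rejected.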

Experiments


(The yellow bounding box is the target; the red boxes are the color regions.)

Tracking the Invisible: Learning Where the Object Might be

It is well known that context helps object detection.
For example, one of the strongest predictors of vehicle presence and location in an image is the shadow the vehicle casts on the road.


In tracking, many temporary but potentially very strong links exist between the tracked object and the rest of the image.

Local image features vote for the object position:

  • An Implicit Shape Model is used to choose the local image features.
  • Object points lie on the object surface and thus always have a strong correlation with the object motion (green points).
  • Points on other independently moving objects or in the static background are considered to carry no information about the object position (blue points).
  • Supporters are features that are useful for predicting the target object's position. At least temporarily, they move in a way that is statistically related to the motion of the target (red points).

The position of an object can therefore be estimated even when it is not directly visible (e.g., when it is fully occluded or outside the image region).
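A simplified sketch of supporter voting, assuming supporters have already been selected and tracked, that each one simply remembers its offset to the object center while the object is visible, and that votes are combined with a median. The method described above learns this relation probabilistically with an Implicit Shape Model; fixed offsets and the helper names here are simplifications for illustration.

```python
import numpy as np

def learn_offsets(supporter_pos, object_pos):
    """While the object is visible, store each supporter's offset to the object center."""
    return object_pos - supporter_pos

def vote(supporter_pos, offsets):
    """Each supporter votes for object_center = supporter + offset; take the median vote."""
    votes = supporter_pos + offsets
    return np.median(votes, axis=0)

# Frame t: object visible at (50, 50); three supporters move with it.
obj_t = np.array([50.0, 50.0])
sup_t = np.array([[40.0, 60.0], [55.0, 70.0], [30.0, 45.0]])
offsets = learn_offsets(sup_t, obj_t)

# Frame t+k: object fully occluded; supporters (one slightly off) are still tracked.
sup_tk = sup_t + np.array([20.0, 5.0])
sup_tk[2] += np.array([3.0, -2.0])     # one noisy supporter
print(vote(sup_tk, offsets))  # [70. 55.]
```

The median keeps a single unreliable supporter from dragging the estimate away, even though the object itself is never observed in frame t+k.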
How are supporters chosen?


Experiments

We can see what we cannot see.

Context Tracker: Exploring Supporters and Distracters

Visual tracking is very challenging when the target leaves the field of view: the tracker may start following another similar object and fail to reacquire the right target when it reappears.
Beyond the object region itself, there is additional information that can be exploited.

What are supporters and distracters?
Distracters

  • Regions whose appearance is similar to the target's
  • Consistently co-occur with the target
  • The tracker must keep tracking these distracters to avoid drifting
  • Dangerous


Supporters

  • Local key points around the target
  • Consistently co-occur with the target
  • Motion correlated with the target
  • Useful
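Why it pays to keep tracking distracters can be sketched as follows, assuming the tracker already produces scored candidate boxes for the target and separately follows the known distracters: a high-scoring candidate that coincides with a tracked distracter is rejected instead of causing drift. The IoU rule, its 0.5 threshold, and the helper names are hypothetical, not the Context Tracker's actual procedure.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    return inter / (area(a) + area(b) - inter)

def pick_target(candidates, scores, distracters, overlap=0.5):
    """Return the index of the best candidate that is not a currently tracked distracter."""
    order = np.argsort(scores)[::-1]          # most target-like first
    for i in order:
        if all(iou(candidates[i], d) < overlap for d in distracters):
            return int(i)
    return None                               # every candidate coincides with a distracter

cands = [(10, 10, 30, 30), (60, 10, 80, 30), (12, 11, 31, 29)]
scores = np.array([0.90, 0.85, 0.40])
distracters = [(11, 10, 31, 30)]              # a look-alike the tracker is also following
print(pick_target(cands, scores, distracters))  # 1
```

Without the distracter list the tracker would jump to candidate 0, the look-alike, which is exactly the drift described above.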


Experiments

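6. Future directions of visual tracking

Improve the feature representation of the target

  • Sufficiently strong features can cope with most adverse environmental effects.

Improve the real-time performance of the system

  • Search strategies that traverse many redundant regions greatly hurt the real-time performance of a tracker.
  • How can the search range for the target be narrowed?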
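One common way to narrow the search range is to scan only a window around a motion-predicted position. A minimal sketch, assuming a constant-velocity prediction and a window 1.5 times the target size; `search_window` and the margin are hypothetical choices.

```python
def search_window(prev_center, prev_velocity, box_size, frame_shape, margin=1.5):
    """Return (row0, col0, row1, col1) of a search region around the predicted center.

    The window is `margin` times the target size, clipped to the frame, so only a
    small fraction of the image has to be scanned.
    """
    pr = prev_center[0] + prev_velocity[0]        # constant-velocity prediction
    pc = prev_center[1] + prev_velocity[1]
    half_h = margin * box_size[0] / 2
    half_w = margin * box_size[1] / 2
    H, W = frame_shape
    r0, r1 = max(0, int(pr - half_h)), min(H, int(pr + half_h))
    c0, c1 = max(0, int(pc - half_w)), min(W, int(pc + half_w))
    return r0, c0, r1, c1

win = search_window((100, 200), (2, -4), (40, 20), (480, 640))
print(win)  # (72, 181, 132, 211)
```

Here the window covers 60 x 30 pixels instead of the full 480 x 640 frame; the trade-off is that a target moving faster than predicted can fall outside it.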