After finishing a competition a while back, I planned to read through the mmdetection source code and write a few blog posts about it. I got two posts done and then got busy with other things, so here I'm picking up where I left off.
The previous posts covered model and network construction; this one focuses on the concrete losses computed during training, which break down into three parts:
- RPN_loss
- bbox_loss
- mask_loss
## RPN_loss
`rpn_loss` is implemented in `mmdet/models/anchor_heads/rpn_head.py`:
```python
def loss(self,
         cls_scores,
         bbox_preds,
         gt_bboxes,
         img_metas,
         cfg,
         gt_bboxes_ignore=None):
    losses = super(RPNHead, self).loss(
        cls_scores,
        bbox_preds,
        gt_bboxes,
        None,
        img_metas,
        cfg,
        gt_bboxes_ignore=gt_bboxes_ignore)
    return dict(
        loss_rpn_cls=losses['loss_cls'], loss_rpn_bbox=losses['loss_bbox'])
```
The actual computation is defined in the parent class, in `mmdet/models/anchor_heads/anchor_head.py`, mainly in the `loss` and `loss_single` functions. Let's look at `loss` first:
```python
def loss(self,
         cls_scores,
         bbox_preds,
         gt_bboxes,
         gt_labels,
         img_metas,
         cfg,
         gt_bboxes_ignore=None):
    featmap_sizes = [featmap.size()[-2:] for featmap in cls_scores]
    assert len(featmap_sizes) == len(self.anchor_generators)
    anchor_list, valid_flag_list = self.get_anchors(
        featmap_sizes, img_metas)
    label_channels = self.cls_out_channels if self.use_sigmoid_cls else 1
    cls_reg_targets = anchor_target(
        anchor_list,
        valid_flag_list,
        gt_bboxes,
        img_metas,
        self.target_means,
        self.target_stds,
        cfg,
        gt_bboxes_ignore_list=gt_bboxes_ignore,
        gt_labels_list=gt_labels,
        label_channels=label_channels,
        sampling=self.sampling)
    if cls_reg_targets is None:
        return None
    (labels_list, label_weights_list, bbox_targets_list, bbox_weights_list,
     num_total_pos, num_total_neg) = cls_reg_targets
    num_total_samples = (
        num_total_pos + num_total_neg if self.sampling else num_total_pos)
    losses_cls, losses_bbox = multi_apply(
        self.loss_single,
        cls_scores,
        bbox_preds,
        labels_list,
        label_weights_list,
        bbox_targets_list,
        bbox_weights_list,
        num_total_samples=num_total_samples,
        cfg=cfg)
    return dict(loss_cls=losses_cls, loss_bbox=losses_bbox)
```
This function does two things:
- generate the anchors and their corresponding targets
- compute the loss
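The per-level loop at the end relies on mmdet's `multi_apply` helper, which is essentially a `map` followed by a transpose of the result tuples:

```python
from functools import partial


def multi_apply(func, *args, **kwargs):
    # Apply `func` to each per-level group of arguments, then transpose
    # the resulting list of tuples into a tuple of lists (one list per
    # returned value). This mirrors mmdet's own implementation.
    pfunc = partial(func, **kwargs) if kwargs else func
    map_results = map(pfunc, *args)
    return tuple(map(list, zip(*map_results)))


def square_and_double(x):
    return x * x, 2 * x


squares, doubles = multi_apply(square_and_double, [1, 2, 3])
# squares == [1, 4, 9], doubles == [2, 4, 6]
```

So `losses_cls` and `losses_bbox` above are each a list with one entry per feature-pyramid level.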
First, at this point the RPN outputs, for every position of the feature map, a classification score for each anchor and the regression deltas for that anchor's bbox. To optimize the network we need to compute a loss against the ground truth, but the ground truth is just a set of hand-annotated bboxes that cannot be compared with these outputs directly. So we first have to obtain the anchors, match them against the ground truth to split them into positive and negative samples with their corresponding targets, and only then can we compute the loss.
So the first step, `anchor_list, valid_flag_list = self.get_anchors(featmap_sizes, img_metas)`, retrieves all the anchors together with a validity flag (computed from whether the bbox falls outside the image border).
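A minimal sketch of that border check (the helper name and `allowed_border` default are illustrative; mmdet's version also accounts for the padded image shape):

```python
import torch


def anchor_inside_flags(anchors, img_h, img_w, allowed_border=0):
    # An anchor (x1, y1, x2, y2) is "valid" when all four coordinates
    # fall inside the image, optionally tolerating `allowed_border`
    # pixels of overhang beyond each edge.
    return ((anchors[:, 0] >= -allowed_border) &
            (anchors[:, 1] >= -allowed_border) &
            (anchors[:, 2] < img_w + allowed_border) &
            (anchors[:, 3] < img_h + allowed_border))


anchors = torch.tensor([[-8., -8., 24., 24.], [10., 10., 50., 50.]])
flags = anchor_inside_flags(anchors, img_h=64, img_w=64)
# the first anchor crosses the border, the second is fully inside
```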
With all the anchors in hand, the next step is to compare them against the ground truth to separate positive and negative samples and generate labels, which is handled by `anchor_target()` defined in `mmdet/core/anchor/anchor_target.py`.
Inside that function an `assigner` is called to associate anchors with ground-truth boxes, yielding the positive and negative samples, and a `sampler` wraps the results up for convenient use later.
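A stripped-down sketch of what the max-IoU assigner does (the real `MaxIoUAssigner` additionally forces each gt's highest-IoU anchor to be positive and handles ignored boxes; the thresholds here are the usual RPN values):

```python
import torch


def pairwise_iou(a, b):
    # IoU between anchors a (N, 4) and gt boxes b (M, 4), xyxy format.
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = torch.max(a[:, None, :2], b[None, :, :2])
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)


def assign(anchors, gts, pos_thr=0.7, neg_thr=0.3):
    # For each anchor, find the gt with the highest IoU. IoU >= pos_thr
    # makes it a positive (labelled 1 + gt index), IoU < neg_thr a
    # negative (labelled 0), anything in between is ignored (-1).
    overlaps = pairwise_iou(anchors, gts)
    max_iou, argmax = overlaps.max(dim=1)
    assigned = torch.full((anchors.size(0),), -1, dtype=torch.long)
    assigned[max_iou < neg_thr] = 0
    pos = max_iou >= pos_thr
    assigned[pos] = argmax[pos] + 1
    return assigned


anchors = torch.tensor([[0., 0., 10., 10.],
                        [20., 20., 30., 30.],
                        [0., 0., 10., 6.]])
gts = torch.tensor([[0., 0., 10., 10.]])
assigned = assign(anchors, gts)
# exact overlap -> positive, disjoint -> negative, IoU 0.6 -> ignored
```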
Once the targets are obtained, the loss itself is computed in `self.loss_single`:
```python
def loss_single(self, cls_score, bbox_pred, labels, label_weights,
                bbox_targets, bbox_weights, num_total_samples, cfg):
    # classification loss
    labels = labels.reshape(-1)
    label_weights = label_weights.reshape(-1)
    cls_score = cls_score.permute(0, 2, 3,
                                  1).reshape(-1, self.cls_out_channels)
    loss_cls = self.loss_cls(
        cls_score, labels, label_weights, avg_factor=num_total_samples)
    # regression loss
    bbox_targets = bbox_targets.reshape(-1, 4)
    bbox_weights = bbox_weights.reshape(-1, 4)
    bbox_pred = bbox_pred.permute(0, 2, 3, 1).reshape(-1, 4)
    loss_bbox = self.loss_bbox(
        bbox_pred,
        bbox_targets,
        bbox_weights,
        avg_factor=num_total_samples)
    return loss_cls, loss_bbox
```
The losses used here are the familiar `CrossEntropyLoss` and `SmoothL1Loss`.
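For reference, here is what `SmoothL1Loss` computes element-wise (a minimal sketch, before the `bbox_weights` masking and `avg_factor` averaging that mmdet applies on top):

```python
import torch


def smooth_l1(pred, target, beta=1.0):
    # Quadratic for |diff| < beta, linear beyond, so gradients stay
    # bounded even for large regression errors.
    diff = (pred - target).abs()
    return torch.where(diff < beta,
                       0.5 * diff ** 2 / beta,
                       diff - 0.5 * beta)


out = smooth_l1(torch.tensor([0.5, 2.0]), torch.zeros(2))
# -> tensor([0.1250, 1.5000])
```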
## bbox_loss
The earlier `rpn_loss` drives the first refinement of the candidate boxes; `bbox_loss` drives the second. The practical difference between the two lies mainly in the classification: the RPN stage distinguishes only two classes (foreground and background), while here there are N+1 classes (the real categories plus background).
It is defined in `mmdet/models/bbox_heads/bbox_head.py`:
```python
def loss(self,
         cls_score,
         bbox_pred,
         labels,
         label_weights,
         bbox_targets,
         bbox_weights,
         reduce=True):
    losses = dict()
    if cls_score is not None:
        losses['loss_cls'] = self.loss_cls(
            cls_score, labels, label_weights, reduce=reduce)
        losses['acc'] = accuracy(cls_score, labels)
    if bbox_pred is not None:
        pos_inds = labels > 0
        if self.reg_class_agnostic:
            pos_bbox_pred = bbox_pred.view(bbox_pred.size(0), 4)[pos_inds]
        else:
            pos_bbox_pred = bbox_pred.view(bbox_pred.size(0), -1,
                                           4)[pos_inds, labels[pos_inds]]
        losses['loss_bbox'] = self.loss_bbox(
            pos_bbox_pred,
            bbox_targets[pos_inds],
            bbox_weights[pos_inds],
            avg_factor=bbox_targets.size(0))
    return losses
```
Compared with the RPN loss this is much simpler, because it only contains the part of the RPN loss that actually computes the loss values. It still needs the same assign and sample operations as the RPN, though; the only difference is the input to assign: for the RPN the input is all the anchors of the image, while for the bbox head the input is the RPN's output proposals. The loss computation itself is exactly the same as in the RPN, so I won't repeat it here.
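For context, the regression targets in both stages are box deltas relative to the anchor or proposal. A sketch of the encoding, assuming xyxy boxes (mmdet additionally normalizes the result with `target_means`/`target_stds` and uses a +1 convention for box sizes):

```python
import torch


def bbox2delta(proposals, gt):
    # Encode gt boxes as (dx, dy, dw, dh) relative to the proposals:
    # center offsets scaled by the proposal size, plus log size ratios.
    px = (proposals[:, 0] + proposals[:, 2]) * 0.5
    py = (proposals[:, 1] + proposals[:, 3]) * 0.5
    pw = proposals[:, 2] - proposals[:, 0]
    ph = proposals[:, 3] - proposals[:, 1]
    gx = (gt[:, 0] + gt[:, 2]) * 0.5
    gy = (gt[:, 1] + gt[:, 3]) * 0.5
    gw = gt[:, 2] - gt[:, 0]
    gh = gt[:, 3] - gt[:, 1]
    return torch.stack([(gx - px) / pw, (gy - py) / ph,
                        torch.log(gw / pw), torch.log(gh / ph)], dim=1)


# a gt shifted right by half the proposal width gives dx = 0.5
d = bbox2delta(torch.tensor([[0., 0., 10., 10.]]),
               torch.tensor([[5., 0., 15., 10.]]))
```

These deltas are what `SmoothL1Loss` is applied to, which is why a perfect prediction corresponds to all-zero deltas rather than to the raw coordinates.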
## mask_loss
The mask branch also has a target-fetching step before computing its loss, in `mmdet/models/mask_heads/fcn_mask_head.py`:
```python
def get_target(self, sampling_results, gt_masks, rcnn_train_cfg):
    pos_proposals = [res.pos_bboxes for res in sampling_results]
    pos_assigned_gt_inds = [
        res.pos_assigned_gt_inds for res in sampling_results
    ]
    mask_targets = mask_target(pos_proposals, pos_assigned_gt_inds,
                               gt_masks, rcnn_train_cfg)
    return mask_targets
```
Fetching targets here is simpler than before: `mask_target()`, defined in `mmdet/core/mask/mask_target.py`, just extracts a mask matching each proposal's region.
```python
def mask_target(pos_proposals_list, pos_assigned_gt_inds_list, gt_masks_list,
                cfg):
    cfg_list = [cfg for _ in range(len(pos_proposals_list))]
    mask_targets = map(mask_target_single, pos_proposals_list,
                       pos_assigned_gt_inds_list, gt_masks_list, cfg_list)
    mask_targets = torch.cat(list(mask_targets))
    return mask_targets


def mask_target_single(pos_proposals, pos_assigned_gt_inds, gt_masks, cfg):
    mask_size = cfg.mask_size
    num_pos = pos_proposals.size(0)
    mask_targets = []
    if num_pos > 0:
        proposals_np = pos_proposals.cpu().numpy()
        pos_assigned_gt_inds = pos_assigned_gt_inds.cpu().numpy()
        for i in range(num_pos):
            gt_mask = gt_masks[pos_assigned_gt_inds[i]]
            bbox = proposals_np[i, :].astype(np.int32)
            x1, y1, x2, y2 = bbox
            w = np.maximum(x2 - x1 + 1, 1)
            h = np.maximum(y2 - y1 + 1, 1)
            # mask is uint8 both before and after resizing
            target = mmcv.imresize(gt_mask[y1:y1 + h, x1:x1 + w],
                                   (mask_size, mask_size))
            mask_targets.append(target)
        mask_targets = torch.from_numpy(np.stack(mask_targets)).float().to(
            pos_proposals.device)
    else:
        mask_targets = pos_proposals.new_zeros((0, mask_size, mask_size))
    return mask_targets
```
The loss part is also straightforward, again using `CrossEntropyLoss`:
```python
def loss(self, mask_pred, mask_targets, labels):
    loss = dict()
    if self.class_agnostic:
        loss_mask = self.loss_mask(mask_pred, mask_targets,
                                   torch.zeros_like(labels))
    else:
        loss_mask = self.loss_mask(mask_pred, mask_targets, labels)
    loss['loss_mask'] = loss_mask
    return loss
```
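Worth noting: for masks this `CrossEntropyLoss` is configured with `use_mask=True`, which boils down to per-pixel binary cross entropy on the mask channel of each RoI's assigned class. A sketch of that reduction (the function name is mine):

```python
import torch
import torch.nn.functional as F


def mask_loss_sketch(mask_pred, mask_targets, labels):
    # mask_pred: (N, num_classes, H, W) logits. For each positive RoI,
    # pick the mask channel of its assigned class and compare it with
    # the binary target mask using per-pixel BCE.
    inds = torch.arange(mask_pred.size(0))
    pred = mask_pred[inds, labels]
    return F.binary_cross_entropy_with_logits(pred, mask_targets.float())


pred = torch.zeros(2, 3, 4, 4)   # two RoIs, three classes, 4x4 masks
targets = torch.zeros(2, 4, 4)
labels = torch.tensor([1, 2])
loss = mask_loss_sketch(pred, targets, labels)
# all-zero logits against all-zero targets give -log(0.5) per pixel
```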
## Summary
Overall these losses are fairly easy to understand: there appear to be three of them, but in practice each part works in much the same way.
In the next post I plan to walk through the whole training flow, tying these three posts together for a more concrete picture.