Implementing Mask RCNN in PyTorch

2022-07-02

Semantic segmentation


Overall architecture

This implementation of Mask RCNN consists of the following main parts:

backbone
RPN

backbone

The original paper proposes four backbones:

resnet50
resnet50 + FPN
resnet101
resnet101 + FPN

This implementation is based on resnet101 + FPN.

resnet101^[1]

This is the standard ResNet model and can be taken directly from torchvision.

(ResNet model diagram)

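Feature Pyramid Network (FPN)^[2]

FPN, short for feature pyramid network, was originally used for object detection, but because of its strong performance it is now mostly used for feature extraction.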

(FPN model diagram)

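The basic FPN first upsamples the higher-level feature map (2x nearest-neighbor interpolation), then adjusts the channel dimension of the lower-level feature map with a 1 * 1 convolution, and finally adds the two feature maps, which now agree in channels, height, and width, element by element to obtain the fused feature map.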
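The merge step described above can be sketched in a few lines of PyTorch. The function name `fpn_merge` and the channel counts (512 in, 256 out) are assumptions chosen only for illustration.

```python
import torch
import torch.nn.functional as F


def fpn_merge(high, low, lateral):
    """Fuse one higher-level map with the next lower-level map.

    high:    coarser feature map, already at the target channel count
    low:     finer feature map from the backbone
    lateral: 1x1 convolution mapping `low` to the same channel count as `high`
    """
    # 2x nearest-neighbor upsampling of the higher-level map.
    upsampled = F.interpolate(high, scale_factor=2, mode="nearest")
    # Element-wise sum; shapes now match in channels, height, and width.
    return upsampled + lateral(low)


# Illustrative shapes only (assumed, not taken from the post).
lateral = torch.nn.Conv2d(512, 256, kernel_size=1)
high = torch.randn(1, 256, 8, 8)
low = torch.randn(1, 512, 16, 16)
merged = fpn_merge(high, low, lateral)
```

Because the sum is element-wise, the lower-level map must be exactly twice the spatial size of the higher-level one for this simple version to work.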

(ResNet model diagram; source: Mask-RCNN 算法及其实现详解, https://blog.csdn.net/remanented/article/details/79564045)^[3]

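In Mask RCNN, the upsampling part of the FPN is unchanged. In the channel-adjustment part, all feature maps are projected to 256 channels.

After the initial P5, P4, P3, and P2 are obtained, a 3 * 3 convolution is applied to each feature map; the output still has 256 channels.

As shown in the figure, P6, P5, P4, P3, and P2 are sent to the RPN layer.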
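Putting the pieces together, a minimal sketch of this FPN variant might look as follows. The class name `SimpleFPN` and the input channel counts (those of a ResNet-101 backbone) are assumptions. The post does not say how P6 is produced, so the stride-2 subsampling of P5 used here, following the original FPN design, is also an assumption.

```python
import torch
import torch.nn.functional as F
from torch import nn


class SimpleFPN(nn.Module):
    """Hypothetical FPN producing P2..P6, each with 256 channels."""

    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions bring every backbone map to 256 channels.
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels]
        )
        # 3x3 convolutions applied to each merged map; channels stay at 256.
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
             for _ in in_channels]
        )

    def forward(self, c2, c3, c4, c5):
        feats = [c2, c3, c4, c5]
        # Top-down pathway: start at the coarsest level and work downwards.
        merged = [self.lateral[3](c5)]
        for i in (2, 1, 0):
            # Nearest-neighbor upsampling to the size of the finer map
            # (equal to 2x upsampling when sizes halve exactly).
            top = F.interpolate(merged[0], size=feats[i].shape[-2:], mode="nearest")
            merged.insert(0, self.lateral[i](feats[i]) + top)
        p2, p3, p4, p5 = [conv(m) for conv, m in zip(self.smooth, merged)]
        # Assumption: P6 is a stride-2 subsampling of P5.
        p6 = F.max_pool2d(p5, kernel_size=1, stride=2)
        return p2, p3, p4, p5, p6
```

Passing `size=` rather than `scale_factor=2` to `F.interpolate` is a small design choice that keeps the element-wise addition valid when an input dimension is not divisible by 32.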

RPN

Copyright： Copyright is owned by the author. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.
