Mask R-CNN: An Efficient Framework for Instance Segmentation and Object Detection

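"Mask R-CNN is a deep learning model for instance segmentation and object detection, proposed by Kaiming He et al. It extends Faster R-CNN with an extra branch that predicts a segmentation mask for each object, and its strongest variants use a Feature Pyramid Network (FPN) backbone. The model detects objects while generating a high-quality segmentation mask for each one, and the FPN backbone helps it handle small objects and fine detail. FPN combines a bottom-up and a top-down pathway across multiple feature levels, which improves performance. The model also benefits from several engineering tricks, such as more anchors, larger input images, and an adjusted RoI batch size. The article also mentions FCIS (Fully Convolutional Instance-aware Semantic Segmentation), another instance segmentation framework. Compared with Mask R-CNN, FCIS performs mask estimation and detection jointly, exploiting the interplay between these two closely related tasks. FCIS is implemented in MXNet, supports multi-GPU training, and won the COCO segmentation challenge."

In deep learning, convolutional neural networks (CNNs) are widely used for computer vision tasks such as object detection and segmentation. Mask R-CNN is an important CNN-based model that extends the Faster R-CNN framework with instance segmentation. Faster R-CNN uses a Region Proposal Network (RPN) to generate regions likely to contain objects, then classifies each region and refines its bounding box. Mask R-CNN goes one step further: it adds a branch that predicts a pixel-level mask for each instance, giving a precise segmentation of every object.

A characteristic part of the Mask R-CNN architecture is the feature pyramid network, a multi-scale feature extractor that handles objects of different sizes and reduces missed detections of small objects. In FPN, the bottom-up pathway is the backbone's ordinary forward pass, and the top-down pathway upsamples coarse, semantically strong features and merges them with higher-resolution features through lateral connections, so information flows between pyramid levels and fine detail is better preserved. To improve performance, researchers commonly tune hyperparameters and training strategies. For Mask R-CNN, the number of anchors was increased (from 12 to 15), the input image size was enlarged (from 600 to 800 pixels), and the RoI batch size was set to 512; in experiments, these changes raised the AP of the base model (Faster R-CNN) from 26.3 to 31.6. These practical tricks are not specific to Mask R-CNN and carry over to similar frameworks such as FCIS.

FCIS is a fully convolutional instance-aware semantic segmentation method. Instead of estimating masks first and detecting afterwards, it uses inside/outside score maps so that detection and mask estimation are performed jointly. FCIS won the COCO 2016 segmentation challenge; its code was later open-sourced, is implemented in MXNet, and supports multi-GPU training for better training efficiency.

Mask R-CNN and FCIS are both strong tools for instance segmentation, achieving excellent performance through innovative network architectures and training strategies. Both have led COCO challenges and provide a solid foundation and reference for later research.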

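The FPN top-down pathway with lateral connections can be illustrated without a deep learning framework. The sketch below assumes single-channel feature maps stored as nested Python lists, pyramid levels that each halve in resolution, nearest-neighbor 2× upsampling, and lateral 1×1 convolutions that have already been applied; the 3×3 smoothing convolutions of a real FPN are omitted, and the helper names (`upsample2x`, `top_down_merge`) are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def upsample2x(grid: Grid) -> Grid:
    """Nearest-neighbor 2x upsampling: each cell becomes a 2x2 block."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # copy so the two rows are independent lists
    return out


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized grids."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_merge(laterals: List[Grid]) -> List[Grid]:
    """Merge bottom-up features ordered from finest (index 0) to coarsest (last).

    Assumes each level has exactly half the resolution of the previous one
    and that lateral 1x1 convolutions were already applied.
    Returns the merged maps in the same fine-to-coarse order.
    """
    merged = [laterals[-1]]  # the coarsest level starts the top-down pass
    for lateral in reversed(laterals[:-1]):
        # Upsample the coarser merged map and add the lateral feature map.
        merged.append(add(lateral, upsample2x(merged[-1])))
    return list(reversed(merged))


if __name__ == "__main__":
    c2 = [[1.0] * 4 for _ in range(4)]  # finest level, 4x4
    c3 = [[2.0] * 2 for _ in range(2)]  # 2x2
    c4 = [[3.0]]                        # coarsest level, 1x1
    p2, p3, p4 = top_down_merge([c2, c3, c4])
    print(p3)     # [[5.0, 5.0], [5.0, 5.0]]
    print(p2[0])  # [6.0, 6.0, 6.0, 6.0]
```

Each finer output level ends up carrying the sum of its own lateral feature and everything propagated from the coarser levels, which is how semantically strong information reaches the high-resolution maps used for small objects.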
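The summary does not say how the anchor count went from 12 to 15. Anchors are commonly defined as the Cartesian product of scales and aspect ratios, so one plausible reading (an assumption, not stated in the source) is 4 versus 5 scales, each combined with 3 aspect ratios. A minimal sketch with hypothetical scale values:

```python
import math
from typing import List, Tuple


def anchor_shapes(scales: List[float], ratios: List[float]) -> List[Tuple[float, float]]:
    """Return (width, height) for every scale x aspect-ratio combination.

    Each anchor keeps an area of scale**2 while height / width equals the ratio.
    """
    shapes = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            shapes.append((w, h))
    return shapes


# Hypothetical settings: the scale lists below are illustrative assumptions.
RATIOS = [0.5, 1.0, 2.0]
baseline = anchor_shapes([64, 128, 256, 512], RATIOS)      # 4 scales x 3 ratios
improved = anchor_shapes([32, 64, 128, 256, 512], RATIOS)  # 5 scales x 3 ratios
print(len(baseline), len(improved))  # 12 15
```

Under this reading, the extra anchors come from one additional (smaller) scale, which fits the summary's emphasis on small objects, but the source does not confirm which scales were used.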
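The inside/outside score maps that let FCIS couple detection and mask estimation can be illustrated for a single RoI and category. This sketch follows our reading of the FCIS paper, not its MXNet code: a pixel-wise two-way softmax over the inside and outside scores gives the mask, while a pixel-wise max averaged over the RoI gives a raw score for that category, which is then compared across categories with a softmax. It assumes the position-sensitive assembly step has already produced the two per-RoI maps; the function names and the 0.5 threshold are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Grid = List[List[float]]


def fcis_roi_outputs(inside: Grid, outside: Grid,
                     threshold: float = 0.5) -> Tuple[List[List[int]], float]:
    """Combine one category's inside/outside score maps for a single RoI.

    Returns (binary_mask, raw_detection_score).
    """
    mask = []
    total, count = 0.0, 0
    for row_in, row_out in zip(inside, outside):
        mask_row = []
        for s_in, s_out in zip(row_in, row_out):
            # Two-way softmax: probability that the pixel lies inside the instance.
            p_inside = 1.0 / (1.0 + math.exp(s_out - s_in))
            mask_row.append(1 if p_inside >= threshold else 0)
            # Max over inside/outside, so pixels on either side of the object
            # boundary can support the detection score.
            total += max(s_in, s_out)
            count += 1
        mask.append(mask_row)
    return mask, total / count  # average over all pixels in the RoI


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Softmax across categories to turn raw scores into probabilities."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


if __name__ == "__main__":
    mask, score = fcis_roi_outputs([[2.0, -1.0]], [[0.0, 1.0]])
    print(mask, score)  # [[1, 0]] 1.5
```

Because both outputs are read from the same two score maps, detection and segmentation share all of their computation, which is the "joint" behavior the summary attributes to FCIS.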
Mask R-CNN

Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick

Facebook AI Research (FAIR)

Abstract

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without tricks, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code will be made available.

1. Introduction

The vision community has rapidly improved object detection and semantic segmentation results over a short period of time. In large part, these advances have been driven by powerful baseline systems, such as the Fast/Faster R-CNN [9, 29] and Fully Convolutional Network (FCN) [24] frameworks for object detection and semantic segmentation, respectively. These methods are conceptually intuitive and offer flexibility and robustness, together with fast training and inference time. Our goal in this work is to develop a comparably enabling framework for instance segmentation.

Instance segmentation is challenging because it requires the correct detection of all objects in an image while also precisely segmenting each instance. It therefore combines elements from the classical computer vision tasks of object detection, where the goal is to classify individual objects and localize each using a bounding box, and semantic segmentation, where the goal is to classify each pixel into a fixed set of categories without differentiating object instances. Given this, one might expect a complex method is required to achieve good results. However, we show that a surprisingly simple, flexible, and fast system can surpass prior state-of-the-art instance segmentation results.

Figure 1. The Mask R-CNN framework for instance segmentation. (Diagram labels: RoIAlign, conv, class, box.)

Our method, called Mask R-CNN, extends Faster R-CNN [29] by adding a branch for predicting segmentation masks on each Region of Interest (RoI), in parallel with the existing branch for classification and bounding box regression (Figure 1). The mask branch is a small FCN applied to each RoI, predicting a segmentation mask in a pixel-to-pixel manner. Mask R-CNN is simple to implement and train given the Faster R-CNN framework, which facilitates a wide range of flexible architecture designs. Additionally, the mask branch only adds a small computational overhead, enabling a fast system and rapid experimentation.
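To make the parallel branches concrete, the sketch below post-processes the three outputs for a single RoI: class scores, box-regression deltas, and a small mask. It is a framework-free illustration under stated assumptions: box deltas use the Faster R-CNN (dx, dy, dw, dh) parameterization with class-specific deltas, one mask is predicted per class and only the mask of the winning class is kept, and masks are binarized at 0.5. The function names `apply_deltas` and `decode_roi` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]     # (x_center, y_center, width, height)
Delta = Tuple[float, float, float, float]   # (dx, dy, dw, dh)


def apply_deltas(roi: Box, deltas: Delta) -> Box:
    """Faster R-CNN style decoding: shift the center, rescale width and height."""
    x, y, w, h = roi
    dx, dy, dw, dh = deltas
    return (x + dx * w, y + dy * h, w * math.exp(dw), h * math.exp(dh))


def decode_roi(roi: Box,
               class_logits: Dict[str, float],
               box_deltas: Dict[str, Delta],
               mask_logits: Dict[str, List[List[float]]],
               threshold: float = 0.5):
    """Combine the outputs of the class, box and mask branches for one RoI."""
    # Class branch: softmax over categories, keep the most likely one.
    m = max(class_logits.values())
    exps = {k: math.exp(v - m) for k, v in class_logits.items()}
    z = sum(exps.values())
    label = max(exps, key=exps.get)
    score = exps[label] / z
    # Box branch: the deltas for the chosen class refine the RoI.
    box = apply_deltas(roi, box_deltas[label])
    # Mask branch: per-pixel sigmoid on the mask predicted for that class.
    mask = [[1 if 1.0 / (1.0 + math.exp(-v)) >= threshold else 0 for v in row]
            for row in mask_logits[label]]
    return label, score, box, mask


if __name__ == "__main__":
    label, score, box, mask = decode_roi(
        (50.0, 50.0, 20.0, 10.0),
        {"person": 2.0, "dog": 0.0},
        {"person": (0.1, 0.0, 0.0, 0.0), "dog": (0.0, 0.0, 0.0, 0.0)},
        {"person": [[3.0, -3.0]], "dog": [[0.0, 0.0]]},
    )
    print(label, round(score, 3), box, mask)  # person 0.881 (52.0, 50.0, 20.0, 10.0) [[1, 0]]
```

The three branches read the same RoI features but do not depend on each other's outputs during the forward pass; they are only combined at this decoding step, which is why adding the mask branch costs little extra computation.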

In principle Mask R-CNN is an intuitive extension of Faster R-CNN, yet constructing the mask branch properly is critical for good results. Most importantly, Faster R-CNN was not designed for pixel-to-pixel alignment between network inputs and outputs. This is most evident in how RoIPool [14, 9], the de facto core operation for attending to instances, performs coarse spatial quantization for feature extraction. To fix the misalignment, we propose a simple, quantization-free layer, called RoIAlign, that faithfully preserves exact spatial locations. Despite being a seem-

Following common terminology, we use object detection to denote detection via bounding boxes, not masks, and semantic segmentation to denote per-pixel classification without differentiating instances. Yet we note that instance segmentation is both semantic and a form of detection.
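The contrast between RoIPool's coarse quantization and the quantization-free RoIAlign described in the introduction can be shown on a toy single-channel feature map. This is a simplified sketch, assuming a feature stride of 16, RoIPool with rounded RoI coordinates and floor/ceil bin edges followed by max pooling, and RoIAlign with a single bilinear sample at the center of each output bin (a fuller RoIAlign may use several sampling points per bin); the function names are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def bilinear(feat: Grid, x: float, y: float) -> float:
    """Bilinearly interpolate `feat` at continuous coordinates (x, y)."""
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = feat[y0][x0] * (1 - ax) + feat[y0][x1] * ax
    bottom = feat[y1][x0] * (1 - ax) + feat[y1][x1] * ax
    return top * (1 - ay) + bottom * ay


def roi_align(feat: Grid, box: Box, out_size: int, stride: float = 16.0) -> Grid:
    """RoIAlign sketch: no rounding, one bilinear sample per bin center."""
    x1, y1, x2, y2 = (c / stride for c in box)  # keep fractional coordinates
    bin_w, bin_h = (x2 - x1) / out_size, (y2 - y1) / out_size
    return [[bilinear(feat, x1 + (j + 0.5) * bin_w, y1 + (i + 0.5) * bin_h)
             for j in range(out_size)] for i in range(out_size)]


def roi_pool(feat: Grid, box: Box, out_size: int, stride: float = 16.0) -> Grid:
    """RoIPool sketch: snap the RoI to the feature grid, then max-pool each bin."""
    x1, y1, x2, y2 = (int(round(c / stride)) for c in box)  # quantization 1
    h, w = max(y2 - y1 + 1, 1), max(x2 - x1 + 1, 1)
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Quantization 2: bin edges snapped with floor (start) and ceil (end).
            ys = range(y1 + (i * h) // out_size, y1 + -(-(i + 1) * h // out_size))
            xs = range(x1 + (j * w) // out_size, x1 + -(-(j + 1) * w // out_size))
            row.append(max(feat[y][x] for y in ys for x in xs))
        out.append(row)
    return out


if __name__ == "__main__":
    feat = [[float(x) for x in range(4)] for _ in range(4)]  # value = column index
    box = (20.0, 0.0, 52.0, 32.0)  # 20 / 16 = 1.25 does not fall on the grid
    print(roi_pool(feat, box, 2))   # [[2.0, 3.0], [2.0, 3.0]]
    print(roi_align(feat, box, 2))  # [[1.75, 2.75], [1.75, 2.75]]
```

Because the feature value here equals the column index, the RoIAlign output directly reports where each bin was sampled (1.75 and 2.75, inside the true RoI span of 1.25 to 3.25), while RoIPool's rounding moves the RoI onto whole grid cells before pooling, so its bins no longer line up with the original box.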

2017 IEEE International Conference on Computer Vision

DOI 10.1109/ICCV.2017.322

