A New Advance in RGB-T Semantic Segmentation: MEFNet Improves Performance with a Multi-Expert Fusion Strategy


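MEFNet is an RGB-T semantic segmentation network built around a multi-expert fusion strategy that integrates information from the RGB (visible-light) and T (thermal) modalities to improve segmentation performance. RGB-T semantic segmentation is a branch of computer vision concerned with accurate object classification and scene understanding on data captured by more than one sensor.

The design of MEFNet rests on two key components: modal weight and channel attention. The modal weight assigns a different importance to the features of each modality, so that each modality contributes effectively during fusion. Channel attention helps the network focus on the most discriminative feature channels, improving the expressiveness and discriminability of the features. Together, they reduce redundant information between the modalities and make the model more adaptable to complex environments.

With this fusion strategy, MEFNet integrates RGB and thermal data efficiently and achieves a significant improvement over current state-of-the-art methods. On the IRSEG dataset, MEFNet reaches 72.8% mean accuracy (mAcc) and 62.6% mean intersection over union (mIoU). This points to strong performance in real-world scenes, especially at night, in low light, or under occlusion, where thermal data supplies important complementary information.

The MEFNet research team comes from the University of Electronic Science and Technology of China and Nanjing University of Posts and Telecommunications, from the School of Optoelectronic Science and Engineering and the School of Communications and Information Engineering respectively. The work combines a theoretical contribution with engineering experience from practical applications, and it illustrates the role of cross-disciplinary collaboration in advancing the technology.

Overall, MEFNet is a forward-looking framework: by introducing a multi-expert fusion strategy, it improves the accuracy and robustness of RGB-T semantic segmentation and opens new possibilities for multi-modal visual understanding in real environments. The result has practical value for autonomous driving, drone surveillance, security, and related fields.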
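The mAcc and mIoU figures quoted above are both derived from a per-class confusion matrix. The following is a minimal sketch using the standard definitions; the function name `mean_acc_and_miou` and the toy matrix are illustrative assumptions, and the paper's exact evaluation protocol (for example, how unlabeled pixels are handled) is not stated in this excerpt.

```python
def mean_acc_and_miou(conf):
    """Compute (mAcc, mIoU) from a square confusion matrix.

    conf[i][j] is the number of pixels whose true class is i and whose
    predicted class is j. Classes that never occur are skipped so that
    they do not cause a division by zero (an assumption of this sketch).
    """
    n = len(conf)
    accs, ious = [], []
    for k in range(n):
        tp = conf[k][k]                            # correctly labeled pixels of class k
        gt = sum(conf[k])                          # pixels whose true class is k
        pred = sum(conf[i][k] for i in range(n))   # pixels predicted as class k
        union = gt + pred - tp
        if gt > 0:
            accs.append(tp / gt)                   # per-class accuracy (recall)
        if union > 0:
            ious.append(tp / union)                # per-class intersection over union
    return sum(accs) / len(accs), sum(ious) / len(ious)


# Toy two-class example (hypothetical numbers, not results from the paper).
toy_conf = [[8, 2],
            [1, 9]]
macc, miou = mean_acc_and_miou(toy_conf)
```

Because IoU also penalizes false positives in its denominator, mIoU is typically lower than mAcc on the same predictions, which is consistent with the gap between the two reported numbers.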

To fuse the features, RTFNet (Sun et al., 2019b) and FuseSeg (Sun et al., 2020) simply add the features of the two modalities element-wise before further processing. MFNet (Ha et al., 2017) adopts concatenation to fuse RGB-Thermal features. Neither element-wise summation nor concatenation is a good strategy for feature fusion, since both fuse the cross-modal features uniformly. There are two kinds of cross-modal information: complementary information and conflicting information. Complementary information has a positive impact on segmentation, while conflicting information acts like noise and has a negative impact. Both complementary and conflicting information are spatially distributed. Here, we use a compound attention mechanism consisting of modal weight and channel attention to enhance complementary features and discard noisy features. The modal weight is a categorical distribution over the modalities and sums to 1. Each modality acts as an expert branch of the multi-expert network. SSMA (Valada et al., 2020) also uses a multi-expert network to generate weights for different modalities, but it generates a weight map for each feature channel, which causes unnecessary computation and increases the complexity of the network.
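The compound attention described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function `fuse`, the inputs `modal_logits` and `channel_logits`, the softmax over modalities, and the sigmoid gate for channel attention are all assumptions made for illustration. The only properties taken from the text are that the modal weight is a distribution over modalities that sums to 1, that it varies spatially, and that channel attention is a per-channel vector.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse(feat_rgb, feat_t, modal_logits, channel_logits):
    """Fuse two (C, H, W) feature maps with modal weight and channel attention.

    modal_logits:   (2, H, W) hypothetical per-pixel scores, one map per modality.
    channel_logits: (2, C)    hypothetical per-channel scores, one vector per modality.
    """
    # Modal weight: a categorical distribution over the two modalities at
    # every spatial position, so the two maps sum to 1 pixel by pixel.
    modal_w = softmax(modal_logits, axis=0)               # (2, H, W)
    # Channel attention: one gate in (0, 1) per channel and modality
    # (the sigmoid is an assumption of this sketch).
    chan_att = 1.0 / (1.0 + np.exp(-channel_logits))      # (2, C)
    feats = np.stack([feat_rgb, feat_t])                  # (2, C, H, W)
    # Broadcast the spatial weights over channels and the channel gates
    # over space, then sum the two expert branches.
    weighted = feats * modal_w[:, None, :, :] * chan_att[:, :, None, None]
    return weighted.sum(axis=0), modal_w


# Toy inputs with hypothetical shapes: 4 channels on a 3x3 grid.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 3, 3))
thermal = rng.standard_normal((4, 3, 3))
fused, modal_w = fuse(rgb, thermal,
                      rng.standard_normal((2, 3, 3)),
                      rng.standard_normal((2, 4)))
```

In a trained network the two sets of logits would be predicted from the features themselves. Element-wise summation is the special case in which every weight is the same constant everywhere, which is exactly the uniform fusion the text argues against.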

As shown in Table 1, unlike the uniform fusion of RTFNet, FuseSeg, and MFNet, our modal weight and channel attention explicitly model the validity of each modality and of each channel. Unlike GMNet (Zhou et al., 2021) and SSMA (Valada et al., 2020), which redundantly calculate a weight map for each channel, we decompose the weight matrix into a more essential modal weight matrix and a channel attention vector, allowing the model to focus on the essential factors that cause differences in information validity. This decomposition also reduces the amount of computation required for fusion.
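To get a rough sense of the saving from the decomposition, the sketch below counts how many fusion weight values each scheme has to produce. The helper names and the feature-map size are hypothetical, and the count covers only the weights themselves, not the cost of the layers that predict them, so it should be read as an order-of-magnitude illustration rather than a measurement from the paper.

```python
def per_channel_weight_count(num_modalities, channels, height, width):
    """One weight per modality, channel, and pixel (a full weight map per channel)."""
    return num_modalities * channels * height * width


def decomposed_weight_count(num_modalities, channels, height, width):
    """One spatial weight map plus one channel vector per modality."""
    return num_modalities * height * width + num_modalities * channels


# Hypothetical feature map: 2 modalities, 64 channels, 120 x 160 pixels.
full = per_channel_weight_count(2, 64, 120, 160)
decomposed = decomposed_weight_count(2, 64, 120, 160)
ratio = full / decomposed
```

For these assumed sizes the decomposed form needs 38,528 values instead of 2,457,600, roughly 64 times fewer; the ratio approaches the channel count as the spatial size grows.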

3. Method

3.1. Architecture

An overview of the proposed MEFNet is shown in Figures 1 and 2. Considering that RGB and thermal images have different characteristics, MEFNet uses two independent encoders, both multi-scale convolutional attention networks (MSCAN) (Guo et al., 2022), to extract RGB and thermal features. The features obtained at different stages of the encoders contain different information. As the encoding stage increases, the extracted features become more abstract and high-level. Low-level features generally include low-level information such as the shape and outline of the object, and high-level features are directly related to the categories of
