[Figure 3 panels: (a) traditional neck structure, in which level-1, level-2, and level-3 features are combined by fuse modules; (b) traditional neck; (c) our proposed neck]
Figure 3: (a) is an example diagram of the traditional neck information fusion structure. (b) and (c) are AblationCAM [38] visualizations of the traditional neck and our proposed neck, respectively.
state-of-the-art performance with single-level features. SFNet [33] aligns features from different levels with semantic flow to improve FPN performance. SAFNet [29] introduces Adaptive Feature Fusion and Self-Enhanced Modules. The method in [4] presents a parallel FPN structure for object detection with bi-directional fusion. However, due to the excessive number of paths and indirect interaction methods in these networks, previous FPN-based fusion structures still suffer from low speed, limited cross-level information exchange, and information loss.
3 Method
3.1 Preliminaries
The YOLO series neck structure, as depicted in Fig. 3, employs a traditional FPN structure comprising multiple branches for multi-scale feature fusion. However, it fully fuses features only from neighboring levels; information from other levels can only be obtained indirectly, in a ‘recursive’ manner. Fig. 3 shows the information fusion structure of the conventional FPN, where level-1, level-2, and level-3 are arranged from top to bottom and the FPN performs fusion between different levels. There are two distinct scenarios when level-1 gets information from the other two levels:
1) If level-1 seeks to utilize information from level-2, it can directly access and fuse this information.
2) If level-1 wants to use level-3 information, it must recursively call the information fusion module of the adjacent layer. Specifically, the level-2 and level-3 information must be fused first; level-1 can then obtain level-3 information only indirectly, by combining it with the fused level-2 information.
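The two scenarios can be made concrete with a toy top-down fusion path. The sketch below is illustrative only: it treats each level as a 1-D list, assumes (hypothetically) that level-1 is the finest level (length 8) and level-3 the coarsest (length 2), and uses nearest-neighbour upsampling plus element-wise addition as a stand-in for the real fusion module. The point it shows is structural: `fpn_top_down` never lets level-1 read level-3 directly.

```python
from typing import List

Feature = List[float]  # 1-D stand-in for a feature map (assumption of this sketch)


def upsample(x: Feature, factor: int) -> Feature:
    """Nearest-neighbour upsampling: repeat each element `factor` times."""
    return [v for v in x for _ in range(factor)]


def fuse(a: Feature, b: Feature) -> Feature:
    """Hypothetical fusion module: element-wise sum of two equal-length maps."""
    assert len(a) == len(b), "fusion requires aligned resolutions"
    return [x + y for x, y in zip(a, b)]


def fpn_top_down(level1: Feature, level2: Feature, level3: Feature) -> Feature:
    # Scenario 2, step 1: level-2 and level-3 must be fused first.
    fused23 = fuse(level2, upsample(level3, len(level2) // len(level3)))
    # Scenario 2, step 2: level-1 only sees level-3 through the fused level-2 map.
    return fuse(level1, upsample(fused23, len(level1) // len(fused23)))


level1 = [1.0] * 8
level2 = [10.0] * 4
level3 = [100.0, 200.0]
print(fpn_top_down(level1, level2, level3))
```

With these inputs the result is four copies of 111.0 followed by four copies of 211.0: the level-3 values arrive at level-1 only after being merged into level-2.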
This transfer mode can result in a significant loss of information during calculation. Interactions between layers can only exchange information that is selected by the intermediate layers; information that is not selected is discarded during transmission. Consequently, information at a given level can adequately assist only its neighboring layers, while its contribution to the other, non-adjacent layers is weakened. As a result, the overall effectiveness of the information fusion may be limited.
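To illustrate why relaying through an intermediate layer can discard information, the toy example below models the intermediate layer's "selection" as keeping only its k largest-magnitude channels. This top-k rule and the channel values are hypothetical choices made for the sketch, not the behaviour of any particular YOLO neck; they only show that a weak level-3 cue which the intermediate layer does not select never reaches level-1 on the recursive path, but survives when level-1 receives level-3 directly.

```python
from typing import List

Feature = List[float]  # per-channel values of one spatial location (toy assumption)


def select_top_k(x: Feature, k: int) -> Feature:
    """Hypothetical intermediate selection: keep the k largest-magnitude
    channels and zero out the rest (the 'not selected' information)."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def add(a: Feature, b: Feature) -> Feature:
    return [x + y for x, y in zip(a, b)]


level1 = [1.0, 0.0, 0.0, 0.0]
level2 = [0.0, 5.0, 4.0, 0.0]
level3 = [0.0, 0.0, 0.0, 0.5]  # weak but possibly useful cue in channel 3

# Recursive path: level-2 fuses with level-3, then relays only what it selects.
relay = select_top_k(add(level2, level3), k=2)
recursive_level1 = add(level1, relay)

# Direct path: level-3 reaches level-1 without passing through level-2's selection.
direct_level1 = add(add(level1, select_top_k(level2, k=2)), level3)

print(recursive_level1)  # channel 3 is lost
print(direct_level1)     # channel 3 survives
```

Here `recursive_level1` is `[1.0, 5.0, 4.0, 0.0]` while `direct_level1` is `[1.0, 5.0, 4.0, 0.5]`; the only difference is whether level-3 had to pass through level-2's selection.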
To avoid information loss in the transmission process of traditional FPN structures, we abandon the original recursive approach and construct a novel gather-and-distribute mechanism (GD). By using a unified module to gather and fuse information from all levels and subsequently distribute it to different levels, we not only avoid the information loss inherent in the traditional FPN structure but also enhance the neck’s partial information fusion capabilities without significantly increasing latency. Our approach thus makes more effective use of the features extracted by the backbone and can be easily integrated into any existing backbone-neck-head structure.
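The gather-and-distribute idea can be sketched in the same toy setting. The code below is a simplified illustration under stated assumptions, not the GD modules themselves: levels are 1-D lists, "gather" aligns every level to one common size with nearest-neighbour resizing and fuses them with an element-wise mean (a hypothetical stand-in for a learned fusion block), and "distribute" resizes the global feature back to each level and injects it by addition. The function names `resize`, `gather`, and `distribute` are illustrative.

```python
from typing import List

Feature = List[float]  # 1-D stand-in for a feature map (assumption of this sketch)


def resize(x: Feature, size: int) -> Feature:
    """Nearest-neighbour resize of a 1-D map to `size` elements."""
    return [x[i * len(x) // size] for i in range(size)]


def gather(levels: List[Feature], size: int) -> Feature:
    """Gather: align every level to one resolution and fuse them in a single
    module (here a plain element-wise mean, standing in for a learned block)."""
    aligned = [resize(f, size) for f in levels]
    return [sum(col) / len(aligned) for col in zip(*aligned)]


def distribute(levels: List[Feature], global_feat: Feature) -> List[Feature]:
    """Distribute: resize the global feature to each level and inject it by addition."""
    return [[a + g for a, g in zip(f, resize(global_feat, len(f)))] for f in levels]


levels = [[1.0] * 8, [10.0] * 4, [100.0, 200.0]]
global_feat = gather(levels, size=4)
outputs = distribute(levels, global_feat)
for name, out in zip(("level-1", "level-2", "level-3"), outputs):
    print(name, out)
```

Unlike the recursive path, every output level here receives a contribution computed from all three inputs in one step, so no intermediate level decides what the others may see. A real neck would replace the mean and the addition with learned operators; the routing pattern is what this sketch is meant to show.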