Deep-Sea Detection Breakthrough: EfficientDet Improves Underwater Object Detection Performance

"本文介绍了一个名为 DeepSeaNet 的项目，该项目致力于解决水下物体检测的难题，特别是针对含颗粒和杂质的盐水环境。研究人员在 Brackish 数据集中使用了 EfficientDet、YOLOv5、YOLOv8 和 Detectron2 等不同模型进行比较。Brackish 数据集包含了在能见度较低的 Limfjorden 水域拍摄的鱼类、螃蟹、海星等水生生物的注释图像序列。通过对比，发现 EfficientDet 在 mAP (平均精度均值) 上达到了 98.56%，优于其他模型，如 YOLOv3 (31.10% mAP)、YOLOv4 (83.72% mAP)、YOLOv5 (97.6% mAP) 和 Detectron2 (95.20% mAP)。此外，文章还提出了一个改进的 BiSkFPN 机制，即带有跳跃连接的 BiFPN 颈部，以进一步提升模型性能。" 在水下物体检测领域，由于环境条件的复杂性，传统的卷积神经网络 (CNN) 方法可能会遇到困难，且计算成本高。 EfficientDet 是一种高效的检测框架，它结合了 EfficientNet 的高效架构和 BiFPN 的特征金字塔网络，能够处理不同尺度的目标，从而在 Brackish 数据集上表现出色。YOLO (You Only Look Once) 系列是另一种广泛使用的实时目标检测算法，YOLOv5 和 YOLOv8 通过不断优化提升了检测性能，但仍然不及 EfficientDet。 YOLO 系列的最新版本，如 YOLOv8，引入了更多的优化，例如改进的锚框机制和更快的训练速度，使得它在检测精度上有显著提高。然而，尽管 YOLOv8 的 mAP 达到 98.20%，EfficientDet 仍以 98.56% 的 mAP 超过它，这表明在复杂环境下的检测能力上，EfficientDet 更具优势。 Detectron2 是 Facebook AI 研究院开源的一个通用目标检测库，它基于 Caffe2 构建，支持多种先进的检测算法。尽管 Detectron2 在 Brackish 数据集上表现良好，但其 mAP 仍低于 EfficientDet，表明在特定环境下，EfficientDet 的设计可能更适合处理水下视觉任务。 BiFPN（双向特征金字塔网络）是一种用于特征融合的结构，可以有效地利用多尺度信息。在本文中提出的 BiSkFPN 机制，通过添加跳跃连接，增强了特征的上下文信息传递，这可能是提升模型在水下环境中的检测性能的关键因素。 DeepSeaNet 项目通过对比不同模型在水下物体检测任务上的性能，强调了 EfficientDet 的优势以及 BiSkFPN 对于改善水下检测效果的重要性。这些研究结果为水下环境的自动化监测和保护提供了新的技术可能性，对于未来在海洋生物学、环境保护和水下机器人等领域有着广泛的应用前景。

recent works that have used the EfficientDet algorithm along with other OSOD. Secondly, I study the FPN, PANet, and BiFPN bottlenecks used in OSOD algorithms. Finally, I study several concepts from the related research that will be used in later parts of the paper.

3.1-mynet: Improved EfficientDet using Attention Mechanism (AM) – Multiclass Focal Loss (MFL):

This work [16] proposes a new method that uses AM to dampen the effect of noise (caused by pollution, clouds, and climate) in remote sensing images. It also modifies the pooling in every layer so that it can capture tiny class-specific pixels and hence use an exhaustive feature space. This approach increases computational complexity but helps to achieve higher accuracy. The gain is attributed to the residual deformable 3-D convolution (RD3C), which extends the traditional 2-D convolution operation to better capture object deformations and variations in 3-D data (e.g., space imagery or remote sensing). The two basic operations used in the work are the 3D convolution operation and the geo-spatial deformable 3D convolution operation, which are explained in the following equations. The standard 3D convolution operation can be represented as:

$$Y(i,j,k) = \sum_{d=0}^{D-1}\sum_{h=0}^{H-1}\sum_{w=0}^{W-1} X(i+d,\, j+h,\, k+w)\, K(d,h,w)$$

where $X$ is the input volume, $Y$ is the output volume, $K$ is the convolution kernel, and $D$, $H$, and $W$ are the depth, height, and width of the kernel, respectively. In RD3C, the 3D deformable convolution operation is introduced before the standard 3D convolution, which can be represented as:

$$Z(i,j,k) = \sum_{d=0}^{D'-1}\sum_{h=0}^{H'-1}\sum_{w=0}^{W'-1} X(i+d+\Delta_d,\, j+h+\Delta_h,\, k+w+\Delta_w)\, K(d,h,w)$$

where $Z$ is the intermediate feature map obtained by the deformable convolution operation, $X$ is the input volume, $K$ is the convolution kernel, $D'$, $H'$, and $W'$ are the depth, height, and width of the offset kernel, respectively, and $\Delta = (\Delta_d, \Delta_h, \Delta_w)$ is the learnable deformation offset applied to the kernel. The deformation offset is learned from the input features using a separate convolutional operation, which can be represented as:

$$\Delta = \sum_{c=1}^{C} \omega_c \, f_c(X)$$

where $f$ is the feature extraction function applied to the input features $X$ (with $f_c$ its $c$-th channel), $\omega_c$ is the set of learnable weights associated with the $c$-th feature channel, and $C$ is the number of feature channels. RD3C allows the convolution kernel to be adaptively adjusted to the input features, which can better capture the variations and deformations in 3D data, making it well-suited for object detection tasks in high-resolution remote sensing images of oil storage tanks.
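The two operations described above can be illustrated with a minimal pure-Python sketch. It is not the implementation from [16]: the function names `conv3d` and `deformable_conv3d` are hypothetical, the offsets are fixed integers rather than values learned from the input, and samples that fall outside the volume are clamped to the border. Real deformable convolutions typically use fractional offsets with interpolation.

```python
from itertools import product


def conv3d(x, k):
    """Valid (no padding, stride 1) 3D convolution in cross-correlation form.

    x: input volume as nested lists, indexed x[depth][height][width]
    k: kernel as nested lists, indexed k[d][h][w]
    """
    D, H, W = len(k), len(k[0]), len(k[0][0])
    out_d, out_h, out_w = len(x) - D + 1, len(x[0]) - H + 1, len(x[0][0]) - W + 1
    y = [[[0.0] * out_w for _ in range(out_h)] for _ in range(out_d)]
    for p, q, r in product(range(out_d), range(out_h), range(out_w)):
        total = 0.0
        for d, h, w in product(range(D), range(H), range(W)):
            total += x[p + d][q + h][r + w] * k[d][h][w]
        y[p][q][r] = total
    return y


def _clamp(index, size):
    """Keep a sampling index inside [0, size - 1] (assumed border handling)."""
    return min(max(index, 0), size - 1)


def deformable_conv3d(x, k, offsets):
    """Same sum as conv3d, but each kernel tap samples x at a shifted position.

    offsets maps a kernel tap (d, h, w) to an integer shift (dd, dh, dw).
    ASSUMPTION: fixed integer offsets stand in for learned, fractional ones.
    """
    D, H, W = len(k), len(k[0]), len(k[0][0])
    nd, nh, nw = len(x), len(x[0]), len(x[0][0])
    out_d, out_h, out_w = nd - D + 1, nh - H + 1, nw - W + 1
    z = [[[0.0] * out_w for _ in range(out_h)] for _ in range(out_d)]
    for p, q, r in product(range(out_d), range(out_h), range(out_w)):
        total = 0.0
        for d, h, w in product(range(D), range(H), range(W)):
            dd, dh, dw = offsets[(d, h, w)]
            a = _clamp(p + d + dd, nd)
            b = _clamp(q + h + dh, nh)
            c = _clamp(r + w + dw, nw)
            total += x[a][b][c] * k[d][h][w]
        z[p][q][r] = total
    return z


# Toy 3x3x3 volume and a 2x2x2 kernel of ones.
volume = [[[i * 9 + j * 3 + m for m in range(3)] for j in range(3)] for i in range(3)]
kernel = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
no_shift = {tap: (0, 0, 0) for tap in product(range(2), range(2), range(2))}
print(conv3d(volume, kernel) == deformable_conv3d(volume, kernel, no_shift))
```

With all offsets set to zero the deformable version reduces to the standard convolution, which is a convenient sanity check when experimenting with non-zero offsets.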

3.2-Comparing YOLOv5 and EfficientDet

Mekhalfi et al. [17] first perform a comparative study and provide evidence that, even though EfficientDet yields a higher mAP, YOLOv5 can detect more examples and has better generalization capabilities. They reproduce results on EfficientDet and list the intuitions behind using BiFPN over FPN as follows:

1. Nodes with only one input edge contribute less to feature fusion. (Yellow nodes in Figure 2)

2. An extra edge ties the input node to the output node. (Green and blue edges from input to output nodes)

3. Each bidirectional path is considered as one feature layer, repeated several times to enable high-level feature fusion. (Up and down arrows in Figure 2)

Figure 2 BiFPN Feature-Fusion (Bottleneck of original EfficientDet)
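The three intuitions listed above can be sketched as a toy BiFPN layer over three pyramid levels. This is a simplified illustration, not the code of [17] or of EfficientDet itself. The weighted fusion rule (fast normalized fusion, with non-negative weights normalized by their sum) is taken from the original EfficientDet paper rather than from the text above. The names `fuse`, `bifpn_layer`, and `bifpn` are hypothetical, all levels are assumed to have the same length so that resizing and convolutions can be omitted, and the fusion weights are fixed at 1.0 instead of being learned.

```python
def fuse(features, weights, eps=1e-4):
    """Fast normalized fusion: sum_i w_i * f_i / (eps + sum_j w_j), with w_i >= 0."""
    w = [max(0.0, wi) for wi in weights]  # ReLU keeps every weight non-negative
    total = eps + sum(w)
    length = len(features[0])
    return [sum(wi * f[n] for wi, f in zip(w, features)) / total for n in range(length)]


def bifpn_layer(p3, p4, p5):
    """One bidirectional pass over three levels (P3 is the finest, P5 the coarsest)."""
    # Top-down path. P5 and P3 get no intermediate node because such a node
    # would have only one input edge (intuition 1).
    p4_td = fuse([p4, p5], [1.0, 1.0])
    p3_out = fuse([p3, p4_td], [1.0, 1.0])
    # Bottom-up path. P4 also receives its own input through an extra edge
    # from the input node to the output node (intuition 2).
    p4_out = fuse([p4, p4_td, p3_out], [1.0, 1.0, 1.0])
    p5_out = fuse([p5, p4_out], [1.0, 1.0])
    return p3_out, p4_out, p5_out


def bifpn(p3, p4, p5, repeats=3):
    """Stack the bidirectional layer several times (intuition 3)."""
    for _ in range(repeats):
        p3, p4, p5 = bifpn_layer(p3, p4, p5)
    return p3, p4, p5


# Toy features: one short vector per pyramid level.
outputs = bifpn([1.0, 0.0], [0.0, 1.0], [2.0, 2.0])
print(outputs)
```

In a real detector each fused feature would additionally be resized to the target level and passed through a convolution, and the weights passed to `fuse` would be trainable parameters.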

3.3-Automated Defect Detection: Modifying Backbone

Although Medak et al. [18] agree that object detection algorithms require large amounts of data to provide human-level accuracy, they show that EfficientDet can achieve SOTA results in realistic settings for ultrasonic and forensic defect detection. They introduce a novel mechanism for finding anchor (sliding window) sizes for OSOD, a kind of hyperparameter search. Anchors are predefined rectangles used by one-stage detectors to predict object locations and sizes. In this case, the hyperparameters are calculated using a novel procedure that considers the aspect ratio of the defects in UT images. This improves the detection of defects with extreme aspect ratios and increases the model's average precision. The approach is summarized in Algorithm 1. It involves K-means clustering with Jaccard distance to calculate new values for aspect ratios and scales, and finding the template anchor size that is most similar to the calculated shape to determine the scale factor. The final values differ greatly from the commonly used default values and were found to improve the performance of the EfficientDet model in detecting defects.
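The clustering step described above can be sketched as follows. This is a minimal illustration of K-means with Jaccard distance (1 minus IoU) on box widths and heights, not a reproduction of Algorithm 1 from [18]. The function names, the deterministic initialization, the use of the cluster mean as the centroid update, and the toy box sizes are all assumptions made for the example, and the template-anchor matching step is left out.

```python
import math


def iou_wh(a, b):
    """IoU of two boxes given as (width, height), aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=50):
    """Cluster (width, height) pairs using Jaccard distance d = 1 - IoU."""
    # ASSUMPTION: deterministic start from boxes evenly spaced by area;
    # random initialization is more common in practice.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            nearest = min(range(k), key=lambda c: 1.0 - iou_wh(box, centroids[c]))
            clusters[nearest].append(box)
        updated = []
        for i, members in enumerate(clusters):
            if members:  # move the centroid to the mean width and height
                mean_w = sum(b[0] for b in members) / len(members)
                mean_h = sum(b[1] for b in members) / len(members)
                updated.append((mean_w, mean_h))
            else:  # keep the old centroid if a cluster ends up empty
                updated.append(centroids[i])
        if updated == centroids:
            break
        centroids = updated
    return centroids


def anchor_params(centroids):
    """Turn each centroid into (aspect ratio w/h, scale sqrt(w*h))."""
    return [(w / h, math.sqrt(w * h)) for w, h in centroids]


# Hypothetical defect boxes: three elongated ones and three roughly square ones.
defect_boxes = [(100, 10), (110, 12), (90, 11), (10, 10), (12, 9), (11, 11)]
centers = kmeans_anchors(defect_boxes, k=2)
print(anchor_params(centers))
```

Using 1 minus IoU instead of Euclidean distance makes the clustering insensitive to absolute box size, so elongated defects form their own cluster and yield an aspect ratio far from the usual defaults such as 0.5, 1, and 2.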

3.4-Multilayer 3D Attention Mechanism

Combining feature fusion with multilayer attention helps to extract features under low visibility while keeping the feature channels intact for multi-scale inputs. This research work [19] proposes a method for classifying military ships in high-resolution optical remote sensing images using a multilayer feature extraction network inspired by EfficientDet trackers. In the proposed method, a multilevel attention mechanism is used to effectively extract multilayer features, and a deep feature fusion network is constructed to locate and distinguish different types of ships. In contrast, our approach for marine animal and species detection uses a modified EfficientDet network with skip connections to improve accuracy, rather than the multilevel attention mechanism proposed in [19]. Residual connections are a type of skip connection used in deep neural networks, but they have some limitations compared to standard skip connections.
