recent works that have used the EfficientDet algorithm along
with other OSOD algorithms. Secondly, I study the FPN, PANet
and BiFPN bottlenecks used in OSOD algorithms. Finally, I study
several concepts from the related research that will be used in
later parts of the paper.
3.1-mynet: Improved EfficientDet using Attention
Mechanism (AM) – Multiclass Focal Loss (MFL):
This work [16] proposes a new method that uses AM to dampen
the effect of noise (caused by pollution, clouds, and climate)
in remote sensing images. It also modifies pooling in every
layer so that it can capture tiny class-specific pixels and
hence uses an exhaustive feature space. This approach increases
computational complexity but helps to achieve higher
accuracy. The gain comes from the residual deformable 3-D
convolution (RD3C), which extends the traditional 2-D
convolution operation to better capture object deformations
and variations in 3-D data (e.g., space imagery or remote
sensing). The two basic operations used in the work are the
3D convolution operation and the geo-spatial deformable 3D
convolution operation, which are further explained in the
following equations. The standard 3D convolution operation
can be represented as:

$$Y(i,j,k) = \sum_{d=0}^{D-1} \sum_{h=0}^{H-1} \sum_{w=0}^{W-1} X(i+d,\, j+h,\, k+w)\, K(d,h,w)$$

where $X$ is the input volume, $Y$ is the output volume, $K$ is
the convolution kernel, and $D$, $H$, and $W$ are the depth,
height, and width of the kernel, respectively. In RD3C, the 3D
deformable convolution operation is introduced before the
standard 3D convolution, which can be represented as:

$$Z(i,j,k) = \sum_{d=0}^{D'-1} \sum_{h=0}^{H'-1} \sum_{w=0}^{W'-1} X\big((i+d,\, j+h,\, k+w) + \Delta(d,h,w)\big)\, K(d,h,w)$$

where $Z$ is the intermediate feature map obtained by the
deformable convolution operation, $D'$, $H'$, and $W'$ are the
depth, height, and width of the offset kernel, respectively,
and $\Delta$ is the learnable deformation offset applied to the
kernel. The deformation offset is learned from the input
features using a separate convolutional operation, which can
be represented as:

$$\Delta = \sum_{c=1}^{C} w_c\, f(X)_c$$

where $f$ is the feature extraction function applied to the
input features, $w_c$ is the set of learnable weights associated
with the $c$-th feature channel, and $C$ is the number of feature
channels. RD3C allows the convolution kernel to be
adaptively adjusted to the input features, which can better
capture the variations and deformations in 3D data, making
it well-suited for object detection tasks in high-resolution
remote sensing images of oil storage tanks.
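
The two operations described above can be illustrated with a
minimal NumPy sketch. It is not the implementation of [16]: the
function names (conv3d, trilinear_sample, deform_conv3d) are
hypothetical, the offsets are passed in as a plain array instead
of being learned by a separate convolution, fractional sampling
positions are assumed to be resolved by trilinear interpolation,
and the residual connection of RD3C is omitted.

```python
import numpy as np

def conv3d(x, k):
    """Valid 3D convolution (cross-correlation form) of volume x with kernel k."""
    D, H, W = k.shape
    od, oh, ow = x.shape[0] - D + 1, x.shape[1] - H + 1, x.shape[2] - W + 1
    y = np.zeros((od, oh, ow))
    for d in range(D):
        for h in range(H):
            for w in range(W):
                # Each kernel tap contributes a shifted copy of the input.
                y += k[d, h, w] * x[d:d + od, h:h + oh, w:w + ow]
    return y

def trilinear_sample(x, p):
    """Sample x at fractional positions p (shape (..., 3)); zero outside x."""
    shape = np.array(x.shape)
    p0 = np.floor(p).astype(int)
    frac = p - p0
    out = np.zeros(p.shape[:-1])
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                corner = np.array([dz, dy, dx])
                idx = p0 + corner
                # Interpolation weight of this corner of the enclosing cell.
                wgt = np.prod(np.where(corner == 1, frac, 1.0 - frac), axis=-1)
                valid = np.all((idx >= 0) & (idx < shape), axis=-1)
                idx = np.clip(idx, 0, shape - 1)
                out += wgt * valid * x[idx[..., 0], idx[..., 1], idx[..., 2]]
    return out

def deform_conv3d(x, k, offsets):
    """Deformable 3D convolution.

    offsets has shape (D, H, W, od, oh, ow, 3): one 3D displacement per
    kernel tap and output position (assumed given here, learned in practice).
    """
    D, H, W = k.shape
    od, oh, ow = x.shape[0] - D + 1, x.shape[1] - H + 1, x.shape[2] - W + 1
    base = np.stack(
        np.meshgrid(np.arange(od), np.arange(oh), np.arange(ow), indexing="ij"),
        axis=-1,
    ).astype(float)
    y = np.zeros((od, oh, ow))
    for d in range(D):
        for h in range(H):
            for w in range(W):
                # Regular sampling position plus the per-tap offset.
                p = base + np.array([d, h, w]) + offsets[d, h, w]
                y += k[d, h, w] * trilinear_sample(x, p)
    return y
```

With all offsets set to zero, deform_conv3d reduces to conv3d,
which is a convenient sanity check for any offset-learning module
placed in front of it.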
3.2-Comparing YOLOv5 and EfficientDet
Mekhalfi et al. [17] first perform a comparative study and
provide evidence that, although EfficientDet achieves a
higher mAP, YOLOv5 detects more examples and has better
generalization capabilities. They reproduce results on
EfficientDet and list the intuitions behind using BiFPN
over FPN as follows:
1. Nodes with only one input edge contribute less to
feature fusion. (Yellow nodes in Figure 2)
2. An extra edge ties the input node to the output node.
(Green and blue edges from input to output nodes)
3. Each bidirectional path is treated as one feature
layer and is repeated several times to enable
high-level feature fusion. (Up-down arrows in Figure 2)
Figure 2 BiFPN Feature-Fusion (Bottleneck of original EfficientDet)
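
To make the fusion step at a BiFPN node concrete, the sketch
below implements the fast normalized fusion described in the
original EfficientDet paper, where each incoming edge carries a
non-negative learnable weight. The function name and the toy
inputs are hypothetical, the weights are fixed numbers instead
of trained parameters, and the resizing and convolution that
normally surround the fusion are left out.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped feature maps with normalized non-negative weights."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU on the weights
    w = w / (eps + w.sum())                                # normalize to roughly sum to 1
    return sum(wi * f for wi, f in zip(w, features))

# Toy output node with three incoming edges: the same-level input (the
# "extra edge"), the top-down intermediate node, and the lower-level output.
p_in = np.full((4, 4), 1.0)
p_td = np.full((4, 4), 2.0)
p_lower_out = np.full((4, 4), 3.0)
p_out = fast_normalized_fusion([p_in, p_td, p_lower_out], [1.0, 1.0, 1.0])
```

A node with a single input edge would pass its input through
almost unchanged under this rule, which is consistent with the
first intuition in the list above.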
3.3-Automated Defect Detection: Modifying Backbone
Although Medak et al. [18] agree that object detection
algorithms require a large amount of data to reach human-
level accuracy, they show that EfficientDet can achieve
SOTA results under realistic conditions in Ultrasonic and
Forensics defect detection. They introduce a novel mechanism
for finding anchor (sliding window) sizes for OSOD, which is
a kind of hyperparameter search. Anchors are predefined
rectangles used by one-stage detectors to predict object
locations and sizes. In this case, the hyperparameters are
calculated using a novel procedure that considers the aspect
ratio of the defects in UT images. This improves the detection
of defects with extreme aspect ratios and increases the
model's average precision. The novelty of this approach
can be explained with Algorithm 1. It involves
K-means clustering with Jaccard distance to calculate new
values for aspect ratios and scales, and finding the template
anchor size that is most similar to the calculated shape to
determine the scale factor. The final values differ greatly
from the commonly used default values and were found to
improve the performance of the EfficientDet model in
detecting defects.
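
The clustering step can be sketched as follows. This is a
simplified illustration and not the procedure of [18]: the
function names, the deterministic initialization, and the
base_size parameter are assumptions, and the template-anchor
matching is replaced by a direct conversion of each cluster
centre into an aspect ratio and a scale. Boxes are given as
(width, height) pairs, and the Jaccard distance is 1 - IoU with
all boxes aligned at a common corner.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """Pairwise IoU between (w, h) boxes and (w, h) centroids, corner-aligned."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)

def kmeans_jaccard(boxes, k, iters=100):
    """K-means over box shapes using Jaccard distance (1 - IoU)."""
    boxes = np.asarray(boxes, dtype=float)
    # Assumed initialization: k boxes evenly spread over the sorted aspect ratios.
    order = np.argsort(boxes[:, 0] / boxes[:, 1])
    picks = np.linspace(0, len(boxes) - 1, k).round().astype(int)
    centroids = boxes[order[picks]].copy()
    for _ in range(iters):
        labels = (1.0 - iou_wh(boxes, centroids)).argmin(axis=1)
        new = np.array([
            boxes[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = (1.0 - iou_wh(boxes, centroids)).argmin(axis=1)
    return centroids, labels

def anchor_params(centroids, base_size):
    """Convert cluster centres to aspect ratios (w / h) and scales.

    base_size is a hypothetical side length of the square template anchor.
    """
    ratios = centroids[:, 0] / centroids[:, 1]
    scales = np.sqrt(centroids[:, 0] * centroids[:, 1]) / base_size
    return ratios, scales
```

Using 1 - IoU instead of Euclidean distance keeps large boxes
from dominating the clustering, which matters when the defects
have extreme aspect ratios.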
3.4-Multilayer 3D Attention Mechanism
The combination of feature fusion with multilayer attention
helps to extract features from low-visibility inputs while
keeping the feature channels intact for multi-scale inputs.
This research work [19] proposed a method for classifying
military ships from high-resolution optical remote sensing
images using a multilayer feature extraction network inspired
by EfficientDet trackers. In the proposed method, a multilevel
attention mechanism was used to effectively extract
multilayer features, and a deep feature fusion network was
constructed to locate and distinguish different types of ships.
In contrast, our approach for marine animal and species
detection uses a modified EfficientDet network with skip
connections to improve accuracy, rather than the multilevel
attention mechanism of [19]. Residual connections are a type of skip
connection used in deep neural networks, but they have
some limitations compared to standard skip connections.