Road Safety in Austria: A Practical Evaluation of the YOLO and DETR Deep Learning Detection Algorithms


Updated 2024-06-13, 18.56 MB PDF


First qualitative observations on deep learning vision model YOLO and DETR for automated driving in Austria (A PREPRINT)

Figure 1: The first-generation YOLO architecture [18].

(adding an improvement in mAP of about 4%). YOLO used arbitrary bounding boxes; with v2, bounding boxes based on anchor box types were proposed, with defined offsets to these anchor boxes, to maintain the generic capabilities (improving the recall). Typically, objects like a standing person or a car have a defined box ratio. In contrast to YOLO, YOLOv2 predicts the classes for each bounding box and not for each cell. With k-means clustering of the anchor box dimensions, a data-driven selection of anchor boxes (compared to hand selection) was achieved based on IoU. Furthermore, the offsets of the anchor boxes were constrained by their distance from the cell centroid. In addition, capabilities for fine-grained features and multi-scale training added to YOLOv2's performance. A detailed discussion can be found in [ ]. In YOLOv3 [ ], multi-label classification was used, since some classes are not mutually exclusive (person, pedestrian, child, ...). In doing so, the softmax operation is avoided and the classification loss is now based on binary cross-entropy. YOLOv3 makes three predictions per location at different resolution levels: one at the last feature map layer, one that takes features from two layers back and upsamples them by a factor of two, and a third that goes back another two layers and upsamples again by a factor of two. YOLOv3 gained significant capabilities for detecting small objects [ ].
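The difference between softmax and the per-class binary cross-entropy used for multi-label classification can be illustrated with a small numerical sketch. The class list, logits and targets below are hypothetical values chosen for illustration and are not taken from any YOLO implementation; the sketch only shows that independent sigmoids can assign high scores to several overlapping classes at once, whereas softmax forces the classes to compete.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to an independent class probability."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Numerically stable softmax; the outputs sum to one."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(logits, targets, eps=1e-12):
    """Sum of independent per-class binary cross-entropy terms."""
    loss = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        loss -= t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps)
    return loss


# Hypothetical class list and logits for a single predicted box.
CLASSES = ["person", "pedestrian", "child", "car"]
logits = [4.0, 3.5, -2.0, -5.0]
targets = [1.0, 1.0, 0.0, 0.0]  # the box is both "person" and "pedestrian"

sigmoid_scores = [sigmoid(z) for z in logits]  # each class scored on its own
softmax_scores = softmax(logits)               # classes compete for probability mass
multi_label_loss = binary_cross_entropy(logits, targets)

for name, s_sig, s_soft in zip(CLASSES, sigmoid_scores, softmax_scores):
    print(f"{name:<11} sigmoid={s_sig:.3f} softmax={s_soft:.3f}")
print(f"multi-label BCE loss: {multi_label_loss:.3f}")
```

With these values, both "person" and "pedestrian" receive sigmoid scores above 0.9, while softmax has to split the probability mass between them.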

Additional improvements in usability and functionality were added with version 5 [ ] (integrates the anchor-free and objectness-free split head) and version 8 [ ]. Version 8 can also be used for instance segmentation, human pose (skeleton) prediction and classification. To conclude, YOLO's real-time capabilities and easy-to-handle model architecture are crucial for rapid object detection in autonomous driving scenarios.
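The data-driven anchor selection described for YOLOv2 in this section can be sketched as a k-means clustering over box widths and heights in which the distance is 1 - IoU, so that assigning a box to its nearest centroid means picking the centroid with the highest IoU. The box dimensions, the number of clusters and the function names below are hypothetical and serve only as a minimal illustration, not as the procedure of any specific YOLO release.

```python
import random


def wh_iou(box, centroid):
    """IoU of two (width, height) boxes aligned at a shared corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=50, seed=0):
    """Cluster (width, height) pairs with distance d = 1 - IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is equivalent to smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: wh_iou(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                w = sum(b[0] for b in members) / len(members)
                h = sum(b[1] for b in members) / len(members)
                new_centroids.append((w, h))
            else:
                new_centroids.append(centroids[i])  # keep centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])


# Hypothetical ground-truth box sizes in pixels: tall pedestrians and wide cars.
boxes = [(28, 88), (30, 90), (32, 94), (31, 86),
         (118, 60), (122, 58), (120, 64), (125, 62)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors)
```

On this toy data the two resulting anchors reflect the two typical box ratios mentioned above: a tall, narrow anchor for standing persons and a wide, flat anchor for cars.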

2.2 RT-DETR

DETRs have achieved remarkable performance in object detection tasks. Initially, their high computational cost limited their practical usage. In particular, the benefit of requiring no post-processing such as non-maximum suppression (NMS) was offset by the computational cost, preventing the original DETRs from becoming the new state of the art (SOTA) for real-time object detection. RT-DETR was developed to solve the above-mentioned problem of high computational cost [ ]. In [ ], it was shown how the IoU threshold for admissible bounding boxes changes the number of remaining prediction bounding boxes for YOLOv5 and YOLOv8.
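The influence of the IoU threshold on the number of remaining boxes can be made concrete with a minimal greedy non-maximum suppression sketch. The boxes, scores and thresholds below are hypothetical, and the plain-Python loop is an illustration only; it is not the optimized NMS used in YOLOv5 or YOLOv8.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold):
    """Greedy NMS: keep a box only if it does not overlap a kept, higher-scoring box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical predictions: three overlapping boxes and one isolated box.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (3, 0, 13, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7, 0.6]

remaining = {t: nms(boxes, scores, t) for t in (0.5, 0.6, 0.9)}
for threshold, kept in remaining.items():
    print(f"IoU threshold {threshold}: {len(kept)} boxes remain -> {kept}")
```

A higher IoU threshold suppresses fewer boxes, so more boxes remain. Each candidate is compared against every box kept so far, so the cost of this loop grows with the number of boxes that survive.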

Depending on the number of remaining prediction bounding boxes, non-maximum suppression takes a significant execution time (which depends on the IoU threshold hyperparameters). This motivates the use of DETRs; an overview of the architecture is given in Fig. 2. First, the big picture of RT-DETR is discussed. As described in [ ], RT-DETR consists of a backbone, a hybrid encoder and a transformer decoder with auxiliary prediction heads. The features of the last three stages of the backbone, {S3, S4, S5}, are fed as input into the encoder. The efficient hybrid encoder processes multiscale features by decoupling intra-scale feature interaction (AIFI) from cross-scale feature fusion (CCFM). The details of the hybrid encoder (which removes redundant operations of existing encoders) can be found in [ ]. After the encoder, the results are processed by IoU-aware query selection. This focuses the model on the most relevant objects in the scene by avoiding non-relevant parts, and therefore enhances the detection accuracy. The IoU-aware query selection constrains the model to produce high classification scores for features with high IoU scores and low classification scores for features with low IoU scores during training [ ]. Finally, the decoder predicts outputs to generate boxes and confidence scores. This design reduces computational costs and allows for real-time object detection on accelerated backends, outperforming other real-time object detectors (see Fig. 3).
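The idea behind IoU-aware query selection can be sketched in two small steps: a training signal that pulls the classification score of a feature towards its IoU, and a selection step that keeps the features with the highest classification scores as queries for the decoder. The soft-target binary cross-entropy, the function names and all numbers below are simplifying assumptions made for illustration; the actual RT-DETR objective and implementation are more involved.

```python
import math


def iou_aware_bce(score, iou, eps=1e-12):
    """Binary cross-entropy between a classification score and an IoU soft target.

    Assumption: the IoU of the matched box is used directly as the target,
    so the loss is smallest when the score equals the IoU.
    """
    return -(iou * math.log(score + eps) + (1.0 - iou) * math.log(1.0 - score + eps))


def select_queries(class_scores, k):
    """Return the indices of the k encoder features with the highest class score."""
    ranked = sorted(range(len(class_scores)), key=lambda i: class_scores[i], reverse=True)
    return ranked[:k]


# Hypothetical IoU values of four encoder features with their best-matching object.
ious = [0.90, 0.20, 0.70, 0.05]

# Scores that track the IoU are penalized less than scores that ignore it.
aligned_scores = [0.88, 0.15, 0.72, 0.05]
misaligned_scores = [0.50, 0.90, 0.50, 0.90]
aligned_loss = sum(iou_aware_bce(s, q) for s, q in zip(aligned_scores, ious))
misaligned_loss = sum(iou_aware_bce(s, q) for s, q in zip(misaligned_scores, ious))

# Once scores track the IoU, selecting by score also selects well-localized features.
selected = select_queries(aligned_scores, k=2)
print(aligned_loss, misaligned_loss, selected)
```

In this toy setting the two selected features are exactly those with the highest IoU, which is the behaviour the training constraint described above is meant to encourage.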

A graphical explanation can be found here.

An implementation of RT-DETR is available at github.com/lyuwenyu/RT-DETR
