Robust Real-Time Pedestrian Detection on Embedded Devices
Mohamed Afifi∗, Yara Ali, Karim Amer, Mahmoud Shaker, and Mohamed Elhelw
Center for Informatics Science, Nile University, Giza, Egypt
ABSTRACT
Detection of pedestrians on embedded devices, such as those on board robots and drones, has many applications
including road intersection monitoring, security, crowd monitoring and surveillance, to name a few. However,
the problem can be challenging due to the continuously changing camera viewpoint and varying object appearances,
as well as the need for lightweight algorithms suitable for embedded systems. This paper proposes a robust
framework for pedestrian detection in video footage. The framework performs fine and coarse detections on
different image regions and exploits temporal and spatial characteristics to attain enhanced accuracy and real-time
performance on embedded boards. The framework uses the Yolo-v3 object detector [1] as its backbone
and runs on the Nvidia Jetson TX2 embedded board; however, other detectors and/or boards can be
used as well. The performance of the framework is demonstrated on two established datasets, where it achieved
second place in the CVPR 2019 Embedded Real-Time Inference (ERTI) Challenge.
Keywords: Pedestrian detection, UAV, real-time inference.
1. INTRODUCTION
Numerous deep learning architectures have been proposed since Krizhevsky et al. [2] trained a neural network model
of multiple convolutional and feedforward layers on a large dataset of images for object classification.
One family of these architectures is designed for object detection,
which entails predicting bounding boxes that enclose objects of interest in a given image. State-of-the-art
approaches for this task can be roughly divided into two categories. The first includes two-stage models such as
R-CNN [3], Fast R-CNN [4], Faster R-CNN [5] and SPP-net [6], which propose candidate regions and then process and
classify those regions. The second category comprises single-stage models such as Yolo [7] and SSD [8]. Two-stage
object detection models achieve higher accuracy but suffer from slow inference due to their demanding computations;
single-stage models, on the other hand, are faster at the cost of lower accuracy.
In order to deploy the above models on board embedded devices, two important aspects must be taken into
consideration. First, typical embedded devices have limited computational power. Second, a sequence of images
(i.e. video) must be processed. Recent work has aimed to address these constraints by creating lightweight versions
of the original models, such as Tiny-Yolo [1] and SSD300 [8]. Other approaches such as MobileNet [9] and ShuffleNet [10]
employ efficient network designs that attain higher frame rates. Lu et al. [11] incorporated a Long Short-Term
Memory (LSTM) model to make use of the spatio-temporal relation among consecutive frames in a video, while
Broad et al. [12] added a convolutional recurrent layer to the SSD architecture to fuse temporal information.
This paper proposes a novel framework for robust real-time pedestrian detection in videos captured above
street level, such as those from pole-mounted security cameras. The framework uses Yolo-v3 as its backbone
detector but works with other detectors that have similar characteristics. It exploits temporal information in videos while
performing real-time inference by combining deep learning models pre-trained on large-scale datasets of single
images. Multiple input resolutions are used to perform robust pedestrian detection with high throughput,
making the framework suitable for real-time operation on the Nvidia Jetson TX2 and similar embedded boards.
Figure 1 shows an example where the proposed framework clearly achieves improved results compared to the
Yolo-v3 detector.
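To make the multi-resolution idea concrete, the following minimal Python sketch runs a backbone detector twice per frame: a coarse pass over the whole frame at a small input size, and a fine pass over a high-resolution sub-region, with the two detection sets merged by greedy non-maximum suppression. This is an illustrative sketch under stated assumptions, not the paper's exact pipeline: detector is a hypothetical wrapper around a Yolo-v3 model that returns (x1, y1, x2, y2, score) boxes in the coordinates of the image it is given, and fine_region is an assumed region of interest.

def box_iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2, score) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def coarse_fine_detect(frame, detector, fine_region,
                       coarse_size=320, fine_size=608, iou_thresh=0.5):
    # Coarse pass: the whole frame at a small input size catches
    # large (near) pedestrians cheaply.
    detections = list(detector(frame, coarse_size))
    # Fine pass: crop the region where pedestrians appear small
    # (e.g. the far part of the scene) and detect at a larger input size.
    x0, y0, x1, y1 = fine_region
    for bx1, by1, bx2, by2, score in detector(frame[y0:y1, x0:x1], fine_size):
        # Map crop coordinates back to full-frame coordinates.
        detections.append((bx1 + x0, by1 + y0, bx2 + x0, by2 + y0, score))
    # Greedy non-maximum suppression merges duplicates found by both passes.
    detections.sort(key=lambda d: d[4], reverse=True)
    kept = []
    for d in detections:
        if all(box_iou(d, k) < iou_thresh for k in kept):
            kept.append(d)
    return kept

On an embedded board, the coarse pass keeps the per-frame cost low while the fine pass recovers small, distant pedestrians; one plausible use of the temporal information the paper mentions is to decide, from recent detections, where the fine-pass region should be placed in the next frame.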
∗Indicates equal contribution
Further author information: (Send correspondence to Karim Amer)
Karim Amer: E-mail: k.amer@nu.edu.eg
https://sites.google.com/site/uavision2019/