Implementation of real time reconfigurable embedded architecture for people counting in a crowd area


Doctoral thesis of Université Bourgogne Franche-Comté
Prepared at the University of Burgundy
Doctoral School No. 37: Engineering Sciences and Microtechnologies (école doctorale sciences pour l'ingénieur et microtechniques)
Doctorate in Instrumentation and Image Computing
N° 0 0 1

By GONG SONGCHENCHEN

Implementation of real time reconfigurable embedded architecture for people counting in a crowd area

Thesis presented and defended in Dijon on 13 November 2020.

Jury members:
BOURENNANE EL-BAY, Professor, Université Bourgogne Franche-Comté, Mirande Faculty of Sciences, France (thesis supervisor)
YANG FAN, Professor, Université Bourgogne Franche-Comté, Mirande Faculty of Sciences, France (examiner)
KECHADI M-TAHAR, Professor, University College Dublin, School of Computer Science, Ireland (reviewer)
RABAH HASSAN, Professor, Université de Lorraine, head of the MAE team, Institut Jean Lamour (UMR), France (reviewer)

Université Bourgogne Franche-Comté, 32 avenue de l'Observatoire, 25000 Besançon, France

Keywords: Texture features, Edge detection, M-MCNN, FPGA

Abstract: Crowd counting is an important research topic. People are increasingly concerned about safety. In a crowded scene, a density-monitoring system analyzes the crowd and, when the density exceeds the normal range, triggers an alert so that the surplus of people can be redirected. With such a system, an event like the Shanghai New Year stampede could be prevented from happening again. Crowd counting currently faces two major difficulties. The first is how to make the model distinguish head features more finely in densely populated areas, for example when heads overlap. The second is how to find small-scale local head features in an image with a wide range of crowd densities. The most critical issue is that an intelligent video surveillance system cannot be installed in some public places. How, then, can we estimate high-density crowd areas in order to avoid crowd stampede accidents? To address these challenges, we propose the implementation of a real-time reconfigurable embedded architecture for people counting in crowd areas.

First, our work integrates HOG and LBP features, which combines the effective identification information of multiple features and also eliminates most of the redundant information, thereby compressing the information effectively and saving storage space. Then, for crowd counting, we use multiple sources of information, namely HOG, LBP and Canny-based filtering. Through classification with a support vector machine (SVM), these sources provide separate estimates of the count and of other statistical measures. At the same time, to solve the problem of extracting scale-related features in crowd counting, we propose a new framework, M-MCNN, based on MCNN, for crowd counting on any single image. M-MCNN keeps the original three columns of convolutional neural networks with different filter sizes and replaces the fully connected layers with a convolutional layer of 1*1 filters, so the input image of the model can be of any size. Moreover, for a single individual sample, we greatly improve the learning of sample features by extracting the texture features of a single human head, and make better use of them across datasets. Finally, we implement our new framework M-MCNN on an FPGA and mount it on a drone to estimate and predict high-density crowd areas in real time. Our model achieved good results in crowd counting.
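The abstract states that replacing the fully connected layers with a convolutional layer of 1*1 filters lets the model accept an input image of any size. The block below is a minimal sketch of why that holds, not the thesis's implementation: a 1*1 filter is a per-pixel weighted sum across channels, so its weights do not depend on the height or width of the feature maps. The names conv1x1 and estimate_count, the weights, and the toy feature maps are all hypothetical, and reading the count as the sum of the density map is the usual MCNN-style convention, assumed here rather than taken from this text.

```python
from typing import List

# One feature-map channel as a 2-D grid, indexed [row][col].
FeatureMap = List[List[float]]


def conv1x1(channels: List[FeatureMap], weights: List[float],
            bias: float = 0.0) -> FeatureMap:
    """Apply one 1x1 filter: a per-pixel weighted sum across channels.

    The filter holds one weight per input channel, so the same weights
    work for any height and width (hypothetical helper, illustration only).
    """
    if not channels or len(channels) != len(weights):
        raise ValueError("need exactly one weight per input channel")
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [bias + sum(w * ch[r][c] for w, ch in zip(weights, channels))
         for c in range(width)]
        for r in range(height)
    ]


def estimate_count(density_map: FeatureMap) -> float:
    """Assumed convention: the count is the sum of the density map."""
    return sum(sum(row) for row in density_map)


# Toy weights (assumption), one per channel of the merged column features.
WEIGHTS = [0.5, 0.5, 0.25]

# The same filter is applied to a 2x2 input and to a 1x3 input.
square = [[[1.0, 0.0], [0.0, 1.0]],
          [[1.0, 0.0], [0.0, 0.0]],
          [[0.0, 0.0], [0.0, 2.0]]]
strip = [[[2.0, 0.0, 0.0]],
         [[0.0, 2.0, 0.0]],
         [[0.0, 0.0, 4.0]]]

for name, feats in (("square", square), ("strip", strip)):
    density = conv1x1(feats, WEIGHTS)
    print(name, len(density), "x", len(density[0]), estimate_count(density))
```

Because the weights are indexed only by channel, both inputs pass through the same filter and the output keeps the spatial shape of the input. A fully connected layer would instead fix the number of input pixels, which is what ties a network to one image size.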
ACKNOWLEDGEMENTS

I dedicate my research results in the computer field to my professor, El-Bay Bourennane, who enlightened, inspired and encouraged me over the past four years.

I am delighted to have had the opportunity to study at the prestigious University of Burgundy. In particular, I want to express my sincere gratitude to the ImVia lab of the University of Burgundy for funding my research, and for providing me with a good working environment and platform. I also thank China Lian Chuang Company for its financial support. Without the ImVia lab, I would not have had the opportunity to accomplish my research dream and to get to know so many talented people from around the world.
In the future, I look forward to having the honor of bringing more glory to the ImVia lab.

Furthermore, I would like to give credit to my team, my colleagues, and my family for supporting me all the time.

GONG SONGCHENCHEN
September 2020

CONTENTS

I Introduction
  1 Background
  2 Problem definition
  3 Proposed approaches
  4 Publications
  5 Dissertation outline

II Contribution A
  6 Multi-feature fusion technology for target edge detection and analysis
    6.1 Background
    6.2 Pedestrian detection and Crowd counting
      6.2.1 Pedestrian detection
      6.2.2 Crowd counting
    6.3 Fusion of texture feature and edge detection for people counting
      6.3.1 Edge detection
        6.3.1.1 Edge detection and extraction process
        6.3.1.2 Some edge detection and extraction operators
      6.3.2 Improvement of Canny
        6.3.2.1 Improved method of Canny operator
        6.3.2.2 Adaptive Canny and Median filtering
      6.3.3 Approaches based on texture features
        6.3.3.1 HOG feature based head
        6.3.3.2 LBP feature
        6.3.3.3 My first contribution: combining edge detection and features extraction for image analysis
    6.4 Experiment
    6.5 Conclusion

III Contribution B
  7 Multi-feature counting of dense crowd image based on multi-column convolutional neural network
    7.1 Definition of deep learning
    7.2 Convolutional neural network
      7.2.1 Convolution
      7.2.2 Pooling layer
        7.2.2.1 Combination of convolutional layer and pooling layer
        7.2.2.2 Dimensional change process
      7.2.3 Activation function ReLU (Rectified Linear Units)
      7.2.4 Fully connected layer
      7.2.5 Local receptive field
      7.2.6 Multi-convolution kernel
      7.2.7 Multiple convolutional layers
    7.3 Convolutional neural network applied to computer vision
      7.3.1 Crowd counting based on convolutional neural network
        7.3.1.1 MCNN method
        7.3.1.2 CNNs method
        7.3.1.3 Switch-CNN network architecture
        7.3.1.4 CSRNet architecture
        7.3.1.5 SFCN network structure
      7.3.2 Multi-feature counting of dense crowd image based on multi-column convolutional neural network
        7.3.2.1 Texture feature and target edge detection
        7.3.2.2 Strong features and density maps
      7.3.3 M-MCNN network model construction
        7.3.3.1 Crowd count based on density map
        7.3.3.2 Density map via geometry adaptive kernels
        7.3.3.3 Optimization of M-MCNN architecture
    7.4 Experiment
      7.4.1 ShanghaiTech dataset
      7.4.2 UCSD dataset
      7.4.3 WorldExpo'10 dataset
      7.4.4 GCC dataset
      7.4.5 CHDP dataset
    7.5 Conclusion

IV Contribution C
  8 Implementation of real time reconfigurable embedded architecture for people counting in a crowd area
    8.1 FPGAs
      8.1.1 FPGA Reconfiguration
        8.1.1.1 Classification of reconfiguration systems
        8.1.1.2 Structure of the reconfiguration system
        8.1.1.3 FPGA reconfiguration technology
        8.1.1.4 Application of reconfiguration technology
      8.1.2 Vivado: design tool for Xilinx FPGA
        8.1.2.1 Vivado Design Flow
        8.1.2.2 Vivado environment
    8.2 MULTI-COLUMN MULTI-FEATURE CONVOLUTIONAL NEURAL NETWORK CROWD COUNTING ARCHITECTURE IMPLEMENTED IN REAL-TIME RECONFIGURABLE EMBEDDED SYSTEM
      8.2.1 FPGA-based crowd detection and estimation
      8.2.2 Multiple hardware implementations of deep learning algorithms
        8.2.2.1 CPU hardware
        8.2.2.2 GPU hardware
        8.2.2.3 Application Specific Integrated Circuit Chip (ASIC)
        8.2.2.4 FPGA hardware
        8.2.2.5 The advantages and disadvantages of FPGA implementation of deep learning
      8.2.3 FPGA Development Board
        8.2.3.1 Introduction to Zynq UltraScale+ ZCU102
        8.2.3.2 Vivado HLS
        8.2.3.3 Framework spooNN
      8.2.4 Experiment
        8.2.4.1 Experiment design
        8.2.4.2 Experiment results
      8.2.5 Conclusion

V Conclusion, Perspectives
  9 Conclusion, Perspectives
    9.1 Conclusion
    9.2 Perspectives

I INTRODUCTION

1 BACKGROUND

China is a populous country. According to the figures announced by the Bureau of Statistics in 2019, the population of China is 1.397 billion. This is a huge number, and safety has become the issue people care about most. How, then, can security precautions be taken in such a populous country? In the 21st century, people's lives have become more and more colorful: large-scale concerts, music and dance festivals, the World Cup, New Year's fireworks and more. These entertainment activities bring us joy and make life more exciting. But do we know exactly what potential safety hazards they bring, and how to deal with them? One example is December 31, 2014, New Year's Eve in Shanghai, a day that left a deep impression on everyone. Many tourists and citizens gathered on the Bund to celebrate the upcoming new year. During the celebration, one person fell to the bottom of the walkway. Following this first incident, many others tripped and fell onto each other one after another, eventually leading to large-scale crowding and trampling.
If an alert had been issued quickly, the crowd could have been dispersed earlier, which could have prevented the Shanghai New Year stampede. Research on crowd counting is therefore attracting growing attention. If the crowd density of the current scene can be estimated accurately, an alert can be issued and corresponding security measures arranged, which can effectively reduce or avoid such incidents.

Video surveillance is an important part of security protection, and the number of people and the crowd density are important quantities of concern in video surveillance. To introduce the development history of people counting and crowd density estimation technology clearly, we start from the development of monitoring equipment.

Electronic surveillance systems began to appear in the 1970s, and the development of video surveillance technology can be divided into three stages. The fir
