Intelligent System Applications and Accuracy Assessment Based on CNN Image Classification


Intelligent Systems with Applications 17 (2023) 200172

Assessing operational accuracy of CNN-based image classifiers using an oracle surrogate

Antonio Guerriero a,*, Michael R. Lyu b, Roberto Pietrantuono a, Stefano Russo a

a University of Napoli Federico II, Italy
b The Chinese University of Hong Kong, Hong Kong

* Corresponding author.
E-mail addresses: antonio.guerriero@unina.it (A. Guerriero), lyu@cse.cuhk.edu.hk (M.R. Lyu), roberto.pietrantuono@unina.it (R. Pietrantuono), stefano.russo@unina.it (S. Russo).
Journal homepage: www.journals.elsevier.com/intelligent-systems-with-applications
https://doi.org/10.1016/j.iswa.2022.200172
Received 28 July 2022; Received in revised form 25 November 2022; Accepted 24 December 2022; Available online 2023
2667-3053/© 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-NC/4.0/).

ARTICLE INFO

Keywords: Image classification; Machine learning; Convolutional Neural Networks; Accuracy assessment; Oracle problem

ABSTRACT

Context: Assessing the accuracy in operation of a Machine Learning (ML) system for image classification on arbitrary (unlabeled) inputs is hard. This is due to the oracle problem, which impacts the ability to automatically judge the output of the classification, and thus hinders the accuracy of the assessment when unlabeled inputs are submitted to the system.

Objective: We propose the Image Classification Oracle Surrogate (ICOS), a technique to automatically evaluate the accuracy in operation of image classifiers based on Convolutional Neural Networks (CNNs).

Method: To establish whether the classification of an arbitrary image is correct or not, ICOS leverages three knowledge sources: operational input data, training data, and the ML algorithm. Knowledge is expressed through likely invariants, i.e., properties which should not be violated by correct classifications. ICOS infers and filters invariants to improve the correct detection of misclassifications, reducing the number of false positives. We evaluate ICOS experimentally on twelve CNNs using the popular MNIST, CIFAR10, CIFAR100, and ImageNet datasets. We compare it to two alternative strategies, namely cross-referencing and self-checking.

Results: Experimental results show that ICOS performs better than the other strategies in terms of accuracy, showing higher stability over a variety of CNNs and datasets with different complexity and size.

Conclusions: ICOS likely invariants are shown to be effective in automatically detecting misclassifications by CNNs used in image classification tasks when the expected output is unknown; ICOS ultimately yields faithful assessments of their accuracy in operation. Knowledge about input data can also be manually incorporated into ICOS, to increase robustness against unexpected phenomena in operation, like label shift.

1. Introduction

Machine Learning (ML) systems are today an integral part of many applications, owing to their ability to reach the same level as human beings, or even to outperform them (Kühl et al., 2020; He et al., 2015; Silver et al., 2017), in many tasks, as in the image classification (IC) domain. An ML system is "a software system including one or more components that learn how to perform a task from a given data set" (…, 2020). The learning components are based on ML models. The main performance indicator of such models is accuracy, namely the number of correctly classified images out of the total. The accuracy of an ML model depends on several factors, such as the data, the training procedure itself, and the actual process in operation.

The operational accuracy is measured on an arbitrarily large set of operational data and depends on the model. However, the correct labels for operational data are generally unknown, and the most reliable approach to define the ground truth is still manual labeling; this is because of the well-known oracle problem (Murphy et al., 2007).

Evaluating ML-based IC systems on a large number of arbitrary input images without an automatic oracle, that is, by manually checking that each image is correctly classified, is clearly expensive. The literature offers two main strategies to address this problem: (a) sampling a conveniently small subset from the operational dataset according to a certain belief (e.g., selecting the samples most representative of the whole operational dataset (Li et al., 2019), or selecting those most likely to lead to failing samples (Guerriero et al., 2021)), and then manually labeling only such inputs to get an estimate of the expected accuracy on the whole operational dataset; (b) exploiting ML algorithms and statistical techniques to automatically detect when an input image is misclassified.

Our work focuses on the latter strategy, which promises to save the cost of manually labeling the selected samples. A common approach for such a strategy is to build oracles through cross-referencing, as in multiple-implementation testing (Srisakaokul et al., 2018), which detects misclassifications by majority voting. The various techniques of this type differ in the way the multiple models are derived: one can, for instance, train different models on the same training set (Srisakaokul et al., 2018; Pei et al., 2017), or use the intermediate models produced during training (Wang et al., 2020). A sort of cross-referencing oracle is also used by SelfChecker (Xiao et al., 2021), which monitors the outputs of a Deep Neural Network (DNN) and triggers an alarm if the internal layer features of the model are inconsistent with the final prediction.

Effectively detecting failures in an automated way with an acceptable false positive rate is probably the trickiest issue of oracles. The oracles above rely on the knowledge encoded in the training set and/or in the internal structure of the model (single neurons or layer outputs). However, in operation, well-known phenomena like concept drift (Tsymbal, 2004), distribution shift and label shift (Garg et al., 2020) can strongly affect the accuracy of a model, since the model is called to operate under conditions that deviate from the assumptions made during training. In those cases, such knowledge becomes less effective as a source to build an automatic oracle, as we show in the experiments.

To address this problem, emerging ML system life cycles like MLOps (Alla & Adari, 2021) foresee specialized teams involving both software and operations engineers. They have to ensure the correct behavior taking into account the characteristics of the actual execution environment and the operational domain knowledge, collected during active monitoring and exploited to contrast the above-mentioned phenomena.

We present ICOS (Image Classification Oracle Surrogate), a technique to address the oracle problem when assessing the operational accuracy provided by ML-based IC systems. It consists of an oracle surrogate that judges whether the IC program under test correctly classifies an arbitrary input image whose label is unknown. The ICOS automatic oracle aims to be robust to operational changes by: (i) considering multiple information sources, including, besides the training set and the ML algorithm, the operational domain knowledge; (ii) making the knowledge in the training set more robust to changes, in order to balance the occurrence of false positives and maximize the number of true positives.

ICOS derives a set of likely invariants, representing properties that all correct outputs should preserve, from three knowledge sources:

• Operational input data: the invariants from the operational input (called input-data-dependent invariants) encode the operational domain knowledge as rules defined by domain experts on the input provided to the ML model; the resulting invariants are then automatically checked for violations.
• Training data: training-data-dependent invariants are automatically extracted from the training data, so as to characterize the expected behavior of the ML model.
• ML algorithm: algorithm-dependent invariants capture information about how the output is computed by the ML algorithm.

When any invariant is violated, ICOS labels the test output as Fail, otherwise as Pass. The implementation of ICOS is publicly available on GitHub (https://github.com/ICOS-OAA/ICOS.git).

A recent work from Google highlights the need for, and the importance of, domain knowledge expressed as a set of rules to improve training (Choudhary, 2022). In line with this related work, we integrate the domain knowledge into ICOS, using input-dependent invariants, to assess the operational accuracy of CNNs. The goal is to create an automated oracle that is more effective than the state of the art at assessing the accuracy of a CNN during operation.

We evaluate ICOS on twelve convolutional neural networks (CNNs), the most popular and best-performing ML-based IC solutions (Sharma et al., 2018). We compare ICOS to cross-referencing oracles (CRO; e.g., Srisakaokul et al., 2018) and to SelfChecker (Xiao et al., 2021). The experimental datasets are MNIST (LeCun & Cortes, 2010), CIFAR10 and CIFAR100 (Krizhevsky, 2009), and ImageNet (Deng et al., 2009), all widely used in IC. We study the accuracy estimation considering the contribution of the different types of invariants, the sensitivity to the invariant selection criteria, and the robustness of the oracle surrogate in the presence of label shift.

Results show that ICOS can faithfully estimate the accuracy provided by CNNs in the operational environment, ultimately outperforming CRO and SelfChecker. All three types of invariants contribute to detecting mispredictions, but a fine selection of the invariants influences the obtained results: selecting more invariants increases the number of correctly detected mispredictions, but at the price of more false positives. Finally, the performance is shown to be more robust than the baselines with respect to unexpected phenomena like label shift, with an error reduction in the presence of shift that reaches two orders of magnitude in the best cases.

2. Related work

We analyze related research on the operational accuracy assessment of ML systems, with specific reference to CNNs for image classification.

Significant research effort has been devoted in recent years to the quality evaluation of ML systems (…, 2020), yet few works concern the assessment of the accuracy provided in the operational environment. In fact, researchers have primarily focused on testing ML systems, with the main aim of exposing mispredictions, namely of spotting as many failing behaviors as possible (…, 2019; Ma, Juefei-Xu, et al., 2018; Zhang et al., 2018; Ma, Zhang, Xue, et al., 2018; Odena & Goodfellow, 2019). Exposing failures of this type through a test (and debug) process is meant to yield an improved model with higher accuracy. This resembles what is called debug testing in the traditional testing literature (Frankl et al., 1998). Clearly, as in traditional debug testing, the results of testing are not necessarily related to the accuracy in operation and cannot be used for operational accuracy assessment, since the test data may not be representative of the actual operational context. This happens when test data are generated artificially (as in adversarial examples generation) or when they are natural but different from the inputs observed in the field. The resulting number of exposed mispredictions and/or the coverage achieved give only an "indirect" indication of the expected accuracy in operation, and ultimately of the confidence that can be placed in the system, but no quantitative estimate is given.

To estimate the accuracy in operation, there are two main strategies:

• Sampling a subset of the operational input dataset to be manually labeled, and then using it to estimate the accuracy. The idea is to select as small a number of inputs as possible from which an accurate and stable (i.e., with small variance) estimate is obtained (Li et al., 2019; Guerriero et al., 2021; Zhao et al., 2022). This mirrors operational testing for conventional (not ML-based) systems (Musa, 1996; Pietrantuono & Russo, 2016).
• Exploiting ML algorithms and statistical techniques to automatically detect failures in operation. The idea is to evaluate the outputs automatically, namely to implement an oracle, so as to avoid the need to manually label the inputs (…, 2020).

As the cost of manual labeling can be high and does not scale, this work focuses on the second solution, which also allows an online evaluation of the operational accuracy. The rest of the section focuses on the literature on automated oracles.

The oracle problem in ML testing is one of the main challenges tackled by researchers (Zhang et al., 2022). Often the proposed solutions are tailored for, or at least evaluated on, image classification.

A common strategy to build an automatic oracle is to use cross-referencing, such as multiple-implementation testing (MIT) (Srisakaokul et al., 2018). MIT is proposed by Srisakaokul et al. to test supervised learning software. The proxy oracle of a test input is derived from the majority-voted output of multiple implementations of the same algorithm. The cost of multiple implementations is clearly high. On the other hand, the solution can obtain feedback about the result of any arbitrary input submitted to the system under test (SUT). The technique does not require any prior knowledge about the image labels.

Pei et al. adopt multiple deep learning (DL) systems in their DeepXplore framework for white-box testing (Pei et al., 2019). They propose a neuron coverage metric to measure the parts of the SUT exercised by test inputs. The multiple DL systems are used as cross-referencing oracles to avoid manual checking.

Wang et al. (2020) present DISSECTOR, a fault tolerance approach that distinguishes the inputs that may cause an ML system to fail. Input validation is performed by training sub-models on top of the pre-trained model, hence using sub-models for cross-reference.

The common characteristic of the three techniques presented is the source of knowledge used to set up the oracle as a cross-reference. In all cases, the outputs of the ML system are based on the knowledge encoded in the training set. Different ML models, the same ML model with different architectures, or sub-models trained from the same main ML model all use the training set to perform a majority voting based on that knowledge. These techniques are strictly affected by biases in the training set. When the training data are not representative of the operational environment, the performance of such oracles degrades significantly.

Corbière et al. (2019) propose a failure prediction criterion for CNNs based on the True Class Probability (TCP). The criterion is learned by building a confidence network (ConfidNet) on top of the classification model. TCP is shown to perform well in failure prediction on classification and segmentation problems.

Currently, automatic oracles are of great interest also for misbehavior prediction of DNNs in autonomous driving (Jahangirova et al., 2021). Stocco et al. propose SelfOracle to detect unsupported driving scenarios based on the runtime behavior of the DNN (Stocco et al., 2020). Based on the images in the training set, autoencoders are used to compute a reconstruction error for each operational image. The higher the error, the higher the probability of failure on the considered sample.

Xiao et al. (2021) recently proposed SelfChecker (SC) for failure detection in CNNs and in autonomous driving systems. SC detects a failure in deployment when the outputs of the internal layers of the model under test are inconsistent with the final prediction. In this case, the outputs of the internal layers are used for cross-reference. Besides failure detection, SC also suggests an alternative prediction. SC outperforms state-of-the-art techniques (DISSECTOR (Wang et al., 2020), ConfidNet (Corbière et al., 2019), and SelfOracle (Stocco et al., 2020)).

The difference between the first three techniques (MIT, DeepXplore, and DISSECTOR) and the last three (ConfidNet, SelfOracle, and SC) is how knowledge is extracted from the training set. In particular, the first three approaches combine several models that learn from the same source, exploiting the ensemble. The last three techniques compute metrics to exploit the knowledge encoded in each training image; this is the strategy followed by SC, which stands out among the state-of-the-art techniques for failure prediction.

The discussed techniques do not account for the possible deviations of the operational context from the pre-deployment one. Hence, they are ...
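
As an illustration of the pass/fail rule that the Introduction attributes to ICOS (an output is labeled Fail when any invariant is violated, Pass otherwise), the following minimal Python sketch may help. It is not the ICOS implementation: the `Prediction` fields, the three toy invariants, their thresholds, and all function names are hypothetical and chosen only for illustration, one invariant per knowledge source. The fraction of Pass verdicts is used here as a simple stand-in for an operational accuracy estimate, and `majority_vote` sketches the cross-referencing idea described in Related work.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Sequence

NUM_CLASSES = 10  # assumed number of classes (e.g., a 10-class dataset)


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record of one operational output (illustrative fields)."""
    label: int         # class predicted by the CNN
    confidence: float  # top softmax probability
    brightness: float  # a simple feature of the input image, in [0, 1]


# An invariant is a predicate that every correct output is expected to satisfy.
Invariant = Callable[[Prediction], bool]


def input_rule(p: Prediction) -> bool:
    # Toy input-data-dependent rule from a "domain expert":
    # class 0 is assumed never to occur on very dark images.
    return not (p.label == 0 and p.brightness < 0.2)


def training_rule(p: Prediction) -> bool:
    # Toy training-data-dependent rule: assumed minimum confidence
    # observed for correct classifications on the training set.
    return p.confidence >= 0.6


def algorithm_rule(p: Prediction) -> bool:
    # Toy algorithm-dependent rule: the top softmax probability
    # cannot be lower than 1 / NUM_CLASSES.
    return p.confidence >= 1.0 / NUM_CLASSES


INVARIANTS: Sequence[Invariant] = (input_rule, training_rule, algorithm_rule)


def verdict(pred: Prediction, invariants: Sequence[Invariant] = INVARIANTS) -> str:
    """Return "Fail" if any invariant is violated, otherwise "Pass"."""
    return "Pass" if all(inv(pred) for inv in invariants) else "Fail"


def estimated_accuracy(preds: Sequence[Prediction],
                       invariants: Sequence[Invariant] = INVARIANTS) -> float:
    """Fraction of operational outputs judged "Pass" by the surrogate oracle."""
    if not preds:
        raise ValueError("at least one prediction is required")
    passed = sum(verdict(p, invariants) == "Pass" for p in preds)
    return passed / len(preds)


def majority_vote(labels: Sequence[int]) -> int:
    """Cross-referencing baseline: the label most models agree on."""
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    operational = [
        Prediction(label=3, confidence=0.95, brightness=0.5),
        Prediction(label=0, confidence=0.90, brightness=0.1),
        Prediction(label=7, confidence=0.40, brightness=0.6),
        Prediction(label=0, confidence=0.80, brightness=0.7),
    ]
    print([verdict(p) for p in operational])
    print(estimated_accuracy(operational))
```

No ground-truth labels appear anywhere in the sketch: the verdict depends only on the model output and on the invariants, which is what lets a surrogate oracle run on unlabeled operational data. Adding more invariants makes the oracle stricter, which mirrors the trade-off reported in the Introduction between detected mispredictions and false positives.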
