Unimodal-Concentrated Loss: Fully Adaptive Label Distribution Learning for Ordinal Regression



Qiang Li*, Jingjing Wang*, Zhaoliang Yao, Yachun Li, Pengju Yang, Jingwei Yan, Chunmao Wang, Shiliang Pu†
Hikvision Research Institute, China
{liqiang23, wangjingjing9, yaozhaoliang, liyachun6, yangpengju, yanjingwei, wangchunmao, pushiliang.hri}@hikvision.com

* Authors contributed equally to this work.
† Shiliang Pu is the corresponding author.

Abstract

Learning from a label distribution has achieved promising results on ordinal regression tasks such as facial age and head pose estimation. Within this line of work, the concept of adaptive label distribution learning (ALDL) has drawn much attention recently for its superiority in theory. However, compared with methods that assume a fixed form label distribution, ALDL methods have not achieved better performance. We argue that existing ALDL algorithms do not fully exploit the intrinsic properties of ordinal regression. In this paper, we summarize that learning an adaptive label distribution on ordinal regression tasks should follow three principles. First, the probability corresponding to the ground-truth should be the highest in the label distribution. Second, the probabilities of neighboring labels should decrease as the distance from the ground-truth increases, i.e., the distribution is unimodal. Third, the label distribution should vary as the samples change, and should even be distinct for different instances with the same label, due to their different levels of difficulty and ambiguity. Under the premise of these principles, we propose a novel loss function for fully adaptive label distribution learning, namely the unimodal-concentrated loss. Specifically, the unimodal loss, derived from the learning to rank strategy, constrains the distribution to be unimodal. Furthermore, the estimation error and the variance of the predicted distribution for a specific sample are integrated into the proposed concentrated loss, so that the predicted distribution is maximized at the ground-truth and varies according to the prediction uncertainty. Extensive experimental results on typical ordinal regression tasks, including age and head pose estimation, show the superiority of our proposed unimodal-concentrated loss compared with existing loss functions.

Figure 1. Predicted distributions of Mean-Variance [22] and ours (x-axis: age; y-axis: probability). Our predicted distributions are optimized to be unimodal and are learned adaptively according to the specific instances. On the contrary, the predictions of Mean-Variance are optimized to be concentrated for all instances and do not explicitly ensure a unimodal distribution; multimodal predictions appear.

1. Introduction

Ordinal regression addresses the challenging problems in which labels are related in a natural or implied order. Many critical tasks are ordinal regression problems, such as facial age estimation, head pose estimation, facial attractiveness computation and movie ratings, which play an important role in many practical applications such as human-computer interaction, driver monitoring, precise advertising and video surveillance [10, 32]. Early classical methods [11, 17, 20, 24, 37, 38], based on ordinary classification or regression, do not perform very well: they ignore the ordinal relationship among labels and suffer from ambiguous labeling. In recent years, ranking based methods [3, 21] have been proposed, which use multiple binary classifiers to determine the rank order. They explicitly make use of the ordinal information, but they do not consider label ambiguity.

To address both the ordinal relationship and label ambiguity, label distribution learning (LDL) [7] converts a single label into a label distribution. The label distribution covers a certain number of class labels, representing the degree to which each label describes the instance. Since the real distribution for each instance is not available and must be artificially generated under proper assumptions, this approach can be called fixed form label distribution learning (FLDL). The typical form is a Gaussian distribution centered at the ground truth with an assumed standard deviation [1, 7, 8].
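Although FLDL approaches achieve improved performance, they use a fixed form distribution to describe various instances, which limits their expression ability. To overcome this limitation, the concept of adaptive label distribution learning (ALDL) [9] has been proposed. Among ALDL based methods, Mean-Variance [22] is a typical work that achieves promising results by learning a distribution with a learned mean and variance. However, it pursues a highly concentrated distribution for all instances, which means that, in essence, the variances are optimized to be as small as possible. Moreover, it cannot guarantee that the learned distribution is unimodal, because the joint use of the softmax and mean-variance losses lacks a unimodal constraint. Therefore, we observe that the distributions learned by Mean-Variance are not fully adaptive and, for some instances, are multimodal, as shown in Fig. 1. We can see that the learned distribution of the elderly person is very sharp and that the learned distributions of the two persons are similar. The learned distributions do not conform to the tendency of facial aging, which can be significantly different at different ages [9].

In short, the current ALDL methods have not fully exploited the intrinsic properties of ordinal regression. In this paper, three principles for ordinal regression are summarized as follows. First, following empirical risk minimization, the probability corresponding to the ground-truth should be the highest in the label distribution. Second, the labels of ordinal regression tasks vary gradually, so the similarity between a test instance and the general definition of a class fades gradually as the label moves away from the ground-truth. Therefore, we argue that the probabilities corresponding to the neighboring labels should decrease as the distance from the ground-truth increases, i.e., the distribution is unimodal. Third, the label distribution should vary as the samples change, and should even be distinct for different instances with the same label, due to their different levels of difficulty and ambiguity. In other words, the learned label distribution should be adaptive to the particular instance.

To satisfy the principles above, we propose a new adaptive label distribution learning approach equipped with a unimodal-concentrated loss. Based on Principle I, we directly maximize the probability at the ground-truth via the concentrated loss as our primary learning objective. Based on Principle II, the unimodal loss, derived from the learning to rank (LTR) strategy [6], is introduced to constrain the distribution to be unimodal. If two neighboring labels are ranked incorrectly, a positive loss is produced to update the training parameters toward the correct ordinal relationship. Based on Principle III, the variance of the distribution, which corresponds to its degree of concentration, is integrated and optimized jointly in the concentrated loss; it can be regarded as an indicator of data uncertainty and label ambiguity.

The main contributions of this work are three-fold:

(1) We are the first to comprehensively summarize the intrinsic principles for learning an adaptive label distribution on ordinal regression tasks. First, the probability at the ground-truth should be the highest in the distribution. Second, the distribution should be unimodal. Third, the distribution should be adaptive to individual instances. These three principles will illuminate the design of loss functions for future works in the field of ordinal regression.

(2) Different from previous methods, which do not fully comply with the above principles, we propose a new unimodal-concentrated loss, with the unimodal part making the distribution unimodal and the concentrated part making the distribution concentrated at the ground-truth and fully adaptive to individual instances.

(3) The proposed loss can be easily embedded into existing CNNs without modifying their structure, and extensive experimental results demonstrate its superiority.

2. Related Work

Methods for ordinal regression can be divided into three categories: non-LDL based methods, FLDL based methods and ALDL based methods.

2.1. Non-LDL

Non-LDL methods can be grouped into regression based, classification based and ranking based methods. Classification based methods usually cast ordinal regression as a classification problem. For example, age estimation was cast as a classification problem with 101 categories in [27], and the yaw angle was divided into coarse bins used as class labels for head pose estimation [14, 25]. These methods treat ordinal labels as independent ones, and the cost of being assigned to any wrong category is the same, so they cannot exploit the relations between labels. Regression based methods directly regress the ground-truth with a Euclidean loss that penalizes the difference between the estimate and the ground-truth, mostly without explicitly using ordinal information. Yi et al. [38] used CNN models to extract features from several facial regions and used a square loss for age estimation. Ranjan et al. [24] proposed a unified CNN that jointly predicts facial age, head pose and other attributes. Recently, ranking techniques have been introduced to the ordinal regression problem. The method of [21] leveraged the ordinal information of age by learning a network with multiple binary outputs, while Chen et al. [3] achieved this by learning multiple binary CNNs and aggregating their results for age estimation. Although these methods use ordinal information to improve performance, they take a single label as the ground-truth without considering label ambiguity.

2.2. FLDL

Label distribution learning was proposed to address the label ambiguity problem. For FLDL based methods, the form of the distribution is determined before training and stays fixed. Their objective is to narrow the gap between the learned distribution and the fixed one. Geng et al. [8] first defined the label distribution by assigning a Gaussian or triangle distribution to an instance.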
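As an illustration of the fixed form assumption described above, the following is a minimal NumPy sketch of a discrete Gaussian label distribution centered at the ground-truth label. The function name `gaussian_label_distribution` and the default `sigma=2.0` are hypothetical choices made for this example and are not taken from the paper; normalizing by the sum plays the role of a normalization factor so that the probabilities add up to one.

```python
import numpy as np


def gaussian_label_distribution(y, num_classes, sigma=2.0):
    """Fixed form label distribution over labels 1..num_classes.

    y:           ground-truth label (the Gaussian is centered here)
    num_classes: number of ordinal classes C
    sigma:       assumed standard deviation (illustrative value, not from the paper)
    """
    labels = np.arange(1, num_classes + 1)
    # Unnormalized Gaussian weights centered at the ground-truth label.
    d = np.exp(-((labels - y) ** 2) / (2.0 * sigma ** 2))
    # Normalize so that the entries form a probability distribution.
    return d / d.sum()


# Example: an age label of 30 on a 101-class age axis.
dist = gaussian_label_distribution(30, 101)
```

Every instance with the same label receives exactly the same target distribution under this scheme, which is the limitation of fixed form methods that adaptive approaches aim to remove.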
DLDL [5] adopted the normal distribution and learned the label distribution by minimizing the Kullback-Leibler divergence between the two distributions using a deep CNN. Similar to DLDL, Liu et al. [19] employed three Gaussian label distributions to describe a face sample in the yaw, pitch and roll domains respectively. DLDL-v2 [1] improved DLDL by introducing an expectation loss computed from the distribution to alleviate the inconsistency between the training objective and the evaluation metric. The method of [30] connected random forests to deep neural networks and used decision trees with the aim of modeling any general form of label distribution. The authors of [23] proposed self-paced regression forests to distinguish noisy and confusing facial images from regular ones, which alleviates the interference arising from them. However, these methods use a fixed form distribution to describe various instances, which limits their expression ability.

2.3. ALDL

Different from FLDL based methods, which assume a fixed form label distribution, the distribution in ALDL based methods is not assumed at the beginning; it is generated automatically during learning. Geng et al. [8] proposed two adaptive label distribution learning algorithms, IIS-ALDL and BFGS-ALDL, to automatically learn the label distributions adapted to different ages. He et al. [13] generated age label distributions through a weighted linear combination of the input image's label and its context-neighboring samples. The Mean-Variance method [22] penalized the difference between the mean of the distribution and the ground-truth, and also penalized the variance of the distribution to ensure a sharp distribution. However, we argue that existing ALDL methods have not strictly complied with the intrinsic principles summarized in this work, and so cannot take full advantage of ALDL.

3. Methodology

In this section, we first give a brief review of FLDL based methods and then detail our ALDL method, in which a novel objective function, the unimodal-concentrated loss, is proposed for highly flexible distribution learning.

3.1. Preliminaries

Formally, let $x_i$ denote the $i$-th input instance with $i = 1, 2, \ldots, N$, let $\hat{y}_i$ denote the value predicted by the network, and let $y_i \in \{1, 2, \ldots, C\}$ denote the ground-truth label, where $N$ is the number of instances and $C$ is the number of classes. Instead of regressing $y_i$ directly, FLDL based methods transform $y_i$ from a single class label into a label distribution and then predict $\hat{y}_i$ by label distribution learning. The Gaussian distribution is commonly used in FLDL [1, 5, 7, 9]. Instances with the same class label $y_i$ share the identical Gaussian distribution. Taking the Gaussian distribution $D \sim \mathcal{N}(\mu, \sigma^2)$ as an example,

$$D_{i,j} = \frac{1}{s\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(j - \mu)^2}{2\sigma^2}\right), \quad j = 1, 2, \ldots, C, \tag{1}$$

where $D_{i,j}$ denotes the probability that $x_i$ belongs to class $j$, with $\sum_{j=1}^{C} D_{i,j} = 1$; $\mu$ equals the ground-truth label $y_i$; $\sigma$ is the standard deviation of $D_i$; and $s$ is a normalization factor.

Let $z_i = F(x_i; \Theta)$ denote the output of the last fully connected (FC) layer of a CNN model $F(\cdot)$, where $\Theta$ denotes the model parameters. The softmax operation is applied to turn the output $z_i$ into a distribution $p_i$. The element $p_{i,j}$ of $p_i$ is computed as

$$p_{i,j} = \frac{\exp(z_{i,j})}{\sum_{k=1}^{C} \exp(z_{i,k})}. \tag{2}$$

The Kullback-Leibler divergence is often adopted in FLDL as the loss function. The KL loss ($L_{kl}$) is optimized to narrow the gap between the predefined distribution $D_i$ and the predicted distribution $p_i$. The final prediction $\hat{y}_i$ is obtained by taking the expectation of $p_i$ as follows:

$$\hat{y}_i = \sum_{j=1}^{C} j \cdot p_{i,j}. \tag{3}$$

Thus, different instances with the same label are expected to predict similar distributions. This goes against the nature that different instances with the same label should have their own distributions corresponding to their characteristics.

3.2. Proposed Approach

To address the aforementioned issues, we now present a novel adaptive label distribution approach that can produce unimodal and instance-aware distributions. Fig. 2 gives an overview of our method, in which the proposed unimodal and concentrated losses are embedded into an existing CNN for end-to-end learning without any extra modifications to the model. The details are given below.

Figure 2. Overview of our proposed method (pipeline: input image, feature extraction, softmax, fully adaptive distribution learning with the unimodal loss and the concentrated loss). The unimodal loss constrains the final predicted distribution to a mountain-like curve with a single peak, while the mean and variance of the probabilities are optimized jointly via the concentrated loss to make the predicted distribution adaptive to individual instances.

3.2.1 Unimodal Loss

According to Principle II, we regard outputting a unimodal distribution as key for ordinal regression tasks. Hence, we propose a unimodal loss, denoted as $L_{uni}$, which is formulated as follows:

$$L_{uni} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C-1} \max\left(0, \; -(p_{i,j} - p_{i,j+1}) \cdot \mathrm{sign}(j - y_i)\right), \tag{4}$$

where $\mathrm{sign}(j - y_i)$ is a sign function that equals $-1$ when $j - y_i < 0$ and equals $1$ otherwise. It is desirable for the value of $p_{i,j} - p_{i,j+1}$ to be negative if $j - y_i < 0$ and to be positive if $j - y_i \geq 0$, which conforms to the properties of a unimodal distribution.

Constrain distribution to be unimodal. In order to show how our unimodal loss $L_{uni}$ performs, we take the case of $j - y_i < 0$ and $p_{i,j} - p_{i,j+1} > 0$. That is, the adjacent probabilities are not in monotonically increasing order before reaching the ground-truth position. In this case (omitting the constant factor $1/N$),

$$\frac{\partial L_{uni}}{\partial p_{i,j}} = +1, \tag{5}$$

$$\frac{\partial L_{uni}}{\partial p_{i,j+1}} = -1. \tag{6}$$

According to Eq. 5 and Eq. 6, $p_{i,j}$ will be decreased due to its positive gradient, while $p_{i,j+1}$ will be increased due to its negative gradient. In other words, our unimodal loss $L_{uni}$ enforces the probabilities to increase monotonically before reaching the ground-truth position. In the other direction, where $\mathrm{sign}(j - y_i) = +1$, our $L_{uni}$ enforces the probabilities to decrease monotonically after the ground-truth position. Thus, the predicted distribution will be optimized to be unimodal by $L_{uni}$. Our proposed $L_{uni}$ is superior to the softmax loss used in [22], since $L_{uni}$ can adjust the ranking relation within the predicted distribution while the softmax loss cannot. Please refer to the proof in Sec. 3.2.3 for more details. The predictions of [22] are more likely to be multimodal; comparison examples are given in Fig. 4.

Figure 3. Illustration of how the unimodal loss (orange) and the softmax loss (green) affect the probability distribution respectively.

3.2.2 Concentrated Loss

According to the principles discussed before, the learned distribution should be maximized at the ground-truth and be adaptive to the individual instance. To accomplish this goal, we propose the concentrated loss $L_{con}$, which integrates the difference between the estimate $\hat{y}$ and the ground-truth $y$ together with the uncertainty indicator, i.e., the variance of the predicted distribution, and optimizes them jointly. We first maximize the following likelihood of the ground-truth $y_i$ for $x_i$:

$$p(y_i \mid x_i; \Theta) = \frac{1}{\sqrt{2\pi v_i}} \exp\left(-\frac{(\hat{y}_i - y_i)^2}{2 v_i}\right), \tag{7}$$

$$v_i = \sum_{j=1}^{C} p_{i,j} \cdot (j - \hat{y}_i)^2. \tag{8}$$

Then we take the negative log of $p(\cdot)$ to get $L_{con}$ as follows:

$$L_{con} = -\frac{1}{N} \sum_{i=1}^{N} \ln p(y_i \mid x_i; \Theta) \tag{9}$$

$$= \frac{1}{N} \sum_{i=1}^{N} \left( \frac{1}{2} \ln v_i + \frac{(\hat{y}_i - y_i)^2}{2 v_i} + \frac{1}{2} \ln 2\pi \right), \tag{10}$$

where the constant $\frac{1}{2} \ln 2\pi$ can be omitted during optimization.

Instance-aware adaptive distribution learning. To demonstrate how it works, we take the gradient of the concentrated loss $L_{con}$ w.r.t. the variance $v_i$. As is well known, the sample mean and variance are statistically independent of each other, so the gradient is computed (omitting the constant factor $1/N$) as

$$\frac{\partial L_{con}}{\partial v_i} = \frac{1}{2 v_i} - \frac{(\hat{y}_i - y_i)^2}{2 v_i^2}, \tag{11}$$

where $\frac{\partial L_{con}}{\partial v_i}$ has the following properties:

$$\frac{\partial L_{con}}{\partial v_i} > 0, \quad \text{while } v_i > (\hat{y}_i - y_i)^2, \tag{12}$$

$\frac{\partial L_{con}}{\partial v_i} < 0$, while $0$ ...
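To make the two loss terms concrete, the unimodal loss (Eq. 4) and the concentrated loss (Eq. 10, without the constant term) can be sketched in NumPy as below. This is an illustrative forward-pass sketch only, not the authors' implementation: the function names, the `eps` floor on the variance, and the use of NumPy rather than a deep learning framework are all assumptions made for the example, and how the two terms are weighted against each other is not covered by the text above.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax (Eq. 2); z has shape (N, C)."""
    z = z - z.max(axis=1, keepdims=True)  # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def unimodal_loss(p, y):
    """Unimodal loss (Eq. 4).

    p: (N, C) predicted probabilities; y: (N,) integer labels in 1..C.
    """
    c = p.shape[1]
    j = np.arange(1, c)                       # j = 1 .. C-1
    diff = p[:, :-1] - p[:, 1:]               # p_{i,j} - p_{i,j+1}
    # sign(j - y_i): -1 before the ground-truth position, +1 otherwise.
    sign = np.where(j[None, :] - y[:, None] < 0, -1.0, 1.0)
    return np.maximum(0.0, -diff * sign).sum(axis=1).mean()


def concentrated_loss(p, y, eps=1e-8):
    """Concentrated loss (Eq. 10) without the constant 0.5 * ln(2 * pi).

    eps is an assumed small floor on the variance to avoid log(0) and
    division by zero; it is not part of the formulation in the text.
    """
    c = p.shape[1]
    labels = np.arange(1, c + 1)
    y_hat = (p * labels).sum(axis=1)                                # Eq. 3
    v = (p * (labels[None, :] - y_hat[:, None]) ** 2).sum(axis=1)   # Eq. 8
    v = v + eps
    return (0.5 * np.log(v) + (y_hat - y) ** 2 / (2.0 * v)).mean()


# Two toy predictions for the same label y = 3 on a 5-class problem.
y = np.array([3])
p_single_peak = np.array([[0.1, 0.2, 0.4, 0.2, 0.1]])
p_two_peaks = np.array([[0.3, 0.1, 0.4, 0.1, 0.1]])

loss_uni_ok = unimodal_loss(p_single_peak, y)   # no ordering violations
loss_uni_bad = unimodal_loss(p_two_peaks, y)    # penalizes the extra peak at j = 1
loss_con = concentrated_loss(p_single_peak, y)
```

In this toy case the single-peak prediction incurs no unimodal penalty, while the two-peak prediction is penalized only for the pair of neighbors that is out of order before the ground-truth position. In a training setup the same expressions would be written with a framework that provides automatic differentiation so that the gradients discussed in Eq. 5, Eq. 6 and Eq. 11 are obtained automatically.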
