and non-linear layers. The purpose of the convolution and
pooling layers can be viewed as that of a feature extractor
before the fully connected layers are engaged. Inference then
proceeds exactly as previously described for DNNs until ul-
timately a classification is reached.
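To make this pipeline concrete, the sketch below traces a single-filter forward pass in NumPy: convolution, a non-linearity, and pooling act as the feature extractor, and the flattened feature map is then passed through fully connected layers to produce a classification. The input size, filter size, layer widths, and random weights are purely illustrative assumptions, not taken from any model evaluated in this paper.

import numpy as np

def conv2d(x, k):
    # Valid 2-D cross-correlation of a single-channel input x with kernel k.
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool(x, p=2):
    # Non-overlapping p x p max pooling.
    H, W = x.shape
    x = x[:H - H % p, :W - W % p]
    H, W = x.shape
    return x.reshape(H // p, p, W // p, p).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
frame = rng.standard_normal((28, 28))     # hypothetical single-channel input
kernel = rng.standard_normal((5, 5))      # one convolutional filter

# Feature extraction: convolution + ReLU + pooling.
features = max_pool(np.maximum(conv2d(frame, kernel), 0.0))

# Hand-off to fully connected layers, exactly as in DNN inference.
h = features.ravel()                                      # 12 x 12 = 144 features
W1, b1 = rng.standard_normal((64, h.size)), np.zeros(64)  # hidden layer (sizes assumed)
W2, b2 = rng.standard_normal((10, 64)), np.zeros(10)      # classification layer
probs = softmax(W2 @ np.maximum(W1 @ h, 0.0) + b2)
print("predicted class:", probs.argmax())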
In contrast to shallow learning-based models, deep learn-
ing models are usually large and often contain more than a
million parameters. This high-dimensional parameter space
increases the capacity of these models, and they often out-
perform prior shallow models in terms of generalization per-
formance. However, the accuracy gains come at the expense
of high energy and memory costs. Although high-end wear-
ables containing a GPU, e.g., the NVIDIA Tegra K1, can
efficiently run deep models [12], the high resource demands
make deep learning models unattractive for low-end wear-
ables. In this paper we explore sparse factorizations and
convolutional kernel separations to optimize the resource
demands of deep models, while maintaining their functional
properties.
3. DESIGN AND OPERATION
Beginning with this section, and spanning the following two,
we detail the design and algorithms of SparseSep.
3.1 Design Goals
SparseSep is shaped by the following objectives.
• No Re-training. The training of a large deep model is
the most time consuming and computationally demand-
ing task. For example, a large model such as GoogleNet
is trained using thousands of CPU cores [13], which is
beyond the current capabilities of a single wearable de-
vice. In this work, we mainly focus on the inference
cycle of a deep model and perform no training on the
resource-constrained devices. The training process also
requires a very large training dataset, often inaccessible
to the developers [14]. Thus new techniques are needed
to compress popular cloud-scale deep learning models so
that they run gracefully on wearable- and IoT-grade hardware.
• No Cloud Offloading. As noted in §1, offloading
the execution of portions of deep models can result in
leaking sensitive sensor data. By keeping inference com-
pletely local, users and applications have greater privacy
protection as the data or any intermediate results never
leave the device.
• Target Low-resource Platforms. Even high-end
mobile processors (such as the Tegra K1 [15]) still require
careful resource use when executing deep learning mod-
els, although in this class of processors the resource gap
is closing. However, for low-energy, highly portable wear-
able processors that lack GPUs or have only a few MBs
of RAM (e.g., ARM Cortex M3 [16]), local execution of
deep models remains impractical. For this reason, Spars-
eSep turns to new ideas, namely weight sparsification
and kernel separation, in search of the leaps in
resource efficiency required to make these low-end pro-
cessors viable.
• Minimize Model Changes. Deep models must un-
dergo some degree of change to enable their operation
on wearable hardware. However, a core tenet of Spars-
eSep is to minimize the extent of such modifications
and remain functionally faithful to the initial model ar-
chitecture. For this reason, we frame the problem as
one of deep model compression (originally formulated by
the machine learning community), where model layer ar-
rangements remain unchanged and only per-layer con-
nections are changed through the insertion of additional
summarizing layers. Thus, the degree of changes made
by SparseSep is a key metric that is minimized during
model processing.
• Adopt Principled Approaches. Ad-hoc methods
to alter a deep model – such as 'specializing' a model to
recognize a smaller set of activities/contexts, or chang-
ing layer/unit parameters to produce a desired resource
consumption profile – are dangerous as they disregard the
domain experience of the modeling experts. Methods like
sparse coding [17] and model compression [18], in contrast,
are supported by theoretical analysis [19]. Judging whether
a model may be altered solely from changes in an accuracy
metric is risky and can, for example, hurt its ability to
generalize.
3.2 Overview
We now briefly outline the core approach of SparseSep to
optimize the architecture of large deep learning models so
that they meet the constraints of target wearable devices.
In §4 we provide the necessary theory and algorithms of this
process, but we begin here with the key ideas.
The inference pipeline of a deep learning model is domi-
nated by a series of matrix computations, especially multi-
plications, and by convolutions. Attempts have been made
to reduce the total number of computations by low-rank
factorization of the weight matrix or by decomposing con-
volutional kernels into separable filters in an ad-hoc manner.
Both weight factorization and kernel separation, however,
require modifying the architecture of the model by inserting
a new layer and updating the weight components (see §4.1
and §4.4). Although counter-intuitive, the insertion of a new
layer only achieves computational efficiency under certain
conditions, which depend on, e.g., the size of the newly
inserted layer, the size of the original weight matrix, and
the size of the convolutional kernels. In §4.1, §4.2 and §4.4
we derive the conditions under which computational and
memory efficiencies can be achieved.
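To make the weight factorization case concrete, the sketch below replaces a fully connected computation y = Wx with y = U(Vx), which is equivalent to inserting a new layer of k units; the break-even condition in the comments mirrors the kind of size condition referred to above. The dimensions and rank are assumptions, and the truncated SVD is used here only as a standard way to obtain the factors, not necessarily the exact construction of §4.1.

import numpy as np

m, n, k = 1024, 4096, 256        # original layer: m x n weights; k = size of inserted layer
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))  # stand-in for a pretrained weight matrix (no re-training)

# Truncated SVD yields the best rank-k approximation W ~ U V.
U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
U = U_full[:, :k] * s[:k]        # m x k : weights from the inserted layer to the output units
V = Vt[:k, :]                    # k x n : weights from the inputs to the inserted layer

x = rng.standard_normal(n)
y_full = W @ x                   # original layer: m * n multiply-accumulates
y_fact = U @ (V @ x)             # factorized:    k * (m + n) multiply-accumulates

# The inserted layer only pays off when k * (m + n) < m * n, i.e. k < m * n / (m + n);
# the same inequality governs the reduction in stored parameters.
print("break-even rank :", m * n / (m + n))         # ~819 for these sizes
print("ops/param ratio :", k * (m + n) / (m * n))   # ~0.31, roughly a 3x reduction
print("relative error  :", np.linalg.norm(y_full - y_fact) / np.linalg.norm(y_full))
# A random W is nearly full rank, so the error printed here is large;
# trained weight matrices are typically much closer to low rank.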
In this paper, we postulate that the computational and space
efficiency of deep learning models can be further im-
proved by adding sparsity constraints to the factorization
process. Accordingly, we propose a sparse dictionary learn-
ing approach that enforces a sparse factorization of the weight
matrix (see §4.3). In §5.2 we show that under specific spar-
sity conditions the resource scalability of the proposed ap-
proach is significantly better than that of existing approaches.
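As an illustrative sketch of this idea (using scikit-learn's off-the-shelf dictionary learner as a stand-in rather than the specific solver of §4.3; the sizes, sparsity level, and variable names are assumptions), the weight matrix W is approximated as a sparse code matrix B times a small dense dictionary A, so that only the non-zero entries of B and the k x n dictionary need to be stored and multiplied at inference time.

import numpy as np
from sklearn.decomposition import DictionaryLearning

m, n, k, s = 256, 512, 64, 8          # layer size, dictionary atoms, non-zeros per code row
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))       # stand-in for a pretrained weight matrix

learner = DictionaryLearning(n_components=k,
                             transform_algorithm="omp",
                             transform_n_nonzero_coefs=s,
                             max_iter=50,
                             random_state=0)
B = learner.fit_transform(W)          # sparse codes, shape (m, k), at most s non-zeros per row
A = learner.components_               # dense dictionary, shape (k, n)

x = rng.standard_normal(n)
y = B @ (A @ x)                       # inference through the inserted sparse layer

# Storage and compute now scale with nnz(B) + k * n rather than m * n.
print("density of B   :", np.count_nonzero(B) / B.size)   # at most s / k = 0.125 here
print("relative error :", np.linalg.norm(W - B @ A) / np.linalg.norm(W))
# As before, a random W is a worst case; trained weight matrices compress far better.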
The weight factorization approach significantly reduces the
memory footprint of both DNN and CNN models by opti-
mizing the parameter space of the fully connected layers.
The factorization also helps to reduce the overall number of
operations needed and improves the inference time. How-
ever, the inference time improvement due to factorization
is much more pronounced for DNNs than for CNNs, pri-
marily because a major portion of the CNN-based inference
time (often over 95%) is spent on performing
convolution operations [12, 20], where the layer factorization
technique has no influence. To overcome this limitation, we
also propose a runtime convolution kernel separation tech-
nique that optimizes the convolution operations to reduce