[8], or pytorch [28], requires a substantial space (more than 3 GB)¹,
so it is not realistic to upload such an image on demand at runtime.
Rather, it is more reasonable for a VM (or a container) image for the
DNN framework to be pre-installed at the edge server in advance,
so the client uploads only the client’s DNN model to the edge server
on demand.
To check the overhead of uploading a DNN model, we measured
the time to transmit the DNN model over a wireless network. It
takes about 24 seconds to upload the AlexNet model, meaning that
the smart glasses must execute the queries locally for 24 seconds
before they can use the edge server, with no improvement in the
meantime. Of course, worse network conditions would further
increase the uploading time.
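As a back-of-the-envelope check, the upload time is roughly the model size divided by the available bandwidth. A minimal sketch (the ~233 MB model size and ~78 Mbps link speed are our illustrative assumptions, not figures reported by the measurement above):

```python
# Back-of-the-envelope upload time: model size / bandwidth.
# The 233 MB model size and ~78 Mbps throughput are assumed
# values for illustration, not measurements from this paper.
model_mb = 233                    # approximate AlexNet caffemodel size
bandwidth_mbps = 78               # assumed wireless throughput
upload_s = model_mb * 8 / bandwidth_mbps
print(f"upload takes ~{upload_s:.0f} s")   # ~24 s
```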
If we used a central cloud server with the same hardware where
the user's DNN model is installed in advance, we would have obtained
the same DNN execution time, yet with a longer network latency.
For example, if we access a cloud server in our local region
(East Asia) [10], the network latency would be about 60 ms due to
multi-hop transmission, compared to 1 ms for our edge server.
Also, it is known that the multi-hop transmission to distant cloud
datacenters causes high jitters, which may hurt the real-time user
experience [32].
Although edge servers are attractive alternatives for running
DNN queries, our experimental result indicates that users must
wait quite a while before they can use an edge server, due to the
time needed to upload a DNN model. In particular, a highly mobile
user, who may leave the service area of an edge server shortly,
suffers heavily from this problem; if the client moves to another
location before it completes uploading its DNN, the client will
have wasted its battery on network transmission without ever using
the edge server. To solve this issue, we propose IONN, which allows
the client to offload partial DNN execution to the server while
the DNN model is being uploaded.
3 BACKGROUND
Before explaining IONN, we briefly review a DNN and its variant,
the Convolutional Neural Network (CNN), typically used for image
processing. We also describe some previous approaches to offloading
DNN computations to remote servers.
3.1 Deep Neural Network
A deep neural network (DNN) can be viewed as a directed graph
whose nodes are layers. Each layer in a DNN performs its operation
on the input matrices and passes the output matrices to the
next layer (in other words, each layer is executed). Some layers
just perform the same operations with fixed parameters, while
others contain trainable parameters. The trainable parameters are
iteratively updated according to learning algorithms using training
data (training). Once trained, the DNN model can be deployed as a
file and used to infer outputs for new input data (inference). DNN
frameworks, such as caffe [19], can load a pre-trained DNN from
the model file and perform inference for new data by executing
the DNN. In this paper, we focus on offloading computations for
inference, because training requires much more resources than
inference and is hence typically performed on powerful cloud
datacenters.

¹We measured the size of a docker image for each DNN framework (GPU version) from
dockerhub, which contains all the libraries needed to run the framework as well as the
framework itself.
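To make the layer-by-layer execution concrete, the following minimal sketch runs inference over a chain of layers in plain Python with numpy; the fc/relu operations and shapes are our illustration, not caffe's API:

```python
import numpy as np

# A minimal sketch of layer-by-layer inference (our illustration,
# not caffe's API): each layer consumes input matrices and passes
# its output matrices to the next layer in the graph. fc carries
# trainable parameters (W, b); relu performs a fixed operation.
def fc(x, params):
    W, b = params
    return x @ W + b

def relu(x, _params):
    return np.maximum(x, 0)

def run_inference(layers, x):
    for op, params in layers:   # layers visited in topological order
        x = op(x, params)       # "executing" each layer in turn
    return x

rng = np.random.default_rng(0)
layers = [(fc, (rng.standard_normal((4, 8)), np.zeros(8))),
          (relu, None),
          (fc, (rng.standard_normal((8, 3)), np.zeros(3)))]
scores = run_inference(layers, rng.standard_normal((1, 4)))  # shape (1, 3)
```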
A CNN is a DNN that includes convolution layers and is widely used
to classify an image into one of pre-determined classes. Image
classification in a CNN commonly proceeds as follows. When
an image is given to the CNN, the CNN extracts features from the
image using convolution (conv) layers and pooling (pool) layers.
The conv/pool layers can be placed in series [22] or in parallel
[36] [15]. Using the features, a fully-connected (fc) layer
calculates the scores of each output class, and a softmax layer
normalizes the scores. The normalized scores are interpreted as the
probabilities that the input image belongs to each output class.
There are many other types of layers (e.g., about 50 types of
layers are currently implemented in caffe [19]), but explaining
all of them is beyond the scope of this paper.
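The conv → pool → fc → softmax pipeline above can be sketched in a few lines of pytorch; the layer sizes below are illustrative only and do not correspond to any model discussed in this paper:

```python
import torch
import torch.nn as nn

# conv/pool layers extract features; an fc layer scores each class;
# softmax normalizes the scores into probabilities. Layer sizes are
# illustrative only.
class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # conv layer
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pool layer
        )
        self.fc = nn.Linear(16 * 16 * 16, num_classes)   # fully-connected

    def forward(self, x):                    # x: (batch, 3, 32, 32)
        feats = self.features(x).flatten(1)
        return torch.softmax(self.fc(feats), dim=1)  # class probabilities

probs = TinyCNN()(torch.randn(1, 3, 32, 32))   # sums to 1 per image
```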
3.2 Offloading of DNN Computations
Many cloud providers are offering machine learning (ML) services
[26] [2] [10], which perform computation-intensive ML algorithms
(including DNN) on behalf of clients. They often provide an appli-
cation programming interface (API) to app developers so that the
developers can implement ML applications using the API. Typically,
the API allows a user to make a request (query) for DNN compu-
tation by simply sending an input matrix to the service provider’s
clouds where DNN models are pre-installed. The server in the
clouds executes the corresponding DNN model in response to the
query and sends the result back to the client. Unfortunately, this
centralized, cloud-only approach is not appropriate for our scenario
of the generic use of edge servers since pre-installing DNN models
at the edge servers is not straightforward.
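Conceptually, such a query boils down to posting an input matrix to a pre-installed model and reading back the output. A hypothetical client-side sketch (the endpoint URL and JSON schema are made up for illustration; real provider APIs differ in detail):

```python
import numpy as np
import requests

def query_dnn(input_matrix: np.ndarray) -> np.ndarray:
    # Client side of a DNN query: send the input matrix to the
    # provider's cloud, where the model is pre-installed, and
    # receive the inference result. The URL and JSON schema are
    # hypothetical; real provider APIs differ.
    resp = requests.post(
        "https://ml.example.com/v1/models/alexnet:predict",
        json={"inputs": input_matrix.tolist()},
        timeout=10,
    )
    return np.asarray(resp.json()["outputs"])
```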
Recent studies have proposed to execute DNN using both the
client and the server [
20
] [
14
]. NeuroSurgeon is the latest work on
the collaborative DNN execution using a DNN partitioning scheme
[
20
]. NeuroSurgeon creates a prediction model for DNN, which
estimates the execution time and the energy consumption for each
layer, by performing regression analysis using the DNN execution
profiles. Using the prediction model and the runtime information,
NeuroSurgeon dynamically partitions a DNN into the front part
and the rear part. The client executes the front part and sends its
output matrices to the server. The server runs the rear part with the
delivered matrices and sends the new output matrices back to the
client. To decide the partitioning point, NeuroSurgeon estimates the
expected query execution time for every possible partitioning point
and nds the best one. Their experiments show that collaborative
DNN execution between the client and the server improves the
performance, compared to the server-only approach.
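In the spirit of NeuroSurgeon's search, the following simplified sketch picks the partition point that minimizes estimated query latency over a linear chain of layers (latency-only objective, no energy term; the per-layer predictions would come from the regression models described above):

```python
def best_partition(client_ms, server_ms, tx_ms):
    """Pick the partition point k minimizing estimated query latency.

    client_ms[i] / server_ms[i]: predicted runtime of layer i on the
    client / server (from the regression-based prediction models).
    tx_ms[k]: time to transmit the data crossing partition k
              (tx_ms[0] is the raw input, tx_ms[n] the final output).
    The client runs layers [0, k) and the server runs layers [k, n).
    """
    n = len(client_ms)
    def latency(k):
        return sum(client_ms[:k]) + tx_ms[k] + sum(server_ms[k:])
    return min(range(n + 1), key=latency)
```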
Although collaborative DNN execution in NeuroSurgeon was effective,
it is still based on cloud servers where the DNN model is
pre-installed, and is thus not well suited for our edge computing
scenario; it neither uploads the DNN model nor does its partitioning
algorithm consider the uploading overhead. However, collaborative
execution offers a useful insight for DNN edge computing: we can
partition the DNN model and upload each partition incrementally,
so that the client and the server can execute the partitions
collaboratively, even before the whole model is uploaded. Starting
from this insight, we designed the incremental offloading of