
Performance Evaluation of Deep Learning Tools in
Docker Containers
Pengfei Xu
Department of Computer Science
Hong Kong Baptist University
Email: pengfeixu@comp.hkbu.edu.hk
Shaohuai Shi
Department of Computer Science
Hong Kong Baptist University
Email: csshshi@comp.hkbu.edu.hk
Xiaowen Chu
Department of Computer Science
Hong Kong Baptist University
Email: chxw@comp.hkbu.edu.hk
Abstract—With the success of deep learning techniques in a broad range of application domains, many deep learning software frameworks have been developed and are updated frequently to adapt to new hardware features and software libraries, which poses a significant challenge to end users and system administrators. To address this problem, container techniques are widely used to simplify the deployment and management of deep learning software. However, it remains unknown whether container techniques introduce any performance penalty for deep learning applications. The purpose of this work is to systematically evaluate the impact of Docker containers on the performance of deep learning applications. We first benchmark the performance of the main system components (disk I/O, CPU, and GPU) inside a Docker container and on the host system, and compare the results to identify any differences. According to our results, computation-intensive jobs, whether running on the CPU or the GPU, incur only a small overhead, indicating that Docker containers can be applied to deep learning programs. We then evaluate the performance of several popular deep learning tools deployed in a Docker container and on the host system. It turns out that Docker containers cause no noticeable performance degradation when running these deep learning tools, so encapsulating a deep learning tool in a container is a feasible solution.
I. INTRODUCTION
Ever since the great success of deep learning techniques in many application domains, more and more deep learning software tools have been developed by different research institutions and companies for both academic research and commercial use [1]. Popular tools such as Caffe [2], CNTK [3], MXNet [4], TensorFlow [5], and Torch [6] are still being actively developed, and their new versions are released frequently, which brings a significant software management challenge to system administrators. The situation becomes even worse when different tools, or different versions of the same tool, need to be installed on a system shared by multiple users. A practical solution to simplify the management of deep learning tools is to use Docker containers, so that environment configuration conflicts can be easily resolved by packaging a piece of software and all of its required libraries into a single image [7]. Despite their popularity in practice, there is still no systematic analysis of the performance overhead introduced by Docker containers for deep learning tools. This paper aims to investigate the impact of Docker containers on the performance of deep learning tools.
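To illustrate how an image bundles a tool together with all of its dependencies, the following minimal sketch uses the Docker SDK for Python to run two framework images side by side on the same host; the image names and the trivial command are placeholders for illustration, not part of our experimental setup.

```python
# Minimal sketch (hypothetical images/command): each image carries its
# own toolchain and libraries, so conflicting frameworks coexist on one
# host without any environment configuration on the host itself.
import docker

client = docker.from_env()
for image in ["bvlc/caffe:cpu", "tensorflow/tensorflow:latest"]:
    # Run the container, capture its stdout, and remove it afterwards.
    output = client.containers.run(image, ["python", "-c", "print('ok')"],
                                   remove=True)
    print(image, output.decode().strip())
```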
A typical deep learning training workflow involves data access from/to disk drives and intensive data processing on CPUs and/or accelerators such as GPUs [8]. We therefore evaluate the performance of the CPU, the GPU, disk I/O, and the overall deep learning training process, each with and without a Docker container. For CPU performance, we make use of two classical and representative benchmarks, HPL and HPCG. For GPU performance, a set of GPU programs is selected to test different types of GPU operations. Disk I/O performance can be another important factor when huge amounts of data are fed to neural networks during the training process [9]. We evaluate I/O performance in several aspects, including I/O access latency, random access throughput, and sequential access throughput. Finally, we evaluate the training performance of five popular deep learning software tools with different neural network models and datasets. Based on our experimental results, we find that Docker containers introduce negligible overhead for computation-intensive tasks on both the CPU and the GPU. The sequential-access I/O performance under the Docker container is at the same level as that of the host system. For random access, we even observe shorter response times in the Docker container than on the host system for one of the tested disk drives; this is because the Docker container makes better use of the NAND cache on the hard disk to gain faster random data access. Since each of the factors mentioned above yields satisfactory results under Docker, it is not surprising that running deep learning tools in Docker containers incurs negligible overhead compared to running directly on the host system.
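To make the three I/O aspects concrete, the sketch below is a minimal Python micro-benchmark, written for illustration under our own assumptions rather than the benchmark suite used in our experiments; running it once on the host and once inside a container, with the test file placed on the same disk, gives a direct comparison of sequential and random read throughput.

```python
# Minimal disk I/O micro-benchmark sketch: times sequential vs. random
# 4 KiB reads of a test file. Note that the OS page cache will mask the
# raw device speed unless caches are dropped between runs (e.g. via
# /proc/sys/vm/drop_caches on Linux). Path and sizes are placeholders.
import os
import random
import time

PATH = "testfile.bin"   # placeholder file on the volume under test
BLOCK = 4096            # 4 KiB per read
NUM_BLOCKS = 25600      # 100 MiB file in total

# Create the test file once if it does not already exist.
if not os.path.exists(PATH):
    with open(PATH, "wb") as f:
        for _ in range(NUM_BLOCKS):
            f.write(os.urandom(BLOCK))

def bench(offsets):
    """Read one block at each byte offset; return elapsed seconds."""
    fd = os.open(PATH, os.O_RDONLY)
    start = time.perf_counter()
    for off in offsets:
        os.lseek(fd, off, os.SEEK_SET)
        os.read(fd, BLOCK)
    elapsed = time.perf_counter() - start
    os.close(fd)
    return elapsed

sequential = [i * BLOCK for i in range(NUM_BLOCKS)]
shuffled = sequential[:]
random.shuffle(shuffled)

mib = BLOCK * NUM_BLOCKS / 2**20
print("sequential: %6.1f MiB/s" % (mib / bench(sequential)))
print("random:     %6.1f MiB/s" % (mib / bench(shuffled)))
```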
This paper is organized as follows. The background of deep learning and Docker containers, together with related work, is introduced in Section II. The design of our experiments is presented in Section III. We show our experimental results and analysis in Section IV, and conclude our work in Section V.
II. BACKGROUND
A. Deep Learning
Deep learning is a class of machine learning techniques that now powers a great number of facets of our everyday life. Deep neural networks are built from many processing layers and are able to learn representations of massive amounts of data at multiple levels of abstraction [9]. This technology has many applications, such as speech recognition [10]–[13], image recognition [14]–[17], and natural language processing [18]–[20], and the list keeps growing. Compared with conventional machine learning techniques,