MLOps Explained: Definition, Architecture, and Practice


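"MLOps (Machine Learning Operations) aims to solve the problem of automating and operationalizing machine learning projects so that ML products can be brought into production quickly. MLOps covers best practices, sets of concepts, and development culture, but its definition and implications remain unclear among researchers and professionals. To address this, a mixed-method study was conducted, comprising a literature review, a tool review, and expert interviews, in order to consolidate the necessary principles, components, and roles, together with the associated architecture and workflows, and to provide a comprehensive overview of MLOps."

MLOps (Machine Learning Operations) is an emerging field of practice whose core goal is to accelerate and optimize the management of the entire lifecycle of machine learning models, from development to production. The field matters because, although many industrial ML projects set out to develop ML products, they face major challenges in automating and actually deploying those products, and many projects therefore fail to meet expectations.

MLOps comprises several key elements:

1. **Best practices**: To ensure efficient and reliable ML systems, MLOps emphasizes practices such as continuous integration/continuous deployment (CI/CD), model version control, data governance, and test automation. These practices improve the transparency and repeatability of the development process.
2. **Sets of concepts**: MLOps involves a range of concepts, such as model monitoring, feature engineering, model interpretability, and fairness. These concepts help teams understand and assess model performance and potential problems.
3. **Development culture**: MLOps promotes cooperation in cross-functional teams and stresses close collaboration among data scientists, software engineers, and operations staff in order to support fast iteration and agile development.

Through the literature review and tool review, the researchers found that an MLOps architecture usually consists of the following parts:

- **Data pipeline**: Collects, preprocesses, and stores data, ensuring data quality and availability.
- **Model development**: Covers model training, validation, and selection, as well as model version control.
- **Deployment and monitoring**: Deploys the model to the production environment and monitors it continuously so that problems are detected and resolved promptly.
- **Feedback loop**: Provides feedback for improving the model, based on how the model performs in production.

In addition, MLOps involves multiple roles, such as data engineers, data scientists, DevOps engineers, and business analysts, who work together to ensure the stability and efficiency of the ML system.

Drawing on the expert interviews, the researchers propose a comprehensive MLOps workflow that runs from requirements analysis, data preparation, and model training to model deployment, monitoring, and maintenance. These steps are intended to promote automation and standardization, reduce errors and delays, and ultimately raise the success rate and value of ML projects.

In summary, MLOps is a key approach to the difficulties of developing and deploying ML products. It combines software engineering best practices with the specific needs of machine learning in order to create an efficient, repeatable, and scalable environment for ML development and operations. As research deepens and practical experience accumulates, the definition and application of MLOps will become clearer and will give stronger support to professionals in AI and ML.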

MLOps

Kreuzberger, Kühl, and Hirschl

Strauss [5, p.61], this stage is called “theoretical saturation.” All interviews are conducted between June and August 2021.

With regard to the interview design, we prepare a semi-structured guide with several questions, documented as an interview script [33]. During the interviews, “soft laddering” is used with “how” and “why” questions to probe the interviewees’ means-end chain [39]. This methodical approach allows us to gain additional insight into the experiences of the interviewees when required. All interviews are recorded and then transcribed. To evaluate the interview transcripts, we use an open coding scheme [8].

4 Results

We apply the described methodology and structure our resulting insights into a presentation of important principles, their resulting instantiation as components, the description of necessary roles, as well as a suggestion for the architecture and workflow resulting from the combination of these aspects. Finally, we derive the conceptualization of the term and provide a definition of MLOps.

4.1 Principles

A principle is viewed as a general or basic truth, a value, or a guide for behavior. In the context of MLOps, a principle is a guide to how things should be realized in MLOps and is closely related to the term “best practices” from the professional sector. Based on the outlined methodology, we identified nine principles required to realize MLOps. Figure 2 provides an illustration of these principles and links them to the components with which they are associated.

P1 CI/CD automation. CI/CD automation provides continuous integration, continuous delivery, and continuous deployment. It carries out the build, test, delivery, and deploy steps. It provides fast feedback to developers regarding the success or failure of certain steps, thus increasing the overall productivity [15,17,26,27,35,42,46] [α, β, θ].
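
The fast-feedback behavior described for P1 can be illustrated with a minimal sketch. It assumes a hypothetical `run_pipeline` helper and placeholder stage functions that are not part of the study; real CI/CD tools such as Jenkins or GitHub Actions express the same idea through their own configuration formats.

```python
from typing import Callable, List, Tuple

# Hypothetical stage type: a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]

def run_pipeline(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order and stop at the first failure."""
    report = []
    for name, step in stages:
        ok = step()
        report.append((name, "passed" if ok else "failed"))
        if not ok:
            break  # later stages are skipped, so the failure is reported early
    return report

# Placeholder stages; the test stage is made to fail on purpose.
report = run_pipeline([
    ("build", lambda: True),
    ("test", lambda: False),
    ("deliver", lambda: True),
    ("deploy", lambda: True),
])
```

In this sketch the report ends at the failing `test` stage, which is the feedback a developer would act on before anything is delivered or deployed.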

P2 Workflow orchestration. Workflow orchestration coordinates the tasks of an ML workflow pipeline according to directed acyclic graphs (DAGs). DAGs define the task execution order by considering relationships and dependencies [14,17,26,32,40,41] [α, β, γ, δ, ζ, η].
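
How a DAG determines the task execution order can be sketched with Python's standard `graphlib` module (Python 3.9 or later). The task names below are hypothetical and chosen only for illustration; a production orchestrator would add scheduling, retries, and parallel execution on top of this ordering step.

```python
from graphlib import TopologicalSorter

# Hypothetical ML workflow: each task maps to the set of tasks it depends on.
dag = {
    "extract_data": set(),
    "validate_data": {"extract_data"},
    "engineer_features": {"extract_data"},
    "train_model": {"validate_data", "engineer_features"},
    "evaluate_model": {"train_model"},
}

def execution_order(graph):
    """Return one valid execution order.

    Raises graphlib.CycleError if the graph contains a cycle,
    i.e. if it is not a DAG.
    """
    return list(TopologicalSorter(graph).static_order())

order = execution_order(dag)
```

Several orders can be valid for the same graph (here `validate_data` and `engineer_features` may run in either order), but every dependency always precedes the task that needs it.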

P3 Reproducibility. Reproducibility is the ability to reproduce an ML experiment and obtain the exact same results [14,32,40,46] [α, β, δ, ε, η].
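
One common ingredient of reproducibility is controlling randomness. The following minimal sketch assumes a toy "experiment" (a seeded shuffle followed by a train/test split) and a hypothetical `run_experiment` function; real training frameworks typically have further sources of nondeterminism that a seed alone does not remove.

```python
import random

def run_experiment(seed: int, n_samples: int = 10, train_fraction: float = 0.8):
    """Toy experiment: shuffle sample indices with a fixed seed, then split."""
    rng = random.Random(seed)  # local generator, independent of global state
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = int(n_samples * train_fraction)
    return indices[:cut], indices[cut:]

# Two runs with the same seed produce the same split.
first = run_experiment(seed=42)
second = run_experiment(seed=42)
```

Using a local `random.Random` instance rather than the module-level functions keeps the result independent of whatever other code has done to the global generator.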

P4 Versioning. Versioning covers data, model, and code, enabling not only reproducibility but also traceability (for compliance and auditing reasons) [14,32,40,46] [α, β, δ, ε, η].
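
As a minimal sketch of versioning data, model, and code together, the example below derives content hashes with the standard `hashlib` module. The `version_record` function and the byte strings passed to it are hypothetical stand-ins; dedicated versioning tools store far more than a hash.

```python
import hashlib
import json

def content_hash(payload: bytes) -> str:
    """Short SHA-256 digest used as a content-based version label."""
    return hashlib.sha256(payload).hexdigest()[:12]

def version_record(data: bytes, code: bytes, model: bytes) -> dict:
    """Version the three artifacts and derive one combined id."""
    record = {
        "data": content_hash(data),
        "code": content_hash(code),
        "model": content_hash(model),
    }
    # The combined id changes whenever any of the three artifacts changes.
    combined = json.dumps(record, sort_keys=True).encode()
    record["version_id"] = content_hash(combined)
    return record

# Placeholder artifacts for illustration only.
v1 = version_record(b"rows-v1", b"def train(): ...", b"weights-v1")
v2 = version_record(b"rows-v2", b"def train(): ...", b"weights-v1")
```

Here `v1` and `v2` share the same code and model hashes but differ in the data hash and the combined id, which is the kind of trace an audit would follow.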

P5 Collaboration. Collaboration makes it possible to work jointly on data, model, and code. Besides the technical aspect, this principle emphasizes a collaborative and communicative work culture aiming to reduce domain silos between different roles [14,26,40] [α, δ, θ].

P6 Continuous ML training & evaluation. Continuous training means periodic retraining of the ML model based on new feature data. Continuous training is enabled through the support of a monitoring component, a feedback loop, and an automated ML workflow pipeline. Continuous training always includes an evaluation run to assess the change in model quality [10,17,19,46] [β, δ, η, θ].
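
The pairing of retraining with an evaluation run can be sketched as follows. The "model" here is deliberately trivial (it predicts the mean of its training targets), and the function names and numbers are illustrative assumptions rather than anything prescribed by the study.

```python
from statistics import mean

def train(targets):
    """Toy model: a constant prediction equal to the mean training target."""
    return mean(targets)

def evaluate(model, holdout):
    """Mean absolute error of the constant prediction on held-out targets."""
    return mean(abs(y - model) for y in holdout)

def retrain_and_evaluate(current_model, new_targets, holdout):
    """Retrain on new data and report how the error changed."""
    candidate = train(new_targets)
    current_error = evaluate(current_model, holdout)
    candidate_error = evaluate(candidate, holdout)
    return {
        "candidate": candidate,
        "error_change": candidate_error - current_error,  # negative = better
    }

# Illustrative values only.
result = retrain_and_evaluate(
    current_model=10.0,
    new_targets=[19, 20, 21],
    holdout=[20, 20, 22],
)
```

A negative `error_change` indicates that the retrained candidate fits the held-out data better than the current model; whether that is enough to replace the current model is a separate decision.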

P7 ML metadata tracking/logging. Metadata is tracked and logged for each orchestrated ML workflow task. Metadata tracking and logging is required for each training job iteration (e.g., training date and time, duration, etc.), including model-specific metadata (e.g., the parameters used, the resulting performance metrics, and the model lineage, i.e., the data and code used), to ensure the full traceability of experiment runs [26,27,29,32,35] [α, β, δ, ε, ζ, η, θ].
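
A minimal sketch of what such a metadata record might contain is given below. The `track_run` wrapper, the field names, and the placeholder training function are assumptions made for illustration; ML metadata stores define their own schemas.

```python
import json
import time
from datetime import datetime, timezone

def track_run(params: dict, data_version: str, code_version: str, train_fn) -> dict:
    """Run a training function and collect metadata about the run."""
    started = datetime.now(timezone.utc)
    t0 = time.perf_counter()
    metrics = train_fn(params)
    return {
        "started_at": started.isoformat(),         # training date and time
        "duration_s": time.perf_counter() - t0,    # duration of the job
        "params": params,                          # parameters used
        "metrics": metrics,                        # resulting performance metrics
        "lineage": {"data": data_version, "code": code_version},
    }

def fake_train(params):
    """Placeholder training job; the returned metric is not a real result."""
    return {"accuracy": 0.9}

record = track_run({"learning_rate": 0.1}, "data-v1", "abc123", fake_train)
log_line = json.dumps(record)  # one JSON line per run, ready to append to a log
```

Serializing each run as one JSON line keeps the log append-only and easy to query later, which is one simple way to preserve the traceability of experiment runs.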

P8 Continuous monitoring. Continuous monitoring implies the periodic assessment of data, model, code, infrastructure resources, and model serving performance (e.g., prediction accuracy) to detect potential errors or changes that influence the product quality [4,7,10,27,29,42,46] [α, β, γ, δ, ε, ζ, η].
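
For the model serving performance mentioned above, one simple form of periodic assessment is a sliding-window accuracy check. The `AccuracyMonitor` class, window size, and threshold below are hypothetical; monitoring of data, code, and infrastructure resources would need additional checks that this sketch does not cover.

```python
from collections import deque

class AccuracyMonitor:
    """Tracks prediction accuracy over the most recent observations."""

    def __init__(self, window: int, threshold: float):
        self.outcomes = deque(maxlen=window)  # oldest entries drop out
        self.threshold = threshold

    def record(self, prediction, actual) -> None:
        self.outcomes.append(prediction == actual)

    def accuracy(self) -> float:
        if not self.outcomes:
            return 1.0  # no evidence of errors yet
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self) -> bool:
        """Flag a problem only once a full window has been observed."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.threshold

# Illustrative stream of (prediction, actual) pairs.
monitor = AccuracyMonitor(window=4, threshold=0.75)
for prediction, actual in [(1, 1), (0, 0), (1, 0), (1, 0)]:
    monitor.record(prediction, actual)
alert = monitor.degraded()
```

Waiting for a full window before raising an alert avoids reacting to the first one or two mispredictions after deployment.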

P9 Feedback loops. Multiple feedback loops are required to integrate insights from the quality assessment step into the development or engineering process (e.g., a feedback loop from the experimental model engineering stage to the previous feature engineering stage). Another feedback loop is required from the monitoring component (e.g., observing the model serving performance) to the scheduler to enable the retraining [4,6,7,17,27,46] [α, β, δ, ζ, η, θ].
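
The second loop, from the monitoring component to the scheduler, can be sketched in a few lines. The `Scheduler` class and the threshold rule are hypothetical simplifications; the study does not prescribe a specific trigger condition.

```python
class Scheduler:
    """Hypothetical scheduler that queues retraining requests."""

    def __init__(self):
        self.queue = []

    def request_retraining(self, reason: str) -> None:
        self.queue.append(reason)

def feedback_loop(observed_accuracy: float, threshold: float, scheduler: Scheduler) -> bool:
    """Forward a monitoring observation to the scheduler when quality drops."""
    if observed_accuracy < threshold:
        scheduler.request_retraining(
            f"accuracy {observed_accuracy:.2f} below {threshold:.2f}"
        )
        return True
    return False

# Illustrative observations: the first is healthy, the second triggers retraining.
scheduler = Scheduler()
triggered_first = feedback_loop(0.90, 0.80, scheduler)
triggered_second = feedback_loop(0.70, 0.80, scheduler)
```

Recording the reason alongside each request keeps the link between the monitoring observation and the retraining job visible in later audits.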

4.2 Technical Components

After identifying the principles that need to be incorporated into MLOps, we now elaborate on the precise components and implement them in the ML systems design. In the following, the components are listed and described in a generic way with their essential functionalities. The references in brackets refer to the respective principles that the technical components are implementing.

C1 CI/CD Component (P1, P6, P9). The CI/CD component ensures continuous integration, continuous delivery, and continuous deployment. It takes care of the build, test, delivery, and deploy steps. It provides rapid feedback to developers regarding the success or failure of certain steps, thus increasing the overall productivity [10,15,17,26,35,46] [α, β, γ, ε, ζ, η]. Examples are Jenkins [17,26] and GitHub Actions [η].

[Figure 2 links the principles P1 CI/CD automation, P2 Workflow orchestration, P3 Reproducibility, P4 Versioning of data, code, model, P5 Collaboration, P6 Continuous ML training & evaluation, P7 ML metadata tracking, P8 Continuous monitoring, and P9 Feedback loops to the components Source Code Repository, CI/CD Component, Workflow Orchestration Component, Feature Stores, Model Training Infrastructure, Model Registry, ML Metadata Stores, Monitoring Component, and Model Serving Component.]

Figure 2. Implementation of principles within technical components
