Drain: An Online Log Parsing Approach with Fixed
Depth Tree
Pinjia He∗, Jieming Zhu∗, Zibin Zheng†, and Michael R. Lyu∗
∗Computer Science and Engineering Department, The Chinese University of Hong Kong, China
{pjhe, jmzhu, lyu}@cse.cuhk.edu.hk
†Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of Education
School of Data and Computer Science, Sun Yat-sen University, China
zhzibin@mail.sysu.edu.cn
Abstract—Logs, which record valuable system runtime infor-
mation, have been widely employed in Web service management
by service providers and users. A typical log analysis based Web
service management procedure is to first parse raw log messages,
because of their unstructured format, and then apply data mining
models to extract critical system behavior information, which can
assist Web service management.
methods focus on offline, batch processing of logs. However, as
the volume of logs increases rapidly, model training of offline
log parsing methods, which employs all existing logs after log
collection, becomes time-consuming. To address this problem,
we propose an online log parsing method, namely Drain, that
can parse logs in a streaming and timely manner. To accelerate
the parsing process, Drain uses a fixed depth parse tree, which
encodes specially designed rules for parsing. We evaluate Drain
on five real-world log data sets with more than 10 million raw
log messages. The experimental results show that Drain has the
highest accuracy on four data sets, and comparable accuracy
on the remaining one. In addition, Drain achieves a 51.85%∼81.47%
improvement in running time over the state-of-the-art
online parser. We also conduct a case study on an anomaly
detection task that uses Drain in the parsing step, which
demonstrates the effectiveness of Drain in log analysis.
Index Terms—Log parsing; Online algorithm; Log analysis;
Web service management;
I. INTRODUCTION
The prevalence of cloud computing, which enables on-
demand service delivery, has made Service-oriented Architec-
ture (SOA) a dominant architectural style. Nowadays, more
and more developers leverage existing Web services to build
their own systems because of their rich functionality and
“plug-and-play” property. Although developing Web service
based systems is convenient and lightweight, Web service
management is a significant challenge for both service providers
and users. Specifically, service providers (e.g., Amazon EC2
[1]) are expected to provide services with no failures or SLA
(service-level agreement) violations to a large number of users.
Similarly, service users need to effectively and efficiently
manage the adopted services, a need that has been discussed in
many recent works (e.g., Web service monitoring [2]). In this
context, log analysis based service management techniques,
which employ service logs to achieve automatic or semi-
automatic service management, have been widely studied.
Logs are usually the only data resource available that
records service runtime information. In general, a log message
is a line of text printed by logging statements (e.g., printf(),
logging.info()) written by developers. Thus, log analysis
techniques, which apply data mining models to gain insights
into system behaviors, are in widespread use for service management.
For service providers, there are studies in anomaly detection
[3], [4], fault diagnosis [5], [6] and performance improvement
[7]. For service users, typical examples include business model
mining [8], [9] and user behavior analysis [10], [11].
Most of the data mining models used in these log analysis
techniques require structured input (e.g., an event list or a
matrix). However, raw log messages are usually unstructured,
because developers are allowed to write free-text log messages
in source code. Thus, the first step of log analysis is log
parsing, where unstructured log messages are transformed into
structured events. An unstructured log message, as in the
following example, usually contains various forms of system
runtime information: a timestamp (records the occurrence time
of an event), a verbosity level (indicates the severity of an
event, e.g., INFO), and the raw message content (a free-text
description of a service operation).
081109 204655 556 INFO dfs.DataNode$PacketResponder:
Received block blk_3587508140051953248 of size
67108864 from /10.251.42.84
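As a rough illustration, the header fields of a message like the one above can be separated from the free-text content with a simple pattern. The regex and field names below are assumptions tailored to this HDFS-style layout, not part of Drain; real systems use many different log layouts.

```python
import re

# Assumed pattern for the HDFS-style layout shown above:
# date, time, pid, verbosity level, component, then free-text content.
LOG_PATTERN = re.compile(
    r"(?P<date>\d{6}) (?P<time>\d{6}) (?P<pid>\d+) "
    r"(?P<level>[A-Z]+) (?P<component>\S+): (?P<content>.*)"
)

def split_log_message(line):
    """Split a raw log message into header fields and message content."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

raw = ("081109 204655 556 INFO dfs.DataNode$PacketResponder: "
       "Received block blk_3587508140051953248 of size 67108864 "
       "from /10.251.42.84")
fields = split_log_message(raw)
# fields["level"] is "INFO"; fields["content"] holds the free-text part
```

Header fields follow a fixed layout and are easy to extract this way; it is the free-text content that log parsing must turn into structured events.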
Traditionally, log parsing relies heavily on regular expres-
sions [12], which are designed and maintained manually by
developers. However, this manual method is not suitable for
logs generated by modern services for the following three
reasons. First, the volume of logs is increasing rapidly, which
makes the manual method prohibitive. For example, a large-
scale service system can generate 50 GB logs (120∼200
million lines) per hour [13]. Second, as open-source platforms
(e.g., Github) and Web services become popular, a system often
consists of components written by hundreds of developers
globally [3]. Thus, people in charge of the regular expressions
may not know the original logging purpose, which makes
manual management even harder. Third, logging statements
in modern systems update frequently (e.g., hundreds of new
logging statements every month [14]). In order to maintain
a correct regular expression set, developers need to check all
logging statements regularly, which is tedious and error-prone.
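To make the maintenance burden concrete, a hand-crafted rule for just the single event type shown earlier might look like the following sketch. The regex and the `<*>` wildcard convention are illustrative assumptions; every new or changed logging statement forces a developer to write or revise another such rule.

```python
import re

# Illustrative hand-written rule for one event type: abstract the
# variable fields (block id, size, IP address) into wildcards.
RECEIVED_BLOCK = re.compile(
    r"Received block blk_-?\d+ of size \d+ from /[\d.]+"
)

def to_template(content):
    """Map a message's content to its event template, if a rule matches."""
    if RECEIVED_BLOCK.fullmatch(content):
        return "Received block <*> of size <*> from <*>"
    return None  # no rule matched: a developer must write a new one

content = ("Received block blk_3587508140051953248 "
           "of size 67108864 from /10.251.42.84")
```

A production system would need hundreds of such rules, each tied to a logging statement that may change in the next release, which is exactly the maintenance problem automatic log parsers aim to remove.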
Log parsing has been widely studied to parse raw log messages
automatically. Most existing log parsers focus on offline,
batch processing. For example, Xu et al. [3] design a method
2017 IEEE 24th International Conference on Web Services
978-1-5386-0752-7/17 $31.00 © 2017 IEEE
DOI 10.1109/ICWS.2017.13