Apache Flume 2版:Hadoop分布式日志收集指南

需积分: 1 1 下载量 9 浏览量 更新于2024-07-19 收藏 2.51MB PDF 举报
《Apache Flume:Hadoop分布式日志收集实战——第二版》是一本深入讲解Apache Flume在Hadoop生态系统中角色的专业书籍。该书由Steve Hoffman撰写,由Packt Publishing出版,面向那些希望设计并实现一系列Flume代理以高效地将流式数据传输到Hadoop环境的读者。作为第二版,它在保留原书的基础上,更新了技术细节和最佳实践,以适应不断发展的大数据技术。 本书的核心内容围绕Apache Flume的设计与实施展开,Flume是一种强大的日志收集工具,特别适用于处理高吞吐量、实时性和可靠性要求极高的数据流。它通过一系列称为“channels”和“sinks”的组件,将数据分发到Hadoop的各个组件如HDFS(Hadoop分布式文件系统)或HBase等,确保数据的可靠存储和处理。Flume设计的核心在于其事件驱动架构,允许开发者根据需要灵活配置数据管道,包括数据清洗、过滤、格式转换等功能。 书中详细介绍了如何设置各种Flume代理(sources、processors和sinks),比如使用AvroSource从网络或文件系统读取数据,使用MemoryChannel暂存数据,以及将数据写入Hadoop的HDFS或发送到HBase等。此外,还会探讨高级主题,如故障恢复机制、监控和日志管理,以及如何在集群环境中部署和扩展Flume。 版权方面,该书享有Packt Publishing的专有权利,所有复制、存储或任何形式的传输均需得到出版商的书面许可。尽管作者和出版社努力确保信息的准确性,但书中提供的内容并不保证无误,也不承担因使用本书信息导致的任何直接或间接损失的责任。 本书适合Hadoop开发者、数据工程师和系统管理员阅读,帮助他们掌握如何在分布式环境下有效管理大规模日志数据,提升数据处理的效率和稳定性。随着大数据和云计算的快速发展,理解并熟练使用Apache Flume对现代IT专业人士来说是一项重要技能。

Warning: No configuration directory set! Use --conf <dir> to override. Info: Including Hadoop libraries found via (/opt/hadoop-3.1.2/bin/hadoop) for HDFS access Info: Including HBASE libraries found via (/opt/hbase-2.2.6/bin/hbase) for HBASE access 错误: 找不到或无法加载主类 org.apache.flume.tools.GetJavaProperty Info: Including Hive libraries found via (/opt/hive-3.1.2) for Hive access + exec /opt/jdk1.8.0_351/bin/java -Xmx20m -cp '/opt/flume-1.9.0/lib/*:/opt/hadoop-3.1.2/etc/hadoop:/opt/hadoop-3.1.2/share/hadoop/common/lib/*:/opt/hadoop-3.1.2/share/hadoop/common/*:/opt/hadoop-3.1.2/share/hadoop/hdfs:/opt/hadoop-3.1.2/share/hadoop/hdfs/lib/*:/opt/hadoop-3.1.2/share/hadoop/hdfs/*:/opt/hadoop-3.1.2/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.1.2/share/hadoop/mapreduce/*:/opt/hadoop-3.1.2/share/hadoop/yarn:/opt/hadoop-3.1.2/share/hadoop/yarn/lib/*:/opt/hadoop-3.1.2/share/hadoop/yarn/*:/opt/hbase-2.2.6/conf:/opt/jdk1.8.0_351//lib/tools.jar:/opt/hbase-2.2.6:/opt/hbase-2.2.6/lib/shaded-clients/hbase-shaded-client-byo-hadoop-2.2.6.jar:/opt/hbase-2.2.6/lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/opt/hbase-2.2.6/lib/client-facing-thirdparty/commons-logging-1.2.jar:/opt/hbase-2.2.6/lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/opt/hbase-2.2.6/lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/opt/hbase-2.2.6/lib/client-facing-thirdparty/log4j-1.2.17.jar:/opt/hbase-2.2.6/lib/client-facing-thirdparty/slf4j-api-1.7.25.jar:/opt/hadoop-3.1.2/etc/hadoop:/opt/hadoop-3.1.2/share/hadoop/common/lib/*:/opt/hadoop-3.1.2/share/hadoop/common/*:/opt/hadoop-3.1.2/share/hadoop/hdfs:/opt/hadoop-3.1.2/share/hadoop/hdfs/lib/*:/opt/hadoop-3.1.2/share/hadoop/hdfs/*:/opt/hadoop-3.1.2/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.1.2/share/hadoop/mapreduce/*:/opt/hadoop-3.1.2/share/hadoop/yarn:/opt/hadoop-3.1.2/share/hadoop/yarn/lib/*:/opt/hadoop-3.1.2/share/hadoop/yarn/*:/opt/hadoop-3.1.2/etc/hadoop:/opt/hbase-2.2.6/conf:/opt/hive-3.1.2/lib/*' -Djava.library.path=:/opt/hadoop-3.1.2/lib/native org.apache.flume.node.Application --name a1 --conf/opt/flume-1.9.0/conf --conf-file/opt/flume-1.9.0/conf/dhfsspool.conf-Dflume.root.logger=DEBUG,consol SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/opt/flume-1.9.0/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/opt/hadoop-3.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/opt/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 2023-06-08 17:26:46,403 ERROR node.Application: A fatal error occurred while running. Exception follows. org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: --conf/opt/flume-1.9.0/conf at org.apache.commons.cli.Parser.processOption(Parser.java:363) at org.apache.commons.cli.Parser.parse(Parser.java:199) at org.apache.commons.cli.Parser.parse(Parser.java:85) at org.apache.flume.node.Application.main(Application.java:287)

2023-06-09 上传
2023-06-10 上传

启动flume是报以下错误Info: Including Hadoop libraries found via (/opt/software/hadoop-2.8.3/bin/hadoop) for HDFS access Info: Including Hive libraries found via (/opt/software/hive-2.3.3) for Hive access + exec /opt/jdk1.8.0_261/bin/java -Xmx20m -cp '/opt/software/flume-1.8.0/conf:/opt/software/flume-1.8.0/lib/*:/opt/software/hadoop-2.8.3/etc/hadoop:/opt/software/hadoop-2.8.3/share/hadoop/common/lib/*:/opt/software/hadoop-2.8.3/share/hadoop/common/*:/opt/software/hadoop-2.8.3/share/hadoop/hdfs:/opt/software/hadoop-2.8.3/share/hadoop/hdfs/lib/*:/opt/software/hadoop-2.8.3/share/hadoop/hdfs/*:/opt/software/hadoop-2.8.3/share/hadoop/yarn/lib/*:/opt/software/hadoop-2.8.3/share/hadoop/yarn/*:/opt/software/hadoop-2.8.3/share/hadoop/mapreduce/lib/*:/opt/software/hadoop-2.8.3/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/software/hive-2.3.3/lib/*' -Djava.library.path=:/opt/software/hadoop-2.8.3/lib/native org.apache.flume.node.Application --conf-file /opt/software/flume-1.8.0/conf/hdfs.conf --name agent1 Dflume.root.logger=DEBUG,console SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/opt/software/flume-1.8.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/opt/software/hadoop-2.8.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/opt/software/hive-2.3.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.是什么原因

2023-06-10 上传