Apache Flume in Practice: Flexible, Scalable Data Streaming

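"Using Flume: Flexible, Scalable, and Reliable Data Streaming, written by Hari Shreedharan and published by O'Reilly Media in 2014, is a professional guide that explains in detail how to use Apache Flume to stream data in near real time. The book aims to help operations engineers configure, deploy, and monitor Flume clusters, and to teach developers how to write custom plugins for their specific needs. It includes an in-depth look at Flume's design and implementation, as well as the key features that make it highly scalable, flexible, and reliable."

Apache Flume is a tool for collecting, aggregating, and writing large amounts of streaming data to systems such as the Hadoop Distributed File System (HDFS), Apache HBase, SolrCloud, and Elasticsearch. It provides a steady rate of flow by acting as a buffer between data producers and data consumers. The book covers:

1. **Introduction to Apache Hadoop and Apache HBase**: the two key big data storage and processing frameworks, their roles in the big data ecosystem, and how Flume integrates with them.
2. **Streaming data using Apache Flume**: how Flume works and how it delivers data in near real time.
3. **Sources**: the different types of Flume sources, which accept data from origins such as log files and network sockets.
4. **Channels**: how Flume uses channels to hold data so that it stays durable and is delivered reliably while in transit.
5. **Sinks**: how to configure and use Flume sinks to write data to target storage systems such as HDFS and HBase.
6. **Interceptors, channel selectors, sink groups, and sink processors**: components that allow custom handling of data, such as filtering, transformation, and formatting.
7. **Sending data to Flume**: detailed methods for sending data to Flume agents from custom applications through the API.
8. **Planning, deploying, and monitoring Flume**: how to plan a Flume cluster's architecture around your requirements, and how to monitor a running cluster effectively to keep it stable.

The book also provides code examples and exercises that help readers deepen their understanding of Flume in practice. Whether you are an operations engineer who wants to improve your Flume operations skills or a developer who wants to build custom components, you can benefit from this book. After working through it, you will be able to build and manage an efficient, flexible, and reliable Flume data streaming system.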
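The source, channel, and sink roles described above can be illustrated with a toy model. The following is a minimal Python sketch and not Flume code: it assumes a bounded `queue.Queue` as a stand-in for a channel, and the names `source`, `sink`, `run_pipeline`, and `SENTINEL` are hypothetical. It only shows how a buffer between a producer and a consumer lets the two run at different rates without dropping events.

```python
import queue
import threading

# Hypothetical end-of-stream marker; a unique object cannot collide with real events.
SENTINEL = object()


def source(channel, events):
    """Producer role: put each event on the channel, blocking while it is full."""
    for event in events:
        channel.put(event)
    channel.put(SENTINEL)


def sink(channel, store):
    """Consumer role: drain the channel and append each event to the store."""
    while True:
        event = channel.get()
        if event is SENTINEL:
            break
        store.append(event)


def run_pipeline(events, capacity=4):
    """Run one source and one sink joined by a bounded in-memory channel."""
    channel = queue.Queue(maxsize=capacity)  # the buffer between the two sides
    store = []
    producer = threading.Thread(target=source, args=(channel, events))
    consumer = threading.Thread(target=sink, args=(channel, store))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return store


if __name__ == "__main__":
    delivered = run_pipeline([f"log line {i}" for i in range(20)])
    print(len(delivered), "events delivered in order")
```

Because the queue is bounded, a fast producer blocks rather than overwhelming the consumer, which is the buffering behaviour the description attributes to Flume. A real Flume channel additionally offers transactional and, depending on the channel type, on-disk durability, which this sketch does not attempt to model.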
Uploaded 2017-03-28
How can you get your data from frontend servers to Hadoop in near real time? With this complete reference guide, you’ll learn Flume’s rich set of features for collecting, aggregating, and writing large amounts of streaming data to the Hadoop Distributed File System (HDFS), Apache HBase, SolrCloud, Elasticsearch, and other systems.

Using Flume shows operations engineers how to configure, deploy, and monitor a Flume cluster, and teaches developers how to write Flume plugins and custom components for their specific use cases. You’ll learn about Flume’s design and implementation, as well as various features that make it highly scalable, flexible, and reliable. Code examples and exercises are available on GitHub.

- Learn how Flume provides a steady rate of flow by acting as a buffer between data producers and consumers
- Dive into key Flume components, including sources that accept data and sinks that write and deliver it
- Write custom plugins to customize the way Flume receives, modifies, formats, and writes data
- Explore APIs for sending data to Flume agents from your own applications
- Plan and deploy Flume in a scalable and flexible way, and monitor your cluster once it’s running

Table of Contents

- Chapter 1. Apache Hadoop and Apache HBase: An Introduction
- Chapter 2. Streaming Data Using Apache Flume
- Chapter 3. Sources
- Chapter 4. Channels
- Chapter 5. Sinks
- Chapter 6. Interceptors, Channel Selectors, Sink Groups, and Sink Processors
- Chapter 7. Getting Data into Flume
- Chapter 8. Planning, Deploying, and Monitoring Flume