Flume in Practice: Flexibly Building Efficient Real-Time Stream Processing for Big Data

Updated 2024-07-19, 4.76 MB PDF
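Using Flume: Flexible, Scalable, and Reliable Data Streaming is a practical, accessible guide written for data engineers and developers. In it, Hari Shreedharan gives a complete tutorial on setting up and deploying Flume pipelines, showing readers how to move data from frontend servers in near real time into big data storage and analytics platforms such as the Hadoop Distributed File System (HDFS), Apache HBase, SolrCloud, and Elasticsearch.

The book highlights Flume's strengths as a flexible data collection tool that is particularly well suited to large volumes of streaming data. For operations engineers, it explains in detail how to configure, deploy, and monitor a Flume cluster so that it runs stably and reliably. For developers, it gives practical guidance on writing Flume plugins and custom components for specific use cases.

For companies and teams that need fast, continuous data ingestion and want to make full use of the Hadoop ecosystem, this English-language book is a valuable reference. Combining worked examples with theory, the author explains Flume's architecture and the design of its components so that readers can integrate and process data more effectively. The book is published by O'Reilly Media and was praised by Arvind Prabhakar, CTO of StreamSets, who highlighted its value for understanding and implementing efficient data flows into Hadoop. The publisher, O'Reilly Media, also has a presence on social media, including a Facebook page.

Whether the goal is day-to-day operations or a technology upgrade, Using Flume helps readers build their skills in stream data processing. Covering everything from basic configuration to advanced customization, it suits readers in different roles who want to apply and optimize data processing workflows in practice.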
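Writing plugins that modify data in flight is one of the developer topics the book covers. As an illustration only, the Java sketch that follows chains small transformation steps that can rewrite an event or drop it. It is not Flume code: PluginEvent, EventTransformer, TransformerChain, the "host" header, and the "return null to drop" convention are hypothetical names and choices for this sketch. Flume's actual plugin interfaces (for interceptors, org.apache.flume.interceptor.Interceptor) have a different shape, with lifecycle methods and a builder.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical event type: string headers plus a UTF-8 byte[] body.
final class PluginEvent {
    final Map<String, String> headers = new HashMap<>();
    byte[] body;

    PluginEvent(String text) {
        body = text.getBytes(StandardCharsets.UTF_8);
    }

    String text() {
        return new String(body, StandardCharsets.UTF_8);
    }
}

// Hypothetical plugin contract: return the (possibly modified) event, or null to drop it.
interface EventTransformer {
    PluginEvent apply(PluginEvent event);
}

// Applies transformers in order; an event stops at the first step that returns null.
final class TransformerChain {
    private final List<EventTransformer> steps;

    TransformerChain(List<EventTransformer> steps) {
        this.steps = new ArrayList<>(steps);
    }

    List<PluginEvent> process(List<PluginEvent> events) {
        List<PluginEvent> out = new ArrayList<>();
        for (PluginEvent event : events) {
            PluginEvent current = event;
            for (EventTransformer step : steps) {
                current = step.apply(current);
                if (current == null) {
                    break; // dropped by this step
                }
            }
            if (current != null) {
                out.add(current);
            }
        }
        return out;
    }
}

final class TransformerExamples {
    // Drops events whose body is empty or whitespace only.
    static EventTransformer dropBlank() {
        return e -> e.text().trim().isEmpty() ? null : e;
    }

    // Tags each event with a "host" header (hypothetical header; value supplied by the caller).
    static EventTransformer addHost(String host) {
        return e -> {
            e.headers.put("host", host);
            return e;
        };
    }

    // Rewrites the body in upper case.
    static EventTransformer upperCase() {
        return e -> {
            e.body = e.text().toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
            return e;
        };
    }

    public static void main(String[] args) {
        TransformerChain chain = new TransformerChain(
                Arrays.asList(dropBlank(), addHost("web-01"), upperCase()));
        List<PluginEvent> out = chain.process(Arrays.asList(
                new PluginEvent("ok"), new PluginEvent("   "), new PluginEvent("hi")));
        for (PluginEvent e : out) {
            System.out.println(e.headers + " " + e.text());
        }
    }
}
```

Running main prints {host=web-01} OK and {host=web-01} HI on separate lines: the whitespace-only event is dropped by the first step and never reaches the later ones. Keeping each step small and ordering them explicitly makes it easy to reason about which step filters and which step rewrites.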
Uploaded 2017-03-28
How can you get your data from frontend servers to Hadoop in near real time? With this complete reference guide, you'll learn Flume's rich set of features for collecting, aggregating, and writing large amounts of streaming data to the Hadoop Distributed File System (HDFS), Apache HBase, SolrCloud, Elasticsearch, and other systems.

Using Flume shows operations engineers how to configure, deploy, and monitor a Flume cluster, and teaches developers how to write Flume plugins and custom components for their specific use cases. You'll learn about Flume's design and implementation, as well as various features that make it highly scalable, flexible, and reliable. Code examples and exercises are available on GitHub.

- Learn how Flume provides a steady rate of flow by acting as a buffer between data producers and consumers
- Dive into key Flume components, including sources that accept data and sinks that write and deliver it
- Write custom plugins to customize the way Flume receives, modifies, formats, and writes data
- Explore APIs for sending data to Flume agents from your own applications
- Plan and deploy Flume in a scalable and flexible way, and monitor your cluster once it's running

Table of Contents

Chapter 1. Apache Hadoop and Apache HBase: An Introduction
Chapter 2. Streaming Data Using Apache Flume
Chapter 3. Sources
Chapter 4. Channels
Chapter 5. Sinks
Chapter 6. Interceptors, Channel Selectors, Sink Groups, and Sink Processors
Chapter 7. Getting Data into Flume
Chapter 8. Planning, Deploying, and Monitoring Flume
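The idea that Flume keeps a steady rate of flow by acting as a buffer between data producers and consumers can be illustrated with a small, single-threaded simulation. This is a toy model under stated assumptions, not Flume code: BufferedEvent, BoundedBuffer, SimulationResult, and BufferSimulation are hypothetical names, the buffer is a plain java.util.concurrent.ArrayBlockingQueue, and time advances in discrete ticks. A burst of events arrives at once, and the consumer takes at most a fixed batch per tick, so downstream delivery stays bounded while the buffer absorbs the burst.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical stand-in for an event: string headers plus a byte[] body.
final class BufferedEvent {
    final Map<String, String> headers;
    final byte[] body;

    BufferedEvent(Map<String, String> headers, byte[] body) {
        this.headers = new HashMap<>(headers);
        this.body = body;
    }
}

// Hypothetical bounded in-memory buffer standing in for a channel.
final class BoundedBuffer {
    private final ArrayBlockingQueue<BufferedEvent> queue;

    BoundedBuffer(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking put: false means the buffer is full (back-pressure on the producer).
    boolean put(BufferedEvent event) {
        return queue.offer(event);
    }

    // Removes and returns up to maxEvents buffered events, oldest first.
    List<BufferedEvent> takeBatch(int maxEvents) {
        List<BufferedEvent> batch = new ArrayList<>();
        queue.drainTo(batch, maxEvents);
        return batch;
    }
}

final class SimulationResult {
    final int[] deliveredPerTick;
    final int rejected;

    SimulationResult(int[] deliveredPerTick, int rejected) {
        this.deliveredPerTick = deliveredPerTick;
        this.rejected = rejected;
    }
}

final class BufferSimulation {
    // At each tick t the producer tries to put arrivals[t] events, then the
    // consumer takes at most batchSize events. Events refused by a full buffer
    // are only counted here; a real producer would retry or slow down instead.
    static SimulationResult simulate(int[] arrivals, int capacity, int batchSize) {
        BoundedBuffer buffer = new BoundedBuffer(capacity);
        int[] delivered = new int[arrivals.length];
        int rejected = 0;
        int seq = 0;
        for (int t = 0; t < arrivals.length; t++) {
            for (int i = 0; i < arrivals[t]; i++) {
                byte[] body = ("event-" + seq++).getBytes(StandardCharsets.UTF_8);
                BufferedEvent event = new BufferedEvent(Map.of("tick", Integer.toString(t)), body);
                if (!buffer.put(event)) {
                    rejected++;
                }
            }
            delivered[t] = buffer.takeBatch(batchSize).size();
        }
        return new SimulationResult(delivered, rejected);
    }

    public static void main(String[] args) {
        SimulationResult r = simulate(new int[] {5, 0, 0, 0}, 10, 2);
        System.out.println(Arrays.toString(r.deliveredPerTick) + " rejected=" + r.rejected);
    }
}
```

Running main prints [2, 2, 1, 0] rejected=0: a burst of five events reaches the consumer as 2, 2, then 1. Shrinking the capacity to 3 makes the buffer refuse two events, which is the point where a real system needs a retry or back-off policy rather than silently losing data.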