Apache Flink 1.7 in Practice: Stream Processing and State Management

PDF, 9.13 MB. Updated 2024-07-17.
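Stream Processing with Apache Flink is an advanced guide written by Apache Flink PMC members and published in April 2019, targeting Apache Flink 1.7. It covers the fundamentals, implementation, and operation of stream processing applications, and suits readers who want to explore and master real-time data processing.

The book opens with the concept of stateful stream processing. Traditional data infrastructure is usually split into transactional processing and analytical processing. Event-driven applications and data pipelines are important use cases for stateful stream processing, and streaming analytics is a major trend in modern data processing. The authors review the history of open source stream processing, show how Flink evolved within it, and walk readers through running their first Flink application.

The second part explains the fundamentals of stream processing. Dataflow programming is the core concept: a dataflow graph shows how data moves and how it is transformed. The authors then examine the role of parallelism in stream processing, covering data parallelism, task parallelism, and the different data exchange strategies. Understanding latency and throughput is essential for tuning system performance.

Time plays a key role in stream processing. The book explains that "one minute" in a streaming system is not as simple as it sounds. Processing time and event time are the two main notions of time: processing time is the wall-clock time at which an operator handles a record, while event time is the time at which the event actually occurred. Watermarks track the progress of event time, so that results stay consistent even when events arrive late or out of order. The authors also compare the characteristics and use cases of the two notions of time.

Finally, the book discusses the importance of state in stream processing, along with Flink's consistency model, which is essential for keeping data correct and durable at high throughput. With these fundamentals, readers will be able to design, implement, and manage complex real-time data processing systems with Apache Flink.

Stream Processing with Apache Flink is a practical tutorial that combines theory with hands-on material. Both beginners and experienced developers can gain a deeper understanding of Flink and working techniques from it, whether they are building real-time data pipelines, real-time analytics, or event-driven applications.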
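To make the difference between arrival order and event time concrete, here is a minimal sketch in plain Python. It does not use Flink's API and is not taken from the book. The function name, the 60-second window, and the 10-second lateness bound are all assumptions chosen for illustration, and the watermark rule (largest event time seen minus a fixed delay) is a simplified version of a bounded-out-of-orderness watermark.

```python
from collections import defaultdict

WINDOW_SIZE = 60  # seconds; one-minute tumbling windows (assumed)
MAX_DELAY = 10    # seconds; assumed bound on how late an event may arrive


def event_time_window_counts(events):
    """Count events per one-minute event-time window.

    `events` is an iterable of (event_time_seconds, payload) in ARRIVAL order,
    which may differ from event-time order.
    Returns (window_start, count) pairs in the order windows are emitted.
    """
    open_windows = defaultdict(int)  # window_start -> count so far
    emitted = []
    max_seen = float("-inf")         # largest event time observed so far

    for event_time, _payload in events:
        window_start = event_time - event_time % WINDOW_SIZE
        watermark = max_seen - MAX_DELAY
        if window_start + WINDOW_SIZE <= watermark:
            # The window was already closed by the watermark.
            # This sketch simply drops such late events.
            continue
        open_windows[window_start] += 1

        # Advance the watermark and emit every window that ends at or before it.
        max_seen = max(max_seen, event_time)
        watermark = max_seen - MAX_DELAY
        for start in sorted(open_windows):
            if start + WINDOW_SIZE <= watermark:
                emitted.append((start, open_windows.pop(start)))

    # End of input: flush the windows that are still open.
    for start in sorted(open_windows):
        emitted.append((start, open_windows[start]))
    return emitted


# Events listed in arrival order; 58 arrives after 62, and 30 arrives very late.
events = [(5, "a"), (62, "b"), (58, "c"), (71, "d"), (30, "e"), (130, "f")]
print(event_time_window_counts(events))
# prints [(0, 2), (60, 2), (120, 1)]
```

The event with timestamp 58 arrives after the one with timestamp 62 but is still counted in the first window, because the watermark has not yet passed 60. The event with timestamp 30 arrives after the watermark has reached 61, so the first window is already closed and the event is dropped. A processing-time window would instead group events by when they happened to arrive.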
Uploaded 2019-03-11
Note: Stream Processing with Apache Flink, web edition

Book Description

With Early Release ebooks, you get books in their earliest form (the author's raw and unedited content as he or she writes) so you can take advantage of these technologies long before the official release of these titles. You'll also receive updates when significant changes are made, new chapters are available, and the final ebook bundle is released.

Get started with Apache Flink, the open source framework that enables you to process streaming data, such as user interactions, sensor data, and machine logs, as it arrives. With this practical guide, you'll learn how to use Apache Flink's stream processing APIs to implement, continuously run, and maintain real-world applications.

Authors Fabian Hueske, one of Flink's creators, and Vasia Kalavri, a core contributor to Flink's graph processing API (Gelly), explain the fundamental concepts of parallel stream processing and show you how streaming analytics differs from traditional batch data analysis. Software engineers, data engineers, and system administrators will learn the basics of Flink's DataStream API, including the structure and components of a common Flink streaming application.

Solve real-world problems with Apache Flink's DataStream API
Set up an environment for developing stream processing applications for Flink
Design streaming applications and migrate periodic batch workloads to continuous streaming workloads
Learn about windowed operations that process groups of records
Ingest data streams into a DataStream application and emit a result stream into different storage systems
Implement stateful and custom operators common in stream processing applications
Operate, maintain, and update continuously running Flink streaming applications
Explore several deployment options, including the setup of highly available installations