Building Enterprise-Grade Real-Time Stream Processing: Apache Kafka Explained


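Kafka: The Definitive Guide is an in-depth treatment of Apache Kafka written by Neha Narkhede, Gwen Shapira, and Todd Palino. The book aims to help organizations run large-scale real-time data processing and stream computing efficiently. Data is the lifeblood of a modern business: log messages, metrics, user activity, and other records all carry information that drives decisions and business processes.

Kafka is a distributed streaming platform suited to handling large volumes of real-time data with high throughput and low latency. It uses a publish/subscribe model: producers publish data to topics, and consumers subscribe to those topics and read the data. This design lets data flow in real time and supports use cases such as log collection, monitoring and alerting, and machine learning. The book covers Kafka's core components and how they work, including:

1. Broker: the core server node of Kafka, responsible for storing and replicating data. Brokers form a cluster that keeps data available and durable.
2. Topic: a logical category of messages, loosely comparable to a database table. Producers and consumers interact through topics.
3. Partitioning: a topic is split into multiple partitions, each with its own leader replica, which increases concurrency and allows data to be processed in parallel.
4. Producer: the client-side component an application uses to send messages to Kafka. It can be configured with how many acknowledgments are required before a send counts as successful.
5. Consumer: the component that reads and processes messages published to Kafka topics. Consumers pull data from brokers; brokers do not push messages to consumers.
6. Replication: Kafka keeps copies of each partition on several brokers for fault tolerance, so that if some brokers fail the data can still be served from other replicas.
7. Offset management: tracks each consumer's read position in every partition, so a consumer can resume where it left off. Ordering is guaranteed only within a partition, and offsets alone do not make processing idempotent.
8. Streaming applications: the book also discusses how to build real-time stream processing applications on Kafka, including development tools, best practices, and integration with other technologies such as Spark Streaming and Flink.

Beyond the theory, the book contains many practical examples and case studies that help readers get productive with Kafka quickly. It also mentions the value-added services in Confluent Enterprise, such as client support, Schema Registry (a service for managing message schemas), and REST Proxy (an HTTP API for Kafka), which are useful in enterprise deployments. The book is a valuable reference for IT professionals who want to learn and practice big data and real-time stream processing in depth, whether they are new to Kafka or looking to extend existing skills.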
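The partition, consumer, and offset concepts in the summary above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not Kafka's implementation and not its client API: the `ToyTopic` and `ToyConsumer` classes are hypothetical names invented for this example, and CRC32 stands in for whatever hash a real client uses to map keys to partitions.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory stand-in for a topic: a fixed set of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        # The same key always maps to the same partition, so per-key order is kept.
        # CRC32 is an assumption made for this sketch, not the hash Kafka clients use.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1  # (partition index, offset of the new record)


class ToyConsumer:
    """Hypothetical pull-based reader that remembers the next offset per partition."""

    def __init__(self, topic):
        self.topic = topic
        self.position = defaultdict(int)  # partition index -> next offset to read

    def poll(self, partition, max_records=10):
        # Read forward from the current position and advance it.
        start = self.position[partition]
        records = self.topic.partitions[partition][start:start + max_records]
        self.position[partition] = start + len(records)
        return records

    def seek(self, partition, offset):
        # Moving the position back causes records to be read again.
        self.position[partition] = offset


topic = ToyTopic(num_partitions=3)
p1, o1 = topic.send("user-42", "login")
p2, o2 = topic.send("user-42", "click")

consumer = ToyConsumer(topic)
first = consumer.poll(p1)      # both records, in the order they were sent
consumer.seek(p1, 0)           # rewind to the start of the partition
replayed = consumer.poll(p1)   # the same records are delivered a second time
```

Rewinding the position delivers the same records again, which is why offset tracking by itself gives resumable, ordered reads within a partition but does not make downstream processing idempotent.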

a platform oriented around real-time streams of events. This means that not only can it connect off-the-shelf applications and data systems, it can power custom applications built to trigger off of these same data streams. We think this architecture centered around streams of events is a really important thing. In some ways these flows of data are the most central aspect of a modern digital company, as important as the cash flows you’d see in a financial statement.

The ability to combine these three areas, bringing all the streams of data together across all the use cases, is what makes the idea of a streaming platform so appealing to people.

Still, all of this is a bit different, and learning how to think and build applications oriented around continuous streams of data is quite a mind shift if you are coming from the world of request/response-style applications and relational databases. This book is absolutely the best way to learn about Kafka, from internals to APIs, written by some of the people who know it best. I hope you enjoy reading it as much as I have!

Jay Kreps
Cofounder and CEO at Confluent


Preface

The greatest compliment you can give an author of a technical book is “This is the book I wish I had when I got started with this subject.” This is the goal we set for ourselves when we started writing this book. We looked back at our experience writing Kafka, running Kafka in production, and helping many companies use Kafka to build software architectures and manage their data pipelines, and we asked ourselves, “What are the most useful things we can share with new users to take them from beginners to experts?” This book is a reflection of the work we do every day: run Apache Kafka and help others use it in the best ways.

We included what we believe you need to know in order to successfully run Apache Kafka in production and build robust and performant applications on top of it. We highlighted the popular use cases: message bus for event-driven microservices, stream-processing applications, and large-scale data pipelines. We also focused on making the book general and comprehensive enough so it will be useful to anyone using Kafka, no matter the use case or architecture. We cover practical matters such as how to install and configure Kafka and how to use the Kafka APIs, and we also dedicate space to Kafka’s design principles and reliability guarantees, and explore several of Kafka’s delightful architecture details: the replication protocol, controller, and storage layer. We believe that knowledge of Kafka’s design and internals is not only a fun read for those interested in distributed systems, but it is also incredibly useful for those who are seeking to make informed decisions when they deploy Kafka in production and design applications that use Kafka. The better you understand how Kafka works, the more you can make informed decisions regarding the many trade-offs that are involved in engineering.

One of the problems in software engineering is that there is always more than one way to do anything. Platforms such as Apache Kafka provide plenty of flexibility, which is great for experts but makes for a steep learning curve for beginners. Very often, Apache Kafka tells you how to use a feature but not why you should or shouldn’t use it. Whenever possible, we try to clarify the existing choices, the trade-offs involved, and when you should and shouldn’t use the different options presented by Apache Kafka.

Who Should Read This Book

Kafka: The Definitive Guide was written for software engineers who develop applications that use Kafka’s APIs and for production engineers (also called SREs, devops, or sysadmins) who install, configure, tune, and monitor Kafka in production. We also wrote the book with data architects and data engineers in mind: those responsible for designing and building an organization’s entire data infrastructure. Some of the chapters, especially chapters 3, 4, and 11, are geared toward Java developers. Those chapters assume that the reader is familiar with the basics of the Java programming language, including topics such as exception handling and concurrency. Other chapters, especially chapters 2, 8, 9, and 10, assume the reader has some experience running Linux and some familiarity with storage and network configuration in Linux. The rest of the book discusses Kafka and software architectures in more general terms and does not assume special knowledge.

Another category of people who may find this book interesting are the managers and architects who don’t work directly with Kafka but work with the people who do. It is just as important that they understand the guarantees that Kafka provides and the trade-offs that their employees and coworkers will need to make while building Kafka-based systems. The book can provide ammunition to managers who would like to get their staff trained in Apache Kafka or ensure that their teams know what they need to know.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

