"Kafka: The Definitive Guide": Building Large-Scale Real-Time Stream Processing Applications

Updated 2024-07-17 · PDF, 6.9 MB

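"Kafka: The Definitive Guide", by Neha Narkhede, Gwen Shapira, and Todd Palino, gives readers an in-depth understanding of Apache Kafka together with practical experience. It is a reference for real-time data processing and large-scale stream processing, aimed at readers who want technical detail and direction when building highly available, scalable streaming applications.

Kafka is an open source distributed streaming platform that emphasizes low latency, high throughput, and durability. It is designed to handle large-scale data streams and is commonly used for log collection, event-driven systems, and real-time analytics. The book describes the main components of a Kafka deployment in detail, including:

1. **Producers**: publish data to Kafka topics. Client libraries are available for several programming languages, such as Python, C/C++, and .NET.
2. **Brokers**: the servers that store the messages held in partitions. Each partition is hosted on one or more brokers, and Kafka uses replication to improve data reliability.
3. **Topics**: named containers for messages that can be created and managed as needed and that scale horizontally. Each topic is divided into partitions, and each partition has one or more replicas for durability and fault tolerance.
4. **Consumers**: read messages from topics and can process multiple partitions in parallel. Consumer groups spread a topic's partitions across the group's members, which makes consumption more efficient and reliable.
5. **Kafka Connect**: provides a unified interface for integrating with other systems and services, such as databases, NoSQL stores, and Hadoop, so that data can be moved between heterogeneous systems.
6. **Schema Registry**: a Confluent component that stores and manages the schemas of the messages written to topics, keeping data structures consistent so that consumers joining at different times can still parse the data correctly.
7. **REST Proxy**: a Confluent component that exposes an HTTP API through which external systems can interact with Kafka, for example to produce and consume messages or to look up topic information.

The book also covers installing, configuring, debugging, and tuning Kafka, and how to build and operate a production-grade Kafka cluster. It combines theory with practical case studies and best practices, so readers can get started quickly and build efficient, stable streaming applications. Both beginners and experienced developers can use it to gain a broad, deep understanding of Kafka and to learn how to apply it to real-time data processing and large-scale stream processing in real projects.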
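To make the relationship between topics, partitions, and consumer groups in the summary above more concrete, here is a minimal in-memory sketch in Python. It is not the Kafka client API and does not talk to a broker: `Topic`, `partition_for`, and `assign_partitions` are hypothetical names, CRC32 stands in for whatever hash a real client uses, and round-robin is just one simple assignment strategy chosen for illustration.

```python
import zlib


class Topic:
    """Toy topic: a fixed number of append-only partitions (hypothetical model)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 is a stand-in hash, not what a real client uses.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        """Append a record and return (partition, offset)."""
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1


def assign_partitions(num_partitions, consumers):
    """Spread partitions round-robin over the members of one consumer group."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


topic = Topic("page-views", num_partitions=3)
first = topic.produce("user-42", "home")
second = topic.produce("user-42", "cart")
# Records with the same key land in the same partition, in append order.
print(first, second)
print(assign_partitions(3, ["consumer-a", "consumer-b"]))
```

In this model every partition is owned by exactly one member of the group, which is why adding consumers beyond the number of partitions would leave some of them idle.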

a platform oriented around real-time streams of events. This means that not only can it connect off-the-shelf applications and data systems, it can power custom applications built to trigger off of these same data streams. We think this architecture centered around streams of events is a really important thing. In some ways these flows of data are the most central aspect of a modern digital company, as important as the cash flows you’d see in a financial statement.

The ability to combine these three areas, to bring all the streams of data together across all the use cases, is what makes the idea of a streaming platform so appealing to people.

Still, all of this is a bit different, and learning how to think and build applications oriented around continuous streams of data is quite a mindshift if you are coming from the world of request/response style applications and relational databases. This book is absolutely the best way to learn about Kafka, from internals to APIs, written by some of the people who know it best. I hope you enjoy reading it as much as I have!

— Jay Kreps

Cofounder and CEO at Confluent


Preface

The greatest compliment you can give an author of a technical book is “This is the book I wish I had when I got started with this subject.” This is the goal we set for ourselves when we started writing this book. We looked back at our experience writing Kafka, running Kafka in production, and helping many companies use Kafka to build software architectures and manage their data pipelines, and we asked ourselves, “What are the most useful things we can share with new users to take them from beginner to expert?” This book is a reflection of the work we do every day: run Apache Kafka and help others use it in the best ways.

We included what we believe you need to know in order to successfully run Apache Kafka in production and build robust and performant applications on top of it. We highlighted the popular use cases: message bus for event-driven microservices, stream-processing applications, and large-scale data pipelines. We also focused on making the book general and comprehensive enough so it will be useful to anyone using Kafka, no matter the use case or architecture. We cover practical matters such as how to install and configure Kafka and how to use the Kafka APIs, and we also dedicate space to Kafka’s design principles and reliability guarantees and explore several of Kafka’s delightful architecture details: the replication protocol, controller, and storage layer. We believe that knowledge of Kafka’s design and internals is not only a fun read for those interested in distributed systems, but it is also incredibly useful for those who are seeking to make informed decisions when they deploy Kafka in production and design applications that use Kafka. The better you understand how Kafka works, the more you can make informed decisions regarding the many trade-offs that are involved in engineering.

One of the problems in software engineering is that there is always more than one way to do anything. Platforms such as Apache Kafka provide plenty of flexibility, which is great for experts but makes for a steep learning curve for beginners. Very often, Apache Kafka tells you how to use a feature but not why you should or shouldn’t use it. Whenever possible, we try to clarify the existing choices, the trade-offs involved, and when you should and shouldn’t use the different options presented by Apache Kafka.

Who Should Read This Book

Kaa: e Denitive Guide was written for software engineers who develop applica‐

tions that use Kafka’s APIs and for production engineers (also called SREs, devops, or

sysadmins) who install, configure, tune, and monitor Kafka in production. We also

wrote the book with data architects and data engineers in mind—those responsible

for designing and building an organization’s entire data infrastructure. Some of the

chapters, especially chapters 3, 4, and 11 are geared toward Java developers. Those

chapters assume that the reader is familiar with the basics of the Java programming

language, including topics such as exception handling and concurrency. Other chap‐

ters, especially chapters 2, 8, 9, and 10, assume the reader has some experience run‐

ning Linux and some familiarity with storage and network configuration in Linux.

The rest of the book discusses Kafka and software architectures in more general

terms and does not assume special knowledge.

Another category of people who may find this book interesting are the managers and architects who don’t work directly with Kafka but work with the people who do. It is just as important that they understand the guarantees that Kafka provides and the trade-offs that their employees and coworkers will need to make while building Kafka-based systems. The book can provide ammunition to managers who would like to get their staff trained in Apache Kafka or ensure that their teams know what they need to know.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

