Real-Time Big Data Stream Processing with Kafka: The Definitive Guide

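"Kafka: The Definitive Guide", co-authored by Neha Narkhede, Gwen Shapira, and Todd Palino, is a professional book focused on real-time data and stream processing at scale. It explores in depth how to use Apache Kafka, a distributed messaging system for building robust stream-processing applications. The book stresses how the speed and efficiency of data movement affect a company's agility and responsiveness, and it points to the key role that data pipelines play in a data-driven company. Kafka is a core component that moves data quickly from where it is produced to where it is analyzed, which reduces the effort spent on data management and lets a company focus on its core business.

The book covers Kafka's features, such as high throughput, low latency, and widely supported client libraries, including Python, C/C++, and .NET. It also mentions the open source connectors, clients, Schema Registry, and REST Proxy provided by Confluent, which extend Kafka's functionality and ease of use. From this book, readers can learn how to build reliable real-time stream-processing systems with Kafka, including practical guidance on setting up, operating, and tuning Kafka clusters. The authors share their expertise in stream processing to help readers understand and apply best practices for Kafka in modern enterprise architectures.

Kafka's core concepts include producers, consumers, topics, and partitions, which together provide reliable data transport and processing. Producers publish messages to topics, and consumers subscribe to topics and process those messages. A topic can be split into multiple partitions, which provides horizontal scalability and fault tolerance. Kafka also supports retention policies, so users can set how long messages are stored and balance storage cost against data availability. The book also covers Kafka Connect, a framework for integrating external systems that simplifies data synchronization with other sources and sinks such as databases, Hadoop, or Elasticsearch. In addition, the Schema Registry provided by Confluent keeps data formats consistent across the system and avoids problems caused by mismatched parsing.

"Kafka: The Definitive Guide" is a comprehensive guide for data engineers, architects, and developers who want to use Kafka to build efficient, scalable real-time data processing solutions. By studying it closely, readers can master Kafka's core principles and learn how to apply them effectively in real projects to improve a company's data processing capabilities.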

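The relationship between producers, consumers, topics, and partitions described above can be illustrated with a small in-memory sketch. This is not the Kafka client API: `ToyTopic`, `ToyConsumer`, and the `page-views` topic name are hypothetical names invented for illustration, and CRC32 is used here only as a stand-in for whatever hash a real client applies to message keys. The sketch shows two ideas: messages with the same key land in the same partition and keep their order there, and a consumer tracks the next offset to read in each partition.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a topic split into partitions (not the Kafka API)."""

    def __init__(self, name, num_partitions):
        self.name = name
        # Each partition is an append-only list; a message's index is its offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 stands in for the key hash a real client would use.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        """Append a keyed message and return its (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


class ToyConsumer:
    """Reads each partition in order, remembering the next offset to read."""

    def __init__(self, topic):
        self.topic = topic
        self.next_offset = defaultdict(int)

    def poll(self):
        """Return every message not yet seen as (partition, offset, key, value)."""
        batch = []
        for p, log in enumerate(self.topic.partitions):
            start = self.next_offset[p]
            batch.extend(
                (p, start + i, k, v) for i, (k, v) in enumerate(log[start:])
            )
            self.next_offset[p] = len(log)
        return batch


topic = ToyTopic("page-views", num_partitions=3)
for user, page in [("alice", "/home"), ("bob", "/cart"), ("alice", "/checkout")]:
    topic.produce(user, page)

consumer = ToyConsumer(topic)
records = consumer.poll()
for record in records:
    print(record)
```

Ordering in this sketch is guaranteed only within a partition, which is why the key determines the partition: all of one user's events stay in sequence, while different users can be spread across partitions and read independently.
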
a platform oriented around real-time streams of events. This means that not only can it connect off-the-shelf applications and data systems, it can power custom applications built to trigger off of these same data streams. We think this architecture centered around streams of events is a really important thing. In some ways these flows of data are the most central aspect of a modern digital company, as important as the cash flows you’d see in a financial statement.

The ability to combine these three areas, to bring all the streams of data together across all the use cases, is what makes the idea of a streaming platform so appealing to people.

Still, all of this is a bit different, and learning how to think and build applications oriented around continuous streams of data is quite a mindshift if you are coming from the world of request/response style applications and relational databases. This book is absolutely the best way to learn about Kafka, from internals to APIs, written by some of the people who know it best. I hope you enjoy reading it as much as I have!

— Jay Kreps

Cofounder and CEO at Confluent

Preface

The greatest compliment you can give an author of a technical book is “This is the book I wish I had when I got started with this subject.” This is the goal we set for ourselves when we started writing this book. We looked back at our experience writing Kafka, running Kafka in production, and helping many companies use Kafka to build software architectures and manage their data pipelines, and we asked ourselves, “What are the most useful things we can share with new users to take them from beginner to expert?” This book is a reflection of the work we do every day: run Apache Kafka and help others use it in the best ways.

We included what we believe you need to know in order to successfully run Apache Kafka in production and build robust and performant applications on top of it. We highlighted the popular use cases: message bus for event-driven microservices, stream-processing applications, and large-scale data pipelines. We also focused on making the book general and comprehensive enough so it will be useful to anyone using Kafka, no matter the use case or architecture. We cover practical matters such as how to install and configure Kafka and how to use the Kafka APIs, and we also dedicated space to Kafka’s design principles and reliability guarantees, and explore several of Kafka’s delightful architecture details: the replication protocol, controller, and storage layer. We believe that knowledge of Kafka’s design and internals is not only a fun read for those interested in distributed systems, but it is also incredibly useful for those who are seeking to make informed decisions when they deploy Kafka in production and design applications that use Kafka. The better you understand how Kafka works, the more you can make informed decisions regarding the many trade-offs that are involved in engineering.

One of the problems in software engineering is that there is always more than one way to do anything. Platforms such as Apache Kafka provide plenty of flexibility, which is great for experts but makes for a steep learning curve for beginners. Very often, Apache Kafka tells you how to use a feature but not why you should or shouldn’t use it. Whenever possible, we try to clarify the existing choices, the trade-offs involved, and when you should and shouldn’t use the different options presented by Apache Kafka.

Who Should Read This Book

Kaa: e Denitive Guide was written for software engineers who develop applica‐

tions that use Kafka’s APIs and for production engineers (also called SREs, devops, or

sysadmins) who install, configure, tune, and monitor Kafka in production. We also

wrote the book with data architects and data engineers in mind—those responsible

for designing and building an organization’s entire data infrastructure. Some of the

chapters, especially chapters 3, 4, and 11 are geared toward Java developers. Those

chapters assume that the reader is familiar with the basics of the Java programming

language, including topics such as exception handling and concurrency. Other chap‐

ters, especially chapters 2, 8, 9, and 10, assume the reader has some experience run‐

ning Linux and some familiarity with storage and network configuration in Linux.

The rest of the book discusses Kafka and software architectures in more general

terms and does not assume special knowledge.

Another category of people who may find this book interesting are the managers and architects who don’t work directly with Kafka but work with the people who do. It is just as important that they understand the guarantees that Kafka provides and the trade-offs that their employees and coworkers will need to make while building Kafka-based systems. The book can provide ammunition to managers who would like to get their staff trained in Apache Kafka or ensure that their teams know what they need to know.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.
