Apache Kafka: The Distributed Streaming Platform Explained


"Apache Kafka是发布/订阅消息系统，用于解决数据流处理的问题。它被描述为‘分布式提交日志’或最近被称为‘分布式流处理平台’。Kafka设计为提供持久化的、有序的数据记录，可以重播以一致地构建系统状态。数据在Kafka中持久化存储，按顺序排列，并能以确定性方式读取。此外，数据可以在系统内分布，以提供对故障的额外保护以及性能扩展的机会。" 《Kafka：权威指南》由Neha Narkhede、Gwen Shapira和Todd Palino撰写，是关于实时数据和大规模流处理的详尽指南。这本书由Confluent开源提供，旨在帮助读者构建健壮的流处理应用。Confluent还提供了额外的客户端支持，包括Python、C/C++和.NET，以及易于升级到Confluent Enterprise的路径。 Kafka的核心特性包括： 1. **发布/订阅模式**：Kafka允许生产者发布消息到主题（topics），消费者则订阅这些主题来消费消息。这种模式使得解耦和扩展变得容易。 2. **分布式提交日志**：Kafka将消息存储在分区（partitions）中，每个分区都维护一个有序的消息序列。通过这种方式，它能提供高度可靠的数据流。 3. **持久化和复制**：Kafka将数据持久化到磁盘，并且可以配置多个副本以实现容错。即使在节点失败时，数据也能保持可用。 4. **高性能**：通过数据分区和并行处理，Kafka能够处理大量并发写入和读取操作，从而实现高吞吐量和低延迟。 5. **可扩展性**：随着数据量和用户需求的增长，Kafka可以通过添加更多节点轻松扩展。 6. **Schema Registry**：Confluent提供的Schema Registry用于管理消息的Avro schema，确保数据的兼容性和一致性。 7. **Connectors**：Kafka Connect允许开发者创建和使用连接器，方便地集成其他系统，如数据库、文件系统或外部服务。 8. **REST Proxy**：REST Proxy提供了一种与Kafka交互的HTTP接口，使得非Java应用程序也能轻松使用Kafka。这本书详细介绍了如何使用Kafka进行实时数据处理，涵盖了设置、部署、管理和优化Kafka集群的方方面面，同时讲解了如何构建高效的流处理应用。无论是初学者还是经验丰富的开发人员，都能从中受益，理解并掌握Kafka的核心概念和技术。

a platform oriented around real-time streams of events. This means that not only can it connect off-the-shelf applications and data systems, it can power custom applications built to trigger off of these same data streams. We think this architecture centered around streams of events is a really important thing. In some ways these flows of data are the most central aspect of a modern digital company, as important as the cash flows you’d see in a financial statement.

The ability to combine these three areas, to bring all the streams of data together across all the use cases, is what makes the idea of a streaming platform so appealing to people.

Still, all of this is a bit different, and learning how to think and build applications oriented around continuous streams of data is quite a mindshift if you are coming from the world of request/response style applications and relational databases. This book is absolutely the best way to learn about Kafka; from internals to APIs, written by some of the people who know it best. I hope you enjoy reading it as much as I have!

— Jay Kreps

Cofounder and CEO at Confluent


Preface

The greatest compliment you can give an author of a technical book is “This is the book I wish I had when I got started with this subject.” This is the goal we set for ourselves when we started writing this book. We looked back at our experience writing Kafka, running Kafka in production, and helping many companies use Kafka to build software architectures and manage their data pipelines, and we asked ourselves, “What are the most useful things we can share with new users to take them from beginners to experts?” This book is a reflection of the work we do every day: run Apache Kafka and help others use it in the best ways.

We included what we believe you need to know in order to successfully run Apache Kafka in production and build robust and performant applications on top of it. We highlighted the popular use cases: message bus for event-driven microservices, stream-processing applications, and large-scale data pipelines. We also focused on making the book general and comprehensive enough so it will be useful to anyone using Kafka, no matter the use case or architecture. We cover practical matters such as how to install and configure Kafka and how to use the Kafka APIs, and we also dedicate space to Kafka’s design principles and reliability guarantees and explore several of Kafka’s delightful architecture details: the replication protocol, controller, and storage layer. We believe that knowledge of Kafka’s design and internals is not only a fun read for those interested in distributed systems, but it is also incredibly useful for those who are seeking to make informed decisions when they deploy Kafka in production and design applications that use Kafka. The better you understand how Kafka works, the more you can make informed decisions regarding the many trade-offs that are involved in engineering.

One of the problems in software engineering is that there is always more than one way to do anything. Platforms such as Apache Kafka provide plenty of flexibility, which is great for experts but makes for a steep learning curve for beginners. Very often, Apache Kafka tells you how to use a feature but not why you should or shouldn’t use it. Whenever possible, we try to clarify the existing choices, the trade-offs involved, and when you should and shouldn’t use the different options presented by Apache Kafka.

Who Should Read This Book

Kaa: e Denitive Guide was written for software engineers who develop applica‐

tions that use Kafka’s APIs and for production engineers (also called SREs, devops, or

sysadmins) who install, configure, tune, and monitor Kafka in production. We also

wrote the book with data architects and data engineers in mind—those responsible

for designing and building an organization’s entire data infrastructure. Some of the

chapters, especially chapters 3, 4, and 11 are geared toward Java developers. Those

chapters assume that the reader is familiar with the basics of the Java programming

language, including topics such as exception handling and concurrency. Other chap‐

ters, especially chapters 2, 8, 9, and 10, assume the reader has some experience run‐

ning Linux and some familiarity with storage and network configuration in Linux.

The rest of the book discusses Kafka and software architectures in more general

terms and does not assume special knowledge.

Another category of people who may find this book interesting are the managers and architects who don’t work directly with Kafka but work with the people who do. It is just as important that they understand the guarantees that Kafka provides and the trade-offs that their employees and coworkers will need to make while building Kafka-based systems. The book can provide ammunition to managers who would like to get their staff trained in Apache Kafka or ensure that their teams know what they need to know.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.
