Kafka: The Definitive Guide, a Cornerstone for Building Large-Scale Real-Time Stream Processing Applications


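"Kafka: The Definitive Guide, co-authored by Neha Narkhede, Gwen Shapira, and Todd Palino, is a detailed guide to Apache Kafka. It aims to help readers understand and master real-time data processing and large-scale stream processing, and it is an essential reference for anyone who wants to build robust streaming applications with Kafka. The book covers Kafka's core concepts, such as the principles and design of message queues and how to process real-time data streams efficiently in a distributed environment. It also examines Kafka's architecture, including producers, consumers, topics, partitions, and the replication strategy. The book additionally mentions an upgrade path to Confluent Enterprise, an enhanced offering built on Apache Kafka that provides enterprise features such as Schema Registry for managing data schemas and REST Proxy for simpler API access. It suits both beginners and experienced developers, whether they want to learn basic Kafka usage or study its advanced features. Through examples and hands-on demonstrations, the authors show how to apply Kafka to real-time data processing in real projects, whether for data collection, real-time analytics, or event-driven systems in a microservices architecture. For further resources, the book points readers to the Confluent website, where a 100% open source Apache Kafka distribution can be downloaded; that distribution is described as thoroughly tested and quality assured, with client support for Python, C/C++, and .NET in addition to Java. The book is protected by copyright, with all rights held by Neha Narkhede, Gwen Shapira, and Todd Palino. Kafka: The Definitive Guide is a comprehensive and practical guide for learners and practitioners of the Kafka technology stack."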

what Apache Kafka is built to be. Getting used to this way of thinking about data might be a little different than what you’re used to, but it turns out to be an incredibly powerful abstraction for building applications and architectures. Kafka is often compared to a couple of existing technology categories: enterprise messaging systems, big data systems like Hadoop, and data integration or ETL tools. Each of these comparisons has some validity but also falls a little short.

Kafka is like a messaging system in that it lets you publish and subscribe to streams of messages. In this way, it is similar to products like ActiveMQ, RabbitMQ, IBM’s MQSeries, and other products. But even with these similarities, Kafka has a number of core differences from traditional messaging systems that make it another kind of animal entirely. Here are the big three differences. First, it works as a modern distributed system that runs as a cluster and can scale to handle all the applications in even the most massive of companies. Rather than running dozens of individual messaging brokers, hand-wired to different apps, this lets you have a central platform that can scale elastically to handle all the streams of data in a company. Second, Kafka is a true storage system built to store data for as long as you might like. This has huge advantages in using it as a connecting layer, as it provides real delivery guarantees: its data is replicated, persistent, and can be kept around as long as you like. Finally, the world of stream processing raises the level of abstraction quite significantly. Messaging systems mostly just hand out messages. The stream processing capabilities in Kafka let you compute derived streams and datasets dynamically off of your streams with far less code. These differences make Kafka enough of its own thing that it doesn’t really make sense to think of it as “yet another queue.”
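The storage and derived-stream differences described above can be illustrated with a toy model. The following is a minimal in-memory sketch, not Kafka's API: the names `ToyLog`, `poll`, `rewind`, and `derive` are hypothetical and exist only for this illustration. It assumes a single unpartitioned stream and ignores replication, persistence, and clustering entirely. What it shows is that records are retained after they are read, that each subscriber tracks its own position, and that a derived stream can be computed from an existing one.

```python
from collections import defaultdict


class ToyLog:
    """Append-only, in-memory log for one stream (hypothetical toy, not a Kafka API)."""

    def __init__(self):
        self._records = []                # records are retained; reading never removes them
        self._offsets = defaultdict(int)  # next offset to read, kept per subscriber name

    def __len__(self):
        return len(self._records)

    def publish(self, record):
        """Append a record and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, subscriber, max_records=10):
        """Return the next unread records for this subscriber and advance its offset."""
        start = self._offsets[subscriber]
        batch = self._records[start:start + max_records]
        self._offsets[subscriber] = start + len(batch)
        return batch

    def rewind(self, subscriber, offset=0):
        """Move a subscriber back so it can re-read retained records."""
        self._offsets[subscriber] = offset


def derive(log, subscriber, transform):
    """Build a new log by applying transform to every record the subscriber has not read yet."""
    derived = ToyLog()
    for record in log.poll(subscriber, max_records=len(log)):
        derived.publish(transform(record))
    return derived


log = ToyLog()
for event in ["signup", "login", "purchase"]:
    log.publish(event)

billing = log.poll("billing")      # billing reads all three events
analytics = log.poll("analytics")  # analytics independently reads the same three events
log.rewind("billing")              # billing can go back and re-read, because nothing was deleted
shouted = derive(log, "uppercaser", str.upper)  # a derived stream of upper-cased events
```

In a traditional queue, a message handed to one consumer is typically gone for the others. In this sketch, reading only moves a per-subscriber offset, which is the property the paragraph above attributes to treating the system as storage.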

Another view on Kafka, and one of our motivating lenses in designing and building it, was to think of it as a kind of real-time version of Hadoop. Hadoop lets you store and periodically process file data at a very large scale. Kafka lets you store and continuously process streams of data, also at a large scale. At a technical level, there are definitely similarities, and many people see the emerging area of stream processing as a superset of the kind of batch processing people have done with Hadoop and its various processing layers. What this comparison misses is that the use cases that continuous, low-latency processing opens up are quite different from those that naturally fall on a batch processing system. Whereas Hadoop and big data targeted analytics applications, often in the data warehousing space, the low-latency nature of Kafka makes it applicable for the kind of core applications that directly power a business. This makes sense: events in a business are happening all the time, and the ability to react to them as they occur makes it much easier to build services that directly power the operation of the business, feed back into customer experiences, and so on.
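The contrast between periodic batch processing and continuous processing can be sketched in a few lines. This is an illustrative toy under stated assumptions, not code from Hadoop or Kafka: the event shape (`{"type": ...}`) and the names `batch_counts` and `RunningCounts` are hypothetical. Both approaches arrive at the same totals; the difference is when a result becomes available.

```python
from collections import Counter


def batch_counts(events):
    """Batch style: wait until the whole dataset is available, then count it in one pass."""
    return Counter(event["type"] for event in events)


class RunningCounts:
    """Streaming style: update the result as each event arrives."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        """Fold one event into the running totals and return the updated count for its type."""
        self.counts[event["type"]] += 1
        return self.counts[event["type"]]


# Hypothetical sample events, in arrival order.
events = [{"type": "order"}, {"type": "refund"}, {"type": "order"}]

running = RunningCounts()
for event in events:
    latest = running.on_event(event)  # an up-to-date count exists after every single event

totals = batch_counts(events)         # a result exists only once all events are collected
```

The streaming version can react after each event, for example to trigger a follow-up action the moment a count changes, while the batch version only produces an answer at the end of each run.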

The final area Kafka gets compared to is ETL or data integration tools. After all, these tools move data around, and Kafka moves data around. There is some validity to this as well, but I think the core difference is that Kafka has inverted the problem. Rather than a tool for scraping data out of one system and inserting it into another, Kafka is


Preface

The greatest compliment you can give an author of a technical book is “This is the book I wish I had when I got started with this subject.” This is the goal we set for ourselves when we started writing this book. We looked back at our experience writing Kafka, running Kafka in production, and helping many companies use Kafka to build software architectures and manage their data pipelines, and we asked ourselves, “What are the most useful things we can share with new users to take them from beginners to experts?” This book is a reflection of the work we do every day: run Apache Kafka and help others use it in the best ways.

We included what we believe you need to know in order to successfully run Apache Kafka in production and build robust and performant applications on top of it. We highlighted the popular use cases: message bus for event-driven microservices, stream-processing applications, and large-scale data pipelines. We also focused on making the book general and comprehensive enough so it will be useful to anyone using Kafka, no matter the use case or architecture. We cover practical matters such as how to install and configure Kafka and how to use the Kafka APIs, and we also dedicate space to Kafka’s design principles and reliability guarantees and explore several of Kafka’s delightful architecture details: the replication protocol, controller, and storage layer. We believe that knowledge of Kafka’s design and internals is not only a fun read for those interested in distributed systems, but is also incredibly useful for those who are seeking to make informed decisions when they deploy Kafka in production and design applications that use Kafka. The better you understand how Kafka works, the more you can make informed decisions regarding the many trade-offs that are involved in engineering.

One of the problems in software engineering is that there is always more than one way to do anything. Platforms such as Apache Kafka provide plenty of flexibility, which is great for experts but makes for a steep learning curve for beginners. Very often, Apache Kafka tells you how to use a feature but not why you should or shouldn’t use it. Whenever possible, we try to clarify the existing choices, the trade-offs involved, and when you should and shouldn’t use the different options presented by Apache Kafka.

Who Should Read This Book

Kaa: e Denitive Guide was written for software engineers who develop applica‐

tions that use Kafka’s APIs and for production engineers (also called SREs, devops, or

sysadmins) who install, configure, tune, and monitor Kafka in production. We also

wrote the book with data architects and data engineers in mind—those responsible

for designing and building an organization’s entire data infrastructure. Some of the

chapters, especially chapters 3, 4, and 11 are geared toward Java developers. Those

chapters assume that the reader is familiar with the basics of the Java programming

language, including topics such as exception handling and concurrency. Other chap‐

ters, especially chapters 2, 8, 9, and 10, assume the reader has some experience run‐

ning Linux and some familiarity with storage and network configuration in Linux.

The rest of the book discusses Kafka and software architectures in more general

terms and does not assume special knowledge.

Another category of people who may find this book interesting are the managers and architects who don’t work directly with Kafka but work with the people who do. It is just as important that they understand the guarantees that Kafka provides and the trade-offs that their employees and coworkers will need to make while building Kafka-based systems. The book can provide ammunition to managers who would like to get their staff trained in Apache Kafka or ensure that their teams know what they need to know.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.
