
Kafka: The Definitive Guide

by Gwen Shapira, Neha Narkhede, Todd Palino

Publisher: O'Reilly Media, Inc.

Release Date: September 2017

ISBN: 9781491936160

Topics: Apache Web Server / Message Queues

Book Description

Every enterprise application creates data, whether it’s log messages, metrics, user activity, outgoing messages, or something else. And how to move all of this data becomes nearly as important as the data itself. If you’re an application architect, developer, or production engineer new to Apache Kafka, this practical guide shows you how to use this open source streaming platform to handle real-time data feeds.

Engineers from Confluent and LinkedIn who are responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream-processing applications with this platform. Through detailed examples, you’ll learn Kafka’s design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer.

• Understand publish-subscribe messaging and how it fits in the big data ecosystem
• Explore Kafka producers and consumers for writing and reading messages
• Understand Kafka patterns and use-case requirements to ensure reliable data delivery
• Get best practices for building data pipelines and applications with Kafka
• Manage Kafka in production, and learn to perform monitoring, tuning, and maintenance tasks
• Learn the most critical metrics among Kafka’s operational measurements
• Explore how Kafka’s stream delivery capabilities make it a perfect source for stream processing systems

Foreword

It’s an exciting time for Apache Kafka. Kafka is being used by tens of thousands of organizations, including over a third of the Fortune 500 companies. It’s among the fastest growing open source projects and has spawned an immense ecosystem around it. It’s at the heart of a movement towards managing and processing streams of data.

So where did Kafka come from? Why did we build it? And what exactly is it?

Kafka got its start as an internal infrastructure system we built at LinkedIn. Our observation was really simple: there were lots of databases and other systems built to store data, but what was missing in our architecture was something that would help us to handle the continuous flow of data. Prior to building Kafka, we experimented with all kinds of off-the-shelf options, from messaging systems to log aggregation and ETL tools, but none of them gave us what we wanted.

We eventually decided to build something from scratch. Our idea was that instead of focusing on holding piles of data like our relational databases, key-value stores, search indexes, or caches, we would focus on treating data as a continually evolving and ever-growing stream, and build a data system, and indeed a data architecture, oriented around that idea.

This idea turned out to be even more broadly applicable than we expected. Though Kafka got its start powering real-time applications and data flow behind the scenes of a social network, you can now see it at the heart of next-generation architectures in every industry imaginable. Big retailers are reworking their fundamental business processes around continuous data streams; car companies are collecting and processing real-time data streams from internet-connected cars; and banks are rethinking their fundamental processes and systems around Kafka as well.

So what is this Kafka thing all about? How does it compare to the systems you already know and use?

We’ve come to think of Kafka as a streaming platform: a system that lets you publish and subscribe to streams of data, store them, and process them, and that is exactly what Apache Kafka is built to be. This way of thinking about data might be a little different from what you’re used to, but it turns out to be an incredibly powerful abstraction for building applications and architectures. Kafka is often compared to a few existing technology categories: enterprise messaging systems, big data systems like Hadoop, and data integration or ETL tools. Each of these comparisons has some validity but also falls a little short.

Kafka is like a messaging system in that it lets you publish and subscribe to streams of messages. In this way, it is similar to products like ActiveMQ, RabbitMQ, IBM’s MQSeries, and other products. But even with these similarities, Kafka has a number of core differences from traditional messaging systems that make it another kind of animal entirely. Here are the big three differences. First, it works as a modern distributed system that runs as a cluster and can scale to handle all the applications in even the most massive of companies. Rather than running dozens of individual messaging brokers, hand-wired to different apps, this lets you have a central platform that can scale elastically to handle all the streams of data in a company. Second, Kafka is a true storage system built to store data for as long as you might like. This has huge advantages in using it as a connecting layer, as it provides real delivery guarantees: its data is replicated, persistent, and can be kept around as long as you like. Finally, the world of stream processing raises the level of abstraction quite significantly. Messaging systems mostly just hand out messages. The stream processing capabilities in Kafka let you compute derived streams and datasets dynamically off of your streams with far less code. These differences make Kafka enough of its own thing that it doesn’t really make sense to think of it as “yet another queue.”
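
To make the second and third differences concrete, here is a minimal sketch in plain Python. It is not Kafka's API and it models nothing about clustering or replication; the names ToyLog, ToyConsumer, and running_total are hypothetical, chosen only for illustration. It assumes a single in-memory log in which reading never removes a record, each consumer tracks its own offset, and a derived stream is computed from the stored records.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """In-memory, append-only log. A teaching model only, not Kafka's API."""
    records: list = field(default_factory=list)

    def publish(self, value):
        """Append a record and return its offset (its position in the log)."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        """Return records starting at offset; reading never deletes anything."""
        return self.records[offset:offset + max_records]


@dataclass
class ToyConsumer:
    """Tracks its own position, so consumers do not interfere with each other."""
    log: ToyLog
    offset: int = 0

    def poll(self, max_records=10):
        """Fetch the next batch and advance this consumer's offset past it."""
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


def running_total(amounts):
    """Derived stream: emit the cumulative sum after each input record."""
    total = 0
    for amount in amounts:
        total += amount
        yield total


log = ToyLog()
for amount in [5, 3, 7]:
    log.publish(amount)

billing = ToyConsumer(log)
audit = ToyConsumer(log)

first = billing.poll(max_records=2)      # billing reads part of the log
replayed = audit.poll()                  # audit still sees every record
totals = list(running_total(replayed))   # a stream computed from a stream
```

In a queue that deletes a message once it is handed out, the second consumer could not have read the records the first had already taken. Here both read the same stored data at their own pace, which is the property the paragraph above attributes to treating the log as a storage system.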

Another view on Kafka, and one of our motivating lenses in designing and building it, was to think of it as a kind of real-time version of Hadoop. Hadoop lets you store and periodically process file data at a very large scale. Kafka lets you store and continuously process streams of data, also at a large scale. At a technical level, there are definitely similarities, and many people see the emerging area of stream processing as a superset of the kind of batch processing people have done with Hadoop and its various processing layers. What this comparison misses is that the use cases that continuous, low-latency processing opens up are quite different from those that naturally fall on a batch processing system. Whereas Hadoop and big data targeted analytics applications, often in the data warehousing space, the low-latency nature of Kafka makes it applicable for the kind of core applications that directly power a business. This makes sense: events in a business are happening all the time, and the ability to react to them as they occur makes it much easier to build services that directly power the operation of the business, feed back into customer experiences, and so on.

The final area Kafka gets compared to is ETL or data integration tools. After all, these tools move data around, and Kafka moves data around. There is some validity to this as well, but I think the core difference is that Kafka has inverted the problem. Rather than a tool for scraping data out of one system and inserting it into another, Kafka is a platform oriented around real-time streams of events. This means that not only can it connect off-the-shelf applications and data systems, it can power custom applications built to trigger off of these same data streams. We think this architecture centered around streams of events is a really important thing. In some ways these flows of data are the most central aspect of a modern digital company, as important as the cash flows you’d see in a financial statement.
