Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale
by Gwen Shapira, Neha Narkhede, Todd Palino
Publisher: O'Reilly Media, Inc.
Release Date: September 2017
ISBN: 9781491936160
Topics: Apache Web Server / Message Queues
Book Description
Every enterprise application creates data, whether it’s log messages, metrics, user
activity, outgoing messages, or something else. And how to move all of this data
becomes nearly as important as the data itself. If you’re an application architect,
developer, or production engineer new to Apache Kafka, this practical guide shows you
how to use this open source streaming platform to handle real-time data feeds.
Engineers from Confluent and LinkedIn who are responsible for developing Kafka explain
how to deploy production Kafka clusters, write reliable event-driven microservices,
and build scalable stream-processing applications with this platform. Through detailed
examples, you’ll learn Kafka’s design principles, reliability guarantees, key APIs,
and architecture details, including the replication protocol, the controller, and the
storage layer.
Understand publish-subscribe messaging and how it fits in the big data ecosystem
Explore Kafka producers and consumers for writing and reading messages
Understand Kafka patterns and use-case requirements to ensure reliable data delivery
Get best practices for building data pipelines and applications with Kafka
Manage Kafka in production, and learn to perform monitoring, tuning, and maintenance tasks
Learn the most critical metrics among Kafka’s operational measurements
Explore how Kafka’s stream delivery capabilities make it a perfect source for stream processing systems
Kafka: The Definitive Guide
by Neha Narkhede, Gwen Shapira, and Todd Palino
Copyright © 2017 Neha Narkhede, Gwen Shapira, Todd Palino. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA
95472.
O’Reilly books may be purchased for educational, business, or sales promotional use.
Online editions are also available for most titles (http://oreilly.com/safari). For
more information, contact our corporate/institutional sales department: 800-998-9938
or corporate@oreilly.com.
Editor: Shannon Cutt
Production Editor: Shiny Kalapurakkel
Copyeditor: Christina Edwards
Proofreader: Amanda Kersey
Indexer: WordCo Indexing Services, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
July 2017: First Edition
Revision History for the First Edition
2017-07-07: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781491936160 for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Kafka: The
Definitive Guide, the cover image, and related trade dress are trademarks of O’Reilly
Media, Inc.
While the publisher and the authors have used good faith efforts to ensure that the
information and instructions contained in this work are accurate, the publisher and
the authors disclaim all responsibility for errors or omissions, including without
limitation responsibility for damages resulting from the use of or reliance on this
work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is
subject to open source licenses or the intellectual property rights of others, it is
your responsibility to ensure that your use thereof complies with such licenses and/or
rights.
978-1-491-93616-0
Foreword
It’s an exciting time for Apache Kafka. Kafka is being used by tens of thousands of
organizations, including over a third of the Fortune 500 companies. It’s among the
fastest growing open source projects and has spawned an immense ecosystem around it.
It’s at the heart of a movement towards managing and processing streams of data.
So where did Kafka come from? Why did we build it? And what exactly is it?
Kafka got its start as an internal infrastructure system we built at LinkedIn. Our
observation was really simple: there were lots of databases and other systems built to
store data, but what was missing in our architecture was something that would help us
handle the continuous flow of data. Prior to building Kafka, we experimented with
all kinds of off-the-shelf options, from messaging systems to log aggregation and ETL
tools, but none of them gave us what we wanted.
We eventually decided to build something from scratch. Our idea was that instead of
focusing on holding piles of data like our relational databases, key-value stores,
search indexes, or caches, we would focus on treating data as a continually evolving
and ever-growing stream, and build a data system (and indeed a data architecture)
oriented around that idea.
This idea turned out to be even more broadly applicable than we expected. Though Kafka
got its start powering real-time applications and data flow behind the scenes of a
social network, you can now see it at the heart of next-generation architectures in
every industry imaginable. Big retailers are re-working their fundamental business
processes around continuous data streams; car companies are collecting and processing
real-time data streams from internet-connected cars; and banks are rethinking their
fundamental processes and systems around Kafka as well.
So what is this Kafka thing all about? How does it compare to the systems you already
know and use?
We’ve come to think of Kafka as a streaming platform: a system that lets you publish
and subscribe to streams of data, store them, and process them, and that is exactly
what Apache Kafka is built to be. This way of thinking about data might be a little
different from what you’re used to, but it turns out to be an incredibly powerful
abstraction for building applications and architectures. Kafka is often compared to a
few existing technology categories: enterprise messaging systems, big data systems
like Hadoop, and data integration or ETL tools. Each of these comparisons has some
validity but also falls a little short.
Kafka is like a messaging system in that it lets you publish and subscribe to streams
of messages. In this way, it is similar to products like ActiveMQ, RabbitMQ, and IBM’s
MQSeries. But even with these similarities, Kafka has a number of core differences
from traditional messaging systems that make it another kind of animal entirely. Here
are the three big differences. First, it works as a modern distributed system that
runs as a cluster and can scale to handle all the applications in even the most
massive of companies. Rather than running dozens of individual messaging brokers,
hand-wired to different apps, this lets you have a central platform that can scale
elastically to handle all the streams of data in a company. Second, Kafka is a true
storage system built to store data for as long as you might like. This has huge
advantages in using it as a connecting layer, as it provides real delivery guarantees:
its data is replicated, persistent, and can be kept around as long as you like.
Finally, the world of stream processing raises the level of abstraction quite
significantly. Messaging systems mostly just hand out messages. The stream processing
capabilities in Kafka let you compute derived streams and datasets dynamically off of
your streams with far less code. These differences make Kafka enough of its own thing
that it doesn’t really make sense to think of it as “yet another queue.”
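To make the storage difference concrete, here is a minimal in-memory sketch in Python. It is a toy model only, not Kafka's API: the class names ToyQueue and ToyLog, the reader names, and the poll/seek methods are all hypothetical, chosen for illustration. It contrasts a queue that discards each message once it is consumed with an append-only log that retains messages and lets each reader track its own position.

```python
from collections import deque


class ToyQueue:
    """Traditional queue: a message is gone once any consumer takes it."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def consume(self):
        # Remove and return the oldest message, or None if the queue is empty.
        return self._messages.popleft() if self._messages else None


class ToyLog:
    """Append-only log: records are retained; each reader keeps its own offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # reader name -> next offset to read

    def publish(self, message):
        self._records.append(message)
        return len(self._records) - 1  # offset assigned to the new record

    def poll(self, reader, max_records=10):
        # Return the next batch for this reader and advance only its offset.
        start = self._offsets.get(reader, 0)
        batch = self._records[start:start + max_records]
        self._offsets[reader] = start + len(batch)
        return batch

    def seek(self, reader, offset):
        # Rewind (or skip ahead) so the reader can reprocess stored records.
        self._offsets[reader] = offset


def demo():
    queue, log = ToyQueue(), ToyLog()
    for event in ["signup", "login", "purchase"]:
        queue.publish(event)
        log.publish(event)

    queue_first = [queue.consume() for _ in range(3)]
    queue_second = queue.consume()  # nothing is left for a second consumer

    billing = log.poll("billing")
    analytics = log.poll("analytics")  # reads the same records independently
    log.seek("billing", 0)
    replayed = log.poll("billing")  # the stored data can be read again
    return queue_first, queue_second, billing, analytics, replayed
```

In this sketch the second queue consumer finds nothing, while both log readers see every record and one of them can rewind and replay. A real cluster adds replication, partitioning, and persistence on disk, none of which is modeled here.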
Another view on Kafka—and one of our motivating lenses in designing and building it—
was to think of it as a kind of real-time version of Hadoop. Hadoop lets you store and
periodically process file data at a very large scale. Kafka lets you store and
continuously process streams of data, also at a large scale. At a technical level,
there are definitely similarities, and many people see the emerging area of stream
processing as a superset of the kind of batch processing people have done with Hadoop
and its various processing layers. What this comparison misses is that the use cases
that continuous, low-latency processing opens up are quite different from those that
naturally fall to a batch processing system. Whereas Hadoop and big data targeted
analytics applications, often in the data warehousing space, the low-latency nature of
Kafka makes it applicable to the kind of core applications that directly power a
business. This makes sense: events in a business are happening all the time and the
ability to react to them as they occur makes it much easier to build services that
directly power the operation of the business, feed back into customer experiences, and
so on.
The final area Kafka gets compared to is ETL or data integration tools. After all,
these tools move data around, and Kafka moves data around. There is some validity to
this as well, but I think the core difference is that Kafka has inverted the problem.
Rather than a tool for scraping data out of one system and inserting it into another,
Kafka is a platform oriented around real-time streams of events. This means that not
only can it connect off-the-shelf applications and data systems, it can power custom
applications built to trigger off of these same data streams. We think this
architecture centered around streams of events is a really important thing. In some
ways these flows of data are the most central aspect of a modern digital company, as
important as the cash flows you’d see in a financial statement.