Real-Time Data Processing and Stream Computing in Depth: Practical Applications of Spark, Storm, and More

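Streaming.Data.2017.5.pdf is a practical, in-depth tutorial on real-time data processing by Andrew G. Psaltis. The book centers on how to interact effectively with fast-moving data, and it offers examples and use cases that help readers understand and design applications that process data in real time. It covers the full chain of design, from reading and analyzing data to sharing and storing it.

First, the discussion of the real-time data pipeline walks readers through the key architecture of real-time processing. Along the way, readers learn the roles that Spark, Storm, Kafka, and Flink play in handling streaming data: Spark for large-scale data processing, Storm for real-time event-driven computation, and Kafka as a message queue that provides reliable data transport.

Second, the book covers storage. Analysis results sometimes need to be persisted so that they can be queried or reused later. In scenarios involving browsers, mobile devices, or vending machines, for example, historical analysis results may need to be revisited. The summary therefore describes a multi-layer structure that combines in-memory storage for fast access with a persistent layer for long-term storage and backup.

Analysis is the core of data processing. The book describes building an analysis tier for mining and analyzing data in real time. This tier involves real-time computation and may also involve machine learning algorithms that extract valuable information. Alongside analysis, the message queuing tier (for example, RabbitMQ) is indispensable: it keeps data flowing in order and helps maintain timeliness as system load changes.

Finally, the data access tier concerns how to access and retrieve processed data efficiently, which is essential for delivering real-time services. This may involve database optimization, caching strategies, or API design to support access from multiple client applications.

Although the book does not explore every detail in depth, readers come away with a solid understanding of the overall framework and practical methods of real-time data processing. For anyone who wants to learn real-time data processing through practice, it is a valuable reference that combines theoretical depth with practical guidance. The PDF also includes Manning's purchase-discount and contact information and the copyright notice.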
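The tiers described above can be pictured with a small in-process sketch. This is only an illustration under stated assumptions: the function names (`collect`, `analyze`, `top_pages`, `run_pipeline`) and the page-view events are hypothetical, a standard-library queue stands in for a real broker such as Kafka, and a `Counter` stands in for an in-memory data store. It is not taken from the book's sample code.

```python
from collections import Counter
from queue import Queue


def collect(raw_events, message_queue):
    """Collection step: accept raw events and hand them to the queue."""
    for event in raw_events:
        message_queue.put(event)


def analyze(message_queue, store):
    """Analysis tier: drain the queue and keep a running count per page."""
    while not message_queue.empty():
        event = message_queue.get()
        store[event["page"]] += 1


def top_pages(store, n):
    """Data access tier: serve the current results to a client."""
    return store.most_common(n)


def run_pipeline(raw_events, n=2):
    message_queue = Queue()  # assumption: stands in for a message broker
    store = Counter()        # assumption: stands in for an in-memory store
    collect(raw_events, message_queue)
    analyze(message_queue, store)
    return top_pages(store, n)


events = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]
print(run_pipeline(events))
```

In a real deployment each function would run as its own service, and the queue and store would be separate systems; the sketch only shows how data moves from one tier to the next.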

ACKNOWLEDGMENTS

John Guthrie, Kosmas Chatzimichalis, Giuliano Bertoti, Carlos Curotto, Andy Kirsch, Douglas Duncan, Jeff Smith, Sergio Fernández González, Jaromir D.B. Nemec, Jose Samonte, Jan Nonnen, Romit Singhai, Chris Allan, Jonathan Thoms, Steven Jenkins, Lee Gilbert, Amandeep Khurana, and Charlie Gaines. Without all of you, this book wouldn’t be what it is today.

Many others contributed in various ways. I can’t mention everyone by name because the acknowledgments would just roll on and on, but a big thank you goes out to everyone else who had a hand in helping make this possible!

ABOUT THIS BOOK

Chapter 4 dives into the common architectural patterns of distributed stream-processing frameworks, covering topics such as what message delivery semantics mean for this tier, how state is commonly handled, and what fault tolerance is and why we need it.

Chapter 5 jumps from discussing architecture to querying a stream, the problems with time, and the four popular summarization techniques. If chapter 4 is the what for distributed stream-processing engines, chapter 5 is the how.

Chapter 6 discusses options for storing data in-memory during and post analysis. It doesn’t spend much time discussing disk-based long-term storage solutions because they’re often used out of band of a streaming analysis and don’t offer the performance of the in-memory stores.

Chapter 7 is where we start to discuss what to do with the data we have collected and analyzed. It talks about communications patterns and protocols used for sending data to a streaming client. Along the way we’ll find out how to match up our business requirements to the various protocols and how to choose the right one.

Chapter 8 explores concepts to keep in mind when building a streaming client. This is not a chapter on just building an HTML web app; it goes much deeper into lower-level things to consider when designing the client side of a streaming system.

Chapter 9 . . . at this point, if you have read all the way through, congrats! A lot of material is covered in the first eight chapters. Chapter 9 is where we make it all come to life. Here we build a complete streaming data pipeline and discuss taking our sample to production.
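The summarization techniques mentioned for chapter 5 are not named in this overview. As one illustrative assumption, random sampling over a stream can be sketched with reservoir sampling, which keeps a fixed-size uniform sample without knowing the stream’s length in advance. The function name and parameters below are hypothetical and are not taken from the book’s sample code.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn uniformly at random from an iterable.

    Each item is seen exactly once, so this also works for streams too
    large to hold in memory.
    """
    if rng is None:
        rng = random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Pick a slot in [0, i]; the new item replaces an existing one
            # only if the slot falls inside the reservoir (chance k / (i + 1)).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


# Example: keep 10 of 1,000 events, seeded so the run is repeatable.
picked = reservoir_sample(range(1000), 10, random.Random(42))
print(len(picked))
```

The memory used depends only on `k`, not on how many items flow past, which is the property that makes this kind of summary practical for unbounded streams.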

About the code

All the code shown in the final chapter of this book can be found in the sample source code that accompanies this book. You can download the sample code free of charge from the Manning website at www.manning.com/books/streaming-data. You may also find the code on GitHub at https://github.com/apsaltis/StreamingData-Book-Examples.

The sample code is structured as separate Maven projects, one for each of the tiers we walk through in chapter 9. Instructions for building and running the software are provided during the walkthrough in chapter 9.

All source code in listings or in the text is in a fixed-width font like this to separate it from ordinary text. In some listings, the code is annotated to point out the key concepts.

About the author

ANDREW PSALTIS is deeply entrenched in streaming systems and obsessed with delivering insight at the speed of thought. He spends most of his waking hours thinking about, writing about, and building streaming systems. He helps customers of all sizes build and/or fix complex streaming systems, speaks around the globe about streaming, and teaches others how to build streaming systems. When he’s not busy being busy, he’s spending time with his lovely wife and two kids and watching as much lacrosse as possible.

Author Online

The purchase of Streaming Data includes free access to a private forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the author and other users. To access and subscribe to the forum, point your browser to www.manning.com/books/streaming-data. This page provides information on how to get on the forum once you’re registered, what kind of help is available, and the rules of conduct in the forum.

Manning’s commitment to our readers is to provide a venue where meaningful dialogue between individual readers and between readers and the author can take place. It’s not a commitment to any specific amount of participation on the part of the author, whose contribution to the book’s forum remains voluntary (and unpaid). We suggest you try asking him challenging questions, lest his interest stray!

The Author Online forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.

About the cover illustration

The figure on the cover of Streaming Data is captioned “Habit of a Moor of Morrocco in winter in 1695.” The illustration is taken from Thomas Jefferys’ A Collection of the Dresses of Different Nations, Ancient and Modern (four volumes), London, published between 1757 and 1772. The title page states that these are hand-colored copperplate engravings, heightened with gum arabic. Thomas Jefferys (1719–1771) was called “Geographer to King George III.” He was an English cartographer who was the leading map supplier of his day. He engraved and printed maps for government and other official bodies and produced a wide range of commercial maps and atlases, especially of North America. His work as a mapmaker sparked an interest in local dress customs of the lands he surveyed and mapped, which are brilliantly displayed in this collection.

Fascination with faraway lands and travel for pleasure were relatively new phenomena in the late 18th century, and collections such as this one were popular, introducing both the tourist and the armchair traveler to the inhabitants of other countries. The diversity of the drawings in Jefferys’ volumes speaks vividly of the uniqueness and individuality of the world’s nations some 200 years ago. Dress codes have changed since then and the diversity by region and country, so rich at the time, has faded away. It is now often hard to tell the inhabitant of one continent from another. Perhaps, trying to view it optimistically, we have traded a cultural and visual diversity for a more varied personal life. Or a more varied and interesting intellectual and technical life.

At a time when it is hard to tell one computer book from another, Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional life of two centuries ago, brought back to life by Jefferys’ pictures.
