Streaming Data: Building Real-Time Analytics Pipelines, Explained

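Streaming Data: Understanding the real-time pipeline, by Andrew G. Psaltis, is an in-depth treatment of real-time data processing architecture. The book focuses on the key components and workflows of real-time data processing and aims to help readers understand and build efficient, scalable real-time data pipelines. A streaming data pipeline typically consists of the following main stages:

1. **Collection tier**: The first step in the data flow. Data is captured from a variety of sources such as browsers, devices, or vending machines, and may consist of user actions, sensor readings, or other real-time events.
2. **Message queuing tier**: Collected data is sent to a message queuing system such as Kafka or RabbitMQ, which hands it to downstream consumers in an orderly way and absorbs highly concurrent traffic. This decouples data producers from consumers and improves the system's resilience and fault tolerance.
3. **In-memory datastore**: Data is held in memory for fast access during and after analysis. In-memory stores such as Redis or Memcached hold hot data to reduce latency and speed up processing.
4. **Analysis tier**: Streaming data is processed with stream-processing tools such as Apache Storm, Flink, or Spark Streaming, which run computations, complex queries, and machine learning algorithms to produce insights immediately.
5. **Long-term storage**: Some analysis results need to be kept long term even when they no longer receive real-time updates. This usually means moving data to persistent storage such as Hadoop HDFS or a NoSQL database for later querying and analysis.
6. **Data access tier**: Results can be exposed through APIs to end users or applications, either for viewing analysis results in real time or for use by other systems as part of a service.

Although the book does not cover persistence strategies in detail, it notes that analyzed data sometimes needs to be revisited, so a real-time pipeline has to be designed for data durability and accessibility as well as real-time performance. The book gives a broad overview rather than treating every part in depth; for specific technical details, readers may need to consult more specialized references.

The book is aimed at IT professionals, particularly developers, data engineers, and data scientists interested in real-time data processing and analytics, and helps them build and tune real-time processing environments to meet growing real-time business demands. For readers who want to go deeper into this area, it offers a solid foundation and practical guidance.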
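
The stages listed above can be sketched in a few lines of Python. This is only a toy illustration, not code from the book: the `Event` and `Pipeline` names are hypothetical, and the in-process `deque`, `Counter`, and list merely stand in for a real message queue (Kafka, RabbitMQ), an in-memory store (Redis, Memcached), and long-term storage (HDFS, a NoSQL database).

```python
from collections import Counter, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A single event captured by the collection tier (hypothetical shape)."""
    source: str  # e.g. "browser", "device", "vending-machine"
    kind: str    # e.g. "click", "reading", "purchase"


class Pipeline:
    """Toy, single-process stand-in for the tiers of a streaming pipeline."""

    def __init__(self):
        self.queue = deque()         # message queuing tier (stand-in)
        self.hot_counts = Counter()  # in-memory datastore (stand-in)
        self.archive = []            # long-term storage (stand-in)

    def collect(self, event):
        """Collection tier: accept an event and hand it to the queue."""
        self.queue.append(event)

    def analyze(self):
        """Analysis tier: drain the queue, update counts, archive events."""
        processed = 0
        while self.queue:
            event = self.queue.popleft()      # consume in arrival order
            self.hot_counts[event.kind] += 1  # keep the hot aggregate in memory
            self.archive.append(event)        # keep the raw event for later lookups
            processed += 1
        return processed

    def query(self, kind):
        """Data access tier: read the current count for one event kind."""
        return self.hot_counts[kind]


pipeline = Pipeline()
for event in (Event("browser", "click"),
              Event("device", "reading"),
              Event("browser", "click")):
    pipeline.collect(event)
pipeline.analyze()
print(pipeline.query("click"))
```

Because producers only ever call `collect` and consumers only ever read from the queue, either side could be swapped out without touching the other, which is the decoupling the message queuing tier is meant to provide.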

ACKNOWLEDGMENTS

John Guthrie, Kosmas Chatzimichalis, Giuliano Bertoti, Carlos Curotto, Andy Kirsch, Douglas Duncan, Jeff Smith, Sergio Fernández González, Jaromir D.B. Nemec, Jose Samonte, Jan Nonnen, Romit Singhai, Chris Allan, Jonathan Thoms, Steven Jenkins, Lee Gilbert, Amandeep Khurana, and Charlie Gaines. Without all of you, this book wouldn’t be what it is today.

Many others contributed in various ways. I can’t mention everyone by name because the acknowledgments would just roll on and on, but a big thank you goes out to everyone else who had a hand in helping make this possible!

ABOUT THIS BOOK

Chapter 4 dives into the common architectural patterns of distributed stream-processing frameworks, covering topics such as what message delivery semantics mean for this tier, how state is commonly handled, and what fault tolerance is and why we need it.
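
As a rough illustration of why delivery semantics and state go together, the sketch below simulates at-least-once delivery: a message whose acknowledgement is lost gets redelivered, and the consumer keeps a set of already-seen message ids so that the duplicate does not change its result. This is a generic technique, not the book's code, and every name here (`DedupingConsumer`, `deliver_at_least_once`, `lost_acks`) is hypothetical.

```python
class DedupingConsumer:
    """Consumer that tolerates redelivery by remembering processed ids."""

    def __init__(self):
        self.seen_ids = set()  # state: ids of messages already applied
        self.total = 0         # state: running sum of message amounts

    def handle(self, message_id, amount):
        """Apply a message once; return False if it was a duplicate."""
        if message_id in self.seen_ids:
            return False
        self.total += amount
        self.seen_ids.add(message_id)
        return True


def deliver_at_least_once(messages, consumer, lost_acks=()):
    """Deliver every message, resending those whose ack is simulated as lost.

    Returns the number of delivery attempts, which can exceed len(messages).
    """
    attempts = 0
    for message_id, amount in messages:
        consumer.handle(message_id, amount)
        attempts += 1
        if message_id in lost_acks:
            # The sender never saw the ack, so it cannot know the message
            # was processed and sends it again.
            consumer.handle(message_id, amount)
            attempts += 1
    return attempts


consumer = DedupingConsumer()
attempts = deliver_at_least_once([("m1", 10), ("m2", 5)], consumer,
                                 lost_acks={"m2"})
print(attempts, consumer.total)
```

Without the `seen_ids` check, the redelivered message would be counted twice. Note that the deduplication state itself would have to survive a crash for this to hold in a real system, which is where fault tolerance comes in.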

Chapter 5 jumps from discussing architecture to querying a stream, the problems with time, and the four popular summarization techniques. If chapter 4 is the what for distributed stream-processing engines, chapter 5 is the how.

Chapter 6 discusses options for storing data in memory during and after analysis. It doesn’t spend much time discussing disk-based long-term storage solutions because they’re often used out of band of a streaming analysis and don’t offer the performance of the in-memory stores.

Chapter 7 is where we start to discuss what to do with the data we have collected and analyzed. It talks about communications patterns and protocols used for sending data to a streaming client. Along the way we’ll find out how to match up our business requirements to the various protocols and how to choose the right one.

Chapter 8 explores concepts to keep in mind when building a streaming client. This is not a chapter on just building an HTML web app; it goes much deeper into lower-level things to consider when designing the client side of a streaming system.

Chapter 9 . . . at this point, if you have read all the way through, congrats! A lot of material is covered in the first eight chapters. Chapter 9 is where we make it all come to life. Here we build a complete streaming data pipeline and discuss taking our sample to production.

About the code

All the code shown in the final chapter of this book can be found in the sample source code that accompanies this book. You can download the sample code free of charge from the Manning website at www.manning.com/books/streaming-data. You may also find the code on GitHub at https://github.com/apsaltis/StreamingData-Book-Examples.

The sample code is structured as separate Maven projects, one for each of the tiers we walk through in chapter 9. Instructions for building and running the software are provided during the walkthrough in chapter 9.

All source code in listings or in the text is in a fixed-width font like this to separate it from ordinary text. In some listings, the code is annotated to point out the key concepts.

About the author

Andrew Psaltis is deeply entrenched in streaming systems and obsessed with delivering insight at the speed of thought. He spends most of his waking hours thinking about, writing about, and building streaming systems. He helps customers of all sizes build and/or fix complex streaming systems, speaks around the globe about streaming, and teaches others how to build streaming systems. When he’s not busy being busy, he’s spending time with his lovely wife and two kids and watching as much lacrosse as possible.

Author Online

The purchase of Streaming Data includes free access to a private forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the author and other users. To access and subscribe to the forum, point your browser to www.manning.com/books/streaming-data. This page provides information on how to get on the forum once you’re registered, what kind of help is available, and the rules of conduct in the forum.

Manning’s commitment to our readers is to provide a venue where meaningful dialogue between individual readers and between readers and the author can take place. It’s not a commitment to any specific amount of participation on the part of the author, whose contribution to the book’s forum remains voluntary (and unpaid). We suggest you try asking him challenging questions, lest his interest stray!

The Author Online forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.

About the cover illustration

The figure on the cover of Streaming Data is captioned “Habit of a Moor of Morrocco in winter in 1695.” The illustration is taken from Thomas Jefferys’ A Collection of the Dresses of Different Nations, Ancient and Modern (four volumes), London, published between 1757 and 1772. The title page states that these are hand-colored copperplate engravings, heightened with gum arabic. Thomas Jefferys (1719–1771) was called “Geographer to King George III.” He was an English cartographer who was the leading map supplier of his day. He engraved and printed maps for government and other official bodies and produced a wide range of commercial maps and atlases, especially of North America. His work as a mapmaker sparked an interest in local dress customs of the lands he surveyed and mapped, which are brilliantly displayed in this collection.

Fascination with faraway lands and travel for pleasure were relatively new phenomena in the late 18th century, and collections such as this one were popular, introducing both the tourist and the armchair traveler to the inhabitants of other countries. The diversity of the drawings in Jefferys’ volumes speaks vividly of the uniqueness and individuality of the world’s nations some 200 years ago. Dress codes have changed since then and the diversity by region and country, so rich at the time, has faded away. It is now often hard to tell the inhabitant of one continent from another. Perhaps, trying to view it optimistically, we have traded a cultural and visual diversity for a more varied personal life. Or a more varied and interesting intellectual and technical life.

At a time when it is hard to tell one computer book from another, Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional life of two centuries ago, brought back to life by Jefferys’ pictures.
