Real-Time Data Stream Processing: Building Efficient Applications

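"Streaming Data" by Andrew Psaltis, published by Manning, covers real-time data processing, streaming-data architecture design, and the technologies used to implement them. In information technology, "streaming data" refers to data that is produced continuously and must be processed as it arrives. Handling it lets applications work with large volumes of constantly changing data, such as real-time location information, instant equipment-failure alerts, and live transaction records. With today's tools, developers can build such applications without specialized stream-processing experience.

The book aims to help readers understand how to work efficiently with fast-moving data. Through numerous examples and case studies, readers learn design approaches for building applications that collect, analyze, share, and store streaming data. The book introduces a range of key technologies and tools, including Spark, Storm, Kafka, Flink, and RabbitMQ, and explains where each one fits. Topics include, but are not limited to:

1. Collecting real-time data correctly: how to capture data at the moment it is produced so that it stays timely.
2. Building a streaming pipeline: how to design an architecture that handles large volumes of real-time data from the source through to storage and analysis.
3. Analyzing the data: how to perform real-time analysis on a stream and extract useful information.
4. Choosing technologies: how to select a technology stack that fits specific requirements, based on the strengths and typical use cases of each tool.
5. Long-term storage after analysis: not covered in depth, but the book notes that analyzed data may need to be persisted for later reuse or replay.

The book is written for developers who are familiar with relational databases. No experience with streaming data or real-time applications is required. Andrew Psaltis is a software engineer focused on massively scalable real-time analytics, and his expertise and hands-on experience inform the guidance throughout.

The real-time data pipeline described in the book typically consists of the following tiers:

- Collection tier: ingests data from sources such as browsers, devices, and vending machines, which produce data continuously.
- Message queuing tier: transports and buffers data, decoupling producers from consumers so that the stream flows reliably.
- Analysis tier: analyzes the data in real time to extract useful information.
- In-memory data store: holds data temporarily for fast access during and after analysis.
- Data access tier: makes the analysis results available to clients for access and querying.
- Long-term storage: not covered in detail, but analyzed data may need to be persisted for later use.

"Streaming Data" is a practice-oriented tutorial that aims to build a stream-processing mindset in developers while providing concrete implementation details. After reading it, readers should understand the key concepts and techniques of real-time data processing and be well prepared to build their own real-time applications.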

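To make the pipeline tiers described above more concrete, here is a deliberately tiny, single-process Python sketch of the same flow. It is not the book's code; the book's own samples are Maven projects. Several parts are assumptions made for illustration. `queue.Queue` stands in for a message queuing tier such as Kafka or RabbitMQ. A plain dict stands in for the in-memory data store. The function names `collect`, `analyze`, and `query` and the event shape are hypothetical.

```python
import queue
from collections import Counter


def collect(raw_events, message_queue):
    """Collection tier: accept raw events and hand them to the message queuing tier."""
    for event in raw_events:
        message_queue.put(event)
    message_queue.put(None)  # sentinel so this finite demo stream can end


def analyze(message_queue, store):
    """Analysis tier: consume events and maintain a running count per event type."""
    counts = Counter()
    while True:
        event = message_queue.get()
        if event is None:
            break
        counts[event["type"]] += 1
        # In-memory data store: publish the latest result after every event.
        store["counts_by_type"] = dict(counts)


def query(store, event_type):
    """Data access tier: answer a client's question from the in-memory store."""
    return store.get("counts_by_type", {}).get(event_type, 0)


if __name__ == "__main__":
    events = [
        {"type": "click", "user": "a"},
        {"type": "view", "user": "b"},
        {"type": "click", "user": "c"},
    ]
    mq = queue.Queue()  # stand-in for a broker such as Kafka or RabbitMQ
    store = {}          # stand-in for an in-memory data store
    collect(events, mq)
    analyze(mq, store)
    print(query(store, "click"))  # prints 2
```

In a real pipeline, each tier would run as a separate, long-lived process connected over the network. The `None` sentinel exists here only so that the demo terminates.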
ACKNOWLEDGMENTS


John Guthrie, Kosmas Chatzimichalis, Giuliano Bertoti, Carlos Curotto, Andy Kirsch, Douglas Duncan, Jeff Smith, Sergio Fernández González, Jaromir D.B. Nemec, Jose Samonte, Jan Nonnen, Romit Singhai, Chris Allan, Jonathan Thoms, Steven Jenkins, Lee Gilbert, Amandeep Khurana, and Charlie Gaines. Without all of you, this book wouldn’t be what it is today.

Many others contributed in various ways. I can’t mention everyone by name because the acknowledgments would just roll on and on, but a big thank you goes out to everyone else who had a hand in making this possible!

ABOUT THIS BOOK


Chapter 4 dives into the common architectural patterns of distributed stream-processing frameworks, covering topics such as what message delivery semantics mean for this tier, how state is commonly handled, and what fault tolerance is and why we need it.
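Message delivery semantics and state interact in a way that a few lines can show. The sketch below assumes at-least-once delivery, where a message whose acknowledgment is lost gets redelivered. It also assumes every message carries a unique, producer-assigned ID. Both are assumptions for illustration. `Message`, `IdempotentConsumer`, and `deliver_at_least_once` are hypothetical names, not APIs from any framework mentioned in the book. Deduplicating by ID keeps the consumer's state correct even when duplicates arrive.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    msg_id: str  # unique ID assigned by the producer (assumption of this sketch)
    amount: int


@dataclass
class IdempotentConsumer:
    """Applies each message to its state once, even if the message is delivered twice."""
    total: int = 0
    seen_ids: set = field(default_factory=set)

    def handle(self, msg: Message) -> bool:
        """Return True if the message changed state, False if it was a duplicate."""
        if msg.msg_id in self.seen_ids:
            return False  # redelivered duplicate: skip it so state is not changed twice
        self.total += msg.amount
        self.seen_ids.add(msg.msg_id)
        return True


def deliver_at_least_once(messages, consumer, lost_acks):
    """Simulate a broker that redelivers every message whose acknowledgment was lost."""
    for msg in messages:
        consumer.handle(msg)
        if msg.msg_id in lost_acks:
            consumer.handle(msg)  # the broker never saw the ack, so it sends msg again


if __name__ == "__main__":
    msgs = [Message("m1", 10), Message("m2", 5), Message("m3", 7)]
    consumer = IdempotentConsumer()
    deliver_at_least_once(msgs, consumer, lost_acks={"m2"})
    print(consumer.total)  # prints 22; without deduplication it would be 27
```

The `seen_ids` set grows without bound here. In a fault-tolerant system, the set of processed IDs would typically be bounded (for example, by time) and checkpointed together with the state it protects, so that both are restored consistently after a failure.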

Chapter 5 jumps from discussing architecture to querying a stream, the problems with time, and the four popular summarization techniques. If chapter 4 is the "what" for distributed stream-processing engines, chapter 5 is the "how".
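Two ideas from the chapter 5 summary can be sketched briefly in Python. The first function addresses one problem with time. It assigns events to fixed tumbling windows by event time (when the event happened) rather than arrival time, so an out-of-order event still lands in the right window. The second function is reservoir sampling (Algorithm R), shown as one widely used summarization technique for keeping a uniform random sample of a stream of unknown length. The summary above doesn't name the four techniques, so treat the choice of this one as an assumption. The function names and the `(event_time, payload)` event shape are hypothetical.

```python
import random
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per fixed, non-overlapping window, keyed by event time.

    Each event is an (event_time_seconds, payload) tuple. Because the window is
    derived from when the event happened, not when it arrived, an event that
    shows up late still lands in the window it belongs to.
    """
    counts = defaultdict(int)
    for event_time, _payload in events:
        window_start = (event_time // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length (Algorithm R)."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = item  # item i is kept with probability k / (i + 1)
    return reservoir


if __name__ == "__main__":
    events = [(0, "a"), (59, "b"), (61, "c"), (30, "d")]  # (30, "d") arrives late
    print(tumbling_window_counts(events, 60))  # prints {0: 3, 60: 1}
    print(reservoir_sample(range(10_000), 5, random.Random(42)))
```

This sketch leaves out a separate problem: deciding when a window can be considered complete, given that late events may still arrive.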

Chapter 6 discusses options for storing data in memory during and after analysis. It doesn’t spend much time discussing disk-based long-term storage solutions because they’re often used out of band of a streaming analysis and don’t offer the performance of the in-memory stores.
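The summary doesn't name particular in-memory stores, so the following is only a toy stand-in. It assumes a key-value store with time-based expiry (TTL), a feature many in-memory caches offer, which keeps memory use bounded when results only matter for a short while. `ExpiringStore` and its methods are hypothetical. The clock is injectable so that expiry can be tested without sleeping.

```python
import time


class ExpiringStore:
    """Toy in-memory key-value store whose entries expire after a fixed TTL (hypothetical)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._data = {}      # key -> (value, expires_at)

    def put(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # evict lazily, on the first read after expiry
            return default
        return value


if __name__ == "__main__":
    fake_now = [100.0]
    store = ExpiringStore(ttl_seconds=10, clock=lambda: fake_now[0])
    store.put("clicks:last_minute", 42)
    print(store.get("clicks:last_minute"))  # prints 42
    fake_now[0] = 111.0
    print(store.get("clicks:last_minute"))  # prints None (expired)
```

Expired entries are removed lazily here, only when they are read. A production store would typically also purge them in the background.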

Chapter 7 is where we start to discuss what to do with the data we have collected and analyzed. It talks about communication patterns and protocols used for sending data to a streaming client. Along the way we’ll find out how to match our business requirements to the various protocols and how to choose the right one.
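One protocol for pushing data to a streaming client is Server-Sent Events (SSE), served with the `text/event-stream` content type. The sketch below formats a single SSE message: optional `id:` and `event:` fields, one `data:` line per line of payload, and a blank line to end the message. The summary above doesn't say which protocols chapter 7 compares, so SSE is an illustrative assumption here. `format_sse` is a hypothetical helper, not a library function.

```python
import json
import re


def format_sse(data, event=None, event_id=None):
    """Encode one message in Server-Sent Events format (hypothetical helper).

    Each line of the payload gets its own "data:" field, and a blank line
    terminates the message so the client dispatches it.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # SSE treats CRLF, CR, and LF as line breaks, so split on exactly those.
    for line in re.split(r"\r\n|\r|\n", data):
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # Prints the id, event, and data lines, then the blank line ending the message.
    print(format_sse(json.dumps({"clicks": 2}), event="count", event_id=7), end="")
```

Splitting the payload on line breaks matters. Otherwise a raw newline inside the payload would end the `data:` line early and corrupt the message.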

Chapter 8 explores concepts to keep in mind when building a streaming client. This is not a chapter on just building an HTML web app; it goes much deeper into lower-level things to consider when designing the client side of a streaming system.
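One lower-level client concern is noticing when messages are duplicated or missing, for instance after a reconnect. The sketch below assumes, hypothetically, that the server stamps each message with an increasing integer sequence number. `SequenceTracker` is an illustrative name, not from the book. The client processes each new sequence number once and records gaps in `missing` so it could ask the server to resend them. It accepts a resent message only if that message fills a recorded gap.

```python
class SequenceTracker:
    """Tracks sequence numbers seen by a streaming client (hypothetical helper)."""

    def __init__(self):
        self.last_seq = None  # highest sequence number seen so far
        self.missing = set()  # sequence numbers skipped and not yet received

    def accept(self, seq):
        """Return True if the message should be processed, False if it is a duplicate."""
        if self.last_seq is None:
            self.last_seq = seq  # first message: nothing earlier is tracked
            return True
        if seq > self.last_seq:
            # Everything strictly between the old high-water mark and seq was skipped.
            self.missing.update(range(self.last_seq + 1, seq))
            self.last_seq = seq
            return True
        if seq in self.missing:
            self.missing.discard(seq)  # a resent message that fills a gap
            return True
        return False  # already processed


if __name__ == "__main__":
    tracker = SequenceTracker()
    for seq in [1, 2, 5, 4, 4]:
        print(seq, tracker.accept(seq), sorted(tracker.missing))
```

How the client requests a replay of `missing` depends on the protocol in use and is left out here.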

Chapter 9 . . . at this point, if you have read all the way through, congrats! A lot of material is covered in the first eight chapters. Chapter 9 is where we make it all come to life. Here we build a complete streaming data pipeline and discuss taking our sample to production.

About the code

All the code shown in the final chapter of this book can be found in the sample source code that accompanies this book. You can download the sample code free of charge from the Manning website at www.manning.com/books/streaming-data. You may also find the code on GitHub at https://github.com/apsaltis/StreamingData-Book-Examples.

The sample code is structured as separate Maven projects, one for each of the tiers we walk through in chapter 9. Instructions for building and running the software are provided during the walkthrough in chapter 9.

All source code in listings or in the text is in a fixed-width font like this to separate it from ordinary text. In some listings, the code is annotated to point out the key concepts.

About the author

Andrew Psaltis is deeply entrenched in streaming systems and obsessed with delivering insight at the speed of thought. He spends most of his waking hours thinking about, writing about, and building streaming systems. He helps customers of all sizes build and/or fix complex streaming systems, speaks around the globe about streaming, and teaches others how to build streaming systems. When he’s not busy being busy, he’s spending time with his lovely wife and two kids and watching as much lacrosse as possible.


Author Online

The purchase of Streaming Data includes free access to a private forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the author and other users. To access and subscribe to the forum, point your browser to www.manning.com/books/streaming-data. This page provides information on how to get on the forum once you’re registered, what kind of help is available, and the rules of conduct in the forum.

Manning’s commitment to our readers is to provide a venue where meaningful dialogue between individual readers and between readers and the author can take place. It’s not a commitment to any specific amount of participation on the part of the author, whose contribution to the book’s forum remains voluntary (and unpaid). We suggest you try asking him challenging questions, lest his interest stray!

The Author Online forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.

About the cover illustration

The figure on the cover of Streaming Data is captioned “Habit of a Moor of Morocco in winter in 1695.” The illustration is taken from Thomas Jefferys’ A Collection of the Dresses of Different Nations, Ancient and Modern (four volumes), London, published between 1757 and 1772. The title page states that these are hand-colored copperplate engravings, heightened with gum arabic. Thomas Jefferys (1719–1771) was called “Geographer to King George III.” He was an English cartographer who was the leading map supplier of his day. He engraved and printed maps for government and other official bodies and produced a wide range of commercial maps and atlases, especially of North America. His work as a mapmaker sparked an interest in local dress customs of the lands he surveyed and mapped, which are brilliantly displayed in this collection.

Fascination with faraway lands and travel for pleasure were relatively new phenomena in the late 18th century, and collections such as this one were popular, introducing both the tourist and the armchair traveler to the inhabitants of other countries. The diversity of the drawings in Jefferys’ volumes speaks vividly of the uniqueness and individuality of the world’s nations some 200 years ago. Dress codes have changed since then, and the diversity by region and country, so rich at the time, has faded away. It is now often hard to tell the inhabitant of one continent from another. Perhaps, trying to view it optimistically, we have traded a cultural and visual diversity for a more varied personal life. Or a more varied and interesting intellectual and technical life.

At a time when it is hard to tell one computer book from another, Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional life of two centuries ago, brought back to life by Jefferys’ pictures.

