learning-spark-streaming

Tags: Spark, Streaming

3 stars (rated above 75% of resources) · Updated 2023-03-16 · 6.64 MB · PDF · 288 pages · 24 downloads · 28 views

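Structured Streaming is a scalable, fault-tolerant stream processing engine built on the Spark SQL execution engine. You express a streaming computation the same way you would express a batch computation on static data. As streaming data keeps arriving, the Spark SQL engine runs the computation incrementally and continuously and updates the final result table. You can use the Dataset/DataFrame API on the Spark SQL engine to express streaming aggregations, event-time windows, stream-to-batch joins, and similar operations. In short, Structured Streaming provides fast, fault-tolerant stream processing with end-to-end exactly-once guarantees.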
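The incremental-update idea in the description above can be illustrated without Spark at all. The following is a minimal sketch in plain Python, not the Structured Streaming API: the function name `run_incremental_word_count` and the sample batches are hypothetical, and a `Counter` stands in for the result table that the engine keeps up to date as data arrives.

```python
from collections import Counter
from typing import Dict, Iterable, List


def run_incremental_word_count(batches: Iterable[List[str]]) -> List[Dict[str, int]]:
    """Fold each arriving batch of lines into a running result table.

    Returns a snapshot of the result table after every batch, which mimics
    an aggregation whose full result is refreshed as new data arrives.
    """
    result_table: Counter = Counter()  # stand-in for the engine's result table
    snapshots = []
    for batch in batches:
        for line in batch:
            # Only the new rows are processed; earlier counts are reused.
            result_table.update(line.split())
        snapshots.append(dict(result_table))
    return snapshots


# Hypothetical input: three micro-batches, the last one empty.
batches = [["cat dog", "dog"], ["owl cat"], []]
snapshots = run_incremental_word_count(batches)
for i, snap in enumerate(snapshots):
    print(f"after batch {i}: {sorted(snap.items())}")
```

The final snapshot equals what a single batch job over all the lines would produce, which is the sense in which a streaming query can be written like a batch query on static data.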


https://www.iteblog.com


1. 1. Introducing Spark Streaming

1. Large-scale data analytics and Apache Spark

2. More than MapReduce : how the model came about and how Spark extends it.

1. A Fault-tolerant MapReduce cluster

2. A distributed file system

3. Two higher-order functions

3. Optimizations in a reduce operation

1. Associativity : a necessary condition.

2. Shuffling

3. Map-side combiner

4. To Learn more about MapReduce

1. The Spark ecosystem, approach and polyglot APIs

2. Multiple frameworks, and a framework scheduler

3. A Data Processing engine

4. A polyglot API

5. A MapReduce extension

6. A SQL interface, expanding into a DataFrame interface.

7. A Real Time processing engine

8. In-memory computing, with impact on processing speed and latency

9. MapReduce and memory legacy

10. Spark’s Memory Usage

11. A customizable cache

12. Operation Latency

5. How Spark Streaming fits in the Big Picture

1. Micro-batching

2. A strong Streaming characteristic

3. A minimal delay

4. Throughput-oriented tasks

6. Why you would want to use Spark Streaming

1. Building a pipeline

2. Productive deployment of pipelines

3. Productive implementation of data analysis

7. To learn more about Spark

8. Conclusion

9. Bibliography

2. 2. Core Spark Streaming concepts

1. Apache Spark RDDs

1. Resilient Distributed Datasets

2. Transformations and Actions

3. The Shuffle

4. Partitions

5. Debugging RDDs

6. Witnessing caching

2. Spark Streaming Clusters

1. The Standalone Spark cluster

2. Yet Another Resource Negotiator (YARN)

3. Apache Mesos

4. Spark Streaming : a delicate deployment

3. To learn more about running Spark on a cluster

4. Fundamentals of a DStream

1. A Bulk-synchronous model

2. The Spark Streaming Context

3. Representing regular updates to a fixed window of data

4. The Receiver Model

5. Receiver parallelism

5. Conclusion

6. Bibliography

3. 3. Streaming application design

1. Starting with an example : Twitter analysis

1. The Spark Notebook

2. Creating a Streaming Application

3. Creating a Stream

4. Transformations

5. Actions and Dataflow

6. Expressing a Dataflow

7. Starting the Spark Streaming Context

8. Summary

2. Windowed Streams

1. Windowed Streams

2. A word on changing the batch interval

3. Slicing your Stream

3. Other Data Sources and Connectors

1. Apache Kafka

2. Apache Flume

3. Kinesis

4. Apache Bahir

5. How to write a quick stream generator for testing : SocketStream, FileStream, QueueStream

4. The Lambda Architecture

1. The evolution of ideas, rather than products

2. A classical but difficult example

3. Batch processing and a program’s life time

4. A Streaming improvement

5. A fundamental difficulty: back to the Lambda architecture ?

5. Saving Streams

1. Stream Output and other operations

2. A word on content selection

3. Reasons for saving a stream and scaling into real-time

4. How to Save Streams with DataFrames

6. Bibliography

4. 4. Creating robust deployments

1. Using spark-submit

2. Thinking about reliability in Spark Streaming: Closures and Function-Passing Style

3. Spark’s Reliability primitives

4. Spark’s Fault Tolerance Guarantees

1. The External shuffle service

2. Cluster-mode deployment

3. Checkpointing

4. A hot-swappable master through Zookeeper

5. Fault-tolerance in Spark Streaming: the context of the Receiver model

6. Spark Streaming’s Zero Data Loss guarantees

7. Cluster managers and driver restart

8. Comparing cluster managers

9. Job stability: A time budget question

1. Batch interval and processing delay

2. Going deeper : scheduling delay and processing delay

3. Fixed-rate throttling

10. Backpressure

1. Why backpressure

2. Dynamic throttling

3. Tuning the backpressure PID

11. Fault tolerance in Spark Streaming

1. Planning for side effect stutter in transformations

2. Idempotent side effects for exactly once processing

3. Checkpointing and its importance

12. The Reliable Receiver and the Write-Ahead Log

13. Apache Kafka and the DirectKafkaReceiver

1. The Kafka model and its Receiver

14. Parallel consumers

1. The Receiver model vs. reliable receivers

15. Bibliography

5. 5. Streaming Programming API

1. Basic Stream transformations

1. Element-centric DStream Operations

2. RDD-centric DStream Operations

3. Counting

2. Output Operations

1. foreachRDD

2. 3rd Party Output Operations

3. Spark SQL and Spark Streaming

4. Spark SQL

1. Accessing Spark SQL Functions From Spark Streaming

2. Dealing with Data at Rest

3. Join Optimizations

4. Updating Reference Data

5. Stateful Streaming Computation

1. updateStateByKey

2. Statefulness at the scale of a stream

3. updateStateByKey and its limitations

4. mapWithState

5. Using mapWithState

6. Event-time Stream computation with mapWithState

6. Dynamic Windows

1. reduceByWindow

2. Invertible Aggregations

7. Caching

8. Measuring and Monitoring

1. The Streaming UI

2. The Monitoring API

3. Conclusion

9. Bibliography

要努力啊要努力

2018-04-29

Early release edition. It can also be found on cloud drives. Thanks!
