learning-spark-streaming
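Structured Streaming is a scalable, fault-tolerant stream processing engine built on the Spark SQL execution engine. A streaming computation is expressed the same way as a batch computation on static data. As streaming data arrives, the Spark SQL engine processes it incrementally and continuously and updates the result in the final table. You can use the Dataset/DataFrame API on the Spark SQL engine to express streaming aggregations, event-time windows, stream-to-batch joins, and more. Finally, Structured Streaming is fast and stable, and provides end-to-end exactly-once guarantees with fault-tolerant processing.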
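The incremental model described above can be illustrated without Spark at all. The following is a minimal sketch in plain Python, not the Structured Streaming API: the function names `update_result_table` and `run_stream` are hypothetical, and a word count stands in for an arbitrary aggregation. Each newly arrived batch is folded into a running result table, so the final table matches what a single batch job over all the data would produce.

```python
from collections import Counter
from typing import Dict, Iterable, List


def update_result_table(result: Counter, batch: Iterable[str]) -> Counter:
    """Fold one newly arrived batch of lines into the running word counts."""
    for line in batch:
        result.update(line.split())
    return result


def run_stream(batches: Iterable[List[str]]) -> List[Dict[str, int]]:
    """Process batches in arrival order; return a snapshot of the table after each one."""
    result: Counter = Counter()
    snapshots = []
    for batch in batches:
        update_result_table(result, batch)
        snapshots.append(dict(result))  # copy, so later updates do not alter the snapshot
    return snapshots


if __name__ == "__main__":
    # Hypothetical input: two micro-batches of text lines.
    arriving = [["spark streaming"], ["spark sql", "streaming sql"]]
    for i, table in enumerate(run_stream(arriving)):
        print(f"after batch {i}: {table}")
```

In the real engine the result table, state handling, and fault tolerance are managed by Spark SQL; this sketch only shows the idea that the result is updated as data arrives rather than recomputed from scratch.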
https://www.iteblog.com
1. 1. Introducing Spark Streaming
1. Large-scale data analytics and Apache Spark
2. More than MapReduce: how the model came about and how Spark extends it.
1. A Fault-tolerant MapReduce cluster
2. A distributed file system
3. Two higher-order functions
3. Optimizations in a reduce operation
1. Associativity: a necessary condition.
2. Shuffling
3. Map-side combiner
4. To Learn more about MapReduce
1. The Spark ecosystem, approach and polyglot APIs
2. Multiple frameworks, and a framework scheduler
3. A Data Processing engine
4. A polyglot API
5. A MapReduce extension
6. A SQL interface, expanding into a DataFrame interface.
7. A Real Time processing engine
8. In-memory computing, with impact on processing speed and latency
9. MapReduce and memory legacy
10. Spark’s Memory Usage
11. A customizable cache
12. Operation Latency
5. How Spark Streaming fits in the Big Picture
1. Micro-batching
2. A strong Streaming characteristic
3. A minimal delay
4. Throughput-oriented tasks
6. Why you would want to use Spark Streaming
1. Building a pipeline
2. Productive deployment of pipelines
3. Productive implementation of data analysis
7. To learn more about Spark
8. Conclusion
9. Bibliography
2. 2. Core Spark Streaming concepts
1. Apache Spark RDDs
1. Resilient Distributed Datasets
2. Transformations and Actions
3. The Shuffle
4. Partitions
5. Debugging RDDs
6. Witnessing caching
2. Spark Streaming Clusters
1. The Standalone Spark cluster
2. Yet Another Resource Negotiator (YARN)
3. Apache Mesos
4. Spark Streaming: a delicate deployment
3. To learn more about running Spark on a cluster
4. Fundamentals of a DStream
1. A Bulk-synchronous model
2. The Spark Streaming Context
3. Representing regular updates to a fixed window of data
4. The Receiver Model
5. Receiver parallelism
5. Conclusion
6. Bibliography
3. 3. Streaming application design
1. Starting with an example: Twitter analysis
1. The Spark Notebook
2. Creating a Streaming Application
3. Creating a Stream
4. Transformations
5. Actions and Dataflow
6. Expressing a Dataflow
7. Starting the Spark Streaming Context
8. Summary
2. Windowed Streams
1. Windowed Streams
2. A word on changing the batch interval
3. Slicing your Stream
3. Other Data Sources and Connectors
1. Apache Kafka
2. Apache Flume
3. Kinesis
4. Apache Bahir
5. How to write a quick stream generator for testing: SocketStream, FileStream, QueueStream
4. The Lambda Architecture
1. The evolution of ideas, rather than products
2. A classical but difficult example
3. Batch processing and a program’s life time
4. A Streaming improvement
5. A fundamental difficulty: back to the Lambda architecture?
5. Saving Streams
1. Stream Output and other operations
2. A word on content selection
3. Reasons for saving a stream and scaling into real-time
4. How to Save Streams with DataFrames
6. Bibliography
4. 4. Creating robust deployments
1. Using spark-submit
2. Thinking about reliability in Spark Streaming: Closures and Function-Passing Style
3. Spark’s Reliability primitives
4. Spark’s Fault Tolerance Guarantees
1. The External shuffle service
2. Cluster-mode deployment
3. Checkpointing
4. A hot-swappable master through Zookeeper
5. Fault-tolerance in Spark Streaming: the context of the Receiver model
6. Spark Streaming’s Zero Data Loss guarantees
7. Cluster managers and driver restart
8. Comparing cluster managers
9. Job stability: A time budget question
1. Batch interval and processing delay
2. Going deeper: scheduling delay and processing delay
3. Fixed-rate throttling
10. Backpressure
1. Why backpressure
2. Dynamic throttling
3. Tuning the backpressure PID
11. Fault tolerance in Spark Streaming
1. Planning for side effect stutter in transformations
2. Idempotent side effects for exactly once processing
3. Checkpointing and its importance
12. The Reliable Receiver and the Write-Ahead Log
13. Apache Kafka and the DirectKafkaReceiver
1. The Kafka model and its Receiver
14. Parallel consumers
1. The Receiver model vs. reliable receivers
15. Bibliography
5. 5. Streaming Programming API
1. Basic Stream transformations
1. Element-centric DStream Operations
2. RDD-centric DStream Operations
3. Counting
2. Output Operations
1. foreachRDD
2. 3rd Party Output Operations
3. Spark SQL and Spark Streaming
4. Spark SQL
1. Accessing Spark SQL Functions From Spark Streaming
2. Dealing with Data at Rest
3. Join Optimizations
4. Updating Reference Data
5. Stateful Streaming Computation
1. updateStateByKey
2. Statefulness at the scale of a stream
3. updateStateByKey and its limitations
4. mapWithState
5. Using mapWithState
6. Event-time Stream computation with mapWithState
6. Dynamic Windows
1. reduceByWindow
2. Invertible Aggregations
7. Caching
8. Measuring and Monitoring
1. The Streaming UI
2. The Monitoring API
3. Conclusion
9. Bibliography