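Learning Spark Streaming: Best Practices for Scaling and Optimizing Apache Spark (Early Release). English PDF, no watermark. All pages open correctly in FoxitReader and PDF-XChangeViewer. This resource was reposted from the web; if it infringes any rights, please contact the uploader or CSDN to have it removed.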
1. 1. Introducing Spark Streaming
1. Large-scale data analytics and Apache Spark
2. More than MapReduce : how the model came about and how Spark extends it.
1. A Fault-tolerant MapReduce cluster
2. A distributed file system
3. Two higher-order functions
3. Optimizations in a reduce operation
1. Associativity : a necessary condition.
2. Shuffling
3. Map-side combiner
4. To Learn more about MapReduce
1. The Spark ecosystem, approach and polyglot APIs
2. Multiple frameworks, and a framework scheduler
3. A Data Processing engine
4. A polyglot API
5. A MapReduce extension
6. A SQL interface, expanding into a DataFrame interface.
7. A Real Time processing engine
8. In-memory computing, with impact on processing speed and latency
9. MapReduce and memory legacy
10. Spark’s Memory Usage
11. A customizable cache
12. Operation Latency
5. How Spark Streaming fits in the Big Picture
1. Micro-batching
2. A strong Streaming characteristic
3. A minimal delay
4. Throughput-oriented tasks
6. Why you would want to use Spark Streaming
1. Building a pipeline
2. Productive deployment of pipelines
3. Productive implementation of data analysis
7. To learn more about Spark
8. Conclusion
9. Bibliography
2. 2. Core Spark Streaming concepts
1. Apache Spark RDDs
1. Resilient Distributed Datasets
2. Transformations and Actions
3. The Shuffle
4. Partitions
5. Debugging RDDs
6. Witnessing caching
2. Spark Streaming Clusters
1. The Standalone Spark cluster
2. Yet Another Resource Negotiator (YARN)
3. Apache Mesos
4. Spark Streaming : a delicate deployment
3. To learn more about running Spark on a cluster
4. Fundamentals of a DStream
1. A Bulk-synchronous model
2. The Spark Streaming Context
3. Representing regular updates to a fixed window of data
4. The Receiver Model
5. Receiver parallelism
5. Conclusion
6. Bibliography
3. 3. Streaming application design
1. Starting with an example : Twitter analysis
1. The Spark Notebook
2. Creating a Streaming Application
3. Creating a Stream
4. Transformations
5. Actions and Dataflow
6. Expressing a Dataflow
7. Starting the Spark Streaming Context
8. Summary
2. Windowed Streams
1. Windowed Streams
2. A word on changing the batch interval
3. Slicing your Stream
3. Other Data Sources and Connectors
1. Apache Kafka
2. Apache Flume
3. Kinesis
4. Apache Bahir
5. How to write a quick stream generator for testing : SocketStream, FileStream, QueueStream
4. The Lambda Architecture
1. The evolution of ideas, rather than products
2. A classical but difficult example
3. Batch processing and a program’s life time
4. A Streaming improvement
5. A fundamental difficulty: back to the Lambda architecture ?
5. Saving Streams
1. Stream Output and other operations
2. A word on content selection
3. Reasons for saving a stream and scaling into real-time
4. How to Save Streams with DataFrames
6. Bibliography
4. 4. Creating robust deployments
1. Using spark-submit
2. Thinking about reliability in Spark Streaming: Closures and Function-Passing Style
3. Spark’s Reliability primitives
4. Spark’s Fault Tolerance Guarantees
1. The External shuffle service
2. Cluster-mode deployment
3. Checkpointing
4. A hot-swappable master through Zookeeper
5. Fault-tolerance in Spark Streaming: the context of the Receiver model
6. Spark Streaming’s Zero Data Loss guarantees
7. Cluster managers and driver restart
8. Comparing cluster managers
9. Job stability: A time budget question
1. Batch interval and processing delay
2. Going deeper : scheduling delay and processing delay
3. Fixed-rate throttling
10. Backpressure
1. Why backpressure
2. Dynamic throttling
3. Tuning the backpressure PID
11. Fault tolerance in Spark Streaming
1. Planning for side effect stutter in transformations
2. Idempotent side effects for exactly once processing
3. Checkpointing and its importance
12. The Reliable Receiver and the Write-Ahead Log
13. Apache Kafka and the DirectKafkaReceiver
1. The Kafka model and its Receiver
14. Parallel consumers
1. The Receiver model vs. reliable receivers
15. Bibliography
5. 5. Streaming Programming API
1. Basic Stream transformations
1. Element-centric DStream Operations
2. RDD-centric DStream Operations
3. Counting
2. Output Operations
1. foreachRDD
2. 3rd Party Output Operations
3. Spark SQL and Spark Streaming
4. Spark SQL
1. Accessing Spark SQL Functions From Spark Streaming
2. Dealing with Data at Rest
3. Join Optimizations
4. Updating Reference Data
5. Stateful Streaming Computation
1. updateStateByKey
2. Statefulness at the scale of a stream
3. updateStateByKey and its limitations
4. mapWithState
5. Using mapWithState
6. Event-time Stream computation with mapWithState
6. Dynamic Windows
1. reduceByWindow
2. Invertible Aggregations
7. Caching
8. Measuring and Monitoring
1. The Streaming UI
2. The Monitoring API
3. Conclusion
9. Bibliography
Learning Spark Streaming
First Edition
Francois Garillot and Gerard Maas