Updated 2023-06-07
A book still being written; this is a preview edition, but it looks better than Packt's Spark book, and it is one of the few Spark books on the market.
Learning Spark
Holden Karau
Andy Konwinski
Patrick Wendell
Matei Zaharia
Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo
Table of Contents
Preface
  Audience
  How This Book is Organized
  Supporting Books
  Code Examples
  Early Release Status and Feedback
Chapter 1. Introduction to Data Analysis with Spark
  What is Apache Spark?
  A Unified Stack
  Who Uses Spark, and For What?
  A Brief History of Spark
  Spark Versions and Releases
  Spark and Hadoop
Chapter 2. Downloading and Getting Started
  Downloading Spark
  Introduction to Spark’s Python and Scala Shells
  Introduction to Core Spark Concepts
  Standalone Applications
  Conclusion
Chapter 3. Programming with RDDs
  RDD Basics
  Creating RDDs
  RDD Operations
  Passing Functions to Spark
  Common Transformations and Actions
  Persistence (Caching)
  Conclusion
Chapter 4. Working with Key-Value Pairs
  Motivation
  Creating Pair RDDs
  Transformations on Pair RDDs
  Actions Available on Pair RDDs
  Data Partitioning
  Conclusion
Chapter 5. Loading and Saving Your Data
  Motivation
  Choosing a Format
  Formats
  File Systems
  Compression
  Databases
  Conclusion
About the Authors
Preface
As parallel data analysis has become increasingly common, practitioners in many fields have
sought easier tools for this task. Apache Spark has quickly emerged as one of the most popular
tools for this purpose, extending and generalizing MapReduce. Spark offers three main benefits.
First, it is easy to use—you can develop applications on your laptop, using a high-level API that
lets you focus on the content of your computation. Second, Spark is fast, enabling interactive use
and complex algorithms. And third, Spark is a general engine, allowing you to combine multiple
types of computations (e.g., SQL queries, text processing and machine learning) that might
previously have required learning different engines. These features make Spark an excellent
starting point to learn about big data in general.
This introductory book is meant to get you up and running with Spark quickly. You’ll learn how
to download and run Spark on your laptop and use it interactively to learn the API.
Once there, we’ll cover the details of available operations and distributed execution. Finally,
you’ll get a tour of the higher-level libraries built into Spark, including libraries for machine
learning, stream processing, graph analytics and SQL. We hope that this book gives you the
tools to quickly tackle data analysis problems, whether you do so on one machine or hundreds.
Audience
This book targets data scientists and engineers. We chose these two groups because they have
the most to gain from using Spark to expand the scope of problems they can solve. Spark’s rich
collection of data-focused libraries (like MLlib) makes it easy for data scientists to go beyond
problems that fit on a single machine while making use of their statistical background. Engineers,
meanwhile, will learn how to write general-purpose distributed programs in Spark and operate
production applications. Engineers and data scientists will each learn different details from this
book, but both will be able to apply Spark to solve large distributed problems in their respective
fields.
Data scientists focus on answering questions or building models from data. They often have a
statistical or math background and some familiarity with tools like Python, R and SQL. We have
made sure to include Python, and wherever possible SQL, examples for all our material, as well
as an overview of the machine learning and advanced analytics libraries in Spark. If you are a
data scientist, we hope that after reading this book you will be able to use the same
mathematical approaches to solve problems, except much faster and on a much larger scale.