Big Data with Apache Spark and Python
Contents
Chapter 1: Getting Started with Spark
  Getting set up - installing Python, a JDK, and Spark and its dependencies
  Installing the MovieLens movie rating dataset
  Run your first Spark program - the ratings histogram example
  Summary
Chapter 2: Spark Basics and Spark Examples
  What is Spark?
  The Resilient Distributed Dataset (RDD)
  Ratings histogram walk-through
  Key/value RDDs and the average friends by age example
  Running the average friends by age example
  Filtering RDDs and the minimum temperature by location example
  Running the minimum temperature example and modifying it for maximums
  Running the maximum temperature by location example
  Counting word occurrences using flatMap()
  Improving the word-count script with regular expressions
  Sorting the word count results
  Find the total amount spent by customer
  Check your results and sort them by the total amount spent
  Check your sorted implementation and results against mine
  Summary
Chapter 3: Advanced Examples of Spark Programs
  Finding the most popular movie
  Using broadcast variables to display movie names instead of ID numbers
  Finding the most popular superhero in a social graph
  Running the script - discover who the most popular superhero is
  Superhero degrees of separation - introducing the breadth-first search algorithm
  Accumulators and implementing BFS in Spark
  Superhero degrees of separation - review the code and run it
  Item-based collaborative filtering in Spark, cache(), and persist()
  Running the similar-movies script using Spark's cluster manager
  Improving the quality of the similar movies example
  Summary
Chapter 4: Running Spark on a Cluster
  Introducing Elastic MapReduce
  Setting up our Amazon Web Services / Elastic MapReduce account and PuTTY
  Partitioning
  Creating similar movies from one million ratings - part 1
  Creating similar movies from one million ratings - part 2
  Creating similar movies from one million ratings - part 3
  Troubleshooting Spark on a cluster
  More troubleshooting and managing dependencies
  Summary
Chapter 5: SparkSQL, DataFrames, and DataSets
  Introducing SparkSQL
  Executing SQL commands and SQL-style functions on a DataFrame
  Using DataFrames instead of RDDs
  Summary
Chapter 6: Other Spark Technologies and Libraries
  Introducing MLlib
  Using MLlib to produce movie recommendations
  Analyzing the ALS recommendations results
  Using DataFrames with MLlib
  Spark Streaming and GraphX
  Summary
Chapter 7: Where to Go From Here? - Learning More About Spark and Data Science
Chapter 1. Getting Started with Spark
Spark is one of the hottest technologies in big data analysis right now, and with good reason. If you work for, or hope to work for, a company that has massive amounts of data to analyze, Spark offers a fast and relatively easy way to spread that analysis across an entire cluster of computers. That makes it a very valuable skill to have.
My approach in this book is to start with some simple examples and work up to more complex ones, and we'll have some fun along the way. We will use movie ratings data to find similar movies and produce movie recommendations. I also found a social network of superheroes, if you can believe it; we can use this data to figure out who the most popular superhero is in the fictional superhero universe. Have you heard of the Kevin Bacon number, the idea that everyone in Hollywood is connected to Kevin Bacon through a short chain of co-stars? We can do the same thing with our superhero data and compute the degrees of separation between any two superheroes in their fictional universe. These are real datasets, and we'll turn each of them into a Spark problem.
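"Degrees of separation" is the length of the shortest chain of connections between two people in a graph, and breadth-first search (BFS) finds it by exploring neighbors one hop at a time. The sketch below is plain Python, not Spark and not the book's code; the graph and the `degrees_of_separation` helper are hypothetical, made up only to show the idea that Chapter 3 later implements in Spark on the real superhero data.

```python
from collections import deque


def degrees_of_separation(graph, start, target):
    """Breadth-first search: return the number of hops from start to
    target in an adjacency-list graph, or None if they are not connected."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])  # (node, hops from start)
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # target is unreachable from start


# Hypothetical toy graph: an edge means two heroes appeared together.
# Each edge is stored in both directions, so the graph is undirected.
heroes = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}

print(degrees_of_separation(heroes, "C", "E"))  # 4: C-A-B-D-E
```

Because BFS visits every node at distance k before any node at distance k + 1, the first time it reaches the target is guaranteed to be along a shortest path.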
Using Apache Spark is easier than you might think and, with all the exercises
and activities in this book, you'll get plenty of practice as we go along. I'll
guide you through every line of code and every concept you need along the
way. So let's get started and learn Apache Spark.
Getting set up - installing Python, a JDK, and Spark and its dependencies
Let's get you started. There is a fair amount of software to set up, and running Spark on Windows involves many moving pieces, so follow along carefully or you may run into trouble. I'll walk you through it as simply as I can. This chapter is written for Windows users, but that doesn't mean you're out of luck on macOS or Linux: if you open the download package for the book or go to http://media.sundog-soft.com/spark-python-install.pdf, you will find written instructions for setting everything up on Windows, macOS, and Linux.
Windows users can follow this chapter directly. I will call out the steps that are specific to Windows, so most of the chapter is useful on other platforms as well; for the platform-specific steps on macOS or Linux, refer to the spark-python-install.pdf file. Let's dive in and get it done.
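The setup above comes down to three prerequisites: Python, a JDK (Spark runs on the JVM), and Spark itself. Once they are installed, a small script like the following can confirm that each one is visible from the Python interpreter you plan to use. This is a sketch under stated assumptions, not part of the book: "installed" here only means discoverable (on `PATH`, set in the environment, or importable), the Python 3 requirement is an assumption, and the function name `check_spark_prerequisites` is hypothetical.

```python
import importlib.util
import os
import shutil
import sys


def check_spark_prerequisites():
    """Return a dict mapping each prerequisite name to True/False.

    Assumption: a prerequisite counts as present if this Python process
    can discover it; this does not verify versions or a working install.
    """
    return {
        # Assumed requirement: a Python 3 interpreter is running this script
        "python3": sys.version_info.major >= 3,
        # A java executable on PATH (provided by a JDK or JRE)
        "java_on_path": shutil.which("java") is not None,
        # SPARK_HOME is typically set for a manually unpacked Spark install
        "spark_home_set": bool(os.environ.get("SPARK_HOME")),
        # The pyspark package can be imported from this interpreter
        "pyspark_importable": importlib.util.find_spec("pyspark") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_spark_prerequisites().items():
        print(f"{name:20} {'OK' if ok else 'MISSING'}")
```

Run it with the same interpreter you will use for Spark. A `MISSING` for `spark_home_set` is not necessarily a problem if `pyspark` was installed with pip, since that package bundles its own copy of Spark.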
Installing Enthought Canopy
This book uses Python as its programming language, so the first thing you need is a Python development environment installed on your PC. If you don't have one already, open a web browser and head to https://www.enthought.com/, where we'll get Enthought Canopy.