Apache Spark Cheat Sheet
DZONE.COM/REFCARDZ
Apache Spark
UPDATED BY TIM SPANN, BIG DATA SOLUTIONS ENGINEER, HORTONWORKS
WRITTEN BY ASHWINI KUNTAMUKKALA, SOFTWARE ARCHITECT, SCISPIKE
WHY APACHE SPARK?
Apache Spark has become the engine to enhance many of the
capabilities of the ever-present Apache Hadoop environment. For
Big Data, Apache Spark meets a lot of needs and runs natively on
Apache Hadoop’s YARN. By running Apache Spark in your Apache
Hadoop environment, you gain all the security, governance, and
scalability inherent to that platform. Apache Spark is also extremely
well integrated with Apache Hive and gains access to all your Apache
Hadoop tables utilizing integrated security.
Apache Spark has begun to really shine in streaming data processing
and machine learning. With first-class support for Python as a
development language, PySpark lets data scientists, engineers, and
developers build and scale machine learning with ease. One feature
that has broadened this reach is support for Apache Zeppelin
notebooks, which run Apache Spark jobs for exploration, data cleanup,
and machine learning. Apache Spark also integrates with other
important streaming tools in the Apache Hadoop space, namely Apache
NiFi and Apache Kafka. I like to think of Apache Spark + Apache NiFi +
Apache Kafka as the three amigos of Apache Big Data ingest and
streaming. At the time of writing, the latest version of Apache Spark
is 2.2.
ABOUT APACHE SPARK
Apache Spark is an open source, Hadoop-compatible, fast and
expressive cluster-computing data processing engine. It was created
at the AMPLab at UC Berkeley as part of the Berkeley Data Analytics
Stack (BDAS). It is a top-level Apache project. The figure below shows
the various components of the current Apache Spark stack.
It has six major benefits:
1. Lightning speed of computation because data are loaded in
distributed memory (RAM) over a cluster of machines. Data can
be quickly transformed iteratively and cached on demand for
subsequent usage.
2. Highly accessible through standard APIs built in Java, Scala,
Python, R, and SQL (for interactive queries) and has a rich set of
machine learning libraries available out of the box.
3. Compatibility with existing Hadoop 2.x (YARN) ecosystems so
companies can leverage their existing infrastructure.
4. Convenient download and installation processes. Convenient
shell (REPL: Read-Eval-Print-Loop) to interactively learn the APIs.
5. Enhanced productivity due to high-level constructs that keep
the focus on content of computation.
6. Multiple user notebook environments supported by Apache
Zeppelin.
Spark is also implemented in Scala, so its code is succinct and fast,
and it requires a JVM to run.
HOW TO INSTALL APACHE SPARK
The following table lists a few important links and prerequisites:

| Item | Link or Requirement |
| --- | --- |
| Current Release | 2.2.0 @ apache.org/dyn/closer.lua/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz |
| Downloads Page | spark.apache.org/downloads.html |
| JDK Version (Required) | 1.8 or higher |
| Scala Version (Required) | 2.11 or higher |
| Python (Optional) | [2.7, 3.5) |
| Simple Build Tool (Required) | scala-sbt.org |
| Development Version | github.com/apache/spark |
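The prerequisites above can be checked with a few lines of plain Python before installing anything. The following is a minimal sketch, not part of Spark: the helper names (`parse_version`, `archive_name`, `meets_minimum`, `python_supported`) are hypothetical, the version bounds are copied from the table, and the parser assumes purely numeric dotted strings such as `1.8` or `2.7.15`.

```python
# Values copied from the table above (Spark 2.2.0 release).
SPARK_VERSION = "2.2.0"
HADOOP_PROFILE = "hadoop2.7"


def parse_version(text):
    """Turn a numeric dotted version string such as '1.8' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def archive_name(spark_version=SPARK_VERSION, hadoop_profile=HADOOP_PROFILE):
    """Build the binary archive file name in the pattern shown in the table."""
    return "spark-{0}-bin-{1}.tgz".format(spark_version, hadoop_profile)


def meets_minimum(version, minimum):
    """Return True if major.minor of `version` is at least `minimum`."""
    return parse_version(version)[:2] >= parse_version(minimum)


def python_supported(version):
    """Check the half-open range [2.7, 3.5) listed for Python in the table."""
    major_minor = parse_version(version)[:2]
    return parse_version("2.7") <= major_minor < parse_version("3.5")


if __name__ == "__main__":
    print(archive_name())
    print("JDK 1.8 ok:", meets_minimum("1.8", "1.8"))
    print("Scala 2.10 ok:", meets_minimum("2.10", "2.11"))
    print("Python 3.4.2 ok:", python_supported("3.4.2"))
```

Comparing tuples of integers rather than raw strings avoids the common mistake of treating `2.10` as smaller than `2.9`.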
CONTENTS
- WHY APACHE SPARK?
- ABOUT APACHE SPARK
- HOW TO INSTALL APACHE SPARK
- HOW APACHE SPARK WORKS
- RESILIENT DISTRIBUTED DATASET
- DATAFRAMES
- RDD PERSISTENCE
- SPARK SQL
- SPARK STREAMING