Apache Spark is a unified computing engine and a set of libraries for parallel data processing on computer clusters. As of this writing, Spark is the most actively developed open source engine for this task, making it a standard tool for any developer or data scientist interested in big data. Spark supports multiple widely used programming languages (Python, Java, Scala, and R), includes libraries for diverse tasks ranging from SQL to streaming and machine learning, and runs anywhere from a laptop to a cluster of thousands of servers. This makes it an easy system to start with and to scale up to big data processing at incredibly large scale.
Spark: The Definitive Guide
Big Data Processing Made Simple
Bill Chambers and Matei Zaharia
Spark: The Definitive Guide
by Bill Chambers and Matei Zaharia
Copyright © 2018 Databricks. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online
editions are also available for most titles (http://oreilly.com/safari). For more information,
contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Editor: Nicole Tache
Production Editor: Justin Billing
Copyeditor: Octal Publishing, Inc., Chris Edwards, and Amanda Kersey
Proofreader: Jasmine Kwityn
Indexer: Judith McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
February 2018: First Edition
Revision History for the First Edition
2018-02-08: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781491912218 for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Spark: The Definitive Guide,
the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. Apache, Spark
and Apache Spark are trademarks of the Apache Software Foundation.
While the publisher and the authors have used good faith efforts to ensure that the information
and instructions contained in this work are accurate, the publisher and the authors disclaim all
responsibility for errors or omissions, including without limitation responsibility for damages
resulting from the use of or reliance on this work. Use of the information and instructions
contained in this work is at your own risk. If any code samples or other technology this work
contains or describes is subject to open source licenses or the intellectual property rights of
others, it is your responsibility to ensure that your use thereof complies with such licenses and/or
rights.
978-1-491-91221-8
[M]
Preface
Welcome to this first edition of Spark: The Definitive Guide! We are excited to bring you the
most complete resource on Apache Spark today, focusing especially on the new generation of
Spark APIs introduced in Spark 2.0.
Apache Spark is currently one of the most popular systems for large-scale data processing, with
APIs in multiple programming languages and a wealth of built-in and third-party libraries.
Although the project has existed for multiple years—first as a research project started at UC
Berkeley in 2009, then at the Apache Software Foundation since 2013—the open source
community is continuing to build more powerful APIs and high-level libraries over Spark, so
there is still a lot to write about the project. We decided to write this book for two reasons. First,
we wanted to present the most comprehensive book on Apache Spark, covering all of the
fundamental use cases with easy-to-run examples. Second, we especially wanted to explore the
higher-level “structured” APIs that were finalized in Apache Spark 2.0—namely DataFrames,
Datasets, Spark SQL, and Structured Streaming—which older books on Spark don’t always
include. We hope this book gives you a solid foundation to write modern Apache Spark
applications using all the available tools in the project.
In this preface, we’ll tell you a little bit about our background, and explain who this book is for
and how we have organized the material. We also want to thank the numerous people who
helped edit and review this book, without whom it would not have been possible.
About the Authors
Both of the book’s authors have been involved in Apache Spark for a long time, so we are very
excited to be able to bring you this book.
Bill Chambers started using Spark in 2014 on several research projects. Currently, Bill is a
Product Manager at Databricks where he focuses on enabling users to write various types of
Apache Spark applications. Bill also regularly blogs about Spark and presents at conferences and
meetups on the topic. Bill holds a Master’s in Information Management and Systems from the
UC Berkeley School of Information.
Matei Zaharia started the Spark project in 2009, during his time as a PhD student at UC
Berkeley. Matei worked with other Berkeley researchers and external collaborators to design the
core Spark APIs and grow the Spark community, and has continued to be involved in new
initiatives such as the structured APIs and Structured Streaming. In 2013, Matei and other
members of the Berkeley Spark team co-founded Databricks to further grow the open source
project and provide commercial offerings around it. Today, Matei continues to work as Chief
Technologist at Databricks, and also holds a position as an Assistant Professor of Computer
Science at Stanford University, where he does research on large-scale systems and AI. Matei
received his PhD in Computer Science from UC Berkeley in 2013.