Holden Karau &
Rachel Warren
High Performance
Spark
BEST PRACTICES FOR SCALING
& OPTIMIZING APACHE SPARK


Beijing, Boston, Farnham, Sebastopol, Tokyo

978-1-491-94320-5
[LSI]
High Performance Spark
by Holden Karau and Rachel Warren
Copyright © 2017 Holden Karau, Rachel Warren. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are
also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional
sales department: 800-998-9938 or corporate@oreilly.com.
Editor: Shannon Cutt
Production Editor: Kristen Brown
Copyeditor: Kim Cofer
Proofreader: James Fraleigh
Indexer: Ellen Troutman-Zaig
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
June 2017: First Edition
Revision History for the First Edition
2017-05-22: First Release
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. High Performance Spark, the cover
image, and related trade dress are trademarks of O’Reilly Media, Inc.
While the publisher and the authors have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.

Table of Contents

Preface

1. Introduction to High Performance Spark
  What Is Spark and Why Performance Matters
  What You Can Expect to Get from This Book
  Spark Versions
  Why Scala?
  To Be a Spark Expert You Have to Learn a Little Scala Anyway
  The Spark Scala API Is Easier to Use Than the Java API
  Scala Is More Performant Than Python
  Why Not Scala?
  Learning Scala
  Conclusion

2. How Spark Works
  How Spark Fits into the Big Data Ecosystem
  Spark Components
  Spark Model of Parallel Computing: RDDs
  Lazy Evaluation
  In-Memory Persistence and Memory Management
  Immutability and the RDD Interface
  Types of RDDs
  Functions on RDDs: Transformations Versus Actions
  Wide Versus Narrow Dependencies
  Spark Job Scheduling
  Resource Allocation Across Applications
  The Spark Application
  The Anatomy of a Spark Job