Mastering Apache Spark: 60 Hands-On Recipes Covering Spark Core, SQL, Streaming, MLlib, and GraphX


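Spark Cookbook is a guide for data engineers, application developers, and data scientists, written by Rishi Yadav and published by Packt Publishing. It covers the core components of Apache Spark: Spark Core, Spark SQL, Spark Streaming, MLlib (the machine learning library), and GraphX (the graph processing library), with more than 60 practical recipes. The material suits readers learning on a single machine as well as those running production workloads on large clusters.

The book aims to make readers proficient in big data processing with Spark. Through step-by-step recipes and examples, it shows how to install and configure Spark and how to work with different cluster managers. You will learn to set up Spark SQL for interactive queries and to use Spark Streaming for real-time analysis of streaming data from sources such as Twitter and Apache Kafka. The machine learning chapters introduce supervised learning (regression and classification) and unsupervised learning, and show how to build a recommendation engine. For graph processing, the author walks through using GraphX for complex network analysis. The book also covers performance optimization and troubleshooting techniques for processing large datasets efficiently.

Spark Cookbook presents Spark as a single platform for big data computation that uses in-memory persistent storage to increase data processing speed by up to 100 times. It is intended to help readers apply Spark to a wide range of complex, large-scale data problems.

According to the copyright notice, no part of the book may be reproduced, stored, or transmitted in any form without the prior written permission of the copyright holder. Although the author and publisher have made every effort to ensure the accuracy of the information, the book is provided without warranty of any kind, and they accept no liability for direct or indirect damages caused by its use. The book was first published in July 2015; its production reference is 2220715.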

Preface


How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section consists of additional information about the recipe in order to make the reader more knowledgeable about the recipe.

See also

This section provides helpful links to other useful information for the recipe.

Conventions

In this book, you will nd a number of text styles that distinguish between different kinds of

information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Spark expects Java to be installed and the JAVA_HOME environment variable to be set."

A block of code is set as follows:

lazy val root = (project in file("."))
  .settings(
    name := "wordcount"
  )

Any command-line input or output is written as follows:

$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.0-bin-hadoop2.4.tgz

