Building Large-Scale Machine Learning Applications with Spark


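"Machine Learning with Spark", written by Nick Pentreath, explains how to build scalable machine learning applications with Apache Spark to power modern data-driven businesses. The book is aimed at readers with a Scala, Java, or Python background who are interested in machine learning and data analysis; no prior Spark experience is required. Through practical examples, it shows how to develop your own machine learning system with Spark, how to combine multiple techniques and models into an intelligent machine learning system, and how to load, analyze, clean, and transform data. It covers the fundamentals of the Spark API for processing and preparing data as input to a variety of machine learning models, and it implements common models for recommendation, classification, regression, clustering, and dimensionality reduction. Advanced topics include large-scale text processing, online machine learning, and model evaluation with Spark Streaming.

The table of contents is as follows:

1. Getting up and running with Spark: Spark installation, cluster setup, and the programming model, covering SparkContext, SparkConf, the Spark shell, Resilient Distributed Datasets (RDDs), and basic Spark operations.
2. Designing a machine learning system: uses MovieStream as an example to discuss business use cases for a machine learning system, such as personalized recommendations, targeted marketing, and customer segmentation.
3. Obtaining, processing, and preparing data with Spark: how to carry out data preprocessing with Spark.
4. Building a recommendation engine with Spark: how to create a recommender system.
5. Building a classification model with Spark: how to build classification models.
6. Building a regression model with Spark: how to implement regression models.
7. Building a clustering model with Spark: how to cluster data.
8. Dimensionality reduction with Spark: how to apply dimensionality reduction techniques.
9. Advanced text processing with Spark: strategies for processing large-scale text data.
10. Real-time machine learning with Spark Streaming: how to apply machine learning in a real-time setting.

The book suits developers who want to do large-scale machine learning in a distributed environment. Through examples and hands-on guidance, it helps readers learn how to apply Spark to machine learning.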

MachineLearningwithSpark

Allrightsreserved.Nopartofthisbookmaybereproduced,storedinaretrievalsystem,

ortransmittedinanyformorbyanymeans,withoutthepriorwrittenpermissionofthe

publisher,exceptinthecaseofbriefquotationsembeddedincriticalarticlesorreviews.

Everyefforthasbeenmadeinthepreparationofthisbooktoensuretheaccuracyofthe

informationpresented.However,theinformationcontainedinthisbookissoldwithout

warranty,eitherexpressorimplied.Neithertheauthor,norPacktPublishing,andits

dealersanddistributorswillbeheldliableforanydamagescausedorallegedtobecaused

directlyorindirectlybythisbook.

PacktPublishinghasendeavoredtoprovidetrademarkinformationaboutallofthe

companiesandproductsmentionedinthisbookbytheappropriateuseofcapitals.

However,PacktPublishingcannotguaranteetheaccuracyofthisinformation.

Firstpublished:February2015

Productionreference:1170215

PublishedbyPacktPublishingLtd.

LiveryPlace

35LiveryStreet

BirminghamB32PB,UK.

ISBN978-1-78328-851-9

www.packtpub.com

CoverimagebyAkshayPaunikar(<akshaypaunikar4@gmail.com>)


StumbleUpon Evergreen dataset

The StumbleUpon dataset from Kaggle, a machine learning resource (includes train, test, sampleSubmission, raw_content)

stumbleupon.zip

[Building.Machine.Learning.Systems.with.Python(2013.7)].Willi.Richert (text version)

Practical.Machine.Learning.178439968X

Packt.Machine Learning with Spark.2015

Large Scale Machine Learning with Spark.pdf

Mastering Machine Learning with Spark 2.X

Packt Machine Learning with Spark 2nd.Edition code
