Data Analytics with Hadoop(O'Reilly,2016)

Ready to use statistical and machine-learning techniques across large data sets? This practical guide shows you why the Hadoop ecosystem is perfect for the job. Instead of deployment, operations, or software development usually associated with distributed computing, you’ll focus on particular analyses you can build, the data warehousing techniques that Hadoop provides, and higher-order data workflows this framework can produce.

Data scientists and analysts will learn how to perform a wide range of techniques, from writing MapReduce and Spark applications with Python to using advanced modeling and data management with Spark MLlib, Hive, and HBase. You’ll also learn about the analytical processes and data systems available to build and empower data products that can handle, and actually require, huge amounts of data.

Understand core concepts behind Hadoop and cluster computing

Use design patterns and parallel analytical algorithms to create distributed data analysis jobs

Learn about data management, mining, and warehousing in a distributed context using Apache Hive and HBase

Use Sqoop and Apache Flume to ingest data from relational databases

Program complex Hadoop and Spark applications with Apache Pig and Spark DataFrames

Perform machine learning techniques such as classification, clustering, and collaborative filtering with Spark’s MLlib

Benjamin Bengfort & Jenny Kim

Data Analytics

with Hadoop

AN INTRODUCTION FOR DATA SCIENTISTS

Beijing · Boston · Farnham · Sebastopol · Tokyo

978-1-491-91370-3

[LSI]

Data Analytics with Hadoop

by Benjamin Bengfort and Jenny Kim

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Nicole Tache

Production Editor: Melanie Yarbrough

Copyeditor: Colleen Toporek

Proofreader: Jasmine Kwityn

Indexer: WordCo Indexing Services

Interior Designer: David Futato

Cover Designer: Randy Comer

Illustrator: Rebecca Demarest

June 2016: First Edition

Revision History for the First Edition

2016-05-25: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491913703 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Data Analytics with Hadoop, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

Table of Contents

Preface

Part I. Introduction to Distributed Computing

The Age of the Data Product

What Is a Data Product?

Building Data Products at Scale with Hadoop

Leveraging Large Datasets

Hadoop for Data Products

The Data Science Pipeline and the Hadoop Ecosystem

Big Data Workflows

Conclusion

An Operating System for Big Data

Basic Concepts

Hadoop Architecture