An Introductory Guide to Hadoop Big Data Technology

Updated 2024-07-17; PDF, 20.7 MB

"Hadoop: The Definitive Guide by Tom White" 本书是《Hadoop权威指南》的英文版，由Tom White撰写，献给Eliane, Emilia, 和 Lottie。书中介绍了Hadoop的基础知识，包括如何安装Hadoop，如何操作HDFS（Hadoop分布式文件系统），以及如何使用YARN（Yet Another Resource Negotiator）资源调度框架，并深入讲解了MapReduce的工作处理机制。这本书特别适合英语水平较高的初学者和对Hadoop感兴趣的读者。 Hadoop起源于Nutch项目，最初是为了构建一个开源搜索引擎。在Google公开了GFS（Google File System）和MapReduce的论文后，Hadoop的发展方向逐渐明确。它解决的是在多台计算机上进行大规模计算时遇到的问题。起初，只有两名开发者半职投入，他们使Nutch能在20台机器上勉强运行。但随着互联网规模的扩大，需要在数千台机器上运行，这个任务超出了两人所能承受的范围。这时，雅虎（Yahoo!）对该项目产生了兴趣并迅速组建了一个团队，作者也加入了其中。他们将Nutch中的分布式计算部分剥离出来，单独命名为Hadoop。在雅虎的支持下，Hadoop迅速发展成为能够真正应对互联网规模的技术。 2006年，Tom White开始为Hadoop贡献代码。在此之前，他已经因为撰写的一篇关于Nutch的优秀文章而为人所知。Tom White的加入无疑为Hadoop的发展注入了新的活力，他的这本书详细阐述了Hadoop的核心概念和技术细节，对于理解Hadoop生态系统和大数据处理具有极高的价值。 HDFS是Hadoop的核心组件之一，它是一个分布式文件系统，设计用于存储大量数据并在集群中进行高效访问。HDFS的特点是高容错性和高可用性，即使在硬件故障情况下也能保证数据的完整性。用户可以通过HDFS API进行文件的创建、读取和删除等操作。 YARN作为Hadoop的资源管理器，负责集群中计算资源的分配和调度，使得不同应用能共享集群资源。它将原本由JobTracker承担的任务管理和资源调度职责分离，形成了Resource Manager和Application Master的概念，提高了系统的可扩展性和效率。 MapReduce是Hadoop处理大数据的关键计算模型，它将大规模数据处理分解为两个主要阶段：Map和Reduce。Map阶段将数据分片并并行处理，Reduce阶段则对结果进行聚合，以得到最终的输出。MapReduce模型非常适合批处理任务，如数据分析、日志处理等。通过学习这本书，读者可以掌握Hadoop的安装配置，理解HDFS的存储机制，熟悉YARN的资源管理，以及运用MapReduce编写分布式应用程序。这些知识对于进入大数据领域，进行数据处理和分析具有基础性的作用。

HowtoContactUs

Pleaseaddresscommentsandquestionsconcerningthisbooktothepublisher:

O’ReillyMedia,Inc.

1005GravensteinHighwayNorth

Sebastopol,CA95472

800-998-9938(intheUnitedStatesorCanada)

707-829-0515(internationalorlocal)

707-829-0104(fax)

Wehaveawebpageforthisbook,wherewelisterrata,examples,andanyadditional

information.Youcanaccessthispageathttp://bit.ly/hadoop_tdg_4e.

Tocommentorasktechnicalquestionsaboutthisbook,sendemailto

bookquestions@oreilly.com.

Formoreinformationaboutourbooks,courses,conferences,andnews,seeourwebsiteat

http://www.oreilly.com.

FindusonFacebook:http://facebook.com/oreilly

FollowusonTwitter:http://twitter.com/oreillymedia

WatchusonYouTube:http://www.youtube.com/oreillymedia

Acknowledgments

Ihavereliedonmanypeople,bothdirectlyandindirectly,inwritingthisbook.Iwould

liketothanktheHadoopcommunity,fromwhomIhavelearned,andcontinuetolearn,a

greatdeal.

Inparticular,IwouldliketothankMichaelStackandJonathanGrayforwritingthe

chapteronHBase.ThanksalsogotoAdrianWoodhead,MarcdePalol,JoydeepSen

Sarma,AshishThusoo,AndrzejBiałecki,StuHood,ChrisK.Wensel,andOwen

O’Malleyforcontributingcasestudies.

Iwouldliketothankthefollowingreviewerswhocontributedmanyhelpfulsuggestions

andimprovementstomydrafts:RaghuAngadi,MattBiddulph,ChristopheBisciglia,

RyanCox,DevarajDas,AlexDorman,ChrisDouglas,AlanGates,LarsGeorge,Patrick

Hunt,AaronKimball,PeterKrey,HairongKuang,SimonMaxen,OlgaNatkovich,

BenjaminReed,KonstantinShvachko,AllenWittenauer,MateiZaharia,andPhilip

Zeyliger.AjayAnandkeptthereviewprocessflowingsmoothly.Philip(“flip”)Kromer

kindlyhelpedmewiththeNCDCweatherdatasetfeaturedintheexamplesinthisbook.

SpecialthankstoOwenO’MalleyandArunC.Murthyforexplainingtheintricaciesofthe

MapReduceshuffletome.Anyerrorsthatremainare,ofcourse,tobelaidatmydoor.

Forthesecondedition,Ioweadebtofgratitudeforthedetailedreviewsandfeedback

fromJeffBean,DougCutting,GlynnDurham,AlanGates,JeffHammerbacher,Alex

Kozlov,KenKrugler,JimmyLin,ToddLipcon,SarahSproehnle,VinithraVaradharajan,

andIanWrigley,aswellasallthereaderswhosubmittederrataforthefirstedition.I

wouldalsoliketothankAaronKimballforcontributingthechapteronSqoop,andPhilip

(“flip”)Kromerforthecasestudyongraphprocessing.

Forthethirdedition,thanksgotoAlejandroAbdelnur,EvaAndreasson,EliCollins,Doug

Cutting,PatrickHunt,AaronKimball,AaronT.Myers,BrockNoland,ArvindPrabhakar,

AhmedRadwan,andTomWheelerfortheirfeedbackandsuggestions.RobWeltman

kindlygaveverydetailedfeedbackforthewholebook,whichgreatlyimprovedthefinal

manuscript.Thanksalsogotoallthereaderswhosubmittederrataforthesecondedition.

Forthefourthedition,IwouldliketothankJodokBatlogg,MeghanBlanchette,Ryan

Blue,JarekJarcecCecho,JulesDamji,DennisDawson,MatthewGast,KarthikKambatla,

JulienLeDem,BrockNoland,SandyRyza,AkshaiSarma,BenSpivey,MichaelStack,

KateTing,JoshWalter,JoshWills,andAdrianWoodheadforalloftheirinvaluable

reviewfeedback.RyanBrush,MicahWhitacre,andMattMassiekindlycontributednew

casestudiesforthisedition.Thanksagaintoallthereaderswhosubmittederrata.

IamparticularlygratefultoDougCuttingforhisencouragement,support,andfriendship,

andforcontributingtheForeword.

ThanksalsogotothemanyotherswithwhomIhavehadconversationsoremail

discussionsoverthecourseofwritingthebook.

Halfwaythroughwritingthefirsteditionofthisbook,IjoinedCloudera,andIwantto

thankmycolleaguesforbeingincrediblysupportiveinallowingmethetimetowriteand

togetitfinishedpromptly.

