Hadoop in Practice, Second Edition: Exploring the Latest Technologies and Practical Case Studies

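Hadoop in Practice, Second Edition, published by Manning, is written for readers who want a deep understanding of Hadoop. The second edition keeps the currency and depth of the first while expanding its coverage to include recent commercial distributions such as MapR, along with the many versions and APIs of the evolving Hadoop ecosystem. The book emphasizes thorough coverage of advanced Hadoop usage: it provides high-quality code examples that readers can work through hands-on, and it examines in depth how Hadoop is applied in commercial environments.

More than a technical manual, the book acts as a bridge across the parts of the Hadoop technology stack, tying Hadoop's core components and technologies together so that readers can understand the system as a whole and find the practices best suited to their own projects. Author Alex Holmes takes a practical, wide-ranging view of the variety and usefulness of Hadoop tools. The topics range from introductory material to in-depth discussions aimed at senior engineers.

Technical architect Mark Kemna and Chris Nauroth, a senior software engineer at The Walt Disney Company, both praise the book highly and regard it as a key guide to the future of Hadoop. Its 104 techniques and case studies give readers practical experience they can apply quickly to real projects. Author Philipp K. Janert and big data architect Ayon Sinha also endorse the book as a foundational Hadoop resource, calling it important for both technical progress and career development.

Hadoop in Practice, Second Edition is a practical guide that suits beginners as well as advanced developers seeking a deeper understanding, and it serves as a reference for Hadoop practitioners and for enterprises planning a big data strategy.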

preface

I first encountered Hadoop in the fall of 2008 when I was working on an internet crawl-and-analysis project at Verisign. We were making discoveries similar to those that Doug Cutting and others at Nutch had made several years earlier about how to efficiently store and manage terabytes of crawl-and-analyzed data. At the time, we were getting by with our homegrown distributed system, but the influx of a new data stream and requirements to join that stream with our crawl data couldn’t be supported by our existing system in the required timeline.

After some research, we came across the Hadoop project, which seemed to be a perfect fit for our needs: it supported storing large volumes of data and provided a compute mechanism to combine them. Within a few months, we built and deployed a MapReduce application encompassing a number of MapReduce jobs, woven together with our own MapReduce workflow management system, onto a small cluster of 18 nodes. It was a revelation to observe our MapReduce jobs crunching through our data in minutes. Of course, what we weren’t expecting was the amount of time that we would spend debugging and performance-tuning our MapReduce jobs. Not to mention the new roles we took on as production administrators; the biggest surprise in this role was the number of disk failures we encountered during those first few months supporting production.

As our experience and comfort level with Hadoop grew, we continued to build more of our functionality using Hadoop to help with our scaling challenges. We also started to evangelize the use of Hadoop within our organization and helped kick-start other projects that were also facing big data challenges.

ABOUT THIS BOOK

Roadmap

This book has 10 chapters divided into four parts.

Part 1 contains two chapters that form the introduction to this book. They review Hadoop basics and look at how to get Hadoop up and running on a single host. YARN, which is new in Hadoop version 2, is also examined, and some operational tips are provided for performing basic functions in YARN.

Part 2, “Data logistics,” consists of three chapters that cover the techniques and tools required to deal with data fundamentals, how to work with various data formats, how to organize and optimize your data, and getting data into and out of Hadoop. Picking the right format for your data and determining how to organize data in HDFS are the first items you’ll need to address when working with Hadoop, and they’re covered in chapters 3 and 4 respectively. Getting data into Hadoop is one of the bigger hurdles commonly encountered when working with Hadoop, and chapter 5 is dedicated to looking at a variety of tools that work with common enterprise data sources.

Part 3 is called “Big data patterns,” and it looks at techniques to help you work effectively with large volumes of data. Chapter 6 covers how to represent data such as graphs for use with MapReduce, and it looks at several algorithms that operate on graph data. Chapter 7 looks at more advanced data structures and algorithms such as graph processing and using HyperLogLog for working with large datasets. Chapter 8 looks at how to tune, debug, and test MapReduce performance issues, and it also covers a number of techniques to help make your jobs run faster.

Part 4 is titled “Beyond MapReduce,” and it examines a number of technologies that make it easier to work with Hadoop. Chapter 9 covers the most prevalent and promising SQL technologies for data processing on Hadoop, and Hive, Impala, and Spark SQL are examined. The final chapter looks at how to write your own YARN application, and it provides some insights into some of the more advanced features you can use in your applications.

The appendix covers instructions for the source code that accompanies this book, as well as installation instructions for Hadoop and all the other related technologies covered in the book.

Finally, there are two bonus chapters available from the publisher’s website at www.manning.com/HadoopinPracticeSecondEdition: chapter 11 “Integrating R and Hadoop for statistics and more” and chapter 12 “Predictive analytics with Mahout.”

What’s new in the second edition?

This second edition covers Hadoop 2, which at the time of writing is the current production-ready version of Hadoop. The first edition of the book covered Hadoop 0.22 (Hadoop 1 wasn’t yet out), and Hadoop 2 has turned the world upside-down and opened up the Hadoop platform to processing paradigms beyond MapReduce. YARN, the new scheduler and application manager in Hadoop 2, is complex and new to the community, which prompted me to dedicate a new chapter 2 to covering YARN basics and to discussing how MapReduce now functions as a YARN application.
