Spring for Apache Hadoop Reference Manual
1.0.0.RELEASE
Costin Leau SpringSource, a division of VMware
Copyright ©
Copies of this document may be made for your own use and for distribution to others, provided that you do not charge any fee
for such copies and further provided that each copy contains this Copyright Notice, whether distributed in print or electronically.
Spring Hadoop
1.0.0.RELEASE
Spring for Apache Hadoop
Reference Manual ii
Table of Contents
Preface
I. Introduction
  1. Requirements
  2. Additional Resources
II. Spring and Hadoop
  3. Hadoop Configuration, MapReduce, and Distributed Cache
    3.1. Using the Spring for Apache Hadoop Namespace
    3.2. Configuring Hadoop
    3.3. Creating a Hadoop Job
      Creating a Hadoop Streaming Job
    3.4. Running a Hadoop Job
      Using the Hadoop Job tasklet
    3.5. Running a Hadoop Tool
      Replacing Hadoop shell invocations with tool-runner
      Using the Hadoop Tool tasklet
    3.6. Running a Hadoop Jar
      Using the Hadoop Jar tasklet
    3.7. Configuring the Hadoop DistributedCache
    3.8. Map Reduce Generic Options
  4. Working with the Hadoop File System
    4.1. Configuring the file-system
    4.2. Scripting the Hadoop API
      Using scripts
    4.3. Scripting implicit variables
      Running scripts
      Using the Scripting tasklet
    4.4. File System Shell (FsShell)
      DistCp API
  5. Working with HBase
    5.1. Data Access Object (DAO) Support
  6. Hive integration
    6.1. Starting a Hive Server
    6.2. Using the Hive Thrift Client
    6.3. Using the Hive JDBC Client
    6.4. Running a Hive script or query
      Using the Hive tasklet
    6.5. Interacting with the Hive API
  7. Pig support
    7.1. Running a Pig script
      Using the Pig tasklet
    7.2. Interacting with the Pig API
  8. Cascading integration
    8.1. Using the Cascading tasklet
    8.2. Using Scalding
    8.3. Spring-specific local Taps
  9. Using the runner classes
  10. Security Support
    10.1. HDFS permissions
    10.2. User impersonation (Kerberos)
III. Developing Spring for Apache Hadoop Applications
  11. Guidance and Examples
    11.1. Scheduling
    11.2. Batch Job Listeners
IV. Spring for Apache Hadoop sample applications
  12. Sample prerequisites
  13. Wordcount sample using the Spring Framework
    13.1. Introduction
  14. Wordcount sample using Spring Batch
    14.1. Introduction
    14.2. Basic Spring for Apache Hadoop configuration
    14.3. Build and run the sample application
    14.4. Run the sample application as a standalone Java application
V. Other Resources
  15. Useful Links
VI. Appendices
  A. Using Spring for Apache Hadoop with Amazon EMR
    A.1. Start up the cluster
    A.2. Open an SSH Tunnel as a SOCKS proxy
    A.3. Configuring Hadoop to use a SOCKS proxy
    A.4. Accessing the file-system
    A.5. Shutting down the cluster
    A.6. Example configuration
  B. Using Spring for Apache Hadoop with EC2/Apache Whirr
    B.1. Setting up the Hadoop cluster on EC2 with Apache Whirr
  C. Spring for Apache Hadoop Schema
Preface
Spring for Apache Hadoop provides extensions to Spring, Spring Batch, and Spring Integration to build
manageable and robust pipeline solutions around Hadoop.
Spring for Apache Hadoop supports reading from and writing to HDFS, running various types of Hadoop
jobs (Java MapReduce, Streaming), scripting, and interactions with HBase, Hive, and Pig. An important goal
is to let developers who do not work in Java be productive with Spring for Apache Hadoop without
writing any Java code to use the core feature set.
Spring for Apache Hadoop also applies the familiar Spring programming model to Java MapReduce
jobs by providing support for dependency injection of simple jobs as well as a POJO based MapReduce
programming model that decouples your MapReduce classes from Hadoop specific details such as
base classes and data types.
This document assumes the reader already has a basic familiarity with the Spring Framework and
Hadoop concepts and APIs.
While every effort has been made to ensure that this documentation is comprehensive and there are
no errors, nevertheless some topics might require more explanation and some typos might have crept
in. If you do spot any mistakes or even more serious errors and you can spare a few cycles during
lunch, please do bring the error to the attention of the Spring for Apache Hadoop team by raising an
issue. Thank you.
Part I. Introduction
Spring for Apache Hadoop provides integration with the Spring Framework to create and run Hadoop
MapReduce, Hive, and Pig jobs as well as work with HDFS and HBase. If your Hadoop needs are simple,
including basic scheduling, you can add the Spring for Apache Hadoop namespace
to your Spring based project and get going quickly using Hadoop. As the complexity of your Hadoop
application increases, you may want to use Spring Batch and Spring Integration to rein in the complexity
of developing a large Hadoop application.
This document is the reference guide for the Spring for Apache Hadoop project (SHDP). It explains the
relationship between the Spring framework and Hadoop as well as related projects such as Spring Batch
and Spring Integration. The first part describes the integration with the Spring framework to define the
base concepts and semantics of the integration and how they can be used effectively. The second part
describes how you can build upon these base concepts and create workflow based solutions provided
by the integration with Spring Batch.