1. Hadoop 3.1.4: Brief Introduction, Deployment, and Simple Verification

Updated 2024-01-25. PDF, 1.45 MB.


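Hadoop is an open-source software framework written in Java for developing and running applications that process large-scale data. It allows large data sets to be processed in a distributed fashion across clusters of many computers using a simple programming model. Hadoop is an Apache project, and its core components are HDFS, YARN, and MapReduce.

HDFS is a distributed file system that addresses the storage of massive data. It splits large files into blocks and stores those blocks on multiple machines in the cluster, which provides distributed storage and high reliability. HDFS is designed to deliver high-throughput data access on commodity hardware.

YARN is the framework for job scheduling and cluster resource management. It assigns tasks to the compute resources in the cluster and monitors their execution. YARN supports different computing frameworks, such as MapReduce and Spark, which makes the use of cluster resources more flexible and efficient.

MapReduce is a distributed computing programming framework that addresses computation over massive data. Its core idea is to divide one large job into many small tasks and distribute them to multiple machines in the cluster for parallel processing. Each machine processes the data it receives locally and produces intermediate results, which are finally merged into the end result.

Beyond these core components, the Hadoop ecosystem includes many other projects, such as Hive, HBase, Sqoop, and Pig. These projects complement Hadoop or provide higher-level abstractions that make it easier to work with and analyze data.

Hadoop's history traces back to Google's papers on GFS (Google File System, 2003) and MapReduce (2004), which opened a new era of distributed storage and computing. Doug Cutting and Mike Cafarella then implemented the GFS and MapReduce models as open source, initially within the Nutch project. In 2006 this work was split out into a separate Apache project named Hadoop, which became an Apache top-level project in 2008 and attracted many contributors. As Hadoop matured, more and more organizations and companies adopted it to process big data, and its ecosystem kept growing with new projects and tools. Hadoop is now widely used for large-scale data processing in fields such as the internet industry, finance, and healthcare.

Deploying and verifying Hadoop 3.1.4 requires passwordless (SSH) login to be set up beforehand, an installed JDK, and a running ZooKeeper. For the detailed deployment and verification steps, refer to the related article.

In summary, this article gives a brief overview of Hadoop's history and introduces the features, deployment, and simple verification of Hadoop 3.1.4. As an open-source big data platform, Hadoop combines a distributed file system, a framework for job scheduling and resource management, and a distributed computing programming framework to provide a scalable and reliable solution.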
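The MapReduce idea described above (divide the job, process each part locally, merge the intermediate results) can be illustrated with a minimal single-process sketch in plain Python. It does not use Hadoop's API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only for this illustration, and each input string stands in for one input split.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: merge the intermediate values of each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # Each chunk stands in for one input split that a separate machine
    # would process; here the chunks are simply processed one after another.
    intermediate = []
    for chunk in chunks:
        intermediate.extend(map_phase(chunk))
    return reduce_phase(shuffle(intermediate))


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop processes data"]))
```

On a real cluster the map tasks run in parallel on different machines and the shuffle moves data over the network; the sketch only shows the data flow between the three phases.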


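After extraction, the site configuration files contain no properties, as in the example further below. For anything that is not configured, Hadoop uses its built-in default configuration files; for example, core-site.xml falls back to core-default.xml.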
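The fallback from a site file to its default file can be sketched as a simple merge in which site values win. This is only an illustration, not Hadoop's own loader (which is written in Java and also handles features such as final properties and variable expansion). The XML strings are heavily trimmed, hypothetical stand-ins for the real files, and the helper names `read_properties` and `effective_config` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for core-default.xml (the real file has many more entries).
CORE_DEFAULT_XML = """<configuration>
  <property><name>fs.defaultFS</name><value>file:///</value></property>
  <property><name>io.file.buffer.size</name><value>4096</value></property>
</configuration>"""

# A freshly extracted core-site.xml defines no properties.
EMPTY_SITE_XML = "<configuration>\n</configuration>"

# A core-site.xml after one property has been added.
EDITED_SITE_XML = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://server1:8020</value></property>
</configuration>"""


def read_properties(xml_text):
    """Return the <property> entries of a configuration document as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def effective_config(default_xml, site_xml):
    """Start from the defaults, then let site-specific values override them."""
    merged = read_properties(default_xml)
    merged.update(read_properties(site_xml))
    return merged


if __name__ == "__main__":
    print(effective_config(CORE_DEFAULT_XML, EMPTY_SITE_XML))
    print(effective_config(CORE_DEFAULT_XML, EDITED_SITE_XML))
```

With the empty site file every value comes from the defaults; once the site file sets fs.defaultFS, only that property changes and the others keep their default values.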

This step is performed as the alanchan user.

Apache Hadoop 3.3.4 – Hadoop Cluster Setup

Configuring Hadoop in Non-Secure Mode

Hadoop’s Java configuration is driven by two types of important configuration files:

Read-only default configuration - core-default.xml, hdfs-default.xml, yarn-default.xml and mapred-default.xml.

Site-specific configuration - etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/yarn-site.xml and etc/hadoop/mapred-site.xml.

Additionally, you can control the Hadoop scripts found in the bin/ directory of the distribution, by setting site-specific values via the etc/hadoop/hadoop-env.sh and etc/hadoop/yarn-env.sh.

To configure the Hadoop cluster you will need to configure the environment in which the Hadoop daemons execute as well as the configuration parameters for the Hadoop daemons.

HDFS daemons are NameNode, SecondaryNameNode, and DataNode. YARN daemons are ResourceManager, NodeManager, and WebAppProxy. If MapReduce is to be used, then the MapReduce Job History Server will also be running. For large installations, these are generally running on separate hosts.

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

 Licensed to the Apache Software Foundation (ASF) under one or more

 contributor license agreements. See the NOTICE file distributed with

 this work for additional information regarding copyright ownership.

 The ASF licenses this file to You under the Apache License, Version 2.0

 (the "License"); you may not use this file except in compliance with

 the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software

 distributed under the License is distributed on an "AS IS" BASIS,

 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

 See the License for the specific language governing permissions and

 limitations under the License.

-->

<configuration>

</configuration>

1) Configure the NameNode (core-site.xml)

Source location: hadoop-3.1.4-src\hadoop-common-project\hadoop-common\src\main\resources\core-default.xml

[alanchan@server bigdata]$ cd /usr/local/bigdata/hadoop-3.1.4/etc/hadoop

[alanchan@server hadoop]$ ll

-rw-r--r-- 1 alanchan root 9213 11月 4 2020 capacity-scheduler.xml

-rw-r--r-- 1 alanchan root 1335 11月 4 2020 configuration.xsl

-rw-r--r-- 1 alanchan root 1940 11月 4 2020 container-executor.cfg

-rw-r--r-- 1 alanchan root  774 11月 4 2020 core-site.xml

-rw-r--r-- 1 alanchan root 3999 11月 4 2020 hadoop-env.cmd

-rw-r--r-- 1 alanchan root 15903 11月 4 2020 hadoop-env.sh

-rw-r--r-- 1 alanchan root 3323 11月 4 2020 hadoop-metrics2.properties

-rw-r--r-- 1 alanchan root 11392 11月 4 2020 hadoop-policy.xml

-rw-r--r-- 1 alanchan root 3414 11月 4 2020 hadoop-user-functions.sh.example

-rw-r--r-- 1 alanchan root  775 11月 4 2020 hdfs-site.xml

-rw-r--r-- 1 alanchan root 1484 11月 4 2020 httpfs-env.sh

-rw-r--r-- 1 alanchan root 1657 11月 4 2020 httpfs-log4j.properties

-rw-r--r-- 1 alanchan root  21 11月 4 2020 httpfs-signature.secret

-rw-r--r-- 1 alanchan root  620 11月 4 2020 httpfs-site.xml

-rw-r--r-- 1 alanchan root 3518 11月 4 2020 kms-acls.xml

-rw-r--r-- 1 alanchan root 1351 11月 4 2020 kms-env.sh

-rw-r--r-- 1 alanchan root 1747 11月 4 2020 kms-log4j.properties

-rw-r--r-- 1 alanchan root  682 11月 4 2020 kms-site.xml

-rw-r--r-- 1 alanchan root 14713 11月 4 2020 log4j.properties

-rw-r--r-- 1 alanchan root  951 11月 4 2020 mapred-env.cmd

-rw-r--r-- 1 alanchan root 1764 11月 4 2020 mapred-env.sh

-rw-r--r-- 1 alanchan root 4113 11月 4 2020 mapred-queues.xml.template

-rw-r--r-- 1 alanchan root  758 11月 4 2020 mapred-site.xml

drwxr-xr-x 2 alanchan root 4096 11月 4 2020 shellprofile.d

-rw-r--r-- 1 alanchan root 2316 11月 4 2020 ssl-client.xml.example

-rw-r--r-- 1 alanchan root 2697 11月 4 2020 ssl-server.xml.example

-rw-r--r-- 1 alanchan root 2642 11月 4 2020 user_ec_policies.xml.template

-rw-r--r-- 1 alanchan root  10 11月 4 2020 workers

-rw-r--r-- 1 alanchan root 2250 11月 4 2020 yarn-env.cmd

-rw-r--r-- 1 alanchan root 6272 11月 4 2020 yarn-env.sh

-rw-r--r-- 1 alanchan root 2591 11月 4 2020 yarnservice-log4j.properties

-rw-r--r-- 1 alanchan root  690 11月 4 2020 yarn-site.xml

# Edit core-site.xml: /usr/local/bigdata/hadoop-3.1.4/etc/hadoop/core-site.xml

# Everything below is newly added content

<property>

<name>fs.defaultFS</name>

<value>hdfs://server1:8020</value>

<description>URL of the NameNode</description>

</property>

<property>

<name>fs.default.name</name>

<value>hdfs://server1:8020</value>

<description>Deprecated. Use (fs.defaultFS) property instead</description>

</property>

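<!-- Hadoop local data storage directory, created automatically at format time. It is best to place this directory outside the Hadoop installation directory so that it does not have to be excluded when the cluster is expanded; this example does not place it outside. -->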
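The fs.defaultFS value shown above is an ordinary URI whose host and port identify the NameNode. As a rough sketch, it can be taken apart with Python's standard library, falling back to the deprecated fs.default.name key when the current key is absent. The helper names `parse_default_fs` and `resolve_default_fs` are hypothetical, and the fallback order is an assumption made for illustration rather than a description of Hadoop's Java implementation.

```python
from urllib.parse import urlsplit


def parse_default_fs(value):
    """Split an hdfs:// URI into (host, port); port is None if not given."""
    parts = urlsplit(value)
    if parts.scheme != "hdfs" or parts.hostname is None:
        raise ValueError(f"not an hdfs:// URI: {value!r}")
    return parts.hostname, parts.port


def resolve_default_fs(props):
    """Look up the NameNode address in a dict of configuration properties."""
    # Prefer the current key; fall back to the deprecated one (assumption).
    value = props.get("fs.defaultFS") or props.get("fs.default.name")
    if value is None:
        raise KeyError("neither fs.defaultFS nor fs.default.name is set")
    return parse_default_fs(value)


if __name__ == "__main__":
    props = {"fs.defaultFS": "hdfs://server1:8020"}
    host, port = resolve_default_fs(props)
    print(host, port)
```

A check like this can catch a mistyped scheme or a missing host name before the file is distributed to the other nodes.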


Author: 一瓢一瓢的饮alanchanchn
