Hadoop word count
Date: 2023-12-14 10:33:36 Views: 104
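Hadoop word count means using the Hadoop framework to count how often each word occurs in a large body of text. The steps are as follows:
1. Prepare the data: store the text to be counted in HDFS.
2. Write the Mapper: the Mapper splits each input line into words and emits every word as the key with a count of 1 as the value.
3. Write the Reducer: the Reducer merges the key-value pairs emitted by the Mapper and sums the values for each key, giving the total number of times each word occurs in the text.
4. Configure the Job and submit it: package the Mapper and Reducer into a jar (or, for script-based programs such as the Python ones below, pass the scripts to Hadoop Streaming) and use the Hadoop command line to submit the job to the YARN cluster.
5. Check the result: once the job has finished, use the Hadoop command line to read the output from HDFS.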
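To make the data flow in steps 2 and 3 concrete, here is a minimal in-memory sketch of the map, shuffle/sort, and reduce phases. It does not use Hadoop at all; the function names (`map_words`, `shuffle_sort`, `reduce_counts`, `word_count`) and the sample lines are hypothetical and only illustrate what the framework does between the Mapper and the Reducer.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def shuffle_sort(pairs):
    """Shuffle/sort step: group the values by key, with keys in sorted order."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return sorted(grouped.items())


def reduce_counts(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three steps in memory on an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return [reduce_counts(word, counts) for word, counts in shuffle_sort(pairs)]


if __name__ == "__main__":
    sample = ["hello hadoop", "hello world"]  # hypothetical input lines
    for word, total in word_count(sample):
        print("%s\t%d" % (word, total))
```

On a real cluster the shuffle/sort step is performed by Hadoop itself, so only the map and reduce logic has to be written by hand.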
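Below is a simple Mapper for Hadoop word count, written in Python:
```python
#!/usr/bin/env python
import sys

# Read the input line by line
for line in sys.stdin:
    # Strip leading and trailing whitespace
    line = line.strip()
    # Split the line into words
    words = line.split()
    # Emit each word with a count of 1
    for word in words:
        print('%s\t%s' % (word, 1))
```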
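Below is a simple Reducer for Hadoop word count, written in Python:
```python
#!/usr/bin/env python
import sys

# Initialize the running state
current_word = None
current_count = 0

# Process the input, which arrives sorted by word
for line in sys.stdin:
    # Strip leading and trailing whitespace
    line = line.strip()
    # Skip blank lines, which cannot be parsed as "word<TAB>count"
    if not line:
        continue
    # Parse the "word<TAB>count" pair
    word, count = line.split('\t', 1)
    count = int(count)
    # If the word differs from the previous one, emit the previous word's total
    if current_word and current_word != word:
        print('%s\t%s' % (current_word, current_count))
        current_count = 0
    # Update the running total for the current word
    current_word = word
    current_count += count

# Emit the total for the last word
if current_word:
    print('%s\t%s' % (current_word, current_count))
```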
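The Reducer only works because all lines for the same word reach it consecutively, which Hadoop Streaming ensures by sorting the Mapper output by key. As a hedged alternative sketch, the same summing logic can be written with `itertools.groupby`; the helper names `parse` and `reduce_sorted` and the sample data are hypothetical, and sorting whole lines with `sorted` here merely stands in for the framework's sort.

```python
from itertools import groupby


def parse(line):
    """Split one 'word<TAB>count' line into a (word, int) pair."""
    word, count = line.rstrip("\n").split("\t", 1)
    return word, int(count)


def reduce_sorted(lines):
    """Sum counts per word; assumes the lines are already sorted by word."""
    pairs = (parse(line) for line in lines if line.strip())
    # groupby only merges adjacent equal keys, hence the sorting requirement
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    # Hypothetical, unsorted Mapper output
    mapper_output = ["hello\t1", "hadoop\t1", "hello\t1", "world\t1"]
    for word, total in reduce_sorted(sorted(mapper_output)):
        print("%s\t%d" % (word, total))
```

If the input is not sorted, the same word is reported more than once with partial totals, and the hand-written loop above has the same limitation.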