
Crawl a text document, extract keywords, and display how many times each keyword occurs in the text

Time: 2024-05-01 07:16:04

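Implementing this takes the following steps:

1. Read the contents of the text file.
2. Segment the text into a list of words.
3. Remove stop words (such as "的" and "了") to obtain the keyword list.
4. Count how many times each keyword occurs in the list.
5. Display the counts.

Here is a simple Python implementation:

```python
import jieba
from collections import Counter

# Read the file contents
with open('test.txt', 'r', encoding='utf-8') as f:
    content = f.read()

# Segment the text into words
words = list(jieba.cut(content))

# Remove stop words (and whitespace-only tokens such as newlines)
stopwords = {'的', '了', '是', '我', '你', '他', '她'}
keywords = [word for word in words if word.strip() and word not in stopwords]

# Count how often each keyword occurs
counter = Counter(keywords)

# Display the results, most frequent first
for word, count in counter.most_common():
    print(word, count)
```

This code uses the jieba library for Chinese word segmentation and the Counter class from the collections module to count the keywords. jieba returns punctuation marks as tokens too, so they are counted unless they are added to the stop-word list. Adjust the stop-word list and the way the results are displayed as needed.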
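As one possible way to adjust the filtering and the display, the sketch below wraps the filter-and-count steps in a helper that also drops punctuation-only tokens and limits the output to the top N entries. The function name `count_keywords`, its `top_n` parameter, and the sample token list are assumptions made for illustration, not part of the original answer. The token list is hard-coded to stand in for the output of `jieba.cut`, so the sketch runs with the standard library alone.

```python
from collections import Counter


def count_keywords(tokens, stopwords, top_n=None):
    """Count tokens that are not stop words, whitespace, or pure punctuation.

    Hypothetical helper: returns (word, count) pairs, most frequent first.
    """
    stop = set(stopwords)
    keywords = [
        t for t in tokens
        if t.strip()                        # skip whitespace-only tokens
        and t not in stop                   # skip stop words
        and any(ch.isalnum() for ch in t)   # skip punctuation-only tokens
    ]
    return Counter(keywords).most_common(top_n)


# Assumed, pre-tokenized input standing in for list(jieba.cut(content)).
tokens = ["我", "喜欢", "苹果", "，", "你", "也", "喜欢", "苹果", "。", "\n"]
stopwords = ["的", "了", "是", "我", "你", "他", "她", "也"]

top = count_keywords(tokens, stopwords, top_n=2)
for word, count in top:
    print(f"{word}\t{count}")
```

In a real script, `tokens` would come from the segmentation step, and a longer stop-word list could be read from a file with one word per line. Passing `top_n=None` returns every keyword rather than only the most frequent ones.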
