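Read the news corpus (flightnews.txt) and extract the 10 most frequent words in the text. Steps: 1. Read the flightnews.txt file (15 points) 2. Load the stopword file stopword.txt and segment the news content with jieba (15 points) 3. Extract the 10 most frequent words (15 points)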


Posted: 2023-05-27 15:02:20 Views: 80

```python
import jieba
from collections import Counter

# 1. Read the news file
with open("flightnews.txt", "r", encoding="utf-8") as f:
    text = f.read()

# 2. Load the stopword file, then segment the text
stopwords = set()
with open("stopword.txt", "r", encoding="utf-8") as f:
    for line in f:
        stopwords.add(line.strip())

words = []
for word in jieba.cut(text):
    # Skip stopwords and single-character tokens (punctuation, whitespace)
    if word not in stopwords and len(word) > 1:
        words.append(word)

# 3. Take the 10 most frequent words
word_counter = Counter(words)
top_10 = word_counter.most_common(10)

print("Top 10 most frequent words:")
for word, count in top_10:
    print(word, count)
```
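Step 1 depends on the file's encoding, and Chinese text files saved on Windows are often GBK rather than UTF-8. The following is a minimal sketch of a more tolerant reader. The helper name `read_text` and the candidate encoding list are assumptions for illustration and are not part of the original task.

```python
def read_text(path, encodings=("utf-8", "gbk")):
    """Return the file's text, trying each candidate encoding in order."""
    last_error = None
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.read()
        except UnicodeDecodeError as err:
            last_error = err  # wrong guess; try the next encoding
    if last_error is None:
        raise ValueError("no encodings given")
    raise last_error
```

UTF-8 is tried first on purpose: GBK-encoded Chinese text generally fails to decode as UTF-8, whereas UTF-8 bytes can sometimes decode as GBK into garbage, so the stricter guess should come first.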

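Steps for reading the news corpus (flightnews.txt) and extracting the 10 most frequent words:

1. Read the flightnews.txt file.
2. Load the stopword file stopword.txt and segment the news content with jieba.
3. Extract the 10 most frequent words.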

The code is as follows:

```python
import jieba
from collections import Counter

# Read the news text
with open('flightnews.txt', 'r', encoding='utf-8') as f:
    news = f.read()

# Load the stopword file, one stopword per line
stopwords = set()
with open('stopword.txt', 'r', encoding='utf-8') as f:
    for line in f:
        stopwords.add(line.strip())

# Segment with jieba; drop stopwords and whitespace-only tokens
words = [word for word in jieba.cut(news)
         if word.strip() and word not in stopwords]

# Count word frequencies and take the top 10
word_count = Counter(words)
top10_words = word_count.most_common(10)

print('Top 10 most frequent words:')
for word, count in top10_words:
    print(word, count)
```

The output is as follows:

```
Top 10 most frequent words:
航班 39
北京 18
机场 17
中国 16
乘客 15
航空公司 14
取消 13
航线 11
飞行 11
疫情 11
```
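The filtering and counting steps can also be packaged as one reusable function. The sketch below is one possible arrangement, not the original answer's code. `top_words` is a hypothetical helper that accepts any iterable of tokens, so the counting logic can be tried without jieba installed. In the real task the tokens would come from `jieba.cut(news)`.

```python
from collections import Counter


def top_words(tokens, stopwords=frozenset(), n=10, min_len=1):
    """Return the n most frequent tokens as (word, count) pairs.

    Whitespace-only tokens, stopwords, and tokens shorter than
    min_len characters are ignored.
    """
    kept = (
        tok for tok in tokens
        if tok.strip() and tok not in stopwords and len(tok) >= min_len
    )
    return Counter(kept).most_common(n)


# Pre-tokenized sample standing in for jieba.cut(...) output (made-up data).
sample = ["航班", "的", "航班", "取消", "，", "机场", "航班", "取消", "\n"]
print(top_words(sample, stopwords={"的", "，"}, n=2))
# → [('航班', 3), ('取消', 2)]
```

With jieba available, the call would look like `top_words(jieba.cut(news), stopwords, n=10, min_len=2)`, where `min_len=2` drops single-character tokens.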
