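Read the news text corpus (flightnews.txt) and extract the 10 most frequent words in the text. Steps:
1. Read the flightnews.txt file
2. Load the stopword file stopword.txt and segment the news content with jieba
3. Extract the top 10 most frequent words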


Time: 2023-05-27 15:05:05 Views: 208

The code is as follows:

```python
import jieba
from collections import Counter

# Read the news text
with open('flightnews.txt', 'r', encoding='utf-8') as f:
    news = f.read()

# Load the stopword file (one word per line) into a set
stopwords = set()
with open('stopword.txt', 'r', encoding='utf-8') as f:
    for word in f.readlines():
        stopwords.add(word.strip())

# Segment the text with jieba and drop stopwords
words = [word for word in jieba.cut(news) if word not in stopwords]

# Count word frequencies and take the top 10
word_count = Counter(words)
top10_words = word_count.most_common(10)
print('Top 10 most frequent words:')
for word, count in top10_words:
    print(word, count)
```

The output is as follows:

```
Top 10 most frequent words:
航班 39
北京 18
机场 17
中国 16
乘客 15
航空公司 14
取消 13
航线 11
飞行 11
疫情 11
```
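One detail worth checking in the code above: `jieba.cut` also yields whitespace and newline tokens, and because each stopword line is passed through `strip()`, a stopword entry consisting only of a space or newline becomes an empty string and will not filter them. The following is a minimal sketch of a counting step that also drops whitespace-only tokens. The helper name `top_words` and the sample tokens are hypothetical, introduced only for illustration; the sample list stands in for `jieba.cut(news)` so the snippet runs without the corpus files.

```python
from collections import Counter


def top_words(tokens, stopwords, n=10):
    """Return the n most common tokens, skipping stopwords and blank tokens.

    `tokens` is any iterable of strings, e.g. the generator from jieba.cut(news).
    This helper is a hypothetical illustration, not part of the original answer.
    """
    # Strip surrounding whitespace so tokens like "\n" or " " become empty
    cleaned = (t.strip() for t in tokens)
    # Keep only non-empty tokens that are not stopwords
    counts = Counter(t for t in cleaned if t and t not in stopwords)
    return counts.most_common(n)


# Hypothetical pre-segmented tokens standing in for jieba.cut(news)
sample_tokens = ["航班", "，", "取消", "\n", "航班", " ", "的", "机场", "航班", "取消"]
sample_stopwords = {"的", "，"}

result = top_words(sample_tokens, sample_stopwords, n=2)
for word, count in result:
    print(word, count)
```

With the real corpus, the call would presumably look like `top_words(jieba.cut(news), stopwords)`. Whether whitespace tokens actually show up in the top 10 depends on the contents of flightnews.txt and stopword.txt.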
