
Word Segmentation, Stopword Filtering, and Word Frequency Counting on Journey to the West (《西游记》) with Python

Time: 2024-03-16 16:44:37 Views: 332

Implementing Dictionary-Based or Statistical Word Segmentation in Python [100010996]

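1. Implement a dictionary-based segmentation method and a statistical segmentation method; one method from each category is enough. 2. Apply part-of-speech tagging to the segmentation output, or tag while segmenting. 3. Evaluate the segmentation and tagging results on four metrics: precision, recall, F1 score, and efficiency.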
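The requirements above can be illustrated with a minimal sketch. It assumes forward maximum matching as the dictionary-based method (one common choice, not necessarily the one the resource uses) and scores a segmentation by comparing character spans against a gold standard. The names `fmm_segment`, `to_spans`, `evaluate`, and the toy dictionary are hypothetical, introduced only for illustration.

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position, take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def to_spans(tokens):
    """Convert a token list into a set of (start, end) character offsets."""
    spans, start = set(), 0
    for tok in tokens:
        spans.add((start, start + len(tok)))
        start += len(tok)
    return spans


def evaluate(gold_tokens, pred_tokens):
    """Return (precision, recall, F1) over exactly matching word spans."""
    gold, pred = to_spans(gold_tokens), to_spans(pred_tokens)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy dictionary and gold segmentation (assumed data, for illustration only).
toy_dict = {"孙悟空", "悟空", "大闹", "天宫"}
gold = ["孙悟空", "大闹", "天宫"]
pred = fmm_segment("孙悟空大闹天宫", toy_dict)
print(pred, evaluate(gold, pred))
```

Efficiency, the fourth metric, could be estimated by timing `fmm_segment` over a larger text with `time.perf_counter()`.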

Sure, here is the complete code:

```python
from collections import Counter

import jieba

# Read the full text of Journey to the West
with open('journey_to_the_west.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Segment the text into a list of words
words = jieba.lcut(text)

# Load the stopword list, one word per line (a set gives fast lookups)
with open('stopwords.txt', 'r', encoding='utf-8') as f:
    stopwords = {line.strip() for line in f}

# Drop stopwords and whitespace-only tokens such as newlines
filtered_words = [w for w in words if w.strip() and w not in stopwords]

# Count word frequencies
word_count = Counter(filtered_words)

# Print the 20 most frequent words
print(word_count.most_common(20))
```

This code prints the 20 most frequent words together with their counts. Note that you first need to download a stopword list and save it as `stopwords.txt`.
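If the stopword list does not cover punctuation, punctuation marks and single-character tokens may dominate the top of the ranking. The following is a minimal sketch of a stricter filter that works on an already segmented token list, so it runs without jieba or any input files. The helper `count_words` and its `min_len` parameter are assumptions for illustration and are not part of the answer above.

```python
from collections import Counter


def count_words(tokens, stopwords, min_len=2):
    """Count tokens, skipping stopwords, punctuation, and very short tokens."""
    stopwords = set(stopwords)
    counts = Counter()
    for tok in tokens:
        tok = tok.strip()
        # Skip tokens that are too short (this also drops empty strings).
        if len(tok) < min_len:
            continue
        # Skip stopwords and tokens with no letter or digit (pure punctuation).
        if tok in stopwords or not any(ch.isalnum() for ch in tok):
            continue
        counts[tok] += 1
    return counts


# Hand-written sample tokens standing in for the output of jieba.lcut.
sample_tokens = ["行者", "道", "：", "师父", "，", "行者", "来", "了", "\n", "师父", "行者"]
sample_counts = count_words(sample_tokens, stopwords={"了"})
print(sample_counts.most_common(2))
```

Setting `min_len=1` keeps single-character words, which may matter for some analyses of classical Chinese text.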
