Water Margin (《水浒传》) word frequency statistics in Python
Date: 2023-12-21 11:32:08
Below is example code that uses Python to count word frequencies in Water Margin (《水浒传》):
```python
import jieba
from collections import Counter

# Read the text file
with open('input水浒传.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Segment the text into words
words = jieba.cut(text)

# Remove stop words (adjust the list as needed)
stopwords = ['的', '了', '和', '是', '在', '他', '她', '它']
filtered_words = [word for word in words if word not in stopwords]

# Count word frequencies
word_count = Counter(filtered_words)

# Print only the 10 most frequent words
for word, count in word_count.most_common(10):
    print(word, count)
```
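This code first segments the text with the jieba library, then removes stop words, and finally counts word frequencies with the Counter class. It then prints the 10 most frequent words and their counts. Note that jieba also yields punctuation and whitespace as tokens, so these appear in the counts unless they are filtered out or added to the stop word list.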
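The filtering step can be tightened so that punctuation and whitespace tokens do not crowd out real words. The following is a minimal sketch, not part of the original answer: the helper `is_countable` and the hand-tokenized sample list are hypothetical, and the sample stands in for the output of `jieba.cut()` so that the example runs without the text file.

```python
from collections import Counter
import unicodedata


def is_countable(token, stopwords):
    """Return True if the token should be included in the word count."""
    token = token.strip()
    if not token or token in stopwords:
        return False
    # Drop tokens made up entirely of punctuation or symbols
    # (Unicode general categories starting with P or S).
    return not all(unicodedata.category(ch)[0] in "PS" for ch in token)


# Hypothetical, hand-tokenized sample standing in for the output of jieba.cut()
tokens = ["宋江", "道", ":", "“", "兄弟", "的", "\n",
          "宋江", ",", "兄弟", "了", "宋江", "。"]
stopwords = {"的", "了", "和", "是", "在", "他", "她", "它"}

word_count = Counter(t for t in tokens if is_countable(t, stopwords))
top_words = word_count.most_common(2)
print(top_words)
```

With real text, the same predicate could be applied to the tokens produced by jieba before they are passed to `Counter`.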
Related questions
Water Margin (《水浒传》) word frequency statistics, python123
```python
import jieba
from collections import Counter
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Read the text of Water Margin
with open('shuihuzhuan.txt', 'r', encoding='utf-8') as file:
    text = file.read()

# Segment the text with jieba
words = jieba.lcut(text)

# Load the stop word list (one word per line)
with open('stopwords.txt', 'r', encoding='utf-8') as file:
    stopwords = set(file.read().splitlines())

# Count word frequencies, skipping stop words
word_count = Counter([word for word in words if word not in stopwords])

# Generate the word cloud; a font with Chinese glyphs is required
wordcloud = WordCloud(font_path="simhei.ttf").generate_from_frequencies(word_count)

# Display the word cloud
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
```
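Python word frequency statistics for Water Margin (《水浒传》)
Counting word frequencies in Water Margin with Python is a common text analysis exercise. The text is first segmented with the jieba tokenizer, and the number of occurrences of each word is then counted.
In the given code, jieba segments the text of Water Margin, and the dictionary counts records how often each word occurs. During counting, uninformative words such as "两个", "一个", and "只见" are excluded, and different forms that refer to the same person are merged into one entry (for example, "宋江道" is counted as "宋江").
Finally, the words are sorted by frequency and the ten most frequent ones are printed.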
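The merging and exclusion steps described above can also be driven by lookup tables rather than a chain of `elif` branches. The following is a minimal sketch under stated assumptions: the `ALIASES` table, the shortened `EXCLUDES` set, the `count_words` helper, and the hand-tokenized sample list are all hypothetical and stand in for the real jieba output.

```python
from collections import Counter

# Hypothetical alias table: maps variant forms to one canonical name
ALIASES = {"宋江道": "宋江", "宋公明": "宋江", "李逵道": "李逵"}
# Shortened exclusion set for illustration
EXCLUDES = {"两个", "一个", "只见"}


def count_words(tokens, aliases=ALIASES, excludes=EXCLUDES):
    """Count tokens after skipping single characters, merging aliases,
    and dropping excluded words."""
    counts = Counter()
    for token in tokens:
        if len(token) == 1:
            continue
        word = aliases.get(token, token)
        if word in excludes:
            continue
        counts[word] += 1
    return counts


# Hypothetical, hand-tokenized sample standing in for jieba.lcut() output
tokens = ["宋江道", "一个", "李逵", "宋江", "只见", "李逵道", "宋公明", "了"]
counts = count_words(tokens)

# Sort by frequency in descending order and keep the top ten
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:10]
print(ranked)
```

Keeping the aliases in a dictionary means that adding another character only requires a new table entry, not another branch in the loop.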
The complete code for this approach is shown below:
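```python
import jieba

# Words to exclude from the results
excludes = {"两个","一个","只见","如何","那里","哥哥","说道","军马","头领","众人","这里","兄弟","梁山泊","出来","小人","今日","这个","先锋","三个","因此","人马","问道","起来","便是","妇人","好汉","不是","不知","不曾","只是","如此","次日","我们","不得","如今","看时","不敢","来到","且说","一面","只得","山寨","原来","将军","却是"}

# Read the text as GB18030, ignoring bytes that cannot be decoded
with open("水浒传.txt", "r", encoding="gb18030", errors="ignore") as f:
    txt = f.read()

words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:
        # Skip single-character tokens
        continue
    elif word == "宋江道" or word == "宋江":
        # Merge different forms that refer to the same character
        rword = "宋江"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1

# Remove excluded words; pop() avoids a KeyError if a word never occurred
for word in excludes:
    counts.pop(word, None)

# Sort by frequency in descending order and print the top ten words
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for word, count in items[:10]:
    print(word, count)
```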