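Word-frequency counting for a text file, word-frequency counting for the English text of Hamlet, and character-appearance counting for 《三国演义》 (Romance of the Three Kingdoms), in Python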
Time: 2023-12-31 17:05:45
Sure, here is an implementation in Python.
### Text word-frequency counting
```python
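import string
from collections import Counter

# Read the text file (UTF-8 is assumed here; adjust to match your file)
with open('text.txt', 'r', encoding='utf-8') as f:
    text = f.read()
# Lowercase the text, replace newlines with spaces, and delete ASCII punctuation
text = text.lower().replace('\n', ' ').translate(str.maketrans('', '', string.punctuation))
# Split on whitespace to get a list of words
words = text.split()
# Count how often each word occurs
word_counts = Counter(words)
# Print the 10 most frequent words with their counts
for word, count in word_counts.most_common(10):
    print(f'{word}: {count}')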
```
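The same steps can be wrapped in a small function that takes a string rather than a file, which makes the pipeline easy to try on a short sample. This is only a minimal sketch: the function name `count_words`, its `top_n` parameter, and the sample sentence are illustrative assumptions, not part of the original answer.

```python
import string
from collections import Counter


def count_words(text, top_n=10):
    """Return the top_n most common words in text as (word, count) pairs."""
    # Same normalization as the script: lowercase, then delete ASCII punctuation
    table = str.maketrans('', '', string.punctuation)
    cleaned = text.lower().translate(table)
    # split() with no arguments splits on any whitespace, including newlines
    return Counter(cleaned.split()).most_common(top_n)


# Made-up sample text, used only to exercise the function
sample = "The cat sat. The cat ran!\nThe dog sat."
for word, count in count_words(sample, top_n=3):
    print(f'{word}: {count}')
```

Because punctuation is deleted rather than replaced with a space, a hyphenated word such as "well-known" becomes "wellknown"; whether that is acceptable depends on the text being analysed.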
### Word-frequency counting for the English text of Hamlet
```python
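import string
from collections import Counter

# Read the Hamlet text file (UTF-8 is assumed here; adjust to match your file)
with open('hamlet.txt', 'r', encoding='utf-8') as f:
    text = f.read()
# Lowercase the text, replace newlines with spaces, and delete ASCII punctuation
text = text.lower().replace('\n', ' ').translate(str.maketrans('', '', string.punctuation))
# Split on whitespace to get a list of words
words = text.split()
# Count how often each word occurs
word_counts = Counter(words)
# Print the 10 most frequent words with their counts
for word, count in word_counts.most_common(10):
    print(f'{word}: {count}')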
```
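For a literary text such as Hamlet, the top of the list is usually dominated by function words like "the" and "and". One common adjustment is to filter out a stop-word list before counting. The sketch below assumes a small hand-picked set `STOP_WORDS` and a helper named `top_content_words`; both are hypothetical, and a realistic stop-word list would be much longer.

```python
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption, not from the original answer)
STOP_WORDS = {'the', 'and', 'to', 'of', 'a', 'i', 'in', 'is', 'that', 'it', 'you', 'my', 'not'}


def top_content_words(text, top_n=10, stop_words=STOP_WORDS):
    """Count words as before, but skip any word found in stop_words."""
    table = str.maketrans('', '', string.punctuation)
    words = text.lower().translate(table).split()
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(top_n)


sample = "To be, or not to be, that is the question."
print(top_content_words(sample, top_n=3))
```

To use it on the real file, pass the string read from `hamlet.txt` in place of `sample`.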
### Character-appearance counting for 《三国演义》 (Romance of the Three Kingdoms)
```python
# Read the text of 《三国演义》 (Romance of the Three Kingdoms)
with open('sanguo.txt', 'r', encoding='utf-8') as f:
    text = f.read()
# Split the text into chapters; this assumes chapters are separated by '##'
chapters = text.split('##')
# Characters to track
characters = ['刘备', '关羽', '张飞', '赵云', '曹操', '孙权', '周瑜']
# Count how often each character's name appears in each chapter
character_counts = {character: [0] * len(chapters) for character in characters}
for i, chapter in enumerate(chapters):
    for character in characters:
        character_counts[character][i] = chapter.count(character)
# Print the per-chapter counts for each character
for character in characters:
    print(f'{character}: {character_counts[character]}')
```
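The per-chapter lists are long to read, and `str.count` only matches the exact name, so a character referred to by a courtesy name (for example 玄德 for 刘备, or 孟德 for 曹操) is not counted under the main name. One possible refinement is to sum the counts of several aliases and rank characters by their totals. The sketch below is an assumption-laden illustration: the `ALIASES` table and `total_appearances` helper are hypothetical, and a short made-up string stands in for `sanguo.txt`.

```python
# Hypothetical alias table: main name -> every name that refers to the same person
ALIASES = {
    '刘备': ['刘备', '玄德'],
    '曹操': ['曹操', '孟德'],
    '关羽': ['关羽', '云长'],
}


def total_appearances(text, aliases):
    """Sum the occurrences of every alias under the character's main name."""
    totals = {name: sum(text.count(alias) for alias in names)
              for name, names in aliases.items()}
    # Sort by total count, highest first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample sentence, used only to exercise the function
sample = '玄德与云长同行。曹操引兵至,刘备迎之。玄德大喜。'
for name, total in total_appearances(sample, ALIASES):
    print(f'{name}: {total}')
```

To apply this to the novel, pass the `text` read from `sanguo.txt` (or a single element of `chapters`) instead of `sample`, and extend `ALIASES` as needed.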
Note: the code above is only an example; adapt it to your actual files and requirements.