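Download the TXT e-book of Dream of the Red Chamber (红楼梦). First segment the text, count how often each character appears, and draw a bar chart. Then build a character relationship table, construct a character relationship graph, and visualize it.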
Posted: 2024-03-18 20:40:11
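Sure, let me answer your question.
For segmentation and character counting we need a few natural language processing tools: the jieba segmentation library and the `Counter` class from Python's collections module. Here is some example code: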
```python
import jieba
import matplotlib.pyplot as plt
from collections import Counter

# Read the text of the novel
with open('HongLouMeng.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Segment the text and count word frequencies
words = jieba.lcut(text)
word_count = Counter(words)

# Count character occurrences.
# Crude heuristic: every two-character token is treated as a candidate name,
# so ordinary two-character words are counted as well.
person_count = {}
for word in word_count:
    if len(word) == 2 and word[0] != '\n':
        person_count[word] = word_count[word]

# Draw the bar chart
# Use a font with Chinese glyphs so the labels render; adjust for your system
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
top_n = 20  # keep the 20 most frequent candidates
top_persons = sorted(person_count.items(), key=lambda x: x[1], reverse=True)[:top_n]
names, counts = zip(*top_persons)
plt.bar(names, counts)
plt.xticks(rotation=90)
plt.show()
```
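The two-character filter above counts every two-character token, not only names, so the chart is likely to be dominated by ordinary words. One possible alternative is to count against an explicit list of names and merge aliases of the same person. The following is a minimal sketch under that assumption; the `ALIASES` table and the `count_characters` function are hypothetical and not part of the original answer, and the table entries are illustrative rather than a complete cast list.

```python
from collections import Counter

# Hypothetical alias table: maps each surface form to a canonical character name.
# The entries are illustrative, not a complete cast list.
ALIASES = {
    "宝玉": "贾宝玉",
    "贾宝玉": "贾宝玉",
    "黛玉": "林黛玉",
    "林黛玉": "林黛玉",
    "宝钗": "薛宝钗",
    "薛宝钗": "薛宝钗",
    "凤姐": "王熙凤",
    "王熙凤": "王熙凤",
}


def count_characters(tokens, aliases=ALIASES):
    """Count tokens found in the alias table, grouped by canonical name."""
    counts = Counter()
    for token in tokens:
        canonical = aliases.get(token)
        if canonical is not None:
            counts[canonical] += 1
    return counts


if __name__ == "__main__":
    # Stand-in for the output of jieba.lcut(text)
    sample_tokens = ["宝玉", "笑", "道", "林黛玉", "听", "了", "黛玉", "凤姐", "宝玉"]
    print(count_characters(sample_tokens).most_common())
```

With the real text, `words` from the code above would be passed in place of `sample_tokens`. Registering each alias with `jieba.add_word` before segmentation may help keep three-character names from being split, though how much it helps depends on the text.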
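Next comes building the character relationship table and constructing the relationship graph. We can use Python's networkx library to build and visualize the graph. Here is some example code: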
```python
import jieba
import matplotlib.pyplot as plt
import networkx as nx

# Read the text of the novel
with open('HongLouMeng.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Segment the text and collect adjacent pairs of candidate names.
# As before, any two-character token is treated as a candidate name.
words = jieba.lcut(text)
persons = []
for i in range(len(words) - 1):
    if len(words[i]) == 2 and len(words[i + 1]) == 2:
        if words[i][0] != '\n' and words[i + 1][0] != '\n':
            persons.append((words[i], words[i + 1]))

# Build the relationship table:
# person_table[a][b] is the number of times b directly follows a
person_table = {}
for first, second in persons:
    if first not in person_table:
        person_table[first] = {}
    if second not in person_table[first]:
        person_table[first][second] = 0
    person_table[first][second] += 1

# Build the relationship graph
G = nx.Graph()
for person in person_table:
    G.add_node(person)
    for connection, count in person_table[person].items():
        if count > 10:  # threshold that filters out infrequent relationships
            if G.has_edge(person, connection):
                # The graph is undirected: add a->b and b->a counts together
                # instead of letting the second overwrite the first
                G[person][connection]['weight'] += count
            else:
                G.add_edge(person, connection, weight=count)

# Visualize the relationship graph
plt.figure(figsize=(16, 16))
pos = nx.spring_layout(G, k=0.2)
nx.draw_networkx_nodes(G, pos, node_size=10)
nx.draw_networkx_edges(G, pos, width=[d['weight'] / 100 for (u, v, d) in G.edges(data=True)])
# 'Microsoft YaHei' ships with Windows; substitute another Chinese font elsewhere
nx.draw_networkx_labels(G, pos, font_size=12, font_family='Microsoft YaHei')
plt.axis('off')
plt.show()
```
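The code above treats two tokens as related only when they are directly adjacent, which rarely happens for names in running prose. A commonly used alternative is to count how often two names appear in the same paragraph. Below is a minimal sketch of that idea; the `paragraph_cooccurrence` function, the name list, and the sample sentences are all hypothetical and made up for illustration, not taken from the original answer or from the novel's text.

```python
from collections import Counter
from itertools import combinations


def paragraph_cooccurrence(paragraphs, names):
    """Count, for each unordered pair of names, the paragraphs mentioning both."""
    pair_counts = Counter()
    for paragraph in paragraphs:
        # Sort so that each pair always has the same key order
        present = sorted(name for name in names if name in paragraph)
        for a, b in combinations(present, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


if __name__ == "__main__":
    # Made-up sample paragraphs; with the real file, text.split('\n') could be used
    sample = [
        "宝玉见了黛玉,便笑了。",
        "黛玉与宝钗说话,宝玉在旁听着。",
        "凤姐独自进来。",
    ]
    names = ["宝玉", "黛玉", "宝钗", "凤姐"]
    for (a, b), n in paragraph_cooccurrence(sample, names).items():
        print(a, b, n)
```

Each resulting pair could then be added to the graph with `G.add_edge(a, b, weight=n)`, applying the same kind of threshold as above. Matching by substring means that a short name such as 宝玉 is also found inside 贾宝玉, which is usually what is wanted here but may overcount for names that are substrings of unrelated words.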
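That covers the whole process for Dream of the Red Chamber: segmenting the text, counting character occurrences, drawing the bar chart, building the character relationship table, and constructing and visualizing the relationship graph.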