How do I import a txt text file (path: C:\Users\Admin\Desktop\jiba.txt) into Python, run Chinese word-frequency statistics on it with jieba, and build a knowledge graph with 20 nodes?
Word-frequency statistics on an article with jieba in Python
You can open the text file with Python's built-in open() function, then use the jieba module to tokenize the Chinese text and count word frequencies. From the most frequent words you can build a knowledge graph with the third-party library networkx, and finally draw it with matplotlib.
Here is sample code:
```python
import jieba
import networkx as nx
import matplotlib.pyplot as plt
# Open the text file and read its contents
with open('C:\\Users\\Admin\\Desktop\\jiba.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Tokenize with jieba
words = jieba.lcut(text)
word_freq = {}

# Count word frequencies, skipping single-character tokens
for word in words:
    if len(word) > 1:
        if word in word_freq:
            word_freq[word] += 1
        else:
            word_freq[word] = 1

# Build the knowledge graph from the 20 most frequent words
graph = nx.Graph()
top_words = sorted(word_freq.items(), key=lambda x: x[1], reverse=True)[:20]
for word, freq in top_words:
    graph.add_node(word, freq=freq)
for i, (word1, freq1) in enumerate(top_words):
    for word2, freq2 in top_words[i+1:]:
        # Link two words that share characters; the edge weight is
        # the number of characters they have in common
        weight = len(set(word1) & set(word2))
        if weight > 0:
            graph.add_edge(word1, word2, weight=weight)

# Draw the graph: node size reflects frequency, edge width reflects weight
pos = nx.spring_layout(graph, k=0.5, iterations=50)
nx.draw(graph, pos,
        node_size=[d['freq'] * 100 for (n, d) in graph.nodes(data=True)],
        with_labels=True, font_size=12, font_family='Microsoft YaHei',
        alpha=0.8,
        width=[d['weight'] * 0.1 for (u, v, d) in graph.edges(data=True)])
plt.axis('off')
plt.show()
```
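As an aside, the hand-rolled counting loop can be written more compactly with collections.Counter from the standard library. A minimal sketch, assuming `text` has already been read as in the example above:

```python
from collections import Counter

import jieba

# Keep only tokens longer than one character and count them
word_freq = Counter(w for w in jieba.lcut(text) if len(w) > 1)
# most_common(20) yields the same (word, count) pairs as the sorted(...) call
top_words = word_freq.most_common(20)
```

Because Counter behaves like a dict, the rest of the graph-building code works unchanged.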
The drawing step uses the spring_layout layout and encodes word frequency as node size and strength of association as edge width; you can adjust these parameters as needed.
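Note that linking two words because they share characters is a rough heuristic; depending on your text, sentence-level co-occurrence may give more meaningful edges. The sketch below is one possible replacement for the shared-character loop; splitting sentences on common Chinese end punctuation with re.split is an assumption about the input:

```python
import re
from itertools import combinations

# Split the text into rough "sentences" on Chinese end punctuation
sentences = re.split(r'[。！？!?\n]+', text)

top_set = {word for word, freq in top_words}
graph = nx.Graph()
for word, freq in top_words:
    graph.add_node(word, freq=freq)

# Add an edge for every pair of top words appearing in the same sentence;
# the edge weight counts how many sentences they co-occur in
for sentence in sentences:
    present = sorted(w for w in top_set if w in sentence)
    for w1, w2 in combinations(present, 2):
        if graph.has_edge(w1, w2):
            graph[w1][w2]['weight'] += 1
        else:
            graph.add_edge(w1, w2, weight=1)
```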