Co-occurrence Semantic Networks in Python
Date: 2024-01-07 09:23:00
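A co-occurrence semantic network is a method for analyzing the relationships among keywords in text data. It builds a network by counting how often keywords co-occur: the more often two keywords appear together, the more closely they are related. Python offers several libraries and tools for building and visualizing such networks.
The following example builds a co-occurrence semantic network in Python: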
```python
import matplotlib.pyplot as plt
import networkx as nx
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Fetch the tokenizer and stop-word data on first run (skipped if already present).
# Newer NLTK releases may need 'punkt_tab' instead of 'punkt'.
nltk.download('punkt', quiet=True)
nltk.download('stopwords', quiet=True)

# Text data
text = "Python is a popular programming language. It is used for web development, data analysis, and machine learning."

# Tokenize, then drop stop words and pure-punctuation tokens such as "." and ","
tokens = word_tokenize(text)
stop_words = set(stopwords.words('english'))
filtered_tokens = [
    word for word in tokens
    if word.lower() not in stop_words and any(ch.isalnum() for ch in word)
]

# Build the co-occurrence counts: frequency of each adjacent word pair (bigram)
co_matrix = nltk.FreqDist(nltk.bigrams(filtered_tokens))

# Create a directed graph
graph = nx.DiGraph()

# Add nodes and edges; add_edge creates any missing node automatically
for (first, second), freq in co_matrix.items():
    graph.add_edge(first, second, weight=freq)

# Draw the network
pos = nx.spring_layout(graph)
nx.draw_networkx_nodes(graph, pos, node_size=200, node_color='lightblue')
nx.draw_networkx_edges(graph, pos, width=1, alpha=0.5, edge_color='gray')
nx.draw_networkx_labels(graph, pos, font_size=10, font_color='black')
plt.axis('off')
plt.show()
```
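This code uses NLTK to tokenize the text and remove stop words, then uses FreqDist to count adjacent word pairs (bigrams), which serves here as the co-occurrence matrix. Next, it uses NetworkX to create a directed graph and add the nodes and edges. Finally, it uses Matplotlib to draw the co-occurrence semantic network.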
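Bigram counts only capture words that sit directly next to each other. A common variation is to count every pair of words that falls within a small sliding window. The sketch below is one way to do that with the standard library only. The function name `window_cooccurrence`, the default window of 3, and the sample token list are illustrative assumptions, not part of the original example.

```python
from collections import Counter


def window_cooccurrence(tokens, window=3):
    """Count unordered pairs of distinct tokens that occur within
    `window` consecutive positions of each other (hypothetical helper)."""
    counts = Counter()
    for i, left in enumerate(tokens):
        # Pair the current token with the next (window - 1) tokens.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                # Sort the pair so (a, b) and (b, a) share one key.
                counts[tuple(sorted((left, right)))] += 1
    return counts


# Illustrative, already-filtered tokens (assumed input)
tokens = ["python", "popular", "programming", "language",
          "python", "data", "analysis"]
counts = window_cooccurrence(tokens, window=3)

# Show the most frequent pairs and their weights
for (a, b), weight in counts.most_common(3):
    print(a, b, weight)
```

Because the pairs are unordered, the counts fit an undirected graph better than a directed one. Assuming NetworkX is installed, they could be loaded with `nx.Graph().add_weighted_edges_from((a, b, w) for (a, b), w in counts.items())`.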