Window-based co-occurrence matrix code
Date: 2023-12-12 18:03:43
The following example builds a window-based co-occurrence matrix:
```python
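import nltk
from nltk.tokenize import word_tokenize
from collections import defaultdict

# word_tokenize relies on NLTK's Punkt tokenizer data. If it is missing,
# run nltk.download('punkt') once (newer NLTK releases may ask for 'punkt_tab').

# Read the text and tokenize it
text = "This is a sample text to demonstrate the window-based co-occurrence matrix"
tokens = word_tokenize(text)

# Window size: how many following tokens count as context
window_size = 2

# Co-occurrence counts, keyed by (word, context_word)
co_occurrence_matrix = defaultdict(int)

# Iterate over every token in the text
for i in range(len(tokens)):
    # The current word
    current_word = tokens[i]
    # Iterate over the words that follow it inside the window
    for j in range(i + 1, min(i + window_size + 1, len(tokens))):
        # A word that follows the current word
        next_word = tokens[j]
        # Update both orderings so the counts are symmetric
        co_occurrence_matrix[(current_word, next_word)] += 1
        co_occurrence_matrix[(next_word, current_word)] += 1

# Print the co-occurrence counts
for pair, count in co_occurrence_matrix.items():
    print(pair, count)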
```
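Output (with `window_size = 2`, every pair of words at most two tokens apart is counted, in both orders):
```
('This', 'is') 1
('is', 'This') 1
('This', 'a') 1
('a', 'This') 1
('is', 'a') 1
('a', 'is') 1
('is', 'sample') 1
('sample', 'is') 1
('a', 'sample') 1
('sample', 'a') 1
('a', 'text') 1
('text', 'a') 1
('sample', 'text') 1
('text', 'sample') 1
('sample', 'to') 1
('to', 'sample') 1
('text', 'to') 1
('to', 'text') 1
('text', 'demonstrate') 1
('demonstrate', 'text') 1
('to', 'demonstrate') 1
('demonstrate', 'to') 1
('to', 'the') 1
('the', 'to') 1
('demonstrate', 'the') 1
('the', 'demonstrate') 1
('demonstrate', 'window-based') 1
('window-based', 'demonstrate') 1
('the', 'window-based') 1
('window-based', 'the') 1
('the', 'co-occurrence') 1
('co-occurrence', 'the') 1
('window-based', 'co-occurrence') 1
('co-occurrence', 'window-based') 1
('window-based', 'matrix') 1
('matrix', 'window-based') 1
('co-occurrence', 'matrix') 1
('matrix', 'co-occurrence') 1
```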
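The code above stores counts in a dictionary keyed by word pairs rather than in an actual two-dimensional matrix. As a minimal sketch of one way to lay those counts out as rows and columns, the function below applies the same counting loop and then fills a nested list. The function name `build_cooccurrence_matrix`, the toy sentence, and the use of `str.split()` in place of `word_tokenize` are assumptions made for illustration, not part of the original example.

```python
from collections import defaultdict


def build_cooccurrence_matrix(tokens, window_size=2):
    """Return (vocab, matrix), where matrix[r][c] counts how often
    vocab[r] and vocab[c] appear within window_size tokens of each other."""
    # Count symmetric pairs with the same forward-looking window as above
    pair_counts = defaultdict(int)
    for i, current_word in enumerate(tokens):
        for next_word in tokens[i + 1 : i + 1 + window_size]:
            pair_counts[(current_word, next_word)] += 1
            pair_counts[(next_word, current_word)] += 1

    # Give each distinct word a row/column index, in order of first appearance
    vocab = list(dict.fromkeys(tokens))
    index = {word: k for k, word in enumerate(vocab)}

    # Fill a len(vocab) x len(vocab) grid from the pair counts
    matrix = [[0] * len(vocab) for _ in vocab]
    for (w1, w2), count in pair_counts.items():
        matrix[index[w1]][index[w2]] = count
    return vocab, matrix


# Hypothetical toy input; str.split() stands in for word_tokenize here
tokens = "the cat sat on the mat".split()
vocab, matrix = build_cooccurrence_matrix(tokens, window_size=2)

print(vocab)
for word, row in zip(vocab, matrix):
    print(f"{word:>4}", row)
```

Because both orderings of each pair are incremented together, the resulting matrix is symmetric. For a large vocabulary a nested list becomes wasteful, since most cells stay at zero, and a sparse representation would likely be a better fit.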