Trigram model code example
Date: 2024-04-29 19:23:31
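Below is a simple example that builds trigrams with NLTK and counts how often each one occurs, which is the counting step of a trigram model: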
```
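import nltk
from nltk.tokenize import word_tokenize
from nltk.util import ngrams

# word_tokenize needs NLTK's Punkt tokenizer data; if it is missing, NLTK raises
# a LookupError that names the resource to fetch with nltk.download()

# Training data
train_data = "The quick brown fox jumps over the lazy dog"
# Tokenize (case is preserved, so "The" and "the" are distinct tokens)
tokens = word_tokenize(train_data)
# Build trigrams; for n=3, ngrams pads each side with n-1 = 2 boundary symbols
trigrams = list(ngrams(tokens, 3, pad_left=True, pad_right=True, left_pad_symbol='<s>', right_pad_symbol='</s>'))
# Count trigram frequencies
freq_dist = nltk.FreqDist(trigrams)
# Print the 10 most frequent trigrams
print(freq_dist.most_common(10))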
```
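Output (line-wrapped here for readability). With two padding symbols on each side, the 9 tokens yield 11 trigrams, each occurring once, so `most_common(10)` lists the first ten and the final trigram `('dog', '</s>', '</s>')` is cut off:
```
[(('<s>', '<s>', 'The'), 1),
(('<s>', 'The', 'quick'), 1),
(('The', 'quick', 'brown'), 1),
(('quick', 'brown', 'fox'), 1),
(('brown', 'fox', 'jumps'), 1),
(('fox', 'jumps', 'over'), 1),
(('jumps', 'over', 'the'), 1),
(('over', 'the', 'lazy'), 1),
(('the', 'lazy', 'dog'), 1),
(('lazy', 'dog', '</s>'), 1)]
```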
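Counting trigrams is only the first half of a trigram model. The sketch below shows one way those counts could be turned into conditional probabilities P(w3 | w1, w2) by maximum-likelihood estimation. It is a minimal illustration and not part of the original example: the helper names `pad_trigrams`, `train_trigram_model`, and `trigram_prob` are hypothetical, the second corpus sentence is made up so that one context has two continuations, `str.split()` stands in for `word_tokenize` so the code runs without NLTK data, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def pad_trigrams(tokens, left="<s>", right="</s>"):
    """Return trigrams over tokens padded with two boundary symbols per side."""
    padded = [left, left] + list(tokens) + [right, right]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


def train_trigram_model(sentences):
    """Map each two-word context to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        # Assumption of this sketch: whitespace splitting replaces word_tokenize
        for w1, w2, w3 in pad_trigrams(sentence.split()):
            model[(w1, w2)][w3] += 1
    return model


def trigram_prob(model, w1, w2, w3):
    """MLE estimate: count(w1 w2 w3) / count of all trigrams starting with w1 w2."""
    context = model.get((w1, w2))
    if not context:
        return 0.0  # unseen context; a real model would apply smoothing here
    return context[w3] / sum(context.values())


# Hypothetical two-sentence corpus, lowercased for simplicity
corpus = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown cat sleeps",
]
model = train_trigram_model(corpus)

print(trigram_prob(model, "quick", "brown", "fox"))  # 0.5
print(model[("the", "quick")].most_common(1))        # [('brown', 2)]
```

Here "quick brown" is followed once by "fox" and once by "cat", so each continuation gets probability 0.5. Any trigram that never appears in the training data gets probability 0, which is why practical models add smoothing or backoff on top of these raw counts.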