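Use Python code with a CRF to build a model for buying train tickets. The training and test sets both use BIO tagging: the first character of the departure location is B-LOC and the last is I-LOC, and the destination is tagged the same way. Please build a model that computes precision, recall, f1-score, and support.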
Time: 2024-03-18 09:38:37
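Sure. Below is an example in Python that trains and tests a CRF model and then computes precision, recall, f1-score, and support. The code uses the python-crfsuite library for the CRF and scikit-learn's `classification_report` for the metrics.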
```
import pycrfsuite

# Training and test data: each sentence is a list of (token, BIO label) pairs
train_data = [
    [('Beijing', 'B-LOC'), ('to', 'O'), ('Shanghai', 'B-LOC')],
    [('I', 'O'), ('want', 'O'), ('to', 'O'), ('buy', 'O'), ('a', 'O'), ('ticket', 'O'),
     ('from', 'O'), ('Shanghai', 'B-LOC'), ('to', 'O'), ('Beijing', 'B-LOC')],
]
test_data = [
    [('I', 'O'), ('want', 'O'), ('to', 'O'), ('go', 'O'), ('from', 'O'),
     ('Beijing', 'B-LOC'), ('to', 'O'), ('Shanghai', 'B-LOC')],
    [('I', 'O'), ('need', 'O'), ('a', 'O'), ('train', 'O'), ('ticket', 'O'),
     ('from', 'O'), ('Shanghai', 'B-LOC'), ('to', 'O'), ('Beijing', 'B-LOC')],
]

# Feature template: features of the current token plus its left and right neighbors
def word2features(sent, i):
    word = sent[i][0]
    features = {
        'bias': 1.0,
        'word.lower()': word.lower(),
        'word[-3:]': word[-3:],
        'word[-2:]': word[-2:],
        'word.isupper()': word.isupper(),
        'word.istitle()': word.istitle(),
        'word.isdigit()': word.isdigit(),
    }
    if i > 0:
        # Features of the previous token
        word1 = sent[i-1][0]
        features.update({
            '-1:word.lower()': word1.lower(),
            '-1:word.istitle()': word1.istitle(),
            '-1:word.isupper()': word1.isupper(),
        })
    else:
        features['BOS'] = True  # beginning of sentence
    if i < len(sent)-1:
        # Features of the next token
        word1 = sent[i+1][0]
        features.update({
            '+1:word.lower()': word1.lower(),
            '+1:word.istitle()': word1.istitle(),
            '+1:word.isupper()': word1.isupper(),
        })
    else:
        features['EOS'] = True  # end of sentence
    return features

# Feature extraction for a whole sentence
def sent2features(sent):
    return [word2features(sent, i) for i in range(len(sent))]

# Label extraction for a whole sentence
def sent2labels(sent):
    return [label for token, label in sent]

# Train the model; each sentence supplies both its features and its labels
trainer = pycrfsuite.Trainer(verbose=False)
for sent in train_data:
    trainer.append(sent2features(sent), sent2labels(sent))
trainer.set_params({
    'c1': 1.0,    # L1 regularization coefficient
    'c2': 1e-3,   # L2 regularization coefficient
    'max_iterations': 50,
    'feature.possible_transitions': True
})
trainer.train('crf.model')
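
# Test the model
tagger = pycrfsuite.Tagger()
tagger.open('crf.model')
y_pred = [tagger.tag(sent2features(seq)) for seq in test_data]
y_true = [sent2labels(seq) for seq in test_data]

# Compute the metrics; classification_report expects flat label lists,
# so the per-sentence lists are flattened into one token-level list first
from itertools import chain
from sklearn.metrics import classification_report
y_true_flat = list(chain.from_iterable(y_true))
y_pred_flat = list(chain.from_iterable(y_pred))
print(classification_report(y_true_flat, y_pred_flat))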
```
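The sample data above is tokenized by English word, so every location is a single token and the I-LOC tag never appears. The question describes character-level tagging, where the first character of a place name is B-LOC and the following characters are I-LOC. Below is a minimal sketch of a helper that could produce data in that form. The function name `char_bio_tags`, the example sentence, and the idea of locating place names by substring search are assumptions made for illustration, not part of the original answer.

```python
def char_bio_tags(sentence, locations):
    """Tag every character of `sentence` (hypothetical helper).

    The first character of each location gets B-LOC, its remaining
    characters get I-LOC, and all other characters get O.
    """
    tags = ['O'] * len(sentence)
    for loc in locations:
        if not loc:
            continue  # skip empty strings to avoid an endless search
        start = sentence.find(loc)
        while start != -1:
            tags[start] = 'B-LOC'
            for k in range(start + 1, start + len(loc)):
                tags[k] = 'I-LOC'
            # Look for another occurrence after the one just tagged
            start = sentence.find(loc, start + len(loc))
    return list(zip(sentence, tags))


# Assumed example sentence: "I want to buy a train ticket from Beijing to Shanghai"
example = char_bio_tags('我要买从北京到上海的火车票', ['北京', '上海'])
for char, tag in example:
    print(char, tag)
```

Sentences built this way have the same `(token, label)` shape as `train_data` and `test_data`, so they can be passed to `sent2features` and `sent2labels` unchanged. Note that case-based features such as `word.istitle()` carry little information for Chinese characters, so the feature template would probably need adjusting.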
Running the CRF script prints a report in which precision, recall, f1-score, and support are listed separately for each label, which makes it easy to analyze and evaluate the model's performance.
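To make the four columns of the report concrete, here is a minimal sketch that computes them by hand for flat, token-level label lists. The function name `per_label_metrics` and the toy labels are assumptions for illustration. The sketch follows the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of the two, support = number of true occurrences) rather than reproducing scikit-learn's implementation.

```python
from collections import Counter


def per_label_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for flat label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for label t
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[t] += 1      # t was the truth but was missed
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        support = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / support if support else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = (precision, recall, f1, support)
    return report


# Hypothetical toy labels, not output from the CRF model above
y_true = ['O', 'B-LOC', 'O', 'B-LOC']
y_pred = ['O', 'B-LOC', 'B-LOC', 'O']
report = per_label_metrics(y_true, y_pred)
for label, (p, r, f1, support) in report.items():
    print(f'{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={support}')
```

These numbers are computed per token. With BIO tags, a multi-character location counts as correct only partially if just some of its characters are tagged correctly, so entity-level scores can differ from the token-level ones.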