Write code to compute n-grams, p_n, and BLEU
Time: 2024-01-25 08:04:14
Below is example code for computing n-grams, p_n, and BLEU:
```python
import re
import collections
import math


def ngram(text, n):
    """
    Extract n-grams.
    :param text: input text
    :param n: order of the n-grams
    :return: list of n-grams, each joined into a space-separated string
    """
    text = re.sub(r'[^\w\s]', '', text)  # strip punctuation
    tokens = text.split()
    ngrams = zip(*[tokens[i:] for i in range(n)])
    return [' '.join(gram) for gram in ngrams]


def pn(candidate_ngrams, reference_ngrams):
    """
    Compute the modified n-gram precision p_n.
    :param candidate_ngrams: n-grams of the candidate text
    :param reference_ngrams: n-grams of the reference text (same order n)
    :return: p_n, the clipped match count divided by the candidate n-gram count
    """
    if not candidate_ngrams:
        return 0.0
    candidate_counts = collections.Counter(candidate_ngrams)
    reference_counts = collections.Counter(reference_ngrams)
    # Clip each candidate count by how often the n-gram occurs in the reference
    clipped = sum(min(count, reference_counts[gram])
                  for gram, count in candidate_counts.items())
    return clipped / len(candidate_ngrams)


def bleu(candidate, reference, n):
    """
    Compute BLEU against a single reference with uniform weights.
    :param candidate: candidate text
    :param reference: reference text
    :param n: maximum n-gram order
    :return: BLEU score
    """
    # p_i for every order i = 1..n, comparing n-grams of the same order
    pn_values = [pn(ngram(candidate, i), ngram(reference, i))
                 for i in range(1, n + 1)]
    if min(pn_values) == 0:
        return 0.0  # log(0) is undefined; unsmoothed BLEU is 0 here
    # Brevity penalty, using the same tokenization as ngram()
    c = len(ngram(candidate, 1))
    r = len(ngram(reference, 1))
    bp = 1.0 if c >= r else math.exp(1 - r / c)
    return bp * math.exp(sum(math.log(p) for p in pn_values) / n)
```
These functions can be used on their own or combined to compute BLEU. For example:
```python
candidate = 'the cat is on the mat'
reference = 'there is a cat on the mat'
print("BLEU-1:", round(bleu(candidate, reference, 1), 4))
print("BLEU-2:", round(bleu(candidate, reference, 2), 4))
print("BLEU-3:", round(bleu(candidate, reference, 3), 4))
```
Output:
```
BLEU-1: 0.7054
BLEU-2: 0.4887
BLEU-3: 0.3697
```
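Here BLEU-n is the brevity penalty multiplied by the geometric mean of the n-gram precisions p_1 through p_n. BLEU-1 therefore uses only 1-grams, BLEU-2 uses 1-grams and 2-grams, and BLEU-3 uses 1-grams through 3-grams.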
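To see where the individual precisions come from, the following minimal sketch recomputes the clipped (modified) n-gram precision for the same sentence pair. The helper name `clipped_precision`, the plain whitespace tokenization, and the use of `Fraction` for exact values are assumptions made for illustration and are not part of the original answer.

```python
from collections import Counter
from fractions import Fraction


def clipped_precision(candidate_tokens, reference_tokens, n):
    """Clipped n-gram matches divided by the number of candidate n-grams."""
    cand = Counter(zip(*[candidate_tokens[i:] for i in range(n)]))
    ref = Counter(zip(*[reference_tokens[i:] for i in range(n)]))
    total = sum(cand.values())
    if total == 0:
        return Fraction(0)
    # A candidate n-gram is credited at most as often as it occurs in the reference
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return Fraction(matched, total)


candidate = "the cat is on the mat".split()
reference = "there is a cat on the mat".split()

for n in (1, 2, 3):
    print(f"p_{n} =", clipped_precision(candidate, reference, n))
# p_1 = 5/6
# p_2 = 2/5
# p_3 = 1/4
```

The candidate contains "the" twice but the reference only once, so clipping credits it a single time and p_1 is 5/6 rather than 6/6.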