```python
train_iter, src_vocab, tgt_vocab = load_data_nmt(batch_size=2, num_steps=8)
for X, X_valid_len, Y, Y_valid_len in train_iter:
    print('X:', X.type(torch.int32))
```
```
Cell In[62], line 5, in load_data_nmt(batch_size, num_steps, num_examples)
      3 """返回翻译数据集的迭代器和词表"""
      4 text = preprocess_nmt(read_data_nmt())
----> 5 source, target = tokenize_nmt(text, num_examples)
      6 src_vocab = d2l.Vocab(source, min_freq=2,
      7                       reserved_tokens=['<pad>', '<bos>', '<eos>'])
      8 tgt_vocab = d2l.Vocab(target, min_freq=2,
      9                       reserved_tokens=['<pad>', '<bos>', '<eos>'])

Cell In[56], line 11, in tokenize_nmt(lines, token)
      9     return [list(line) for line in lines]
     10 else:
---> 11     print('错误:未知词元类型:' + token)

TypeError: can only concatenate str (not "int") to str
```
Posted: 2024-03-01 17:53:41
This error occurs because the `else` branch of `tokenize_nmt` tries to concatenate a string and an integer in the `print` statement, which raises a `TypeError`. Combine them with string formatting instead, for example using the `%` operator:
```python
print('错误:未知词元类型:%s' % token)
```
Or use the `format` method:
```python
print('错误:未知词元类型:{}'.format(token))
```
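As a quick standalone illustration (independent of the d2l code above), concatenating a `str` with an `int` raises exactly this `TypeError`, while any of the formatting approaches produces the intended message:

```python
token = 2  # an int, e.g. a token-type argument passed by mistake

# Direct concatenation fails: str + int is not defined in Python
try:
    msg = '错误:未知词元类型:' + token
except TypeError as e:
    print('TypeError:', e)

# Any of these build the intended string instead
print('错误:未知词元类型:%s' % token)
print('错误:未知词元类型:{}'.format(token))
print(f'错误:未知词元类型:{token}')  # f-string, Python 3.6+
```

All three formatting forms call `str()` on the value for you, so they accept ints, floats, or any other object.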
Related questions
What does `train_iter, src_vocab, tgt_vocab = d2l.load_data_nmt(batch_size, num_steps)` mean?
This line calls the `load_data_nmt` function from the d2l package to load the translation training set and split it into minibatches of size `batch_size`, with each sequence truncated or padded to `num_steps` tokens. The function also returns the source-language and target-language vocabularies, which are used later for model training and inference.
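To make the truncate/pad behavior concrete, here is a minimal dependency-free sketch of what such an iterator yields per batch: padded token-ID sequences `X`/`Y` plus their valid (unpadded) lengths. The helper names are hypothetical; the real d2l version returns PyTorch tensors rather than lists.

```python
def pad_or_truncate(ids, num_steps, pad_id=0):
    """Clip a token-ID list to num_steps, or pad it with pad_id."""
    return ids[:num_steps] + [pad_id] * max(0, num_steps - len(ids))

def make_batch(id_lists, num_steps, pad_id=0):
    """Return (padded batch, valid lengths) for a list of ID sequences."""
    X = [pad_or_truncate(ids, num_steps, pad_id) for ids in id_lists]
    valid_len = [min(len(ids), num_steps) for ids in id_lists]
    return X, valid_len

src = [[5, 6, 7], [8, 9]]      # two source sentences as token IDs
tgt = [[3, 4], [3, 5, 6, 7]]   # their translations
X, X_valid_len = make_batch(src, num_steps=4)
Y, Y_valid_len = make_batch(tgt, num_steps=4)
print(X)            # [[5, 6, 7, 0], [8, 9, 0, 0]]
print(X_valid_len)  # [3, 2]
```

The valid lengths matter downstream: attention and loss computations mask out the `pad_id` positions beyond each sequence's valid length.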
```python
from torch.utils import data

def load_data_nmt(batch_size, num_steps, num_examples=600):
    """返回翻译数据集的迭代器和词表"""
    with open(d2l.download('cmn-eng'), 'r') as f:
        lines = f.readlines()
        return lines
    num_lines = min(num_examples, len(raw_text.split('\n')))
    text = raw_text.split('\n')[:num_lines]
    src_vocab, tgt_vocab = d2l.load_vocab('cmn-eng')
    src_iter = d2l.build_data_iter(text, src_vocab, tgt_vocab,
                                   batch_size, num_steps)
    return src_iter, src_vocab, tgt_vocab

train_iter, src_vocab, tgt_vocab = load_data_nmt(batch_size=2, num_steps=8)
for X, X_valid_len, Y, Y_valid_len in train_iter:
    print('X:', X.type(torch.int32))
    print('X的有效长度:', X_valid_len)
    print('Y:', Y.type(torch.int32))
    print('Y的有效长度:', Y_valid_len)
    break
```
This raises `ValueError: not enough values to unpack (expected 3, got 2)`.
The problem is on this line:
```python
src_iter = d2l.build_data_iter(text, src_vocab, tgt_vocab, batch_size, num_steps)
```
This function returns iterator objects, but your `load_data_nmt` does not include them all in the returned tuple, so tuple unpacking at the call site comes up short. (Note also the stray `return lines` inside the `with` block, which makes the function return the raw line list before any of the later code runs.) Modify `load_data_nmt` to drop the early return and include the iterators in the returned tuple, as follows:
```python
def load_data_nmt(batch_size, num_steps, num_examples=600):
    """返回翻译数据集的迭代器和词表"""
    with open(d2l.download('cmn-eng'), 'r') as f:
        lines = f.readlines()
    num_lines = min(num_examples, len(lines))
    lines = lines[:num_lines]
    src_vocab, tgt_vocab = d2l.load_vocab('cmn-eng')
    src_iter, tgt_iter = d2l.build_data_iter(lines, src_vocab, tgt_vocab,
                                             batch_size, num_steps)
    return src_iter, tgt_iter, src_vocab, tgt_vocab
```
After this change, also update the call site to unpack four values (`train_iter, tgt_iter, src_vocab, tgt_vocab = load_data_nmt(...)`) so that the arity matches, and your code should run correctly.
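The unpacking error itself is easy to reproduce in isolation: Python raises `ValueError` whenever the number of names on the left does not match the number of values the function returns. A minimal sketch:

```python
def returns_two():
    """Stand-in for a loader that returns fewer values than expected."""
    return 'iter', 'vocab'

# Unpacking into three names fails when only two values come back
try:
    a, b, c = returns_two()
except ValueError as e:
    print('ValueError:', e)  # not enough values to unpack (expected 3, got 2)

# Fix: make the arity on both sides match
a, b = returns_two()
print(a, b)
```

The error message names both counts, so reading it tells you exactly which side to change: either return more values or unpack fewer.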