Code for downloading the IMDB dataset with torch
Time: 2023-11-19 10:05:08
The following code downloads the IMDB dataset using torch (through the torchtext library):
```
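import torch
import torchtext  # needed for the torchtext.data.* names used below
from torchtext.datasets import IMDB
# Requires torchtext <= 0.8: Field and BucketIterator were moved to
# torchtext.legacy in 0.9 and removed in 0.12.
# set up fields (tokenize='spacy' needs spaCy and its English model installed)
TEXT = torchtext.data.Field(tokenize='spacy', lower=True)
LABEL = torchtext.data.LabelField(dtype=torch.float)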
# make splits for data (the archive is downloaded into ./.data on first run)
train_data, test_data = IMDB.splits(TEXT, LABEL)
# build the vocabulary
MAX_VOCAB_SIZE = 25_000
TEXT.build_vocab(train_data,
                 max_size=MAX_VOCAB_SIZE,
                 vectors='glove.6B.100d',
                 unk_init=torch.Tensor.normal_)
LABEL.build_vocab(train_data)
# create the iterators
BATCH_SIZE = 64
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
train_iterator, test_iterator = torchtext.data.BucketIterator.splits(
    (train_data, test_data),
    batch_size=BATCH_SIZE,
    device=device)
# print information about the data
print(f'Number of training examples: {len(train_data)}')
print(f'Number of testing examples: {len(test_data)}')
print(f'Unique tokens in TEXT vocabulary: {len(TEXT.vocab)}')
print(f'Unique tokens in LABEL vocabulary: {len(LABEL.vocab)}')
print(f'Number of batches in the train iterator: {len(train_iterator)}')
print(f'Number of batches in the test iterator: {len(test_iterator)}')
```
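This code loads the IMDB dataset through the torchtext library, then builds the vocabulary and sets up the iterators so that the data is ready for model training. Note that it uses pretrained GloVe word vectors. torchtext downloads them into the `.vector_cache` directory on first use, so to avoid that download the vector files have to be placed in that directory in advance.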
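To see what the vocabulary step does, here is a minimal plain-Python sketch of frequency-capped vocabulary building. It is a simplified stand-in, not torchtext's actual implementation: the helper names `build_vocab` and `numericalize` and the toy reviews are made up for illustration. It also suggests why, with `max_size=25_000`, the reported TEXT vocabulary size should be 25,002: the `<unk>` and `<pad>` special tokens are added on top of the cap.

```python
from collections import Counter


def build_vocab(token_lists, max_size, specials=("<unk>", "<pad>")):
    """Map the max_size most frequent tokens to integer ids, after the specials."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    # Sort by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    itos = list(specials) + [tok for tok, _ in ranked[:max_size]]
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi


def numericalize(tokens, stoi, unk="<unk>"):
    """Replace each token by its id; unseen tokens fall back to the unk id."""
    return [stoi.get(tok, stoi[unk]) for tok in tokens]


# Toy stand-in for tokenized, lower-cased reviews.
reviews = [
    "this film was great".split(),
    "this film was terrible".split(),
    "great cast great script".split(),
]
itos, stoi = build_vocab(reviews, max_size=4)
ids = numericalize("this sequel was great".split(), stoi)
print(itos)  # 2 specials + 4 most frequent tokens
print(ids)
```

Here `sequel` never appears in the toy reviews, so it maps to the `<unk>` id 0, just as out-of-vocabulary words in IMDB reviews would.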
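If the installed torchtext no longer provides `torchtext.data.Field`, the raw dataset can still be fetched with the standard library alone. The sketch below is one possible fallback, not part of the original answer: the URL (the Stanford copy of the Large Movie Review Dataset), the `data` folder, the helper names `download_imdb` and `read_split`, and the label encoding (1 for positive, 0 for negative) are all assumptions made for illustration.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumed download location of the Large Movie Review Dataset archive.
IMDB_URL = "https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"


def download_imdb(root="data"):
    """Download and unpack the archive once; return the extracted aclImdb folder."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    archive = root / "aclImdb_v1.tar.gz"
    if not archive.exists():
        urllib.request.urlretrieve(IMDB_URL, archive)
    target = root / "aclImdb"
    if not target.exists():
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(root)
    return target


def read_split(imdb_dir, split):
    """Return (text, label) pairs for 'train' or 'test'; 1 = pos, 0 = neg."""
    examples = []
    for label_name, label in (("neg", 0), ("pos", 1)):
        folder = Path(imdb_dir) / split / label_name
        # Each review is stored as its own .txt file inside pos/ or neg/.
        for path in sorted(folder.glob("*.txt")):
            examples.append((path.read_text(encoding="utf-8"), label))
    return examples
```

Calling `read_split(download_imdb(), "train")` should then return the 25,000 labeled training reviews, and `"test"` the 25,000 test reviews. Tokenization, vocabulary building, and batching would still have to be done separately.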