Using PyTorch, randomly sample image patches from the dog class of the VOC2012 dataset, mine the set of patches that are both discriminative and frequent, and visualize them; base the mining algorithm on LeNet.
First, we import the required libraries and set up the dataset. Assume you have already downloaded the VOC2012 dataset and extracted it to the local path `/path/to/VOC2012`.
```python
import os
import random
import numpy as np
import torch
import torchvision.transforms as transforms
from PIL import Image
from torch.utils.data import Dataset, DataLoader

# Dataset class that reads images from the VOC2012 JPEGImages folder
class VOC2012Dataset(Dataset):
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        self.imgs = [f for f in os.listdir(os.path.join(root_dir, "JPEGImages"))
                     if f.endswith(".jpg")]

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, idx):
        img_name = self.imgs[idx]
        img_path = os.path.join(self.root_dir, "JPEGImages", img_name)
        img = Image.open(img_path).convert("RGB")
        if self.transform:
            img = self.transform(img)
        return img

# Transforms: resize every image to 256x256 and convert it to a tensor in [0, 1]
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])

# Load the dataset
dataset = VOC2012Dataset("/path/to/VOC2012", transform=transform)
dataloader = DataLoader(dataset, batch_size=1, shuffle=True)
```
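As a side note, you do not strictly need a classifier to find the dog images: the dataset ships with per-class image lists. The sketch below is an optional alternative, assuming the standard VOC2012 layout where `ImageSets/Main/dog_trainval.txt` gives each image id together with a label of 1 (contains a dog), -1 (no dog) or 0 (difficult); the returned file names could be used to restrict `self.imgs` in `VOC2012Dataset`.
```python
# Optional alternative: use the VOC annotation lists instead of a classifier to find
# dog images. Assumes the standard VOC2012 layout, where ImageSets/Main/dog_trainval.txt
# lists each image id followed by 1 (contains a dog), -1 (no dog) or 0 (difficult).
def load_dog_image_ids(root_dir):
    list_file = os.path.join(root_dir, "ImageSets", "Main", "dog_trainval.txt")
    dog_files = []
    with open(list_file) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2 and parts[1] == "1":
                dog_files.append(parts[0] + ".jpg")
    return dog_files
```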
Next, we define the LeNet model and the mining algorithm.
```python
# LeNet model (input: 3x256x256 images, output: scores for the 20 VOC object classes)
class LeNet(torch.nn.Module):
    def __init__(self, num_classes=20):
        super(LeNet, self).__init__()
        self.conv1 = torch.nn.Conv2d(3, 6, kernel_size=5, stride=1)
        self.pool1 = torch.nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = torch.nn.Conv2d(6, 16, kernel_size=5, stride=1)
        self.pool2 = torch.nn.MaxPool2d(kernel_size=2, stride=2)
        # For a 256x256 input: 256 -> 252 -> 126 -> 122 -> 61, hence 16 * 61 * 61
        self.fc1 = torch.nn.Linear(16 * 61 * 61, 120)
        self.fc2 = torch.nn.Linear(120, 84)
        self.fc3 = torch.nn.Linear(84, num_classes)

    def forward(self, x):
        x = self.pool1(torch.nn.functional.relu(self.conv1(x)))
        x = self.pool2(torch.nn.functional.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)
        x = torch.nn.functional.relu(self.fc1(x))
        x = torch.nn.functional.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Mining algorithm: randomly sample 64x64 patches, score each patch by its average
# similarity to all other patches, and keep the patches whose score exceeds the threshold
def mine_images(model, images, num_samples=1000, threshold=0.5):
    # Put the model in evaluation mode
    model.eval()
    # Randomly sample num_samples 64x64 patches from each image
    samples = []
    for image in images:
        _, h, w = image.shape
        for _ in range(num_samples):
            x1 = random.randint(0, w - 64)
            y1 = random.randint(0, h - 64)
            samples.append(image[:, y1:y1 + 64, x1:x1 + 64])
    samples = torch.stack(samples)
    # Compute a feature vector for every patch. The patches are resized to 256x256
    # because the fully connected layers of LeNet expect that input size, and they
    # are processed in mini-batches to keep memory usage reasonable.
    features = []
    with torch.no_grad():
        for batch in torch.split(samples, 256):
            batch = torch.nn.functional.interpolate(
                batch, size=(256, 256), mode="bilinear", align_corners=False)
            features.append(model(batch).cpu())
    features = torch.cat(features).numpy()
    # L2-normalize the features so that the dot products below are cosine similarities
    features = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    # Similarity of every patch to every other patch
    similarities = np.dot(features, features.T)
    # Score of a patch = its average similarity to all patches (how "frequent" it is)
    scores = similarities.mean(axis=1)
    # Keep the patches whose score is above the threshold
    selected_samples = [samples[i] for i in range(len(samples)) if scores[i] > threshold]
    return selected_samples
```
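Before loading a real checkpoint, it may help to confirm that the patch pipeline runs end to end. The snippet below is a small sanity check with random tensors and untrained weights (not part of the mining itself); the threshold of -1.0 simply keeps every patch, so it only verifies that the 64x64 patches go through `mine_images` and LeNet without shape errors.
```python
# Sanity check with random data: verifies shapes only, not mining quality
dummy_model = LeNet()
dummy_images = [torch.rand(3, 256, 256) for _ in range(2)]
patches = mine_images(dummy_model, dummy_images, num_samples=8, threshold=-1.0)
print(len(patches), patches[0].shape)  # expected: 16 torch.Size([3, 64, 64])
```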
Finally, we use the LeNet model and mining algorithm defined above to mine dog-class images from the VOC2012 dataset. Assume you have already trained a LeNet model and saved it at the local path `/path/to/lenet.pth`.
```python
import matplotlib.pyplot as plt

# Load the trained LeNet model
model = LeNet()
model.load_state_dict(torch.load("/path/to/lenet.pth", map_location="cpu"))
model.eval()

# Collect 10 images that the model classifies as "dog"
dog_images = []
with torch.no_grad():
    for images in dataloader:  # images has shape [1, 3, 256, 256]
        output = model(images)
        _, predicted = torch.max(output, 1)
        # Assumed class index for "dog": 11 in the usual alphabetical ordering of the
        # 20 VOC classes, but it depends on how the checkpoint was trained
        if predicted.item() == 11:
            dog_images.append(images[0])
        if len(dog_images) == 10:  # stop after 10 dog images
            break

# Mine the patches that are both discriminative and frequent
selected_images = mine_images(model, dog_images)

# Visualize at most 25 of the mined patches in a 5x5 grid
fig = plt.figure(figsize=(10, 10))
for i, patch in enumerate(selected_images[:25]):
    ax = fig.add_subplot(5, 5, i + 1)
    ax.imshow(patch.permute(1, 2, 0))
    ax.axis("off")
plt.show()
```
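If you also want to keep the mined patches on disk for later inspection, an optional snippet along these lines (using `torchvision.utils.save_image`, and assuming the patches are still in the [0, 1] range produced by `ToTensor`) writes them to a hypothetical `mined_patches` folder:
```python
from torchvision.utils import save_image

# Save the mined patches as PNG files in a local folder
os.makedirs("mined_patches", exist_ok=True)
for i, patch in enumerate(selected_images):
    save_image(patch, os.path.join("mined_patches", f"patch_{i:03d}.png"))
```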
If you run the code above, you should see, among the patches sampled from the 10 dog-class images, the ones that are both discriminative and frequent.