Writing an action recognition AI program with PyTorch
Date: 2024-05-29 18:03:43
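Below is a simple PyTorch example of an action recognition program built on a 2D convolutional neural network (CNN).
1. Data preparation
First, prepare a dataset. This example uses UCF101, a video dataset with 101 action classes, which can be downloaded from the official UCF101 site. After downloading, extract it into a directory and organize it as follows: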
```
data/
    train/
        class1/
            video1.avi
            video2.avi
            ...
        class2/
            video1.avi
            video2.avi
            ...
        ...
    test/
        class1/
            video1.avi
            video2.avi
            ...
        class2/
            video1.avi
            video2.avi
            ...
        ...
```
Here the train and test directories hold the training set and the test set, and each of class1, class2, and so on holds the video files of one class.
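Before going further, it can help to confirm that the layout really matches the tree above. The following is a minimal sketch using only the standard library; the helper name `summarize_split` and the `.avi` default are assumptions made for illustration, not part of the original program.

```python
from pathlib import Path


def summarize_split(split_dir, extension=".avi"):
    """Return {class_name: number_of_videos} for one split directory."""
    summary = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            # Count only files with the expected video extension.
            summary[class_dir.name] = sum(
                1 for p in class_dir.iterdir() if p.suffix.lower() == extension
            )
    return summary


if __name__ == "__main__":
    for split in ("train", "test"):
        split_dir = Path("data") / split
        if split_dir.is_dir():
            print(split, summarize_split(split_dir))
```

A class that reports zero videos usually points to a misplaced folder or an unexpected file extension.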
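2. Data preprocessing
Next, preprocess the data. First, convert each video file into an image sequence: cv2.VideoCapture() from OpenCV reads the video, and one frame is saved at regular intervals. Here each video file is converted into 10 images.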
```python
import os

import cv2


def video_to_frames(video_path, frames_path, num_frames=10):
    """Save up to num_frames evenly spaced frames of a video as JPEG files."""
    os.makedirs(frames_path, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Frame indices are 0-based; keep the step at least 1 for short videos.
    step = max(frame_count // num_frames, 1)
    frame_idxs = list(range(0, frame_count, step))[:num_frames]
    for idx in frame_idxs:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the target frame
        ret, frame = cap.read()
        if not ret:
            break
        cv2.imwrite(os.path.join(frames_path, f"{idx}.jpg"), frame)
    cap.release()
```
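The function above handles a single video, so the whole dataset still has to be walked. The sketch below is one way to do that; `iter_video_jobs` and `extract_dataset` are hypothetical helper names, and the output layout (one folder of frames per video, under split and class) is an assumption chosen so that the class folders keep their names. The extraction function is passed in as an argument so that the block stays self-contained.

```python
import os


def iter_video_jobs(data_root, frames_root, extension=".avi"):
    """Yield (video_path, frames_dir) pairs mirroring split/class/video."""
    for split in sorted(os.listdir(data_root)):
        split_dir = os.path.join(data_root, split)
        if not os.path.isdir(split_dir):
            continue
        for class_name in sorted(os.listdir(split_dir)):
            class_dir = os.path.join(split_dir, class_name)
            if not os.path.isdir(class_dir):
                continue
            for file_name in sorted(os.listdir(class_dir)):
                stem, ext = os.path.splitext(file_name)
                if ext.lower() != extension:
                    continue  # skip anything that is not a video file
                yield (
                    os.path.join(class_dir, file_name),
                    os.path.join(frames_root, split, class_name, stem),
                )


def extract_dataset(data_root, frames_root, extract_fn):
    """Call extract_fn(video_path, frames_dir) for every video; return the count."""
    count = 0
    for video_path, frames_dir in iter_video_jobs(data_root, frames_root):
        extract_fn(video_path, frames_dir)
        count += 1
    return count
```

With the function defined earlier, a call such as `extract_dataset("data", "data", video_to_frames)` would place each video's frames in a folder next to it, for example data/train/class1/video1/. torchvision's ImageFolder searches class folders recursively and ignores non-image files, so this layout can be loaded directly.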
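Next, preprocess the images. The transforms module from torchvision handles data augmentation and normalization. Here random cropping, random horizontal flipping, and random rotation are used as augmentation.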
```python
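import torchvision.transforms as transforms


def get_transforms():
    """Augmentation and normalization applied to every frame."""
    return transforms.Compose([
        transforms.RandomCrop(224),         # frames must be at least 224x224
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(10),      # rotate by up to +/-10 degrees
        transforms.ToTensor(),
        # ImageNet channel statistics
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])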
```
3. Model definition
Next, define the neural network model. Here a simple 2D convolutional neural network (CNN) is used.
```python
import torch.nn as nn


class ActionRecognitionCNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # Convolutional feature extractor (3x3 convolutions keep the spatial size).
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Four 2x2 poolings reduce a 224x224 input to 14x14; this layer
            # brings it to the 7x7 map that the first Linear layer expects.
            nn.AdaptiveAvgPool2d((7, 7)),
        )
        # Fully connected classifier
        self.fc = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 512 * 7 * 7)
        x = self.fc(x)
        return x
```
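The first fully connected layer takes 512 * 7 * 7 input features, so the convolutional part has to deliver a 7x7 feature map. Since the 3x3 convolutions with padding 1 keep the spatial size, only the pooling layers matter, and the arithmetic can be checked without building the network. The helper `pooled_size` below is a hypothetical name used for this sketch; it applies the standard output-size formula for unpadded pooling.

```python
def pooled_size(size, num_pools, kernel_size=2, stride=2):
    """Spatial size after num_pools unpadded pooling layers (floor mode)."""
    for _ in range(num_pools):
        size = (size - kernel_size) // stride + 1
    return size


if __name__ == "__main__":
    print(pooled_size(224, 4))  # 224 -> 112 -> 56 -> 28 -> 14
    print(pooled_size(224, 5))  # one more halving gives 7
```

Four 2x2 poolings on a 224x224 crop therefore leave a 14x14 map; reaching 7x7 takes a fifth halving or an adaptive pooling layer with output size (7, 7).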
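4. Training the model
Next, define the training and test functions. The data is loaded with PyTorch's DataLoader, and the model is trained with a cross-entropy loss and a stochastic gradient descent (SGD) optimizer.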
```python
import torch
import torch.optim as optim


def train(model, train_loader, criterion, optimizer, device):
    """Run one training epoch and return the mean loss per batch."""
    model.train()
    running_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    return running_loss / len(train_loader)


def test(model, test_loader, criterion, device):
    """Return (accuracy, mean loss per batch) on the test set."""
    model.eval()
    correct = 0
    running_loss = 0.0
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            running_loss += loss.item()
            pred = outputs.argmax(dim=1)  # predicted class per sample
            correct += (pred == labels).sum().item()
    return correct / len(test_loader.dataset), running_loss / len(test_loader)
```
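The accuracy computed above is per image, while an action label belongs to a whole video. One common way to get a video-level prediction from a 2D CNN is to average the per-frame class probabilities. The sketch below assumes the frames of one video are already stacked into a single tensor; `predict_video` is a hypothetical helper and not part of the original program.

```python
import torch


def predict_video(model, frames, device="cpu"):
    """Average per-frame probabilities into one prediction for a video.

    frames: tensor of shape (num_frames, 3, H, W).
    Returns (predicted_class_index, mean_probabilities).
    """
    model.eval()
    with torch.no_grad():
        logits = model(frames.to(device))                 # (num_frames, num_classes)
        probs = torch.softmax(logits, dim=1).mean(dim=0)  # (num_classes,)
    return int(probs.argmax()), probs
```

Averaging probabilities rather than taking a majority vote keeps the confidence of each frame in play, which tends to help when a few frames are ambiguous.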
5. Program entry point
Finally, combine the functions above and define the program entry point.
```python
import os

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
from tqdm import tqdm


def main():
    # ImageFolder loads images only, so the frames extracted from each video
    # must be stored under data/<split>/<class>/ (subfolders are searched too).
    data_path = "data"
    train_path = os.path.join(data_path, "train")
    test_path = os.path.join(data_path, "test")
    train_dataset = ImageFolder(train_path, transform=get_transforms())
    test_dataset = ImageFolder(test_path, transform=get_transforms())
    num_classes = len(train_dataset.classes)  # one class per subdirectory
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=4)
    test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False, num_workers=4)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = ActionRecognitionCNN(num_classes).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    for epoch in range(10):
        train_loss = train(model, train_loader, criterion, optimizer, device)
        test_acc, test_loss = test(model, test_loader, criterion, device)
        tqdm.write(f"Epoch {epoch+1}: Train Loss={train_loss:.4f}, Test Acc={test_acc:.4f}, Test Loss={test_loss:.4f}")


if __name__ == "__main__":
    main()
```
Run the program to start training the action recognition model.