Here is a brief tutorial on network programming with PyTorch on Ubuntu.

First install Python and PyTorch, using either pip or conda. With pip:
```shell
sudo apt update
sudo apt install python3 python3-pip
pip3 install torch torchvision torchaudio
```
For GPU support, install CUDA and the matching CUDA-enabled PyTorch build.
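After installation, you can verify the setup with a quick check (the printed version string and the availability result depend on your machine):

```python
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True only if CUDA and a GPU are set up
```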
Use a modular design: encapsulate the different parts of the network in separate classes or functions. Classic architectures such as ResNet are useful references; design the network around the characteristics of your data and the needs of your task. For example, a simple fully connected network:
```python
import torch.nn as nn
import torch.nn.functional as F

class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 784)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
```
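As a quick sanity check, the network maps a batch of inputs to 10 logits per sample. A hedged example (the random tensor stands in for a batch of flattened 28×28 images; the class is repeated so the snippet runs on its own):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleNet(nn.Module):  # same definition as above
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 784)
        return self.fc2(F.relu(self.fc1(x)))

net = SimpleNet()
batch = torch.randn(32, 1, 28, 28)  # e.g. a batch of MNIST-sized images
logits = net(batch)
print(logits.shape)  # torch.Size([32, 10])
```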
Write the training and validation loops, here using cross-entropy loss and the Adam optimizer:
```python
import torch
import torch.optim as optim

model = SimpleNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
num_epochs = 10  # adjust to your task

# Assumes train_loader and val_loader already exist
for epoch in range(num_epochs):
    model.train()
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        total = 0
        correct = 0
        for inputs, labels in val_loader:
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print(f'Epoch [{epoch+1}/{num_epochs}], Accuracy: {100 * correct / total:.2f}%')
```
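When a GPU is present, the model and every batch must live on the same device. A minimal sketch of the device-placement pattern (the `nn.Linear` model and random batch here are stand-ins for illustration):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(784, 10).to(device)      # stand-in model

# Inside the training loop, move each batch to the same device:
inputs = torch.randn(32, 784).to(device)
labels = torch.randint(0, 10, (32,)).to(device)
outputs = model(inputs)
print(outputs.device.type)  # 'cuda' or 'cpu', depending on availability
```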
For distributed training, use the torch.distributed module. Install the NCCL library (used for GPU communication), configure the environment variables, write the distributed training code, and launch it with mpirun, torch.distributed.launch, or the newer torchrun.
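A minimal DistributedDataParallel sketch, run here as a single process on CPU with the gloo backend purely for illustration (a real multi-GPU job would use the nccl backend and launch one process per GPU via torchrun, which sets the rank and world size for each worker; the address and port values below are arbitrary):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process setup; a launcher like torchrun would set these per worker
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = torch.nn.Linear(784, 10)
ddp_model = DDP(model)  # gradients are all-reduced across workers in backward()

out = ddp_model(torch.randn(8, 784))
out.sum().backward()

dist.destroy_process_group()
```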
Save and load the model with the torch.save and torch.load functions:
```python
# Save the model's parameters
torch.save(model.state_dict(), 'model.pth')

# Load them into a fresh instance
model = SimpleNet()
model.load_state_dict(torch.load('model.pth'))
```
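An end-to-end check that the save/load round trip preserves the model (SimpleNet is repeated so the snippet runs on its own; the file name is arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleNet(nn.Module):  # same definition as above
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 784)
        return self.fc2(F.relu(self.fc1(x)))

model = SimpleNet()
torch.save(model.state_dict(), 'model.pth')

reloaded = SimpleNet()
reloaded.load_state_dict(torch.load('model.pth'))
reloaded.eval()

x = torch.randn(1, 784)
with torch.no_grad():
    same = torch.equal(model(x), reloaded(x))
print(same)  # True: the reloaded model reproduces the original's outputs
```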