{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "可以使用`torch.nn`包构建神经网络。\n", "\n", "现在你已经对`autograd`有了基本的了解,`nn`依赖`autograd`来定义模型并执行微分。一个`nn.Module`包含层和一个`forward(input)`方法用以返回`output`。\n", "\n", "例如,请看以下对图像分类网络:\n", "\n", "![](https://pytorch.org/tutorials/_images/mnist.png)\n", "\n", "这是一个简单的前馈网络。它获取输入,将其一层又一层地馈入,然后最终给出输出。\n", "\n", "神经网络的典型训练过程如下:\n", "\n", "定义具有一些可学习参数(或权重)的神经网络\n", "遍历输入数据集\n", "通过网络处理输入\n", "计算损失(输出正确的距离有多远)\n", "将梯度传播回网络参数\n", "通常使用简单的更新规则来更新网络的权重: weight = weight - learning_rate * gradient" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 定义网络\n", "我们来定义上面的网络:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Net(\n", " (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))\n", " (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))\n", " (fc1): Linear(in_features=576, out_features=120, bias=True)\n", " (fc2): Linear(in_features=120, out_features=84, bias=True)\n", " (fc3): Linear(in_features=84, out_features=10, bias=True)\n", ")\n" ] } ], "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "\n", "class Net(nn.Module):\n", " \n", " def __init__(self):\n", " super(Net, self).__init__() # 多重继承\n", " # 一个图像输入channel,六个输出channel,3*3卷积核\n", " self.conv1 = nn.Conv2d(1,6,3)\n", " self.conv2 = nn.Conv2d(6,16,3)\n", " # 线性变换: y = Wx + b\n", " self.fc1 = nn.Linear(16 * 6 * 6, 120) # 6*6 from image dimension\n", " self.fc2 = nn.Linear(120, 84)\n", " self.fc3 = nn.Linear(84, 10)\n", " \n", " def forward(self,x):\n", " # 经过卷积层1、relu激活函数、max_pool降采样\n", " x = F.max_pool2d(F.relu(self.conv1(x)),(2,2))\n", " # 经过卷积层2、relu激活函数、max_pool降采样\n", " # 如果降采样尺寸为正方形,也可以只写一个数字\n", " x = F.max_pool2d(F.relu(self.conv2(x)), 2)\n", " # resize\n", " x = x.view(-1, self.num_flat_features(x))\n", " # 线性变换\n", " x = F.relu(self.fc1(x))\n", " x = F.relu(self.fc2(x))\n", " x = self.fc3(x)\n", " return x\n", " \n", " def num_flat_features(self, x):\n", " size = x.size()[1:] # all dimensions except the batch dimension\n", " num_features = 1\n", " for s in size:\n", " num_features *= s\n", " return num_features\n", "\n", "\n", "net = Net()\n", "print(net)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "你刚刚已经定义了前向传播的函数,而反向传播的函数将会由`autograd`自动给出。\n", "\n", "网络的可学习由`net.parameters()`返回。" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10\n", "torch.Size([6, 1, 3, 3])\n" ] } ], "source": [ "params = list(net.parameters())\n", "print(len(params))\n", "print(params[0].size()) # conv1's .weight" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "让我们尝试一个32×32随机输入。注意:该网络(LeNet)的预期输入大小为32x32。要在MNIST数据集上使用此网络,请将图像从数据集中调整为32×32。" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[ 0.0203, 0.0255, 0.0466, 0.1283, 0.1069, 0.1514, -0.0276, -0.0390,\n", " -0.0746, -0.0476]], grad_fn=)\n" ] } ], "source": [ "input = torch.randn(1, 1, 32, 32)\n", "out = net(input)\n", "print(out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "用随机梯度将所有参数和反向传播器的梯度缓冲区归零:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "net.zero_grad()\n", "out.backward(torch.randn(1, 10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note:\n", "> 
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before proceeding further, let's recap all the classes you've seen so far.\n",
    "\n",
    "**Recap:**\n",
    "\n",
    "* `torch.Tensor` - A multi-dimensional array that supports autograd operations such as `backward()`, and also holds the gradient w.r.t. the tensor.\n",
    "* `nn.Module` - Neural network module. A convenient way of encapsulating parameters, with helpers for moving them to the GPU, exporting, loading, etc.\n",
    "* `nn.Parameter` - A kind of Tensor that is automatically registered as a parameter when assigned as an attribute to a `Module`.\n",
    "* `autograd.Function` - Implements the forward and backward definitions of an autograd operation. Every Tensor operation creates at least a single `Function` node that connects to the functions that created the Tensor and encodes its history.\n",
    "\n",
    "**At this point, we covered:**\n",
    "* Defining a neural network\n",
    "* Processing inputs and calling backward\n",
    "\n",
    "**Still to come:**\n",
    "* Computing the loss\n",
    "* Updating the weights of the network"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loss function\n",
    "\n",
    "A loss function takes the (output, target) pair of inputs and computes a value that estimates how far the output is from the target.\n",
    "\n",
    "There are several different loss functions in the `nn` package. A simple one is `nn.MSELoss`, which computes the mean-squared error between the input and the target.\n",
    "\n",
    "For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor(0.5661, grad_fn=<MseLossBackward>)\n"
     ]
    }
   ],
   "source": [
    "output = net(input)\n",
    "target = torch.randn(10)  # a dummy target, for example\n",
    "target = target.view(1, -1)  # make it the same shape as output\n",
    "criterion = nn.MSELoss()\n",
    "\n",
    "loss = criterion(output, target)\n",
    "print(loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, if you follow `loss` in the backward direction using its `.grad_fn` attribute, you will see a graph of computations that looks like this:\n",
    "\n",
    "    input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d\n",
    "          -> view -> linear -> relu -> linear -> relu -> linear\n",
    "          -> MSELoss\n",
    "          -> loss"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, when we call `loss.backward()`, the whole graph is differentiated w.r.t. the loss, and all tensors in the graph that have `requires_grad=True` will have their gradients accumulated into their `.grad` attribute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<MseLossBackward object at 0x...>\n",
      "<AddmmBackward object at 0x...>\n",
      "<AccumulateGrad object at 0x...>\n"
     ]
    }
   ],
   "source": [
    "print(loss.grad_fn)  # MSELoss\n",
    "print(loss.grad_fn.next_functions[0][0])  # Linear\n",
    "print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Backprop\n",
    "\n",
    "To backpropagate the error, all we have to do is call `loss.backward()`. You need to clear the existing gradients first, though, otherwise the new gradients will be accumulated into the existing ones.\n",
    "\n",
    "We will now call `loss.backward()` and take a look at conv1's bias gradients before and after the backward pass."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "conv1.bias.grad before backward\n",
      "tensor([0., 0., 0., 0., 0., 0.])\n",
      "conv1.bias.grad after backward\n",
      "tensor([-0.0008, -0.0017, -0.0009,  0.0055, -0.0086,  0.0021])\n"
     ]
    }
   ],
   "source": [
    "net.zero_grad()  # zeroes the gradient buffers of all parameters\n",
    "\n",
    "print('conv1.bias.grad before backward')\n",
    "print(net.conv1.bias.grad)\n",
    "\n",
    "loss.backward()\n",
    "\n",
    "print('conv1.bias.grad after backward')\n",
    "print(net.conv1.bias.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we have seen how to use loss functions.\n",
    "\n",
    "**The only thing left to learn is:**\n",
    "\n",
    "* how to update the weights of the network"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Update the weights\n",
    "\n",
    "The simplest update rule used in practice is Stochastic Gradient Descent (SGD):\n",
    "\n",
    "`weight = weight - learning_rate * gradient`\n",
    "\n",
    "We can implement this in plain Python like so:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "learning_rate = 0.01\n",
    "for f in net.parameters():\n",
    "    f.data.sub_(f.grad.data * learning_rate)"
   ]
  },
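  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An equivalent sketch of the same manual update, wrapping it in `torch.no_grad()` so that the in-place parameter change itself is not tracked by autograd:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# equivalent sketch: the same manual SGD step written with torch.no_grad() instead of .data\n",
    "learning_rate = 0.01\n",
    "with torch.no_grad():\n",
    "    for f in net.parameters():\n",
    "        f.sub_(f.grad * learning_rate)"
   ]
  },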
"如果希望使用各种不同的更新规则,例如SGD,Nesterov-SGD,Adam,RMSProp等,`torch.optim`实现所有这些方法。使用它非常简单:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "import torch.optim as optim\n", "\n", "# 创建你的优化器\n", "optimizer = optim.SGD(net.parameters(), lr=0.01)\n", "\n", "# 在训练循环中\n", "optimizer.zero_grad() # zero the gradient buffers\n", "output = net(input)\n", "loss = criterion(output, target)\n", "loss.backward()\n", "optimizer.step() # Does the update" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }