60 Minutes of PyTorch - Part 1

Tensors and Autograd

Gregor Koehler

Let's install PyTorch (without GPU support for now). For other install variants, visit the PyTorch Get Started page. This tutorial is an interactive port from here.

pip install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl

pip install torchvision

Import PyTorch.

import torch

Now let's get familiar with PyTorch tensors.

Construct a 5x3 matrix, uninitialized:

x = torch.Tensor(5, 3)
print(x)

Construct a randomly initialized matrix:

x = torch.rand(5, 3)
print(x)

Get its size:

print(x.size())

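The constructors above can be sketched side by side. This is a minimal example that assumes a recent PyTorch release, where `torch.empty` is the explicit way to request uninitialized storage (the cell above uses `torch.Tensor(5, 3)` for the same purpose). The names `uninit`, `rows` and `cols` are only illustrative.

```python
import torch

# Uninitialized 5x3 tensor: the values are whatever was in memory, so do not rely on them.
uninit = torch.empty(5, 3)

# Randomly initialized tensor with values drawn uniformly from [0, 1).
x = torch.rand(5, 3)

# size() returns a torch.Size, which is a tuple subclass and unpacks like one.
rows, cols = x.size()
print(rows, cols)
print(tuple(x.size()))
```

Because `torch.Size` behaves like a tuple, the result of `size()` can be unpacked or indexed directly.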

Operations

Addition (Syntax 1):

y = torch.rand(5, 3)
print(x + y)

Addition (Syntax 2):

print(torch.add(x, y))

Giving a specific output tensor:

result = torch.Tensor(5, 3)
torch.add(x, y, out=result)
print(result)

Addition in-place:

# adds x to y
y.add_(x)
print(y)

Note: Any operation that mutates a tensor in place is post-fixed with an underscore (_). For example, x.copy_(y) and x.t_() will both change x.
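To make the naming convention concrete, here is a small sketch contrasting `add` with `add_`. The variable names `z` and `same` are only illustrative.

```python
import torch

x = torch.ones(2, 2)
y = torch.ones(2, 2)

# Out-of-place: returns a new tensor and leaves y untouched.
z = y.add(x)

# In-place (trailing underscore): mutates y and returns that same tensor object.
same = y.add_(x)

print(z is y)      # False: z is a separate tensor
print(same is y)   # True: add_ hands back y itself
print(y)
```

In both cases `x` is only read, never modified.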

For more tensor operations visit the torch doc page.

Numpy-like PyTorch

A torch Tensor on the CPU and the numpy array converted from it share their underlying memory locations, so changing one will change the other.

You can even use standard numpy-like indexing for PyTorch tensors:

print(x[:, 1])

Converting Torch Tensor to Numpy Array

Converting torch Tensor to numpy Array:

a = torch.ones(5)

b = a.numpy()
print(b)

See how the numpy array changed in value:

a.add_(1)
print(a)
print(b)

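The sharing only holds while both objects point at the same buffer. The following sketch (variable names such as `after_rebind` and `independent` are illustrative) shows that an in-place update is visible through the array, that an out-of-place update is not, and that cloning first yields an independent copy.

```python
import torch

a = torch.ones(5)
b = a.numpy()                     # shares a's buffer; no data is copied

a.add_(1)                         # in-place update is visible through b
after_inplace = b.copy()          # snapshot of b: all 2.0

a = a + 1                         # out-of-place: `a` is rebound to a new tensor
after_rebind = b.copy()           # b still views the old buffer: all 2.0

independent = a.clone().numpy()   # clone first when a separate copy is wanted
a.add_(1)                         # does not reach `independent`

print(after_inplace, after_rebind, independent)
```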

Converting Numpy Array to Torch Tensor (aka the other way 'round):

import numpy as np

a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)

print(a)
print(b)

All the Tensors on the CPU except a CharTensor support converting to NumPy and back.

CUDA Tensors

Tensors can be moved onto the GPU using the .cuda() method.

Note: Not usable as of now since this article runs on an instance without GPU support!

# let us run this cell only if CUDA is available
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    print(x + y)

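A device-agnostic variant of the cell above can be sketched as follows. It assumes a newer PyTorch release than the 0.3.0 wheel installed at the top, since `torch.device` and `.to()` were added in later versions. The names `device` and `z_np` are illustrative.

```python
import torch

# Pick the GPU when one is present, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(5, 3).to(device)
y = torch.rand(5, 3).to(device)

# Both operands must live on the same device for the addition to work.
z = x + y

# Move the result back to the CPU before converting it to a NumPy array.
z_np = z.cpu().numpy()
print(z.device, z_np.shape)
```

Written this way, the same code runs unchanged on machines with and without a GPU.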

Autograd: Automatic Differentiation

Central to all neural networks in PyTorch is the autograd package. Let’s first briefly visit this, and we will then go to training our first neural network.

The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.
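A minimal sketch of what define-by-run means in practice: ordinary Python control flow decides which operations are recorded, so the graph can differ from one iteration to the next. The example assumes a recent PyTorch release in which tensors accept `requires_grad` directly; the helper `forward` and the list `grads` are hypothetical names used only for illustration.

```python
import torch

def forward(x, n_steps):
    # The Python loop decides how many multiplications end up in the graph.
    out = x
    for _ in range(n_steps):
        out = out * 2
    return out.sum()

x = torch.ones(3, requires_grad=True)

grads = []
for n_steps in (1, 3):
    x.grad = None                      # drop the gradient from the previous iteration
    forward(x, n_steps).backward()     # backprop through the graph built just now
    grads.append(x.grad.clone())

print(grads[0])   # every entry is 2**1
print(grads[1])   # every entry is 2**3
```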

Variable

autograd.Variable is the central class of the package. It wraps a Tensor and supports nearly all operations defined on it. Once you finish your computation, you can call .backward() and have all the gradients computed automatically. You can access the raw tensor through the .data attribute, while the gradient w.r.t. this variable is accumulated into .grad.
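The word "accumulated" matters: repeated backward passes add to .grad rather than overwrite it. A short sketch, assuming a recent PyTorch release where plain tensors take `requires_grad` directly (the names `first` and `second` are illustrative):

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

(x * 3).sum().backward()
first = x.grad.clone()     # d/dx sum(3x) = 3 for every element

(x * 3).sum().backward()
second = x.grad.clone()    # accumulated on top of the first pass: 3 + 3 = 6

x.grad.zero_()             # reset before the next independent backward pass
print(first, second, x.grad)
```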

There’s one more class which is very important for the autograd implementation: a Function.

Variable and Function are interconnected and build up an acyclic graph that encodes a complete history of the computation. Each Variable has a .grad_fn attribute that references the Function that created it (except for Variables created by the user, whose grad_fn is None).
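The difference between user-created values and results of operations can be checked directly. This sketch assumes a recent PyTorch release, where the same attributes live on plain tensors:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)   # created by the user: a leaf of the graph
y = x + 2                                   # created by an operation

print(x.grad_fn)               # None, nothing produced x
print(x.is_leaf)               # True
print(y.grad_fn is not None)   # True, y remembers the operation that made it
print(y.is_leaf)               # False
```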

If you want to compute the derivatives, you can call .backward() on a Variable. If the Variable is a scalar (i.e. it holds a single element), you don’t need to pass any arguments to backward(). If it has more elements, you need to specify a grad_output argument that is a tensor of matching shape.
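For the non-scalar case, a minimal sketch looks like this. It assumes a recent PyTorch release and passes the tensor positionally; `weights` is an illustrative name for the matching-shape argument described above.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2          # y has three elements, so backward() needs an argument

# The argument weights each element of y; autograd returns the
# vector-Jacobian product of these weights with dy/dx.
weights = torch.tensor([0.1, 1.0, 10.0])
y.backward(weights)

print(x.grad)      # equals 2 * weights, since dy_i/dx_i = 2
```

Passing a tensor of ones instead would give the gradient of `y.sum()`.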

import torch
from torch.autograd import Variable

Create a variable:

x = Variable(torch.ones(2, 2), requires_grad=True)
print(x)

Do an operation on the variable:

y = x + 2
print(y)

Since y was created as a result of an operation, it has a grad_fn.

print(y.grad_fn)

Do more operations on y:

z = y * y * 3
out = z.mean()

print(z, out)

Gradients

Let's backprop now. out.backward() is equivalent to doing out.backward(torch.Tensor([1.0])).

out.backward()

Print gradients d(out)/dx:

print(x.grad)
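The gradient can be checked by hand. Since out = (1/4) * sum(3 * (x_i + 2)^2), the derivative with respect to each x_i is (3/2) * (x_i + 2), which is 4.5 at x_i = 1. The sketch below repeats the computation and compares it with that formula; it assumes a recent PyTorch release where tensors take `requires_grad` directly, and `expected` is an illustrative name.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
out.backward()

# Analytic gradient: out = (1/4) * sum(3 * (x + 2)**2), so d(out)/dx = 1.5 * (x + 2).
expected = 1.5 * (x.detach() + 2)

print(x.grad)
print(torch.allclose(x.grad, expected))
```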

Continue with Part 2: Neural Networks