Chapter 9, Convolutional Network

A Convolutional Network for recognising handwritten digits

Based on code from the Flux model zoo, which can be found at

https://github.com/FluxML/model-zoo/blob/master/vision/mnist/conv.jl

Load packages and data

First we load the required packages:

using Flux, Flux.Data.MNIST, Images, Statistics
using Flux: onehotbatch, onecold, crossentropy, throttle
using Base.Iterators: repeated, partition
using Printf
using Plots

Now we read in the data:

labels = MNIST.labels();
images = MNIST.images();

length(labels)

60000

typeof(images[1])

Array{Gray{Normed{UInt8,8}},2}

display(images[1])

size(images[1])

(28, 28)

labels[1]

Batch the data

# Bundle images together with labels and group into minibatches
function make_minibatch(X, Y, idxs)
    X_batch = Array{Float32}(undef, size(X[1])..., 1, length(idxs))
    for i in 1:length(idxs)
        X_batch[:, :, :, i] = Float32.(X[idxs[i]])
    end
    Y_batch = onehotbatch(Y[idxs], 0:9)
    return (X_batch, Y_batch)
end

make_minibatch (generic function with 1 method)
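
To make the array layout produced by `make_minibatch` concrete, here is a minimal Flux-free sketch. The helper names `stack_images` and `onehot_matrix` are hypothetical stand-ins (they are not part of Flux), and the images and labels are made-up example data. The sketch only illustrates the assumed shapes: images go into a height × width × channels × batch array, and labels into a classes × batch one-hot matrix.

```julia
# Hypothetical Flux-free helpers mirroring the layout used by make_minibatch.
function stack_images(imgs::Vector{<:AbstractMatrix})
    h, w = size(imgs[1])
    batch = Array{Float32}(undef, h, w, 1, length(imgs))
    for (i, img) in enumerate(imgs)
        batch[:, :, 1, i] = Float32.(img)   # single grayscale channel
    end
    return batch
end

# One row per class, one column per sample; a single `true` marks the label.
onehot_matrix(labels, classes) = [c == l for c in classes, l in labels]

imgs = [rand(28, 28) for _ in 1:4]      # four fake 28x28 images
X = stack_images(imgs)                  # size (28, 28, 1, 4)
Y = onehot_matrix([5, 0, 4, 1], 0:9)    # size (10, 4)
```

Because the classes run from 0 to 9, label 5 is marked in row 6 of its column.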

# Prepare train set as a collection of minibatches:
train_images = MNIST.images(1:10000)
train_labels = MNIST.labels(1:10000)
batch_size = 500
mb_idxs = partition(1:length(train_images), batch_size)
train_set = [make_minibatch(train_images, train_labels, i) for i in mb_idxs];

length(train_images)

10000
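
As a small illustration of how `partition` splits the index range, the sketch below chunks a short range and then repeats the arithmetic for the training set above (10000 indices, batches of 500). The variable names are illustrative only.

```julia
using Base.Iterators: partition

# Small case: 10 indices in chunks of 4; the last chunk is shorter.
small = [collect(p) for p in partition(1:10, 4)]

# Same arithmetic as the training set: 10000 indices in chunks of 500.
n_batches = length(collect(partition(1:10000, 500)))
```

With these sizes the training set is split into 20 minibatches, each holding 500 images.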

# Prepare test set as one giant minibatch:
test_images = MNIST.images(:test)
test_labels = MNIST.labels(:test)
test_set = make_minibatch(test_images, test_labels, 1:length(test_images));

length(test_images)

10000

Set up the convolutional model

# Define our model.  
# We will use a simple convolutional architecture with
# three iterations of Conv -> ReLU -> MaxPool, 
# followed by a final Dense layer that
# feeds into a softmax probability output.
@info("Constructing model...")
model = Chain(
    # First convolution, operating upon a 28x28 image
    Conv((3, 3), 1=>16, pad=(1,1), relu),
    MaxPool((2,2)),

    # Second convolution, operating upon a 14x14 image
    Conv((3, 3), 16=>32, pad=(1,1), relu),
    MaxPool((2,2)),

    # Third convolution, operating upon a 7x7 image
    Conv((3, 3), 32=>32, pad=(1,1), relu),
    MaxPool((2,2)),

    # Reshape the 4d tensor into a 2d one.
    # At this point it should be (3, 3, 32, N),
    # which is where we get the 288 = 3 * 3 * 32
    # in the `Dense` layer below:
    x -> reshape(x, :, size(x, 4)),
    Dense(288, 10),

    # Finally, softmax to get nice probabilities
    softmax,
)

Chain(Conv((3, 3), 1=>16, NNlib.relu), MaxPool((2, 2), pad = (0, 0, 0, 0), stride = (2, 2)), Conv((3, 3), 16=>32, NNlib.relu), MaxPool((2, 2), pad = (0, 0, 0, 0), stride = (2, 2)), Conv((3, 3), 32=>32, NNlib.relu), MaxPool((2, 2), pad = (0, 0, 0, 0), stride = (2, 2)), getfield(Main, Symbol("##5#6"))(), Dense(288, 10), NNlib.softmax)
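
The 288 inputs of the `Dense` layer follow from the spatial sizes noted in the model comments. The sketch below redoes that arithmetic in plain Julia, assuming the usual output-size formula floor((n + 2·pad − k) / stride) + 1 for both convolution and pooling. `out_size` and `feature_size` are hypothetical helper names, not Flux functions.

```julia
# Output length along one spatial dimension for a window of size k,
# padding `pad` on each side, and stride `stride`.
out_size(n, k; pad=0, stride=1) = fld(n + 2 * pad - k, stride) + 1

function feature_size(n)
    for _ in 1:3
        n = out_size(n, 3; pad=1)       # 3x3 Conv with pad 1 keeps the size
        n = out_size(n, 2; stride=2)    # 2x2 MaxPool with stride 2 halves it, rounding down
    end
    return n
end

side = feature_size(28)          # 28 -> 14 -> 7 -> 3
n_features = side * side * 32    # 32 channels after the last convolution
```

The last pooling step maps 7 to 3 because the odd row and column are dropped, which gives 3 * 3 * 32 = 288.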

# Make sure our model is nicely precompiled 
# before starting our training loop
model(train_set[1][1])

Tracked 10×500 Array{Float32,2}:
 0.116719     0.0890316    0.352937    …  0.0759168   0.482644     0.286681
 0.00205442   0.000673691  0.00150963     0.00144273  0.000346373  0.00220477
 0.7098       0.356216     0.190664       0.610095    0.206334     0.414075
 0.00123294   0.0031064    0.0132832      0.00115622  0.00141644   0.00428468
 0.0138433    0.0115494    0.0220595      0.00517217  0.0137602    0.020465
 0.000423318  0.0286348    0.00882138  …  0.00600064  0.00366313   0.00350024
 0.0463217    0.16093      0.13481        0.113303    0.0252069    0.0294534
 0.096597     0.252758     0.196833       0.172944    0.192404     0.220245
 0.00135532   0.0573738    0.0187412      0.00272084  0.00606261   0.00497499
 0.0116537    0.0397263    0.0603413      0.0112496   0.0681625    0.0141164
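
Each column of this output is a probability distribution over the ten digits for one image, since the last layer is a softmax. As a rough sketch of what a column-wise softmax computes, here is a hand-written version in plain Julia. `softmax_cols` is a hypothetical name for illustration and is not the NNlib implementation.

```julia
# Column-wise softmax; subtracting the column maximum avoids overflow in exp.
function softmax_cols(z::AbstractMatrix)
    e = exp.(z .- maximum(z, dims=1))
    return e ./ sum(e, dims=1)
end

scores = [2.0 0.0; 1.0 0.0; 0.0 0.0]   # 3 classes x 2 samples
probs = softmax_cols(scores)           # every column sums to 1
```

A column of equal scores, like the second one here, maps to a uniform distribution.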

Define the loss function and select an optimiser

loss(x, y) = Flux.crossentropy(model(x), y)  # crossentropy already returns a scalar
opt = ADAM(0.001) # Momentum(0.01)
accuracy(x, y) = mean(Flux.onecold(model(x), 1:10) .== Flux.onecold(y, 1:10))
n_epochs = 1
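
To show what the loss and accuracy measure, the following sketch computes both by hand on a toy batch. It assumes the cross-entropy is averaged over the batch dimension, which may differ between Flux versions. `cross_entropy` and `predicted_class` are hypothetical helpers, not Flux functions, and the numbers are made up.

```julia
# Mean cross-entropy over the columns (samples) of yhat, given one-hot targets y.
cross_entropy(yhat, y) = -sum(y .* log.(yhat)) / size(y, 2)

# Index of the largest entry in each column, i.e. the predicted class.
predicted_class(yhat) = [argmax(view(yhat, :, j)) for j in 1:size(yhat, 2)]

yhat = [0.7 0.2; 0.2 0.5; 0.1 0.3]   # 3 classes x 2 samples, columns sum to 1
y = [1 0; 0 1; 0 0]                  # true classes: 1 and 2
l = cross_entropy(yhat, y)
acc_toy = sum(predicted_class(yhat) .== predicted_class(y)) / size(y, 2)
```

Only the probability assigned to the true class of each sample contributes to the loss, so here it equals -(log(0.7) + log(0.5)) / 2.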

Train the model

Training for one epoch takes about 3 minutes without a GPU. Accuracy on the test set reaches 50% after 1 epoch, 77% after 2, 87% after 3, 91% after 4, and 93% after 5.

@Flux.epochs n_epochs Flux.train!(
	loss, params(model), train_set, opt,
	cb=() -> @show accuracy(test_set...)
)
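
The structure of this call (repeat for several epochs, visit every minibatch, update the parameters, then run the callback) can be sketched without Flux. The example below is a toy stand-in under stated assumptions: `train_epoch!` is a hypothetical function, the model is a single scalar weight fitted by plain gradient descent on a squared error, and the data are made up. It is not how `Flux.train!` is implemented.

```julia
# Toy data: y = 3x, split into minibatches of (x, y) tuples.
xs = collect(1.0:8.0)
ys = 3.0 .* xs
batches = [(xs[i:i+1], ys[i:i+1]) for i in 1:2:7]

# Hypothetical stand-in for one training pass over all minibatches.
function train_epoch!(w::Vector{Float64}, batches, lr; cb = () -> nothing)
    for (x, y) in batches
        # Gradient of mean((w*x - y)^2) with respect to the scalar weight w[1]
        grad = 2 * sum((w[1] .* x .- y) .* x) / length(x)
        w[1] -= lr * grad
        cb()   # called once per minibatch
    end
    return w
end

w = [0.0]
n_calls = Ref(0)
for epoch in 1:50
    train_epoch!(w, batches, 0.01; cb = () -> (n_calls[] += 1))
end
```

In this toy setting the weight converges to 3, and the callback runs once per minibatch, so 4 batches times 50 epochs gives 200 calls.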

Display the results

pred_test_labels = Flux.onecold(model(test_set[1]), 1:10)
true_test_labels = Flux.onecold(test_set[2], 1:10)
acc = mean(pred_test_labels .== true_test_labels)
cm = zeros(Int64, 10, 10)
for i in 1:length(pred_test_labels)
      cm[pred_test_labels[i],true_test_labels[i]] += 1
end

cm

10×10 Array{Int64,2}:
 969     0    7    1    2    5   46    4   10    6
   0  1116    0    1    3    1   19    5    8    8
   2     7  986   19    9    2    5   27   23   10
   2     6    7  962    0   40    0   10   23   12
   0     1    4    0  908    0    5    4    8    5
   2     0    1    8    1  820   11    1   21   16
   0     3    0    0    6    2  854    0    4    0
   2     1   16   11    7    4    0  966   26   33
   3     1   10    4    9   12   18    1  840   17
   0     0    1    4   37    6    0   10   11  902
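
A confusion matrix stored this way (rows indexed by predicted label, columns by true label) also gives the overall accuracy and the per-class recall. The sketch below shows this on a made-up three-class example. The variable names are illustrative and the data are not from the MNIST run.

```julia
using LinearAlgebra: tr, diag

pred  = [1, 2, 2, 3, 3, 3]   # made-up predicted classes
truth = [1, 2, 3, 3, 3, 1]   # made-up true classes

# Rows are predicted labels, columns are true labels, as in `cm` above.
cm_toy = zeros(Int, 3, 3)
for (p, t) in zip(pred, truth)
    cm_toy[p, t] += 1
end

accuracy_toy = tr(cm_toy) / sum(cm_toy)              # correct predictions / all predictions
recall_toy = diag(cm_toy) ./ vec(sum(cm_toy, dims=1))  # per true class: correct / column total
```

Off-diagonal entries show which classes are confused with each other: entry (i, j) counts samples of true class j that were predicted as class i.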

# Rows of cm are predicted labels (y axis), columns are true labels (x axis)
p2 = heatmap(cm, c=:dense, title="Confusion Matrix, accuracy = "*string(acc), xlabel="True label", ylabel="Predicted label", xticks=(1:10, 0:9), yticks=(1:10, 0:9))
