Convolutional networks
This is an annotated adaptation of Keras's example CNN code (originally written for MNIST), showing how to implement convolutional networks.
The CIFAR-10 classification task is a classic machine learning benchmark. The dataset contains 60,000 small color images belonging to 10 classes (50,000 for training and 10,000 for testing), and the task is to identify which class each image belongs to. Along with MNIST, CIFAR-10 classification is a sort of "hello world" for computer vision and convolutional networks, and a solution can be implemented quickly with an off-the-shelf machine learning library.
Since convolutional neural networks have thus far proven to be the best-performing models for computer vision tasks, we'll use the Keras library to implement a convolutional network as our solution. Keras provides a well-designed, readable API on top of a TensorFlow backend, so we'll be done in a surprisingly small number of steps!
Note: if you have been running these notebooks on a regular laptop without a GPU until now, it's going to become more and more difficult to do so. The neural networks we will be training, starting with convolutional networks, are increasingly memory- and processing-intensive and may be slow on laptops without a capable GPU.
%matplotlib inline
import os
import matplotlib.pyplot as plt
import numpy as np
import random

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Conv2D, MaxPooling2D, Flatten
from keras.layers import Activation
Recall that a basic neural network in Keras can be set up like this:
model = Sequential()
model.add(Dense(100, activation='sigmoid', input_dim=3072))
model.add(Dense(100, activation='sigmoid'))
model.add(Dense(10, activation='softmax'))
model.summary()
We load the CIFAR-10 dataset and reshape each image into an unrolled vector.
from keras.datasets import cifar10

# load CIFAR
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
num_classes = 10

# reshape CIFAR images into unrolled vectors
x_train = x_train.reshape(50000, 32*32*3)
x_test = x_test.reshape(10000, 32*32*3)

# make float32
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# normalize to (0-1)
x_train /= 255
x_test /= 255

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

print('%d train samples, %d test samples' % (x_train.shape[0], x_test.shape[0]))
print("training data shape: ", x_train.shape, y_train.shape)
print("test data shape: ", x_test.shape, y_test.shape)
Let's see some of our samples.
# tile 6 rows of 16 randomly-chosen training images into one array
samples = np.concatenate([
    np.concatenate([
        x_train[int(random.random() * len(x_train))].reshape((32,32,3))
        for i in range(16)], axis=1)
    for j in range(6)], axis=0)

plt.figure(figsize=(16,6))
plt.imshow(samples)
We compile the model with categorical cross-entropy loss and train it for 30 epochs.
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=128,
          epochs=30,
          validation_data=(x_test, y_test))
Then we can evaluate the model.
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Not very good! With more training time, perhaps 100 epochs, we might get 40% accuracy.
Now onto convolutional networks... The general architecture of a convolutional neural network is:
- convolution layers, followed by pooling layers
- fully-connected layers
- a final fully-connected softmax layer
We'll follow this same basic structure and interweave some other components, such as [dropout](https://en.wikipedia.org/wiki/Dropout_(neural_networks)), to improve performance.
To begin, we start with our convolution layers. We first need to specify some architectural hyperparameters:
- How many filters do we want for our convolution layers? Like most hyperparameters, this is chosen through a mix of intuition and tuning. A rough rule of thumb is: the more complex the task, the more filters. (Note that we don't need to use the same number of filters for each convolution layer, but we do so here for convenience.)
- What size should our convolution filters be? We don't want filters to be too large, or the resulting feature maps might not be very meaningful. For instance, a useless filter size in this task would be 32x32, since it covers the whole image. We also don't want filters to be too small for a similar reason, e.g. a 1x1 filter just returns each pixel.
- What size should our pooling window be? Again, we don't want pooling windows to be too large or we'll be throwing away information. However, for larger images, a larger pooling window might be appropriate (and the same goes for convolution filters).
We start by designing a neural network with two convolutional layers, each followed by a max-pooling layer, then a 100-neuron fully-connected layer and a 10-neuron output layer. We'll use 64 and 32 filters in the two convolutional layers, and make the input shape a full-sized image (32x32x3) instead of an unrolled vector (3072x1). We also now use ReLU activation units instead of sigmoids, to avoid vanishing gradients.
model = Sequential()

model.add(Conv2D(64, (3, 3), padding='same', input_shape=(32,32,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(100))
model.add(Activation('relu'))

model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.summary()
We need to reload the CIFAR-10 dataset, and this time we do not reshape the images into unrolled input vectors -- we leave them as 32x32x3 images, though we still normalize them.
from keras.datasets import cifar10

# load CIFAR
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
num_classes = 10

# do not reshape CIFAR if you have a convolutional input!

# make float32
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# normalize to (0-1)
x_train /= 255
x_test /= 255

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

print('%d train samples, %d test samples' % (x_train.shape[0], x_test.shape[0]))
print("training data shape: ", x_train.shape, y_train.shape)
print("test data shape: ", x_test.shape, y_test.shape)
Let's compile and train the model again.
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=128,
          epochs=30,
          validation_data=(x_test, y_test))
Let's evaluate the model again.
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
63% accuracy is a big improvement on 40%! All of that is accomplished in just 30 epochs using convolutional layers and ReLUs.
Let's try to make the network bigger.
model = Sequential()

model.add(Conv2D(128, (3, 3), padding='same', input_shape=(32,32,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(100))
model.add(Activation('relu'))

model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.summary()
Compile and train again.
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=128,
          epochs=50,
          validation_data=(x_test, y_test))
Evaluate test accuracy.
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
One problem you might notice is that the accuracy of the model is much better on the training set than on the test set. You can see that by monitoring the progress at the end of each epoch above or by evaluating it directly.
score = model.evaluate(x_train, y_train, verbose=0)
print('Training loss:', score[0])
print('Training accuracy:', score[1])
77% accuracy on the training set but only 68% on the test set. Looking at the monitored training, the validation accuracy and training accuracy began to diverge around epoch 10.
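One convenient way to see this divergence is to plot the per-epoch curves recorded by model.fit, which returns a History object. Below is a minimal sketch, assuming the training call above was changed to capture its return value as history; the metric key names depend on the Keras version ('acc'/'val_acc' in older releases, 'accuracy'/'val_accuracy' in newer ones).

# assumes the training call above was written as:
#   history = model.fit(x_train, y_train, batch_size=128, epochs=50,
#                       validation_data=(x_test, y_test))
# older Keras records 'acc'/'val_acc'; newer versions use 'accuracy'/'val_accuracy'
acc_key = 'acc' if 'acc' in history.history else 'accuracy'

plt.plot(history.history[acc_key], label='training accuracy')
plt.plot(history.history['val_' + acc_key], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()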
Something must be wrong! This is a symptom of "overfitting": our model has fit the training set a little too closely and does not generalize well to unseen data. This is a very common problem.
It's normal for the training accuracy to be somewhat better than the test accuracy, because it's hard to prevent the network from being better at predicting data it has already seen. But a 9% difference is too much.
One way of addressing this is regularization. We can add dropout to our model after a few of the layers.
model = Sequential()

model.add(Conv2D(128, (3, 3), padding='same', input_shape=(32,32,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dropout(0.25))

model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.summary()
We compile and train again.
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=128,
          epochs=50,
          validation_data=(x_test, y_test))
We check our test loss and training loss again.
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

score = model.evaluate(x_train, y_train, verbose=0)
print('Training loss:', score[0])
print('Training accuracy:', score[1])
Now our training accuracy is lower (72%) but our test accuracy is higher (69%). This is more like what we expect.
Another way of improving performance is to experiment with different optimizers beyond standard SGD. Let's instantiate the same network, but train it with Adam instead of SGD.
model = Sequential()

model.add(Conv2D(128, (3, 3), padding='same', input_shape=(32,32,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dropout(0.25))

model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.summary()
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=128,
          epochs=50,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

score = model.evaluate(x_train, y_train, verbose=0)
print('Training loss:', score[0])
print('Training accuracy:', score[1])
78% accuracy! Our best yet. It looks heavily overfit, though (99% accuracy on the training set... maybe it needs more dropout?).
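One way to try adding more dropout is sketched below: the same architecture with a Dropout layer after each pooling layer and a heavier one before the output. The 0.25/0.5 rates here are illustrative guesses, not tuned values.

model = Sequential()

model.add(Conv2D(128, (3, 3), padding='same', input_shape=(32,32,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))   # dropout after the first pooling layer

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))   # dropout after the second pooling layer

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))   # dropout after the third pooling layer

model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dropout(0.5))    # heavier dropout before the output layer

model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.summary()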
Still a long way to go to beat the record (96%). We can make a lot of progress by making the network (much) bigger, training for (much) longer, and using a lot of little tricks (like the data augmentation sketched below), but that is beyond the scope of this lesson for now.
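As a small taste of one such trick, here is a minimal sketch of data augmentation with Keras's ImageDataGenerator. The particular shifts and flips are illustrative choices, and the fit_generator call matches older Keras versions (newer ones accept the generator directly in model.fit).

from keras.preprocessing.image import ImageDataGenerator

# randomly shift and flip training images on the fly
datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

# train on augmented batches (older Keras uses fit_generator;
# newer versions accept a generator directly in model.fit)
model.fit_generator(datagen.flow(x_train, y_train, batch_size=128),
                    steps_per_epoch=len(x_train) // 128,
                    epochs=50,
                    validation_data=(x_test, y_test))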
Let's also recall how to predict a single value and look at its probabilities.
x_sample = x_test[0].reshape(1,32,32,3)
y_prob = model.predict(x_sample)[0]
y_pred = y_prob.argmax()
y_actual = y_test[0].argmax()

print("predicted = %d, actual = %d" % (y_pred, y_actual))
plt.bar(range(10), y_prob)
Let's also review how to save and load trained Keras models. It's easy! From the Keras documentation:
from keras.models import load_model

model.save('my_model.h5')  # creates a HDF5 file 'my_model.h5'
del model  # deletes the existing model

# returns a compiled model identical to the previous one
model = load_model('my_model.h5')