In this post I compare the performance of models that use max pooling and dropout in the convolutional layer with those that don’t. This experiment will be on a traffic sign classifier used in Udacity’s Self-Driving Car Nanodegree. The full code is on GitHub.
Recap: Max Pooling and Dropout
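Max Pooling: A way of reducing the dimensionality of the input, on the assumption that the exact position of a feature within a small region matters less than whether the feature is present. Max pooling takes the maximum of each non-overlapping region of the input.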
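To make the definition concrete, here is a minimal NumPy sketch of 2×2 max pooling on a single-channel array, with the stride equal to the pool size so the windows do not overlap. The helper name `max_pool_2d` is hypothetical. In Keras the same operation is the `MaxPooling2D` layer, whose stride defaults to `pool_size`.

```python
import numpy as np

def max_pool_2d(x, pool=2):
    """Non-overlapping max pooling over a 2-D array (stride == pool size).

    Rows/columns that do not fill a complete window are dropped,
    as with 'valid' padding.
    """
    h, w = x.shape
    h_out, w_out = h // pool, w // pool
    # Crop to a multiple of the pool size, split each axis into
    # (number of windows, window size), then take the max inside each window.
    cropped = x[:h_out * pool, :w_out * pool]
    windows = cropped.reshape(h_out, pool, w_out, pool)
    return windows.max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [3, 2, 4, 6]])
print(max_pool_2d(x))
# [[4 5]
#  [3 7]]
```

Each output value is the largest entry of one 2×2 block, so a 4×4 input becomes 2×2: a quarter of the values survive.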
Dropout: During training, nodes (along with their incoming and outgoing weights) are dropped out at random with a fixed probability, the dropout rate. Only the reduced network is trained on the data at that stage. This is expected to decrease overfitting and improve training time.
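Keras's `Dropout` layer uses the "inverted" form of dropout: surviving activations are scaled up by 1 / (1 - rate) during training, so nothing has to change at test time. Below is a minimal NumPy sketch of that idea. The function `dropout` and its `rate`/`training` arguments are illustrative, not a library API. `rate` plays the same role as `dropout_conv` and `dropout_fc` in the model code below.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` during
    training and scale the survivors by 1 / (1 - rate), so the expected
    activation is unchanged. At inference the input passes through as-is."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True = unit kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((4, 5))
out = dropout(a, rate=0.5, rng=rng)
# Every entry is either 0.0 (dropped) or 2.0 (kept and rescaled by 1 / 0.5).
print(out)
```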
(Links to further reading at the end of this post.)
In this example, we train a convolutional neural network with three hidden layers to classify traffic signs: one convolutional layer followed by two fully connected layers, plus an output layer. The code for the network with pooling and dropout is given below. The remaining three networks can be obtained by removing the pooling or dropout code.
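# Imports (tf.keras assumed; the layer code below is a reconstruction)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Activation, Conv2D, Dense, Dropout,
                                     Flatten, MaxPooling2D)
# Network training parameters
nb_epoch = 100
batch_size = 100
# Convnet parameters
nb_filters = 32
kernel_size = (3, 3)
input_shape = (32, 32, 3)
pool_size = (2, 2)
dropout_conv = 0.2
dropout_fc = 0.5
fc_units = 128  # assumption: the fully connected layer widths are not given
# Build model
model = Sequential()
# Layer 1: Conv layer (ReLU assumed), followed by max pooling and dropout
model.add(Conv2D(nb_filters, kernel_size, padding='valid', input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(dropout_conv))
model.add(Flatten())
# Layer 2: Fully connected layer 1
model.add(Dense(fc_units))
model.add(Activation('relu'))
model.add(Dropout(dropout_fc))
# Layer 3: Fully connected layer 2
model.add(Dense(fc_units))
model.add(Activation('relu'))
model.add(Dropout(dropout_fc))
# Layer 4: Output layer
# 43 output units, one per traffic-sign class
model.add(Dense(43))
# Softmax to compute probabilities (model.add returns None, so don't assign it)
model.add(Activation('softmax'))
# Compile and train the model. The optimizer and loss are assumptions;
# y_train and y_val are assumed to be one-hot encoded.
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train, batch_size=batch_size, epochs=nb_epoch,
                    verbose=1, validation_data=(X_val, y_val))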
We train four networks: one with neither pooling nor dropout, one with only pooling, one with only dropout, and one with both pooling and dropout.
Data: There are 43 classes (types of traffic signs) in total. We have 39209 training samples and 12630 test samples. The 39209 training samples are further split into 31367 samples for training and 7842 for validation.
Here are some examples of traffic sign images we want to classify. We will use normalised data.
Carrying out the experiment
To make our results reproducible, I shuffled the training and test data with
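The 31367/7842 split reported in the training-speed section is consistent with holding out 20% of the 39209 training samples, rounded up. The post does not show how the shuffling was done, so the following is only a minimal sketch, assuming NumPy, a fixed seed and an 80/20 split. The function name `shuffled_split` and the seed value are hypothetical.

```python
import numpy as np

SEED = 42  # hypothetical seed; the post does not state which one was used

def shuffled_split(X, y, val_fraction=0.2, seed=SEED):
    """Shuffle X and y with the same seeded permutation, then hold out the
    last ceil(val_fraction * n) samples for validation."""
    n = len(X)
    perm = np.random.default_rng(seed).permutation(n)
    X, y = X[perm], y[perm]
    n_val = int(np.ceil(val_fraction * n))
    return X[:-n_val], X[-n_val:], y[:-n_val], y[-n_val:]

# Placeholder arrays with the size of the 39209-sample training set.
X = np.arange(39209)
y = np.arange(39209) % 43
X_train, X_val, y_train, y_val = shuffled_split(X, y)
print(len(X_train), len(X_val))  # 31367 7842
```

Because the permutation comes from a seeded generator, rerunning the script reproduces the same split, and applying one permutation to both arrays keeps every image paired with its label.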
I created copies of the data that were normalised and standardised. I then trained the model for 100 epochs on each version of the data with a batch size of 100.
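The post does not define the two preprocessing schemes. This sketch assumes a common reading: "normalised" means pixel values rescaled from [0, 255] to [0, 1], and "standardised" means each colour channel shifted to zero mean and unit variance. Both function names are hypothetical.

```python
import numpy as np

def normalise(images):
    """Min-max scale uint8 pixel values from [0, 255] to [0, 1]."""
    return images.astype(np.float32) / 255.0

def standardise(images):
    """Shift and scale each colour channel to zero mean and unit variance,
    using statistics computed over every image and pixel in the array."""
    images = images.astype(np.float32)
    mean = images.mean(axis=(0, 1, 2), keepdims=True)
    std = images.std(axis=(0, 1, 2), keepdims=True)
    return (images - mean) / (std + 1e-8)  # epsilon guards against std == 0

# Hypothetical batch of 8 random RGB images with input_shape (32, 32, 3).
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(8, 32, 32, 3)).astype(np.uint8)
norm = normalise(batch)
stdz = standardise(batch)
print(norm.min() >= 0.0, norm.max() <= 1.0)  # True True
print(np.round(stdz.mean(axis=(0, 1, 2)), 3))  # roughly zero per channel
```

In practice the channel means and standard deviations would be computed on the training split only and then reused for the validation and test data.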
Differences in convergent validation accuracies
Putting the networks in descending order of accuracy, we have
- Pooling and dropout (0.9942)
- Pooling with no dropout (0.9902)
- Dropout with no pooling (0.9896)
- No pooling or dropout (0.9870)
(Numbers in parentheses are means of validation accuracies for each network in epochs 80-100.)
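The "convergent" figures average a metric over the last 21 epochs (80 to 100 inclusive). Here is a minimal sketch of that calculation in plain Python. It assumes the per-epoch accuracies are available as a list, for example from `history.history` returned by the `model.fit` call above. The helper name `convergent_mean` is hypothetical.

```python
def convergent_mean(per_epoch_values, first_epoch=80, last_epoch=100):
    """Mean of a per-epoch metric over epochs first_epoch..last_epoch inclusive.

    Epochs are counted from 1, so epoch k is stored at list index k - 1.
    """
    window = per_epoch_values[first_epoch - 1:last_epoch]
    if not window:
        raise ValueError("not enough epochs recorded")
    return sum(window) / len(window)

# With a Keras History object the key is 'val_accuracy' in recent versions
# and 'val_acc' in older ones, e.g.:
#   convergent_mean(history.history['val_accuracy'])
toy = [0.0] * 79 + [1.0] * 21  # 100 epochs; epochs 80-100 are all 1.0
print(convergent_mean(toy))  # 1.0
```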
This is unsurprising. Adding pooling and dropout makes the network generalise better (compare with the training accuracy orderings below). Notably, the networks with only pooling or only dropout perform similarly: their validation accuracies differ by only 0.0006 (0.06 percentage points).
Differences in convergent training accuracies
The differences in convergent training accuracies are much smaller than differences in validation accuracies. All the convergent accuracies are above 0.995 and within 0.2% of each other. Putting the networks in descending order of accuracy, we have
- Pooling with no dropout (0.9971)
- No pooling or dropout (0.9965)
- Dropout with no pooling (0.9958)
- Pooling and dropout (0.9951)
(Numbers in parentheses are means of training accuracies for each network in epochs 80-100.)
Pooling with dropout has the highest convergent validation accuracy but also the lowest convergent training accuracy, and the difference between the two is only 0.0009 (0.09 percentage points). This suggests the remaining three networks may overfit more. We examine overfitting in each network in more depth below.
Differences in early training (accuracies in the first 10 epochs)
For completeness, here are the training accuracies for the first 10 epochs. The accuracies increase quickly in the first 3 epochs. The orderings do not deviate wildly from convergent accuracy orderings.
Differences between training and validation accuracy per network (overfitting)
The gap between training and validation accuracy is much smaller for the network with pooling and dropout than for the other three networks. For this network there is no consistent gap.
The networks with only dropout or only pooling come roughly joint second, each with a consistent gap of about 0.006 to 0.007.
The network with neither pooling nor dropout has the largest consistent gap, about 0.01.
Mean of (training accuracy minus validation accuracy) over epochs 80-100, smallest gap first:
- Pooling and dropout (0.0009)
- Dropout but no pooling (0.0061)
- Pooling but no dropout (0.0069)
- No pooling or dropout (0.0094)
(Note: Be wary of the differences in y-axis scale across the graphs.)
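Here is a minimal sketch of the gap calculation. It assumes per-epoch training and validation accuracies are available as lists, for example from `history.history`. The function name `mean_gap` and the toy curves are hypothetical. Because a mean is linear, the mean gap equals the difference between the convergent training and validation means listed earlier, up to rounding of those reported numbers.

```python
def mean_gap(train_acc, val_acc, first_epoch=80, last_epoch=100):
    """Mean of (training accuracy - validation accuracy) over epochs
    first_epoch..last_epoch inclusive, with epochs counted from 1."""
    pairs = list(zip(train_acc, val_acc))[first_epoch - 1:last_epoch]
    if not pairs:
        raise ValueError("not enough epochs recorded")
    return sum(t - v for t, v in pairs) / len(pairs)

# Hypothetical 100-epoch curves: the gap is 0.05 up to epoch 79, then 0.01.
train = [0.99] * 100
val = [0.94] * 79 + [0.98] * 21
print(round(mean_gap(train, val), 4))  # 0.01

# Difference of the reported convergent means for pooling and dropout.
print(round(0.9951 - 0.9942, 4))  # 0.0009
```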
Differences in training speed
In each epoch, each network was trained on 31367 samples and validated on 7842 samples. The training times per epoch are as follows:
- No pooling or dropout: 10s
- Pooling with no dropout: 5s
- Dropout with no pooling: 11s
- Pooling and dropout: 5s
Pooling seems to reduce training time by about 50%: from 10s to 5s per epoch without dropout, and from 11s to 5s with dropout.
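One plausible contributor to this speed-up is that 2×2 pooling shrinks the input to the first fully connected layer by a factor of four. The back-of-the-envelope sketch below estimates that layer's parameter count with and without pooling. It assumes unpadded ('valid') convolution, which is the Keras `Conv2D` default, and a hypothetical first fully connected layer of 128 units, since the post does not give the layer widths. It is an estimate, not a profile of the runs above.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial size after a 'valid' (unpadded) convolution or pooling window."""
    return (size - kernel) // stride + 1

def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

nb_filters, fc1_units = 32, 128  # fc1_units is an assumption, not from the post
conv = conv_output_size(32, 3)             # 30 x 30 feature maps
pooled = conv_output_size(conv, 2, 2)      # 15 x 15 after 2x2 pooling
flat_no_pool = conv * conv * nb_filters    # 28800 features
flat_pool = pooled * pooled * nb_filters   # 7200 features
print(dense_params(flat_no_pool, fc1_units))  # 3686528
print(dense_params(flat_pool, fc1_units))     # 921728
```

Roughly four times fewer weights and multiply-adds in that layer is consistent with the observed speed-up. The measured gain is closer to 2× than 4×, which would be expected if other costs, such as the convolution itself, are unaffected by pooling.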
Stay tuned for a post explaining code for a Convolutional Neural Network in TensorFlow.