# Traffic Sign Classifier: Normalising Data

Jessica Yung

In this post, we’ll talk about (1) what normalising data is, (2) why you might want to do it, and (3) how you can do it (with examples).

#### Background: The Mystery of the Horrifically Inaccurate Model

Let me tell you a story. Once upon a time, I trained a few models to classify traffic signs for Udacity’s Self-Driving Car Nanodegree. I first copied neural networks from TensorFlow tutorials and adapted them for my image set. The tutorials described their 91% accuracy on handwritten digits as embarrassing, so I thought that my models, though trained on a different dataset, would probably still achieve an accuracy above 50%. On the student Slack chats, I’d seen people get over 95% accuracy with fairly straightforward models.

Guess what accuracy my model had? 5-6%.

FIVE TO SIX PERCENT. It was in that range after 2 iterations and after over 120 iterations.

Something was wrong.

Fortunately, I had a mini-project later on where I trained models to classify traffic signs using Keras. And in that project, I (1) normalised the data and (2) did not use convolutions (a type of neural network layer we’ll get to later). And guess what! I got an accuracy of over 60% in two iterations. So maybe – just maybe – normalising the data will be a quick fix!

#### Normalising data

Preview: WELCOME TO THE ZOMBIE APOCALYPSE. The original image in the top left looks more normal, but the transformed image on the right is the normalised one. 😉

#### What does normalising the data mean?

Normalisation scales all numeric variables to the range [0,1]. You can implement this with $x_{new} = \frac{x-x_{min}}{x_{max}-x_{min}}$, where $x_{min}$ and $x_{max}$ are the smallest and largest values in the data.

• Disadvantage: If you have outliers in your dataset (e.g. one datapoint with value 10,000 when all the others are between 0 and 100), normalising your data will scale most of the data to a very small interval. Most datasets have outliers.
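To make the outlier problem concrete, here is a minimal NumPy sketch of the formula above. The function name `min_max_normalise` and the sample values are made up for illustration; the guard for constant input is an extra assumption, since the formula itself would divide by zero in that case.

```python
import numpy as np

def min_max_normalise(x):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    x = np.asarray(x, dtype=np.float64)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Every value is identical, so the formula would divide by zero.
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

# Made-up data: values between 0 and 100, plus a single outlier at 10,000.
values = np.array([0.0, 25.0, 50.0, 100.0, 10_000.0])
scaled = min_max_normalise(values)
print(scaled)
```

Everything except the outlier ends up at or below 0.01, which is the "very small interval" problem described above.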

Another common preprocessing technique is standardisation.

Standardisation transforms your data to have a mean (average) of zero and a variance of one. You can implement this with $x_{new} = \frac{x-\mu}{\sigma}$, where $\mu$ is the mean of the data and $\sigma$ is its standard deviation.

• Variance is the standard deviation $\sigma$ squared. The standard deviation is a measure of how spread out the data are: roughly, how far a typical datapoint is from the mean (average).
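As a quick check on the standardisation formula, here is a small NumPy sketch. The name `standardise` and the sample values are illustrative, and it assumes the population standard deviation (NumPy's default, `ddof=0`).

```python
import numpy as np

def standardise(x):
    """Shift to zero mean and scale to unit variance."""
    x = np.asarray(x, dtype=np.float64)
    sigma = x.std()  # population standard deviation (ddof=0)
    if sigma == 0:
        # Constant data has no spread, so there is nothing to scale.
        return np.zeros_like(x)
    return (x - x.mean()) / sigma

# Made-up data with mean 5 and standard deviation 2.
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
z = standardise(values)

print(z)                  # each value is now "how many sigmas from the mean"
print(z.mean(), z.var())  # approximately 0 and 1
```

For example, the value 9 is two standard deviations above the mean of 5, so it becomes 2.0.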

Why do we care about standard deviations?

When data are normally (read: prettily) distributed, about two-thirds (roughly 68%) of the data lie within one standard deviation of the mean.

Normal distribution. Image creds: StatisticsHowto

The normal distribution is special because it turns out that almost no matter how your data is distributed (as long as its variance is finite), the means of repeated samples become approximately normally distributed as the sample size gets large (Central Limit Theorem). It’s like magic.
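You can watch this happen in a small simulation. The sketch below is only an illustration: the exponential distribution, the sample size of 50, and the helper name `fraction_within_one_std` are all arbitrary choices. It compares how much of the raw (skewed) data sits within one standard deviation of the mean with how many of the sample means do.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def fraction_within_one_std(x):
    """Fraction of values less than one standard deviation from the mean."""
    x = np.asarray(x, dtype=np.float64)
    return np.mean(np.abs(x - x.mean()) < x.std())

# 10,000 samples, each made of 50 draws from a skewed (exponential) distribution.
samples = rng.exponential(scale=1.0, size=(10_000, 50))

raw_fraction = fraction_within_one_std(samples.ravel())        # the raw draws
mean_fraction = fraction_within_one_std(samples.mean(axis=1))  # the 10,000 sample means

print(f"raw data:     {raw_fraction:.3f}")
print(f"sample means: {mean_fraction:.3f}")
```

The raw draws are far from normal, so their fraction is well above two-thirds, while the sample means should land close to the 68% you would expect from a normal distribution.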

Aside 1: If you’ve worked with normal distributions, you might think that normalisation is standardisation because standardisation is how you get the Z-statistic. I did.

Aside 2: A normal distribution is also called a Gaussian distribution. The shape is referred to as a bell curve.

#### Why might we normalise the data?

The same range of values for each of the inputs to the neural network can guarantee stable convergence of weights and biases. (Source: Mahmoud Omid on ResearchGate) In practice, “guarantee” is a bit strong: a shared input range helps training converge stably rather than ensuring it.

Suppose we have one image that’s really dark (almost all black) and one that’s really bright (almost all white). Our model has to address both cases using the same parameters (weights and biases). It’s hard for our model to be accurate and generalise well if it has to tackle both extreme cases.

#### How do we normalise (or standardise) the data?

We just translate the formulae given above into code:
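A minimal NumPy sketch of that translation follows. The function names `normalise_image` and `standardise_image` are illustrative rather than the ones used in the original project, and the random 32x32 RGB array is just a stand-in for a dark traffic sign image. It also assumes the statistics are computed over the whole image rather than per channel or over the whole dataset, which is one of several reasonable choices.

```python
import numpy as np

def normalise_image(image):
    """Min-max scale an image to [0, 1], using the whole image's min and max."""
    image = np.asarray(image, dtype=np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:
        # Flat image: avoid dividing by zero.
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)

def standardise_image(image):
    """Shift an image to zero mean and scale it to unit variance."""
    image = np.asarray(image, dtype=np.float32)
    sigma = image.std()
    if sigma == 0:
        return np.zeros_like(image)
    return (image - image.mean()) / sigma

# Stand-in for a dark image: pixel values only span 0..39 out of 0..255.
rng = np.random.default_rng(0)
dark_image = rng.integers(0, 40, size=(32, 32, 3), dtype=np.uint8)

norm = normalise_image(dark_image)
std = standardise_image(dark_image)

print(norm.min(), norm.max())  # stretched to the full [0, 1] range
print(std.mean(), std.std())   # approximately 0 and 1
```

Note that the standardised image contains negative values, so it usually needs rescaling before you can display it.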

#### Some examples

Top left: original. Top right: normalised. Bottom: standardised.

But at other times, normalising images brings out features we wouldn’t have been able to see otherwise.

Top left: original. Top right: normalised. Bottom: standardised.

You will also notice that the normalised representation of this image (top right) is different from the standardised representation. The normalised version is humanly readable but there is little contrast in the sign, whereas the standardised version has much more contrast. These are things you’d want to consider when choosing between normalisation and standardisation for preprocessing.

Bonus: Here’s the function plot_norm_images I used to quickly plot the normalised and un-normalised images next to each other.

#### Results

So time for the big reveal. Did normalising the data save the model?

No. After 15 epochs (passes over the training data), my model still had an accuracy of only 5.9%.

Okay, let’s try altering the network architecture next.

PS: We will compare the performance of un-normalised vs normalised data input to models later on, so stay tuned!
