- Take an input.
- Turn the input into logits using a linear model.
- Feed logits (scores) into softmax to turn them into probabilities.
- Turn the probabilities into a prediction, and measure how far they are from the one-hot encoded label using cross-entropy.
1) Take an input. This might mean converting the 50×50 pixel image into a 50×50 matrix of numbers, with each number representing the colour of one pixel.
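A minimal sketch of this step in NumPy (the image here is made up; a real pipeline would load actual pixel data):

```python
import numpy as np

# Stand-in for a 50x50 greyscale image: a 50x50 matrix of numbers,
# one number per pixel (random values play the role of pixel colours).
image = np.random.randint(0, 256, size=(50, 50))

# Models typically take the input as a single column of features,
# so we flatten the 50x50 matrix into a 2500x1 vector.
x = image.reshape(-1, 1)
print(x.shape)  # (2500, 1)
```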
2) Turn the input into scores for each category (frog, dog, log) using a model. This is where much of the action happens.
- Here we’re using a linear model. (Linear means a straight line.) In Udacity’s diagram, this is written as WX + b, where W is a matrix of weights, X is the input and b is the bias; a short code sketch follows this list.
- Think of the weights as the importance you assign to each feature (row of the input).
- The bias is independent of the features: if the bias for a row (category) is high, then the score for that row will be higher regardless of the input.
- Example: [Figure: an example of a linear model WX + b]
- Logits are scores. Each row of the score vector represents one label: e.g. frog, dog, log. The higher the score in the ‘Frog’ row, the more confident we are that the image is that of a frog.
- Think of vectors as a way of representing information in a column for now.
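Here is a rough sketch of WX + b in NumPy. The shapes and random values are assumptions for illustration (a 2500-pixel input and three categories), not Udacity’s actual numbers:

```python
import numpy as np

np.random.seed(0)

x = np.random.rand(2500, 1)           # flattened 50x50 image from step 1
W = np.random.randn(3, 2500) * 0.01   # one row of weights per category: frog, dog, log
b = np.zeros((3, 1))                  # one bias per category (row)

logits = W @ x + b                    # the score vector: one logit per label
print(logits.shape)  # (3, 1)
```

Each row of W holds the weights for one category, so multiplying by x produces one score per label in a single step.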
3) Convert these scores into probabilities. Higher scores translate into higher probabilities: a higher score for, e.g., ‘Frog’ means we’re more confident that the image is a frog, so we assign a higher probability to the image being a frog.
- Softmax is a function that turns our vector of scores into probabilities that sum to 1.
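A minimal softmax in NumPy (the logit values are made up for illustration):

```python
import numpy as np

def softmax(scores):
    # Subtracting the max before exponentiating is a standard trick for
    # numerical stability; it leaves the output unchanged.
    exps = np.exp(scores - np.max(scores))
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])  # made-up scores for frog, dog, log
probs = softmax(logits)
print(probs)        # roughly [0.659 0.242 0.099]: higher score, higher probability
print(probs.sum())  # 1.0
```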
4) Convert probabilities into predictions and compare them with the labels. One-hot encoding means that the label vector has value 1.0 for the row of the correct class (‘frog’) and is 0 everywhere else. So it’s ‘hot’ for precisely one row. So it’s ‘one-hot’. Easy. The model’s prediction is simply the row with the highest probability, and cross-entropy measures how far the predicted probabilities are from the one-hot label.
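Putting this last step into code, continuing the made-up numbers from the softmax sketch above (the cross-entropy here is the standard formula, shown as a sketch rather than any particular library’s implementation):

```python
import numpy as np

probs = np.array([0.659, 0.242, 0.099])  # softmax output for frog, dog, log

# One-hot encoded label: 1.0 in the row of the correct class ('frog'),
# 0 everywhere else.
label = np.array([1.0, 0.0, 0.0])

# The prediction is the highest-probability row.
prediction = np.argmax(probs)  # 0 -> 'frog'

# Cross-entropy measures how far the probabilities are from the one-hot
# label; it is small when the model is confident and correct.
loss = -np.sum(label * np.log(probs))
print(prediction, round(loss, 3))  # 0 0.417
```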

[Figure: Classification Process Outline, simplified]