A Classification Process Outlined in Simple Terms

Jessica Yung | Data Science

Classification is determining which category an input belongs to. An example is trying to determine if an image is of a frog, a dog or a log.
Udacity’s Deep Learning course gives a nice overview of one classification process (Multinomial Logistic Classification) with the following diagram and four steps (we’ll translate this in a moment):
  1. Take an input.
  2. Turn input into logits using a linear model.
  3. Feed logits (scores) into softmax to turn them into probabilities.
  4. Compare the softmax probabilities with one-hot encoded labels using cross-entropy, and take the most probable category as the prediction.

Udacity Deep Learning Course Lesson 1 Video 21: Cross-Entropy, 01:12

What does that mean?
Take the example of trying to determine whether an image is of a frog, a dog or a log. Then all we’re saying is:

1) Take an input. This might mean converting the 50×50 pixel image into a 50×50 matrix of numbers, with each number representing the colour of one pixel.
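
Here is a minimal sketch of this step. It assumes a greyscale image, so each pixel is a single number from 0 to 255. The pixel values are made up, and `flatten` is a hypothetical helper. Flattening is not mentioned above; it is included because linear models usually take the input as one long column of numbers.

```python
# Hypothetical 50x50 greyscale image: one number (0-255) per pixel.
# The values are made up for illustration.
HEIGHT, WIDTH = 50, 50
image = [[(row * WIDTH + col) % 256 for col in range(WIDTH)]
         for row in range(HEIGHT)]


def flatten(matrix):
    """Lay the rows of a matrix end to end to get one long input vector."""
    return [value for row in matrix for value in row]


x = flatten(image)
print(len(image), len(image[0]))  # 50 50
print(len(x))                     # 2500
```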

2) Turn the input into scores for each category (frog, dog, log) using a model. This is where much of the action happens.

  1. Here we’re using a linear model. (Linear means each score is a weighted sum of the input values plus a constant, like the equation of a straight line.) In Udacity’s diagram, this is written as WX + b, where W is a matrix of weights, X is the input and b is the bias.
    1. Think of the weights as the importance you assign to each feature (each row of the input).
    2. Bias is independent of features. If the bias for a row (category) is high, then the score for that category will be higher whatever the input is.
    3. Example:
      1. [Figure: An example of a linear model WX + b]

  2. Logits are scores. Each row of the score vector represents one label: e.g. frog, dog, log. The higher the score in the ‘Frog’ row, the more confident we are that the image is that of a frog.
  3. Think of vectors as a way of representing information in a column for now.
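
To make WX + b concrete, here is a small sketch with made-up numbers: three categories (frog, dog, log) and an input with only two features. The function name `linear_model` and all the values are assumptions for illustration, not taken from the course.

```python
def linear_model(W, x, b):
    """Return the logits Wx + b: one score per category (row of W)."""
    return [sum(w * xi for w, xi in zip(row, x)) + bias
            for row, bias in zip(W, b)]


labels = ["frog", "dog", "log"]

# Made-up weights: one row per category, one column per input feature.
W = [[1.0, 0.5],    # frog
     [0.5, 0.0],    # dog
     [-1.0, 0.5]]   # log
b = [0.0, 0.5, 0.5]  # made-up bias, one value per category
x = [1.0, 2.0]       # made-up input with two features

logits = linear_model(W, x, b)
for label, score in zip(labels, logits):
    print(label, score)
# frog 2.0
# dog 1.0
# log 0.5
```

The ‘frog’ row ends up with the highest score, so with these numbers the model leans towards frog.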

3) Convert these scores into probabilities. Higher scores translate into higher probabilities. This is because a higher score for, say, ‘Frog’ means we’re more confident that the image is a frog, so we assign a higher probability to the image being a frog.

  1. Softmax is a function that turns our vector of scores into probabilities that sum to 1.
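
As a rough sketch, softmax takes the exponential of each score and divides by the sum of all the exponentials, i.e. softmax(y)_i = exp(y_i) / sum_j exp(y_j). The logits below are made up, and subtracting the largest score first is a common numerical safeguard rather than something described above.

```python
import math


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    # Subtracting the largest score does not change the result,
    # but it keeps exp() from overflowing on large scores.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for frog, dog, log
probs = softmax(logits)
print([round(p, 3) for p in probs])  # roughly [0.629, 0.231, 0.14]
```

The highest score still gets the highest probability, and the three probabilities add up to 1 (up to floating-point rounding).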

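4) Convert probabilities into predictions. The prediction is the category with the highest probability. One-hot encoding means that the vector has value 1.0 for the row of one class (‘frog’) and is 0 everywhere else. So it’s ‘hot’ for precisely one row (the category the model thinks the input belongs to). So it’s ‘one-hot’. Easy. Cross-entropy then measures how far the softmax probabilities are from the one-hot label of the correct class.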
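
A minimal sketch of this last step, assuming the prediction is simply the category with the highest probability. It also computes cross-entropy in its usual form, -sum(label_i * log(prob_i)), which compares the probabilities with the one-hot label of the correct class. The probabilities and the helper names `one_hot` and `cross_entropy` are made up for illustration.

```python
import math

labels = ["frog", "dog", "log"]
probs = [0.7, 0.2, 0.1]  # made-up softmax output


def one_hot(index, size):
    """Vector that is 1.0 at `index` and 0.0 everywhere else."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def cross_entropy(probs, one_hot_label):
    """Distance between predicted probabilities and a one-hot label."""
    return -sum(l * math.log(p)
                for p, l in zip(probs, one_hot_label) if l > 0)


# Prediction: the row with the highest probability becomes the 'hot' one.
predicted_index = max(range(len(probs)), key=lambda i: probs[i])
prediction = one_hot(predicted_index, len(labels))
print(labels[predicted_index], prediction)  # frog [1.0, 0.0, 0.0]

# Cross-entropy against the correct label (assumed here to be 'frog').
true_label = one_hot(labels.index("frog"), len(labels))
loss = cross_entropy(probs, true_label)
print(round(loss, 3))  # 0.357, i.e. -ln(0.7)
```

The more probability the model puts on the correct class, the closer the cross-entropy gets to 0.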

Classification Process Outline, simplified

That’s it! We’ll examine logistic classifiers (a kind of model), softmax and cross-entropy in more detail later on.


Disclaimer: Translations are not rigorous.