## MSE as Maximum Likelihood on a Linear Gaussian Model

MSE is a commonly used error metric. But is it justified in principle? In this post we show that minimising the mean-squared error (MSE) is not just vaguely intuitive, but emerges from maximising the likelihood on a linear Gaussian model. Defining the terms Linear Gaussian Model Assume the data is described by the linear model $y = Xw + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$. Assume … Read More
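As a preview of the derivation (under the standard linear Gaussian model above; the notation is the conventional one, not necessarily the post's), the log-likelihood of the data is

```latex
\log p(y \mid X, w)
= -\frac{n}{2}\log\!\left(2\pi\sigma^{2}\right)
  - \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \left(y_i - x_i^{\top} w\right)^{2},
```

so maximising it over $w$ is exactly minimising the sum of squared errors — and hence the MSE.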

## Maximum Likelihood as minimising KL Divergence

Sometimes you come across connections that are simple and beautiful. Here’s one of them! What the terms mean Maximum likelihood is a common approach to estimating the parameters of a model. An example of model parameters could be the coefficients in a linear regression model $y = w^{\top}x + \epsilon$, where $\epsilon$ is Gaussian noise (i.e. it’s random). Here we choose the parameter values that maximise the … Read More
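A compact statement of the connection (in generic notation, not necessarily the post's): as the number of i.i.d. samples grows, the average log-likelihood converges to the negative cross-entropy, so maximising it minimises the KL divergence from the data distribution to the model:

```latex
\arg\max_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} \log p_{\theta}(x_i)
\;\xrightarrow{\;n \to \infty\;}\;
\arg\max_{\theta}\; \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_{\theta}(x)\right]
\;=\;
\arg\min_{\theta}\; \mathrm{KL}\!\left(p_{\text{data}} \,\middle\|\, p_{\theta}\right).
```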

## Python Lists vs Dictionaries: The space-time tradeoff

If you had to write a script to check whether a person had registered for an event, what Python data structure would you use? It turns out that looking up items in a Python dictionary is much faster than looking up items in a Python list. How much faster? Suppose you want to check if 1000 items (needles) are in … Read More
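The speed gap is easy to measure yourself. A minimal sketch (variable names are illustrative, and the sizes here are chosen for a quick demo rather than taken from the post):

```python
import timeit

haystack_list = list(range(10_000))
haystack_dict = dict.fromkeys(haystack_list)  # same items, keyed for O(1) lookup
needles = range(1_000)

# Membership in a list scans elements one by one: O(n) per lookup.
list_time = timeit.timeit(lambda: [n in haystack_list for n in needles], number=10)

# Membership in a dict hashes the key and jumps to its bucket: O(1) on average.
dict_time = timeit.timeit(lambda: [n in haystack_dict for n in needles], number=10)

print(f"list: {list_time:.4f}s  dict: {dict_time:.4f}s")
```

The trade-off in the post's title: the dict pays extra memory for its hash table, and in exchange each lookup avoids scanning the collection.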

## Remembering which way Jacobians go – Taking derivatives of vectors with respect to vectors

Matrices of derivatives of vectors with respect to vectors (Jacobians) take a specific form: $J_{ij} = \partial f_i / \partial x_j$. Here, note that each column is the partial of $f$ with respect to one component $x_j$ of the input, whereas each row is the partial of one component $f_i$ with respect to the input $x$. That is, the rows ‘cover’ the range of $f$. You can then easily remember that C: the columns … Read More
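Written out in full (standard convention for $f : \mathbb{R}^n \to \mathbb{R}^m$), the layout the mnemonic describes is:

```latex
J = \frac{\partial f}{\partial x} =
\begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n}
\end{bmatrix},
\qquad
J_{ij} = \frac{\partial f_i}{\partial x_j}.
```

Each row fixes an output component $f_i$; each column fixes an input component $x_j$.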

## RNNs as State-space Systems

It’s fantastic how you can often use concepts from one field to investigate ideas in another area and improve your understanding of both areas. That’s one of the things I enjoy most. We’ve just started studying state-space models in 3F2 Systems and Control (a third-year Engineering course at Cambridge). It’s reminded me strongly of recurrent neural networks (RNNs). Look at … Read More
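The parallel can be sketched with the standard equations for each (symbols follow the usual textbook conventions, not necessarily the post's). A linear state-space system evolves as

```latex
x_{t+1} = A x_t + B u_t, \qquad y_t = C x_t + D u_t,
```

while a vanilla RNN updates its hidden state as

```latex
h_t = \tanh\!\left(W_h h_{t-1} + W_u u_t\right), \qquad y_t = W_y h_t.
```

Structurally they are the same recurrence; the RNN adds a nonlinearity and learns its matrices from data.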

## Effective Deep Learning Resources: A Shortlist

A lot of people ask me how to get started with deep learning. In this post I’ve listed a few resources I recommend for getting started. I’ve only chosen a few because I’ve found precise recommendations to be more helpful. Let me know if you have any comments or suggestions! Prelude: If you’re new to machine learning Deep learning is … Read More

## AlphaGo Zero: An overview of the algorithm

In this post I go through the algorithms presented in the groundbreaking AlphaGo Zero paper using pseudocode. The objective is to provide a high-level idea of what the model does. Why AlphaGo Zero matters Last week, Google DeepMind published their final iteration of AlphaGo, AlphaGo Zero. To say its performance is remarkable is an understatement. AlphaGo Zero made two breakthroughs: … Read More

## Counterintuitive Probabilities: Typical Sets from Information Theory

Suppose we have a coin that has a 3/4 chance of landing on heads (call this 0) and a 1/4 chance of landing on tails (1). Which of the 16-toss sequences below is most likely? 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 … Read More
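You can check the punchline numerically. A small sketch (function and variable names are mine, not the post's): the all-heads sequence is the single most likely one, yet a "typical" sequence with the expected four tails belongs to a class that collectively carries far more probability.

```python
from math import comb

p_heads, p_tails = 3 / 4, 1 / 4
n = 16

def seq_prob(num_tails):
    """Probability of one specific sequence with the given number of tails."""
    return p_heads ** (n - num_tails) * p_tails ** num_tails

# The single most likely sequence: all heads.
p_all_heads = seq_prob(0)

# Any one specific sequence with 4 tails (the expected count) is much less likely...
p_four_tails_seq = seq_prob(4)

# ...but there are C(16, 4) = 1820 such sequences, so the class dominates.
p_four_tails_total = comb(n, 4) * p_four_tails_seq

print(p_all_heads, p_four_tails_seq, p_four_tails_total)
```

This is the core intuition behind typical sets: individually likely sequences are not the ones you typically see.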

## How I completed Udacity’s Machine Learning ND in just over one month

How can we learn more effectively in a short amount of time? In this post, I describe how I went about finishing Udacity’s Machine Learning Nanodegree in about a month when it usually takes 6-12 months. I hope this will give you some insight and ideas as to how you might work more effectively to accomplish your own learning goals. Sections in … Read More

## How to use pickle to save and load variables in Python

pickle is a module used to convert Python objects to a byte stream. You can (1) use it to save the state of a program so you can continue running it later. You can also (2) transmit the (secured) pickled data over a network. The latter is important for parallel and distributed computing. How to save variables to a .pickle file: … Read More
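A minimal save-and-load round trip (the file name and example data are illustrative, not from the post):

```python
import os
import pickle
import tempfile

data = {"registered": ["alice", "bob"], "count": 2}

# Save: dump the object as a byte stream ('wb' = write binary).
path = os.path.join(tempfile.gettempdir(), "state.pickle")
with open(path, "wb") as f:
    pickle.dump(data, f)

# Load: read the byte stream back into an equivalent Python object.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == data)  # True
```

One caution worth knowing: only unpickle data you trust, since loading a pickle can execute arbitrary code.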