What makes Numpy Arrays Fast: Memory and Strides

Jessica Yung | Machine Learning, Programming

How is NumPy so fast? In this post we look at how NumPy’s ndarray is laid out in memory and how NumPy functions use strides to manipulate it efficiently.

Getting to know the ndarray

A NumPy ndarray is an N-dimensional array. You can create one like this:
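For example, here is the 2-by-3 array X that we’ll use throughout this post, created with dtype int16 so that each item takes two bytes:

import numpy as np

# A 2 x 3 array of 16-bit integers (2 bytes per item).
X = np.array([[0, 1, 2],
              [3, 4, 5]], dtype=np.int16)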

These arrays are homogeneous arrays of fixed-size items. That is, all the items in an array have the same datatype and the same size. For example, you cannot store the string 'hello' and the integer 16 in the same ndarray without NumPy converting them to a common type.

Ndarrays have two key characteristics: shape and dtype. The shape describes the length of each dimension of the array, i.e. how many items sit directly along that dimension (a nested sub-array counts as a single item). For example, the array X above has shape (2,3). We can visualise it like this:

np.array([[0,1,2],[3,4,5]])

The dtype (data type) defines the item size. For example, each int16 item has a size of 16 bits, i.e. 16/8=2 bytes. (One byte is equal to 8 bits.) Thus X.itemsize is 2. Specifying the dtype is optional.
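Continuing with the X created above, we can check these attributes directly:

print(X.shape)     # (2, 3)
print(X.dtype)     # int16
print(X.itemsize)  # 2 bytes per item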

How NumPy arrays are stored in memory

Numpy arrays are stored in a single contiguous (continuous) block of memory. There are two key concepts relating to memory: dimensions and strides.

Strides are the number of bytes you need to step in each dimension when traversing the array.

Let’s see what the memory looks like for the array X we described earlier:

Memory for np.array([[0,1,2],[3,4,5]]).

Calculating strides: if you want to move one unit along dimension 0 (that is, from one row to the next), you need to step over three items. Each item has a size of 2 bytes, so the stride in dimension 0 is 2 bytes x 3 items = 6 bytes.

Similarly, if you want to move one unit along dimension 1, you need to step over just one item, so the stride in dimension 1 is 2 bytes x 1 item = 2 bytes. For a standard C-ordered (row-major) array like this one, the stride in the last dimension is equal to the itemsize.

We can check the strides of an array using .strides:
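For the X defined above, this gives exactly the 6-byte and 2-byte strides we just calculated:

print(X.strides)   # (6, 2): 6 bytes to the next row, 2 bytes to the next column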

Why do we care about strides?

Firstly, many Numpy functions use strides to make things fast. Examples include integer slicing (e.g. X[1,0:2]) and broadcasting. Understanding strides helps us better understand how Numpy operates.
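For instance, basic slicing doesn’t copy any data: it returns a view onto the same block of memory with adjusted shape and strides. A quick way to see this, continuing with X from above:

row_slice = X[1, 0:2]                  # a view, not a copy
print(row_slice)                       # [3 4]
print(row_slice.strides)               # (2,): still 2 bytes per step
print(np.shares_memory(X, row_slice))  # True: no data was copied

col = X[:, 1]                          # a column view
print(col.strides)                     # (6,): skip a whole 6-byte row between items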

Secondly, we can directly use strides to make our own code faster. This can be particularly useful for data pre-processing in machine learning.

Example: Faster data pre-processing in machine learning

For example, we may want to predict the closing price of a stock using the closing prices from the previous ten days. We thus want to create an array of features X that looks like this:

Array of features X (closing prices from 10 days prior) and target y. Each row is one feature-target pair.

One way is to just loop through the days, copying the prices as we go. A faster way is to use as_strided, but this can be risky because it doesn’t check that you’re accessing memory within the array. I advise passing writeable=False when using as_strided, which at least ensures you don’t write to the original array through the view.
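Here is a sketch of both approaches. The names prices, window, make_windows_loop and make_windows_strided are just illustrative; the point is that the strided version builds a view over the existing price array instead of copying anything:

import numpy as np
from numpy.lib.stride_tricks import as_strided

prices = np.arange(100, dtype=np.float64)  # e.g. one closing price per day
window = 10                                # use the 10 prior closes as features

# Method 1: loop through the days, copying the prices as we go.
def make_windows_loop(prices, window):
    n = len(prices) - window
    X = np.empty((n, window), dtype=prices.dtype)
    for i in range(n):
        X[i] = prices[i:i + window]   # copy one window per row
    y = prices[window:]               # the close we want to predict
    return X, y

# Method 2: build a strided view; no data is copied.
def make_windows_strided(prices, window):
    n = len(prices) - window
    step = prices.strides[0]          # bytes between consecutive prices
    X = as_strided(prices,
                   shape=(n, window),
                   strides=(step, step),  # move one item between rows and between columns
                   writeable=False)       # guard against writing through the view
    y = prices[window:]
    return X, y

X1, y1 = make_windows_loop(prices, window)
X2, y2 = make_windows_strided(prices, window)
assert np.array_equal(X1, X2) and np.array_equal(y1, y2)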

The second method is significantly faster than the first.
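You can compare the two yourself with timeit; the exact speed-up depends on your machine and on how long the price series is:

import timeit
print(timeit.timeit(lambda: make_windows_loop(prices, window), number=1000))
print(timeit.timeit(lambda: make_windows_strided(prices, window), number=1000))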

If you want to find out how to make your code faster, I recommend Nicolas Rougier’s guide ‘From Python to Numpy’, which describes how to vectorise your code and restructure your problems to make the most of NumPy’s speed.

References