How is NumPy so fast? In this post we find out how NumPy's ndarray is stored in memory and how NumPy functions typically manipulate it using strides.
Getting to know the ndarray
A NumPy ndarray is an N-dimensional array. You can create one like this:
import numpy as np
X = np.array([[0, 1, 2], [3, 4, 5]], dtype='int16')
These arrays are homogeneous arrays of fixed-size items: all the items in an array have the same datatype and the same size. For example, you cannot put the string 'hello' and the integer 16 in the same ndarray.
Ndarrays have two key characteristics: shape and dtype. The shape describes the length of each dimension of the array, i.e. the number of items directly in that dimension, counting a sub-array as one item. For example, the array X above has shape (2, 3): two rows of three items each.

The dtype (data type) defines the item size. For example, each int16 item has a size of 16 bits, i.e. 16/8 = 2 bytes. (One byte is equal to 8 bits.) Thus X.itemsize is 2. Specifying the dtype is optional.
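We can check these attributes directly. A quick sketch, using the same array X from above:

```python
import numpy as np

# The 2x3 array from earlier: two rows of three int16 items
X = np.array([[0, 1, 2], [3, 4, 5]], dtype='int16')

print(X.shape)     # (2, 3): two rows, three columns
print(X.dtype)     # int16
print(X.itemsize)  # 2, since each int16 item occupies 2 bytes
```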
How NumPy arrays are stored in memory
NumPy arrays are stored in a single contiguous block of memory. There are two key concepts relating to memory: dimensions and strides.
Strides are the number of bytes you need to step in each dimension when traversing the array.
Let's see what the memory looks like for the array X we described earlier: its six int16 items sit one after another as 0, 1, 2, 3, 4, 5, each occupying 2 bytes.
Calculating strides: if you want to move across one unit in dimension 0 (i.e. from one row to the next), you need to move across three items. Each item has size 2 bytes, so the stride in dimension 0 is 2 bytes x 3 items = 6 bytes.
Similarly, if you want to move across one unit in dimension 1, you need to move across 1 item. So the stride in dimension 1 is 2 bytes x 1 item = 2 bytes. The stride in the last dimension is always equal to the itemsize.
We can check the strides of an array using .strides:
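For example, for the same 2x3 int16 array X as above, the strides match the calculation we just did:

```python
import numpy as np

X = np.array([[0, 1, 2], [3, 4, 5]], dtype='int16')

# 6 bytes to step one row (3 items x 2 bytes each),
# 2 bytes to step one column (1 item x 2 bytes)
print(X.strides)  # (6, 2)

# The stride in the last dimension always equals the itemsize
print(X.strides[-1] == X.itemsize)  # True
```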
Why do we care about strides?
Firstly, many Numpy functions use strides to make things fast. Examples include integer slicing (e.g. X[1,0:2]) and broadcasting. Understanding strides helps us better understand how Numpy operates.
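For instance, basic slicing never copies data: NumPy just builds a view over the same memory with adjusted shape and strides. A sketch, again assuming the 2x3 int16 array X from above:

```python
import numpy as np

X = np.array([[0, 1, 2], [3, 4, 5]], dtype='int16')

# Slicing returns a view: same underlying memory, new shape and strides
row = X[1, 0:2]
print(row)            # [3 4]
print(row.base is X)  # True: the slice shares X's memory

# Reversing a dimension just negates a stride -- still no copy
print(X[:, ::-1].strides)  # (6, -2)
```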
Secondly, we can directly use strides to make our own code faster. This can be particularly useful for data pre-processing in machine learning.
Example: Faster data pre-processing in machine learning
For example, we may want to predict the closing price of a stock using the closing prices from ten days prior. We thus want to create an array of features X that looks like this:
One way is to just loop through the days, copying the prices as we go. A faster way is using as_strided, but this can be risky because it doesn’t check that you’re accessing memory within the array. I advise you to use the option writeable=False when using as_strided, which ensures you at least don’t write to the original array.
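To see why writeable=False helps, here is a small sketch (the names a and v are illustrative): writes through the view fail loudly instead of silently corrupting the underlying array.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(5)
stride, = a.strides

# Overlapping 3x3 windows over a's memory; read-only for safety
v = as_strided(a, (3, 3), strides=(stride, stride), writeable=False)
print(v[0])  # [0 1 2]
print(v[1])  # [1 2 3]: overlaps with the previous window

try:
    v[0, 0] = 99  # writing through the view is forbidden
except ValueError:
    print("view is read-only")
```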
The second method is significantly faster than the first:
import numpy as np
from timeit import timeit
from numpy.lib.stride_tricks import as_strided
# Adapted from Alex Rogozhnikov (linked below)
# Generate array of (fake) closing prices
prices = np.random.randn(100)
# We want closing prices from the ten days prior
window = 10
# Create array of closing prices to predict
y = prices[window:]

def make_X1():
    # Create array of zeros the same size as our final desired array
    X1 = np.zeros([len(prices) - window, window])
    # For each day in the appropriate range,
    # take prices for ten days from that day onwards
    for day in range(len(X1)):
        X1[day, :] = prices[day:day + window]
    return X1

def make_X2():
    # Save stride (num bytes) between each item
    stride, = prices.strides
    desired_shape = (len(prices) - window, window)
    # Get a view of the prices with the desired shape and strides,
    # without allowing writes to the original array
    return as_strided(prices, desired_shape,
                      strides=(stride, stride), writeable=False)

# Both methods produce the same array
assert np.array_equal(make_X1(), make_X2())
timeit(make_X1)  # 56.7 seconds
timeit(make_X2)  # 7.7 seconds, over 7x faster!
If you want to find out how to make your code faster, I recommend Nicolas Rougier's guide 'From Python to Numpy', which describes how to vectorise your code and works through problems to help you make the most of NumPy's speed.