Numpy Views vs Copies: Avoiding Costly Mistakes

Jessica Yung
Data Science, Programming

In this post we will talk about the differences between views and copies. It’s really important you’re aware of the difference between the two. Otherwise you might run into problems like accidentally modifying arrays.

What are views and copies?

With a view, it’s like you are viewing the original (base) array. The view is actually part of the original array even though it looks like you’re working with something else. These are analogous to shallow copies in Python.

Copies are separate objects from the original array, though right after copying the two look the same. These are analogous to deep copies in Python.

Checking if something is a copy

How can you check if something is a copy? You can check the base of the array using [array].base: if it’s a view, the base will be the original array; if it’s a copy, the base will be None. (The base is also None for any array that owns its own data, including the original array itself.)
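As a minimal sketch of this check, the snippet below builds a small array and compares the base of a slice with the base of an explicit copy. The variable names are made up for illustration.

```python
import numpy as np

original = np.arange(6)

view = original[1:4]          # basic slicing gives a view
copy = original[1:4].copy()   # an explicit copy owns its own data

print(view.base is original)  # True
print(copy.base is None)      # True
print(original.base is None)  # True: the original owns its data too
```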

Views vs Copies: The main differences

Here are the main differences between views and copies:

  1. Modifying the array
    • Modifying a view modifies the base (original) array, whereas modifying a copy does not modify the base array.
  2. Time taken
    • Making a copy takes more time, often 1.5x-2x longer.
  3. Base of the array
    • A view’s base is the original array, whereas a copy’s base is None.
    • (See section above ‘Checking if something is a copy’ for what this means.)
  4. Memory
    • A view also shares memory with the base array, whereas a copy does not.
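The modification and memory points can be illustrated with a short sketch. It assumes a small integer array and uses np.shares_memory to test whether two arrays overlap in memory; the names are illustrative only.

```python
import numpy as np

base = np.array([10, 20, 30, 40])
view = base[:2]
copy = base[:2].copy()

view[0] = -1   # writes through to base
copy[1] = -2   # does not touch base

print(base)                          # [-1 20 30 40]
print(np.shares_memory(base, view))  # True
print(np.shares_memory(base, copy))  # False
```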

Why this matters

1. The biggest one: if you do not make a copy when you need a copy, you will have problems.

  • Suppose we have an array of closing prices of different stocks from Monday to Friday, but we think some of the values may be wrong. Suppose each row contains prices of one stock, and each column corresponds to data for one day. We want to make corrections to the prices of Stock 0 while keeping a copy of the original data (e.g. to compare differences in predictions between the two, or just to keep a copy in case).
  • If we write corrected_prices = prices[0,:] and proceed to edit an entry, e.g. corrected_prices[corrected_prices > 1000] = 1000 because we know Stock 0’s price can’t exceed 1000, we will also edit prices. So make sure you use something like corrected_prices = np.copy(prices[0,:]) or corrected_prices = prices[[0],:]!
  • Takeaway: Basically whenever you want to edit a copy of the data but not the original, use np.copy(). This is the safest way to ensure you actually make a copy. Otherwise a view is fine and saves time and memory.
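Here is a rough sketch of the stock price scenario. The price values are invented for illustration, and the helper names max_after_copy_edit and view_of_prices are hypothetical.

```python
import numpy as np

# Hypothetical closing prices: rows are stocks, columns are Monday to Friday.
prices = np.array([
    [990.0, 1005.0, 998.0, 1200.0, 1001.0],
    [50.0, 51.0, 49.5, 50.5, 52.0],
])

# Safe: take an explicit copy of Stock 0's row before correcting it.
corrected_prices = np.copy(prices[0, :])
corrected_prices[corrected_prices > 1000] = 1000

max_after_copy_edit = prices[0].max()
print(max_after_copy_edit)       # 1200.0, the original is untouched
print(corrected_prices.max())    # 1000.0

# Unsafe: a plain slice is a view, so the same edit also changes prices.
view_of_prices = prices[0, :]
view_of_prices[view_of_prices > 1000] = 1000
print(prices[0].max())           # 1000.0, the original has been overwritten
```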

2. Making copies is 1.5x-2x slower and uses more memory. But this is usually not an issue.

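  • Note that np.copy() is not the only way to make a copy.
    • Often copies are made implicitly, e.g. when you do X = X + 2*Y. (Copies made: 2*Y, X+2*Y.)
    • An in-place alternative would be to write the results into existing arrays using the out argument of NumPy functions such as np.multiply and np.add.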
    • That doesn’t mean you should use views whenever you can – there is a tradeoff between performance and how readable your code is. In day-to-day cases, you don’t need to optimise that much.
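One possible way to avoid the implicit copies is sketched below. It assumes you are happy to preallocate a scratch buffer (a name chosen here for illustration) and uses the out argument of np.multiply and np.add, so that the sum is written directly into X.

```python
import numpy as np

X = np.ones(4)
Y = np.full(4, 3.0)

# X = X + 2*Y would allocate a temporary for 2*Y and a new array for the sum.
# Here a preallocated scratch buffer holds 2*Y and the sum is written into X.
scratch = np.empty_like(Y)
np.multiply(Y, 2, out=scratch)
np.add(X, scratch, out=X)

print(X)  # [7. 7. 7. 7.]
```

This is noticeably harder to read than X = X + 2*Y, which is the readability tradeoff mentioned above.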

When you get a view vs a copy

So when do you get a view and when do you get a copy?

Operation               View                          Copy
Slicing / indexing      Basic indexing, e.g. Z[0,:]   Fancy indexing, e.g. Z[[0],:] (see below for details)
Changing dtype          /                             W = Z.astype(np.float32)
Converting to 1D array  Z.ravel() (where possible)    Z.flatten()
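The dtype and 1D rows of the table can be checked with a quick sketch like the one below. It assumes a small contiguous float array. Note that ravel() only returns a view when the memory layout allows it, and falls back to a copy otherwise.

```python
import numpy as np

Z = np.zeros((2, 3))

W = Z.astype(np.float32)   # a new array with a different dtype
R = Z.ravel()              # a view here, because Z is contiguous
F = Z.flatten()            # always a copy

print(np.shares_memory(Z, W))  # False
print(np.shares_memory(Z, R))  # True
print(np.shares_memory(Z, F))  # False

# ravel() has to copy when a view is impossible, e.g. on a transpose.
print(np.shares_memory(Z, Z.T.ravel()))  # False
```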

Fancy indexing is when the selection object (the thing you put inside the square brackets [ ]) is a

  • non-tuple sequence object or ndarray
    • e.g. Z[[1,2,3],:] or A[[1]],
  • OR a tuple with at least one sequence object or ndarray
    • e.g. x[(1,2,3),] and x[[1,2,3]] are fancy indexing.

If we put the above bullet points in a table to make it easier to digest, we have:

Index                   Indexing (view)   Fancy indexing (copy)
Non-tuple (2D array)    Z[1:4,:]          Z[[1,2,3],:]
Non-tuple (1D array)    A[1]              A[[1]]
Tuple                   A[(1,2,3)]        A[[1,2,3]], A[(1,2,3),]
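To see the difference on the 2D case, here is a small sketch that selects the same rows with a slice and with a list of indices. The array contents are arbitrary. One caveat: indexing a single element of a 1D array, as in A[1], gives back a scalar rather than an array.

```python
import numpy as np

Z = np.arange(20).reshape(5, 4)

basic = Z[1:4, :]          # slice: a view
fancy = Z[[1, 2, 3], :]    # list of row indices: a copy

print(np.array_equal(basic, fancy))  # True, same values
print(np.shares_memory(Z, basic))    # True
print(np.shares_memory(Z, fancy))    # False

fancy[0, 0] = -1   # Z is unchanged
print(Z[1, 0])     # 4
basic[0, 0] = -1   # Z is changed through the view
print(Z[1, 0])     # -1
```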

Fancy indexing returns a copy. If your fancy index is complicated, you may want to keep a copy of it so you can use it again later if needed.
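One reading of this advice is to store the index itself in a variable, so that the same selection can be made again or written back later. A minimal sketch, assuming an index array called rows (a name made up here):

```python
import numpy as np

Z = np.arange(20).reshape(5, 4)

# Keep the index so the same rows can be selected or assigned again later.
rows = np.array([0, 2, 4])
selected = Z[rows, :]     # a copy of rows 0, 2 and 4

selected *= 10            # edits the copy only
Z[rows, :] = selected     # write the result back through the saved index

print(Z[2])  # [ 80  90 100 110]
```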

You can find more details as to when something is a view vs a copy in the SciPy Cookbook.

Conclusion

The takeaway is that whenever you want to edit a copy of the data but not the original, use np.copy(). Or fancy indexing like Z[[0],:] if you trust yourself to remember what that is. 😉

I hope this has helped – all the best in your machine learning endeavours!

References: