If you had to write a script to check whether a person had registered for an event, what Python data structure would you use?
It turns out that looking up items in a Python dictionary is much faster than looking up items in a Python list. How much faster? Suppose you want to check if 1000 items (needles) are in a dataset (haystack) with items. If you search through 10 million items, using a dict or set is over 100,000x faster than using a list!
Then why not always use dictionaries? Looking up entries in Python dictionaries is fast, but dicts use a lot of memory.* This is a classic example of a space-time tradeoff.
(*Note: This is a much smaller problem when you are only checking whether keys (items) are present. E.g. to store 10 million floats, a dict uses 4.12x the memory of a list. According to Ramalho, it’s nested dictionaries that can really be a problem. So maybe you should use dicts much more often!
Update: From Python 3.6, dictionaries don’t use that much space. So it’s not even a space-time tradeoff any more.)
Why is looking up entries in a dictionary so much faster? It’s because of the way Python implements dictionaries using hash tables. Dictionaries are Python’s built-in mapping type and so have also been highly optimised. Sets are implemented in a similar way.
In the coming posts, we will look more closely at how Python implements dictionaries and sets, and how Python implements lists. Knowing how Python implements these data structures can help you pick the most suitable data structure for your applications and can really deepen your understanding of the language, since these are the building blocks you’ll use all the time.
- Fluent Python by Luciano Ramalho, Chapter 3: Dictionaries and Sets