Not using iteritems() to iterate over a large dictionary in Python 2
PEP 234 defines the iteration interface for objects. It also notes that this interface has a significant impact on the performance of dict iteration.
Note
This anti-pattern only applies to Python versions 2.x. In Python 3.x, items() returns a view object that does not copy the dictionary's contents, so iterating over it is already memory efficient. Consequently, iteritems() has been removed from Python 3.x, just as xrange() was removed once range() itself became lazy.
Anti-pattern
The code below defines one large dictionary (created with a dictionary comprehension) that holds a large amount of data. In Python 2, the items() method builds a complete list of (key, value) tuples and stores it in memory before the for loop can begin iterating. The preferred way is to use iteritems(). Run as written, the loop below uses roughly 1.6 GB of memory.
d = {i: i * 2 for i in xrange(10000000)}
# Slow and memory hungry.
for key, value in d.items():
    print("{0} = {1}".format(key, value))
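The extra cost of materializing every pair up front can be illustrated with a rough sketch. It is not the measurement behind the figures quoted on this page: it assumes a much smaller dictionary so that it runs quickly, and it uses `list(d.items())` to reproduce what Python 2's items() returns, so that the same code runs on Python 2 and Python 3. The names `pairs`, `lazy`, `list_bytes`, and `iter_bytes` are illustrative only.

```python
import sys

# A much smaller dictionary than the 10,000,000-entry one above,
# chosen only so that the sketch runs quickly.
d = {i: i * 2 for i in range(100000)}

# Materialize every (key, value) pair up front, as Python 2's items() does.
pairs = list(d.items())

# A lazy iterator that produces one pair at a time.
# Python 2 dictionaries have iteritems(); Python 3 dictionaries do not.
lazy = d.iteritems() if hasattr(d, "iteritems") else iter(d.items())

# getsizeof() counts only the container itself, not the tuples it refers to,
# so the real cost of the list is higher than the number reported here.
list_bytes = sys.getsizeof(pairs)
iter_bytes = sys.getsizeof(lazy)

print("list of pairs: {0} bytes".format(list_bytes))
print("lazy iterator: {0} bytes".format(iter_bytes))
```

The list grows with the number of entries, while the iterator stays a small fixed-size object regardless of how large the dictionary is.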
Best practice
Use iteritems() to iterate over a large dictionary
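The updated code below uses the iteritems() method instead of items(). The code is otherwise identical, but memory usage is about 50% lower (~800 MB), because iteritems() returns an iterator that yields one pair at a time instead of building a list of all pairs. This is the preferred way to iterate over large dictionaries in Python 2.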
d = {i: i * 2 for i in xrange(10000000)}
# Memory efficient.
for key, value in d.iteritems():
    print("{0} = {1}".format(key, value))
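For code that has to run on both Python 2 and Python 3, one possible approach is a small wrapper that picks the lazy form available on the running interpreter. The following is a minimal sketch; `iter_items` is a hypothetical helper name, not part of the standard library.

```python
def iter_items(d):
    """Return a lazy iterator over the (key, value) pairs of d."""
    if hasattr(d, "iteritems"):
        # Python 2: iteritems() avoids building the list that items() returns.
        return d.iteritems()
    # Python 3: items() returns a view, so iterating over it is already lazy.
    return iter(d.items())


# Small dictionary used only to demonstrate the helper.
d = {i: i * 2 for i in range(5)}

total = 0
for key, value in iter_items(d):
    total += value

print("sum of values = {0}".format(total))
```

Checking for the method rather than the interpreter version keeps the helper usable with any dictionary-like object that provides either iteritems() or items().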