When I first met Python generators, I found them obscure and hard to understand. I couldn't find a clear introduction to them, though perhaps I didn't search hard enough. That's why I wrote this article: to get straight to the gist of these Python beasts.
Python generator definition
A Python generator is:
- a Python function or method
- which acts as an iterator
- which keeps track of where it left off between calls (stateful)
- and returns data to its caller using the yield keyword
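To make the list above more concrete, here is a minimal sketch comparing a hand-written iterator with a generator. The names `CountToThree` and `count_to_three` are hypothetical and only serve this illustration: both produce the same values, but the generator keeps its state implicitly instead of storing it in an attribute.

```python
class CountToThree:
    """Hand-written iterator: the state is stored explicitly in self.current."""

    def __init__(self):
        self.current = 0

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current >= 3:
            # Signal that there is nothing left to produce
            raise StopIteration
        self.current += 1
        return self.current


def count_to_three():
    """Generator version: Python remembers where execution paused."""
    for value in (1, 2, 3):
        yield value


print(list(CountToThree()))    # [1, 2, 3]
print(list(count_to_three()))  # [1, 2, 3]
```

The generator version is shorter because the bookkeeping done by hand in `__next__` is handled by Python itself.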
A simple example to start
Consider this function:
    def generator1():
        yield 1
        yield 2
        yield 3

Calling this function directly simply returns a generator object:
    >>> generator1()
    <generator object generator1 at 0x7fac361d8bf8>

The __iter__() and __next__() methods are automatically implemented:
    >>> gen1 = generator1()
    >>> dir(gen1)
    ['__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__name__', '__ne__', '__new__', '__next__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'gi_code', 'gi_frame', 'gi_running', 'gi_yieldfrom', 'send', 'throw']

and they are usable:
    >>> next(gen1)
    1
    >>> next(gen1)
    2
    >>> next(gen1)
    3
    >>> next(gen1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration

StopIteration is raised because the generator has already been advanced 3 times and there is nothing more to yield.
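In practice you rarely have to catch StopIteration yourself. As a rough sketch (the generator `count_up` is a made-up example, not part of the code above), the snippet below shows two common ways of dealing with the end of a generator: giving next() a default value, and spelling out by hand roughly what a for loop does for you.

```python
def count_up():
    yield 1
    yield 2


# 1. next() accepts a default that is returned instead of raising StopIteration
gen = count_up()
first = next(gen, None)    # 1
second = next(gen, None)   # 2
third = next(gen, None)    # None: the generator is exhausted

# 2. Roughly what a for loop does behind the scenes
collected = []
gen = count_up()
while True:
    try:
        value = next(gen)
    except StopIteration:
        break  # normal end of iteration, not an error
    collected.append(value)

print(first, second, third)  # 1 2 None
print(collected)             # [1, 2]
```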
Being iterable, a generator object can be passed directly to built-in functions like list():
    >>> gen1 = generator1()
    >>> list(gen1)
    [1, 2, 3]
    >>> list(gen1)
    []

You can see that the first call exhausted the generator object, so the second call returns an empty list.
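If you need to go through the values more than once, two simple options are to call the generator function again or to store the values in a list. The sketch below repeats the definition of generator1 so that it runs on its own; which option is better depends on how expensive the values are to compute and how many there are.

```python
def generator1():
    yield 1
    yield 2
    yield 3


# Option 1: call the generator function again for each pass
first_pass = list(generator1())
second_pass = list(generator1())

# Option 2: materialize the values once, then reuse the list (costs memory)
values = list(generator1())
total = sum(values)
largest = max(values)

print(first_pass, second_pass)  # [1, 2, 3] [1, 2, 3]
print(total, largest)           # 6 3
```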
The tuple() and set() built-in functions consume a generator in the same way:
    >>> gen1 = generator1()
    >>> tuple(gen1)
    (1, 2, 3)
    >>> gen1 = generator1()
    >>> set(gen1)
    {1, 2, 3}

Of course, the for ... in construct is available here:
    gen1 = generator1()
    # this will print out 1,2,3
    for i in gen1:
        print(i)

Moving beyond
The above example was only meant to make you understand the yield mechanism.
We can go further by passing parameters to the generator function:
    from typing import Iterator

    # yields the Fibonacci numbers <= fib_max
    # (the two initial values, 0 and 1, are always yielded)
    def fibonacci1(fib_max: int) -> Iterator[int]:
        # initial values
        fib_n_2 = 0
        fib_n_1 = 1
        yield fib_n_2
        yield fib_n_1
        # now general case
        fib_n = fib_n_2 + fib_n_1
        while fib_n <= fib_max:
            yield fib_n
            fib_n_2 = fib_n_1
            fib_n_1 = fib_n
            fib_n = fib_n_2 + fib_n_1

    # gives: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]
    print(list(fibonacci1(1000)))

Note that this is NOT a recursive function. It simply pauses at each yield, a bit like a breakpoint, and resumes from that point when the next value is requested, which is both memory and stack efficient.
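To watch this pause-and-resume behaviour happen, the following sketch records what the generator body does and when. The generator `traced` and the list `events` are hypothetical names used only for this illustration.

```python
events = []


def traced():
    events.append("started")
    yield 1
    events.append("resumed after first yield")
    yield 2
    events.append("finished")


gen = traced()
# Nothing has run yet: calling the function only builds the generator object
assert events == []

next(gen)  # runs the body up to the first yield
assert events == ["started"]

next(gen)  # resumes right after the first yield
assert events == ["started", "resumed after first yield"]

# One more request lets the body run to its end; None is the default returned
next(gen, None)
print(events)  # ['started', 'resumed after first yield', 'finished']
```

Each call to next() runs only the code between two yield statements, which is why a generator never needs to hold the whole sequence in memory.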
It's easy to modify the previous function to yield an endless sequence of values:
    from itertools import islice
    from typing import Iterator

    # yields an infinite sequence of Fibonacci numbers
    def fibonacci2() -> Iterator[int]:
        # initial values
        fib_n_2 = 0
        fib_n_1 = 1
        yield fib_n_2
        yield fib_n_1
        # now general case
        fib_n = fib_n_2 + fib_n_1
        while True:
            yield fib_n
            fib_n_2 = fib_n_1
            fib_n_1 = fib_n
            fib_n = fib_n_2 + fib_n_1

    # gives: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]
    print(list(islice(fibonacci2(), 17)))

Python generator expressions
Generator expressions are akin to list comprehensions, at least syntactically.
They are used to create generator objects with a simple expression rather than a function, but they are less flexible and less powerful:
    # first 100 squares
    squares_gen = (x*x for x in range(100))
    # the values are only computed here
    squares = list(squares_gen)

They are lazily evaluated, meaning each value is computed only when it is requested.
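The difference with a list comprehension can be observed directly. In this small sketch, the helper `square` and the list `computed` are hypothetical names added for illustration; the helper records each call so we can see when the work is really done.

```python
computed = []


def square(x):
    # Record each call so we can see when the work really happens
    computed.append(x)
    return x * x


squares_list = [square(x) for x in range(3)]  # runs immediately
assert computed == [0, 1, 2]

computed.clear()
squares_gen = (square(x) for x in range(3))   # nothing runs yet
assert computed == []

first = next(squares_gen)                     # computes one value only
assert computed == [0]

remaining = list(squares_gen)                 # computes the rest
print(first, remaining)  # 0 [1, 4]
```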
Hope this helps!