A generator is an iterator that produces its items lazily, and may generate an indefinite number of them.
A generator is a special kind of object that returns a succession of items, one at a time. Unlike functions that create a list of results in memory and then return the entire list (like the range() function in Python 2), generators perform lazy fetching, using up only enough memory to produce one item, returning it, and then proceeding to the next item retrieval. For example, in Python 2 range() produced a list of integers:
import sys; print(sys.version) # 2.7.10
x = range(10)
print(x) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
But in Python 3, range() produces a special range object that can be iterated over to obtain the integers one at a time:
import sys; print(sys.version) # 3.7.0
x = range(10)
print(x)                      # range(0, 10)
for el in x:
    print(el)                 # 0
                              # 1
                              # 2 etc...
It makes sense that range() should use lazy fetching, since most of the time using it we're only interested in iterating over it, one item at a time. (Strictly speaking range() is not a generator, but we can consider its behavior in that context when discussing lazy fetching.) If we do want a list of integers, we can simply pass the object to list():
print(list(range(10))) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
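As a rough illustration of the memory difference, the sketch below compares the size of a range object with that of the equivalent list using sys.getsizeof(). The size n is an arbitrary value chosen for illustration, and exact byte counts depend on the Python build, so none are shown. Note that getsizeof() measures only the list's own array of references, not the integer objects it points to, so the real gap is larger still.

```python
import sys

n = 1000000                  # arbitrary size, chosen only for illustration
lazy = range(n)              # stores just start, stop and step
eager = list(range(n))       # explicit call: allocates room for n references

lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)

print(lazy_size < eager_size)     # True
print(len(lazy) == len(eager))    # True: same items, different storage
```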
Since list() is an explicit call, this draws the reader's attention to the memory being allocated, in line with Python's philosophy of "explicit is better than implicit". Without this explicit call, the memory allocation might not be so clear.
A generator comprehension (officially called a generator expression) looks like a list comprehension but is written with parentheses, and uses lazy fetching to produce a generator object rather than building an entirely new list:
convert_list = ['THIS', 'IS', 'QUITE', 'UPPER', 'CASE']
lclist = [ x.lower() for x in convert_list ] # list comprehension (square brackets)
gclist = ( x.lower() for x in convert_list ) # generator comprehension (parentheses)
print(lclist) # ['this', 'is', 'quite', 'upper', 'case']
print(gclist) # <generator object <genexpr> at 0x10285e7d0>
We can then iterate over the generator object to retrieve each item in turn. In Python 3, a list comprehension produces the same result as a generator comprehension wrapped in list(), although the interpreter does not literally build it that way:
lclist = list(( x.lower() for x in convert_list ))
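To make "retrieve each item in turn" concrete, here is a small sketch that pulls items from the generator object with next() and then shows that a generator can be consumed only once. The variable names first, rest and leftover are introduced here for illustration.

```python
convert_list = ['THIS', 'IS', 'QUITE', 'UPPER', 'CASE']
gclist = (x.lower() for x in convert_list)

first = next(gclist)        # fetches only the first item
rest = list(gclist)         # consumes the four remaining items
leftover = list(gclist)     # the generator is now exhausted

print(first)                # this
print(rest)                 # ['is', 'quite', 'upper', 'case']
print(leftover)             # []
```

Because the generator is exhausted after one pass, a second loop over it produces nothing; build a new generator (or a list) if the items are needed again.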
We can create our own generator functions, which might be necessary if we don't want our list-returning function to produce the entire list in memory.
Writing your own generator function is useful, or even necessary, when: 1) we are designing a list-producing function; 2) the items in the list come from a generator-like source (for example, calculating prime numbers, or looping through a file and modifying each line of the file); 3) the list coming back from the function would be too big to hold conveniently in memory (or too big for memory altogether).
A generator function contains a statement not found in ordinary functions, yield, which returns an item produced by the function but remembers its place in the list-generating process, so that the next request resumes where the last one left off. Here is the simplest version of a generator, containing 3 yield statements:
def return_val():
    yield 'hello'
    yield 'world'
    yield 'wassup'

for msg in return_val():
    print(msg, end=' ')       # hello world wassup

x = return_val()
print(x)                      # <generator object return_val at 0x10285e7d0>
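The way yield "remembers its place" can be seen by stepping through a generator by hand. The sketch below uses a hypothetical function, traced_val(), that prints a message between its yield statements; it is an illustration only, not part of the examples above.

```python
def traced_val():
    print('started')
    yield 'hello'
    print('resumed after first yield')
    yield 'world'

gen = traced_val()       # nothing in the body has run yet
a = next(gen)            # prints 'started', then pauses at the first yield
b = next(gen)            # prints 'resumed after first yield', pauses again

try:
    next(gen)            # the body runs off the end, so iteration stops
    finished = False
except StopIteration:
    finished = True

print(a, b, finished)    # hello world True
```

A for loop does the same thing behind the scenes: it calls next() repeatedly and stops quietly when StopIteration is raised.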
As with range() or a generator comprehension, a generator function produces an object that performs lazy fetching. Consider this simulation of the range() function, which generates a sequence of integers starting at 0:
def my_range(max):
    x = 0
    while x < max:
        yield x
        x += 1

xr = my_range(5)
print(xr)                     # <generator object my_range at 0x10285e870>
for val in my_range(5):
    print(val)                # 0 1 2 3 4
print(list(my_range(5)))      # [0, 1, 2, 3, 4]
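The other use case mentioned earlier, looping through a file and modifying each line, follows the same pattern. This is a minimal sketch: upper_lines() is a hypothetical helper, and io.StringIO stands in for a real file opened with open() so that the example runs anywhere.

```python
import io

def upper_lines(fh):
    """Yield each line of an open file, stripped of its newline and uppercased."""
    for line in fh:                  # file objects are read lazily, line by line
        yield line.rstrip('\n').upper()

# io.StringIO is a stand-in for a real file handle
fake_file = io.StringIO('first line\nsecond line\nthird line\n')

result = list(upper_lines(fake_file))
print(result)    # ['FIRST LINE', 'SECOND LINE', 'THIRD LINE']
```

Only one line is held in memory at a time until list() is called, so the same function could process a file far larger than available memory if the caller iterates instead of building a list.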
Generators are particularly useful for producing a sequence whose length is not fixed in advance, or even an unlimited sequence, because each value is computed only when it is requested. In this example we have prepared a generator that generates primes up to the specified limit.
def get_primes(num_max):
    """ prime number generator """
    candidate = 2
    found = []
    while True:
        if all(candidate % prime != 0 for prime in found):
            yield candidate
            found.append(candidate)
        candidate += 1
        if candidate >= num_max:
            return    # ends the generator; since Python 3.7, raising StopIteration here becomes a RuntimeError
my_iter = get_primes(100)
print(next(my_iter)) # 2
print(next(my_iter)) # 3
print(next(my_iter)) # 5
for i in get_primes(100):
    print(i)
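A generator can also be genuinely unlimited, leaving it to the caller to decide when to stop. The sketch below is a variation on the prime generator with the limit removed; all_primes() is a hypothetical name, and itertools.islice() from the standard library is used to take just the first ten values.

```python
from itertools import islice

def all_primes():
    """Yield primes forever; the caller decides when to stop."""
    found = []
    candidate = 2
    while True:
        if all(candidate % prime != 0 for prime in found):
            found.append(candidate)
            yield candidate
        candidate += 1

first_ten = list(islice(all_primes(), 10))
print(first_ten)    # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Calling list(all_primes()) directly would never finish, which is why an unlimited generator should always be consumed through something that stops, such as islice(), next(), or a loop with a break.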
A recursive function calls itself until a condition has been reached.
Recursive functions are appropriate for processes that iterate over a structure of an unknown "depth" of items or events. A simple first example is the factorial: the factorial of n is the product of the integers from 1 to n (1 * 2 * 3 * ... * n).
factorial: "linear" approach
def factorial_linear(n):
    prod = 1
    for i in range(1, n+1):
        prod = prod * i
    return prod
factorial: "recursive" approach
def factorial(n):
    if n < 1:                                # base case (reached 0): returns
        return 1
    else:
        return_num = n * factorial(n - 1)    # recursive call
        return return_num

print(factorial(5))                          # 120
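To see how the chain of calls descends to the base case and then returns, here is a sketch of the same function with tracing added. factorial_traced() and its depth parameter are hypothetical additions used only to indent the output.

```python
def factorial_traced(n, depth=0):
    indent = '  ' * depth
    print('{}factorial({})'.format(indent, n))
    if n < 1:                                   # base case
        result = 1
    else:                                       # recursive call, one level deeper
        result = n * factorial_traced(n - 1, depth + 1)
    print('{}returns {}'.format(indent, result))
    return result

value = factorial_traced(3)
# factorial(3)
#   factorial(2)
#     factorial(1)
#       factorial(0)
#       returns 1
#     returns 1
#   returns 2
# returns 6
```

Each call waits for the call below it to return before it can finish its own multiplication, so nothing is computed until the base case is reached.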
Recursion is best suited to structures whose depth is not known in advance. Such situations could include files within a directory tree, where listing the directory is performed over and over until all directories within a tree are exhausted; or similarly, visiting links to pages within a website, where listing the links in a page is performed repeatedly.
Recursion features three items: a recursive call, which is a call by the function to itself; the function process itself; and a base condition, which is the point at which the chain of recursions finally returns.
A directory tree is a recursive structure in that it requires the same operation (listing files in a directory) to be applied to "nodes" of unknown depth:
Recurse through a directory tree
import os
def list_dir(this_dir):
    print('* entering list_dir {} *'.format(this_dir))
    for name in os.listdir(this_dir):
        pathname = os.path.join(this_dir, name)
        if os.path.isdir(pathname):
            list_dir(pathname)
        else:
            print(' ' + name)
    print('* leaving list_dir *')     # base condition: looping is complete

list_dir('/Users/david/test')
* entering list_dir /Users/david/test *
 recurse.py
* entering list_dir /Users/david/test *
 file1
 file2
* entering list_dir /Users/david/test/test2 *
 file3
 file4
* leaving list_dir *
* entering list_dir /Users/david/test/test3 *
 file5
 file6
* leaving list_dir *
* leaving list_dir *
* entering list_dir /Users/david/test4 *
 file7
 file8
* leaving list_dir *
* leaving list_dir *
The function process is the listing of the items in a directory and printing the files. The recursive call is the call to list_dir(pathname) inside the loop, which is made whenever the directory looping encounters a subdirectory. The base condition occurs when the file listing is completed: there are no more entries to loop through, so the function call returns.
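Recursion and generators can be combined, so that a directory tree is walked lazily rather than printed as a side effect. The following is a sketch under stated assumptions: iter_files() is a hypothetical function, entries are sorted only to make the order predictable, and a temporary directory is built so the example runs anywhere instead of depending on a path such as /Users/david/test.

```python
import os
import tempfile

def iter_files(this_dir):
    """Recursively yield the path of every file under this_dir."""
    for name in sorted(os.listdir(this_dir)):
        pathname = os.path.join(this_dir, name)
        if os.path.isdir(pathname):
            yield from iter_files(pathname)   # recursive call, consumed lazily
        else:
            yield pathname

# Build a small throwaway tree: a.txt and sub/b.txt
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'sub'))
    for rel in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(root, rel), 'w') as fh:
            fh.write('x')
    found = [os.path.relpath(p, root) for p in iter_files(root)]

print(found)    # a.txt first, then b.txt inside sub
```

For real projects the standard library's os.walk() already provides a generator-based directory traversal, so a hand-written version like this is mainly useful for understanding how the two ideas fit together.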