Python 3

List Processing

Identifying Iterables

An iterable is anything that can be looped over (for example, with 'for').

The most familiar iterables are containers. A container is an object that contains other objects -- so iterating over a container means having access to each of its items in turn.

Built-In Containers

list
dict
set
tuple

Besides containers there are a number of objects that are considered iterable:

Generators (the term is used loosely here) -- these functions and methods return iterable objects that deliver their items one at a time

range()
enumerate()
dict .keys()
dict .values()
dict .items()
zip()

Other iterable objects

"file" (TextIOWrapper returned from open())
csv.reader
many other "exotic" objects offered by various modules -- iteration is a central part of what we do in many situations.
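
One way to check whether an object is iterable is to pass it to the built-in iter(), which raises TypeError for anything that cannot be looped over. A minimal sketch follows; the helper name is_iterable is made up for illustration:

```python
def is_iterable(obj):
    """Return True if obj can be looped over with 'for'."""
    try:
        iter(obj)           # raises TypeError for non-iterables
    except TypeError:
        return False
    return True

# a list, dict, set, tuple and range are iterable; an int is not
candidates = [[1, 2], {'a': 1}, {1, 2}, (1, 2), range(3), 42]
results = [is_iterable(c) for c in candidates]
print(results)    # [True, True, True, True, True, False]
```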

Identifying Iterating Statements, Operators or Functions

These would be any function or statement that is designed to loop over each item in an iterable.

Statements and Operators

for
list comprehension, dict and set comprehensions
in

Functions

sorted()
sum(), max(), min()
list(), set(), tuple(), dict()
map() and filter()
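
As a brief sketch of the 'in' operator and of map() and filter(), using made-up values: map() applies a function to every item, filter() keeps the items for which a function returns a true value, and both return iterators, so list() is used here to see the results:

```python
nums = [1, 2, 3, 4, 5, 6]

squares = list(map(lambda n: n * n, nums))          # apply a function to each item
evens = list(filter(lambda n: n % 2 == 0, nums))    # keep items that pass a test

print(squares)         # [1, 4, 9, 16, 25, 36]
print(evens)           # [2, 4, 6]
print(3 in nums)       # True -- 'in' looks through nums for a match
print(10 in nums)      # False
```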

"Summary" Functions sorted(), sum(), max(), min()

These functions take any iterable to perform their operations.

The sorted() function takes any iterable as argument and returns a list of the elements sorted by numeric or string value.

x = {1.8, 0.9, 15.2, 3.5, 2}

y = sorted(x)                       # [0.9, 1.8, 2, 3.5, 15.2]

Regardless of the type of iterable passed to sorted(), a list is returned.
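
To illustrate that sorted() accepts any iterable and always returns a list, here is a short sketch with made-up values. Passing a dict sorts its keys, and the optional reverse=True argument sorts in descending order:

```python
fruits = ('pear', 'apple', 'fig')             # a tuple of strings
prices = {'pear': 3, 'apple': 1, 'fig': 2}    # a dict

by_name = sorted(fruits)                      # ['apple', 'fig', 'pear']
keys = sorted(prices)                         # ['apple', 'fig', 'pear'] (iterating a dict yields its keys)
high_to_low = sorted(prices.values(), reverse=True)    # [3, 2, 1]

print(type(by_name), type(keys), type(high_to_low))    # all three are lists
```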

Summary functions offer a speedy answer to basic analysis questions: how many? How much? Highest value? Lowest value?

mylist = [1, 3, 5, 7, 9]        # initialize a list
mytup = (99, 98, 95.3)          # initialize a tuple
myset = {2.8, 2.9, 1.7, 3.8}        # initialize a set

print(len(mylist))                   # 5
print(sum(mytup))                    # 292.3 sum of values in mytup
print(min(mylist))                   # 1 smallest value in mylist
print(max(myset))                    # 3.8 largest value in myset

Generator Functions

As of Python 3, several built-in features return iterable objects rather than lists.

dict .keys() and dict .values()
dict .items()
zip() with a dict's items

range(): generate an integer sequence

The range() function takes one, two or three arguments and produces an iterable that returns a sequence of integers.

counter = range(10)
for i in counter:
    print(i)                        # prints integers 0 through 9

for i in range(3, 8):               # prints integers 3 through 7
    print(i)

If we need a literal list of integers, we can simply pass the iterable to list():

intlist = list(range(5))
print(intlist)                      # [0, 1, 2, 3, 4]
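
The three-argument form of range() is start, stop, step. A quick sketch of it, including a negative step for counting down:

```python
evens = list(range(0, 10, 2))         # start at 0, stop before 10, step by 2
countdown = list(range(10, 0, -3))    # a negative step counts down

print(evens)        # [0, 2, 4, 6, 8]
print(countdown)    # [10, 7, 4, 1]
```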

enumerate(): generate an integer count while looping through an iterable

enumerate() saves us from having to set a separate integer counter.

Passing an iterable to enumerate() produces an iterator that delivers 2-item tuples (count, item), with the count starting at 0 and incrementing with each iteration:

mylist = ['a', 'b', 'c']

for count, item in enumerate(mylist):
    print(count, item)


           # 0 a
           # 1 b
           # 2 c
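
enumerate() also accepts an optional start argument for cases where the count should begin somewhere other than the default of 0. A small sketch:

```python
mylist = ['a', 'b', 'c']

# start=1 makes the count begin at 1 instead of the default 0
numbered = list(enumerate(mylist, start=1))
print(numbered)                  # [(1, 'a'), (2, 'b'), (3, 'c')]

for count, item in enumerate(mylist, start=1):
    print(count, item)           # 1 a, then 2 b, then 3 c
```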

"Constructor" Functions list(), tuple(), set(), dict()

These convert any iterable to a container type.

To build a container with items, a constructor has to iterate through the items of an iterable and add each one to the container. So any iterable (container, generator, other iterable) can be passed to a constructor to produce a container with the items from that iterable.

convert between containers

mylist = ['a', 'a', 'b', 'b', 'c', 'c']

myset = set(mylist)          # set, {'b', 'a', 'c'}


mytuple = ('a', 'b', 'c')

mylist = list(mytuple)       # list, ['a', 'b', 'c']

make a list out of a generator or iterator

myrange = range(1, 11)

print(myrange)            # range(1, 11)

mynums = list(myrange)    # list, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


mydict = {'a': 1, 'b': 2, 'c': 3}

keys = mydict.keys()                 # dict_keys(['a', 'b', 'c'])

keys_list = list(mydict.keys())      # list, ['a', 'b', 'c']



with open('myfile.txt') as fh:   # the with block closes the file when done
    lines = list(fh)      # a list of strings, each string a line from the file
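
Because constructors accept any iterable, the result of one iterating function can be handed straight to a constructor. A sketch with made-up values:

```python
letters = ['a', 'b', 'c']

pairs = tuple(enumerate(letters))      # ((0, 'a'), (1, 'b'), (2, 'c'))
lookup = dict(enumerate(letters))      # {0: 'a', 1: 'b', 2: 'c'}

# set() removes duplicates; sorted() then returns the unique values as a list
unique_sorted = sorted(set([3, 1, 3, 2, 1]))    # [1, 2, 3]

print(pairs)
print(lookup)
print(unique_sorted)
```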

List comprehensions: filtering a container's elements

List comprehensions abbreviate simple loops into one line.

Consider this loop, which filters a list so that it contains only positive integer values:

myints = [0, -1, -5, 7, -33, 18, 19, 55, -100]
myposints = []
for el in myints:
    if el > 0:
        myposints.append(el)

print(myposints)                   # [7, 18, 19, 55]

This loop can be replaced with the following one-liner:

myposints = [ el for el in myints if el > 0 ]

See how the looping and test in the first loop are distilled into the one line? The first el is the element that will be added to myposints - list comprehensions automatically build new lists and return them when the looping is done.

The operation is the same, but the order of operations in the syntax is different:

# this is pseudo code
# target list = item for item in source list if test

Hmm, this makes a list comprehension less intuitive than a loop. However, once you learn how to read them, list comprehensions can actually be easier and quicker to read - primarily because they are on one line. This is an example of a filtering list comprehension - it allows some, but not all, elements through to the new list.
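
The same filtering pattern works with any test that evaluates to True or False. As a sketch with made-up data, this keeps only the strings longer than three characters and checks that the loop and the comprehension agree:

```python
words = ['list', 'set', 'tuple', 'dict', 'map']

# loop version
long_words_loop = []
for w in words:
    if len(w) > 3:
        long_words_loop.append(w)

# list comprehension version:  item for item in source if test
long_words = [w for w in words if len(w) > 3]

print(long_words)                       # ['list', 'tuple', 'dict']
print(long_words == long_words_loop)    # True
```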

List comprehensions: transforming a container's elements

Consider this loop, which doubles each value in a list:

nums = [1, 2, 3, 4, 5]
dblnums = []
for val in nums:
    dblnums.append(val*2)

print(dblnums)                          # [2, 4, 6, 8, 10]

This loop can be distilled into a list comprehension as follows:

dblnums = [ val * 2 for val in nums ]

This transforming list comprehension transforms each value in the source list before sending it to the target list:

# this is pseudo code
# target list = item transform for item in source list

We can of course combine filtering and transforming:

vals = [0, -1, -5, 7, -33, 18, 19, 55, -100]
doubled_pos_vals = [ i*2 for i in vals if i > 0 ]
print(doubled_pos_vals)                # [14, 36, 38, 110]
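
Set and dict comprehensions follow the same pattern, with curly braces in place of square brackets. A sketch using the same values:

```python
vals = [0, -1, -5, 7, -33, 18, 19, 55, -100]

# set comprehension: absolute values of the negative numbers
neg_sizes = {abs(i) for i in vals if i < 0}

# dict comprehension: map each positive value to its double
doubled = {i: i * 2 for i in vals if i > 0}

print(sorted(neg_sizes))    # [1, 5, 33, 100]
print(doubled)              # {7: 14, 18: 36, 19: 38, 55: 110}
```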

List comprehensions: examples

If they only replace simple loops that we already know how to do, why do we need list comprehensions? As mentioned, once you are comfortable with them, list comprehensions are much easier to read and comprehend than traditional loops. They say in one statement what loops need several statements to say - and reading multiple lines certainly takes more time and focus to understand.

Some common operations can also be accomplished in a single line. In this example, we produce a list of lines from a file, stripped of trailing whitespace:

stripped_lines = [ i.rstrip() for i in open('FF_daily.txt').readlines() ]

Here, we're only interested in lines of a file that begin with the desired year (1972):

totals = [ i for i in open('FF_daily.txt').readlines() if i.startswith('1972') ]

If we want the MktRF values for our desired year, we could gather the bare amounts this way:

mktrf_vals = [ float(i.split()[1]) for i in open('FF_daily.txt').readlines() if i.startswith('1972') ]

And in fact we can do part of an earlier assignment in one line -- the sum of MktRF values for a year:

mktrf_sum = sum([ float(i.split()[1]) for i in open('FF_daily.txt').readlines() if i.startswith('1972') ])
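
The one-liners above leave the open file to be closed by Python at some later point. A variation, sketched here, opens the file in a with block so it is closed promptly, and loops over the file object directly rather than calling readlines(). io.StringIO and the sample lines are stand-ins so the example runs without FF_daily.txt; the column layout (date first, MktRF second) is an assumption based on the code above:

```python
import io

# Hypothetical sample standing in for open('FF_daily.txt'); the real
# file's contents are not shown in these notes.
sample = io.StringIO(
    "19711231  0.50  0.10\n"
    "19720103  0.25  0.20\n"
    "19720104 -0.75  0.30\n"
    "19730102  1.00  0.40\n"
)

with sample as fh:
    # keep the second field of each line that begins with the desired year
    mktrf_vals = [float(line.split()[1]) for line in fh if line.startswith('1972')]

mktrf_sum = sum(mktrf_vals)
print(mktrf_vals)   # [0.25, -0.75]
print(mktrf_sum)    # -0.5
```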

From experience I can tell you that familiarity with these forms makes it very easy to construct them and to understand them quickly - much more quickly than a 4-6 line loop.

dict .items(), dict() and zip()

A dict can also be expressed as a list of 2-item tuples.

The dict .items() method produces an iterable view of 2-item tuples, which list() converts to a list:

mydict =  {'a': 5, 'b': 0, 'c': -3, 'd': 2, 'e': 1, 'f': 4}

my_items = list(mydict.items())   # list, [('a',5), ('b',0), ('c',-3), ('d',2), ('e',1), ('f',4)]

Such a list of 2-element tuples can be converted back to a dictionary with dict():

my_items = [('a',5), ('b',0), ('c',-3), ('d',2), ('e',1), ('f',4)]

mydict2 = dict(my_items)       # dict, {'a': 5, 'b': 0, 'c': -3, 'd': 2, 'e': 1, 'f': 4}
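
zip() produces the same kind of 2-item tuples by pairing up items from two iterables, so its result can also be passed to dict(). A sketch with made-up keys and values:

```python
keys = ['a', 'b', 'c']
values = [5, 0, -3]

pairs = list(zip(keys, values))    # list() is needed to see the pairs
print(pairs)                       # [('a', 5), ('b', 0), ('c', -3)]

mydict3 = dict(zip(keys, values))
print(mydict3)                     # {'a': 5, 'b': 0, 'c': -3}
```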

It becomes very easy to filter or transform a dictionary using this structure. Here, we're filtering a dictionary by value - accepting only those pairs whose value is larger than 0:

mydict = {'a': 5, 'b': 0, 'c': -3, 'd': 2, 'e': -22, 'f': 4}
filtered_dict = dict([ (i, j) for (i, j) in mydict.items() if j > 0 ])

Here we're switching the keys and values in a dictionary, and assigning the resulting dict back to mydict, thus seeming to change it in-place:

mydict = dict([ (j, i) for (i, j) in list(mydict.items()) ])

Python database modules typically return each row of a query result as a tuple. Here we're pulling two of the three values from each row and folding them into a dictionary.

# 'tuple_db_results' simulates what a database returns
tuple_db_results = [
    ('joe', 22, 'clerk'),
    ('pete', 34, 'salesman'),
    ('mary', 25, 'manager'),
]

names_jobs = dict([ (name, role) for name, age, role in tuple_db_results ])

Of course the same can be done with a dict comprehension, which in a sense is just another way of working with dict():

names_jobs = { name: role for name, age, role in tuple_db_results }
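
The earlier filtering and key/value switching examples can be written as dict comprehensions in the same way. A sketch:

```python
mydict = {'a': 5, 'b': 0, 'c': -3, 'd': 2, 'e': -22, 'f': 4}

# keep only the pairs whose value is larger than 0
filtered_dict = {k: v for k, v in mydict.items() if v > 0}
print(filtered_dict)        # {'a': 5, 'd': 2, 'f': 4}

# switch keys and values
switched = {v: k for k, v in mydict.items()}
print(switched[5])          # a
```

Switching keys and values assumes the values are unique and hashable; duplicate values would collapse into a single key.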

"Every Other" List Processing

Sometimes pairs are "embedded" in a flat list.

Consider this list, which is really a list of pairs run straight together in a list -- this may occur in certain situations involving files or other external formats:

p = ['a', 1, 'b', 2, 'c', 3, 'd', 4]

The list of pairs really ought to be converted to a dict, but how? A "step" slice can extract every other item, so we can pull the keys and values separately:

p = ['a', 1, 'b', 2, 'c', 3, 'd', 4]

keys = p[0::2]
values = p[1::2]

tuples = list(zip(keys, values))   # list() lets us print the pairs and still reuse them

print(tuples)  # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]

d = dict(tuples)
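
The intermediate names are optional: the two slices can be passed straight to zip(), and the zip straight to dict(). Note that zip() stops at the shorter input, so a trailing key without a value would be silently dropped. A sketch:

```python
p = ['a', 1, 'b', 2, 'c', 3, 'd', 4]

d = dict(zip(p[0::2], p[1::2]))
print(d)                    # {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# an odd-length list: 'e' has no partner, so zip() leaves it out
q = p + ['e']
print(dict(zip(q[0::2], q[1::2])) == d)    # True
```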

Set Comparisons

The set allows comparisons of sets of objects.

These methods answer such questions as "what is in one set that is not in the other?" and "what is common between the sets?"; in other words, membership tests between two sets.

set_a = set([1, 2, 3, 4])
set_b = set([3, 4, 5, 6])


# set of items contained in either set
print(set_a.union(set_b))         # {1, 2, 3, 4, 5, 6}  (set_a | set_b)


# what is in one set that is not in another
print(set_a.difference(set_b))    # {1, 2}  (set_a - set_b)


# what is common between sets
print(set_a.intersection(set_b))  # {3, 4}  (set_a & set_b)
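
Each of these methods has an operator form, and symmetric_difference() (or ^) gives the items found in exactly one of the two sets. A sketch:

```python
set_a = {1, 2, 3, 4}
set_b = {3, 4, 5, 6}

either = set_a | set_b      # {1, 2, 3, 4, 5, 6}  same as set_a.union(set_b)
only_a = set_a - set_b      # {1, 2}              same as set_a.difference(set_b)
both = set_a & set_b        # {3, 4}              same as set_a.intersection(set_b)
not_both = set_a ^ set_b    # {1, 2, 5, 6}        same as set_a.symmetric_difference(set_b)

print(sorted(either), sorted(only_a), sorted(both), sorted(not_both))
```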
