Introduction to Python
davidbpython.com
In this unit we will complete our tour of the core Python data processing features.
So far we have explored the reading and parsing of data; the loading of data into built-in structures; and the aggregation and sorting of these structures. This unit explores advanced tools for container processing. List comprehensions and set comparisons are two "power tools" that do things we have already been able to do with loops: applying the same operation to each element of a list, selecting some items from a list, and comparing two collections to see what they have in common or how they differ.
set operations
a = {'a', 'b', 'c'}
b = {'b', 'c', 'd'}
print(a.difference(b)) # {'a'}
print(a.union(b)) # {'a', 'b', 'c', 'd'}
print(a.intersection(b)) # {'b', 'c'}
print(a.symmetric_difference(b)) # {'a', 'd'}
list comprehensions
a = ['hello', 'there', 'harry']
print([ var.upper() for var in a if var.startswith('h') ])
# ['HELLO', 'HARRY']
ternary assignment
rev_sort = True if user_input == 'highest' else False
pos_val = x if x >= 0 else x * -1
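A ternary expression picks one of two values based on a test: value_if_true if test else value_if_false. A minimal sketch putting both examples above into runnable form; the function name absolute and the value assigned to user_input are hypothetical, chosen only for illustration (Python's built-in abs() already does what absolute() does):

```python
def absolute(x):
    """Return x unchanged if it is non-negative, otherwise its negation."""
    return x if x >= 0 else x * -1

print(absolute(-7))  # 7
print(absolute(3))   # 3

nums = [3, 1, 2]
user_input = 'highest'   # hypothetical value, as if typed by a user
rev_sort = True if user_input == 'highest' else False
print(sorted(nums, reverse=rev_sort))  # [3, 2, 1]
```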
conditional assignment
val = this or that # 'this' if this is truthy, else 'that'
val = this and that # 'this' if this is falsy, else 'that'
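Note that or and and return one of their operands, not necessarily True or False; that is what makes them useful for assignment. A short sketch (the variable names are hypothetical) showing the common "default value" idiom and how and behaves:

```python
user_name = ''                       # an empty string is falsy
display = user_name or 'anonymous'   # so the second operand is returned
print(display)  # anonymous

count = 5
print(count or 10)   # 5   (first operand is truthy, so it is returned)
print(0 or 10)       # 10  (0 is falsy, so the second operand is returned)

print(count and 'has items')   # has items  (first operand truthy: second returned)
print(0 and 'has items')       # 0          (first operand falsy: it is returned)
```

Be careful with the default-value idiom when 0 or an empty string is a legitimate value: or treats it the same as "missing".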
We have used the set to create a unique collection of objects. Sets can also be compared with one another. Methods like set.union (all members of two or more sets), set.difference (elements found in this set but not in another) and set.intersection (elements common to both sets) are fast and simple to use.
set_a = {1, 2, 3, 4}
set_b = {3, 4, 5, 6}
print(set_a.union(set_b)) # {1, 2, 3, 4, 5, 6} (set_a | set_b)
print(set_a.difference(set_b)) # {1, 2} (set_a - set_b)
print(set_a.intersection(set_b)) # {3, 4} (what is common between them?)
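Each of these methods also has an operator form. A sketch of the four operators side by side; sorted() is used when printing because sets have no guaranteed order. One practical difference: the methods accept any iterable as their argument, while the operators require sets on both sides.

```python
set_a = {1, 2, 3, 4}
set_b = {3, 4, 5, 6}

union = set_a | set_b            # same as set_a.union(set_b)
only_a = set_a - set_b           # same as set_a.difference(set_b)
common = set_a & set_b           # same as set_a.intersection(set_b)
either_not_both = set_a ^ set_b  # same as set_a.symmetric_difference(set_b)

print(sorted(union))            # [1, 2, 3, 4, 5, 6]
print(sorted(only_a))           # [1, 2]
print(sorted(common))           # [3, 4]
print(sorted(either_not_both))  # [1, 2, 5, 6]

# The method accepts a list; set_a | [9, 10] would raise a TypeError.
print(sorted(set_a.union([9, 10])))  # [1, 2, 3, 4, 9, 10]
```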
List comprehensions abbreviate simple loops into one line.
Consider this loop, which filters a list so that it contains only positive integer values:
myints = [0, -1, -5, 7, -33, 18, 19, 55, -100]
myposints = []
for el in myints:
    if el > 0:
        myposints.append(el)
print(myposints) # [7, 18, 19, 55]
This loop can be replaced with the following one-liner:
myposints = [ el for el in myints if el > 0 ]
See how the looping and test in the first loop are distilled into the one line? The first el is the element that will be added to myposints; list comprehensions automatically build a new list and return it when the looping is done.
The operation is the same, but the order of operations in the syntax is different:
# this is pseudo code
# target list = item for item in source list if test
Hmm, this makes a list comprehension less intuitive than a loop. However, once you learn how to read them, list comprehensions can actually be easier and quicker to read - primarily because they are on one line. This is an example of a filtering list comprehension - it allows some, but not all, elements through to the new list.
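Here is another filtering list comprehension, this time on strings: only words longer than three characters pass the test (the word list is just sample data). The loop version is included so the two can be compared directly.

```python
words = ['a', 'quick', 'fox', 'jumped', 'over', 'the', 'dog']

# loop version
long_words_loop = []
for w in words:
    if len(w) > 3:
        long_words_loop.append(w)

# list comprehension version: item for item in source list if test
long_words = [ w for w in words if len(w) > 3 ]

print(long_words)  # ['quick', 'jumped', 'over']
```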
Consider this loop, which doubles each value in a list:
nums = [1, 2, 3, 4, 5]
dblnums = []
for val in nums:
    dblnums.append(val*2)
print(dblnums) # [2, 4, 6, 8, 10]
This loop can be distilled into a list comprehension thusly:
dblnums = [ val * 2 for val in nums ]
This transforming list comprehension transforms each value in the source list before sending it to the target list:
# this is pseudo code
# target list = transform(item) for item in source list
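The transform can be any expression, including a method or function call. A small sketch using the string method title() and the built-in len() (the names are sample data):

```python
names = ['joe', 'pete', 'mary']

title_names = [ n.title() for n in names ]   # method call as the transform
name_lengths = [ len(n) for n in names ]     # function call as the transform

print(title_names)   # ['Joe', 'Pete', 'Mary']
print(name_lengths)  # [3, 4, 4]
```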
We can of course combine filtering and transforming:
vals = [0, -1, -5, 7, -33, 18, 19, 55, -100]
doubled_pos_vals = [ i*2 for i in vals if i > 0 ]
print(doubled_pos_vals) # [14, 36, 38, 110]
If they only replace simple loops that we already know how to do, why do we need list comprehensions? As mentioned, once you are comfortable with them, list comprehensions are much easier to read and comprehend than traditional loops. They say in one statement what loops need several statements to say - and reading multiple lines certainly takes more time and focus to understand.
Some common operations can also be accomplished in a single line. In this example, we produce a list of lines from a file, each stripped of trailing whitespace (including the newline):
stripped_lines = [ i.rstrip() for i in open(r'FF_daily.txt').readlines() ]
Here, we're only interested in lines of a file that begin with the desired year (1972):
lines_1972 = [ i for i in open('FF_daily.txt').readlines() if i.startswith('1972') ]
If we want the MktRF values (the leftmost floating-point value on each line) for our desired year, we could gather the bare amounts this way:
mktrf_vals = [ float(i.split()[1]) for i in open('FF_daily.txt').readlines() if i.startswith('1972') ]
In fact, we can complete part of an earlier assignment, summing the MktRF values for a year, in one line:
mktrf_sum = sum([ float(i.split()[1]) for i in open('FF_daily.txt').readlines() if i.startswith('1972') ])
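The one-liners above leave it to Python to close the file eventually. A sketch of the same MktRF sum that closes the file explicitly with a with block, and passes a generator expression (no square brackets) to sum() so that no intermediate list is built. The function name mktrf_sum_for_year is hypothetical, and the sample lines are made up; they only assume the layout described above, a date field followed by whitespace-separated floats.

```python
def mktrf_sum_for_year(lines, year):
    """Sum the MktRF column (second whitespace-separated field) of the
    lines that start with the given year string.
    Assumes the FF_daily.txt layout: date first, then float columns."""
    return sum(float(line.split()[1]) for line in lines if line.startswith(year))

# Usage with the real file (a file object yields its lines one at a time):
# with open('FF_daily.txt') as fh:
#     print(mktrf_sum_for_year(fh, '1972'))

# Hypothetical sample lines for illustration only, not real data.
sample = [
    '19720103   0.50   0.10',
    '19720104   0.25  -0.20',
    '19730102   1.00   0.30',
]
print(mktrf_sum_for_year(sample, '1972'))  # 0.75
```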
From experience I can tell you that familiarity with these forms makes it very easy both to construct them and to decode them quickly, much more quickly than a 4-6 line loop.
Remember that a dictionary can be expressed as a list of 2-element (key, value) tuples by passing items() to list(). Such a list of 2-element tuples can be converted back to a dictionary with dict():
mydict = {'a': 5, 'b': 0, 'c': -3, 'd': 2, 'e': 1, 'f': 4}
my_items = list(mydict.items()) # my_items is now [('a',5), ('b',0), ('c',-3), ('d',2), ('e',1), ('f',4)]
mydict2 = dict(my_items) # mydict2 is now {'a':5, 'b':0, 'c':-3, 'd':2, 'e':1, 'f':4}
It becomes very easy to filter or transform a dictionary using this structure. Here, we're filtering a dictionary by value - accepting only those pairs whose value is larger than 0:
mydict = {'a': 5, 'b': 0, 'c': -3, 'd': 2, 'e': -22, 'f': 4}
filtered_dict = dict([ (i, j) for (i, j) in mydict.items() if j > 0 ])
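Python also offers dict comprehensions, which build the dictionary directly from key: value pairs instead of passing a list of tuples to dict(). A sketch of the same filter written both ways:

```python
mydict = {'a': 5, 'b': 0, 'c': -3, 'd': 2, 'e': -22, 'f': 4}

# list of 2-element tuples passed to dict(), as above
filtered_dict = dict([ (i, j) for (i, j) in mydict.items() if j > 0 ])

# equivalent dict comprehension: key: value comes before the 'for'
filtered_dict2 = { i: j for (i, j) in mydict.items() if j > 0 }

print(filtered_dict2)  # {'a': 5, 'd': 2, 'f': 4}
```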
Here we're switching the keys and values in a dictionary, and assigning the resulting dict back to mydict, thus seeming to change it in-place:
mydict = dict([ (j, i) for (i, j) in mydict.items() ])
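One caveat when swapping keys and values: dictionary keys must be unique, so if two keys share a value, only one pair survives (the one seen last). The values must also be hashable to become keys. A small sketch with made-up data:

```python
scores = {'a': 1, 'b': 2, 'c': 1}   # 'a' and 'c' share the value 1
swapped = dict([ (j, i) for (i, j) in scores.items() ])
print(swapped)  # {1: 'c', 2: 'b'}  ('a' was overwritten by 'c')
```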
Python's database modules (for example sqlite3) typically return each row of a query result as a tuple. Here we're pulling two of the three values from each row and folding them into a dictionary.
# 'tuple_db_results' simulates what a database returns
tuple_db_results = [
('joe', 22, 'clerk'),
('pete', 34, 'salesman'),
('mary', 25, 'manager'),
]
names_jobs = dict([ (name, role) for name, age, role in tuple_db_results ])
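To see rows really arriving as tuples, here is a sketch using the standard-library sqlite3 module with a throwaway in-memory database. The table name staff and its columns are hypothetical, chosen to match the simulated results above.

```python
import sqlite3

conn = sqlite3.connect(':memory:')   # temporary database held in memory
conn.execute('CREATE TABLE staff (name TEXT, age INTEGER, role TEXT)')
conn.executemany('INSERT INTO staff VALUES (?, ?, ?)',
                 [('joe', 22, 'clerk'), ('pete', 34, 'salesman'), ('mary', 25, 'manager')])

rows = conn.execute('SELECT name, age, role FROM staff ORDER BY rowid').fetchall()
print(rows[0])   # ('joe', 22, 'clerk')   each row is a tuple

names_jobs = dict([ (name, role) for name, age, role in rows ])
print(names_jobs)  # {'joe': 'clerk', 'pete': 'salesman', 'mary': 'manager'}
conn.close()
```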