Python 3

Introduction to Python

davidbpython.com

Dictionaries: Aggregations

dict aggregations

A "counting" or "summing" dictionary answers the question "how many of each" or "how much of each".

Aggregations may answer the following questions:

How many students are from each state or country? (count)
How many cars are sold by each automaker? (count)
What is the total $ sales generated by each sales associate? (sum)
What is the total number of hours billed to each client? (sum)

The dict is used to store this information. Each unique key in the dict will be associated with a count or a sum, depending on how many we found in the data source or the sum of values associated with each key in the data source.

building a counting dict

A "counting" dict increments the value associated with each key, and adds keys as new ones are found.

state_count = {}                  # empty dict

fh = open('revenue.csv')

for line in fh:                   # str, "Haddad's,PA,239.50\n"

    items = line.split(',')       # list, ["Haddad's", 'PA', '239.50\n']
    state = items[1]              # str, 'PA'

    if state not in state_count:
        state_count[state] = 0

    state_count[state] = state_count[state] + 1

print(state_count)                # {'PA': 2, 'NJ': 2, 'NY': 3}
fh.close()

as we loop through the file, we look at the 2nd value in the line, the state
we will see several of the states repeated
if the state is not found in the dict, we add it, with a 0 as value
then (whether the state is newly added or not) we increment the value associated with that state
at the end, we have a count of the # of occurrences of each state
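
To watch the steps above happen one line at a time, here is a minimal sketch that prints the dict after each line is processed. The sample_lines list is hypothetical data standing in for revenue.csv (it is not the real file contents), so the counts differ from the ones shown above.

```python
# Hypothetical sample lines standing in for revenue.csv (not the real file)
sample_lines = [
    "Store A,PA,10.00\n",
    "Store B,NJ,20.00\n",
    "Store C,PA,30.00\n",
    "Store D,NY,40.00\n",
]

state_count = {}                      # empty dict

for line in sample_lines:
    items = line.split(',')           # e.g. ['Store A', 'PA', '10.00\n']
    state = items[1]                  # e.g. 'PA'

    if state not in state_count:      # first time we have seen this state
        state_count[state] = 0

    state_count[state] = state_count[state] + 1

    print(state, state_count)         # PA {'PA': 1}
                                      # NJ {'PA': 1, 'NJ': 1}
                                      # PA {'PA': 2, 'NJ': 1}
                                      # NY {'PA': 2, 'NJ': 1, 'NY': 1}
```

Note that the third line does not add a key: 'PA' is already in the dict, so only its value changes.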

Ex. 6.16

building a summing dict

A "summing" dict sums the value associated with each key, and adds keys as new ones are found.

state_sum = {}                  # empty dict

fh = open('revenue.csv')        # 'file' object

for line in fh:                 # str, "Haddad's,PA,239.50\n"

    items = line.split(',')     # list, ["Haddad's", 'PA', '239.50\n']
    state = items[1]            # str, 'PA'
    value = float(items[2])     # float, 239.5 (float() ignores the trailing newline)

    if state not in state_sum:
        state_sum[state] = 0

    state_sum[state] = state_sum[state] + value

print(state_sum)      # {'PA': 263.45, 'NJ': 265.4, 'NY': 133.16}

fh.close()

the summing dictionary is very similar to the counting dict
the only difference is that we are summing values by state rather than counting
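
Because the two dicts are built the same way, they can be filled in a single loop and then combined, for example to get an average per state. The sketch below assumes hypothetical sample data in place of revenue.csv; the store names and amounts are made up for illustration.

```python
# Hypothetical sample lines standing in for revenue.csv (not the real file)
sample_lines = [
    "Store A,PA,10.50\n",
    "Store B,NJ,20.25\n",
    "Store C,PA,30.00\n",
    "Store D,NY,40.00\n",
]

state_count = {}                     # how many rows per state
state_sum = {}                       # total value per state

for line in sample_lines:
    items = line.split(',')
    state = items[1]
    value = float(items[2])          # float() ignores the trailing newline

    if state not in state_count:     # new state: start both tallies at 0
        state_count[state] = 0
        state_sum[state] = 0

    state_count[state] = state_count[state] + 1
    state_sum[state] = state_sum[state] + value

for state in state_sum:
    average = state_sum[state] / state_count[state]
    print(state, state_count[state], state_sum[state], average)
                                     # PA 2 40.5 20.25
                                     # NJ 1 20.25 20.25
                                     # NY 1 40.0 40.0
```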

dictionary size with len()

len() counts the pairs in a dict.

mydict = {'a': 1, 'b': 2, 'c': 3}

print(len(mydict))                 # 3 (number of keys in dict)
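
With an aggregation dict, len() tells us how many distinct keys were found, not how many rows were read. A small sketch, reusing the state counts printed earlier and adding a hypothetical 'CT' key for illustration:

```python
state_count = {'PA': 2, 'NJ': 2, 'NY': 3}

print(len(state_count))              # 3 (three distinct states)

state_count['PA'] = state_count['PA'] + 1   # existing key: value changes
print(len(state_count))              # 3 (still three keys)

state_count['CT'] = 1                # new key (hypothetical): dict grows
print(len(state_count))              # 4

print(len({}))                       # 0 (an empty dict has no pairs)
```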

Ex. 6.21

sidebar: dict .get() method

This method may be used to retrieve a value without checking the dict to see if the key exists.

mydict = {'a': 1, 'b': 2, 'c': 3}

xx = mydict.get('a', 0)          # 1 (key exists so paired value is returned)

yy = mydict.get('zzz', 0)        # 0 (key does not exist so the
                                 #    default value is returned)

.get() works like dict subscripting - given a key, it returns the value for that key
however, if the key is missing, .get() returns a default value instead of raising an error
the second argument to .get() is the default value; if it is omitted, the default is None
this method is sometimes used to avoid the KeyError exception that occurs when trying to read a nonexistent key
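
One common use of .get() is to shorten a counting dict: the default of 0 takes the place of the "if key not in dict" test. A minimal sketch, assuming a hypothetical list of states rather than a file:

```python
states = ['PA', 'NJ', 'PA', 'NY', 'NY', 'NY']    # hypothetical sample data

state_count = {}

for state in states:
    # .get() returns the current count, or 0 if the state is not yet a key
    state_count[state] = state_count.get(state, 0) + 1

print(state_count)                   # {'PA': 2, 'NJ': 1, 'NY': 3}

print(state_count.get('CT'))         # None (no default given, key missing)
```

Note that .get() by itself never adds a key; the key is added by the assignment on the left side.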

Ex. 6.22

sidebar: obtaining keys in a dict

The .keys() method gives access to the keys in a dict.

mydict = {'a': 1, 'b': 2, 'c': 3}

these_keys = mydict.keys()

for key in these_keys:
    print(key)

print(list(these_keys))            # ['a', 'b', 'c'] (insertion order, Python 3.7+)

the object returned from .keys() is known as a view object (type dict_keys); it is not a list and not a generator
looping through the object retrieves the keys
in order to see all of them in a list, we must pass the object to the list() function
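
The object returned by .keys() stays connected to the dict: if the dict changes, the same object shows the new keys, and it can be looped over more than once. A short sketch to illustrate (the added key 'd' is just for demonstration):

```python
mydict = {'a': 1, 'b': 2, 'c': 3}

these_keys = mydict.keys()

print(type(these_keys).__name__)     # dict_keys
print(len(these_keys))               # 3

mydict['d'] = 4                      # change the dict after calling .keys()

print(list(these_keys))              # ['a', 'b', 'c', 'd']
print('d' in these_keys)             # True
```

If a fixed snapshot of the keys is needed, list(mydict.keys()) makes a separate list that will not change when the dict does.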

sidebar: obtaining values in a dict

The .values() method gives access to the values in a dict.

mydict = {'a': 1, 'b': 2, 'c': 3}

values = list(mydict.values())     # [1, 2, 3]

if 3 in mydict.values():
    print("3 was found")

for value in mydict.values():
    print(value)

the values cannot be used to get the keys
however, we might want to check for a value to see if it is present
we might also want to sort or sum the values
(.values() returns a view object, not a list, so we can use list() to collect the values in one list)
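
The points above (checking, sorting, and summing values) can be sketched with built-in functions applied directly to .values(). The dict below reuses the state counts printed earlier in this section.

```python
state_count = {'PA': 2, 'NJ': 2, 'NY': 3}

total = sum(state_count.values())        # add up all of the counts
ordered = sorted(state_count.values())   # new list, lowest to highest
largest = max(state_count.values())      # highest single count
has_three = 3 in state_count.values()    # is any count equal to 3?

print(total)                         # 7
print(ordered)                       # [2, 2, 3]
print(largest)                       # 3
print(has_three)                     # True
```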

sidebar: using the dict .items() method

.items() gives key/value pairs as 2-item tuples.

mydict = {'a': 1, 'b': 2, 'c': 3}

for key, value in mydict.items():
    print(key, value)               # a 1
                                    # b 2
                                    # c 3

print(list(mydict.items()))         # [('a', 1), ('b', 2), ('c', 3)]

.items() is usually used as another approach for looping through a dict
each item returned is a 2-item tuple, with the first item as the key and the second as the value
when looping with 'for', we can assign the tuple's two items (key and value) to variable names and use them immediately, rather than resorting to subscripting. This is usually easier and it is also more efficient.
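
As an illustration of looping with key and value together, this sketch finds the key with the highest value in a summing dict. The state_sum figures are hypothetical sample data, not results from revenue.csv.

```python
state_sum = {'PA': 40.5, 'NJ': 20.25, 'NY': 40.0}   # hypothetical totals

top_state = None
top_value = None

for state, value in state_sum.items():
    # keep this pair if it is the first one, or larger than the best so far
    if top_value is None or value > top_value:
        top_state = state
        top_value = value

print(top_state, top_value)          # PA 40.5
```

The same loop written with "for state in state_sum:" would need state_sum[state] every time the value is used.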

Ex. 6.23

sidebar: working with dict items()

dict items() can give us a list of 2-item tuples. dict() can convert this list back to a dictionary.

mydict = {'a': 1, 'b': 2, 'c': 3}
these_items = list(mydict.items())    # [('a', 1), ('b', 2), ('c', 3)]

some_items = these_items[0:2]         # [('a', 1), ('b', 2)]

newdict = dict(some_items)

print(newdict)                        # {'a': 1, 'b': 2}

A list of 2-item tuples can be sorted and sliced, so it is a handy alternate structure.
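
A sketch of why this is handy: the pairs can be sorted by value, sliced to keep the top few, and converted back to a dict. The counts and the helper function by_value() are hypothetical names chosen for this example.

```python
state_count = {'PA': 2, 'NJ': 1, 'NY': 3}     # hypothetical counts

pairs = list(state_count.items())    # [('PA', 2), ('NJ', 1), ('NY', 3)]

def by_value(pair):
    """Return the second item of a (key, value) tuple, used as the sort key."""
    return pair[1]

pairs.sort(key=by_value, reverse=True)   # highest value first

print(pairs)                         # [('NY', 3), ('PA', 2), ('NJ', 1)]

top_two = pairs[0:2]                 # [('NY', 3), ('PA', 2)]

print(dict(top_two))                 # {'NY': 3, 'PA': 2}
```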

sidebar: converting parallel lists to tuples

zip() zips up parallel lists into tuples; dict() can convert the result to a dict.

list1 = ['a', 'b', 'c', 'd']
list2 = [ 1,   2,   3,   4 ]

tupes = list(zip(list1, list2))

print(tupes)          # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
print(dict(tupes))    # {'a': 1,    'b': 2,   'c': 3,   'd': 4}

Occasionally we are faced with two lists that relate to each other on a 1-to-1 basis... or, we sometimes even shape our data into this form. Parallel lists like these can be zipped into multi-item tuples.
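
A short sketch of the round trip between parallel lists and a dict. The state names and totals are hypothetical. It also shows one detail worth knowing: zip() stops at the end of the shorter list, so unmatched items are silently dropped.

```python
states = ['PA', 'NJ', 'NY']          # hypothetical parallel lists
totals = [40.5, 20.25, 40.0]

state_sum = dict(zip(states, totals))

print(state_sum)                     # {'PA': 40.5, 'NJ': 20.25, 'NY': 40.0}

# going the other way: a dict back to two parallel lists
keys = list(state_sum.keys())        # ['PA', 'NJ', 'NY']
values = list(state_sum.values())    # [40.5, 20.25, 40.0]

# lists of unequal length: zip() stops at the shorter one
short = list(zip(['a', 'b', 'c'], [1, 2]))

print(short)                         # [('a', 1), ('b', 2)]
```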
