Python 3


Introduction to Python

davidbpython.com

Dictionaries: Aggregations

dict aggregations

A "counting" or "summing" dictionary answers the question "how many of each" or "how much of each".

Aggregations may answer the following questions:

how many students are from each state or country? (count)
how many cars are sold by each automaker? (count)
what is the total $ sales generated by each sales associate? (sum)
what is the total number of hours billed to each client? (sum)

A dict stores this information: each unique key is paired with either a count (how many times that key appeared in the data source) or a sum (the total of the values associated with that key in the data source).

building a counting dict

A "counting" dict increments the value associated with each key, and adds keys as new ones are found.

Customarily we loop through data, using the dictionary to keep a tally as we encounter items.

state_count = {}                     # initialize an empty dict

fh = open('revenue.csv')

for line in fh:

    items = line.rstrip().split(',')   # ["Haddad's", 'PA', '239.50'] (rstrip() removes the newline)
    state = items[1]              # str, 'PA'

    if state not in state_count:
        state_count[state] = 0

    state_count[state] = state_count[state] + 1


print(state_count)                # {'PA': 2, 'NJ': 2, 'NY': 3}

print("here is the count of states from revenue.csv:  ")
for state in state_count:
    print(f"{state}:  {state_count[state]} occurrences")

print("here is the count for 'NY':  ")
print(state_count['NY'])                   # 3

fh.close()
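Since revenue.csv is not reproduced here, the sketch below runs the same counting loop over a few hypothetical CSV lines held in a list. The company names and amounts are invented for illustration; only the name,state,amount layout follows the example above.

```python
# Hypothetical sample rows standing in for revenue.csv (invented data).
lines = [
    "Haddad's,PA,239.50\n",
    "Westfield Co.,NJ,53.90\n",
    "Hartford Inc.,NY,11.25\n",
    "Grocer,PA,23.95\n",
    "Acme,NY,100.00\n",
]

state_count = {}                      # key: state, value: number of rows seen

for line in lines:
    items = line.rstrip().split(',')  # strip the newline, then split on commas
    state = items[1]

    if state not in state_count:      # first time we see this state
        state_count[state] = 0

    state_count[state] = state_count[state] + 1

print(state_count)                    # {'PA': 2, 'NJ': 1, 'NY': 2}
```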

building a summing dict

A "summing" dict adds each value found in the data to a running total for its key, and adds keys as new ones are found.

As with a counting dict, we loop through data, using the dictionary to keep a tally as we encounter items.

state_sum = {}                     # initialize an empty dict

fh = open('revenue.csv')

for line in fh:

    items = line.rstrip().split(',')  # ["Haddad's", 'PA', '239.50'] (rstrip() removes the newline)
    state = items[1]                 # str, 'PA'
    value = float(items[2])          # float, 239.5

    if state not in state_sum:
        state_sum[state] = 0

    state_sum[state] = state_sum[state] + value


print(state_sum)      # {'PA': 263.45, 'NJ': 265.4, 'NY': 133.16}

print("here is the sum for 'NY':  ")
print(state_sum['NY'])                 # 133.16

fh.close()
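A minimal sketch of the same summing loop, wrapped in a function so it can run on any list of lines. The function name sum_by_state and the sample rows are hypothetical, assuming the same name,state,amount layout as revenue.csv. Because floating-point sums can show stray digits (for example 263.45000000000005), the totals are formatted to two decimal places when printed.

```python
def sum_by_state(lines):
    """Return a dict mapping each state to the total of its amounts.

    Hypothetical helper; assumes each line looks like 'name,state,amount'.
    """
    state_sum = {}
    for line in lines:
        items = line.rstrip().split(',')
        state = items[1]
        value = float(items[2])

        if state not in state_sum:      # start a new running total at 0
            state_sum[state] = 0

        state_sum[state] = state_sum[state] + value
    return state_sum


# Hypothetical sample rows (invented data).
sample = [
    "Haddad's,PA,239.50\n",
    "Westfield Co.,NJ,53.90\n",
    "Grocer,PA,24.50\n",
]

totals = sum_by_state(sample)
for state in totals:
    print(f"{state}: {totals[state]:.2f}")   # prints PA: 264.00, then NJ: 53.90
```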

dictionary size with len()

len() returns the number of key/value pairs in a dict.

mydict = {'a': 1, 'b': 2, 'c': 3}

print(len(mydict))                 # 3 (number of keys in dict)
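Applied to a counting dict, len() reports how many distinct keys were found, not how many items were counted. A small sketch using counts like those printed by the counting example above: updating an existing key leaves the length unchanged, while adding a new key increases it.

```python
state_count = {'PA': 2, 'NJ': 2, 'NY': 3}   # counts like those printed above

print(len(state_count))         # 3: one pair per distinct state

state_count['PA'] = state_count['PA'] + 1   # updating an existing key...
print(len(state_count))         # 3: ...does not add a pair

state_count['CT'] = 1           # a new key adds a pair
print(len(state_count))         # 4
```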

sidebar: dict .get() method

This method returns the value for a key if the key exists, and otherwise returns a default value, so we don't need to check first whether the key is in the dict.

mydict = {'a': 1, 'b': 2, 'c': 3}

xx = mydict.get('a', 0)          # 1 (key exists so paired value is returned)

yy = mydict.get('zzz', 0)        # 0 (key does not exist so the
                                 #    default value is returned)

You may use any value as the default; if no default is given, .get() returns None. This method is often used instead of testing for a key before reading it, which avoids the KeyError exception raised when reading a nonexistent key.
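As a sketch of that alternative, here is the counting loop from earlier rewritten with .get(), run over a hypothetical list of state codes rather than revenue.csv. Passing 0 as the default means a state seen for the first time starts from 0, so the `if state not in state_count` test is no longer needed.

```python
states = ['PA', 'NJ', 'NY', 'PA', 'NY', 'NJ', 'NY']   # hypothetical sample values

state_count = {}
for state in states:
    # .get() returns 0 for a state not seen yet, so no 'if' test is needed
    state_count[state] = state_count.get(state, 0) + 1

print(state_count)               # {'PA': 2, 'NJ': 2, 'NY': 3}
print(state_count.get('CT'))     # None: with no default, .get() returns None
```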

sidebar: obtaining keys of a dict

The .keys() method returns a view of the keys in a dict, which can be looped over or converted to a list.

mydict = {'a': 1, 'b': 2, 'c': 3}

these_keys = mydict.keys()

for key in these_keys:
    print(key)

print(list(these_keys))            # ['a', 'b', 'c'] (dicts keep insertion order as of Python 3.7)
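.keys() returns a view rather than a copy, so it reflects changes made to the dict after the view was created. Iterating over the dict itself also yields its keys, which is why sorted(mydict) gives the sorted keys without calling .keys(). A short sketch:

```python
mydict = {'b': 2, 'a': 1}

these_keys = mydict.keys()       # a view, not a copy
mydict['c'] = 3                  # add a key after creating the view

print(list(these_keys))          # ['b', 'a', 'c'] (the view sees the new key)
print(sorted(mydict))            # ['a', 'b', 'c'] (a dict iterates over its keys)
print('a' in these_keys)         # True
```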

sidebar: obtaining values of a dict

The .values() method returns a view of the values in a dict.

mydict = {'a': 1, 'b': 2, 'c': 3}

values = list(mydict.values())     # [1, 2, 3]

if 3 in mydict.values():           # 3 is a value; 'c' is a key, so it would not be found here
    print("3 was found")

for value in mydict.values():
    print(value)

Values cannot be used to look up keys; lookup only works from key to value. Still, we may want to check whether a value is present, or sort or sum the values, among other less common uses.
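A sketch of those uses on a summing dict with totals like those printed above: sum() and max() work directly on the values view. Because there is no built-in lookup from value to key, finding which key holds a given value means looping over the dict and comparing.

```python
state_sum = {'PA': 263.45, 'NJ': 265.4, 'NY': 133.16}   # totals like those above

grand_total = sum(state_sum.values())     # add up every state's total
highest = max(state_sum.values())         # largest single total: 265.4

# No built-in value->key lookup: loop over the keys and compare each value
top_states = []
for state in state_sum:
    if state_sum[state] == highest:
        top_states.append(state)

print(round(grand_total, 2))   # 662.01
print(top_states)              # ['NJ']
```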

sidebar: using the dict .items() method

.items() gives key/value pairs as 2-item tuples.

mydict = {'a': 1, 'b': 2, 'c': 3}

print(list(mydict.items()))         # [('a', 1), ('b', 2), ('c', 3)]

for key, value in mydict.items():
    print(key, value)               # a 1
                                    # b 2
                                    # c 3

.items() is another common way to loop through a dict. Each iteration of 'for' (or each element, when converted to a list) is a 2-item tuple: the first item is a key and the second is its value. Because each iteration produces a key/value tuple, we can assign the key and value to two variable names and use them immediately, rather than subscripting the dict. This is usually easier to read, and slightly more efficient because it avoids a second lookup for each key.
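To compare the two approaches, this sketch builds the same report lines from a counting dict twice: once by looping over the keys and subscripting, and once by unpacking each (key, value) tuple from .items(). The counts are sample values like those printed by the counting example.

```python
state_count = {'PA': 2, 'NJ': 2, 'NY': 3}

by_subscript = []
for state in state_count:                  # loop over keys, then look up each value
    by_subscript.append(f"{state}: {state_count[state]}")

by_items = []
for state, count in state_count.items():   # each item is a (key, value) tuple
    by_items.append(f"{state}: {count}")

print(by_items)                    # ['PA: 2', 'NJ: 2', 'NY: 3']
print(by_subscript == by_items)    # True
```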

sidebar: working with dict items()

dict .items() returns a view of the key/value pairs, which list() converts to a list of 2-item tuples. dict() can convert such a list back into a dictionary.

mydict = {'a': 1, 'b': 2, 'c': 3}
these_items = list(mydict.items())    # [('a', 1), ('b', 2), ('c', 3)]

newdict = dict(these_items)

print(newdict)                        # {'a': 1, 'b': 2, 'c': 3}

A list of 2-item tuples can be sorted and sliced (a dict cannot be sliced), so it is a handy alternate structure.
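A sketch of why this is handy: sorting a list of (key, value) tuples orders them by key first, while sorted()'s key= parameter can order them by value instead. Slicing then keeps the top entries, and dict() turns the slice back into a dict. The counts here are sample values.

```python
state_count = {'PA': 2, 'NJ': 2, 'NY': 3, 'CT': 1}

pairs = list(state_count.items())       # [('PA', 2), ('NJ', 2), ('NY', 3), ('CT', 1)]

by_key = sorted(pairs)                  # tuples compare by first item (the key) first
print(by_key)                           # [('CT', 1), ('NJ', 2), ('NY', 3), ('PA', 2)]

# Sort by the second item (the count), largest first, then slice the top 2.
# The sort is stable, so tied counts keep their original order (PA before NJ).
by_count = sorted(pairs, key=lambda pair: pair[1], reverse=True)
top_two = by_count[:2]
print(top_two)                          # [('NY', 3), ('PA', 2)]
print(dict(top_two))                    # {'NY': 3, 'PA': 2}
```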

sidebar: converting parallel lists to tuples

zip() pairs up items from parallel lists into tuples; dict() can convert these tuples into a dict.

list1 = ['a', 'b', 'c', 'd']
list2 = [ 1,   2,   3,   4 ]

tupes = list(zip(list1, list2))

print(tupes)          # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
print(dict(tupes))    # {'a': 1, 'b': 2, 'c': 3, 'd': 4}

Occasionally we are faced with two lists that relate to each other on a 1-to-1 basis, or we shape our data into this form ourselves. Parallel lists like these can be zipped into tuples, one item from each list per tuple.
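dict() can also take the zip() result directly, without converting it to a list first. One point to watch: zip() stops at the end of the shortest list, so if the lists are not actually the same length, the extra items are silently dropped. A sketch with sample state totals:

```python
states = ['PA', 'NJ', 'NY']
totals = [263.45, 265.4, 133.16]

state_sum = dict(zip(states, totals))   # no intermediate list needed
print(state_sum)        # {'PA': 263.45, 'NJ': 265.4, 'NY': 133.16}

# zip() stops at the end of the shortest list, so unmatched items are dropped
short = dict(zip(['a', 'b', 'c'], [1, 2]))
print(short)            # {'a': 1, 'b': 2}
```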
