Python 3

Introduction to Python

davidbpython.com




Dictionaries: Aggregations

dict aggregations

A "counting" or "summing" dictionary answers the question "how many of each" or "how much of each".


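Aggregations may answer questions such as these, using the revenue data from the examples below: how many rows are there for each state, and what is the total value for each state?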


The dict stores this information. Each unique key in the dict is associated with either a count (how many times that key was found in the data source) or a sum (the total of the values associated with that key in the data source).





building a counting dict

A "counting" dict increments the value associated with each key, and adds keys as new ones are found.


Customarily we loop through data, using the dictionary to keep a tally as we encounter items.


state_count = {}                     # initialize an empty dict

fh = open('revenue.csv')

for line in fh:

    items = line.split(',')       # ["Haddad's", 'PA', '239.50']
    state = items[1]              # str, 'PA'

    if state not in state_count:
        state_count[state] = 0

    state_count[state] = state_count[state] + 1


print(state_count)                # {'PA': 2, 'NJ': 2, 'NY': 3}

print("here is the count of states from revenue.csv:  ")
for state in state_count:
    print(f"{state}:  {state_count[state]} occurrences")

print("here is the count for 'NY':  ")
print(state_count['NY'])                   # 3

fh.close()
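
The same tally can also be kept with collections.Counter from the standard library, which starts every missing key at 0. This is only a sketch of an alternative to the loop above: the sample lines are made up to stand in for revenue.csv (only the "Haddad's" row appears in these notes), and the comma-separated name,state,amount layout is assumed from the comments above.

```python
from collections import Counter

# hypothetical sample rows standing in for revenue.csv
# (assumed layout: name,state,amount)
sample_lines = [
    "Haddad's,PA,239.50",
    "Westfield,NJ,53.90",
    "The Store,NY,11.98",
    "Hipster's,NY,23.95",
]

state_count = Counter()              # behaves like a dict of counts

for line in sample_lines:
    items = line.rstrip().split(',')
    state = items[1]
    state_count[state] += 1          # a missing key is treated as 0

print(dict(state_count))             # {'PA': 1, 'NJ': 1, 'NY': 2}
```

The 'if state not in state_count' test is no longer needed, because a Counter supplies the starting value of 0 itself.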




building a summing dict

A "summing" dict sums the value associated with each key, and adds keys as new ones are found.


As with a counting dict, we loop through data, using the dictionary to keep a tally as we encounter items.


state_sum = {}                     # initialize an empty dict

fh = open('revenue.csv')

for line in fh:

    items = line.split(',')          # ["Haddad's", 'PA', '239.50']
    state = items[1]                 # str, 'PA'
    value = float(items[2])          # float, 239.5

    if state not in state_sum:
        state_sum[state] = 0

    state_sum[state] = state_sum[state] + value


print(state_sum)      # {'PA': 263.45, 'NJ': 265.4, 'NY': 133.16}

print("here is the sum for 'NY':  ")
print(state_sum['NY'])                 # 133.16

fh.close()
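
A summing dict can likewise be written with collections.defaultdict, which creates a starting value the first time a key is used. The sketch below assumes the same name,state,amount layout; apart from the "Haddad's" row, the sample rows are invented for illustration.

```python
from collections import defaultdict

# hypothetical sample rows standing in for revenue.csv
sample_lines = [
    "Haddad's,PA,239.50",
    "Westfield,NJ,53.90",
    "The Store,NY,11.98",
    "Hipster's,NY,23.95",
]

state_sum = defaultdict(float)       # missing keys start at 0.0

for line in sample_lines:
    items = line.rstrip().split(',')
    state = items[1]
    value = float(items[2])
    state_sum[state] += value        # no 'if state not in' test needed

for state in state_sum:
    print(f"{state}:  {state_sum[state]:.2f}")
```

Formatting with :.2f when printing keeps small floating-point rounding differences out of the report.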




dictionary size with len()

len() counts the pairs in a dict.


mydict = {'a': 1, 'b': 2, 'c': 3}

print(len(mydict))                 # 3 (number of keys in dict)
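
Applied to a counting dict, len() tells us how many unique keys were found, which is different from how many items were counted in total. A small sketch, using the counts shown in the counting example above:

```python
state_count = {'PA': 2, 'NJ': 2, 'NY': 3}    # counts from the example above

unique_states = len(state_count)             # number of keys
total_rows = sum(state_count.values())       # total of all the counts

print(unique_states)                         # 3
print(total_rows)                            # 7
```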




sidebar: dict .get() method

This method retrieves a value without first checking whether the key exists; if the key is missing, a default value is returned instead.


mydict = {'a': 1, 'b': 2, 'c': 3}

xx = mydict.get('a', 0)          # 1 (key exists so paired value is returned)

yy = mydict.get('zzz', 0)        # 0 (key does not exist so the
                                 #    default value is returned)

You may use any value as the default. This method is sometimes used as an alternative to testing for a key in a dict before reading it -- avoiding the KeyError exception that occurs when trying to read a nonexistent key.
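
One place .get() fits well is the counting dict itself. The sketch below replaces the 'if state not in state_count' test with a .get() call whose default is 0; the list of states is made-up sample data.

```python
states = ['PA', 'NJ', 'NY', 'NY', 'PA']      # hypothetical sample data

state_count = {}

for state in states:
    # the first time a state is seen, .get() returns the default 0
    state_count[state] = state_count.get(state, 0) + 1

print(state_count)                           # {'PA': 2, 'NJ': 1, 'NY': 2}
```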





sidebar: obtaining keys of a dict

The .keys() method gives access to the keys in a dict.


mydict = {'a': 1, 'b': 2, 'c': 3}

these_keys = mydict.keys()

for key in these_keys:
    print(key)

print(list(these_keys))            # ['a', 'b', 'c'] (insertion order, Python 3.7+)
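
Because the keys can be passed to any function that accepts a sequence, a common use is to loop through a dict in sorted key order. A brief sketch, with a sample dict whose keys were deliberately added out of order:

```python
mydict = {'b': 2, 'c': 3, 'a': 1}            # keys added out of order

sorted_keys = sorted(mydict.keys())          # a new list, alphabetical

for key in sorted_keys:
    print(key, mydict[key])                  # a 1
                                             # b 2
                                             # c 3
```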




sidebar: obtaining values of a dict

The .values() method gives access to the values in a dict.


mydict = {'a': 1, 'b': 2, 'c': 3}

values = list(mydict.values())     # [1, 2, 3]

if 3 in mydict.values():           # test against a value, not a key
    print("3 was found")

for value in mydict.values():
    print(value)

The values cannot be used to get the keys: a dict is a one-way lookup from key to value. However, we might want to check for membership in the values, or sort or sum them.
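
The operations mentioned here (membership, sorting, summing) can be sketched with built-in functions applied to .values():

```python
mydict = {'a': 1, 'b': 2, 'c': 3}

has_two = 2 in mydict.values()                    # membership test
total = sum(mydict.values())                      # sum of all values
largest = max(mydict.values())                    # largest value
descending = sorted(mydict.values(), reverse=True)

print(has_two)                                    # True
print(total)                                      # 6
print(largest)                                    # 3
print(descending)                                 # [3, 2, 1]
```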





sidebar: using the dict .items() method

.items() gives key/value pairs as 2-item tuples.


mydict = {'a': 1, 'b': 2, 'c': 3}

print(list(mydict.items()))         # [('a', 1), ('b', 2), ('c', 3)]

for key, value in mydict.items():
    print(key, value)               # a 1
                                    # b 2
                                    # c 3

.items() is usually used as another approach to looping through a dict. With each iteration of 'for', or each item when converted to a list, we see a 2-item tuple: the first item is a key, and the second is its value. When looping with 'for', since each iteration produces a 2-item (key/value) tuple, we can assign the key and value to variable names and use them immediately, rather than resorting to subscripting. This is usually easier and it is also more efficient.
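
To make the comparison concrete, here is a sketch of both looping styles applied to a counting dict (the counts are the ones shown in the counting example earlier). Both loops print the same lines.

```python
state_count = {'PA': 2, 'NJ': 2, 'NY': 3}

# looping over keys: a subscript is needed to reach each value
for state in state_count:
    print(f"{state}:  {state_count[state]} occurrences")

# looping over .items(): key and value arrive together
for state, count in state_count.items():
    print(f"{state}:  {count} occurrences")
```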





sidebar: working with dict items()

dict items() can give us a list of 2-item tuples. dict() can convert this list back to a dictionary.


mydict = {'a': 1, 'b': 2, 'c': 3}
these_items = list(mydict.items())    # [('a', 1), ('b', 2), ('c', 3)]

newdict = dict(these_items)

print(newdict)                        # {'a': 1, 'b': 2, 'c': 3}

A list of 2-item tuples can be sorted and sliced, so it is a handy alternate structure.
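
As an illustration of sorting and slicing, the sketch below ranks a counting dict by its counts and keeps the top two. The helper function by_count is a name introduced here for the example.

```python
state_count = {'PA': 2, 'NJ': 2, 'NY': 3}

def by_count(pair):
    """Return the value (second item) of a (key, value) tuple."""
    return pair[1]

# sort the (state, count) tuples by count, highest first
ranked = sorted(state_count.items(), key=by_count, reverse=True)
print(ranked)                       # [('NY', 3), ('PA', 2), ('NJ', 2)]

top_two = ranked[:2]                # slice off the first two tuples
print(dict(top_two))                # {'NY': 3, 'PA': 2}
```

Tuples with equal counts keep their original relative order, because Python's sort is stable.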





sidebar: converting parallel lists to tuples

zip() zips up parallel lists into tuples; dict() can convert this to dict.


list1 = ['a', 'b', 'c', 'd']
list2 = [ 1,   2,   3,   4 ]

tupes = list(zip(list1, list2))

print(tupes)          # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
print(dict(tupes))    # {'a': 1,    'b': 2,   'c': 3,   'd': 4}

Occasionally we are faced with two lists that relate to each other on a 1-to-1 basis... or, we sometimes even shape our data into this form. Parallel lists like these can be zipped into multi-item tuples.
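
The intermediate list is optional: dict() can take the result of zip() directly. The sketch below also goes the other way, splitting a dict back into parallel lists; the state names and counts are sample values.

```python
states = ['PA', 'NJ', 'NY']
counts = [2, 2, 3]

# pair up the lists and build the dict in one step
state_count = dict(zip(states, counts))
print(state_count)                  # {'PA': 2, 'NJ': 2, 'NY': 3}

# recover parallel lists from the dict
keys_again = list(state_count.keys())
values_again = list(state_count.values())
print(keys_again)                   # ['PA', 'NJ', 'NY']
print(values_again)                 # [2, 2, 3]
```

Note that zip() stops at the end of the shorter list, so any extra items in the longer list are silently dropped.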




