Introduction to Python
davidbpython.com
Project Discussion, Session 5
5.1 | Notes typing assignment. Please write out this week's transcription notes. |
5.2 | Filepaths Exercises. |
First, you must not use an absolute path (like those that start with C:\ or another drive letter, or those that start with a slash, /). Absolute paths can correctly locate a file, but we are here to learn how to construct relative paths. A relative path is one that locates a file relative to the location of the script (strictly, relative to the current working directory, which is the script's folder when you run the script from there). Therefore, we must first consider the location of the script within the filepath. There are four relative locations to consider:
|
|
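As a rough illustration of the difference between absolute and relative paths, here is a minimal sketch. The folder and file names used (data, scripts, states.csv) are hypothetical, and the three cases shown are not necessarily the locations this exercise has in mind.

```python
import os

# Hypothetical file and folder names, for illustration only.
same_folder = 'states.csv'                          # file sits beside the script
child_folder = os.path.join('data', 'states.csv')   # file is in a subfolder
parent_folder = os.path.join('..', 'states.csv')    # file is one folder up

# os.path.isabs() reports whether a path is absolute; all three are relative.
for path in (same_folder, child_folder, parent_folder):
    print(path, os.path.isabs(path))

# '..' means "go up one folder"; normpath() collapses it out of the path.
collapsed = os.path.normpath(os.path.join('scripts', '..', 'data', 'states.csv'))
print(collapsed)
```

Building paths with os.path.join() rather than typing slashes by hand keeps the same script usable on both Windows and Mac/Linux.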
Troubleshooting:
|
|
5.3 | Lookup Dictionary. Reading file states.csv (see the file in this session's source data), build a dict of pairs with each state's name as key and the abbreviation as the value (for example, New York as key and NY as value). Then, read user input for a state name. If the state name is a key in the dict, display the abbreviation for that state. |
The program takes a state name (through input()) and looks up the state name in the dict to retrieve and print the state's abbreviation. If the input name is not a state listed in the dict, the program prints: no state found with name "[state name]" (where [state name] is the input name, as in the sample runs).
|
Sample program runs:
there are 50 pairs in the lookup dict
please enter a state name: California
CA

there are 50 pairs in the lookup dict
please enter a state name: New York
NY

there are 50 pairs in the lookup dict
please enter a state name: Oman
no state found with name "Oman"
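The lookup described above can be sketched roughly as follows. This is only a sketch under assumptions: the layout of states.csv is assumed here to be one "name,abbreviation" pair per line with no header row, the three sample rows stand in for the real file, and find_abbrev is a hypothetical helper name. A real solution would open the file and read the name with input().

```python
# Stand-in for the lines of states.csv (assumed layout: name,abbreviation).
sample_lines = [
    'California,CA\n',
    'New York,NY\n',
    'Texas,TX\n',
]

state_lookup = {}
for line in sample_lines:
    name, abbrev = line.rstrip().split(',')   # strip the newline, split on comma
    state_lookup[name] = abbrev               # state name is key, abbreviation is value

def find_abbrev(state_name):
    """Return the abbreviation, or the not-found message (hypothetical helper)."""
    if state_name in state_lookup:            # test for the key before subscripting
        return state_lookup[state_name]
    return f'no state found with name "{state_name}"'

print(f'there are {len(state_lookup)} pairs in the lookup dict')
print(find_abbrev('New York'))
print(find_abbrev('Oman'))
```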
|
5.4 | Lookup Dictionary with try/except (notations not required for this solution). Replace the 'if state not in dict' language (i.e., checking to see if the user's state name is a key in the dict) with a try/except. So your solution will not check for the key ahead of time, and will instead trap the exception when it occurs. |
First, remove the 'if state not in dict' block. We will not test to see if the key is in the dict. Next, run the code and give the program a bad state name, i.e. one that does not exist in the dictionary. Next, identify two things:
|
|
Next, wrap the try: block around only the line where the exception is expected, and no other lines. You must not include lines that aren't related to the exception line -- this is an important best practice. (Sometimes you may need to wrap lines that are part of a block, for example 'for' blocks.) Next, follow the try: block with an except:, and specify the exception that was raised in your earlier test. Inside the except: block, print the same error message you had used earlier to signal that the state name key was not found in the dictionary. |
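A minimal sketch of the shape this takes, assuming a tiny stand-in dict and a hypothetical helper named report. Subscripting a dict with a missing key raises KeyError, so that is the exception trapped here; confirm it against the exception you saw in your own test run.

```python
state_lookup = {'California': 'CA', 'New York': 'NY'}   # tiny stand-in dict

def report(state_name):
    """Return the abbreviation or the not-found message (hypothetical helper)."""
    try:
        abbrev = state_lookup[state_name]    # the only line expected to raise
    except KeyError:
        return f'no state found with name "{state_name}"'
    return abbrev                            # runs only when the lookup succeeded

print(report('California'))
print(report('Oman'))
```

Keeping the try: block to the single subscript line means an unrelated bug elsewhere cannot be silently mistaken for a missing key.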
|
5.5 | Ranking. Reading cities_green_space.csv, build a dictionary that pairs city name keys with "pct" float values. |
{'Amsterdam': 13.0, 'Austin': 10.0, 'Barcelona': 28.0, 'Bogotá': 4.9,
'Brussels': 18.8, 'Buenos Aires': 9.4, 'Cape Town': 24.0, 'Chengdu': 42.3,
'Dublin': 26.0, 'Edinburgh': 49.2, 'Guangzhou': 19.78, 'Helsinki': 40.0,
'Hong Kong': 40.0, 'Istanbul': 2.2, 'Johannesburg': 24.0, 'Lisbon': 18.0,
'London': 33.0, 'Los Angeles': 34.7, 'Melbourne': 9.3, 'Milan': 13.74,
'Montréal': 12.82, 'Moscow': 18.0, 'Nanjing': 40.67, 'New York': 27.0,
'Oslo': 68.0, 'Paris': 10.0, 'Rome': 38.9, 'San Francisco': 13.0,
'Seoul': 27.91, 'Shanghai': 16.2, 'Shenzhen': 40.9, 'Singapore': 47.0,
'Stockholm': 40.0, 'Sydney': 46.0, 'Taipei': 6.56, 'Tokyo': 7.5,
'Toronto': 13.0, 'Vienna': 50.0, 'Warsaw': 17.0, 'Zürich': 41.0}
|
|
Next, print each city name and its pct value, ranked from the highest pct to the lowest. |
|
Expected Output:
Cities Ranked by Greenspace (% of total area)
Oslo 68.0
Vienna 50.0
Edinburgh 49.2
Singapore 47.0
Sydney 46.0
Chengdu 42.3
Zürich 41.0
Shenzhen 40.9
Nanjing 40.67
Helsinki 40.0
Hong Kong 40.0
Stockholm 40.0
Rome 38.9
Los Angeles 34.7
London 33.0
Barcelona 28.0
Seoul 27.91
New York 27.0
Dublin 26.0
Cape Town 24.0
Johannesburg 24.0
Guangzhou 19.78
Brussels 18.8
Lisbon 18.0
Moscow 18.0
Warsaw 17.0
Shanghai 16.2
Milan 13.74
Amsterdam 13.0
San Francisco 13.0
Toronto 13.0
Montréal 12.82
Austin 10.0
Paris 10.0
Buenos Aires 9.4
Melbourne 9.3
Tokyo 7.5
Taipei 6.56
Bogotá 4.9
Istanbul 2.2
|
{'Amsterdam': 13.0, 'Austin': 10.0, 'Barcelona': 28.0, 'Bogotá': 4.9, 'Brussels': 18.8, 'Buenos Aires': 9.4, 'Cape Town': 24.0, 'Chengdu': 42.3, 'Dublin': 26.0, 'Edinburgh': 49.2, 'Guangzhou': 19.78, 'Helsinki': 40.0, 'Hong Kong': 40.0, 'Istanbul': 2.2, 'Johannesburg': 24.0, 'Lisbon': 18.0, 'London': 33.0, 'Los Angeles': 34.7, 'Melbourne': 9.3, 'Milan': 13.74, 'Montréal': 12.82, 'Moscow': 18.0, 'Nanjing': 40.67, 'New York': 27.0, 'Oslo': 68.0, 'Paris': 10.0, 'Rome': 38.9, 'San Francisco': 13.0, 'Seoul': 27.91, 'Shanghai': 16.2, 'Shenzhen': 40.9, 'Singapore': 47.0, 'Stockholm': 40.0, 'Sydney': 46.0, 'Taipei': 6.56, 'Tokyo': 7.5, 'Toronto': 13.0, 'Vienna': 50.0, 'Warsaw': 17.0, 'Zürich': 41.0} |
|
Sort the dict keys. Call sorted(), passing the dict as argument along with the special key= argument for sorting a dict by value and reverse=True. This should return a list of dict keys sorted by value. Test this step: your sorted list should show the cities ordered by pct value, high to low: |
|
['Oslo', 'Vienna', 'Edinburgh', 'Singapore', 'Sydney', 'Chengdu', 'Zürich', 'Shenzhen', 'Nanjing', 'Helsinki', 'Hong Kong', 'Stockholm', 'Rome', 'Los Angeles', 'London', 'Barcelona', 'Seoul', 'New York', 'Dublin', 'Cape Town', 'Johannesburg', 'Guangzhou', 'Brussels', 'Lisbon', 'Moscow', 'Warsaw', 'Shanghai', 'Milan', 'Amsterdam', 'San Francisco', 'Toronto', 'Montréal', 'Austin', 'Paris', 'Buenos Aires', 'Melbourne', 'Tokyo', 'Taipei', 'Bogotá', 'Istanbul'] |
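A small sketch of this sorting step on a four-city subset of the dict. It assumes that the "special key= argument" is the dict's own get method, which is one common way to sort keys by value; the names green and ranked are illustrative only.

```python
# Four cities taken from the full dict above.
green = {'Amsterdam': 13.0, 'Oslo': 68.0, 'Istanbul': 2.2, 'Vienna': 50.0}

# sorted() loops over the dict's keys; key=green.get tells it to compare
# each key by its value; reverse=True orders high to low.
ranked = sorted(green, key=green.get, reverse=True)

print(ranked)
```

Note that the result is a list of keys only. The dict itself is unchanged, so the values are still available by subscripting it.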
|
Loop through the sorted list of keys and print each key and its value. Look for an example of this in the in-class exercises or the slides. Although you are looping through the list, you can use each string in the list to obtain its value by subscripting the dict with that string. Your output should match that shown in the homework assignment. |
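The loop can be sketched like this, again on a four-city subset and again assuming key=green.get for the sort. The lines list is a hypothetical addition that collects the output so it is easy to check; printing directly inside the loop works just as well.

```python
green = {'Amsterdam': 13.0, 'Oslo': 68.0, 'Istanbul': 2.2, 'Vienna': 50.0}
ranked = sorted(green, key=green.get, reverse=True)   # list of city names

lines = []
for city in ranked:            # looping through the list, not the dict
    pct = green[city]          # subscript the dict with the city string
    lines.append(f'{city} {pct}')

print('Cities Ranked by Greenspace (% of total area)')
for line in lines:
    print(line)
```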
|
5.6 | (Extra credit.) Summing dictionary. Reading FF_abbreviated.txt, build a dictionary that sums all of the Mkt-RF values (the 2nd value, or leftmost float value) associated with each year. Sort the dictionary's keys by value and print each key and corresponding value, so that the values sort ascending. Do not loop through the data more than once. |
Sample program run:
1926: 3.39
1928: 3.88
1927: 4.67
|
Discussion

This project builds a "summing dictionary" which calculates a separate sum of Mkt-RF values (the 2nd value, or leftmost float value) for each year in the Fama-French file. We call this kind of grouping of values under a unique key (i.e., each year) an "aggregation". In this case, the year will be the unique key in the dict, and the value for that key will be a sum of all Mkt-RF values (first float column) found for that year.

An aggregation is a powerful and very common analysis technique. In aggregations in other data sets we might sum up total revenue by client, count the number of births by city, calculate average salary by gender, etc.

Step-by-Step. Once we have loaded the dictionary with the sum of the Mkt-RF values for each year, we will:
1. Build the dict
|
|
|
|
For example, if we had just these lines in the file (note the years in each line):
19260701 1.0 0.22 0.30 0.009
19260702 2.0 0.35 0.08 0.009
19270103 10.0 0.21 0.24 0.010
19270104 20.0 0.15 0.73 0.010
|
Then by the end, the dict would have these keys and values:
{'1926': 3.0, '1927': 30.0}
|
|
To understand this result, note that the 1926 floats total 3.0 and the 1927 floats total 30.0. So you can see that the dictionary can be used to sum up values for each year. We use the key to indicate which total we're summing, and the value for that key to hold the sum.

How can a dictionary be used to sum up values under a particular key? A summing dict checks to see if the year is in the dict. If it is not, it adds the year and the float value from the line to the dict. But if the year is already in the dict, it merely adds the float value from the line to the value already stored for that year.

Here is a line-by-line breakdown of this concept, again considering the 4-line file above:
|
When the for loop reads the first line:
19260701 1.0 0.22 0.30 0.009 |
|
The program checks to see if '1926' is a key in the dict, using the in operator (if year not in dict). Since it is not (the dict is currently empty), the program adds the key '1926' and value 1.0 to the dictionary. |
|
At the end of the first iteration, then, the dict will contain:
{ '1926': 1.0 } |
|
When the for loop reads the second line:
19260702 2.0 0.35 0.08 0.009 |
|
The program checks to see if '1926' is a key in the dict. Since it is already in the dict (we added it in the previous iteration) we add 2.0 to 1.0 to make 3.0. This value 3.0 is then associated with the same year key. In other words, we're replacing the original value for '1926' (1.0) with a new summed value (3.0). |
|
The operative code line for adding a new value to an existing value in the dict is this:
sumdict[year] = sumdict[year] + mkt_rf # here, mkt_rf is the
# current Mkt-RF value
|
|
Basically, the above says: "let the value for 1926 in the dict be the current value for 1926 (1.0) plus the value from this line (2.0)". So, as you loop, you will need to check the dict ahead of time to see if the year key is already there. If not, set the key and value in the dict for that line; if it is, add the line's value to the sum already stored. |
|
if year not in sumdict:
    sumdict[year] = mkt_rf                   # first value seen for this year
else:
    sumdict[year] = sumdict[year] + mkt_rf   # here, mkt_rf is the
                                             # current Mkt-RF value
|
|
In the next iteration, the year is 1927. This key is not in the dict, so we add it to the dict with the current float value (10.0). The two lines with 1927 are then handled the same way as the two lines with 1926 were. |
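Putting the walk-through together, here is a sketch of the whole loop run over the four sample lines. It assumes the fields are separated by whitespace and that the year is the first four characters of the date field; the real file may also contain header or footer lines that need to be skipped, which this sketch does not handle.

```python
# The four sample lines from the discussion, standing in for the file.
sample_lines = [
    '19260701 1.0 0.22 0.30 0.009',
    '19260702 2.0 0.35 0.08 0.009',
    '19270103 10.0 0.21 0.24 0.010',
    '19270104 20.0 0.15 0.73 0.010',
]

sumdict = {}
for line in sample_lines:                 # a single pass through the data
    fields = line.split()                 # split on whitespace
    year = fields[0][:4]                  # '19260701' -> '1926'
    mkt_rf = float(fields[1])             # 2nd value: the Mkt-RF float
    if year not in sumdict:
        sumdict[year] = mkt_rf            # first value seen for this year
    else:
        sumdict[year] = sumdict[year] + mkt_rf   # add to the running sum

print(sumdict)
```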
|
For the actual FF_abbreviated.txt data, the built (but unsorted) dict will look like this:
{'1926': 3.39, '1927': 4.67, '1928': 3.88} |
|
|
|
1926 1928 1927 |
|
To print the year keys and summed values, loop through the sorted list of years and print the year; and then use that year to get the value for that key from the dictionary. |
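A sketch of this last step, starting from the built dict shown above. It assumes key=sumdict.get as the way to sort keys by value, and the report_lines list is a hypothetical addition used to collect the output. With the real file, the sums may carry floating-point noise, so rounding (for example with round(value, 2)) may be needed to match the sample run.

```python
sumdict = {'1926': 3.39, '1927': 4.67, '1928': 3.88}   # built dict from above

# No reverse=True this time, so the values sort ascending.
years_by_sum = sorted(sumdict, key=sumdict.get)

report_lines = []
for year in years_by_sum:
    total = sumdict[year]                 # use the year key to get its sum
    report_lines.append(f'{year}: {total}')

for line in report_lines:
    print(line)
```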
|