Introduction to Python
davidbpython.com
Projects, Session 4
PLEASE REMEMBER:
All requirements are detailed in the homework instructions document.
NOTE ON OPENING FILES
This week we learned about absolute and relative paths. To find a file using a relative path, we must know a) the location of the file and b) the location from which we are running our script (the "present working directory", or pwd), in order to determine c) the path needed to reach the file location from the pwd. If Python can't find your file, it may be because the relative path is incorrect.
If the file you want to open is in the same directory as the script you're executing (and you are running the script from that directory), use the filename alone:
fh = open('filename.txt')
If the file you want to open is in the parent directory of the directory that holds your script, prepend ../ to the filename:
fh = open('../filename.txt')
If the file you want to open is in a child directory of the directory that holds your script, prepend the child directory name to the filename:
fh = open('<childdir>/filename.txt')
(Replace <childdir> with the name of the child directory.)
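To see how a relative path is interpreted, it can help to print the present working directory and the absolute path that each relative form resolves to. The sketch below is only illustrative: resolve() is a hypothetical helper (not part of the assignment), and the filenames are the placeholder names used above.

```python
import os

# Relative paths are resolved against the present working directory (pwd),
# which is the directory you run the script from.
pwd = os.getcwd()

def resolve(relative_path, start=pwd):
    """Return the absolute path that relative_path refers to from start."""
    return os.path.normpath(os.path.join(start, relative_path))

print('present working directory:', pwd)
print('same directory:  ', resolve('filename.txt'))
print('parent directory:', resolve('../filename.txt'))
print('child directory: ', resolve('childdir/filename.txt'))
```

If the printed path is not where the file actually lives, the relative path (or the directory you are running from) needs to change.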
4.1 | Notes typing assignment. Please write out this week's transcription notes. The notes are displayed as an image named transcription in each week's project files folder.
This does not need to be in a Python program - you can use a simple text file.

4.2 | Filepaths Exercises.
As usual, returned solutions will lose points. It is recommended to confirm (through testing) that your answer is correct before submitting to ensure that you will receive credit.
Start with the below file tree, which is available in this week's data folder:
dir1
├── file1.txt
├── test1.py
│
├── dir2a
│   ├── file2a.txt
│   ├── test2a.py
│   │
│   └── dir3a
│       ├── file3a.txt
│       ├── test3a.py
│       │
│       └── dir4
│           ├── file4.txt
│           └── test4.py
└── dir2b
    ├── file2b.txt
    ├── test2b.py
    │
    └── dir3b
        ├── file3b.txt
        └── test3b.py
To complete this assignment, please open and edit each of the below 5 .py scripts in the tree so that they open the noted .txt files (please do not move, copy or recreate any of the files -- they must be modified and run where they are located in the tree):
Your job is to fill in the relative filepath (i.e. not starting with C:\Users or /Users) needed to open the indicated file in the open(r'') function call in each script.
######## test4.py: read file4.txt ########
fh = open('file4.txt') # this path has been completed for you
######## test2b.py: read file3b.txt ########
fh = open('') # add relative filepath here to open file3b.txt
######## test1.py: read file3a.txt ########
fh = open('') # add relative filepath here to open file3a.txt
######## test2a.py: read file1.txt ########
fh = open('') # add relative filepath here to open file1.txt
######## test3a.py: read file1.txt ########
fh = open('') # add relative filepath here to open file1.txt
See the "Filepaths for Locating Files" slide deck from this session for a discussion that will help you complete this assignment. Send me any questions you may have.
4.3 | Take user input for a 4-digit year and exit if incorrect.
The program can use an if/else: exit() if the value is bad (not 4 characters, or not all digits), and if it is good, print 'input validated' and the 4-digit input. Use a compound test: if not 4 characters or not all digits, then exit.
SPECIAL NOTE: Please do not use a while True: loop. Here let's work with if/else.
Sample program run:
please enter a 4-digit year: 234a
sorry, must be 4 digits [ program exits here ]

Sample program run:
please enter a 4-digit year: 234
sorry, must be 4 digits [ program exits here ]

Sample program run:
please enter a 4-digit year: 2349
input validated: 2349

Note that even if the year isn't a real year, the program still validates it - the test is a len() of 4, and all digits.
HOMEWORK CHECKLIST: all points are required
4.4 | Compile a list of Mkt-RF float values (the 2nd column, or leftmost float values) for a given year from the file FF_tiny.txt.
Start with a 4-digit string year (for example, '1927') assigned to a string variable:
year = '1927'

Initialize an empty "collector" list:
mktrf_list = []

Then, looping through the FF_tiny.txt file, collect a list of Mkt-RF values (the 2nd column, or leftmost float values) for that year. (Note that the output below is from FF_tiny.txt.)
Sample run (with year == '1927')
[0.97, 0.3, 0.0, 0.72]

Sample run (with year == '1926')
[0.09, 0.44]

Sample run (with year == '1928')
[0.43, 0.14, 0.71]

Next, please set the year to '9999' and make sure that the program prints out an empty list.

Sample run (with year == '9999')
[]
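The collector pattern can be sketched as follows. This is a minimal illustration, not the required solution. It assumes each data line starts with an 8-digit date followed by whitespace-separated floats. The sample lines are invented stand-ins (only the 2nd-column values echo the sample runs above; the zeros are placeholders). collect_mktrf() is a hypothetical helper name.

```python
# Stand-in lines, invented for illustration (NOT copied from FF_tiny.txt):
# an 8-digit date, then whitespace-separated floats, Mkt-RF first.
sample_lines = [
    '19260701    0.09   0.00   0.00   0.000',
    '19260702    0.44   0.00   0.00   0.000',
    '19280103    0.43   0.00   0.00   0.000',
]

def collect_mktrf(lines, year):
    """Return the 2nd-column floats from lines whose date starts with year."""
    mktrf_list = []                      # empty "collector" list
    for line in lines:
        items = line.split()             # split the line on whitespace
        if not items:                    # skip blank lines
            continue
        if items[0][:4] == year:         # first 4 characters of the date
            mktrf_list.append(float(items[1]))
    return mktrf_list

print(collect_mktrf(sample_lines, '1926'))   # [0.09, 0.44]
print(collect_mktrf(sample_lines, '9999'))   # []
```

An open file handle can be looped over in the same way as the list of strings used here, since both yield one line at a time.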
When we combine the programs later, this empty list will be the key to determining if the user may have input a year that doesn't exist in the data.
Note that if Python can't find your file, it may be because the relative path is incorrect. Please see "Filepaths for Locating Files" in the Session 4 Slides.
HOMEWORK CHECKLIST: all points are required
4.5 | Calculate sum, count, max, min, average and (optionally) median from a list of values.
Note: this assignment does not read from the file, loop, or gather values - it only uses the hard-coded values shown below. We will later combine this program with the data gathering solution to create a complete program.
Start with this sample list of values from 1927:
user_year = '1927'
mktrf_list = [0.97, 0.30, 0.00, 0.72]
Use the "summary functions" we looked at this week to calculate the sum, count, max, min and average. Round the average to 2 decimal places. Print the year along with the results that you calculated using the summary functions:

Sample program run:
1927 (Mkt-RF): 4 values, max 0.97, min 0.0, avg 0.5

Next, please test your program with a different list of values: there is no need to write out the logic again or create a new program - simply change the two variables below and test the code that you wrote above.

Reassign the two variables with values from 1928:
user_year = '1928'
mktrf_list = [0.43, 0.14, 0.71]

Print the year along with the results that you calculated using the summary functions:

Sample program run:
1928 (Mkt-RF): 3 values, max 0.71, min 0.14, avg 0.43

Extra credit / supplementary: calculate the median.
mktrf_1927 = [0.97, 0.30, 0.00, 0.72] # median is 0.51
# (average of 0.30 and 0.72)
mktrf_1928 = [0.43, 0.14, 0.71] # median is 0.43 (middle value in sorted list)
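One possible shape for the summary and median calculations is sketched below. The function names summarize() and median() are hypothetical and used only so the logic can be checked; the assignment itself does not require functions. The median works from the sorted list and derives the middle index from the list's length.

```python
def summarize(values):
    """Return count, max, min and rounded average for a non-empty list."""
    count = len(values)
    average = round(sum(values) / count, 2)
    return count, max(values), min(values), average

def median(values):
    """Return the median, computing the middle index from the list length."""
    ordered = sorted(values)
    middle = len(ordered) // 2           # calculated, never hard-coded
    if len(ordered) % 2 == 1:            # odd length: a single middle value
        return ordered[middle]
    # even length: average the two values on either side of the midpoint
    return (ordered[middle - 1] + ordered[middle]) / 2

user_year = '1927'
mktrf_list = [0.97, 0.30, 0.00, 0.72]
count, highest, lowest, average = summarize(mktrf_list)
print(f'{user_year} (Mkt-RF): {count} values, max {highest}, min {lowest}, avg {average}')
print('median:', round(median(mktrf_list), 2))
```

Integer division (//) keeps the middle index an int, which is required for list subscripts.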
You must not hard-code the index of the middle value(s)! This index must be calculated.
HOMEWORK CHECKLIST: all points are required
4.6 | PASTE THE CODE for the previous three assignments, ONE AFTER THE OTHER, into one complete program, and alter variable names so that they work together. (You'll also remove the hard-coded year from the 'for' loop section, and the hard-coded list of values from the average calculation section.)
As we did last week, please paste the 3 above solutions into a new program. Please do not combine or "nest" any code, as the solutions can work together without being mixed together -- they should follow one another. Remove the hard-coded values (as well as the 'input validated' message) and adjust the names of the variables so that the 3 code blocks can work together. For example, the year value taken in the user input section should have the same variable name as the year used in the data gathering section. This time, use the FF_data.txt file. Ensure that the program works as shown in the examples below:
Sample program run:
please enter a 4-digit year: 196
sorry, must be 4 digits

Sample program run:
please enter a 4-digit year: help
sorry, must be 4 digits

Sample program run:
please enter a 4-digit year: 1926
1926 (Mkt-RF): 150 values, max 1.48, min -1.69, avg 0.05

Sample program run:
please enter a 4-digit year: 1972
1972 (Mkt-RF): 251 values, max 1.38, min -1.45, avg 0.05

Sample program run:
please enter a 4-digit year: 1993
1993 (Mkt-RF): 253 values, max 1.56, min -2.7, avg 0.03

Year not found in data (please note special approach to this error):

Sample program run:
please enter a 4-digit year: 9999
no values found
Special note on "no values found"
The one original addition in this "combined" solution is the test that will announce "no values found for year YYYY" (where YYYY is the user's year). You will determine this by testing the length of the collector list. If, after going through the entire file, no year matched, the list will be empty. Test to see if the length of the list is 0 -- if so, print the message and exit.

Special note on input testing with if/else
Some students (rather intuitively, I think) employ this logic:
take input
if input is 4 characters and all digits:
    print('input validated')
    # read file and calculate results (whole rest of program)
else:
    exit('input bad')

However, you are requested NOT TO PUT YOUR ENTIRE PROGRAM IN AN if OR else BLOCK. The reason for this has to do with our desire to separate steps in our code so that they are independent. If you put the whole program inside the if, you are creating a dependent relationship that is not needed. You are also pushing the else all the way to the bottom, where it is hard to see the connection to the if.

You can resolve this issue by handling the input validation logic first, then moving on.
take input
if input is 4 characters and all digits:
    print('input validated')
else:
    exit('input bad')
# read file and calculate results (whole rest of program)
Or even shorter and cleaner (and thus much better), use a negative test:
take input
if input is not 4 characters or input is not all digits:
    exit('input bad')
# read file and calculate results (whole rest of program)
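A runnable sketch of this "test first, then move on" layout appears below, including the empty-collector check. It is wrapped in a hypothetical function, report(), only so that it can be exercised without keyboard input or a data file. In the homework, the same steps would sit one after another at the top level of the program. The line layout (8-digit date, then Mkt-RF as the 2nd column) is assumed.

```python
import sys

def report(year, lines):
    """Validate year, gather 2nd-column floats, and return a summary string."""
    # step 1: negative test first, so no 'else' is needed
    if len(year) != 4 or not year.isdigit():
        sys.exit('sorry, must be 4 digits')

    # step 2: gather values (this follows the test; it is not nested in it)
    values = []
    for line in lines:
        items = line.split()
        if items and items[0][:4] == year:
            values.append(float(items[1]))

    # step 3: an empty collector means no line matched the year
    if len(values) == 0:
        sys.exit('no values found')

    # step 4: summarize
    average = round(sum(values) / len(values), 2)
    return (f'{year} (Mkt-RF): {len(values)} values, '
            f'max {max(values)}, min {min(values)}, avg {average}')
```

Each step ends before the next begins, so any one of them can be changed without re-indenting the others.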
No 'else' is required using this logic. Less is more!
HOMEWORK CHECKLIST: all points are required
4.7 | Show unique years in FF_data.txt:
Reading from the dates in the left-hand column of FF_data.txt, compile a sorted list of unique 4-digit years. Use a set() to build the collection of years, then use sorted() to sort them into a list. Print the list and the number of years found in the list.

To initialize an empty set, you must use the set() function:
unique_years = set()

Empty curly braces would signify a dict.

Expected output (please note this is abbreviated - we should see all years in the list):
['1926', '1927', '1928' ... '2010', '2011', '2012']
87 unique years found
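The set-then-sorted pattern can be sketched with a few in-memory lines. The lines are invented stand-ins rather than real rows from FF_data.txt. The sketch assumes the year is the first 4 characters of each line.

```python
# Stand-in lines invented for illustration; only the leading date matters here.
sample_lines = [
    '19270103    0.97',
    '19260701    0.09',
    '19270104    0.30',
    '19280103    0.43',
]

unique_years = set()                 # {} would create an empty dict instead
for line in sample_lines:
    unique_years.add(line[:4])       # adding a duplicate leaves the set unchanged

year_list = sorted(unique_years)     # sorted() returns a new, ordered list
print(year_list)
print(len(year_list), 'unique years found')
```

A set has no order of its own, which is why sorted() is applied before printing.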
HOMEWORK CHECKLIST: all points are required
EXTRA CREDIT / SUPPLEMENTARY EXERCISES
4.8 | (Extra credit / supplementary.) Repeating and enhancing the float-collecting solution (the main program for this week), complete one or both of the following:
Enhancement A: use the full FF_Research_Data_Factors_daily.txt file as data source.

Sample program runs:
please enter a 4-digit year: 1926
1926 (Mkt-RF): 150 values, max 1.53, min -1.83, avg 0.05

please enter a 4-digit year: 2018
2018 (Mkt-RF): 82 values, max 2.67, min -4.03, avg 0.0
Troubleshooting: if your 1926 and 2018 values are off, it may be because you have sliced the wrong number of lines.
Note that if Python can't find your file, it may be because the relative path is incorrect. Please see "Filepaths for Locating Files" in the Session 4 Slides.

Enhancement B: choose column to sum.
Sample program runs:
enter a 4-digit year: 1900
enter a factor to process: Mkt-RF
no values found

enter a 4-digit year: 1972
enter a factor to process: Mkt-RF
1972 (Mkt-RF): 251 values, avg 0.0486055776892

enter a 4-digit year: 2001
enter a factor to process: SMB
2001 (SMB): 248 values, avg [your calculated value here]

enter a 4-digit year: 2001
enter a factor to process: XXL
Sorry, that factor does not exist.
HOWEVER, you must not repeat code in your solution! Instead, select an index based on factor location and use that index when selecting the value from the split list. (Note that the restriction not to repeat code does not mean that you can't repeat code from the for-credit solution -- it simply means to not repeat code within any one solution.)
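One way to turn a factor name into an index is sketched below, so that a single code path serves every column. The column order in FACTORS is an assumption (this document confirms only that Mkt-RF is the 2nd column and that SMB exists), so check it against the header of your data file. factor_index() is a hypothetical helper and the sample line is invented.

```python
# Assumed column order (the date is column 0); confirm against your file's header.
FACTORS = ['Mkt-RF', 'SMB', 'HML', 'RF']

def factor_index(name):
    """Return the split-list index for a factor name, or None if unknown."""
    if name not in FACTORS:
        return None
    return FACTORS.index(name) + 1       # +1 because the date comes first

items = '19720103  0.10  0.20  0.30  0.01'.split()   # invented sample line
idx = factor_index('SMB')
print(float(items[idx]))                 # 0.2
print(factor_index('XXL'))               # None
```

With the index computed once, the collecting loop can use items[idx] no matter which factor the user chose.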
4.9 | (Extra credit / supplementary.) wc emulation: wc is a unix utility program that counts the number of lines, words and characters in a given file.
Reading from file sawyer.txt (found in the source data directory linked from the class website), print the number of lines, words and characters in the file. Note: when counting characters, include spaces and newlines as well. Please do not use a 'for' loop and do not open the file more than once.

please enter a filename: sawyer.txt
20 lines
270 words
1440 characters
Special note: we have seen anomalies that stem from varying line endings on the Windows and Mac platforms. You may get a count that differs from mine by 1 or 2 lines, or by up to 20 characters. If so, don't worry. It just needs to be within 20.
Challenge 1: please do not open and read the file more than once.
Challenge 2: you will be tempted to loop and count to get your answer, but you are challenged to get these counts without looping. See if you can get each count without actually using a loop, by using len() on various "slice and dices" of the data. (Hint: read() will read the file into a string; split() will split a string into words; splitlines() will split a string on the newlines!)
Note that if Python can't find your file, it may be because the relative path is incorrect. Please see "Filepaths for Locating Files" in the Session 4 Slides.
Extra credit / supplementary: attempt to mirror the format of the original wc unix utility by right-justifying each value within an 8-character width (you can use the str method .rjust(), or an f'' string with :>8 inside the token -- see f'' strings in the 'object methods' slides from Session 2).
please enter a filename: sawyer.txt
      20     270    1435 sawyer.txt
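The three counts can come from a single read() with no loop, as in the sketch below. io.StringIO stands in for an open file so the example runs without sawyer.txt. wc_counts() is a hypothetical helper name and the sample text is invented.

```python
import io

def wc_counts(fh):
    """Return (lines, words, characters) from one read of an open file."""
    text = fh.read()                     # whole file as a single string
    return len(text.splitlines()), len(text.split()), len(text)

# io.StringIO behaves like an open file, so the sketch runs anywhere
sample = io.StringIO('one two three\nfour five\n')
lines, words, chars = wc_counts(sample)

# right-justify each count in an 8-character field, in the style of wc
print(f'{lines:>8}{words:>8}{chars:>8} sample')
```

Because len(text) counts every character in the string, spaces and newlines are included automatically.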
HOMEWORK CHECKLIST: all points are required
4.10 | (Extra credit / supplementary.) Spell checker: check each word in a file against a set of spelling words.
Sample program run:
25225 words in spelling words (note this count will be 25185 if words are lowercased before adding)
misspelled word on line 1: russling
misspelled word on line 4: unconshiously
misspelled word on line 6: interlarded
misspelled word on line 8: minst
misspelled word on line 14: coattails
misspelled word on line 16: hhe
misspelled word on line 18: sentense
misspelled word on line 20: akt
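A minimal sketch of the spell-check loop follows: the spelling words go into a set, and the text is read line by line and then word by word, with nothing else stored. The two io.StringIO objects are invented stand-ins for the real files. Lowercasing and stripping surrounding punctuation are assumptions about how words should be normalized, not stated requirements.

```python
import io
import string

# Invented stand-ins for the two files, so the sketch runs on its own
words_file = io.StringIO('the\ncat\nsat\non\nmat\n')
text_file = io.StringIO('The cat sat\non the matt.\n')

spelling_words = set()
for line in words_file:
    spelling_words.add(line.strip().lower())   # one word per line (assumed)
print(len(spelling_words), 'words in spelling words')

line_number = 0
for line in text_file:                         # outer loop: line by line
    line_number += 1
    for word in line.split():                  # inner loop: word by word
        cleaned = word.strip(string.punctuation).lower()
        if cleaned and cleaned not in spelling_words:
            print(f'misspelled word on line {line_number}: {cleaned}')
```

Membership tests against a set are fast, which is the reason for loading the spelling words into a set rather than a list.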
The spelling words should be loaded into a set, but the sawyer.txt words should not be loaded into a container (list or set) -- we should simply loop through the file line-by-line, and then for each line inside the loop, split the line into words and, still inside the for loop, loop through each word word-by-word. We can simply report the misspelled word; we do not need to add the word to another list or other structure.
Note that if Python can't find your file, it may be because the relative path is incorrect. Please see "Filepaths for Locating Files" in the Session 4 Slides.
HOMEWORK CHECKLIST: all points are required