Introduction to Python
davidbpython.com
Projects, Session 8
PLEASE REMEMBER:
All requirements are detailed in the homework instructions document.
8.1 | Notes typing assignment. Please write out this week's transcription notes. The notes are displayed as a .png image named transcription in each week's project files folder.
This does not need to be in a Python program - you can use a simple text file.
8.2 | Function get_file_sizes(), which takes a directory name as an argument and returns a dictionary pairing each file's name with its size in bytes.
Everything from receiving the directory name to returning the dictionary should happen inside the function. Nothing else should be handled outside the function, except for calling it and printing the resulting dictionary, as in the example.
Have your function skip hidden files (names starting with '.') and directories (use os.path.isdir() or os.path.isfile()).
Please be certain that the function does not refer to any specific folder path. The correct relative path will be passed to the function as its argument, which makes the function reusable; a function with a specific folder location written inside it is only useful for that one location.
import os
# your function here - nothing else should appear outside the function,
# except for any needed import statements and my calling code
fsizes = get_file_sizes('../shakespeare') # should work with any valid folder path
print(fsizes) # {'twelfth_night.txt': 116758, 'julius_caesar.txt': 118048, ...
# (note that order of your dict may vary)
print(len(fsizes)) # 36 (if your number is different, make sure you are
# skipping hidden files and directories)
tfsizes = get_file_sizes('../testdir')
print(tfsizes) # {'file2.txt': 752, 'file3.txt': 200, 'file1.txt': 33}
print(len(tfsizes)) # 3
See discussion for further details.
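The skipping rules described for 8.2 can be illustrated with a small sketch. This is not the assignment's solution: the helper name visible_files() is hypothetical, and it returns only names rather than sizes. It shows one way to filter out hidden files and directories, assuming the entries come from os.listdir(), which returns bare names that must be joined to the directory before they are tested.

```python
import os

def visible_files(dirname):
    """Return the sorted names of regular, non-hidden files directly inside dirname.

    visible_files is a hypothetical helper used only for illustration.
    """
    names = []
    for name in os.listdir(dirname):
        if name.startswith('.'):
            continue                      # hidden file, e.g. one added by the OS
        # os.listdir() returns bare names, so build the full path before testing it
        path = os.path.join(dirname, name)
        if not os.path.isfile(path):
            continue                      # a directory or other non-file entry
        names.append(name)
    return sorted(names)
```

Because the directory arrives as an argument, the same helper works for '../shakespeare', '../testdir' or any other valid folder path.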
8.3 | Directory Grep. (PLEASE MAKE SURE to read the instructions and checklist in full before submitting your solution!) This program takes two command-line arguments (not function arguments): a directory name and a search term. It opens each file in the directory and shows each line of each file that contains the search term.
Note: this does not need to use a function. By 'arguments' I mean program arguments.
Note: the argument 'shakespeare' below assumes that this directory is in the same directory as our pwd.
Note: you must trap exceptions for missing arguments or a missing directory, but you must witness these exceptions and the line on which they occur before attempting to trap them!
Note: you do not need to show Terminal or Command Prompt output. Please just submit the code.
If called with no arguments:
Sample program run from the Command Line / Terminal (the $ (sometimes %) represents the command line - on Windows it is >):
$ python myprog.py
Usage: myprog.py [directory] [search term]
If called with one argument:
$ python myprog.py shakespeare
Usage: myprog.py [directory] [search term]
If called with a bad directory:
$ python myprog.py shaykespeer/ perforce
error: directory 'shaykespeer' not found
Usage: myprog.py [directory] [search term]
If called with a correct directory and a valid search term:
$ python myprog.py shakespeare/ perforce
1_king_henry_vi.txt (line 1635): PLANTAGENET How I am braved and
must perforce endure it!
2_king_henry_iv.txt (line 410): To stormy passion, must
perforce decay.
2_king_henry_iv.txt (line 909): And one against Glendower;
perforce a third
2_king_henry_iv.txt (line 2317): And these unseason'd hours
perforce must add
2_king_henry_iv.txt (line 2957): Was force perforce compell'd to
banish him:
... continues for 38 more lines ...
The above output is based on the shakespeare/ directory found in this week's source directory. In the example above, the number in (line 1635) is the line number within the file 1_king_henry_vi.txt. Keep in mind that any relative path should be supplied by the user, i.e. there should be no relative path information inside the program:
If the directory is in the same directory as the script:
$ python myprog.py shakespeare perforce
If the directory is in the parent directory relative to the location of the script:
$ python myprog.py ../shakespeare perforce
If the directory is in a child directory relative to the location of the script:
$ python myprog.py childdir/shakespeare perforce
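The argument handling shown in the sample runs above can be sketched roughly as follows. This is only one possible arrangement, not the required one: the names USAGE, read_args() and list_dir_or_exit() are hypothetical, and the sketch assumes that a missing argument surfaces as an IndexError on the sys.argv list and a missing directory as a FileNotFoundError from os.listdir(). As the assignment says, witness each exception yourself before trapping it.

```python
import os
import sys

# Hypothetical constant holding the usage message from the sample runs
USAGE = 'Usage: myprog.py [directory] [search term]'

def read_args(argv):
    """Return (directory, search term) from an argv-style list, or exit with the usage message."""
    try:
        dirname = argv[1]
        term = argv[2]
    except IndexError:
        # raised when fewer than two arguments follow the program name
        sys.exit(USAGE)
    return dirname, term

def list_dir_or_exit(dirname):
    """Return the entries of dirname, or exit with an error if the directory does not exist."""
    try:
        return os.listdir(dirname)
    except FileNotFoundError:
        sys.exit(f"error: directory '{dirname}' not found\n{USAGE}")
```

A program would call read_args(sys.argv) once at the top. Since the directory comes entirely from the argument, no path information needs to appear in the code itself.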
Not including any path information in the program allows it to be used on any directory in the filesystem: the script does not need to know the relative location because it is supplied by the argument to the program.
Special Note: if (and only if) you see a UnicodeDecodeError, it may be because you attempted to open a nonstandard file. MacOS files spontaneously appear in directories and I'm not always successful in removing them. Test each filename to see whether it starts with a period; if so, please skip it with the continue statement (while inside the os.listdir() loop). (If you don't see a UnicodeDecodeError then you do not need to do this.)
Note as well that if Python can't find your directory, it may be because the relative path is incorrect. Please see "Filepaths for Locating Files" in the Session 3 Slides.
HOMEWORK CHECKLIST: all points are required
EXTRA CREDIT / SUPPLEMENTARY EXERCISES
8.4 | (Extra Credit / Supplementary) A watched directory. Accept one argument: the pathname of an existing directory. Create a program that stays running and continually watches the directory to see if any files have been added or removed. The program first lists and stores the files found in the directory at program start, and subsequently "polls" (re-checks) the directory files to see if any changes have been made. The checking can be done every 5 seconds or so.
To pause the program so it can wait before checking the directory again, use the following code:
import time # at top of script with other imports
time.sleep(5)
To test, keep your program running, then manually go into the directory and create and remove files to see the program's output reflect the changes. (Note that the timing can be a bit tricky. Make sure to both add and remove a file to see that both events are reported.)
Hint: your program will run for an indefinite period, looping endlessly and re-checking the contents of the directory. Use a while True loop to do this. The loop will not stop until the program is terminated using Ctrl-C.
Exceptions: trap exceptions for a missing argument and an unreadable directory.
Note that if Python can't find your directory, it may be because the relative path is incorrect. Please see "Filepaths for Locating Files" in the Session 3 Slides.
Expected output (will of course vary based on directory and add/remove events):
$ python watchdir.py my_files_dr/
# time passes as the program polls (periodically reads)
# the directory; at some point, we add two files to the directory
# (you can do this manually)
file1.txt added
file2.txt added
# time passes as the program continues polling the directory
# at some point, we remove one of the files
file2.txt removed
# time passes... we add two more files
file3.txt added
file4.txt added
# time passes... we remove one of the files
file3.txt removed
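The add/remove detection described for 8.4 could be approached by comparing two snapshots of the directory as sets. The sketch below is one possible approach, not the expected solution: diff_listing() and watch() are hypothetical names, and the max_polls parameter is an added assumption so that the loop can be bounded while testing (the assignment itself calls for an endless loop stopped with Ctrl-C).

```python
import os
import time

def diff_listing(before, after):
    """Return (added, removed) name lists between two directory snapshots."""
    before, after = set(before), set(after)
    return sorted(after - before), sorted(before - after)

def watch(dirname, interval=5, max_polls=None):
    """Poll dirname every `interval` seconds and print any added or removed names.

    max_polls is a hypothetical knob for testing; None means loop until Ctrl-C.
    """
    known = os.listdir(dirname)           # snapshot taken at program start
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)              # wait before re-checking the directory
        current = os.listdir(dirname)
        added, removed = diff_listing(known, current)
        for name in added:
            print(f'{name} added')
        for name in removed:
            print(f'{name} removed')
        known = current                   # the new snapshot becomes the baseline
        polls += 1
```

Set difference reports each change once, because the baseline is replaced after every poll.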
Special note on debugging: command-line arguments can't be used when running the PyCharm debugger. Instead, you can hard-code your arguments thusly:
import sys
sys.argv = ['progname.py', 'shakespeare/', 'support']
Now you won't need to enter arguments on the command line as you have re-established them using the sys.argv variable.
8.5 | (Extra Credit / Supplementary) Largest files in a directory tree using os.walk. Accept two arguments: the name of an existing directory, and the number of files to display. Use os.walk to go through the entire directory tree (review the slide explaining this function). Store the name of each file found under the directory (which is submitted as an argument on the command line) paired with the file's size. Print the number of files requested, sorted by size, largest to smallest. Print out the filenames along with their paths.
Exceptions: trap the exception for missing arguments. Exit with an error if the directory is not found or if the 2nd argument is not all digits. (os.walk does not raise an error if the directory is invalid; use os.path.isdir() to check that the directory exists.)
Please note that in some cases os.walk() will list a file that cannot be read by open(). The reason for this is not clear, but you can avoid the error by simply checking the filepath with os.path.isfile() before using os.path.getsize().
Note that if Python can't find your directory, it may be because the relative path is incorrect. Please see "Filepaths for Locating Files" in the Session 3 Slides.
$ python largestfiles.py '/Users/david/python' 5
9382 test/test3/file5.txt
7539 test/test3/test5/test6/file10.txt
718 test/test2/file3.txt
86 test/test2/file2.txt
7 test/test2/test4/file6.txt
Note that it's usually better to test on a smaller directory tree (2-3 levels deep) before working with a larger directory.
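The tree walk described for 8.5 can be sketched as below. This is a minimal illustration under stated assumptions rather than the expected solution: largest_files() is a hypothetical name, it returns (size, path) pairs instead of printing, and it leaves argument checking (os.path.isdir() on the directory, str.isdigit() on the count) to the caller.

```python
import os

def largest_files(root, count):
    """Return up to `count` (size, path) pairs for files under root, largest first."""
    sizes = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory in the tree
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue                  # skip entries that cannot be sized as files
            sizes.append((os.path.getsize(path), path))
    sizes.sort(reverse=True)              # tuples sort by size first
    return sizes[:count]
```

Printing is then a short loop such as `for size, path in largest_files(dirname, 5): print(size, path)`, where dirname is whatever path the user supplied.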
Special note on debugging: command-line arguments can't be used when running the PyCharm debugger. Instead, you can hard-code your arguments thusly:
import sys
sys.argv = ['progname.py', 'shakespeare/', 'support']
Now you won't need to enter arguments on the command line as you have re-established them using the sys.argv variable.