Introduction to Python

davidbpython.com




Projects, Session 8



PLEASE REMEMBER:

  1. re-read the assignment before submitting
  2. go through the checklist including the tests
  3. make sure your notations are as specified in the homework instructions

All requirements are detailed in the homework instructions document.

Careless omissions will result in reductions to your solution grade.

 
8.1 Notes typing assignment. Please write out this week's transcription notes. The notes are displayed as a .png image named transcription in each week's project files folder.

This does not need to be in a Python program - you can use a simple text file.

 
8.2 Function get_file_sizes(), which takes a directory name as an argument and returns a dictionary of each file's name paired with its size in bytes.

Everything beginning with the directory name and ending with the dictionary should be placed within the function. No other tasks should be handled outside the function (except for calling the function and printing the resulting dictionary, as in the example). Have your function skip over hidden files (starting with '.') and directories (use os.path.isdir() or os.path.isfile()). Please be certain that the function does not refer to any specific folder path. The idea is that the correct relative path will be passed to the function. This will make the function reusable. If you put a specific folder location inside the function, it will only be useful for that folder location.

import os

# your function here - nothing else should appear outside the function,
# except for any needed import statements and my calling code

fsizes = get_file_sizes('../shakespeare')    # should work with any valid folder path

print(fsizes)         # {'twelfth_night.txt': 116758, 'julius_caesar.txt': 118048, ...
                      #   (note that order of your dict may vary)

print(len(fsizes))    # 36 (if your number is different, make sure you are
                      #     skipping hidden files and directories)


tfsizes = get_file_sizes('../testdir')

print(tfsizes)        # {'file2.txt': 752, 'file3.txt': 200, 'file1.txt': 33}

print(len(tfsizes))   # 3

See discussion for further details.
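
As a minimal sketch of the building blocks mentioned above (not a solution to the assignment), the following shows how os.listdir(), os.path.join(), os.path.isfile(), os.path.isdir() and os.path.getsize() fit together. The temporary directory and the names notes.txt and subdir are made up for illustration only.

```python
import os
import tempfile

# Build a throwaway directory so the example does not depend on any
# real folder; the entry names below are made up for illustration.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, 'notes.txt'), 'w') as fh:
        fh.write('hello')                      # 5 bytes
    os.mkdir(os.path.join(demo_dir, 'subdir'))

    report = []
    for name in sorted(os.listdir(demo_dir)):
        # os.listdir() returns bare names, so join each one to the
        # directory before asking the filesystem about it
        path = os.path.join(demo_dir, name)
        if os.path.isfile(path):
            report.append((name, 'file', os.path.getsize(path)))
        elif os.path.isdir(path):
            report.append((name, 'directory', None))

print(report)    # [('notes.txt', 'file', 5), ('subdir', 'directory', None)]
```

Note that the directory is only ever referred to through a variable, which is the same idea that keeps a function reusable for any folder path.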


HOMEWORK CHECKLIST: all points are required



    the count of files is 36. Make sure to skip hidden files (the name starts with a period) and directories (os.path.isdir() returns True for the filepath).

    program does not refer to any part of the name 'test_dir_skip_this' -- in order to avoid a directory, you must test for it

    the function does not refer to any path within the filesystem - instead, have the function call pass the correct relative path to be able to find the directory

    the function does not refer to variables defined outside the function -- double-check!

    The program does not use names like 'dict', 'list', 'set', 'int' or 'str' as these are the names of functions that produce objects of those types.

 
8.3 Directory Grep. (PLEASE MAKE SURE to read the instructions and checklist in full before submitting your solution!) This program takes two command-line arguments (not function arguments): a directory name and a search term. It opens each file in the directory, and shows each line of each file that contains a search term.

Note: this does not need to use a function. By 'arguments' I mean program arguments.
Note: the below argument 'shakespeare' assumes that this directory is in the same directory as our pwd.
Note: you must trap exceptions for missing arguments or missing directory, but you must witness these exceptions and the line on which they occur before attempting to trap them!
Note: you do not need to show Terminal or Command Prompt output. Please just submit the code.

if called with no arguments

Sample program run from the Command Line / Terminal (the $ (sometimes %) represents the command line - on Windows it is >):
$ python myprog.py
Usage:  myprog.py [directory] [search term]

if called with one argument

Sample program run from the Command Line / Terminal (the $ (sometimes %) represents the command line - on Windows it is >):
$ python myprog.py shakespeare
Usage:  myprog.py [directory] [search term]

if called with bad directory

Sample program run from the Command Line / Terminal (the $ (sometimes %) represents the command line - on Windows it is >):
$ python myprog.py shaykespeer/ perforce
error:  directory 'shaykespeer' not found
Usage:  myprog.py [directory] [search term]

if called with correct directory and valid search term

Sample program run from the Command Line / Terminal (the $ (sometimes %) represents the command line - on Windows it is >):
$ python myprog.py shakespeare/ perforce

1_king_henry_vi.txt (line 1635):  PLANTAGENET   How I am braved and
must perforce endure it!

2_king_henry_iv.txt (line 410):     To stormy passion, must
perforce decay.

2_king_henry_iv.txt (line 909):     And one against Glendower;
perforce a third

2_king_henry_iv.txt (line 2317):    And these unseason'd hours
perforce must add

2_king_henry_iv.txt (line 2957):    Was force perforce compell'd to
banish him:

... continues for 38 more lines ...

The above output is based on the shakespeare/ directory found in this week's source directory. In the example above, the number in (line 1635) is the line number within the file 1_king_henry_vi.txt. Keep in mind that any relative path should be supplied by the user, i.e. there should be no relative path info inside the program:

If the directory is in the same directory as the script:
$ python myprog.py shakespeare perforce
If the directory is in the parent directory relative to the location of the script:
$ python myprog.py ../shakespeare perforce
If the directory is in a child directory relative to the location of the script:
$ python myprog.py childdir/shakespeare perforce

Not including any path information in the program allows it to be used on any directory in the filesystem - the script does not need to know the relative location as it is supplied by the argument to the program.

  • read command line input for two arguments: a directory name and a search term
  • use os.listdir() to loop through a list of files that can be found in the directory
  • Inside the loop, for each file in the directory, open the file and loop through each line in the file (while still inside the directory listing loop)
  • as you loop through the file lines, count each line
  • as you loop through the file lines, check each line to see if the search term is in the line (use the in operator on the line)
  • if the search term is in the line, print the filename, the line count and the line
  • Exceptions: trap exceptions for missing arguments and for unreadable directory. If either exception is caught, print an error message and exit. You must witness these exceptions before attempting to trap them!

  • Special Note: if (and only if) you see a UnicodeDecodeError, it may be because you attempted to open a nonstandard file - macOS files spontaneously appear in directories and I'm not always successful in removing them. Make sure to test the filename to see if it starts with a period -- if so, please skip it with the continue statement (while inside the os.listdir() loop). (If you don't see a UnicodeDecodeError then you do not need to do this.)
    Note as well that if Python can't find your directory, it may be because the relative path is incorrect. Please see "Filepaths for Locating Files" in the Session 3 Slides.
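
The line-counting and search steps above can be sketched on their own, without any directory handling. This is only an illustration under stated assumptions: io.StringIO stands in for an open file, and the sample text is made up.

```python
import io

# Stand-in for an open file: io.StringIO can be looped over line by line
# just like a file object. The text is made up for illustration.
sample = io.StringIO('first line\nmust perforce endure\nlast line\n')

search_term = 'perforce'
matches = []
line_count = 0
for line in sample:
    line_count += 1                   # count every line, matching or not
    if search_term in line:           # substring test with the in operator
        matches.append((line_count, line.rstrip()))

print(matches)    # [(2, 'must perforce endure')]
```

The count is incremented before the test so that the first line is reported as line 1.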


    HOMEWORK CHECKLIST: all points are required
        you do not submit the Terminal or Command Prompt output -- instead, just submit the code
        your program does not test for missing arguments or bad directory - it allows try/except to handle the error that occurs

        testing: you have run the program with the sample inputs as shown and are seeing the output exactly as shown (contact me if your output is different and you're unable to adjust to match)

        the path to the file is passed at the command line. The program does not supply any path information.

        make sure the program is not doing the same thing twice (for example, calling os.listdir() twice, once for the exception and once to use it), or subscripting sys.argv more than once.

        allow code placed in try: blocks to do something practical, not just for error checking. For example, use os.listdir() inside a try: block to generate a list of files that the program later uses

        you have witnessed the exception, and exception type and line number, before attempting to trap with try:

        the program responds properly when no arguments are passed at command line, OR WHEN only one argument is passed. It must show the error message in either case

        try: blocks are placed around the minimum number of lines. For example, don't wrap your whole os.listdir() for loop in a try: block - instead, assign os.listdir() to a list variable, and then loop through it outside of the try: block.

        always specify an exception type; never say except Exception: or except:. It must always be except SomeSpecificError (where SomeSpecificError is the specific exception you are expecting)

        there are no extraneous comments or "testing" code lines

        program follows all recommendations in the "Code Quality" handout
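
The checklist points about try: blocks (a specific exception type, the minimum number of lines, and code that does something practical) might look roughly like the sketch below. It is an illustration, not the required solution: the list named argv is a hypothetical stand-in for sys.argv, list_directory() is a hypothetical helper, and the directory name is assumed not to exist.

```python
import os

# Hypothetical stand-in for sys.argv when the search term is missing.
argv = ['myprog.py', 'shakespeare']

try:
    search_term = argv[2]             # raises IndexError: no item at index 2
except IndexError:
    search_term = None                # a real script would print usage and exit


def list_directory(dirname):
    """Return (filenames, error_message); exactly one of the two is None."""
    try:
        filenames = os.listdir(dirname)   # the only line inside the try: block
    except FileNotFoundError:
        return None, f"error:  directory '{dirname}' not found"
    # the listing is put to use after the try: block, not inside it
    return sorted(filenames), None


# Assumption: no directory with this name exists where the sketch is run.
names, error = list_directory('no_such_directory_for_this_demo')
print(error)
```

Each try: block wraps a single statement whose result the program goes on to use, and each except names the one exception that statement was seen to raise.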

     

    EXTRA CREDIT / SUPPLEMENTARY EXERCISES

     
    8.4 (Extra Credit / Supplementary) A watched directory. Accept one argument: the pathname of an existing directory. Create a program that stays running and continually watches the directory to see if any files have been added or removed. The program first lists and stores the files found in the directory at program start, and subsequently "polls" (re-checks) the directory files to see if any changes have been made. The checking can be done every 5 seconds or so.
    To pause the program so it can wait before checking the directory again, use the following code:
    import time   # at top of script with other imports
    time.sleep(5)
    

    To test, keep your program running, and then manually go into the directory and create and remove files to see the program's output reflect the changes. (Note that this can be a bit tricky in timing. Make sure to both add and remove a file to see that both are occurring.)

    Hint: your program will run for an indefinite period, looping endlessly and checking the contents of the directory continuously. Use a while(True) loop to do this. The loop will not stop looping until the program is terminated using Ctrl-C.

    Exceptions: trap exceptions for missing argument and unreadable directory. Note that if Python can't find your directory, it may be because the relative path is incorrect. Please see "Filepaths for Locating Files" in the Session 3 Slides.
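
One possible way to compare the stored listing with a fresh one is set subtraction. The sketch below is an assumption about approach rather than the required method; compare_listings() is a hypothetical helper, and the two lists stand in for the results of two successive os.listdir() calls.

```python
def compare_listings(previous, current):
    """Return (added, removed) filenames between two directory listings."""
    before, after = set(previous), set(current)
    added = sorted(after - before)      # present now, absent earlier
    removed = sorted(before - after)    # present earlier, absent now
    return added, removed


# Made-up listings standing in for two polls of the same directory.
added, removed = compare_listings(['file1.txt', 'file2.txt'],
                                  ['file1.txt', 'file3.txt'])
print(added)      # ['file3.txt']
print(removed)    # ['file2.txt']
```

Inside the polling loop, the current listing would then replace the stored one so that each change is reported only once.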

    Expected output (will of course vary based on directory and add/remove events):
    $ python watchdir.py my_files_dr/
    # time passes as the program polls (periodically reads)
    # the directory; at some point, we add two files to the directory
    # (you can do this manually)
    
    file1.txt added
    file2.txt added
    
    # time passes as the program continues polling the directory
    # at some point, we remove one of the files
    
    
    file2.txt removed
    
    # time passes... we add two more files
    
    
    file3.txt added
    file4.txt added
    
    # time passes... we remove one of the files
    
    file3.txt removed
    Special note on debugging: command-line arguments can't be used when running the PyCharm debugger. Instead, you can hard-code your arguments thusly:
    import sys
    
    sys.argv = ['progname.py', 'shakespeare/', 'support']
    

    Now you won't need to enter arguments on the command line as you have re-established them using the sys.argv variable.

     
    8.5 (Extra Credit / Supplementary) Largest files in a directory tree using os.walk. Accept two arguments: the name of an existing directory, and the number of files to display. Use os.walk to go through an entire directory tree (review slide explaining this function). Store the names of a directory's files (the directory is submitted as an argument on the command line) paired with each file's size. Print the number of files requested, sorted by size, largest to smallest. Print out the filenames along with their paths.

    Exceptions: trap exception for missing arguments. Exit with error if the directory is not found or if the 2nd argument is not all digits. (os.walk does not raise an error if the directory is invalid; use os.path.isdir() to check for existence of directory.) Please note that in some cases os.walk() will list a file that cannot be read by open(). The reason for this is not clear, but you can avoid the error by simply checking the filepath with os.path.isfile() before using os.path.getsize(). Note that if Python can't find your directory, it may be because the relative path is incorrect. Please see "Filepaths for Locating Files" in the Session 3 Slides.

    (will of course vary based on arguments):
    $ python largestfiles.py '/Users/david/python' 5
    9382   test/test3/file5.txt
    7539   test/test3/test5/test6/file10.txt
    718   test/test2/file3.txt
    86   test/test2/file2.txt
    7   test/test2/test4/file6.txt

    Note that it's usually better to test on a smaller directory tree (2-3 levels deep) before working with a larger directory.
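
The os.walk traversal described above can be tried on a very small tree first. The sketch below is a hedged illustration only: it builds a temporary two-level tree with made-up file names, collects (size, path) pairs, and sorts them largest first. Argument handling is left out.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Made-up two-level tree: a.txt (2 bytes) and sub/b.txt (5 bytes)
    os.mkdir(os.path.join(top, 'sub'))
    for relpath, text in [('a.txt', 'xx'),
                          (os.path.join('sub', 'b.txt'), 'xxxxx')]:
        with open(os.path.join(top, relpath), 'w') as fh:
            fh.write(text)

    sizes = []
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)     # root changes at each level
            if os.path.isfile(path):            # guard before getsize()
                relative = os.path.relpath(path, top)
                sizes.append((os.path.getsize(path), relative))

sizes.sort(reverse=True)      # tuples sort by size first, largest to smallest
for size, relative in sizes:
    print(size, relative)
```

Putting the size first in each tuple means a plain sort orders by size without needing a key function.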


     