Introduction to Python

davidbpython.com




Project Discussion, Session 8



8.1 Notes typing assignment. Please write out this week's transcription notes.
 
8.2 Function get_file_sizes(), which takes a directory name as an argument and returns a dictionary of each file's name paired with its size in bytes.

  • The function takes the directory pathname as its argument. The function shouldn't have to qualify this directory in any way (for example, by adding a '../' to the path). If the directory being searched is in a different directory than the current one, the function should be called with that path included -- for example, if 'shakespeare' is in the parent directory, the call should be get_file_sizes('../shakespeare').
  • Initialize an empty dictionary to hold the filenames and file sizes
  • Use os.listdir() to list the contents of the directory. This function returns a list.
  • Next, loop through the list.
  • Inside the 'for' loop, you'll first want to skip over 'hidden' files - these are files that start with a period. If the filename starts with a period, use continue to skip to the next item.
  • Also inside the 'for' loop, you'll need to join the filename to the directory name to produce a path to the file.
  • You'll also want to skip over any entries that are not actually files, but directories. I used not os.path.isfile() to ask whether the entry is not a file, and continue to jump to the top of the loop and the next filename in the list. (Keep in mind that you can't check os.path.isfile() or os.path.isdir() against the bare filename; it needs to be the filepath so that the OS can find the entry you're talking about. These functions return True or False rather than raising an error, so if you pass only the filename, Python will simply look for that name in the current directory, not find it, and return False.)
  • Next, you can use os.path.getsize() to get the size of the file.
  • Finally, you can add a simple key/value pair to the dictionary: the key should be the filename (not the full path) and the value should be the size of the file.
  • There is no need to check to see if the key is in the dict - every filename should be unique, so you can just add the pair without doing any checks on the dict.
  • After the 'for' loop is complete, return the dict from the function.
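
The steps above can be sketched roughly as follows. This is only one possible arrangement, not the required solution; the variable names (sizes, filename, filepath) are my own choices for illustration.

```python
import os

def get_file_sizes(dirname):
    """Return a dict pairing each filename in dirname with its size in bytes."""
    sizes = {}                                  # filename -> size in bytes
    for filename in os.listdir(dirname):
        if filename.startswith('.'):            # skip 'hidden' files
            continue
        # join directory and filename so the OS can locate the entry
        filepath = os.path.join(dirname, filename)
        if not os.path.isfile(filepath):        # skip directories
            continue
        sizes[filename] = os.path.getsize(filepath)
    return sizes
```

A call such as get_file_sizes('../shakespeare') would then return the dictionary, assuming a 'shakespeare' directory exists in the parent directory.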

Please let me know your questions!

 
8.3 Directory Grep.

The program requires a loop within a loop: the outer loop reads the name of each file in a directory, and the inner loop reads each line in each file. Note: this does not need to use a function. The program takes command-line arguments.

# read command-line arguments for directory and search term
# list the files in the directory
  #  (use try/except to trap exceptions for missing argument and bad directory)

# loop through each file in listing of files in directory
  # skip if the filename starts with a '.'
  # skip if the file is actually a directory
  # open the file
  # set a counter to 0
  # loop through each line in the file
    # if the search term is in the line,
      # print the filename, line number, file line
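
One way the outline above might translate into code is sketched below. The assignment does not require a function; the logic is wrapped in a hypothetical dir_grep(argv) here only so that the example is self-contained. The usage message, the error message and the output layout are my own assumptions.

```python
import os
import sys

def dir_grep(argv):
    """Print filename, line number and text of each line containing the term."""
    # read command-line arguments and list the directory, trapping
    # a missing argument (IndexError) and a bad directory
    try:
        dirname, term = argv[1], argv[2]
        filenames = os.listdir(dirname)
    except IndexError:
        print('Usage: program <directory> <search term>')
        return
    except (FileNotFoundError, NotADirectoryError):
        print('Cannot read directory:', argv[1])
        return

    for filename in filenames:                  # outer loop: each file
        if filename.startswith('.'):            # skip hidden files
            continue
        filepath = os.path.join(dirname, filename)
        if os.path.isdir(filepath):             # skip directories
            continue
        with open(filepath) as fh:
            line_num = 0
            for line in fh:                     # inner loop: each line
                line_num += 1
                if term in line:
                    print(filename, line_num, line.rstrip())

# In a script, the last line would be:  dir_grep(sys.argv)
```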


Special Note: if (and only if) you see a UnicodeDecodeError, it may be because you attempted to open a nonstandard file -- macOS files spontaneously appear in directories and I'm not always successful in removing them. Make sure to test the filename to see if it starts with a period -- if so, please skip it with the continue statement (while inside the os.listdir() loop). (If you don't see a UnicodeDecodeError then you do not need to do this.)
Note that if Python can't find your directory, it may be because the relative path is incorrect. Please see "Filepaths for Locating Files" in Session 3 Slides.

 
8.4 A watched directory. The purpose of the assignment is to reinforce file lists and use sets to notice differences. Since set.difference() can tell us what's in one set that isn't in the other, it would be pretty simple to be able to say whether a file was added or removed by comparing the contents of the directory, listed out at different times.
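
As a small illustration of set.difference(), here are two made-up listings (the filenames are invented for the example):

```python
old_files = {'a.txt', 'b.txt', 'c.txt'}    # first listing
new_files = {'b.txt', 'c.txt', 'd.txt'}    # listing taken later

removed = old_files.difference(new_files)  # in old but not in new
added = new_files.difference(old_files)    # in new but not in old

print(sorted(removed))   # ['a.txt']
print(sorted(added))     # ['d.txt']
```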

The overall approach contemplated here is to list the files in the directory and store them in a set; wait 5 seconds; then list the files into a set again, and compare the sets. You'll do oldset.difference(newset) to detect file deletions and newset.difference(oldset) to detect file additions. Putting this process into a while(True) loop allows this process to go on indefinitely. Since the while(True) loop has no natural way to break out, the program requires the use of Ctrl-C to exit. The logic of the while(True) loop can be done in different order, perhaps, but here is the logic outline that worked for me:

# test sys.argv to affirm that dir argument has been passed

# list files into set ("old" set)

# start while True: loop

    # sleep 5 seconds
    # list files into set ("new" set)
    # compare sets to see what files have been added
    # compare sets to see what files have been removed
    # report any added or removed files

    # assign "new" set to "old" set variable for next loop comparison
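
A rough sketch of this outline follows. The function names (compare_listings, watch_directory, main), the 'added:' and 'removed:' labels, and the usage message are hypothetical; the logic could just as well sit directly in the script without functions.

```python
import os
import sys
import time

def compare_listings(old_files, new_files):
    """Report and return the names added to and removed from a listing."""
    added = new_files.difference(old_files)
    removed = old_files.difference(new_files)
    for name in sorted(added):
        print('added:', name)
    for name in sorted(removed):
        print('removed:', name)
    return added, removed

def watch_directory(dirname, interval=5):
    """Check dirname every interval seconds; runs until Ctrl-C."""
    old_files = set(os.listdir(dirname))        # "old" set
    while True:
        time.sleep(interval)
        new_files = set(os.listdir(dirname))    # "new" set
        compare_listings(old_files, new_files)
        old_files = new_files                   # ready for the next comparison

def main(argv):
    if len(argv) < 2:                           # affirm dir argument was passed
        print('Usage: program <directory>')
        return
    watch_directory(argv[1])

# In a script, the last line would be:  main(sys.argv)
```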

Note that if Python can't find your directory, it may be because the relative path is incorrect. Please see "Filepaths for Locating Files" in Session 3 Slides.

 
8.5 Largest files in a directory tree using os.walk. This script will be pretty close to the os.walk example in the slides. You simply need to join the directory to the filename to get the filepath, and then use os.path.getsize() to get the file's size. Initialize an empty dictionary before the loop begins, store the filepath as the key and size as the value in the dictionary, and after the loop ends, sort the dictionary by value using key=dictname.get (where dictname is the name of your dictionary). Keep in mind that sorted() returns a list of the dictionary's keys and sorts smallest-first by default, so to list the largest files first you can, for example, pass reverse=True. Limit the results by using a slice subscript, which can be placed directly after the sorted() call.
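
A minimal sketch of this approach is shown below. The function name largest_files, the count parameter and the choice of reverse=True are my own assumptions for the example; your script may be organized differently.

```python
import os

def largest_files(top, count=5):
    """Return the paths of the count largest files under top, biggest first."""
    sizes = {}                                   # filepath -> size in bytes
    for root, dirs, files in os.walk(top):
        for filename in files:
            filepath = os.path.join(root, filename)
            sizes[filepath] = os.path.getsize(filepath)
    # sort the keys by their values (sizes), largest first, then slice
    return sorted(sizes, key=sizes.get, reverse=True)[:count]
```

Each returned path can then be looked up in the dictionary (or the dictionary returned as well) if the sizes need to be printed alongside the paths.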

Note that if Python can't find your file, it may be because the relative path is incorrect. Please see "Filepaths for Locating Files" in Session 3 Slides.

 