Python 3

[Review] File Reading and Writing

File Mode: Reading, Writing, Appending

The second argument to open() determines the mode


The second argument to open() may be 'r' (read), 'w' (write) or 'a' (append).


fh = open('thisfile.txt', 'r')    # open for reading
                                  # (the 'r' second arg can be omitted, as it is the default)

text = fh.read()                  # read entire file as single string

fh.close()                        # close the file once it has been read


wfh = open('newfile.txt', 'w')    # open for writing
                                  # (will overwrite an existing file)

newtext = text.replace('this', 'that')   # make a change to the text

wfh.write(newtext)

wfh.close()                       # when writing, be sure to close the file
                                  # (closing flushes any buffered text to disk)


afh = open('logfile.txt', 'a')    # open for appending
                                  # (will add to an
                                  #  existing file)

afh.write('this is a log line\n')

afh.close()

Files can be opened for reading, writing or appending. Although a file can be opened for reading and writing simultaneously (for example with mode 'r+'), we do not customarily work with files this way. Instead, we read the entire file into memory, make changes to the data in memory, and then write it back out to a new file (or overwrite the same file).

It is also possible to read or write a file in 'binary' mode ('rb', 'wb', 'ab'), which reads or writes raw bytes instead of characters (characters being our standard approach). Binary mode is suited to non-text files, such as sound or image files.
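As a rough illustration of binary mode, here is a minimal sketch. The file name sample.bin and the bytes written to it are made up for the example, and the file is created in a temporary directory so that the code can run anywhere.

```python
import os
import tempfile

# sample.bin is a hypothetical file name; the temporary directory
# keeps this sketch from touching any real files.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample.bin')

    wfh = open(path, 'wb')            # open for writing raw bytes
    wfh.write(b'\x89PNG\r\n')         # a bytes literal (b'...'), not a str
    wfh.close()

    rfh = open(path, 'rb')            # open for reading raw bytes
    data = rfh.read()                 # in binary mode, read() returns bytes
    rfh.close()

print(type(data))                     # <class 'bytes'>
print(len(data))                      # 6
```

Note that in binary mode no newline translation or character decoding takes place: the bytes come back exactly as they were written.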





Opening a file with 'with'

A file is automatically closed upon exiting the 'with' block


A 'best practice' is to open files using a 'with' block. When execution leaves the block, the file is automatically closed.

with open('myfile.txt', 'w') as wfh:
    wfh.write('this is a line of text\n')
    wfh.write('this is another line')

## at this point (outside the with block), filehandle wfh has been closed.

The conventional approach:

wfh = open('myfile.txt', 'w')

wfh.write('this is a line of text\n')
wfh.write('this is another line')

wfh.close()        # explicit close() of the file

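Although open files do not generally block other processes from opening the same file, each open file holds a file descriptor, a small resource that the operating system allocates to the process. The number of descriptors available to a process is limited, so if many files are left open (especially in long-running processes) the program can eventually run out and fail to open new files. In addition, text written to a file may sit in a buffer until the file is closed. Therefore, files should be closed as soon as possible.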
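To see the automatic close in action, we can check the file object's closed attribute. This is a minimal sketch: the file is written to a temporary directory, and the ValueError is raised only to show that the file is closed even when the block is interrupted.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'myfile.txt')    # stand-in file for this sketch

    with open(path, 'w') as wfh:
        wfh.write('this is a line of text\n')
        print(wfh.closed)                 # False: still inside the block

    closed_after_with = wfh.closed
    print(closed_after_with)              # True: closed on leaving the block

    # the file is also closed if an exception interrupts the block
    try:
        with open(path) as fh:
            raise ValueError('something went wrong')
    except ValueError:
        pass

    closed_after_error = fh.closed
    print(closed_after_error)             # True
```

With the conventional approach, an exception raised before close() would leave the file open, which is one reason the 'with' block is preferred.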





"Whole File" Parsing

The Line, Word, Character Count


Can we parse and count the lines, words and characters in a file? We will emulate the Unix wc (word count) utility, which reports these three counts. Here's how it works:

  • We call function open() with a filename, returning a file object.
  • We call read() on the file object, returning a string object containing the entire file text.
  • We call splitlines() on the string, returning a list of strings. len() will then tell us the number of lines.
  • We call split() on the same string, returning a list of strings. len() will then tell us the number of words.
  • We call len() on the string to count the number of characters.
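The steps above can be sketched as follows. The function name count_file and the sample text are assumptions made for this example; the sample file is written to a temporary directory so the code is self-contained.

```python
import os
import tempfile

def count_file(filename):
    """Return (lines, words, characters) for a text file."""
    fh = open(filename)               # file object
    text = fh.read()                  # entire file as a single string
    fh.close()

    lines = text.splitlines()         # list of lines
    words = text.split()              # list of whitespace-separated words

    return len(lines), len(words), len(text)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample.txt')

    wfh = open(path, 'w')
    wfh.write('the quick brown fox\njumps over\nthe lazy dog\n')
    wfh.close()

    line_count, word_count, char_count = count_file(path)

print(line_count, word_count, char_count)     # 3 9 44
```

One difference from the real utility: wc -c counts bytes, while len() on a string counts characters. The two agree for plain ASCII text but can differ for other encodings.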





    Tabular Data: Slicing and Dicing

    A file can be rendered as a single string or a list of strings. Strings can be split into fields.


    file (TextIOWrapper) object

    # read(): file text as a single string
    fh = open('students.txt')                 # file object allows reading
    text = fh.read()                          # read() method called on
                                              # file object returns a string
    
    fh.close()                                # close the file
    
    print(text)
    
        # single string, entire text
        # 'id,fname,lname,city,state,tel\njw234,Joe,Wilson,Smithtown,NJ,
        #  2015585894\nms15,Mary,Smith,Wilsontown,NY,5185853892\npk669,
        #  Pete,Krank,Darkling,VA,8044894893'
    
    
    # readlines():  file text as a list of strings
    fh = open('students.txt')
    file_lines = fh.readlines()
    
        # list of strings, each line an item in the list (note newlines)
        # [ 'id,fname,lname,city,state,tel\n',
        #   'jw234,Joe,Wilson,Smithtown,NJ,2015585894\n',
        #   'ms15,Mary,Smith,Wilsontown,NY,5185853892\n',
        #   'pk669,Pete,Krank,Darkling,VA,8044894893'   ]
    
    fh.close()                        # close the file
    
    print(file_lines)                  # list of strings,
                                      # each string a line
    

    string object

    # split():  separate a string into a list of strings
    
    # adjacent string literals inside parentheses are joined into one string
    file_text = ('There was a rustling of dresses, and the standing '
                 'congregation sat down.\nThe boy whose history this book relates '
                 'did not enjoy the prayer, \nhe only endured it, if he even did '
                 'that much.  He was restive all through it; he \nkept tally of '
                 'the details of the prayer, unconsciously, for he was not...')
    
    elements = file_text.split()  # split entire text on whitespace (spaces or newlines)
    print(elements)
        # ['There', 'was', 'a', 'rustling', 'of', 'dresses,', 'and', 'the', 'standing',
        #  'congregation', 'sat', 'down.', 'The', 'boy', 'whose', 'history', 'this', 'book',
        #  'relates', 'did', 'not', 'enjoy', 'the', 'prayer,', 'he', 'only', 'endured', 'it,',
        #  'if', 'he', 'even', 'did', 'that', 'much.', 'He', 'was', 'restive', 'all',
        #  'through', 'it;', 'he', 'kept', 'tally', 'of', 'the', 'details', 'of', 'the',
        #  'prayer,', 'unconsciously,', 'for', 'he', 'was', 'not...']
    
    
    
    # splitlines():  separate a multiline string
    fh = open('students.txt')                # open the file, return
                                             # a file object
    text = fh.read()                         # read the entire file into
                                             # a string
                                             # (of course this includes newlines)
    
    fh.close()
    
    lines = text.splitlines()                # returns a list of strings
                                             # (similar to fh.readlines(),
                                             # except without newlines)
    
    print(lines)
    
        # list of strings, each line an item in the list (note no newlines)
        # [ 'id,fname,lname,city,state,tel',
        #   'jw234,Joe,Wilson,Smithtown,NJ,2015585894',
        #   'ms15,Mary,Smith,Wilsontown,NY,5185853892',
        #   'pk669,Pete,Krank,Darkling,VA,8044894893'   ]
    

    list object

    # subscript a list of lines
    lines = [ 'id,fname,lname,city,state,tel',
              'jw234,Joe,Wilson,Smithtown,NJ,2015585894',
              'ms15,Mary,Smith,Wilsontown,NY,5185853892',
              'pk669,Pete,Krank,Darkling,VA,8044894893'   ]
    
    header = lines[0]           # 'id,fname,lname,city,state,tel'
    last_line = lines[-1]       # 'pk669,Pete,Krank,Darkling,VA,8044894893'
    
    
    # slice a list
    data = lines[1:]            # from the second line (index 1) to end of file
    
    print(data)
    
       # [ 'jw234,Joe,Wilson,Smithtown,NJ,2015585894',
       #   'ms15,Mary,Smith,Wilsontown,NY,5185853892',
       #   'pk669,Pete,Krank,Darkling,VA,8044894893'   ]
    
    
    # get the length of a list of lines (# of lines in a file)
    x = len(lines)               # 4
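Putting these pieces together, each data line can be split into fields and a single column picked out by index. This is a minimal sketch using the same list of lines as above; the variable names states and area_codes are our own.

```python
lines = [ 'id,fname,lname,city,state,tel',
          'jw234,Joe,Wilson,Smithtown,NJ,2015585894',
          'ms15,Mary,Smith,Wilsontown,NY,5185853892',
          'pk669,Pete,Krank,Darkling,VA,8044894893'   ]

header = lines[0].split(',')      # ['id', 'fname', 'lname', 'city', 'state', 'tel']

states = []
area_codes = []

for line in lines[1:]:            # skip the header line
    fields = line.split(',')      # split the line into a list of fields
    states.append(fields[4])      # 5th field: state
    area_codes.append(fields[5][0:3])   # first 3 characters of the tel field

print(states)                     # ['NJ', 'NY', 'VA']
print(area_codes)                 # ['201', '518', '804']
```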
    




    Summary: File Object

    3 ways to read strings from a file.


    for: read line by line (a newline ('\n') marks the end of each line)

    fh = open('students.txt')                 # file object allows looping
                                              # through a series of strings
    for my_file_line in fh:                   # my_file_line is a string
        print(my_file_line)                   # prints each line of students.txt
    
    fh.close()                                # close the file
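Because each line read in a loop still ends with its newline, print() (which adds a newline of its own) would show a blank line after every line of the file. A minimal sketch of one way to handle this, using a two-line stand-in for students.txt written to a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'students.txt')     # stand-in for students.txt

    with open(path, 'w') as wfh:
        wfh.write('jw234,Joe,Wilson,Smithtown,NJ,2015585894\n')
        wfh.write('ms15,Mary,Smith,Wilsontown,NY,5185853892\n')

    ids = []
    with open(path) as fh:
        for my_file_line in fh:                       # one line per iteration
            my_file_line = my_file_line.rstrip('\n')  # drop the trailing newline
            print(my_file_line)                       # no blank lines in between
            ids.append(my_file_line.split(',')[0])    # first field: id

print(ids)                                            # ['jw234', 'ms15']
```

Looping over the file object reads one line at a time, so unlike read() and readlines() it does not need to hold the whole file in memory.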
    

    read(): read entire file as a single string

    fh = open('students.txt')                 # file object allows reading
    text = fh.read()                          # read() method called on file
                                              # object returns a string
    fh.close()                                # close the file
    
    print(text)
    

    The above prints:

    jw234,Joe,Wilson,Smithtown,NJ,2015585894
    ms15,Mary,Smith,Wilsontown,NY,5185853892
    pk669,Pete,Krank,Darkling,VA,8044894893

    readlines(): read as a list of strings (each string a line)

    fh = open('students.txt')
    file_lines = fh.readlines()               # file.readlines() returns
                                              # a list of strings
    fh.close()                                # close the file
    print(file_lines)
    

    The above prints:

    ['jw234,Joe,Wilson,Smithtown,NJ,2015585894\n',
     'ms15,Mary,Smith,Wilsontown,NY,5185853892\n',
     'pk669,Pete,Krank,Darkling,VA,8044894893\n']




    Summary: String Object

    Strings: 4 ways to manipulate strings from a file (plus join(), included for completeness).


    split() a string into a list of strings

    mystr = 'jw234,Joe,Wilson,Smithtown,NJ,2015585894'
    elements = mystr.split(',')
    print(elements)                           # ['jw234', 'Joe', 'Wilson',
                                              #  'Smithtown', 'NJ', '2015585894']
    

    (included for completeness): join() a list of strings into a string

    mylist = ['jw234', 'Joe', 'Wilson', 'Smithtown', 'NJ', '2015585894']
    
    line = ','.join(mylist)          # 'jw234,Joe,Wilson,Smithtown,NJ,2015585894'
    

    slice a string

    mystr = '2014-03-13 15:33:00'
    year =  mystr[0:4]               # '2014'
    month = mystr[5:7]               # '03'
    day =   mystr[8:10]              # '13'
    

    rstrip() a string

    xx = 'this is a line with a newline at the end\n'
    
    yy = xx.rstrip()                 # return a new string without trailing whitespace
                                     # (including the newline)
    
    print(yy)                        # 'this is a line with a newline at the end'
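Called without an argument, rstrip() removes all trailing whitespace, not only the newline. If trailing spaces or tabs need to be kept, we can pass the newline as an argument. A short sketch (the sample string is made up):

```python
xx = 'name,city  \t\n'                    # trailing spaces, a tab and a newline

stripped_all = xx.rstrip()                # removes all trailing whitespace
stripped_newline = xx.rstrip('\n')        # removes only trailing newlines

print(repr(stripped_all))                 # 'name,city'
print(repr(stripped_newline))             # 'name,city  \t'
```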
    

    splitlines() a multiline string

    fh = open('students.txt')                # open the file, return
                                             # a file object
    text = fh.read()                         # read the entire file into a string
                                             # (of course this includes newlines)
    
    lines = text.splitlines()                # returns a list of strings
                                             # (similar to fh.readlines(),
                                             # except without newlines)
    
    fh.close()
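The string methods summarized here are often used together. The sketch below is one possible combination: it splits a multiline string into lines, splits each line into fields, slices the telephone number to add dashes, and joins everything back into a single string. The text is a shortened stand-in for the contents of students.txt, and the dashed phone format is our own choice for the example.

```python
text = ('id,fname,lname,city,state,tel\n'
        'jw234,Joe,Wilson,Smithtown,NJ,2015585894\n'
        'ms15,Mary,Smith,Wilsontown,NY,5185853892\n')

new_lines = []

for line in text.splitlines()[1:]:        # list of lines, header skipped
    fields = line.split(',')              # list of fields
    tel = fields[5]
    fields[5] = tel[0:3] + '-' + tel[3:6] + '-' + tel[6:]   # slice and rebuild
    new_lines.append(','.join(fields))    # back to a comma-separated line

new_text = '\n'.join(new_lines)           # back to a multiline string

print(new_text)
    # jw234,Joe,Wilson,Smithtown,NJ,201-558-5894
    # ms15,Mary,Smith,Wilsontown,NY,518-585-3892
```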
    



