The second argument to open() determines the mode
The second argument to open may be 'r', 'w' or 'a'.
fh = open('thisfile.txt', 'r') # open for reading
# (the 'r' second arg can be omitted, as it is the default)
text = fh.read() # read entire file as single string
wfh = open('newfile.txt', 'w') # open for writing
# (will overwrite an existing file)
newtext = text.replace('this', 'that') # make a change to the text
wfh.write(newtext)
wfh.close() # when writing, must be sure to close the file
afh = open('logfile.txt', 'a') # open for appending
# (will add to an existing file)
afh.write('this is a log line\n')
afh.close()
Files can be opened for reading, writing or appending. Although a file can be opened for reading and writing simultaneously, we do not customarily work with files this way. Instead, we read the entire file into memory, make changes to the data in memory, and then write it back out to a new file (or overwrite the same file). It is also possible to read or write a file in 'binary' mode ('rb', 'wb', 'ab'), which reads or writes raw bytes instead of characters (characters being our standard approach). Binary mode is suited to non-text files, such as sound or image files.
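The read, modify, write-back pattern described above can be sketched as follows. This is only an illustration: the scratch directory, the file name demo.txt and the sample text are assumptions made so the example runs on its own without touching real files.

```python
import os
import tempfile

# Hypothetical scratch location (an assumption for this sketch)
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'demo.txt')

# Create a small sample file to work with
wfh = open(path, 'w')
wfh.write('this is line one\nthis is line two\n')
wfh.close()

# 1. read the entire file into memory
fh = open(path)
text = fh.read()
fh.close()

# 2. make changes to the data in memory
newtext = text.replace('this', 'that')

# 3. write the result back out, overwriting the same file
wfh = open(path, 'w')
wfh.write(newtext)
wfh.close()

# Binary mode ('rb') returns a bytes object rather than a str
bfh = open(path, 'rb')
raw = bfh.read()
bfh.close()

print(newtext)
print(type(raw))
```

The text-mode read gives a str that can be edited with string methods, while the binary-mode read of the same file gives bytes.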
A file is automatically closed upon exiting the 'with' block
A 'best practice' is to open files using a 'with' block. When execution leaves the block, the file is automatically closed.
with open('myfile.txt', 'w') as wfh:
    wfh.write('this is a line of text\n')
    wfh.write('this is another line')
## at this point (outside the with block), filehandle wfh has been closed.
The conventional approach:
wfh = open('myfile.txt', 'w')
wfh.write('this is a line of text\n')
wfh.write('this is another line')
wfh.close() # explicit close() of the file
Although open files do not normally block other processes from opening the same file, each open file holds an operating-system resource called a file descriptor, and the operating system limits how many of these a process may hold at once; if many files are left open (especially in long-running processes), they can accumulate until that limit is reached. When writing, data may also sit in a buffer and not reach the disk until the file is closed. Therefore, files should be closed as soon as possible.
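One way to confirm that a 'with' block closes the file is to check the file object's closed attribute inside and outside the block. This is a small sketch; the scratch file path is an assumption made so the example is self-contained.

```python
import os
import tempfile

# Hypothetical scratch file (an assumption for this sketch)
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')

with open(path, 'w') as wfh:
    wfh.write('this is a line of text\n')
    inside = wfh.closed     # still open while inside the block

outside = wfh.closed        # closed once execution leaves the block

print(inside, outside)
```

The same attribute can be checked after an explicit close() call in the conventional approach.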
The Line, Word, Character Count
Can we parse and count the lines, words and characters in a file? We will emulate the Unix wc (word count) utility, which reports these three counts. Here are the tools we will use:
A file can be rendered as a single string or a list of strings. Strings can be split into fields.
file (TextIOWrapper) object
# read(): file text as a single string
fh = open('students.txt') # file object allows reading
text = fh.read() # read() method called on
# file object returns a string
fh.close() # close the file
print(text)
# single string, entire text
# 'id,fname,lname,city,state,tel\njw234,Joe,Wilson,Smithtown,NJ,
# 2015585894\nms15,Mary,Smith,Wilsontown,NY,5185853892\npk669,
# Pete,Krank,Darkling,VA,8044894893'
# readlines(): file text as a list of strings
fh = open('students.txt')
file_lines = fh.readlines()
# list of strings, each line an item in the list (note newlines)
# [ 'id,fname,lname,city,state,tel\n',
# 'jw234,Joe,Wilson,Smithtown,NJ,2015585894\n',
# 'ms15,Mary,Smith,Wilsontown,NY,5185853892\n',
# 'pk669,Pete,Krank,Darkling,VA,8044894893' ]
fh.close() # close the file
print(file_lines) # list of strings,
# each string a line
string object
# split(): separate a string into a list of strings
file_text = ('There was a russling of dresses, and the standing '
             'congregation sat down.\nThe boy whose history this book relates '
             'did not enjoy the prayer, \nhe only endured it, if he even did '
             'that much. He was restive all through it; he \nkept tally of '
             'the details of the prayer, unconshiously, for he was not...')
elements = file_text.split() # split entire text on whitespace (spaces or newlines)
print(elements)
# ['There', 'was', 'a', 'russling', 'of', 'dresses,', 'and', 'the', 'standing',
# 'congregation', 'sat', 'down.', 'The', 'boy', 'whose', 'history', 'this', 'book',
# 'relates', 'did', 'not', 'enjoy', 'the', 'prayer,', 'he', 'only', 'endured', 'it,',
# 'if', 'he', 'even', 'did', 'that', 'much.', 'He', 'was', 'restive', 'all',
# 'through', 'it;', 'he', 'kept', 'tally', 'of', 'the', 'details', 'of', 'the',
# 'prayer,', 'unconshiously,', 'for', 'he', 'was', 'not...']
# splitlines(): separate a multiline string
fh = open('students.txt') # open the file, return
# a file object
text = fh.read() # read the entire file into
# a string
# (of course this includes newlines)
fh.close()
lines = text.splitlines() # returns a list of strings
# (similar to fh.readlines(),
# except without newlines)
print(lines)
# list of strings, each line an item in the list (note no newlines)
# [ 'id,fname,lname,city,state,tel',
# 'jw234,Joe,Wilson,Smithtown,NJ,2015585894',
# 'ms15,Mary,Smith,Wilsontown,NY,5185853892',
# 'pk669,Pete,Krank,Darkling,VA,8044894893' ]
list object
# subscript a list of lines
lines = [ 'id,fname,lname,city,state,tel',
'jw234,Joe,Wilson,Smithtown,NJ,2015585894',
'ms15,Mary,Smith,Wilsontown,NY,5185853892',
'pk669,Pete,Krank,Darkling,VA,8044894893' ]
header = lines[0] # 'id,fname,lname,city,state,tel'
last_line = lines[-1] # 'pk669,Pete,Krank,Darkling,VA,8044894893'
# slice a list
data = lines[1:] # from second line (index 1) to end of list
print(data)
# [ 'jw234,Joe,Wilson,Smithtown,NJ,2015585894',
# 'ms15,Mary,Smith,Wilsontown,NY,5185853892',
# 'pk669,Pete,Krank,Darkling,VA,8044894893' ]
# get the length of a list of lines (# of lines in a file)
x = len(lines) # 4
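Putting these pieces together, a line, word and character count in the spirit of wc might look like the sketch below. The sample file and its contents are assumptions made so the example can run on its own; the counting uses only read(), splitlines(), split() and len() as shown above.

```python
import os
import tempfile

# Hypothetical sample file, created here so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as wfh:
    wfh.write('one two three\nfour five\n')

with open(path) as fh:
    text = fh.read()                    # entire file as a single string

line_count = len(text.splitlines())     # number of lines
word_count = len(text.split())          # number of whitespace-separated words
char_count = len(text)                  # number of characters, newlines included

print(line_count, word_count, char_count)
```

Note that wc itself counts newline characters for its line total and bytes for its default size total, so results can differ slightly from this sketch when the last line has no trailing newline or the text contains non-ASCII characters.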
3 ways to read strings from a file.
for: read one line at a time (a newline ('\n') marks the end of each line)
fh = open('students.txt') # file object allows looping
# through a series of strings
for my_file_line in fh: # my_file_line is a string
    print(my_file_line) # prints each line of students.txt (each string still ends with its newline)
fh.close() # close the file
read(): read entire file as a single string
fh = open('students.txt') # file object allows reading
text = fh.read() # read() method called on file
# object returns a string
fh.close() # close the file
print(text)
The above prints:
jw234,Joe,Wilson,Smithtown,NJ,2015585894
ms15,Mary,Smith,Wilsontown,NY,5185853892
pk669,Pete,Krank,Darkling,NJ,8044894893
readlines(): read as a list of strings (each string a line)
fh = open('students.txt')
file_lines = fh.readlines() # file.readlines() returns
# a list of strings
fh.close() # close the file
print(file_lines)
The above prints:
['jw234,Joe,Wilson,Smithtown,NJ,2015585894\n', 'ms15,Mary,Smith,Wilsontown,NY,5185853892\n', 'pk669,Pete,Krank,Darkling,NJ,8044894893\n']
Strings: 4 ways to manipulate strings from a file.
split() a string into a list of strings
mystr = 'jw234,Joe,Wilson,Smithtown,NJ,2015585894'
elements = mystr.split(',')
print(elements) # ['jw234', 'Joe', 'Wilson',
# 'Smithtown', 'NJ', '2015585894']
(included for completeness): join() a list of strings into a string
mylist = ['jw234', 'Joe', 'Wilson', 'Smithtown', 'NJ', '2015585894']
line = ','.join(mylist) # 'jw234,Joe,Wilson,Smithtown,NJ,2015585894'
slice a string
mystr = '2014-03-13 15:33:00'
year = mystr[0:4] # '2014'
month = mystr[5:7] # '03'
day = mystr[8:10] # '13'
rstrip() a string
xx = 'this is a line with a newline at the end\n'
yy = xx.rstrip() # return a new string without trailing whitespace (including the newline)
print(yy) # 'this is a line with a newline at the end'
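Called with no argument, rstrip() removes all trailing whitespace, not just the newline. The short sketch below (the sample string is an assumption) contrasts this with passing '\n' to remove only trailing newline characters.

```python
line = 'a line with trailing spaces and a newline   \n'

stripped_all = line.rstrip()       # removes trailing spaces and the newline
stripped_nl = line.rstrip('\n')    # removes only trailing newline characters

print(repr(stripped_all))
print(repr(stripped_nl))
```

For most line-by-line file processing the no-argument form is what we want; the '\n' form matters only when trailing spaces or tabs are meaningful.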
splitlines() a multiline string
fh = open('students.txt') # open the file, return
# a file object
text = fh.read() # read the entire file into a string
# (of course this includes newlines)
lines = text.splitlines() # returns a list of strings
# (similar to fh.readlines(),
# except without newlines)
fh.close()
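As a closing illustration, the file and string tools above can be combined to pull one field out of every data line. This is a sketch under assumptions: it writes its own copy of the students.txt data to a scratch location, and the choice of the city column (index 3) is just for demonstration.

```python
import os
import tempfile

# Hypothetical copy of the students.txt data, written to a scratch file
path = os.path.join(tempfile.mkdtemp(), 'students.txt')
with open(path, 'w') as wfh:
    wfh.write('id,fname,lname,city,state,tel\n'
              'jw234,Joe,Wilson,Smithtown,NJ,2015585894\n'
              'ms15,Mary,Smith,Wilsontown,NY,5185853892\n'
              'pk669,Pete,Krank,Darkling,VA,8044894893')

with open(path) as fh:
    lines = fh.read().splitlines()   # list of lines, no newlines

header = lines[0].split(',')         # column names from the first line
cities = []
for line in lines[1:]:               # skip the header line
    fields = line.split(',')         # split each line into fields
    cities.append(fields[3])         # the city is the fourth field

print(header)
print(cities)
```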