Introduction to Python
davidbpython.com
This technique forms the core of much of what we do.
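In order to work with data, the usual steps are: extract the data from a source (such as a file), transform it into the form we need (for example by splitting, converting or summarizing it), and load the result into its destination (such as the screen or another file).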
We call this process Extract-Transform-Load, or ETL. ETL is at the heart of what core Python does best.
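As a minimal sketch of the three steps, assuming a few comma-separated rows held in a string rather than a real file (the sample rows and the names sample_data and total_revenue are made up for illustration), the process might look like this:

```python
import io

# hypothetical sample rows standing in for a file such as revenue.csv
sample_data = "Haddad's,PA,239.50\nWestfield,NJ,53.90\nThe Store,NJ,211.50\n"

fh = io.StringIO(sample_data)                 # file-like object, loops like a file

total_revenue = 0.0                           # initialize a float sum
for line in fh:                               # Extract: read one line at a time
    elements = line.rstrip().split(',')       # Transform: str -> list of fields
    total_revenue = total_revenue + float(elements[2])   # Transform: str -> float

print(round(total_revenue, 2))                # Load: report the result (504.9)
```

Here "load" is simply printing the result; in a larger program it could just as well mean writing to another file.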
This "summary algorithm" is very similar to building a float sum from a file source.
build a list of company names
company_list = []                       # initialize an empty list
fh = open('revenue.csv')                # 'file' object
for line in fh:                         # str, "Haddad's,PA,239.50"
    elements = line.split(',')          # list, ["Haddad's", 'PA', '239.50']
    company_list.append(elements[0])    # add the name for this row
                                        # to company_list
print(company_list)   # list, ["Haddad's", 'Westfield', 'The Store', "Hipster's",
                      #        'Dothraki Fashions', "Awful's", 'The Clothiers']
fh.close()
Just as we did when counting lines of a file or summing up values, we can use a 'for' loop over a file to collect values.
This "summary algorithm" uses a set to collect unique items from repeating data.
state_set = set()                       # initialize an empty set
fh = open('revenue.csv')                # 'file' object
for line in fh:                         # str, "Haddad's,PA,239.50"
    elements = line.split(',')          # list, ["Haddad's", 'PA', '239.50']
    state_set.add(elements[1])          # add the state for this row
                                        # to state_set
print(state_set)      # set, {'PA', 'NY', 'NJ'} (your order may be different)
chosen_state = input('enter a state: ')
if chosen_state in state_set:
    print('that state was found in the file')
else:
    print('that state was not found')
fh.close()
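To see why a set suits this job, here is a small sketch that collects the same values into both a list and a set. It assumes a hypothetical handful of rows in a list (named rows here) instead of reading revenue.csv, so that it can run on its own:

```python
# hypothetical rows in the same format as revenue.csv
rows = [
    "Haddad's,PA,239.50",
    "Westfield,NJ,53.90",
    "The Store,NJ,211.50",
    "Hipster's,NY,11.98",
]

state_list = []                      # a list keeps every value, repeats included
state_set = set()                    # a set keeps only one of each value

for row in rows:
    state = row.split(',')[1]        # str, e.g. 'PA'
    state_list.append(state)
    state_set.add(state)             # adding 'NJ' a second time changes nothing

print(len(state_list))               # 4
print(len(state_set))                # 3
print(sorted(state_set))             # ['NJ', 'NY', 'PA']
```

Because a set has no order, sorted() is used at the end to get a predictable list for display.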
Data files can be rendered as lists of lines, and slicing can manipulate them holistically rather than by using a counter.
fh = open('student_db.txt')
file_lines_list = fh.readlines()        # a list of lines in the file
print(file_lines_list)
# [ "id:address:city:state:zip\n",
#   "jk43:23 Marfield Lane:Plainview:NY:10023\n",
#   "axe99:315 W. 115th Street, Apt. 11B:New York:NY:10027\n",
#   "jab44:23 Rivington Street, Apt. 3R:New York:NY:10002\n" ... (list continues) ]
wanted_lines = file_lines_list[1:]      # take all but 1st element
                                        # (i.e., 1st line)
for line in wanted_lines:
    print(line.rstrip())                # jk43:23 Marfield Lane:
                                        # Plainview:NY:10023
                                        # axe99:315 W. 115th Street,
                                        # Apt. 11B:New York:NY:10027
                                        # jab44:23 Rivington Street,
                                        # Apt. 3R:New York:NY:10002
                                        # etc.
fh.close()
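Other slices work the same way. The sketch below assumes a short list of lines typed in directly (so it does not need student_db.txt) and picks out the header, a range of rows, and the last row; the variable names are only illustrative:

```python
# a short list of lines, as readlines() might return them
file_lines_list = [
    "id:address:city:state:zip\n",
    "jk43:23 Marfield Lane:Plainview:NY:10023\n",
    "axe99:315 W. 115th Street, Apt. 11B:New York:NY:10027\n",
    "jab44:23 Rivington Street, Apt. 3R:New York:NY:10002\n",
]

header = file_lines_list[0]          # index: the 1st line only (a string)
first_two = file_lines_list[1:3]     # slice: 2nd and 3rd lines (a list)
last_line = file_lines_list[-1]      # negative index: the last line (a string)

print(header.rstrip())               # id:address:city:state:zip
print(len(first_two))                # 2
print(last_line.rstrip())            # jab44:23 Rivington Street, Apt. 3R:New York:NY:10002
```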
Once we have read a file as a single string, we can "chop it up" any way we like.
# read(): file text as a single string
fh = open('guido.txt') # 'file' object
text = fh.read() # read() method called on
# file object returns a string
fh.close() # close the file
print(text)
print(len(text)) # 207 (number of characters in the file)
# single string, entire text:
# 'For three months I did my day job, \nand at night and
# whenever I got a \nchance I kept working on Python. \n
# After three months I was to the \npoint where I could
# tell people, \n"Look here, this is what I built."'
String .split() on a whole file string returns a list of words.
file_text = """For three months I did my day job,
and at night and whenever I got a
chance I kept working on Python.
After three months I was to the
point where I could tell people,
"Look here, this is what I built." """
words = file_text.split() # split entire file on whitespace (spaces or newlines)
print(words)
# ['For', 'three', 'months', 'I', 'did', 'my', 'day', 'job,',
# 'and', 'at', 'night', 'and', 'whenever', 'I', 'got', 'a',
# 'chance', 'I', 'kept', 'working', 'on', 'Python.', 'After',
# 'three', 'months', 'I', 'was', 'to', 'the', 'point', 'where',
# 'I', 'could', 'tell', 'people,', '"Look', 'here,', 'this',
# 'is', 'what', 'I', 'built."']
print(len(words)) # 42 (number of words in the file)
String .splitlines() will split any string on the newlines, delivering a list of lines from the file.
file_text = """For three months I did my day job,
and at night and whenever I got a
chance I kept working on Python.
After three months I was to the
point where I could tell people,
"Look here, this is what I built." """
lines = file_text.splitlines()
print(lines)
# ['For three months I did my day job, ', 'and at night and whenever I got a ',
# 'chance I kept working on Python. ', 'After three months I was to the ',
# 'point where I could tell people, ', '"Look here, this is what I built." ']
print(len(lines)) # 6 (number of lines in the file)
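One detail worth checking for ourselves is what happens to the newline characters. The sketch below uses a made-up three-line string (and io.StringIO as a stand-in for a real file) to compare .splitlines() with readlines() and with .split('\n'):

```python
import io

text = "line one\nline two\nline three\n"     # made-up text ending in a newline

lines_from_splitlines = text.splitlines()             # newlines removed
lines_from_readlines = io.StringIO(text).readlines()  # newlines kept
lines_from_split = text.split('\n')                   # extra empty string at the end

print(lines_from_splitlines)   # ['line one', 'line two', 'line three']
print(lines_from_readlines)    # ['line one\n', 'line two\n', 'line three\n']
print(lines_from_split)        # ['line one', 'line two', 'line three', '']
```

So .splitlines() is usually the most convenient of the three when we already have the whole file as a string.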
for: read (newline ('\n') marks the end of a line)
fh = open('students.txt') # file object allows looping
# through a series of strings
for my_file_line in fh: # my_file_line is a string
    print(my_file_line)                 # prints each line of students.txt
fh.close() # close the file
read(): read entire file as a single string
fh = open('students.txt') # file object allows reading
text = fh.read() # read() method called on file
# object returns a string
fh.close() # close the file
print(text) # entire text as a single string
readlines(): read as a list of strings (each string a line)
fh = open('students.txt')
file_lines = fh.readlines() # file.readlines() returns
# a list of strings
fh.close() # close the file
print(file_lines) # entire text as a list of lines
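The three approaches deliver the same text in different shapes. As a rough comparison, the sketch below runs all three against the same made-up data, using io.StringIO as a stand-in for open('students.txt') so that it runs without the file:

```python
import io

sample_text = 'jk43\naxe99\njab44\n'     # made-up contents for students.txt

fh = io.StringIO(sample_text)            # stand-in for open('students.txt')
looped_lines = []
for my_file_line in fh:                  # for: one string per line
    looped_lines.append(my_file_line)
fh.close()

fh = io.StringIO(sample_text)
text = fh.read()                         # read(): one string for the whole file
fh.close()

fh = io.StringIO(sample_text)
file_lines = fh.readlines()              # readlines(): a list of line strings
fh.close()

print(len(text))                         # 17 (characters, newlines included)
print(len(file_lines))                   # 3 (lines)
print(looped_lines == file_lines)        # True
print(''.join(file_lines) == text)       # True
```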
We won't need to write to a file in this course, but it's important to know how.
wfh = open('newfile.txt', 'w') # open for writing
# (will overwrite an existing file)
wfh.write('this is a line of text\n')
wfh.write('this is a line of text\n')
wfh.write('this is a line of text\n')
wfh.close()
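To confirm that 'w' mode really does overwrite, we can write a file twice and read it back. This sketch assumes a throwaway folder created with tempfile.mkdtemp() so that it does not touch any real files; the path is hypothetical:

```python
import os
import tempfile

# hypothetical throwaway location for the example file
path = os.path.join(tempfile.mkdtemp(), 'newfile.txt')

wfh = open(path, 'w')                # open for writing (creates the file)
wfh.write('first line\n')
wfh.write('second line\n')
wfh.close()

wfh = open(path, 'w')                # opening with 'w' again empties the file
wfh.write('only line\n')
wfh.close()

fh = open(path)                      # open for reading (the default)
text = fh.read()
fh.close()

print(repr(text))                    # 'only line\n'
```

Note that write() does not add a newline for us, which is why each string ends with '\n'.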
The range() function allows us to iterate over an integer sequence.
counter = range(10)
for i in counter:
    print(i)                 # prints integers 0 through 9
for i in range(3, 8):        # prints integers 3 through 7
    print(i)
If we need a literal list of integers, we can simply pass the iterable to a list:
intlist = list(range(5))
print(intlist) # [0, 1, 2, 3, 4]
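range() also accepts an optional third argument, the step. The short sketch below shows counting by twos, counting down, and using range() with len() to visit each index of a list; the list of names is made up for illustration:

```python
evens = list(range(0, 10, 2))        # start 0, stop before 10, step 2
print(evens)                         # [0, 2, 4, 6, 8]

countdown = list(range(5, 0, -1))    # a negative step counts down
print(countdown)                     # [5, 4, 3, 2, 1]

names = ['jk43', 'axe99', 'jab44']   # made-up list of ids
for i in range(len(names)):          # i takes the values 0, 1, 2
    print(i, names[i])               # index and the item at that index
```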