Python 3


Introduction to Python

davidbpython.com




Building Up Containers from File

introduction: building up containers from file

This technique forms the core of much of what we do.


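In order to work with data, the usual steps are:

1. Extract: read the data from its source (for example, a file).
2. Transform: select, split, convert or otherwise process the values we need.
3. Load: store the results in a container (such as a list or set) for analysis or reporting.

We call this process Extract-Transform-Load, or ETL. ETL is at the heart of what core Python does best.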
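The three ETL steps fit in a few lines of Python. This is a minimal sketch, not the course's own code. It assumes comma-separated rows of name, state and revenue, like the "Haddad's,PA,239.50" row of revenue.csv used in this section. io.StringIO stands in for the real file so the example runs anywhere, and every row except the first is invented.

```python
import io

# Hypothetical stand-in for revenue.csv: only the first row comes from the
# course text; the other rows are invented for illustration.
sample = io.StringIO("Haddad's,PA,239.50\n"
                     "Westfield,NJ,53.90\n"
                     "The Store,NJ,211.50\n")

revenues = []                                # Load target: an empty list

for line in sample:                          # Extract: one line at a time
    elements = line.rstrip().split(',')      # Transform: drop '\n', split fields
    revenues.append(float(elements[2]))      # Transform + Load: convert, then store

sample.close()

print(revenues)                              # [239.5, 53.9, 211.5]
print(sum(revenues))                         # a simple summary of the loaded data
```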





looping through a data source and building up a list

This "summary algorithm" is very similar to building a float sum from a file source.


build a list of company names

company_list = []                             # initialize an empty list
fh = open('revenue.csv')                      # 'file' object

for line in fh:                               # str, "Haddad's,PA,239.50\n"

    elements = line.split(',')                # list, ["Haddad's", 'PA', '239.50\n']
    company_list.append(elements[0])          # add the name for this row
                                              # to company_list

print(company_list)       # list, ["Haddad's", 'Westfield', 'The Store', "Hipster's",
                          #        'Dothraki Fashions', "Awful's", 'The Clothiers']

fh.close()

Just as we did when counting lines of a file or summing up values, we can use a 'for' loop over a file to collect values.
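The list-building loop has the same shape as the float-summing loop: initialize before the loop, update once per line inside it. A minimal sketch that does both at once, assuming comma-separated rows of name, state and revenue. io.StringIO stands in for revenue.csv, and every figure except 239.50 is invented.

```python
import io

# Hypothetical stand-in for revenue.csv (figures other than 239.50 are invented)
fh = io.StringIO("Haddad's,PA,239.50\n"
                 "Westfield,NJ,53.90\n"
                 "The Store,NJ,211.50\n")

revenue_sum = 0.0          # summary value: starts at zero
company_list = []          # summary container: starts empty

for line in fh:
    elements = line.split(',')
    # float() ignores leading/trailing whitespace, so '239.50\n' converts fine
    revenue_sum = revenue_sum + float(elements[2])
    company_list.append(elements[0])          # grow the list by one item

fh.close()

print(company_list)        # ["Haddad's", 'Westfield', 'The Store']
print(revenue_sum)
```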





looping through a data source and building up a unique set

This "summary algorithm" uses a set to collect unique items from repeating data.


state_set = set()                       # initialize an empty set
fh = open('revenue.csv')                # 'file' object

for line in fh:                         # str, "Haddad's,PA,239.50\n"

    elements = line.split(',')          # list, ["Haddad's", 'PA', '239.50\n']
    state_set.add(elements[1])          # add the state for this row
                                        # to state_set

print(state_set)       # set, {'PA', 'NY', 'NJ'}   (your order may be different)

chosen_state = input('enter a state:  ')

if chosen_state in state_set:
    print('that state was found in the file')
else:
    print('that state was not found')

fh.close()
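The difference between a list and a set is easiest to see side by side. This sketch collects the state field into both containers from the same rows. io.StringIO and the invented sample rows (all but the first) are stand-ins for revenue.csv, and sorted() is used only to get a predictable display order.

```python
import io

# Hypothetical stand-in for revenue.csv; rows other than the first are invented
fh = io.StringIO("Haddad's,PA,239.50\n"
                 "Westfield,NJ,53.90\n"
                 "The Store,NJ,211.50\n"
                 "Hipster's,NY,11.98\n")

state_list = []
state_set = set()

for line in fh:
    elements = line.split(',')
    state_list.append(elements[1])    # a list keeps every occurrence
    state_set.add(elements[1])        # a set silently ignores duplicates

fh.close()

print(len(state_list))      # 4
print(len(state_set))       # 3
print(sorted(state_set))    # ['NJ', 'NY', 'PA']
print('NY' in state_set)    # True
```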





treating a file as a list

Calling readlines() on a file returns its lines as a list of strings (each still ending in '\n'). We can then use slicing to select the lines we want, for example to skip a header line, instead of keeping a counter.


fh = open('student_db.txt')
file_lines_list = fh.readlines()          # a list of lines in the file
print(file_lines_list)
      # [ 'id:address:city:state:zip\n',
      #   'jk43:23 Marfield Lane:Plainview:NY:10023\n',
      #   'axe99:315 W. 115th Street, Apt. 11B:New York:NY:10027\n',
      #   'jab44:23 Rivington Street, Apt. 3R:New York:NY:10002\n' ... (list continues) ]

wanted_lines = file_lines_list[1:]        # all lines except the first
                                          # (i.e., skip the header line)
for line in wanted_lines:
    print(line.rstrip())                  # jk43:23 Marfield Lane:Plainview:NY:10023
                                          # axe99:315 W. 115th Street, Apt. 11B:New York:NY:10027
                                          # jab44:23 Rivington Street, Apt. 3R:New York:NY:10002
                                          # etc.
fh.close()
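Indexing and slicing work on the list of lines just as on any other list. A minimal sketch, with io.StringIO holding a few of the student_db.txt rows shown above in place of the real file:

```python
import io

# Hypothetical stand-in for student_db.txt, using rows shown above
fh = io.StringIO("id:address:city:state:zip\n"
                 "jk43:23 Marfield Lane:Plainview:NY:10023\n"
                 "jab44:23 Rivington Street, Apt. 3R:New York:NY:10002\n")
file_lines_list = fh.readlines()     # each string keeps its trailing '\n'
fh.close()

header = file_lines_list[0]          # first line only: the column names
data_lines = file_lines_list[1:]     # everything after the header
last_line = file_lines_list[-1]      # negative index: the last line

print(header.rstrip().split(':'))    # ['id', 'address', 'city', 'state', 'zip']
print(len(data_lines))               # 2
print(last_line.rstrip())            # jab44:23 Rivington Street, Apt. 3R:New York:NY:10002
```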





slicing and dicing a file: the line, word, character count (1/3)

Once we have read a file as a single string, we can "chop it up" any way we like.


# read(): file text as a single string
fh = open('guido.txt')          # 'file' object
text = fh.read()                # read() method called on
                                # file object returns a string

fh.close()                      # close the file

print(text)
print(len(text))                 # 207 (number of characters in the file)

    # single string, entire text:

    # 'For three months I did my day job, \nand at night and
    #  whenever I got a \nchance I kept working on Python.  \n
    #  After three months I was to the \npoint where I could
    #  tell people, \n"Look here, this is what I built."'
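The character count from len() includes every newline character, not just the visible letters and spaces. A tiny sketch with a made-up two-line text (io.StringIO standing in for open('guido.txt')) makes this visible:

```python
import io

# Hypothetical two-line file contents
fh = io.StringIO("abc\nde\n")
text = fh.read()
fh.close()

print(repr(text))          # 'abc\nde\n'  (repr() shows the newlines)
print(len(text))           # 7: five letters plus two newline characters
print(text.count('\n'))    # 2
```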





slicing and dicing a file: splitting a string into words (2/3)

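Calling the string .split() method with no argument on the text of a whole file splits it on any run of whitespace (spaces, tabs or newlines) and returns a list of words.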


file_text = """For three months I did my day job,
and at night and whenever I got a
chance I kept working on Python.
After three months I was to the
point where I could tell people,
"Look here, this is what I built." """

words = file_text.split()      # split entire file on whitespace (spaces or newlines)

print(words)
    # ['For', 'three', 'months', 'I', 'did', 'my', 'day', 'job,',
    #  'and', 'at', 'night', 'and', 'whenever', 'I', 'got', 'a',
    #  'chance', 'I', 'kept', 'working', 'on', 'Python.', 'After',
    #  'three', 'months', 'I', 'was', 'to', 'the', 'point', 'where',
    #  'I', 'could', 'tell', 'people,', '"Look', 'here,', 'this',
    #  'is', 'what', 'I', 'built."']

print(len(words))       # 42 (number of words in the file)
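Leaving out the argument to .split() matters. This sketch, using a made-up string that contains a newline and a double space, compares it with .split(' '):

```python
text = "For three months\nI did my  day job,\n"

words = text.split()          # no argument: split on any run of whitespace
pieces = text.split(' ')      # one space: newlines stay inside the pieces,
                              # and the double space yields an empty string

print(words)        # ['For', 'three', 'months', 'I', 'did', 'my', 'day', 'job,']
print(pieces)       # ['For', 'three', 'months\nI', 'did', 'my', '', 'day', 'job,\n']
print(len(words))   # 8
```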





slicing and dicing a file: the line, word, character count (3/3)

The string .splitlines() method splits a string at its line breaks and returns a list of lines, with the newline characters removed.


file_text = """For three months I did my day job,
and at night and whenever I got a
chance I kept working on Python.
After three months I was to the
point where I could tell people,
"Look here, this is what I built."
"""

lines = file_text.splitlines()

print(lines)

    # ['For three months I did my day job,', 'and at night and whenever I got a',
    #  'chance I kept working on Python.', 'After three months I was to the',
    #  'point where I could tell people,', '"Look here, this is what I built."']

print(len(lines))          # 6 (number of lines in the file)
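One reason to prefer .splitlines() over .split('\n'): when the text ends with a newline, as most files do, .split('\n') produces an extra empty string at the end. A short sketch with a made-up two-line text:

```python
text = "first line\nsecond line\n"    # hypothetical text ending in a newline

print(text.splitlines())         # ['first line', 'second line']
print(text.split('\n'))          # ['first line', 'second line', '']
print(len(text.splitlines()))    # 2
```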





Summary: 3 ways to read strings from a file

for: read line by line (a newline ('\n') marks the end of each line)

fh = open('students.txt')        # file object allows looping
                                 # through a series of strings
for my_file_line in fh:          # my_file_line is a string
    print(my_file_line.rstrip())  # prints each line of students.txt;
                                  # rstrip() drops the trailing '\n'

fh.close()                       # close the file

read(): read entire file as a single string

fh = open('students.txt')        # file object allows reading
text = fh.read()                 # read() method called on file
                                 # object returns a string
fh.close()                       # close the file

print(text)                      # entire text as a single string

readlines(): read as a list of strings (each string a line)

fh = open('students.txt')
file_lines = fh.readlines()      # file.readlines() returns
                                 # a list of strings
fh.close()                       # close the file

print(file_lines)                 # entire text as a list of lines
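The three approaches deliver the same lines in different shapes. This sketch counts lines each way, with io.StringIO and invented names standing in for students.txt. Once a file object has been read to the end, reading it again returns nothing, so each approach starts from a fresh object.

```python
import io

# Hypothetical contents standing in for students.txt
contents = "joe\nmary\nsally\n"

fh = io.StringIO(contents)
count_for = 0
for line in fh:                              # 1. for: one line at a time
    count_for = count_for + 1
fh.close()

fh = io.StringIO(contents)
count_read = len(fh.read().splitlines())     # 2. read(): one string, then split
fh.close()

fh = io.StringIO(contents)
count_readlines = len(fh.readlines())        # 3. readlines(): a list of lines
fh.close()

print(count_for, count_read, count_readlines)    # 3 3 3
```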





sidebar: writing to a file

We won't need to write to a file in this course, but it's important to know how.


wfh = open('newfile.txt', 'w')    # open for writing
                                  # (will overwrite an existing file)

wfh.write('this is a line of text\n')
wfh.write('this is a line of text\n')
wfh.write('this is a line of text\n')

wfh.close()
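Reading a file back is a quick way to check what was written. This sketch writes into a directory created by tempfile.mkdtemp() (an assumption, chosen only so the example cannot overwrite a real file) and shows that write() adds no newline of its own.

```python
import os
import tempfile

# Temporary directory so no existing file is overwritten;
# 'newfile.txt' is just the name used in the example above.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'newfile.txt')

wfh = open(path, 'w')
wfh.write('first line\n')       # write() does not add a newline for us
wfh.write('second line\n')
wfh.close()                     # closing makes sure everything is written out

fh = open(path)
lines = fh.readlines()
fh.close()

print(lines)                    # ['first line\n', 'second line\n']

os.remove(path)                 # clean up the temporary file and directory
os.rmdir(folder)
```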





sidebar: the range() function

The range() function produces a sequence of integers that we can loop over.


counter = range(10)
for i in counter:
    print(i)                        # prints integers 0 through 9

for i in range(3, 8):               # prints integers 3 through 7
    print(i)

If we need an actual list of integers, we can pass the range object to list():

intlist = list(range(5))
print(intlist)                      # [0, 1, 2, 3, 4]
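Because the stop value is excluded, range(len(some_list)) yields exactly the valid indexes of that list. A short sketch with a made-up list:

```python
words = ['For', 'three', 'months']

print(list(range(3, 8)))          # [3, 4, 5, 6, 7]: start included, stop excluded
print(list(range(len(words))))    # [0, 1, 2]: one index per item

for i in range(len(words)):       # i is 0, then 1, then 2
    print(i, words[i])            # prints "0 For", "1 three", "2 months"
```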



