Python 3

Introduction to Python

davidbpython.com




Building Up Containers from File


introduction: building up containers from file

This technique forms the core of much of what we do in Python.


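In order to work with data, the usual steps are to extract it from a source (such as a file), transform it (select, clean or convert the values we need), and load it into a container or other destination.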


We call this process Extract-Transform-Load, or ETL. ETL is at the heart of what core Python does best.
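
The three stages can be sketched in a few lines. This is a minimal illustration rather than the course's own code: the states and amounts in the sample rows are made up, and io.StringIO stands in for a real file so that the example runs on its own.

```python
import io

# Hypothetical sample data standing in for a CSV file (company,state,revenue)
sample = io.StringIO("Haddad's,PA,239.50\nWestfield,NJ,53.90\nThe Store,NJ,211.50\n")

nj_revenue = []                              # container to load into

for line in sample:                          # Extract: read one line at a time
    line = line.rstrip()                     # Transform: strip the newline...
    items = line.split(',')                  # ...split into fields...
    if items[1] == 'NJ':                     # ...select the rows we want
        nj_revenue.append(float(items[2]))   # Load: store the converted value

print(nj_revenue)                            # [53.9, 211.5]
print(sum(nj_revenue))                       # total of the selected values
```

Once the values are in a list, we can sum, count or sort them without going back to the file.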






looping through a data source and building up a list

Similar to the counting and summing algorithm, this one collects values instead.


build a list of company names

company_list = []                        # empty list
fh = open('revenue.csv')                 # 'file' object

for line in fh:                          # str, "Haddad's,PA,239.50\n"
    line = line.rstrip()                 # str, "Haddad's,PA,239.50"

    items = line.split(',')              # list, ["Haddad's", 'PA', '239.50']

    company_list.append(items[0])        # list, ["Haddad's"]


print(company_list)   # ["Haddad's", 'Westfield', 'The Store', "Hipster's",
                      #  'Dothraki Fashions', "Awful's", 'The Clothiers']

fh.close()

Ex. 5.14 - 5.15






looping through a data source and building up a unique set

This program uses a set to collect unique items from repeating data.


state_set = set()                       # empty set
fh = open('revenue.csv')                # 'file' object

for line in fh:                         # str, "Haddad's,PA,239.50\n"

    items = line.split(',')             # list, ["Haddad's", 'PA', '239.50\n']
    state_set.add(items[1])             # set, {'PA'}

print(state_set)       # set, {'PA', 'NY', 'NJ'}   (your order may be different)

chosen_state = input('enter a state:  ')

if chosen_state in state_set:
    print(f'{chosen_state} found in the file')
else:
    print(f'{chosen_state} not found')

fh.close()
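
To see why a set is the right container here, we can collect the same column into a list and a set side by side. This is a sketch under assumptions: the sample rows are made up, and io.StringIO stands in for the file.

```python
import io

# Hypothetical rows in company,state,revenue format (made up for this sketch)
sample = io.StringIO("Haddad's,PA,239.50\nWestfield,NJ,53.90\nThe Store,NJ,211.50\n")

state_list = []                          # a list keeps every value, repeats included
state_set = set()                        # a set keeps one copy of each value

for line in sample:
    items = line.rstrip().split(',')
    state_list.append(items[1])
    state_set.add(items[1])

print(state_list)                        # ['PA', 'NJ', 'NJ']
print(len(state_set))                    # 2
print(sorted(state_set))                 # ['NJ', 'PA']
```

Passing the set to sorted() returns a list in a predictable order, which is handy when we want to print the unique values.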


Ex. 5.22 & 5.23






reading a file with with

A file is automatically closed upon exiting the 'with' block.


A 'best practice' is to open files using a 'with' block. When execution leaves the block, the file is automatically closed.

with open('pyku.txt') as fh:
    for line in fh:
        print(line)

# At this point (once outside the with block), filehandle fh
# has been closed.  There is no need to call fh.close().


However, we should understand the minimal cost of not closing our files:

A file open for writing should be closed as soon as possible. The data may not appear in the file until it has been closed.

Ex. 4.15
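
A 'with' block works for writing as well, and it takes care of the close for us. The sketch below checks the file object's closed attribute inside and outside the block, then reads the data back. The temporary directory is an assumption made so the example does not touch any real files.

```python
import os
import tempfile

# Hypothetical scratch location so the sketch does not touch real files
path = os.path.join(tempfile.mkdtemp(), 'newfile.txt')

with open(path, 'w') as wfh:             # open for writing
    wfh.write('this is a line of text\n')
    print(wfh.closed)                    # False: still inside the block

print(wfh.closed)                        # True: closed on leaving the block

with open(path) as fh:                   # reopen for reading
    text = fh.read()

print(text)                              # the line is now in the file
```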






slicing and dicing a file: the line, word, character count (1/3)

Once we have read a file as a single string, we can "chop it up" any way we like.


# read(): file text as a single string
fh = open('guido.txt')          # 'file' object
text = fh.read()                # read() method called on
                                # file object returns a string

fh.close()                      # close the file

print(text)
print(len(text))                 # 207 (number of characters in the file)

    # single string, entire text:

    # 'For three months I did my day job, \nand at night and
    #  whenever I got a \nchance I kept working on Python.  \n
    #  After three months I was to the \npoint where I could
    #  tell people, \n"Look here, this is what I built."'






slicing and dicing a file: splitting a string into words (2/3)

String .split() on a whole file string returns a list of words.


file_text = """For three months I did my day job,
and at night and whenever I got a
chance I kept working on Python.
After three months I was to the
point where I could tell people,
"Look here, this is what I built." """

words = file_text.split()      # split entire file on whitespace (spaces or newlines)

print(words)
    # ['For', 'three', 'months', 'I', 'did', 'my', 'day', 'job,',
    #  'and', 'at', 'night', 'and', 'whenever', 'I', 'got', 'a',
    #  'chance', 'I', 'kept', 'working', 'on', 'Python.', 'After',
    #  'three', 'months', 'I', 'was', 'to', 'the', 'point', 'where',
    #  'I', 'could', 'tell', 'people,', '"Look', 'here,', 'this',
    #  'is', 'what', 'I', 'built."']

print(len(words))       # 42 (number of words in the file)






slicing and dicing a file: the line, word, character count (3/3)

String .splitlines() will split any string on the newlines, delivering a list of lines from the file.


file_text = """For three months I did my day job,
and at night and whenever I got a
chance I kept working on Python.
After three months I was to the
point where I could tell people,
"Look here, this is what I built." """

lines = file_text.splitlines()

print(lines)

    # ['For three months I did my day job,', 'and at night and whenever I got a',
    #  'chance I kept working on Python.', 'After three months I was to the',
    #  'point where I could tell people,', '"Look here, this is what I built." ']

print(len(lines))          # 6 (number of lines in the file)
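
Putting the three counts together gives a small line, word and character counter. This is a sketch that uses a string in place of the file; the variable names are our own. The character count depends on the exact whitespace in the text, so it is printed rather than stated.

```python
file_text = """For three months I did my day job,
and at night and whenever I got a
chance I kept working on Python.
After three months I was to the
point where I could tell people,
"Look here, this is what I built." """

line_count = len(file_text.splitlines())     # 6
word_count = len(file_text.split())          # 42
char_count = len(file_text)                  # every character, newlines included

print(line_count, word_count, char_count)
```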






"whole file" parsing: reading a file as a list of lines

Reading the whole file with .read() and then calling .splitlines() gives us a list of lines with the newlines removed.


fh = open('pyku.txt')           # 'file' object

file_text = fh.read()           # entire file as a single string
fh.close()                      # close the file

lines = file_text.splitlines()

print(lines)

    # ["We're out of gouda.", 'That parrot has ceased to be.',
    #  'Spam, spam, spam, spam, spam.']

print(len(lines))          # 3 (number of lines in the file)


Ex. 5.27 -> 5.29






Summary: 3 ways to read strings from a file

for: read line-by-line (newline ('\n') marks the end of a line)

fh = open('students.txt')        # file object allows looping
                                 # through a series of strings
for my_file_line in fh:          # my_file_line is a string
    print(my_file_line)           # prints each line of students.txt

fh.close()                       # close the file

read(): read entire file as a single string

fh = open('students.txt')  # file object allows reading
text = fh.read()                 # read() method called on file
                                 # object returns a string
fh.close()                       # close the file

print(text)                       # entire text as a single string

readlines(): read as a list of strings (each string a line)

fh = open('students.txt')
file_lines = fh.readlines()      # file.readlines() returns
                                 # a list of strings
fh.close()                       # close the file

print(file_lines)                 # entire text as a list of lines
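
The three approaches can be compared on the same file. This is a sketch under assumptions: it first creates a small two-line file in a temporary directory (the contents are made up) so that it runs on its own.

```python
import os
import tempfile

# Hypothetical two-line file created just for this comparison
path = os.path.join(tempfile.mkdtemp(), 'students.txt')
with open(path, 'w') as wfh:
    wfh.write('first line\nsecond line\n')

looped = []
with open(path) as fh:
    for line in fh:                      # 1. loop: one string per line
        looped.append(line)

with open(path) as fh:
    text = fh.read()                     # 2. read(): one string for the file

with open(path) as fh:
    file_lines = fh.readlines()          # 3. readlines(): list of strings

print(looped)                            # ['first line\n', 'second line\n']
print(file_lines == looped)              # True
print(text.splitlines())                 # ['first line', 'second line']
```

Note that looping and readlines() keep the newline at the end of each line, while read() followed by .splitlines() removes it.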






sidebar: writing to a file

We won't need to write to a file in this course, but it's important to know how.


wfh = open('newfile.txt', 'w')    # open for writing
                                  # (will overwrite an existing file)

wfh.write('this is a line of text\n')
wfh.write('this is a line of text\n')
wfh.write('this is a line of text\n')

wfh.close()
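
Writing is often the reverse of building up a container: we loop through a list and write each item as a line. A minimal sketch, assuming a hypothetical output file in a temporary directory; it reads the file back to confirm the round trip.

```python
import os
import tempfile

# Hypothetical output file in a scratch directory
path = os.path.join(tempfile.mkdtemp(), 'companies.txt')

company_list = ["Haddad's", 'Westfield', 'The Store']

wfh = open(path, 'w')                    # open for writing
for name in company_list:
    wfh.write(name + '\n')               # write() does not add a newline for us
wfh.close()                              # close so the data reaches the file

fh = open(path)
lines = fh.read().splitlines()           # read back as a list of lines
fh.close()

print(lines == company_list)             # True
```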






sidebar: the range() function

This function allows us to iterate over an integer sequence.


counter = range(10)
for i in counter:
    print(i)                        # prints integers 0 through 9

for i in range(3, 8):               # prints integers 3 through 7
    print(i)

If we need a literal list of integers, we can simply pass the iterable to a list:

intlist = list(range(5))
print(intlist)                      # [0, 1, 2, 3, 4]
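
range() also accepts a third argument, a step, and it pairs naturally with len() when we want to number the items in a list. Both uses go slightly beyond the examples above, so treat this as a sketch; the variable names are our own.

```python
evens = list(range(0, 10, 2))            # start, stop, step
print(evens)                             # [0, 2, 4, 6, 8]

lines = ["We're out of gouda.", 'That parrot has ceased to be.',
         'Spam, spam, spam, spam, spam.']

numbered = []
for i in range(len(lines)):              # i takes the values 0, 1, 2
    numbered.append(f'{i + 1}: {lines[i]}')

print(numbered[0])                       # 1: We're out of gouda.
```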



