Introduction to Python

davidbpython.com




Project Warmup Exercise Solutions, Session 4



EXERCISES RELATED TO list building assignment

 
Ex. 4.1 Create an empty list. Opening and looping through the file student_db.txt, split each line into elements, and append just the state values (the 4th value in each row) to a list. Print the list object. (Hint: make sure the list is initialized before the loop begins and is printed after the loop ends. If either statement is found inside the loop, it will be executed multiple times.
Suggested Solution:
sfile = '../student_db.txt'

states = []                      # initialize an empty list

fh = open(sfile)                 # open file, return file object

for line in fh:                  # for each line in file
    elements = line.split(':')   # returns a list of str elements
    states.append(elements[3])   # append the 4th field from split

print(states)                    # print compiled list


fh.close()

This is the basic summary algorithm, in which a "summary variable" (in this case a list) is initialized before the loop begins, is modified inside the loop, and then its value(s) reported once the loop is complete. The point of the exercise is to build up a summary list in the proper manner, taking care not to initialize the list inside the loop nor report its value inside the loop. The variable lives outside the loop, and is modified only inside the loop, as values are added to it one at a time.

 
Ex. 4.2 Extend the previous program by omitting the first line from the file. One way to do this is to call line = next(fh) on the file handle (assuming this is the variable name you used for the filehandle). This should be done before the loop begins, and will advance the file pointer to the next line.
Suggested Solution:
sfile = '../student_db.txt'

states = []

fh = open(sfile)                 # open file, return file object
lines = fh.readlines()           # readlines() -> list of strings
data_lines = lines[1:]           # list of 2nd->last lines of file

for line in data_lines:          # for each string line in list
    elements = line.split(':')   # split line on comma, returns
                                 # list of strings:  each field
    states.append(elements[3])   # add the 4th field from split

print(states)                    # print compiled list


fh.close()

This exercise introduces the concept of the file pointer and a method for moving it forward without including it in a loop.

 
Ex. 4.3 Starting with an empty list, and then opening and looping through student_db.txt, append the 1st field (a student ID) to a list only if the state field is NY. Print the list object.
Suggested Solution:
sfile = '../student_db.txt'

ids = []

fh = open(sfile)                 # open file, return file object

for line in fh:                  # for each string line in file
    elements = line.split(':')   # split line on comma, returns
                                 # list of strings: each field
    if elements[3] == 'NY':      # if the state (4th) is 'NY'
        ids.append(elements[0])  # add first field from split (id)
                                 # to list

print(ids)                       # when loop is done,
                                 # print compilled list

fh.close()

This combines the prior summary algorithm solution with line selection based on a particular field, i.e. only those lines containing the correct field value will be included in the collected values.

 
Ex. 4.4 Opening and reading through revenue.csv, build a list from the 3rd field (the floating point value), then sum up the values using sum(). (Hint: sum() will not work unless you convert each element to a float value before you add it to the list, so do that with float().)
Suggested Solution:
rfile = '../revenue.csv'

fh = open(rfile)                 # open file, return file object

revenues = []                    # initialize an empty list

for line in fh:                  # for each string line in file
    fields = line.split(',')     # split line on comma, returns
                                 # list of strings: each field
    fval = float(fields[2])
    revenues.append(fval)

print(sum(revenues))

fh.close()

The point of this exercise has less to do with the summary algorithm (which we are reprising and I hope you are recognizing) and more to do with the sum() function, which will sum up a container of floats. Of course if you neglect to convert each of the collected values to a float, the call to sum() will fail when the function attempts to add the first value to numeric 0.

 
Ex. 4.5 Extending the above program, use len() to determine the length of the list, then use the formula sum(flist) / len(flist) to produce an average (where flist is the list of floats you compiled).
Suggested Solution:
rfile = '../revenue.csv'

fh = open(rfile)                 # open file, return file object

revenues = []                    # initialize an empty list

for line in fh:                  # for each string line in file
    fields = line.split(',')     # split line on comma, returns
                                 # list of strs, each field a str
    fval = float(fields[2])      # the 3rd field convert to float
    revenues.append(fval)        # append the float to a list

print(sum(revenues) / len(revenues))  # average value in the list
                                      # total sum divided by count

fh.close()

This extends the concept of working with a list of values by asking you to take the average value from a list -- by computing the values of len() (giving a count of values in the list) and sum() (giving a sum of the same values). The purpose is to show how these summary functions work as well as to show how easy it is to get information from a list of values.

 
Ex. 4.6 Given the following code:
values = [1, 3, 4, 10, 15]

Determine and print the median (middle) value without mentioning any values directly. (Hint: use len() to determine the length, then use that value to figure out the "halfway" index. Also remember that your calculated index must be an integer, not a float.)

Suggested Solution:
values = [1, 3, 4, 10, 15]

median_index = int(len(values) / 2)   # of elements / 2 ==
                                      # middle index

print(values[median_index])      # using 'middle index' to get
                                 # the middle value

The "middle" value of an odd number of values can be found by using the "middle index". If there are 5 values, the middle index is 2. If there are 7 values, the middle index is 3. Our division of an odd count of values (5, 7, 9, etc.) by 2 must be converted to an integer, i.e. with any floating point values lopped off -- int() will do this for us.

 
Ex. 4.7 Given the following code:
values = [3, 1, 10, 15, 4]

Again, determine and print the median value. This time you'll need to use the sorted() function, which takes an unsorted list and returns a sorted list.

Suggested Solution:
values = [3, 1, 10, 15, 4]

svals = sorted(values)            # numeric vals sorted low->high:
                                  # [1, 3, 4, 10, 15]

median_index = int(len(values) / 2)  # calc len of list / 2
print(svals[median_index])           # using the middle index to
                                     # get the middle value

This exercise builds on the previous assignment's, requiring that the values be sorted before the median calculation.

 
Ex. 4.8 Given the following code:
values = [30, 20, 10, 40, 50, 60]

Determine the median value, which (in an even-number of elements) is halfway between the two "middle" values in a sorted list. (Hint: again use the len() of the list to calculate the indices of the middle two values.)

Suggested Solution:
values = [30, 20, 10, 40, 50, 60]

svals = sorted(values)             # sorted() of numbers returns
                                   # numeric sort, in numeric order
                                   # [10, 20, 30, 40, 50, 60]

len_values_mid = int(len(svals)/2) # 3 (the index to the right of
                                   #    middle, where the middle
                                   #    is between two indices)

lval = svals[len_values_mid-1]     # 2 (index to left of middle)
rval = svals[len_values_mid]       # 3 (index to right of middle)

median = (lval + rval) / 2         # these two values summed and
                                   # averaged
print(int(median))                 # 35

In an even-number of elements list, the same approach using an subscript index calculated from the len() of the list can be used to identify one of the two "middle" values needed to calculate the median -- the other is the value to the left of it, i.e. with an index one less than the other middle value. We then need only sum those two middle values and divide by 2 or 2.0 to arrive at the median value.

 

EXERCISES RELATED TO word count project

 
Ex. 4.9 Open the pyku.txt file and use the file read() method to read it into a string. Print the length of this string using len()
filename = '../pyku.txt'

fh = open(filename)     # open(filename) returns a file object

text = fh.read()        # read() returns whole file text as str

print(len(text))        # len() of the file string shows char len
                        # of the file


fh.close()

All files can be expressed as strings because they are essentially sequences of characters. read() returns the file as a string, and len() can measure the length of a string. Thus we are able to measure the length of the file in terms of # of characters.

 
Ex. 4.10 Building on the previous program, count the number of times the word spam occurs in the file. (Hint: read() the file into a string and use the str count() method.)
filename = '../pyku.txt'

fh = open(filename)       # open(filename) returns a file object

text = fh.read()          # read() returns the entire text as str

print(text.count('spam')) # int showing # of times 'spam' appears

fh.close()

This is a side illustration not directly related to the "word count" solution: since we can easily store the entire file in a single string, we can use the string "inspector" methods to determine things about the string. In this case, we can use the count() method to count the number of occurrences of 'spam' (or any other string) in the file.

 
Ex. 4.11 Open the pyku.txt file and use the file read() method to read it into a string. Use str splitlines() to split the string into a list of lines. Print the type of the list returned from splitlines(), then print the first and last line from the file using list subscripts.
filename = '../pyku.txt'

fh = open(filename)     # open(filename) returns a file object

text = fh.read()        # read(): returns file text as a string

lines = text.splitlines()  # splitlines(): list of strs (lines)
print(type(lines))         # print the type of the list
print(lines[0])            # the first line (first item in list)
print(lines[-1])           # the last line (last element of list)
                           # can also say lines[2]

fh.close()

This exercise illustrates the use of splitlines(): to take a multiline string (like the one produced by read()) and split it into its constituent lines (i.e., this is largely the same as split('\n') on Unix machines). The rest of the exercise just illustrates that the return value is a list of strings. Note the negative subscript (lines[-1]): this retrieves the last item in the list. You ciould also use lines[2] if you were counting from the start.

 
Ex. 4.12 Open a file and use the file read() method to read it into a string. Use the string split() method to split the entire string into a list of words. Print the type of the list returned from split(), then print the first and last word from the file using list subscripts.
filename = '../pyku.txt'

fh = open(filename)     # open(filename) returns a file object

text = fh.read()        # read() returns entire file text as str

words = text.split()    # split entire file on whitespace->
                        # list of words
print(type(words))      # print the type of the list
print(words[0])         # the first word in the file
print(words[-1])        # the last word in the file

fh.close()

This exercise is intended to work in the same way as the previous one, except by splitting the string into words rather than into lines. The rest of the exercise just illustrates that the return value is a list of strings.

 
Ex. 4.13 Starting with this list:
var = ['a', 'b', 'c', 'd']

print the length of the list. (Hint: do not use a loop and a counter. Use len())

var = ['a', 'b', 'c', 'd']   # initialize a list of str objects

print(len(var))              # print len of list (4)

Lists have a len()!

 

EXERCISES RELATED TO spell check program

 
Ex. 4.14 Given a list containing duplicate values, initialize an empty set() and add each element to it to produce a set of unique values. Print the whole set. (Note: just for practice, don't pass the list directly to the set() constructor -- loop through the list and add each element one at a time.)
dupvals = [1, 3, 1, 1, 2, 3, 2, 1, 3]

uniquevals = set()                # initialize a new empty set

for val in dupvals:               # for each value in the list
    uniquevals.add(val)           # add the value to the set
                                  # (dupe value are ignored)

print(uniquevals)                 # print entire set

A set automatically discards any duplicate values -- so you don't need to check for duplicates before adding to the set. As alluded to in the description (and as you'll see in the slides), you can pass the list to the set() constructor and a new set will be returned, with all duplicates removed. However the instructions are asking you to go "the long way around" and loop through the list, adding each element from the list to the set. This is requested because in the homework we'll be looping through a file and adding each element to a set.

 
Ex. 4.15 Opening the revenue.csv text file, loop through and print each line of the file, but make sure there are no blank lines printed (hint: use rstrip() on each line to remove the newline character; print() itself already prints a newline at the end of every line).
filename = '../revenue.csv'

fh = open(filename)          # open file, return file object

for line in fh:              # for each string line in file

    line = line.rstrip()     # 'remove' whitespace from end of str
                             # returning a new string without it
    print(line)

fh.close()

This exercise is simply reminding you to strip each line as you loop through a file. (Stripping is not strictly necessary if you're not using the end of the string (i.e., if you're planning to split and use a "middle" value from the split, but it is good practice to dispense with the newline anytime you are looping through a file - it just makes things easier. Another way to do this is to use read() on the file to produce a string, and splitlines() to produce a list of strings -- that is, the file string split on the newlines.

 
Ex. 4.16 Given the following list:
mylist = [-5, -2, 1, -3, 1.5, 7, 9]

Loop through mylist and build a new list that is only positive numbers (i.e., greater than 0). Print the list.

Suggested Solution:
mylist = [-5, -2, 1, -3, 1.5, 7, 9]

newlist = []                    # initialize an empty list

for el in mylist:               # looping thru each item in list
    if el > 0:                  # if the element is > than 0:
        newlist.append(el)      # add the element to the new list

print(newlist)

This is a "filtering" loop, one which works with a populated list and has the effect of filtering out selected elements. The general method for such a loop is to create an empty list first, then loop through the populated list and test each element against the filtering condition (in this case, is it greater than 0?). Elements that pass the test are appended to the empty list. Thus, the original/source list is not changed -- it is usually much more straightforward to build a new list rather than try to modify an old one, particularly since modifying a list while you are looping through it can cause problems.

 
Ex. 4.17 Given the following code:
sentence = "I could; I wouldn't.  I might?  Might I!"

Split this sentence into words (without altering or processing them). Print each word on a line.

Suggested Solution:
sentence = "I could; I wouldn't.  I might?  Might I!"

words = sentence.split()  # split line on space, returns
                          # list of strs, each word in string

for word in words:        # loop through each word in list
    print(word)           # print the word

This is the first step in breaking up a file of words into individual words. split() with no arguments splits on whitespace (i.e., any consecutive spaces found will be removed and will mark the border between two elements). Of course split() returns a list of strings; we can loop over this list with for.

 
Ex. 4.18 Extending the above code, use a single call to .rstrip() to remove any punctuation. (Hint: you can place any punctuation characters to be removed together in a single string argument to rstrip(); any single character within the string will be removed.)
Suggested Solution:
sentence = "I could; I wouldn't.  I might?  Might I!"

words = sentence.split()        # split line on space, returns
                                # list of strs, each word in str

for word in words:              # for each word in list of words
    word = word.rstrip(';.?!')  # remove any punctuation from end
    print(word)                 # print stripped word

This exercise takes the previous one further by processing each word in the split list, stripping off any punctuation. Note that the punctuation list ';.?!' is actually a single string, and yet each character in the argument string is treated separately as a candidate for stripping (in other words, this string argument is treated as a list of characters). Also remember that string methods don't actually modify the string, which is immutable -- instead, rstrip() is returning a new string which we are assigning back to the same name -- the variable "word".

 
Ex. 4.19 Extending the above code, add an additional statement to lowercase each word.
Suggested Solution:
sentence = "I could; I wouldn't.  I might?  Might I!"

words = sentence.split()         # split line on space, returns
                                 # list of strs: each word in line

for word in words:               # for each word in list of strs
    word = word.rstrip(';.?!')   # remove any punctuation from end
    word = word.lower()          # lowercase this string
    print(word)

Next step -- process each word further by lowercasing it. Again, we're assigning the new string returned from lower() to the same name, effectively replacing it -- this has the effect of making it seem as if the word string is being modified -- instead, it's only renaming the new string to the old name, word.

 
Ex. 4.20 Extending the above code, add each word to a set(), then print the entire set.
Suggested Solution:
sentence = "I could; I wouldn't.  I might?  Might I!"

newset = set()                   # new empty set
words = sentence.split()         # split line on space, returning
                                 # a list of strs, each word

for word in words:               # for each word in list of strs
    word = word.rstrip(';.?!')   # remove chars from end of word
    word = word.lower()          # lowercase the word
    newset.add(word)             # add the word to the set

print(newset)                    # print the set

Now we're combining the set adding earlier with the word preparing that we just accomplished.

 
Ex. 4.21 Open the file pyku.txt. Print each word from the file on a separate line. (Hint: use read() on the filehandle to return a string, then use split() on the string to create a master list of words.)
Suggested Solution:
filename = '../pyku.txt'

fh = open(filename)             # open file, return file object

text = fh.read()                # return file text as a string
words = text.split()            # split space on space, returns
                                # list of strs: each word in file

for word in words:              # for each word in split words
    print(word)                 # print the word

fh.close()

This exercise shows how easy it is to split an entire file into a list of words. One needs only to retrieve the entire file as a string, and then split on whitespace. This illustrates another way we can "slice and dice" the file, since a file can be read as a list of strings, a single string, or in this case a list of words.

 
Ex. 4.22 Extending the previous solution, again strip each word of punctuation and lowercase each word and add each to a set. Print the set.
filename = '../pyku.txt'

newset = set()                    # a new, empty set

fh = open(filename)               # open file, return file object

text = fh.read()                  # read() returns text of file
words = text.split()              # split text str, returns list
                                  # of words

for word in words:                # for each word in list of words
    word = word.rstrip(';.?!')    # remove punctuation
                                  # from end of word
    word = word.lower()           # lowercase the word
    newset.add(word)              # add the word tothe empty set

print(newset)                     # print the compiled set

fh.close()

Now we're combining prior exercises with the current one -- reading the file as a string, splitting the string on whitespace, 'preparing' each word by removing punctuation and lowercasing, and then adding each word to a set.

 
Ex. 4.23 Start with the following code:
text = "we're certainly out of gouda but Python is great."

Now split this text into words (no need to strip or lowercase). Removing the printing of the set and adding to the end of the code, loop through each word in filetext and check the word against the set created earlier (use the in operator to check for membership). If the word is not in the set, print it.

Suggested Solution:
filename = '../pyku.txt'
checktext = "we're certainly out of gouda but Python is great."

newset = set()                    # a new, empty set

fh = open(filename)               # open file, return file object

filetext = fh.read()              # read() returns text
words = filetext.split()          # text split into list of words

for word in words:                # for each word in list of words
    word = word.rstrip(';.?!')    # remove punctuation from word
    word = word.lower()           # lowercase the word
    newset.add(word)              # add prepared word to the set

test_words = checktext.split()    # split text into list of words

for word in test_words:           # for each word in list of words
    if word not in newset:        # if the same word is not in set
        print(word)               # print the word

fh.close()

Now the last step: once we have added all the words in the file to a set, we can loop through words from a test string and check each one against the set. This exercise (and the homework) illustrates the use of a set to test for membership. A set is suited to membership tests because a) it contains unique elements and b) the elements are ordered according to Python's internal rules that enable fast lookups.

 
[pr]