Introduction to Python
davidbpython.com
The CSV format will allow us to explore Python's text parsing tools.
comma-separated values file (CSV)
19260701,0.09,0.22,0.30,0.009
19260702,0.44,0.35,0.08,0.009
19270103,0.97,0.21,0.24,0.010
Tables consist of records (rows) and fields (column values).
Tabular text files are organized into rows and columns.
comma-separated values file (CSV)
19260701,0.09,0.22,0.30,0.009
19260702,0.44,0.35,0.08,0.009
19270103,0.97,0.21,0.24,0.010
19270104,0.30,0.15,0.73,0.010
19280103,0.43,0.90,0.20,0.010
19280104,0.14,0.47,0.01,0.010
space-separated values file
19260701 0.09 0.22 0.30 0.009
19260702 0.44 0.35 0.08 0.009
19270103 0.97 0.21 0.24 0.010
19270104 0.30 0.15 0.73 0.010
19280103 0.43 0.90 0.20 0.010
19280104 0.14 0.47 0.01 0.010
Text files are just sequences of characters. Commas and newline characters separate the data.
If we print a CSV text file, we may see this:
19260701,0.09,0.22,0.30,0.009
19260702,0.44,0.35,0.08,0.009
19270103,0.97,0.21,0.24,0.010
19270104,0.30,0.15,0.73,0.010
19280103,0.43,0.90,0.20,0.010
19280104,0.14,0.47,0.01,0.010
However, here's what a text file really looks like under the hood:
19260701,0.09,0.22,0.30,0.009\n19260702,0.44,0.35,0.08,0.009\n19270103,0.97,0.21,0.24,0.010\n19270104,0.30,0.15,0.73,0.010\n19280103,0.43,0.90,0.20,0.010\n19280104,0.14,0.47,0.01,0.010
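One way to see the difference between the printed view and the underlying characters is to compare print() with repr(). The short sketch below uses a hypothetical two-row string in place of a real file; the variable names are illustrative only.

```python
# Hypothetical sample: the first two rows of the CSV data, held in one string.
text = '19260701,0.09,0.22,0.30,0.009\n19260702,0.44,0.35,0.08,0.009\n'

print(text)        # each newline character is rendered as a line break
print(repr(text))  # each newline character is shown as the escape sequence \n

# Counting the newline characters confirms there is one per row.
line_count = text.count('\n')
print(line_count)  # 2
```

The string is the same in both calls; only the way it is displayed changes.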
Looping through a file's lines as strings, we can split each line and isolate its fields.
The process:
1. Open the file for reading.
2. Use a for loop to read each line of the file, one at a time. Each line will be represented as a string.
3. Remove the newline from the end of each string with .rstrip().
4. Divide the string into fields using .split().
5. Read a value from one of the fields, representing the data we want.
6. As the loop progresses, build a sum of values from each line.
We will begin by reviewing each feature necessary to complete this work, and then we will put it all together.
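As a preview, the six steps above might look roughly like the sketch below. To keep it self-contained, an io.StringIO object stands in for the opened file (step 1); this stand-in, the choice of the second field, and all variable names are assumptions made for illustration. With a real file, the loop would run over the object returned by open().

```python
import io

# Step 1 (simulated): a file-like object holding three rows of the sample data.
# With a real file this would be the result of open(...) on a CSV file.
sample = io.StringIO(
    '19260701,0.09,0.22,0.30,0.009\n'
    '19260702,0.44,0.35,0.08,0.009\n'
    '19270103,0.97,0.21,0.24,0.010\n'
)

total = 0.0
for line in sample:               # step 2: each line arrives as a string
    line = line.rstrip()          # step 3: remove the trailing newline
    fields = line.split(',')      # step 4: divide the line into a list of fields
    value = float(fields[1])      # step 5: read the second field as a number
    total = total + value         # step 6: add it to the running sum

print(round(total, 2))            # 1.5
```

Each field comes out of .split() as a string, so it is converted with float() before being added to the sum.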
The string method .rstrip() can remove characters from the right side of a string.
When no argument is passed, any trailing "whitespace" characters (including the newline character) are removed from the end of the line:
line_from_file = 'jw234,Joe,Wilson\n'
stripped = line_from_file.rstrip() # str, 'jw234,Joe,Wilson'
When a string argument is passed, every trailing occurrence of the characters in that string is removed from the end of the line:
line_from_file = 'I have something to say.'
stripped = line_from_file.rstrip('.') # str, 'I have something to say'
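A few more cases may help show how .rstrip() treats its argument: as a set of characters to remove, not as an exact suffix. The sample strings below are made up for illustration.

```python
# All trailing periods are removed, not just one.
a = 'Wait...'.rstrip('.')                   # 'Wait'

# Each character in the argument is removed wherever it trails the string,
# so this strips both the carriage return and the newline.
b = 'jw234,Joe,Wilson\r\n'.rstrip('\r\n')   # 'jw234,Joe,Wilson'

# With no argument, only trailing whitespace goes; the left side is untouched.
c = '  padded  \n'.rstrip()                 # '  padded'

print(a, b, c)
```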
The string method .split() divides a delimited string into a list of strings.
line_from_file = 'jw234:Joe:Wilson:Smithtown:NJ:2015585894\n'
xx = line_from_file.split(':')
print(xx) # ['jw234', 'Joe', 'Wilson',
# 'Smithtown', 'NJ', '2015585894\n']
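Notice that the last item above still ends with the newline character. A minimal sketch of one way to avoid this is to strip the line before splitting it; the variable names here are illustrative.

```python
line_from_file = 'jw234:Joe:Wilson:Smithtown:NJ:2015585894\n'

# Remove the trailing newline first, then split on the delimiter.
fields = line_from_file.rstrip().split(':')

first_name = fields[1]   # 'Joe'
phone = fields[5]        # '2015585894' (no trailing newline)

print(fields)
```

Individual fields are then read from the list by index, starting at 0.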
We can also think of a string as delimited by spaces.
gg = 'this is a file with some whitespace'
hh = gg.split() # splits on any "whitespace character"
print(hh) # ['this', 'is', 'a', 'file',
# 'with', 'some', 'whitespace']
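Applied to a row of the space-separated file shown earlier, .split() with no argument could be used as in the sketch below. Because the newline counts as whitespace, it does not appear in the result. The contrast with .split(' ') is included only as an illustration of why the no-argument form is often more convenient.

```python
line = '19260701 0.09 0.22 0.30 0.009\n'

# No argument: split on any run of whitespace, ignoring the trailing newline.
fields = line.split()      # ['19260701', '0.09', '0.22', '0.30', '0.009']

# With an explicit single space, two spaces in a row produce an empty string.
loose = 'a  b'.split()     # ['a', 'b']
strict = 'a  b'.split(' ') # ['a', '', 'b']

print(fields, loose, strict)
```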