Advanced Python
Project Discussion, Session 1
Please note: notations (as used in the Introduction to Python course) are no longer required. |
|
1.1 | Writing CSV rows with selected columns. Please create program select_columns.py that accepts two arguments: a source filename (for reading) and a target filename (for writing). The program selects the date, mean_temp, precip and events columns from the source file and writes them to the target file. |
1. Collect the arguments via argparseThe first thing I did in solving this problem was to set up the argparse code and test out that I could retrieve the two arguments: --source and --target. |
|
After composing the argparse code I printed the two attributes from the args object (see the argparse example for args):
print(args.source)
print(args.target)
|
|
So calling it this way (command is meant to be all on one line):
python select_columns.py --source weather_newyork_tiny.csv --ta rget weather_newyork_narrow.csv |
|
showed me these results:
weather_newyork_tiny.csv weather_newyork_narrow.csv |
|
You should test your code as above to confirm your results. Make sure not to use nargs unless multiple arguments are expected. --source and --target should not use nargs=. Special note on filenames/filepaths: if there's any issue with finding files, the easiest thing to do would be to save the script in the same directory as the file it is trying to open -- then the bare filenames that we're using in the examples above will work. But if the file is in a different location than the script, the relative path should be passed to the script; the script should not hard-code any directory location -- this would undercut the reusability of the script. If there is any question about relative paths, please let me know. 2. Construct a csv.reader on the source file and csv.writer on the target fileNext, it seemed pretty clear that csv.reader was going to be easiest to use for reading because we know which fields we need. csv.reader gives us each row as a list. I opened the source file for reading, and then passed the file object from open() to csv.reader(). This produces a xsv.reader object. I also opened the target file for writing, and passed the file object from open() to csv.writer(). This produces a csv.writer object. 3. Determine the indices of the fields you need.Since we are reading the date, mean_temp, precip and events columns, we can look at the file and see the indices that are needed: 0, 1, 17, 19. 4. Write each row of data to the target fileAs you loop through the source file using the csv.reader object, you'll write selected columns to the target file using the csv.writer object. Let's start by seeing what we are working with: loop through the csv.reader object, printing each row. Note that each row is represented as a list. |
|
So for each row of the source file, you must build a new list with only the selected values.
for each row in the reader: build a new list with just the items from the selected columns (indices are noted above) use writer.writerow() to write this new list to the target file |
|
5. Make sure to close the write fileAnytime you are working with a file that you have opened for writing, you must make sure to close it as soon as writing is finished. Make sure to call wfh.close() with the parentheses; without them it will have no effect. For the extra challenge (allowing the user to pass the column heads as arguments via columns=), here is a sample outline: |
|
Reading with DictReader and writing using DictWriter:
for each row dict in the DictReader: initialize a new, empty dict for each column heading in the list of selected columns (args.columns) add a new pair to the empty row dict - the column heading as key, and the value for this key in the reader's row dict use writer.writerow() to write the row dict to the target file |
|
1.2 | Writing CSV rows with selected rows based on row value. Please write a program select_rows.py that accepts four arguments: a source filename (for reading), a target filename (for writing), a column name (that will be evaluated to select rows) and a field value (the matching value that will select the row). |
1. Collect the arguments via argparseThe first thing I did was set up the argparse code and test out that I could retrieve the four arguments: --source, --target, --column and --value. |
|
After composing the argparse code I printed the four attributes from the args object:
print(args.source)
print(args.target)
print(args.column)
print(args.value)
|
|
So calling it this way (command meant to be on one line):
python select_rows.py --source revenue.csv --ta rget revenue_ny.csv --column state --value NY |
|
showed me these results:
revenue.csv revneue_ny.csv state NY |
|
Special note on filenames/filepaths: if there's any issue with finding files, the easiest thing to do would be to save the script in the same directory as the file it is trying to open -- then the bare filenames that we're using in the examples above will work. But if the file is in a different location than the script, the relative path should be passed to the script; the script should not hard-code any directory location -- this would destroy the reusability of the script. If there is any question about relative paths, please let me know. 2. Construct a csv.reader on the source file and csv.writer on the target fileNext, I decided to use csv.reader for reading and csv.writer for writing, partly because I would prefer to have practice using this approach, and because csv.DictReader as csv.DictWriter was used in the last assignment. I opened the source file for reading, and passed the file object from open() to csv.reader(). I opened the target file for writing, and passed the file object from open() to csv.writer().3. Read the header from the source file, and write it to the target fileheader = next(reader) will retrieve the first row from the source file. (Print this to make sure it is performing this correctly.) Then you can write it immediately to the source file using csv.writerow(). Just for a sanity check, I recommend that you run the program from this point. Use PyCharm or a text editor to open the target file, and see that the headers have been written to it. This way you can know you're on the right track and have successfully opened the target file for writing.4. Write each selected row of the source file to the target fileThis is the most interesting / challenging part. As you loop through the source file using the csv.reader object, you'll write selected columns to the target file using the csv.writer object.Begin by looping through the csv.reader object and print each row. Note that each row is represented as a list. Next, consider that for each row of the source file, you must test to see if the value in the selected column (specified by args.column) is the same as the selected value (specified by args.value). To do this, we must know the index position of the selected column (in other words in revenue.csv, if the selected column is 'company', the correct index position is 0, if 'state', index position is 1, etc.). |
|
So before we start looping, we must know the correct index to read from the row list. We can find this using the list .index() method. Here's an example of .index():
header = ['a', 'b', 'c', 'd']
selected = 'c'
row_idx = header.index(selected) # int, 2
|
|
You can then loop through the source file row by row, check to see if the field value at the selected row index is equal to args.value, and if it is, write the row to csv.writer. |
|
The basic outline is this:
use next() to get the header row from the reader determine the row index to use by checking the position of args.column within the header row for each row list in the reader: if the value of the item in the row at the row index is equal to args.value use writer.writerow() to write the row to the target file |
|
5. Make sure to close the write fileAnytime you are working with a file that you have opened for writing, you must make sure to close it as soon as writing is finished. Make sure to call wfh.close() with the parentheses; without them it will have no effect. |
|