Advanced Python
Projects, Session 2

Please note: notations (as used in the Introduction to Python course) are not required.

IMPORTANT note on using the sqlite client: the sqlite client is convenient for viewing table structure and other tasks, but it is not required. When using it, please keep in mind that your Terminal / Command Prompt session must be in the same directory as your database file, or you will create a new, empty file with the same name as the one you attempted to open. Please see the Challenge Exercises for the steps needed to move your command-line session to the proper data folder for these projects. If you open your database file but it has no tables, then you have either misspelled the name of the directory or are not using the correct relative path.

To find a file using a relative path, we must know a) the location of the file, and b) the location from which we are running our script (the "present working directory"); from these we can determine c) the path needed to access the file location from the pwd. If Python can't find your file, it may be because the relative path is incorrect.

If the file you want to open is in the same directory as the script you're executing, use the filename alone:
fh = open('filename.txt')
If the file you want to open is in the parent directory of the script you're executing, prefix the filename with ../:
fh = open('../filename.txt')
If the file you want to open is in a child directory of the script you're executing, prepend the child directory name to the filename:
fh = open('childdir/filename.txt')

(Above assumes the directory where the file is located is named childdir and that this directory is in the same directory as the script you're executing.)
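When a relative path isn't resolving, it can help to print the present working directory and check for the file directly before opening it; a small diagnostic sketch (the filename session_2.db is just an example):

```python
import os

# Relative paths are resolved from the present working directory,
# which may differ from the directory containing the script.
print('pwd:', os.getcwd())

# Check whether the file is visible from here before trying to open it
print('session_2.db found here?', os.path.exists('session_2.db'))
```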

2.1 SQL to CSV: write a function sql_to_csv(db_filename, table, csv_filename) that reads from a table in an sqlite database and writes to a CSV file.

TWO IMPORTANT RECENT ADDITIONS: use "type hints" with your function (see the end of the slide deck titled "User Defined Functions" for examples), as well as the "if __name__ == '__main__': main()" structure (see the end of the slide deck "Functions and Code Organization"). Thank you!

Use the cursor's .description attribute to retrieve the column names and build them into the first row of the CSV file. The result should be a .csv file with the same column heads and data as the database table. We will use session_2.db as the database, read from a table named revenue, and write to revenue_from_db.csv. Please remember to close the file as well as the database connection when done. Also, please make sure that the function is "pure", meaning that it does not refer to any variables that were defined outside the function. To ensure this, please name your argument variables (i.e., in the 'def') differently than the variables 'db_name', 'table_name' and 'csv_filename'.

import sqlite3

db_name = 'session_2.db'
table_name = 'revenue'
csv_filename = 'revenue_from_db.csv'

sql_to_csv(db_name, table_name, csv_filename)
After running the program we should see that the CSV file has been created:
company,state,cost
Haddad's,PA,239.5
Westfield,NJ,53.9
The Store,NJ,211.5
Hipster's,NY,11.98
Dothraki Fashions,NY,5.98
Awful's,PA,23.95
The Clothiers,NY,115.2


HOMEWORK CHECKLIST: please use this checklist to avoid common mistakes and ensure a correct solution:

    the function is reusable -- therefore it cannot refer to the names of the database, table name or CSV file literally inside the function. Instead it must read these values from the function arguments. Please name your function argument variables (i.e., in the 'def' statement) with different names than the ones used outside the function.
    the function is "pure" -- therefore it cannot refer to any variables created outside the function and will not call print(), input() or exit() (it's OK to do these things during development).
    code conforms to points in the Code Quality pdf
    both file handle and database connections are closed before returning from function
    there are no extraneous comments or "testing" code lines
    program runs as shown in the assignment, or if it doesn't, a comment is placed at the top explaining what error or bad output has occurred (it is fine to turn in an incomplete solution if you have a question or would like to discuss ways to improve)
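A minimal sketch of one way to satisfy the requirements above -- the argument names (database_file, table, outfile) are arbitrary illustrations, and a full solution would also add the main() / if __name__ == '__main__' structure:

```python
import csv
import sqlite3

def sql_to_csv(database_file: str, table: str, outfile: str) -> None:
    """Read every row of 'table' and write it to a CSV file with a header."""
    conn = sqlite3.connect(database_file)
    cursor = conn.cursor()
    cursor.execute(f'SELECT * FROM {table}')

    # .description is a sequence of 7-item tuples; item 0 is the column name
    column_names = [item[0] for item in cursor.description]

    fh = open(outfile, 'w', newline='')
    writer = csv.writer(fh)
    writer.writerow(column_names)        # header row first
    writer.writerows(cursor.fetchall())  # then the data rows

    fh.close()    # close both the file handle and the connection
    conn.close()
```

A main() would then call sql_to_csv('session_2.db', 'revenue', 'revenue_from_db.csv').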

 
2.2 CREATE TABLE / DROP TABLE: write a script that creates a new table called weather_newyork in the session_2.db database with the following columns: date (should be a TEXT column), mean_temp (INT), precip (REAL), events (TEXT).

After running the script successfully for the first time, add an additional statement above the CREATE TABLE statement: DROP TABLE IF EXISTS weather_newyork. Now when you run the script, it will first drop and then re-create the table every time you run it. Don't forget to issue a conn.commit() after all of your database writes (where conn is the database connection object returned from sqlite3.connect()). If you don't commit your changes, the writes will not appear in the database or database table. (You do not need to execute conn.commit() more than once - one call can commit several database writes.) Note that this script won't do anything except create and remove the table.

You can verify that the CREATE TABLE statement has succeeded with the following code, which should return a list of table names in the database:
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = [ tup[0] for tup in cursor.fetchall() ]
print(f'tables:  {tables}')

(Note that 'table' above should be included as-is -- this is not meant to represent the name of your table.)
HOMEWORK CHECKLIST: please use this checklist to avoid common mistakes and ensure a correct solution:

    code conforms to points in the Code Quality pdf
    there are no extraneous comments or "testing" code lines
    program runs as shown in the assignment, or if it doesn't, a comment is placed at the top explaining what error or bad output has occurred (it is fine to turn in an incomplete solution if you have a question or would like to discuss ways to improve)
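The drop / create / commit / verify sequence described above can be sketched as:

```python
import sqlite3

conn = sqlite3.connect('session_2.db')
cursor = conn.cursor()

# "wipe and reload": remove any earlier version, then re-create the table
cursor.execute('DROP TABLE IF EXISTS weather_newyork')
cursor.execute('CREATE TABLE weather_newyork '
               '(date TEXT, mean_temp INT, precip REAL, events TEXT)')
conn.commit()            # without a commit, the writes will not appear

# verify by listing the tables now present in the database
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = [tup[0] for tup in cursor.fetchall()]
print(f'tables:  {tables}')

conn.close()
```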

 
2.3 CSV to SQL: Starting with the DROP TABLE and CREATE TABLE code from the previous solution, write a script that reads from weather_newyork.csv (using the csv module), selects the date, mean_temp, precip and events columns, and inserts them row-by-row into the table you created, weather_newyork.

Use parameter arguments (?, ?, ?, etc.) to perform the inserts. Please note that the "precip" column has a "T" value instead of a numeric value in some rows. Please account for this by inserting the Python value None instead of the T (DO NOT use the string 'None', with quotes, but rather the value None, without quotes). Dropping and then recreating the table every time we run the script is sometimes called "wipe and reload" and is considered a best practice since it starts with a clean slate.

Use a SELECT query to confirm that the data is in the table. You can use SELECT * FROM weather_newyork to get all rows, but with very large datasets this would be impractical. Add LIMIT 10 to this query to see that many rows. You must confirm that the T -> None functionality is working: the first 'T' value can be found at 1/9/16. At that row in the table, the 'precip' value should be empty or NULL. (In the sqlite3 client it will show as an empty cell; selecting using Python should show None.)

You should also check to see that all rows were inserted. One easy way to see how many rows are in the table is through the COUNT(*) function:
c.execute('SELECT COUNT(*) FROM weather_newyork')
print(c.fetchone())

You should see 366 rows in the table. If you see 367 rows, you may have inserted the header row from the file.
HOMEWORK CHECKLIST: please use this checklist to avoid common mistakes and ensure a correct solution:

    code does not attempt to insert the STRING 'None' (i.e., with quotes) but instead uses the VALUE None (no quotes).
    code conforms to points in the Code Quality pdf
    there are no extraneous comments or "testing" code lines
    program runs as shown in the assignment, or if it doesn't, a comment is placed at the top explaining what error or bad output has occurred (it is fine to turn in an incomplete solution if you have a question or would like to discuss ways to improve)
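One possible shape for the read / select / insert loop, sketched as a function so it can be pointed at any CSV file; the column-header names used below are assumptions based on the table definition:

```python
import csv
import sqlite3

def load_weather(csv_filename: str, db_filename: str) -> None:
    """Wipe and reload the weather_newyork table from a CSV file."""
    conn = sqlite3.connect(db_filename)
    cursor = conn.cursor()
    cursor.execute('DROP TABLE IF EXISTS weather_newyork')
    cursor.execute('CREATE TABLE weather_newyork '
                   '(date TEXT, mean_temp INT, precip REAL, events TEXT)')

    fh = open(csv_filename, newline='')
    # DictReader consumes the header row, so it is never inserted
    # (this avoids the 367-row mistake mentioned above)
    for row in csv.DictReader(fh):
        precip = row['precip']
        if precip == 'T':                # trace precipitation
            precip = None                # the value None, not the string 'None'
        cursor.execute('INSERT INTO weather_newyork VALUES (?, ?, ?, ?)',
                       (row['date'], row['mean_temp'], precip, row['events']))
    fh.close()

    conn.commit()     # one commit covers all of the inserts
    conn.close()
```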

 
2.4 JSON to SQL: Starting with the DROP TABLE and CREATE TABLE code from the earlier solution, write a script that reads from weather_newyork_dod.json, selects the date, mean_temp, precip and events values from each dict, and inserts them row-by-row into the table you created, weather_newyork (make sure values are in correct order).

Please note that the "precip" key sometimes has a "T" value instead of a numeric value. Please account for this by inserting the Python value None instead of the T (DO NOT use the string 'None', with quotes, but rather the value None, without quotes).
HOMEWORK CHECKLIST: please use this checklist to avoid common mistakes and ensure a correct solution:

    code conforms to points in the Code Quality pdf
    there are no extraneous comments or "testing" code lines
    program runs as shown in the assignment, or if it doesn't, a comment is placed at the top explaining what error or bad output has occurred (it is fine to turn in an incomplete solution if you have a question or would like to discuss ways to improve)
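The JSON version differs only in how the rows are read; a sketch, again as a function, with the inner-dict key names assumed to match the column names:

```python
import json
import sqlite3

def load_weather_json(json_filename: str, db_filename: str) -> None:
    """Wipe and reload the weather_newyork table from a dict-of-dicts JSON file."""
    conn = sqlite3.connect(db_filename)
    cursor = conn.cursor()
    cursor.execute('DROP TABLE IF EXISTS weather_newyork')
    cursor.execute('CREATE TABLE weather_newyork '
                   '(date TEXT, mean_temp INT, precip REAL, events TEXT)')

    fh = open(json_filename)
    dod = json.load(fh)              # "outer" dict of "inner" dicts
    fh.close()

    for inner in dod.values():
        precip = inner['precip']
        if precip == 'T':            # trace value: insert None, not 'None'
            precip = None
        # values passed in the same order as the table's columns
        cursor.execute('INSERT INTO weather_newyork VALUES (?, ?, ?, ?)',
                       (inner['date'], inner['mean_temp'], precip,
                        inner['events']))

    conn.commit()
    conn.close()
```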

 

EXTRA CREDIT / SUPPLEMENTARY

2.5 (this assignment is not required, but is highly recommended, so please do it if you can) SQL to JSON dict of dicts. Build a JSON dict of dicts (similar in form to weather_newyork_dod.json) based on the revenue table in this week's session_2.db database. The key of the "outer" dict of dicts should be the company name. The value should be a dict of name, state, amount for each row in the table. The file should be named revenue.json.
The resulting revenue.json file looks like this:
{
     "Haddad's": {
          "name": "Haddad's",
          "state": "PA",
          "amount": 239.5
     },
     "Westfield": {
          "name": "Westfield",
          "state": "NJ",
          "amount": 53.9
     }
}

Note that the name parameter is repeated in each "inner" dict -- this is done for the convenience of anyone reading the dict.
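A sketch of the query / build / dump sequence; the table's column names (company, state, cost) are taken from the 2.1 output, and the indent value is just a guess at the sample formatting:

```python
import json
import sqlite3

def revenue_to_json(db_filename: str, json_filename: str) -> None:
    """Build a dict of dicts keyed by company name and write it as JSON."""
    conn = sqlite3.connect(db_filename)
    cursor = conn.cursor()
    cursor.execute('SELECT company, state, cost FROM revenue')

    dod = {}
    for name, state, amount in cursor.fetchall():
        # repeat the name inside each "inner" dict for readers' convenience
        dod[name] = {'name': name, 'state': state, 'amount': amount}
    conn.close()

    fh = open(json_filename, 'w')
    json.dump(dod, fh, indent=5)
    fh.close()
```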

 
2.6 (extra credit / supplementary) Parse larger JSON file. Read the ip_routes.json file into a Python object, then provide code that can answer the following three questions:

  1. How many keys are in the 'routes' dict of dicts?
  2. In the 'routes' dict of dicts, what are the ip addresses that have a 'routeType' of 'iBGP'?
  3. Besides 'routes', what are the keys in the dict associated with 'default', and what are the values associated with those keys?

number of keys in the 'routes' dict:
1346
ips with a 'routeType' of 'iBGP'
10.159.2.198/31
10.159.4.44/31
10.159.2.194/31
keys other than 'routes' in the 'default' dict
allRoutesProgrammedKernel:  True
routingDisabled:  False
allRoutesProgrammedHardware:  True
defaultRouteState:  notSet

You'll use dict subscripting and looping to answer these questions. See discussion for details on how to proceed.
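One way to organize the subscripting and looping, sketched as a function that takes the dict associated with 'default' (the exact nesting of the real file may differ, so adjust the subscripts to match; the sample dict below is a tiny stand-in, not real data):

```python
def summarize_routes(default: dict):
    """Answer the three questions, given the dict associated with 'default'."""
    routes = default['routes']                     # the 'routes' dict of dicts

    route_count = len(routes)
    ibgp_ips = [ip for ip, info in routes.items()
                if info.get('routeType') == 'iBGP']
    other_keys = {key: value for key, value in default.items()
                  if key != 'routes'}
    return route_count, ibgp_ips, other_keys

# a tiny stand-in for the real file, to show the expected shapes
sample = {'routes': {'10.159.2.198/31': {'routeType': 'iBGP'},
                     '10.0.0.0/8':      {'routeType': 'static'}},
          'routingDisabled': False}

count, ips, other = summarize_routes(sample)
print("number of keys in the 'routes' dict:", count)
print("ips with a 'routeType' of 'iBGP':", ips)
for key, value in other.items():
    print(f'{key}:  {value}')
```

With the real file, the argument would come from json.load() applied to the opened file, subscripted down to the 'default' dict.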

 
2.7 (extra credit / supplementary) ETL Library. Write a module called etllib.py with the following four functions for performing four transforms: SQL->CSV, SQL->JSON, CSV->SQL, JSON->SQL.

This library can form the basis for a versatile set of ETL tools that make it easy to convert from one format to another. Note that this library does not assume what columns will appear in any CSV file, JSON file or database table. (The functions that insert into a database do assume that the CSV column heads and JSON file dict keys match the insert table, and that they are in the same order. You could extend your solution to be even more general by dropping this assumption if you wished.)

Please also keep in mind that none of these functions opens a database; they are passed a database 'connection' object (i.e., the value returned from sqlite3.connect()); this gives a function the flexibility to work with any database connection that is passed to it. Also, none of these functions writes to a CSV or JSON file. Instead they return CSV or JSON data as strings. This is done to give the user the flexibility to write the resulting data to a file, return it from a web request, etc.

def sql2csv(query, conn):
    """
    query a database and return string data in CSV format

    Arguments:  query   string SQL query to select data
                conn    database connection object

    Return Value:  a string with CSV-formatted data

    (note:  to return CSV data as a string, the csv writer
     must be given a StringIO object instead of an
     open file object.  See the io module for
     details)

    """


def sql2json(query, conn, format='lod', primary_key=None):
    """
    query a database and return a JSON string

    if format is 'lod', function will return a
    list of dicts

    (If format is 'lod' and primary_key is specified,
     function should raise ValueError with a suitable
     message.)

    if format is 'dod', function will return a dict
    of dicts with the designated primary_key as
    "outer" dict key

    (If format is 'dod' and primary_key is not specified,
     function should raise ValueError with a suitable
     message.)

    Arguments:  query   string SQL query to select data
                conn    database connection object

                format (optional):
                        'lod' (list of dicts, the default) or
                        'dod' (dict of dicts)

                primary_key  (optional):
                        column value to use as the key
                        for a dict of dicts

                (note that if format == 'dod' then 'primary_key'
                 must be an existing column name;
                 if format == 'lod' then 'primary_key'
                 must be None -- use 'is None' or 'is not None'
                 to test)

    Raises:  ValueError if format is 'dod' and primary_key is
             not specified, or format is 'lod' and primary_key
             is specified.

    Return Value:  string in JSON format

    (note:  to return a JSON string rather than
     writing to a file, use the json.dumps()
     method, which returns a string, instead of
     the json.dump() method)

    """


def csv2sql(filename, conn, table):
    """
    insert a csv file into a database table

    Arguments:  filename   CSV filename to read
                conn       database connection object
                table      table to insert to

    Return Value:  None (writes to database)
    """


def json2sql(filename, conn, table):
    """
    insert JSON data into a database

    Arguments:  filename   JSON file to read (assumes dict of dicts)
                           also assumes that "inner" dicts all have
                           identical keys
                conn       database connection object
                table      name of table to write to

    Return Value:  None (writes to database)
    """
 