Python 3home |
Use these sqlite3 commands to format your output readably.
At the start of your session, issue the following two commands -- these will format your sqlite3 output so it is clearer, and add columns headers.
sqlite> .mode column sqlite> .headers on
Now output is clearly lined up with column heads displayed:
sqlite> SELECT * FROM revenue; # company state price # ---------- ---------- ---------- # Haddad's PA 239.5 # Westfield NJ 53.9 # The Store NJ 211.5 # Hipster's NY 11.98 # Dothraki F NY 5.98 # Awful's PA 23.95 # The Clothi NY 115.2
However, note that each column is only 10 characters wide. It is possible to change these widths although not usually necessary.
The PK is defined in the CREATE TABLE definition.
A primary key is a unique value that can be used to identify a table row. These are usually id numbers, such as a product id, county id, domain ip, etc. Since it identifies a unique row in a table, the primary key in a database cannot be duplicated -- an error will occur if this is attempted.
Here's a table description in SQLite for a table that has a "instructor_id" primary key:
CREATE TABLE instructors ( instructor_id INT PRIMARY KEY, password TEXT, first_name TEXT, last_name TEXT );
Two tables may have info keyed to the same primary key -- these can be joined into one table.
Relational database designs attempt to separate data into individual tables, in order to avoid repetition. For example, consider one table that holds data for instructors at a school (in which one instructor appears per row) and another that holds records of a instructor's teaching a class (in which the same instructor may appear in multiple rows).
Here is a CREATE TABLE description for tables instructors and instructor_classes. instructors contains:
sqlite3> .schema instructors CREATE TABLE instructors ( instructor_id INT PRIMARY KEY, password TEXT, first_name TEXT, last_name TEXT ); sqlite3> .schema instructor_classes CREATE TABLE instructor_classes ( instructor_id INT, class_name TEXT, day TEXT );
Select all rows from both tables:
sqlite3> SELECT * from instructors instructor_id password first_name last_name ------------- ---------- ---------- ---------- 1 pass1 David Blaikie 2 pass2 Joe Wilson 3 xxyx Jenny Warner 4 yyyy Xavier Yellen sqlite> SELECT * from instructor_classes instructor_id class_name day ------------- ------------ ---------- 1 Intro Python Thursday 1 Advanced Pyt Monday 2 php Monday 2 js Tuesday 3 sql Wednesday 3 mongodb Thursday 99 Golang Saturday
Why is instructor_classes data separated from instructors data? If we combined all of this data into one table, there would be repetition -- we'd see the instructor's name repeated on all the rows that indicate the instructor's class assignments. So it makes sense to separate the data that has a "one-to-one" relationship of instructors to the data for each instructor (as in the instructors table) from the data that has a "many-to-one" relationship of the instructor to the data for each instructor (as in the instructor_classes table). But there are times where we will want to see all of this data shown together in a single result set -- we may see repetition, but we won't be storing repetition. We can create these combined result sets using database joins.
all rows from "left" table, and matching rows in right table
A left join includes primary keys from the "left" table (this means the table mentioned in the FROM statement) and will include only those rows in right table that share those same keys.
sqlite3> SELECT * FROM instructors LEFT JOIN instructor_classes on instructors.instructor_id = instructor_classes.instructor_id; instructor_id password first_name last_name instructor_id class_name day ------------- ---------- ---------- ---------- ------------- --------------- ---------- 1 pass1 David Blaikie 1 Advanced Python Monday 1 pass1 David Blaikie 1 Intro Python Thursday 2 pass2 Joe Wilson 2 js Tuesday 2 pass2 Joe Wilson 2 php Monday 3 xxyx Jenny Warner 3 mongodb Thursday 3 xxyx Jenny Warner 3 sql Wednesday 4 yyyy Xavier Yellen
Note the missing data on the right half of the last line. The right table instructor_classes had no data for instructor id 4.
all rows from the "right" table, and matching rows in the left table
A right join includes primary keys from the "right" table (this means the table mentioned in the JOIN clause) and will include only those rose in the left table that share the same keys as those in the right.
Unfortunately, SQLite does not support RIGHT JOIN (although many other databases do). The workaround is to use a LEFT JOIN and reverse the table names.
sqlite3> SELECT * FROM instructor_classes LEFT JOIN instructors ON instructors.instructor_id = instructor_classes.instructor_id; instructor_id class_name day instructor_id password first_name last_name ------------- ------------ ---------- ------------- ---------- ---------- ---------- 1 Intro Python Thursday 1 pass1 David Blaikie 1 Advanced Pyt Monday 1 pass1 David Blaikie 2 php Monday 2 pass2 Joe Wilson 2 js Tuesday 2 pass2 Joe Wilson 3 sql Wednesday 3 xxyx Jenny Warner 3 mongodb Thursday 3 xxyx Jenny Warner 99 Golang Saturday
Now only rows that appear in instructor_classes appear in this table, and data not found in instructors is missing (In this case, Golang has no instructor and it is given the default id 99).
Select only PKs common to both tables, or all PKs for all tables
INNER JOIN: rows common to both tables
An inner join includes only those rows that have primary key values that are common to both tables:
sqlite3> SELECT * from instructor_classes INNER JOIN instructors ON instructors.instructor_id = instructor_classes.instructor_id; instructor_id class_name day instructor_id password first_name last_name ------------- ------------ ---------- ------------- ---------- ---------- ---------- 1 Intro Python Thursday 1 pass1 David Blaikie 1 Advanced Pyt Monday 1 pass1 David Blaikie 2 php Monday 2 pass2 Joe Wilson 2 js Tuesday 2 pass2 Joe Wilson 3 sql Wednesday 3 xxyx Jenny Warner 3 mongodb Thursday 3 xxyx Jenny Warner
Rows are joined where both instructors and instructor_classes have data. OUTER JOIN: all rows from both tables
An outer join includes all rows from both tables, regardless of whether a PK id appears in the other table. Here's what the query would be if sqlite3 supported outer joins:
SELECT * from instructor_classes OUTER JOIN instructors ON instructors.instructor_id = instructor_classes.instructor_id;
unfortunately, OUTER JOIN is not currently supported in sqlite3. In these cases it's probably best to use another approach, i.e. built-in Python or pandas merge() (to come).
"Aggregation" means counting, summing or otherwise summarizing multiple values based on a common key.
Consider summing up a count of voters by their political affiliation (2m Democrats, 2m Republicans, .3m Independents), a sum of revenue of companies by their sector (manufacturing, services, etc.), or an average GPA by household income. All of these require taking into account the individual values of multiple rows and compiling some sort of summary value based on those values.
Here is a sample that we'll play with:
sqlite3> SELECT date, name, rev FROM companyrev; date name rev ---------- ----------- ---------- 2019-01-03 Alpha Corp. 10 2019-01-05 Alpha Corp. 20 2019-01-03 Beta Corp. 5 2019-01-07 Beta Corp. 7 2019-01-09 Beta Corp. 3
If we wish to sum up values by company, we can say it easily:
sqlite3> SELECT name, sum(rev) FROM companyrev GROUP BY name; name sum(rev) ----------- ---------- Alpha Corp. 30 Beta Corp. 15
If we wish to count the number entries for each company, we can say it just as easily:
sqlite3> SELECT name, count(name) FROM companyrev GROUP BY name; name count(name) ----------- ----------- Alpha Corp. 2 Beta Corp. 3
This is SQL's way of sorting results.
The ORDER BY clause indicates a single column, or multiple columns, by which we should order our results:
sqlite3> SELECT name, rev FROM companyrev ORDER BY rev; name rev ---------- ---------- Beta Corp. 3 Beta Corp. 5 Beta Corp. 7 Alpha Corp 10 Alpha Corp 20