Python 3

Introduction to Python

davidbpython.com

Optional: modules for accessing databases, CSV, SQL, JSON and the internet

Importing Python Modules

A module is Python code (a code library) that we can import and use in our own code -- to do specific types of tasks.

import csv           # make csv (a library module) part of our code

fh = open('thisfile.csv')
reader = csv.reader(fh)

for row in reader:
    print(row)

Once a module is imported, its Python code is made available to our code. We can then call specialized functions and use objects to accomplish specialized tasks. Python's module support is profound and extensive. Modules can do powerful things, like manipulate image or sound files, munge and process huge blocks of data, do statistical modeling and visualization (charts) and much, much more.

The Python 3 Standard Library documentation can be found at https://docs.python.org/3/library/index.html
Python 2 Standard Library: https://docs.python.org/2.7/library/index.html
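As a small illustration of importing, here is a sketch of three common import styles. The modules chosen (math, statistics, json) are only examples from the standard library, not ones this unit depends on.

```python
import math                      # import the whole module; use names as math.<name>
from statistics import mean     # import a single name directly into our code
import json as js                # import a module under a shorter alias

hypotenuse = math.hypot(3, 4)    # 5.0
average = mean([2, 4, 6])
encoded = js.dumps({'a': 1})     # '{"a": 1}'

print(hypotenuse, average, encoded)
```

Which style to use is mostly a matter of readability: the plain import keeps it obvious where each function comes from.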

CSV

The CSV module parses CSV files, splitting the lines for us. We loop through the reader object in the same way we would a file object, but each item is a list of field values rather than a line of text.

import csv
fh = open('students.txt', newline='')  # newline='' lets csv handle line endings itself
reader = csv.reader(fh)

next(reader)              # skip one row (useful for header lines)

for record in reader:     # loop through each row
    print(f'id:{record[0]};  fname:{record[1]}; lname: {record[2]}')

fh.close()

This module takes into account more advanced CSV formatting, such as quotation marks (which are used to allow commas within data). The newline='' argument to open() is recommended in Python 3, particularly when the csv file comes from Excel, which outputs newlines in the Windows format (\r\n); without it, line breaks embedded in quoted fields may not be read correctly. (Opening the file in binary mode with 'rb' was the Python 2 approach. In Python 3 the csv reader requires text, so a file opened with 'rb' raises an error.)
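To see why the quotation-mark handling matters, here is a minimal sketch that parses a small block of CSV text held in a string. The sample data is made up for illustration, and io.StringIO is used only so the example needs no file on disk. It also shows csv.DictReader, a variant of the reader that uses the header row as dictionary keys.

```python
import csv
import io

# Hypothetical CSV text; the last row has a comma inside a quoted field.
csv_text = (
    'id,fname,lname\n'
    '1,Ada,Lovelace\n'
    '2,Grace,"Hopper, Rear Admiral"\n'
)

# io.StringIO makes a string behave like an open text file.
fh = io.StringIO(csv_text, newline='')

reader = csv.DictReader(fh)      # uses the header row as dictionary keys
records = list(reader)

for record in records:
    print(f"id:{record['id']};  fname:{record['fname']}; lname: {record['lname']}")

# A plain str.split(',') breaks the quoted field into two pieces.
naive = csv_text.splitlines()[2].split(',')
print(len(naive), len(records[1]))     # 4 pieces vs. 3 fields
```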

Writing is similarly easy:

import csv
wfh = open('some.csv', 'w', newline='')
writer = csv.writer(wfh)
writer.writerow(['some', 'values', "boy, don't you like long field values?"])
writer.writerows([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']])
wfh.close()

Please be advised that you may not see writes in the file until you close it with wfh.close() or until the program ends execution, because output to a file is buffered. (newline='' is necessary when opening the write file to neutralize an issue in Windows regarding the '\r\n' line ending that Windows uses. While not needed on Mac or Linux, this added argument does no harm.)
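One way to avoid forgetting the close is to open the file in a with block, which closes the file automatically. The sketch below writes a few rows and reads them back. The temporary directory and the sample rows are assumptions made only so the example does not touch real files.

```python
import csv
import os
import tempfile

# Write to a throwaway directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), 'some.csv')

rows = [['a', 'b', 'c'], ['d', 'e, with comma', 'f']]

with open(path, 'w', newline='') as wfh:    # closed automatically at end of block
    writer = csv.writer(wfh)
    writer.writerows(rows)

with open(path, newline='') as fh:          # the writes are now visible on disk
    rows_read = list(csv.reader(fh))

print(rows_read == rows)
```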

sqlite3: local file-based relational database

An sqlite3 lightweight database instance is built into Python and accessible through SQL statements. It can act as a simple storage solution, or can be used to prototype database interactivity in your Python script and later be ported to a production database like MySQL, Postgres or Oracle.

Keep in mind that the interface to your relational database will be the same as or similar to the one presented here for the file-based one.

import sqlite3
conn = sqlite3.connect('example.db')  # a db connection object

c = conn.cursor()                     # a cursor object for issuing queries

Once a cursor object is established, SQL can be used to write to or read from the database:

c.execute('''CREATE TABLE stocks
             (date text, trans text, symbol text, qty real, price real)''')

Note that sqlite3 datatypes are nonstandard and don't reflect types found in databases such as MySQL:

INTEGER: all int types (TINYINT, BIGINT, INT, etc.)
REAL: FLOAT, DOUBLE, REAL, etc.
NUMERIC: DECIMAL, BOOLEAN, DATE, DATETIME, NUMERIC
TEXT: CHAR, VARCHAR, etc.
BLOB: BLOB (non-typed (binary) data, usually large)
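A quick way to see these types in action is to ask SQLite what it actually stored. This sketch uses an in-memory database (':memory:') so that no file is created, and SQLite's built-in typeof() function. Note that the whole number 100 placed in a REAL column comes back as a float.

```python
import sqlite3

conn = sqlite3.connect(':memory:')      # temporary in-memory database, no file created
c = conn.cursor()

c.execute('CREATE TABLE stocks (date text, trans text, symbol text, qty real, price real)')
c.execute("INSERT INTO stocks VALUES ('2006-01-05', 'BUY', 'RHAT', 100, 35.14)")

# typeof() is a built-in SQLite function that reports how a value is stored
c.execute('SELECT qty, typeof(qty), typeof(date) FROM stocks')
row = c.fetchone()
print(row)        # (100.0, 'real', 'text')

conn.close()
```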

Insert a row of data

c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")

Larger example that inserts many records at a time

purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
             ('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
             ('2006-04-06', 'SELL', 'IBM', 500, 53.00),
            ]
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)

Commit the changes -- this saves the inserts permanently to the database (until the commit, they are pending and can still be rolled back)

conn.commit()
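As an alternative to calling commit() by hand, a connection can be used in a with block: the changes are committed if the block finishes normally and rolled back if it raises an exception. The sketch below uses an in-memory database and a deliberately simulated failure to show both outcomes.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE stocks (date text, trans text, symbol text, qty real, price real)')

purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
             ('2006-04-05', 'BUY', 'MSFT', 1000, 72.00)]

with conn:     # commits if the block succeeds
    conn.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)

try:
    with conn:     # rolls back if the block raises an exception
        conn.execute("INSERT INTO stocks VALUES ('2006-04-06','SELL','IBM',500,53.00)")
        raise ValueError('simulated failure before commit')
except ValueError:
    pass

row_count = conn.execute('SELECT COUNT(*) FROM stocks').fetchone()[0]
print(row_count)     # 2: the third insert was rolled back

conn.close()
```

Note that the with block manages the transaction only; the connection still has to be closed separately.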

Retrieve single row of data

t = ('RHAT',)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)

tuple_row = c.fetchone()
print(tuple_row)               # ('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)

Retrieve multiple rows of data

for tuple_row in c.execute('SELECT * FROM stocks ORDER BY price'):
    print(tuple_row)

### ('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)
### ('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0)
### ('2006-04-06', 'SELL', 'IBM', 500.0, 53.0)
### ('2006-04-05', 'BUY', 'MSFT', 1000.0, 72.0)
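Two variations on retrieval that may be convenient: fetchall() returns all matching rows as a list, and setting the connection's row_factory to sqlite3.Row lets us read columns by name instead of by position. This sketch rebuilds the same sample table in memory and also shows a named placeholder (:symbol) as an alternative to the ? style.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row          # rows allow access by column name
c = conn.cursor()

c.execute('CREATE TABLE stocks (date text, trans text, symbol text, qty real, price real)')
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)',
              [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
               ('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
               ('2006-04-06', 'SELL', 'IBM', 500, 53.00)])

# named placeholder (:symbol) filled from a dictionary
c.execute('SELECT * FROM stocks WHERE symbol = :symbol ORDER BY price',
          {'symbol': 'IBM'})
rows = c.fetchall()                     # list of all matching rows

for row in rows:
    print(row['date'], row['trans'], row['price'])

prices = [row['price'] for row in rows]
```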

Close the database

conn.close()

Using the requests Module to Make an HTTP Browser Request

A Python program can take the place of a browser, requesting and downloading CSV, HTML pages and other files.

Your Python program can work like a web spider (for example visiting every page on a website looking for particular data or compiling data from the site), can visit a page repeatedly to see if it has changed, can visit a page once a day to compile information for that day, etc.

Basic Example: Download and Save Data

import requests

url = 'https://www.python.org/dev/peps/pep-0020/'   # the Zen of Python (PEP 20)

response = requests.get(url)     # a response object

text = response.text             # text of response


# writing the response to a local file -
# you can open this file in a browser to see it
wfh = open('pep_20.html', 'w')
wfh.write(text)
wfh.close()
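The idea of revisiting a page to see whether it has changed can be sketched without any network access by comparing a digest of the page text between visits. The function names fingerprint() and has_changed() are hypothetical helpers invented for this illustration, and the two strings stand in for the response.text of two successive downloads.

```python
import hashlib

def fingerprint(page_text):
    """Return a fixed-length digest of the page text."""
    return hashlib.sha256(page_text.encode('utf-8')).hexdigest()

def has_changed(previous_digest, page_text):
    """True if the page text no longer matches the stored digest."""
    return fingerprint(page_text) != previous_digest

# Stand-ins for the text of two successive downloads of the same page.
first_visit = '<html><body>price: 10</body></html>'
second_visit = '<html><body>price: 12</body></html>'

stored = fingerprint(first_visit)
print(has_changed(stored, first_visit))     # False
print(has_changed(stored, second_visit))    # True
```

Storing only the digest means the program does not need to keep a full copy of every page it watches.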

More Complex Example: Send Headers, Parameters, Body; Receive Status, Headers, Body

import requests

url = 'http://davidbpython.com/cgi-bin/http_reflect'   # my reflection program

div_bar = '=' * 10


# headers, parameters and message data to be passed to request
header_dict =  { 'Accept': 'text/plain' }          # change to 'text/html' for an HTML response
param_dict =   { 'key1': 'val1', 'key2': 'val2' }
data_dict =    { 'text1': "We're all out of gouda." }


# a GET request (change to .post for a POST request)
response = requests.get(url, headers=header_dict,
                             params=param_dict,
                             data = data_dict)


response_status = response.status_code   # integer status of the response (200 for OK, 404 for Not Found, etc.)

response_headers = response.headers      # headers sent by the server

response_text = response.text            # body sent by server


# outputting response elements (status, headers, body)

# response status
print(f'{div_bar} response status {div_bar}\n')
print(response_status)
print(); print()

# response headers
print(f'{div_bar} response headers {div_bar}\n')
for key in response_headers:
    print(f'{key}:  {response_headers[key]}\n')
print()

# response body
print(f'{div_bar} response body {div_bar}\n')
print(response_text)

Note that if import requests raises a ModuleNotFoundError exception, requests must be installed:

Mac: open the Terminal program and issue this command: pip3 install requests
Windows: open the Command Prompt program and issue the following command: pip install requests

If you have any problems with these commands, please let me know!

Using requests to read CSV and JSON Data

Specific techniques for reading the most common data formats.

CSV: feed string response to .splitlines(), then to csv.reader:

import requests
import csv

url = 'path to csv file'

response = requests.get(url)
text = response.text

lines = text.splitlines()
reader = csv.reader(lines)

for row in reader:
    print(row)
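The .splitlines() approach works well for simple files. One caveat, as far as I can tell, is that a quoted field containing a line break loses that line break, because splitlines() discards the line endings before the csv reader sees them. A possible alternative is to wrap the text in io.StringIO, which lets the reader treat it like an open file. The sample text below is a stand-in for response.text.

```python
import csv
import io

# Stand-in for response.text: one field contains a line break inside quotes.
text = 'id,note\n1,"first line\nsecond line"\n'

rows_from_lines = list(csv.reader(text.splitlines()))
rows_from_stringio = list(csv.reader(io.StringIO(text, newline='')))

print(rows_from_lines[1])       # the embedded line break is lost
print(rows_from_stringio[1])    # ['1', 'first line\nsecond line']
```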

JSON: requests has built-in support through the response's .json() method:

import requests

url = 'path to json file'

response = requests.get(url)

obj = response.json()

print(type(obj))          # <class 'dict'> (or <class 'list'>, depending on the JSON)
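To see what kind of object comes back, here is a sketch using the standard json module directly on a string, which is roughly what response.json() does with the response text. The JSON text is made up for illustration.

```python
import json

# Stand-in for response.text from a server that returns JSON.
text = '{"name": "gouda", "in_stock": false, "prices": [4.5, 5], "origin": null}'

obj = json.loads(text)      # roughly what response.json() does for us

print(type(obj))            # <class 'dict'>
print(obj['in_stock'])      # False
print(obj['prices'])        # [4.5, 5]
print(obj['origin'])        # None
```

JSON objects become dicts, arrays become lists, and true/false/null become True/False/None.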

Alternative to requests: the urllib module

If the requests module cannot be installed, urllib can be used instead, since it is part of the standard distribution.

urllib is a full-featured module for making web requests. Although the requests module is strongly favored by some for its simplicity, it is a third-party package and has not been added to the Python standard distribution.

The urlopen() function takes a URL and returns a file-like object that can be read with read(), as a file can:

import urllib.request
my_url = 'http://www.yahoo.com'
readobj = urllib.request.urlopen(my_url)  # return a 'file-like' object
text = readobj.read()                     # read into a 'byte string'
# text = text.decode('utf-8')             # optional, sometimes required:
                                          # decode as a 'str' (see below)
readobj.close()
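The same steps can also be written with a with block, which closes the response automatically. So that this sketch runs without a network connection, it uses a data: URL (which urllib.request can open and which carries its own content); an http or https URL would be handled the same way. Reading the character set from the response headers, with 'utf-8' as a fallback, is one reasonable approach rather than the only one.

```python
import urllib.request

# A data: URL carries its own content, so no network connection is needed here.
my_url = 'data:text/plain;charset=utf-8,caf%C3%A9%20au%20lait'

with urllib.request.urlopen(my_url) as readobj:    # closed automatically
    raw = readobj.read()                                       # bytes
    charset = readobj.headers.get_content_charset() or 'utf-8'

text = raw.decode(charset)                         # str
print(type(raw), type(text))
print(text)                                        # café au lait
```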

Alternatively, you can call readlines() on the object instead of read() (keep in mind that many objects that can deliver file-like string output can be read with this same-named method):

for line in readobj.readlines():
    print(line)
readobj.close()

Parsing CSV Files

Downloaded CSV files should be parsed with the CSV module, as CSV can be more complex than just comma separators.

The csv.reader() function is usually given a file object, but it accepts any iterable of strings, so we can also pass a list of lines to it:

readobj = urllib.request.urlopen(my_url)                # 'file-like' object
text = readobj.read()                                   # bytes, entire download
text = text.decode('utf-8')                             # str, entire download
lines = text.splitlines()                               # list of str (lines)

reader = csv.reader(lines)

for row in reader:
    print(row)

For discussion of potential issues with using urllib, please see the unit titled "Supplementary Modules: CSV, SQL, JSON and the Internet".

POTENTIAL ERRORS AND REMEDIES WITH urllib

TypeError mentioning 'bytes' -- sample exception messages:

TypeError: can't use a string pattern on a bytes-like object
TypeError: must be str, not bytes
TypeError: can't concat bytes to str

These errors indicate that you tried to use a byte string where a str is appropriate.

The read() method of the urlopen() response returns a bytes object (a "byte string") rather than a str. In order to work with the response as a string, we can use the decode() method to convert it into a str using a named encoding.

text = text.decode('utf-8')
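Here is a small sketch that reproduces one of these errors and then applies the remedy. The bytes value is a stand-in for what read() returns, and the exact wording of the exception message can vary between Python versions.

```python
import re

raw = b'price: 35.14'            # stand-in for the bytes returned by read()

try:
    re.search(r'\d+\.\d+', raw)  # str pattern applied to bytes
except TypeError as exc:
    error_message = str(exc)
    print('TypeError:', error_message)

text = raw.decode('utf-8')       # convert bytes to str
match = re.search(r'\d+\.\d+', text)
print(match.group())             # 35.14
```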

'utf-8' is the most common encoding, although others ('ascii', 'utf-16', 'utf-32' and more) may be required. I have found that we do not always need to convert (depending on what you will be doing with the returned string) which is why I commented out the line in the first example.

SSL Certificate Error

Many websites enable SSL security and require a web request to accept and validate an SSL certificate (certifying the identity of the server). urllib by default requires SSL certificate security, but it can be bypassed (keep in mind that this may be a security risk).

import ssl
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

my_url = 'https://www.nytimes.com'     # the SSL context only applies to https URLs
readobj = urllib.request.urlopen(my_url, context=ctx)
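One detail worth knowing if you use this bypass: the order of the two assignments matters. The sketch below (no network needed) shows the default settings and the error raised when verify_mode is changed before check_hostname is turned off.

```python
import ssl

default_ctx = ssl.create_default_context()
print(default_ctx.check_hostname)                     # True
print(default_ctx.verify_mode == ssl.CERT_REQUIRED)   # True

ctx = ssl.create_default_context()
try:
    ctx.verify_mode = ssl.CERT_NONE    # wrong order: hostname checking is still on
except ValueError as exc:
    order_error = str(exc)
    print('ValueError:', order_error)

ctx.check_hostname = False             # turn off hostname checking first...
ctx.verify_mode = ssl.CERT_NONE        # ...then certificate verification
```

Leaving the default context untouched remains the safer choice whenever the site's certificate validates normally.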

Encoding Parameters: urllib.parse.urlencode()

When including parameters in our requests, we must encode them into our request URL. The urlencode() function does this nicely:

import urllib.request, urllib.parse

params = urllib.parse.urlencode({'choice1': 'spam and eggs',
                                 'choice2': 'spam, spam, bacon and spam'})
print("encoded query string: ", params)

This prints (on one line):

encoded query string:  choice1=spam+and+eggs&choice2=spam%2C+spam%2C+bacon+and+spam
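The encoded string is then appended to the URL after a question mark. This sketch builds a complete URL and decodes it again with urllib.parse.parse_qs(). The address www.example.com/search is a hypothetical endpoint used only for illustration.

```python
import urllib.parse

base_url = 'http://www.example.com/search'      # hypothetical endpoint
params = urllib.parse.urlencode({'choice1': 'spam and eggs',
                                 'choice2': 'spam, spam, bacon and spam'})

full_url = base_url + '?' + params
print(full_url)

# Decoding works in the other direction: parse_qs returns a dict of lists.
query = urllib.parse.urlsplit(full_url).query
decoded = urllib.parse.parse_qs(query)
print(decoded['choice2'])        # ['spam, spam, bacon and spam']
```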
