Python 3

Introduction to Python

davidbpython.com

Useful Modules

Useful Modules: Introduction

This slide deck contains basic documentation on some of the most useful modules in the Python standard distribution. There are many more!

As you know, a module is Python code stored in a separate file or files that we can import into our code, to help us do specialized work. The Python documentation lists modules that come installed with Python (collectively, these modules are known as the "Standard Library"). Every module demonstrated below has many features and options. You can refer to documentation, or an article or blog post, to learn more about each.

Featured module: math

The math module handles advanced math calculations.

These include functions for calculating factorials, ceiling and floor, and logarithmic, geometric, and trigonometric values (sin, cos, tan, etc.).

A quick look at the module's attributes gives us an idea of what is included:

import math

print(dir(math))

   # ['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__',
   #  'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign',
   #  'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial',
   #  'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose',
   #  'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2',
   #  'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan',
   #  'tanh', 'tau', 'trunc']

For example, here are some simple geometry calculations using math:

import math

print(math.pi)                           # 3.141592653589793

radius = 3
circumference = 2 * math.pi * radius     # 18.84955592153876

area = math.pi * radius * radius         # 28.274333882308138
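The other functions mentioned above (ceiling, floor, factorial, logarithms, trigonometry) can be sketched in the same way. The input values below are arbitrary choices for illustration:

```python
import math

# rounding up and down to the nearest integer
up = math.ceil(4.2)                      # 5
down = math.floor(4.8)                   # 4

# factorial: 5 * 4 * 3 * 2 * 1
fact = math.factorial(5)                 # 120

# square root and base-10 logarithm
root = math.sqrt(16)                     # 4.0
log_val = math.log10(1000)               # 3.0

# trig functions expect radians, so convert from degrees first
sine = math.sin(math.radians(90))        # 1.0
```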

Featured module: statistics

This module provides basic statistical analysis.

Some of our earliest exercises calculated mean, median, and standard deviation. These operations are more easily performed through this module's functions.

import statistics as stats                       # set a convenient name for the module

values = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6, 6]

# average value
meanval = stats.mean(values)                     # 4.083333333333333

# "middle" value in a list of sorted values (list does not need to be sorted)
medianval = stats.median(values)                 # 4.0

# sample standard deviation: a measure of how far values typically fall from the mean
standev = stats.stdev(values)                    # 1.781640374554423

# square of the standard deviation
varianceval = stats.variance(values)             # 3.1742424242424243

# most common value
modeval = stats.mode(values)                     # 6
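To see what the module saves us, here is a rough sketch of the same mean and standard deviation computed by hand and compared with the module's results. The names manual_mean and manual_stdev are made up for this illustration:

```python
import statistics as stats

values = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6, 6]

# mean by hand: the sum divided by the count
manual_mean = sum(values) / len(values)

# sample standard deviation by hand: divide by n - 1, then take the square root
squared_diffs = [(v - manual_mean) ** 2 for v in values]
manual_stdev = (sum(squared_diffs) / (len(values) - 1)) ** 0.5

print(abs(manual_mean - stats.mean(values)) < 1e-9)       # True
print(abs(manual_stdev - stats.stdev(values)) < 1e-9)     # True
```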

Featured module: string

This module provides useful lists of characters.

import string

print(string.ascii_letters)       # abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

print(string.ascii_lowercase)     # abcdefghijklmnopqrstuvwxyz

print(string.ascii_uppercase)     # ABCDEFGHIJKLMNOPQRSTUVWXYZ

print(string.digits)              # 0123456789

print(string.hexdigits)           # 0123456789abcdefABCDEF

print(string.punctuation)         # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

print(string.whitespace)          # ' \t\n\r\x0b\x0c'   (shown as a repr; prints as invisible characters)
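One possible use of these character lists is checking what kinds of characters a string contains. A minimal sketch; the sample string and variable names are invented for illustration:

```python
import string

password = 'Secret#42'

# test each character for membership in the module's character sets
has_upper = any(ch in string.ascii_uppercase for ch in password)
has_digit = any(ch in string.digits for ch in password)
has_punct = any(ch in string.punctuation for ch in password)

print(has_upper, has_digit, has_punct)    # True True True
```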

Featured module: zipfile

The zipfile module builds, unpacks and inspects .zip archives.

import zipfile as zp

myzip = zp.ZipFile('myzip.zip', 'w')

# add names of files (of course these must exist)
myzip.write('file1.txt')
myzip.write('file2.pdf')
myzip.write('file3.doc')

myzip.close()                     # builds and writes zip file

print('done')

After running the above code and referencing real files, check this unit's files directory -- you should see a new .zip file added. You can also use zipfile to unpack and check the manifest (contents) of a zip file.
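Checking the manifest and unpacking might look something like the sketch below. It writes its own small file into a temporary directory so it does not rely on existing files; the directory and file names are assumptions made for this example:

```python
import os
import tempfile
import zipfile as zp

# work in a temporary directory so the example needs no existing files
workdir = tempfile.mkdtemp()
txt_path = os.path.join(workdir, 'file1.txt')
zip_path = os.path.join(workdir, 'myzip.zip')

with open(txt_path, 'w') as fh:
    fh.write('hello zip')

# 'with' closes the archive automatically when the block ends
with zp.ZipFile(zip_path, 'w') as myzip:
    myzip.write(txt_path, arcname='file1.txt')

# reopen the archive (default mode is 'r') to inspect and unpack it
with zp.ZipFile(zip_path) as myzip:
    names = myzip.namelist()              # the manifest, as a list of names
    myzip.extractall(os.path.join(workdir, 'unpacked'))

print(names)                              # ['file1.txt']
```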

Featured module: time

The time module handles time-related functions such as reading the current time, doing simple time calculations, and sleeping for a period of time.

time can be used to sleep (or pause execution) for a set number of seconds:

import time

# pause execution # of seconds
time.sleep(5)

We can also use time to show the current time:

# current time and date
print(time.ctime())                 # Sat May 23 17:10:55 2020

At a very basic level it's possible to manipulate time through arithmetic (though complex calculations of date and time are more easily handled with the datetime module).

# read current time in seconds since the epoch (January 1, 1970)
secs = time.time()                  # 1590257729.297496  (the fraction gives sub-second precision)

# subtract 24 hours, expressed in seconds (86,400)
yestersecs = secs - (60 * 60 * 24)

# show the current time minus 24 hours
print(time.ctime(yestersecs))          # Fri May 22 17:10:55 2020

# a "time struct"
print(time.localtime(yestersecs))
                                    # time.struct_time(tm_year=2020, tm_mon=5, tm_mday=22,
                                    # tm_hour=17, tm_min=10, tm_sec=55, tm_wday=4,
                                    # tm_yday=143, tm_isdst=1)

The "time struct" is a custom object whose attributes provide the year, month, day, hour, minute and second, as well as the day of the week, the day of the year, and whether daylight saving time is in effect.
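The individual fields of a time struct can be read as attributes. As a small sketch, this uses time.gmtime(0), the start of the Unix epoch in UTC, so that the values are the same on every run:

```python
import time

# gmtime(0) is the start of the Unix epoch in UTC: January 1, 1970
epoch = time.gmtime(0)

print(epoch.tm_year)      # 1970
print(epoch.tm_wday)      # 3  (Monday is 0, so 3 is Thursday)
print(epoch.tm_yday)      # 1  (first day of the year)

# strftime renders a time struct as a formatted string
stamp = time.strftime('%Y-%m-%d %H:%M:%S', epoch)
print(stamp)              # 1970-01-01 00:00:00
```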

Featured module: datetime

The datetime module handles the calculation of dates and times, reading dates from strings in any format, and writing dates to strings in any format.

import datetime as dt


# build a 'date' object from year, month, day
mydate1 = dt.date(2019, 9, 3)


# build a 'date' object representing today
mydate2 = dt.date.today()


# build a datetime object from year, month, day, hour, minute and second
mydatetime1 = dt.datetime(2019, 9, 3, 12, 5, 30)


# build a datetime object representing right now
mydatetime2 = dt.datetime.now()


# build a datetime object from a formatted string
mydatetime3 = dt.datetime.strptime('2019-03-03', '%Y-%m-%d')


# build a "timedelta" (time interval) object:  3 days, 2 hours
myinterval = dt.timedelta(days=3, seconds=7200)


# date objects and intervals can be calculated like math
newdate = mydatetime3 + myinterval

print(newdate)                                # 2019-03-06 02:00:00


# render a date object in a string format
print(newdate.strftime('%Y-%m-%d  (%H:%M)'))  # 2019-03-06  (02:00)
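Date arithmetic also works in the other direction: subtracting one date from another yields a timedelta. A brief sketch, with two dates chosen arbitrarily for illustration:

```python
import datetime as dt

start = dt.date(2019, 3, 3)
end = dt.date(2019, 9, 3)

# subtracting two dates gives a timedelta
gap = end - start
print(gap.days)                       # 184

# comparison operators work on dates as well
print(start < end)                    # True

# weekday(): Monday is 0, Sunday is 6
print(start.weekday())                # 6
```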

Featured module: random

The random module generates pseudorandom numbers.

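'Pseudorandom' means that the numbers only appear random: computers, being deterministic, are not capable of true randomness, so the module computes each sequence from a starting value (the "seed"). The sequences take an extremely long time to repeat, which is random enough for games, sampling and simulations.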

import random

# random float from 0.0 up to (but not including) 1.0
myfloat = random.random()        # 0.22845730036901912


# random integer from 1 to 10 (both 1 and 10 are possible results)
num = random.randint(1, 10)


# random choice from a list
x = ['a', 'b', 'c']
choice = random.choice(x)        # 'b'
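Because the numbers are computed rather than truly random, setting the same seed reproduces the same sequence. The sketch below illustrates this; the seed value 42 is an arbitrary choice:

```python
import random

# seeding with a fixed value makes the sequence repeatable
random.seed(42)
first_run = [random.randint(1, 10) for _ in range(5)]

# resetting the same seed restarts the same sequence
random.seed(42)
second_run = [random.randint(1, 10) for _ in range(5)]

print(first_run == second_run)        # True
```

This can be handy when testing code that uses random values, since each test run sees the same numbers.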

Featured module: csv

The csv module reads and writes CSV files.

import csv

# reading a CSV file
fh = open('dated_file.csv')
reader = csv.reader(fh)

for row in reader:
    print(row)

fh.close()

# writing to a CSV file
wfh = open('newfile.csv', 'w', newline='')
writer = csv.writer(wfh)

writer.writerow(['a', 'b', 'c'])
writer.writerow(['d', 'e', 'f'])
writer.writerow(['g', 'b', 'i'])

wfh.close()                 # essential - otherwise you may not see the writes until the program exits

(newline='' is needed when opening the file because the csv writer supplies its own '\r\n' line endings; without it, Windows translates each '\n' into '\r\n' a second time, which produces a blank line between rows. While not needed on Mac or Linux, this added argument does no harm.) As with all file writing, it's essential to close a write filehandle; otherwise, you may not see the write in the file until after the program exits.
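One way to make sure the filehandle is always closed is to open it in a 'with' block, which closes the file when the block ends. A minimal sketch, writing to a temporary location (the path and the sample rows are assumptions for this example):

```python
import csv
import os
import tempfile

# a temporary location so the example does not depend on existing files
path = os.path.join(tempfile.mkdtemp(), 'newfile.csv')

# the file is closed (and the writes flushed) when the 'with' block ends
with open(path, 'w', newline='') as wfh:
    writer = csv.writer(wfh)
    writer.writerow(['name', 'years'])
    writer.writerow(['Joe', 23])

with open(path, newline='') as fh:
    rows = list(csv.reader(fh))

print(rows)       # [['name', 'years'], ['Joe', '23']]
```

Note that the reader returns every value as a string, so 23 comes back as '23'.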

Featured module: sqlite3

The sqlite3 module allows file-based writing and reading of relational tables.

# connecting
import sqlite3

conn = sqlite3.connect('mydatabase.db')     # open an existing, or create a new file

cur = conn.cursor()


# creating a table
cur.execute("CREATE TABLE mytable (name TEXT, years INT, balance FLOAT)")


# insert rows into a table
rows = [
  [ 'Joe', 23, 23.9],
  [ 'Marie', 19, 7.95 ],
  [ 'Zoe', 29, 17.5 ]
]

for row in rows:
    cur.execute("INSERT INTO mytable VALUES (?, ?, ?)", row)

conn.commit()                                # essential to see the write


# selecting data from a table
cur = conn.cursor()

cur.execute('SELECT name, years, balance FROM mytable')

for row in cur:
    print(row)            # ('Joe', 23, 23.9)
                          # ('Marie', 19, 7.95)
                          # ('Zoe', 29, 17.5)
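The ? placeholders used for inserting can also be used when selecting. The sketch below filters rows with a WHERE clause; it connects to ':memory:' (a temporary in-RAM database) so that it can be run repeatedly without leaving a file behind:

```python
import sqlite3

# ':memory:' creates a temporary database that lives only in RAM
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

cur.execute("CREATE TABLE mytable (name TEXT, years INT, balance FLOAT)")

rows = [('Joe', 23, 23.9), ('Marie', 19, 7.95), ('Zoe', 29, 17.5)]

# executemany runs the same INSERT once for each row
cur.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
conn.commit()

# the ? placeholder keeps the value separate from the SQL text
cur.execute("SELECT name, balance FROM mytable WHERE years > ? ORDER BY name", (20,))
matches = cur.fetchall()

print(matches)        # [('Joe', 23.9), ('Zoe', 17.5)]
conn.close()
```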

Featured module: requests

requests (which must be installed separately) is generally preferred over urllib, which comes installed with the standard distribution of Python. requests simply provides a more convenient interface, i.e. more convenient commands to accomplish the same tasks.

import requests

# make URL request; download the response
response = requests.get('http://www.nytimes.com')

# the HTTP response code (200 OK, 404 not found, 500 error, etc.)
status_code = response.status_code

# the text of the response
page_text =   response.text

# response.text is already decoded to a str; encode it only if bytes are needed
page_bytes = page_text.encode('utf-8')

print(f'status code:  {status_code}')
print('======================= page text =======================')
print(page_text)

Featured module: urllib

If requests is not available on your system, urllib provides similar functionality.

import urllib.request

# make URL request; download the text of the response
read_object = urllib.request.urlopen('http://www.nytimes.com')

# a file-like object, can also 'for' loop or use .readlines()
text = read_object.read()

# decoding the text of the response (if necessary)
text = text.decode('utf-8')

SSL Certificate Error

Many websites enable SSL security and require a web request to accept and validate an SSL certificate (certifying the identity of the server). By default urllib validates SSL certificates, but this check can be bypassed (keep in mind that this may be a security risk).

import ssl
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

my_url = 'https://www.nytimes.com'       # certificate checks apply only to https URLs
read_object = urllib.request.urlopen(my_url, context=ctx)

Featured module: bs4 (Beautiful Soup)

The bs4 module can parse HTML to extract data from web pages.

This module must be installed separately.

import bs4

fh = open('dormouse.html')
text = fh.read()
fh.close()

soup = bs4.BeautifulSoup(text, 'html.parser')


# show all plain text in a page
print(soup.get_text())


# retrieve first tag with this name (a <title> tag)
tag = soup.title


# same, using .find()
tag = soup.find('title')


# find all <a> tags with specific attributes (<a href="mysite" id="link1">)
link1_a_tags = soup.find_all('a', {'id': 'link1'})


# find all <a> tags (hyperlinks)
tags = soup.find_all('a')

Featured module: re (regular expressions)

The re module can recognize patterns in text and extract portions of text based on patterns.

import re

line = 'a phone number:  213-298-1990'

matchobj = re.search(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)', line)

print(matchobj.group(0))   # 213-298-1990   (the entire match)
print(matchobj.group(1))   # 213            (the first parenthesized group)

Regular expressions are a declarative pattern language that is implemented by many programming languages (JavaScript, Java, Ruby, Perl, etc.). To fully understand and use them, you will need to complete a course or tutorial that covers them in detail.
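As a further taste of the module, the sketch below extracts every phone number from a line with re.findall, and shows that re.search returns None when nothing matches. The sample text is made up for illustration:

```python
import re

text = 'office: 213-298-1990, mobile: 646-555-0142'

# a raw string (r'...') keeps backslashes intact for the regex engine
pattern = r'(\d{3})-(\d{3})-(\d{4})'

# findall returns one tuple of groups per match
numbers = re.findall(pattern, text)
print(numbers)        # [('213', '298', '1990'), ('646', '555', '0142')]

# search returns None when nothing matches, so test before using the result
matchobj = re.search(pattern, 'no phone number here')
print(matchobj is None)               # True
```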

Featured module: subprocess

The subprocess module allows your program to launch other programs / applications.

import subprocess


# execute another program; its output goes to this script's STDOUT
subprocess.call(['ls', 'path/to/my/dir'])


# execute another Python script
subprocess.call(['python', 'hello.py'])


# execute another program and capture its output (returned as bytes)
out = subprocess.check_output(['python', 'hello.py'])
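In more recent versions of Python (3.7 and later), subprocess.run offers a single call that can capture output and report the exit status. A hedged sketch: instead of relying on a hello.py file, it asks the current interpreter (sys.executable) to run a one-line program passed with -c:

```python
import subprocess
import sys

# sys.executable is the path of the Python interpreter running this script
result = subprocess.run(
    [sys.executable, '-c', 'print("hello from a child process")'],
    capture_output=True,      # collect STDOUT and STDERR instead of displaying them
    text=True,                # decode the output to str rather than bytes
)

print(result.returncode)          # 0 indicates success
print(result.stdout.strip())      # hello from a child process
```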

Featured module: textwrap

The textwrap module allows you to wrap text at a certain width.

import textwrap

text = "This is some really long text that we would like to wrap.  Wouldn't you know it, there's a module for that!  "


# returns a list of lines
# text is limited to 10 characters width
items = textwrap.wrap(text, 10)


# join lines together into multi-line string with new width
print('\n'.join(items))
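The wrap-then-join steps above can also be done in one call with textwrap.fill, which returns a single multi-line string. A short sketch; the width of 20 is an arbitrary choice:

```python
import textwrap

text = "This is some really long text that we would like to wrap."

# fill() wraps and joins in one step, returning a single string
wrapped = textwrap.fill(text, width=20)
print(wrapped)

# every line fits within the requested width
lines = wrapped.split('\n')
print(all(len(line) <= 20 for line in lines))     # True
```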

pandas for table manipulation

The pandas module enables table manipulations similar to those done in Excel or relational databases.

The central object offered by pandas is the DataFrame, a 2-dimensional tabular structure similar to an Excel spreadsheet (columns and rows, with column and row labels). This module must be installed separately.

pandas can read from and write to a multitude of formats:

import pandas as pd
import sqlite3

# read from multiple formats to a DataFrame
df = pd.read_csv('dated_file.csv')
# df = pd.read_excel('dated_file.xls')
# df = pd.read_json('dated_file.json')

# write DataFrame to multiple formats
df.to_csv('new_file.csv')
# df.to_excel('new_file.xls')
# df.to_json('new_file.json')


# read from database through query
conn = sqlite3.connect('testdb.db')
df = pd.read_sql('SELECT * FROM test', conn)

pandas can perform 'where clause' style selections, sum or average columns, and perform GROUP BY database-style aggregations:

df = pd.read_csv('dated_file.csv')


# select rows through a filter
df2 = df[ df[3] > 18 ]      # all rows where the column labeled 3 is > 18


# sum, average, etc. a column
df.tax.mean()                         # average values in 'tax' column
df.revenue.sum()                      # sum values in 'revenue' column


# create a new column
df['col99'] = df.col1 + df.revenue   # new column sums 'col1' and 'revenue' field from each line


# groupby aggregation
dfgb = df.groupby('state').revenue.sum()    # show sum of revenue for each state

pandas is tightly integrated with matplotlib, a full-featured plotting library. The resulting images can be displayed in a Jupyter notebook, or saved as an image file.

# groupby bar chart
dfgb.plot.bar()

# weather temp line chart (assumes a DataFrame named weather_df with a 'temp' column)
weather_df.temp.plot.line()
