Python 3

Introduction to Python

davidbpython.com

Modules in Python

importing built-in modules

Python comes with hundreds of preinstalled modules.

import sys             # find and import the sys module
import json            # find and import the json module

print(sys.copyright)   # the .copyright attribute points
                       # to a string with the copyright notice

      # Copyright (c) 2001-2023 Python...


obj = json.loads('{"a": 1, "b": 2}')   # the .loads attribute points to
                                       # a function that reads str JSON data

print(type(obj))    # <class 'dict'>

modules are files that contain reusable Python code for use in our programs
modules are made available through an import statement
we often import several modules in a given script
following import we can access the module code through its attributes
each attribute points to a function, string, list, etc. that is a global variable in the module

other module import patterns

These patterns are purely for convenience when needed.

Abbreviating the name of a module:

import json as js           # 'json' is now referred to as 'js'

obj = js.loads('{"a": 1, "b": 2}')

Importing a module variable into our program directly:

from json import loads      # making the 'loads' function part of the global namespace

obj = loads('{"a": 1, "b": 2}')

Please note that this does not import only a part of the module: the entire module code is still imported.
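
One way to check this is to look in sys.modules, the dictionary where Python records every module it has loaded. The sketch below is only an illustration; the variable names are our own.

```python
import sys

from json import loads      # binds only the name 'loads' in our program

obj = loads('{"a": 1, "b": 2}')

# the whole json module was still loaded and recorded by Python
json_was_loaded = 'json' in sys.modules
print(json_was_loaded)      # True

# the module object is reachable there, and its loads is the same function
json_module = sys.modules['json']
same_function = json_module.loads is loads
print(same_function)        # True
```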

built-in module examples

Each module has a specific focus.

The sys module has functions that let us work with Python's interpreter and how it interacts with the operating system
The os module has functions that let us work with the operating system's files, folders and other processes
The datetime module has functions that let us easily calculate dates into the future or past, or compare two dates
The urllib module (urllib.request in Python 3; urllib2 was its Python 2 name) has functions that let us easily make HTTP requests over the internet

user-defined modules

A module of our own design may be saved as a .py file.

messages.py: a simple Python module that prints messages

import sys

def print_warning(msg):
    print(f'Warning!  {msg}')

test.py: a Python script that imports messages.py

import messages

# accessing the print_warning() function
messages.print_warning('Look out!')   # Warning!  Look out!

we can also build our own modules; one way is by creating a .py file with module code
upon import, the entire module is read, compiled and executed
global variables in the module then become attributes of the module
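
The points above can be tried out in a single script. This is a rough sketch, not the usual workflow: it writes a hypothetical module named greetings_demo.py into a temporary folder, tells Python to look there, and imports it. The module name and its contents are invented for the illustration.

```python
import importlib
import os
import sys
import tempfile

# source code for a hypothetical module, greetings_demo.py
module_source = (
    "print('greetings_demo is being executed')\n"
    "default_name = 'world'\n"
    "def greet(name=default_name):\n"
    "    return f'Hello, {name}!'\n"
)

# save the module in a temporary folder so the sketch cleans up after itself
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, 'greetings_demo.py'), 'w') as fh:
        fh.write(module_source)

    sys.path.insert(0, folder)          # tell Python where to look
    importlib.invalidate_caches()       # the file was created while Python was running
    try:
        # same effect as the statement:  import greetings_demo
        greetings_demo = importlib.import_module('greetings_demo')
    finally:
        sys.path.remove(folder)

# global variables in the module are now attributes of the module
print(greetings_demo.default_name)      # world
print(greetings_demo.greet('Marie'))    # Hello, Marie!
```

The message printed during the import shows that the module's code really is executed at that moment.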

module search path

Python must be told where to find our own custom modules.

To view the currently used module search paths, we can use sys.path

import sys

print(sys.path)        # shows a list of strings, each a directory
                       # where modules can be found

the sys.path list contains every directory that Python searches when importing a module, including the directories where your distribution of Python stores its own modules
to add our own folders (i.e., containing modules we'd like to import) we can augment this list with the PYTHONPATH environment variable (next)

setting the PYTHONPATH system environment variable

Like the PATH for programs, this variable tells Python where to find modules.

when we import a module, Python needs to find it
modules may be located in any of several directories (stored in sys.path)
to extend this list and add our own directories, we add them to the PYTHONPATH
when Python starts up to run a program, it looks for the PYTHONPATH variable and if found, adds the paths specified there to sys.path
for more specific instructions, please see your supplementary documents.
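
To see the effect without changing any system settings, we can set PYTHONPATH for a single child process only. The sketch below assumes that sys.executable points to a working Python interpreter; the temporary folder simply stands in for a folder of our own modules.

```python
import os
import subprocess
import sys
import tempfile

# a throwaway folder stands in for a folder of our own modules
with tempfile.TemporaryDirectory() as folder:
    env = dict(os.environ)
    env['PYTHONPATH'] = folder          # set only for the child process

    # start a second Python interpreter and ask it to print its sys.path
    result = subprocess.run(
        [sys.executable, '-c', 'import sys; print("\\n".join(sys.path))'],
        env=env, capture_output=True, text=True, check=True,
    )
    child_paths = [p for p in result.stdout.splitlines() if p]

    # compare while the folder still exists
    folder_found = any(
        os.path.isdir(p) and os.path.samefile(p, folder) for p in child_paths
    )

print(folder_found)     # expected to be True: the child added the folder to sys.path
```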

the python standard distribution of modules

Modules included with Python are installed when Python is installed -- they are always available.

Python provides hundreds of supplementary modules to perform myriad tasks. The modules do not need to be installed because they come bundled in the Python distribution, that is they are installed at the time that Python itself is installed. The documentation for the standard library is part of the official Python docs.

various string-related services
specialized containers (type-specific lists and dicts, named tuples, etc.)
math calculations and number generation
file and directory manipulation
persistence (saving data on disk)
data compression and archiving (e.g., creating zip files)
encryption
networking and interprocess (program-to-program) communication
internet tasks: web server, web client, email, file transfer, etc.
XML and HTML parsing
multimedia: audio and image file manipulation
GUI (graphical user interface) development
code testing
etc...

PyPI: the python package index

This index contains links to all modules ever added by anyone to the index.

Search for any module's home page at the PyPI website:

https://pypi.python.org/pypi

PyPI is the definitive index of modules written in Python
there are hundreds of thousands of projects uploaded there, from serious modules used by millions of developers to half-baked ideas that someone decided to share prematurely
usually, we will not search for modules here; we will hear about a module from an online article or through a search

finding third-party modules

Take some care when installing modules -- it is possible to install nefarious code.

We generally find third-party modules by doing web searches, or from colleagues
When we find a module that meets our needs, we should do some research to make sure it's the one we want and need
Although it's very rare, sometimes bad actors have created modules designed to spread malware
These modules are often given names that closely resemble the names of popular modules, so that a simple typo installs them by mistake
We must always be careful when selecting a module to install

- [demo: searching for powerpoint module, verifying ]

installing modules

Third-party modules must be downloaded and installed into your Python distribution.

Commands to use at the command line:

pip search pandas         # no longer works: PyPI has disabled pip's search,
                          # so search on the PyPI website instead
pip install pandas        # installs pandas

third-party modules are not part of the Python distribution but have been created for the public to use
thousands of these modules are free and available on demand
the pip utility is installed along with Python
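
Before installing, it can be handy to check from within Python whether a module is already available. A minimal sketch follows; is_installed() is a helper name made up for this example, not part of pip or Python.

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can find a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None

json_available = is_installed('json')                  # part of the standard distribution
fake_available = is_installed('no_such_module_xyz')    # a name we assume does not exist

print(json_available)    # True
print(fake_available)    # False
```

If a module is missing, running python -m pip install followed by the module name at the command line installs it into the same Python that python refers to.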

Featured module: math

The math module handles advanced math calculations.

The module includes functions for calculating factorials, ceiling and floor, and logarithmic, geometric, and trigonometric values (sin, cos, tan, etc.)

A quick look at the module's attributes gives us an idea of what is included:

import math

print(dir(math))

   # ['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__',
   #  'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign',
   #  'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial',
   #  'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose',
   #  'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2',
   #  'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan',
   #  'tanh', 'tau', 'trunc']

For example, here are some simple geometry calculations using math:

import math

print(math.pi)                           # 3.141592653589793

radius = 3
circumference = 2 * math.pi * radius     # 18.84955592153876

area = math.pi * radius * radius         # 28.274333882308138
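
A few of the other functions mentioned earlier can be tried the same way. The values below are chosen only for illustration.

```python
import math

fact = math.factorial(5)       # 5 * 4 * 3 * 2 * 1
up = math.ceil(4.2)            # round up to the nearest integer
down = math.floor(4.8)         # round down to the nearest integer
root = math.sqrt(16)           # square root (always a float)
hyp = math.hypot(3, 4)         # length of the hypotenuse of a 3-4 right triangle

print(fact)     # 120
print(up)       # 5
print(down)     # 4
print(root)     # 4.0
print(hyp)      # 5.0
```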

Featured module: statistics

This module provides basic statistical analysis.

Some of our earliest exercises calculated mean, median, and standard deviation. These operations are more easily performed through this module's functions.

import statistics as stats                       # set a convenient name for the module

values = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6, 6]

# average value
meanval = stats.mean(values)                     # 4.083333333333333

# "middle" value in a list of sorted values (list does not need to be sorted)
medianval = stats.median(values)                 # 4.0

# sample standard deviation: how far values typically fall from the mean
standev = stats.stdev(values)                    # 1.781640374554423

# square of the standard deviation
varianceval = stats.variance(values)             # 3.1742424242424243

# most common value
modeval = stats.mode(values)                     # 6

Featured module: string

This module provides useful lists of characters.

import string

print(string.ascii_letters)       # abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

print(string.ascii_lowercase)     # abcdefghijklmnopqrstuvwxyz

print(string.ascii_uppercase)     # ABCDEFGHIJKLMNOPQRSTUVWXYZ

print(string.digits)              # 0123456789

print(string.hexdigits)           # 0123456789abcdefABCDEF

print(string.punctuation)         # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

print(string.whitespace)          # ' \t\n\r\x0b\x0c'   (prints as invisible characters)

Featured module: zipfile

The zipfile module builds, unpacks and inspects .zip archives.

import zipfile as zp

myzip = zp.ZipFile('myzip.zip', 'w')

# add names of files (of course these must exist)
myzip.write('file1.txt')
myzip.write('file2.pdf')
myzip.write('file3.doc')

myzip.close()                     # builds and writes zip file

print('done')

After running the above code and referencing real files, check the session files directory -- you should see a new .zip file added. You can also use zipfile to unpack and check the manifest (contents) of a zip file.
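
Checking the manifest and unpacking might look like the sketch below. So that it can run anywhere, it first builds a small archive inside a temporary folder; the file names are placeholders.

```python
import os
import tempfile
import zipfile as zp

with tempfile.TemporaryDirectory() as folder:

    # create a small sample file and archive so there is something to inspect
    sample_path = os.path.join(folder, 'file1.txt')
    with open(sample_path, 'w') as fh:
        fh.write('hello zip')

    zip_path = os.path.join(folder, 'myzip.zip')
    with zp.ZipFile(zip_path, 'w') as myzip:
        myzip.write(sample_path, arcname='file1.txt')   # store without the folder path

    # check the manifest, then unpack everything into a new folder
    with zp.ZipFile(zip_path) as myzip:
        names = myzip.namelist()
        unpack_dir = os.path.join(folder, 'unpacked')
        myzip.extractall(unpack_dir)

    with open(os.path.join(unpack_dir, 'file1.txt')) as fh:
        unpacked_text = fh.read()

print(names)            # ['file1.txt']
print(unpacked_text)    # hello zip
```

Using with closes each archive automatically, so no separate call to .close() is needed.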

Featured module: time

The time module handles time-related functions such as telling the current time, calculating time and sleeping for a period of time.

time can be used to sleep (or pause execution) for a set number of seconds:

import time

# pause execution for a number of seconds
time.sleep(5)

We can also use time to show the current time:

# current time and date
print(time.ctime())                 # Sat May 23 17:10:55 2020

At a very basic level it's possible to manipulate time through arithmetic (though complex calculations of date and time are more easily handled with the datetime module).

# read current time in seconds
secs = time.time()                  # 1590257729.297496  (seconds since Jan 1, 1970, with a fractional part)

# subtract 24 hours, expressed in seconds (86,400 seconds)
yestersecs = secs - (60 * 60 * 24)

# show the current time minus 24 hours
print(time.ctime(yestersecs))          # Fri May 22 17:10:55 2020

# a "time struct"
print(time.localtime(yestersecs))
                                    # time.struct_time(tm_year=2020, tm_mon=5, tm_mday=22,
                                    # tm_hour=17, tm_min=10, tm_sec=55, tm_wday=4,
                                    # tm_yday=143, tm_isdst=1)

The "time struct" (time.struct_time) is a custom object whose attributes provide the date and time fields along with day of week, day of year and whether the time reflects daylight saving time.
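
The fields of a time struct are read as attributes. In this small sketch we use time.gmtime(0), the starting point from which time.time() counts seconds, so that the values are the same on every run.

```python
import time

epoch = time.gmtime(0)      # time struct for second 0, in UTC

year = epoch.tm_year        # 1970
weekday = epoch.tm_wday     # 3  (Monday is 0, so 3 is Thursday)
day_of_year = epoch.tm_yday # 1

# a time struct can also be rendered as a formatted string
stamp = time.strftime('%Y-%m-%d', epoch)

print(year, weekday, day_of_year)   # 1970 3 1
print(stamp)                        # 1970-01-01
```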

Featured module: datetime

The datetime module handles the calculation of dates and times, reading dates from strings in any format, and writing dates to strings in any format.

import datetime as dt


# build a 'date' object from year, month, day
mydate1 = dt.date(2019, 9, 3)


# build a 'date' object representing today
mydate2 = dt.date.today()


# build a datetime object from year, month, day, hour, minute and second
mydatetime1 = dt.datetime(2019, 9, 3, 12, 5, 30)


# build a datetime object representing right now
mydatetime2 = dt.datetime.now()


# build a datetime object from a formatted string
mydatetime3 = dt.datetime.strptime('2019-03-03', '%Y-%m-%d')


# build a "timedelta" (time interval) object:  3 days, 2 hours
myinterval = dt.timedelta(days=3, seconds=7200)


# date objects and intervals can be calculated like math
newdate = mydatetime3 + myinterval

print(newdate)                                # 2019-03-06 02:00:00


# render a date object in a string format
print(newdate.strftime('%Y-%m-%d  (%H:%M)'))  # 2019-03-06  (02:00)
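
Dates can also be compared and subtracted directly. A short sketch, with dates chosen only for illustration:

```python
import datetime as dt

start = dt.date(2019, 9, 3)
end = dt.date(2019, 12, 25)

# comparing two dates
is_earlier = start < end
print(is_earlier)            # True

# subtracting two dates produces a timedelta
gap = end - start
print(gap.days)              # 113

# moving one week into the past
week_before = start - dt.timedelta(weeks=1)
print(week_before)           # 2019-08-27
```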

Featured module: random

The random module generates pseudorandom numbers.

'Pseudorandom' means that computers, being deterministic, are not capable of true randomness. Instead, the module uses an algorithm to produce number sequences that appear random, although the same starting point (the "seed") always produces the same sequence.

import random

# random float from 0.0 up to (but not including) 1.0
myfloat = random.random()        # 0.22845730036901912


# random integer from 1 to 10 (both 1 and 10 are possible)
num = random.randint(1, 10)


# random choice from a list
x = ['a', 'b', 'c']
choice = random.choice(x)        # 'b'
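
The deterministic nature of the module can be seen by setting the starting point ourselves with random.seed(). In this sketch the seed value 42 is arbitrary.

```python
import random

random.seed(42)              # fix the starting point of the sequence
first_run = [random.randint(1, 10) for _ in range(5)]

random.seed(42)              # same starting point again
second_run = [random.randint(1, 10) for _ in range(5)]

# the two "random" sequences are identical
print(first_run == second_run)    # True
```

This is useful in testing, when we want a program that uses random values to behave the same way on every run.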

Featured module: csv

The csv module reads and writes CSV files.

import csv

# reading a CSV file
fh = open('dated_file.csv', newline='')
reader = csv.reader(fh)

for row in reader:
    print(row)

fh.close()

# writing to a CSV file
wfh = open('newfile.csv', 'w', newline='')
writer = csv.writer(wfh)

writer.writerow(['a', 'b', 'c'])
writer.writerow(['d', 'e', 'f'])
writer.writerow(['g', 'b', 'i'])

wfh.close()                 # required - otherwise you may not see the writes

(newline='' is necessary when opening the file because the csv module manages line endings itself; without it, Windows would turn each '\r\n' written by the module into '\r\r\n', leaving a blank line after every row. While not needed on Mac or Linux, this added argument does no harm.)

As with all file writing, it's essential to close a write filehandle; otherwise, you may not see the write in the file until after the program exits. (With Jupyter notebooks or the Python interactive interpreter, changes to an unclosed file may not appear until after the interpreter is closed.)
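
One way to make sure a file is always closed is to open it in a with block, which closes the file automatically when the block ends. The sketch below writes to a temporary folder so that it leaves nothing behind; the file name is a placeholder.

```python
import csv
import os
import tempfile

rows = [['a', 'b', 'c'], ['d', 'e', 'f']]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'newfile.csv')

    # the file is closed (and the writes saved) as soon as this block ends
    with open(path, 'w', newline='') as wfh:
        writer = csv.writer(wfh)
        writer.writerows(rows)          # write all rows at once

    # read the file back to confirm the rows were written
    with open(path, newline='') as fh:
        rows_read = list(csv.reader(fh))

print(rows_read)     # [['a', 'b', 'c'], ['d', 'e', 'f']]
```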

Featured module: sqlite3

The sqlite3 module allows file-based writing and reading of relational tables.

# connecting
import sqlite3

conn = sqlite3.connect('mydatabase.db')     # open an existing, or create a new file

cur = conn.cursor()


# creating a table
cur.execute("CREATE TABLE mytable (name TEXT, years INT, balance FLOAT)")


# insert rows into a table
rows = [
  [ 'Joe', 23, 23.9],
  [ 'Marie', 19, 7.95 ],
  [ 'Zoe', 29, 17.5 ]
]

for row in rows:
    cur.execute("INSERT INTO mytable VALUES (?, ?, ?)", row)

conn.commit()                                # essential to see the write


# selecting data from a table
cur = conn.cursor()

cur.execute('SELECT name, years, balance FROM mytable')

for row in cur:
    print(row)            # ('Joe', 23, 23.9)
                          # ('Marie', 19, 7.95)
                          # ('Zoe', 29, 17.5)
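
The same ? placeholders can be used in a SELECT query to filter rows. This sketch uses the special name ':memory:' to build a temporary database that exists only while the program runs, so no file is created.

```python
import sqlite3

conn = sqlite3.connect(':memory:')      # temporary in-memory database
cur = conn.cursor()

cur.execute("CREATE TABLE mytable (name TEXT, years INT, balance FLOAT)")

rows = [('Joe', 23, 23.9), ('Marie', 19, 7.95), ('Zoe', 29, 17.5)]
cur.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
conn.commit()

# the value for ? is supplied separately, as a tuple
cur.execute("SELECT name, balance FROM mytable WHERE years > ? ORDER BY name", (20,))
older = cur.fetchall()                  # all matching rows as a list of tuples

print(older)        # [('Joe', 23.9), ('Zoe', 17.5)]

conn.close()
```

Passing values through placeholders, rather than building the query string ourselves, lets sqlite3 handle quoting safely.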

Featured module: requests

The requests module makes HTTP requests over the internet.

requests (which must be installed separately) is generally preferred over urllib, which comes installed with the standard distribution of Python. requests simply provides a more convenient interface, i.e. more convenient commands to accomplish the same tasks.

import requests

# make URL request; download the response
response = requests.get('http://www.nytimes.com')

# the HTTP response code (200 OK, 404 not found, 500 error, etc.)
status_code = response.status_code

# the text of the response, already decoded to a str
page_text = response.text

# the raw, undecoded bytes of the response (if needed)
page_bytes = response.content

print(f'status code:  {status_code}')
print('======================= page text =======================')
print(page_text)

Featured module: urllib

If requests is not available on your system, urllib provides similar functionality.

import urllib.request

# make URL request; download the text of the response
read_object = urllib.request.urlopen('http://www.nytimes.com')

# a file-like object: .read() returns bytes; can also 'for' loop or use .readlines()
text = read_object.read()

# decoding the text of the response (if necessary)
text = text.decode('utf-8')

SSL Certificate Error

Many websites enable SSL security and require a web request to accept and validate an SSL certificate (certifying the identity of the server). urllib by default requires SSL certificate security, but it can be bypassed (keep in mind that this may be a security risk).

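import ssl
import urllib.request

ctx = ssl.create_default_context()
ctx.check_hostname = False             # must be turned off before verify_mode is relaxed
ctx.verify_mode = ssl.CERT_NONE        # do not validate the server's certificate

my_url = 'https://www.nytimes.com'     # the context only applies to https URLs
read_object = urllib.request.urlopen(my_url, context=ctx)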

Featured module: bs4 (Beautiful Soup)

The bs4 module can parse HTML to extract data from web pages.

This module must be installed separately.

import bs4

fh = open('dormouse.html')
text = fh.read()
fh.close()

soup = bs4.BeautifulSoup(text, 'html.parser')


# show all plain text in a page
print(soup.get_text())


# retrieve first tag with this name (a <title> tag)
tag = soup.title


# same, using .find()
tag = soup.find('title')


# find all <a> tags with specific attributes (e.g., <a href="mysite" id="link1">)
link1_a_tags = soup.find_all('a', {'id': 'link1'})


# find all <a> tags (hyperlinks)
tags = soup.find_all('a')

Featured module: re (regular expressions)

The re module can recognize patterns in text and extract portions of text based on patterns.

import re

line = 'a phone number:  213-298-1990'

matchobj = re.search(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)', line)

print(matchobj.group(0))   # 213-298-1990  (the entire match)
print(matchobj.group(1))   # 213           (the first parenthesized group)

The regular expression spec is a declarative language that is implemented by many programming languages (JavaScript, Java, Ruby, Perl, etc.). To fully understand and use regular expressions, you will need to complete a course or tutorial that covers them in detail.
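
As a small taste, the sketch below shows how parenthesized groups in a pattern can be read back from the match object, and what happens when the pattern is not found. The sample text is made up for the example.

```python
import re

line = 'a phone number:  213-298-1990'

# r'' (a "raw" string) keeps backslashes such as \d intact for the re module
matchobj = re.search(r'(\d{3})-(\d{3})-(\d{4})', line)

if matchobj:                        # re.search() returns None if nothing matches
    whole = matchobj.group(0)       # the entire matched text
    area = matchobj.group(1)        # text matched by the first ( ) group
    parts = matchobj.groups()       # all groups, as a tuple

    print(whole)    # 213-298-1990
    print(area)     # 213
    print(parts)    # ('213', '298', '1990')

# a line without a phone number produces no match object
no_match = re.search(r'(\d{3})-(\d{3})-(\d{4})', 'no number here')
print(no_match)     # None
```

Testing the result with if before calling .group() avoids an error when the pattern is not found.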

Featured module: textwrap

The textwrap module allows you to wrap text at a certain width.

import textwrap

text = "This is some really long text that we would like to wrap.  Wouldn't you know it, there's a module for that!  "


# returns a list of lines
# text is limited to 10 characters width
items = textwrap.wrap(text, 10)


# join lines together into multi-line string with new width
print('\n'.join(items))

pandas for table manipulation

The pandas module enables table manipulations similar to those done by Excel or relational databases.

The central object offered by pandas is the DataFrame, a 2-dimensional tabular structure similar to an Excel spreadsheet (columns and rows, with column and row labels). This module must be installed separately.

pandas can read from and write to a multitude of formats:

import pandas as pd
import sqlite3

# read from multiple formats to a DataFrame
df = pd.read_csv('dated_file.csv')
# df = pd.read_excel('dated_file.xls')
# df = pd.read_json('dated_file.json')

# write DataFrame to multiple formats
df.to_csv('new_file.csv')
# df.to_excel('new_file.xlsx')
# df.to_json('new_file.json')


# read from database through query
conn = sqlite3.connect('testdb.db')
df = pd.read_sql('SELECT * FROM test', conn)

pandas can perform 'where clause' style selections, sum or average columns, and perform GROUPBY database-style aggregations:

df = pd.read_csv('dated_file.csv')


# select rows through a filter
df2 = df[ df.iloc[:, 3] > 18 ]      # all rows where the value in the 4th column is > 18


# sum, average, etc. a column
df.tax.mean()                         # average values in 'tax' column
df.revenue.sum()                      # sum values in 'revenue' column


# create a new column
df['col99'] = df.col1 + df.revenue   # new column sums 'col1' and 'revenue' field from each line


# groupby aggregation
dfgb = df.groupby('state').revenue.sum()    # show sum of revenue for each state

pandas is tightly integrated with matplotlib, a full featured plotting library. The resulting images can be displayed in a Jupyter notebook, or saved as an image file.

# groupby bar chart
dfgb.plot.bar()

# weather temp line chart (weather_df: a DataFrame with a 'temp' column)
weather_df.temp.plot.line()
