Introduction to Python
davidbpython.com
Python comes with hundreds of preinstalled modules.
import sys # find and import the sys module
import json # find and import the json module
print(sys.copyright) # the .copyright attribute points
# to a string with the copyright notice
# Copyright (c) 2001-2023 Python...
obj = json.loads('{"a": 1, "b": 2}') # the .loads attribute points to
# a function that reads str JSON data
print(type(obj)) # <class 'dict'>
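As a small companion sketch, json.loads() has a counterpart, json.dumps(), that goes the other way: it writes a Python object out as a JSON string. The variable name text below is only an illustration.

```python
import json

# read JSON text into a Python dict
obj = json.loads('{"a": 1, "b": 2}')
print(obj['a'])          # 1

# write the dict back out as JSON text
text = json.dumps(obj)
print(text)              # {"a": 1, "b": 2}
print(type(text))        # <class 'str'>
```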
These patterns are purely for convenience when needed.
Abbreviating the name of a module:
import json as js # 'json' is now referred to as 'js'
obj = js.loads('{"a": 1, "b": 2}')
Importing a module variable into our program directly:
from json import loads # making the 'loads' function part of the global namespace
obj = loads('{"a": 1, "b": 2}')
Please note that this does not import only a part of the module: the entire module code is still imported.
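One way to check this for ourselves, as a minimal sketch: Python keeps a record of every loaded module in the sys.modules dictionary, so the whole json module should appear there even though we asked only for loads. The names json_was_loaded and has_dumps are just for illustration.

```python
import sys
from json import loads    # binds only the name 'loads' in our program

obj = loads('{"a": 1, "b": 2}')

# sys.modules is Python's record of every module loaded so far
json_was_loaded = 'json' in sys.modules
print(json_was_loaded)                    # True

# the whole module is there, including functions we did not ask for
has_dumps = hasattr(sys.modules['json'], 'dumps')
print(has_dumps)                          # True
```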
Each module has a specific focus.
A module of our own design may be saved as a .py file.
messages.py: a simple Python module that prints messages
import sys
def print_warning(msg):
    print(f'Warning! {msg}')
test.py: a Python script that imports messages.py
import messages
# accessing the print_warning() function
messages.print_warning('Look out!') # Warning! Look out!
Python must be told where to find our own custom modules.
To view the currently used module search paths, we can use sys.path
import sys
print(sys.path) # shows a list of strings, each a directory
# where modules can be found
Like the PATH for programs, this variable tells Python where to find modules.
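Because sys.path is an ordinary list, one possible way to tell Python about a new module location is to append a directory to it before importing. The sketch below writes a tiny throwaway module (the name greetings_demo and its greet() function are hypothetical) into a temporary directory and then imports it.

```python
import os
import sys
import tempfile

# create a throwaway directory holding a tiny module
# (greetings_demo is a made-up name used only for this example)
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'greetings_demo.py'), 'w') as fh:
    fh.write("def greet(name):\n    return f'Hello, {name}!'\n")

# add the directory to the module search path, then import as usual
sys.path.append(module_dir)
import greetings_demo

message = greetings_demo.greet('world')
print(message)                            # Hello, world!
```

Changes made to sys.path this way last only for the current run of the program.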
Modules included with Python are installed when Python is installed -- they are always available.
Python provides hundreds of supplementary modules to perform myriad tasks. The modules do not need to be installed separately because they come bundled in the Python distribution; that is, they are installed when Python itself is installed. The documentation for the standard library is part of the official Python docs.
The Python Package Index (PyPI) contains links to all modules ever added by anyone to the index.
Search for any module's home page at the PyPI website:
https://pypi.python.org/pypi
Take some care when installing modules -- it is possible to install nefarious code.
- [demo: searching for powerpoint module, verifying ]
Third-party modules must be downloaded and installed into your Python distribution.
Commands to use at the command line:
pip search pandas # searches for pandas in the PyPI repository (PyPI no longer supports this command; search the website instead)
pip install pandas # installs pandas
The math module handles advanced math calculations.
These calculations include functions for calculating factorials, ceiling and floor, and logarithmic, geometric, and trigonometric values (sine, cosine, tangent, etc.)
A quick look at the module's attributes gives us an idea of what is included:
import math
print(dir(math))
# ['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__',
# 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign',
# 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial',
# 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose',
# 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2',
# 'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan',
# 'tanh', 'tau', 'trunc']
For example, here are some simple geometry calculations using math:
import math
print(math.pi) # 3.141592653589793
radius = 3
circumference = 2 * math.pi * radius # 18.84955592153876
area = math.pi * radius * radius # 28.274333882308138
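A few more of the functions listed above, as a quick sketch (the variable names are only for illustration):

```python
import math

fact = math.factorial(5)      # 5 * 4 * 3 * 2 * 1
up = math.ceil(2.1)           # round up to the next integer
down = math.floor(2.9)        # round down to the previous integer
root = math.sqrt(16)          # square root (always a float)
hyp = math.hypot(3, 4)        # length of the hypotenuse of a 3-4 right triangle
power = math.log2(8)          # base-2 logarithm

print(fact)     # 120
print(up)       # 3
print(down)     # 2
print(root)     # 4.0
print(hyp)      # 5.0
print(power)    # 3.0
```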
The statistics module provides basic statistical analysis.
Some of our earliest exercises calculated mean, median, and standard deviation. These operations are more easily performed through this module's functions.
import statistics as stats # set a convenient name for the module
values = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6, 6]
# average value
meanval = stats.mean(values) # 4.083333333333333
# "middle" value in a list of sorted values (list does not need to be sorted)
medianval = stats.median(values) # 4.0
# sample standard deviation: roughly, how far values typically fall from the mean
standev = stats.stdev(values) # 1.781640374554423
# square of the standard deviation
varianceval = stats.variance(values) # 3.1742424242424243
# most common value
modeval = stats.mode(values) # 6
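To see what the module is doing for us, here is a rough sketch that computes the mean, sample variance and standard deviation by hand and compares the results to the module's. It assumes the sample formulas (dividing by n - 1), which is what stats.variance() and stats.stdev() use.

```python
import math
import statistics as stats

values = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6, 6]

# mean: sum of the values divided by how many there are
manual_mean = sum(values) / len(values)

# sample variance: squared distances from the mean, divided by (n - 1)
squared_diffs = [(v - manual_mean) ** 2 for v in values]
manual_variance = sum(squared_diffs) / (len(values) - 1)

# standard deviation: square root of the variance
manual_stdev = math.sqrt(manual_variance)

print(math.isclose(manual_mean, stats.mean(values)))            # True
print(math.isclose(manual_variance, stats.variance(values)))    # True
print(math.isclose(manual_stdev, stats.stdev(values)))          # True
```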
The string module provides useful strings of characters.
import string
print(string.ascii_letters) # abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
print(string.ascii_lowercase) # abcdefghijklmnopqrstuvwxyz
print(string.ascii_uppercase) # ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(string.digits) # 0123456789
print(string.hexdigits) # 0123456789abcdefABCDEF
print(string.punctuation) # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
print(string.whitespace) # ' \t\n\r\x0b\x0c' (prints as invisible characters)
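These strings are handy for testing or filtering characters. A minimal sketch, using a made-up sample string:

```python
import string

text = 'Hello, World! (2020)'

# keep only the characters that are ASCII letters
letters_only = ''.join(ch for ch in text if ch in string.ascii_letters)
print(letters_only)          # HelloWorld

# drop punctuation characters, leaving letters, digits and spaces
no_punct = ''.join(ch for ch in text if ch not in string.punctuation)
print(no_punct)              # Hello World 2020
```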
The zipfile module builds, unpacks and inspects .zip archives.
import zipfile as zp
myzip = zp.ZipFile('myzip.zip', 'w')
# add names of files (of course these must exist)
myzip.write('file1.txt')
myzip.write('file2.pdf')
myzip.write('file3.doc')
myzip.close() # builds and writes zip file
print('done')
After running the above code and referencing real files, check the session files directory -- you should see a new .zip file added. You can also use zipfile to unpack and check the manifest (contents) of a zip file.
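Reading the manifest and unpacking might look like the sketch below. So that it can run anywhere, it first creates its own sample file and archive in a temporary directory; the paths and the 'unpacked' folder name are assumptions for the example.

```python
import os
import tempfile
import zipfile as zp

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, 'file1.txt')
zip_path = os.path.join(workdir, 'myzip.zip')
unpack_dir = os.path.join(workdir, 'unpacked')

# create a sample file to archive
with open(source_path, 'w') as fh:
    fh.write('some sample text\n')

# build the archive; 'with' closes (and so writes) the zip file for us
with zp.ZipFile(zip_path, 'w') as myzip:
    myzip.write(source_path, arcname='file1.txt')

# check the manifest, then unpack everything into a new folder
with zp.ZipFile(zip_path) as myzip:
    names = myzip.namelist()
    myzip.extractall(unpack_dir)

print(names)                              # ['file1.txt']

with open(os.path.join(unpack_dir, 'file1.txt')) as fh:
    unpacked_text = fh.read()
print(unpacked_text == 'some sample text\n')    # True
```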
The time module handles time-related functions such as telling the current time, calculating with times, and sleeping for a period of time.
time can be used to sleep (or pause execution) for a set number of seconds:
import time
# pause execution for a number of seconds
time.sleep(5)
We can also use time to show the current time:
# current time and date
print(time.ctime()) # Sat May 23 17:10:55 2020
At a very basic level it's possible to manipulate time through arithmetic (though complex calculations of date and time are more easily handled with the datetime module).
# read current time in seconds
secs = time.time() # 1590257729.297496 (includes fractions of a second)
# calculate 24 hours, in seconds (subtract 86,400 seconds)
yestersecs = secs - (60 * 60 * 24)
# show the current time minus 24 hours
print(time.ctime(yestersecs)) # Fri May 22 17:10:55 2020
# a "time struct"
print(time.localtime(yestersecs))
# time.struct_time(tm_year=2020, tm_mon=5, tm_mday=22,
# tm_hour=17, tm_min=10, tm_sec=55, tm_wday=4,
# tm_yday=143, tm_isdst=1)
The "time struct" is a custom object that provides day of week, day of year and whether the time reflects daylight savings.
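The fields of a time struct can be read as attributes. To keep the output predictable, this sketch uses time.gmtime(0), which is second zero of the "epoch" (the start of 1970) expressed in UTC, rather than the current time.

```python
import time

# second zero of the epoch, as a time struct in UTC
epoch = time.gmtime(0)

print(epoch.tm_year)     # 1970
print(epoch.tm_mon)      # 1
print(epoch.tm_wday)     # 3  (Monday is 0, so 3 is Thursday)
print(epoch.tm_yday)     # 1  (first day of the year)

# a time struct can also be rendered as a formatted string
formatted = time.strftime('%Y-%m-%d', epoch)
print(formatted)         # 1970-01-01
```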
The datetime module handles the calculation of dates and times, reading dates from string in any format, and writing dates to string in any format.
import datetime as dt
# build a 'date' object from year, month, day
mydate1 = dt.date(2019, 9, 3)
# build a 'date' object representing today
mydate2 = dt.date.today()
# build a datetime object from year, month, day, hour, minute and second
mydatetime1 = dt.datetime(2019, 9, 3, 12, 5, 30)
# build a datetime object representing right now
mydatetime2 = dt.datetime.now()
# build a datetime object from a formatted string
mydatetime3 = dt.datetime.strptime('2019-03-03', '%Y-%m-%d')
# build a "timedelta" (time interval) object: 3 days, 2 hours
myinterval = dt.timedelta(days=3, seconds=7200)
# date objects and intervals can be calculated like math
newdate = mydatetime3 + myinterval
print(newdate) # 2019-03-06 02:00:00
# render a date object in a string format
print(newdate.strftime('%Y-%m-%d (%H:%M)')) # 2019-03-06 (02:00)
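Date arithmetic also works in the other direction: subtracting one date from another produces a timedelta. A short sketch, with dates chosen only for illustration:

```python
import datetime as dt

start = dt.date(2019, 3, 3)
end = dt.date(2019, 9, 3)

# subtracting two dates gives a timedelta (time interval) object
interval = end - start
print(interval.days)               # 184

# adding an interval to a date gives a new date
one_week_later = start + dt.timedelta(weeks=1)
print(one_week_later)              # 2019-03-10
```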
The random module generates pseudorandom numbers.
'Pseudorandom' means that computers, being deterministic, are not capable of true randomness. The module instead produces number sequences that appear random, although each sequence is fully determined by a starting value called the seed.
import random
# random float from 0 to 1
myfloat = random.random() # 0.22845730036901912
# random integer from 1 to 10
num = random.randint(1, 10)
# random choice from a list
x = ['a', 'b', 'c']
choice = random.choice(x) # 'b'
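The pseudorandom nature of the module can be seen by setting the starting point ourselves with random.seed(). In this sketch (the seed value 42 is arbitrary), the same seed produces the same "random" sequence twice.

```python
import random

# seed the generator, then draw five numbers
random.seed(42)
first_run = [random.randint(1, 10) for _ in range(5)]

# re-seed with the same value and draw again
random.seed(42)
second_run = [random.randint(1, 10) for _ in range(5)]

print(first_run == second_run)     # True
```

This is useful for testing, when we want "random" results that can be reproduced.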
The csv module reads and writes CSV files.
import csv
# reading a CSV file
fh = open('dated_file.csv')
reader = csv.reader(fh)
for row in reader:
    print(row)
fh.close()
# writing to a CSV file
wfh = open('newfile.csv', 'w', newline='')
writer = csv.writer(wfh)
writer.writerow(['a', 'b', 'c'])
writer.writerow(['d', 'e', 'f'])
writer.writerow(['g', 'b', 'i'])
wfh.close() # required - otherwise you may not see the writes
(newline='' is necessary when opening the file to neutralize an issue in Windows regarding the '\r\n' line ending that Windows uses. While not needed on Mac or Linux, this added argument does no harm.) As with all file writing, it's essential to close a write filehandle; otherwise, you may not see the write in the file until after the program exits. (With Jupyter notebooks or the Python interactive interpreter, the unclosed file will not see changes until after the interpreter is closed.)
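One way to make sure a filehandle is always closed is to open it in a 'with' block, which closes the file automatically when the block ends. The sketch below writes a small CSV file to a temporary directory (the path is an assumption for the example) and reads it back.

```python
import csv
import os
import tempfile

csv_path = os.path.join(tempfile.mkdtemp(), 'newfile.csv')

# the file is closed (and the writes flushed) when the block ends
with open(csv_path, 'w', newline='') as wfh:
    writer = csv.writer(wfh)
    writer.writerow(['a', 'b', 'c'])
    writer.writerow(['d', 'e', 'f'])

# read the rows back; each row is a list of strings
with open(csv_path, newline='') as fh:
    rows = list(csv.reader(fh))

print(rows)      # [['a', 'b', 'c'], ['d', 'e', 'f']]
```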
The sqlite3 module allows file-based writing and reading of relational tables.
# connecting
import sqlite3
conn = sqlite3.connect('mydatabase.db') # open an existing, or create a new file
cur = conn.cursor()
# creating a table
cur.execute("CREATE TABLE mytable (name TEXT, years INT, balance FLOAT)")
# insert rows into a table
rows = [
[ 'Joe', 23, 23.9],
[ 'Marie', 19, 7.95 ],
[ 'Zoe', 29, 17.5 ]
]
for row in rows:
    cur.execute("INSERT INTO mytable VALUES (?, ?, ?)", row)
conn.commit() # essential to see the write
# selecting data from a table
cur = conn.cursor()
cur.execute('SELECT name, years, balance FROM mytable')
for row in cur:
    print(row) # ('Joe', 23, 23.9)
               # ('Marie', 19, 7.95)
               # ('Zoe', 29, 17.5)
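The same ? placeholders used for inserting can also be used when selecting. This sketch assumes the same table layout as above, but uses the special name ':memory:' so that the database lives only in memory and no file is created.

```python
import sqlite3

# ':memory:' creates a temporary database that is discarded on close
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

cur.execute("CREATE TABLE mytable (name TEXT, years INT, balance FLOAT)")

rows = [('Joe', 23, 23.9), ('Marie', 19, 7.95), ('Zoe', 29, 17.5)]
cur.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
conn.commit()

# select only rows where years is greater than the supplied value
cur.execute("SELECT name, balance FROM mytable WHERE years > ? ORDER BY name", (20,))
matches = cur.fetchall()
print(matches)           # [('Joe', 23.9), ('Zoe', 17.5)]

conn.close()
```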
The requests module retrieves data from web servers.
requests (which must be installed separately) is generally preferred over urllib, which comes installed with the standard distribution of Python. requests simply provides a more convenient interface, i.e. more convenient commands to accomplish the same tasks.
import requests
# make URL request; download the response
response = requests.get('http://www.nytimes.com')
# the HTTP response code (200 OK, 404 not found, 500 error, etc.)
status_code = response.status_code
# the text of the response
page_text = response.text
# response.text is already decoded to a str; encode it to bytes only if bytes are needed
page_bytes = page_text.encode('utf-8')
print(f'status code: {status_code}')
print('======================= page text =======================')
print(page_text)
If requests is not available on your system, urllib provides similar functionality.
import urllib.request
# make URL request; download the text of the response
read_object = urllib.request.urlopen('http://www.nytimes.com')
# a file-like object, can also 'for' loop or use .readlines()
text = read_object.read()
# decoding the text of the response (if necessary)
text = text.decode('utf-8')
SSL Certificate Error
Many websites enable SSL security and require a web request to accept and validate an SSL certificate (certifying the identity of the server). urllib by default requires SSL certificate security, but it can be bypassed (keep in mind that this may be a security risk).
import ssl
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
my_url = 'http://www.nytimes.com'
read_object = urllib.request.urlopen(my_url, context=ctx)
The bs4 module can parse HTML to extract data from web pages.
This module must be installed separately.
import bs4
fh = open('dormouse.html')
text = fh.read()
fh.close()
soup = bs4.BeautifulSoup(text, 'html.parser')
# show all plain text in a page
print(soup.get_text())
# retrieve first tag with this name (a <title> tag)
tag = soup.title
# same, using .find()
tag = soup.find('title')
# find all <a> tags with specific attributes (<a href="mysite" id="link1">)
link1_a_tags = soup.find_all('a', {'id': 'link1'})
# find all <a> tags (hyperlinks)
tags = soup.find_all('a')
The re module can recognize patterns in text and extract portions of text based on patterns.
import re
line = 'a phone number: 213-298-1990'
matchobj = re.search(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)', line)
print(matchobj.group(0)) # 213-298-1990 (the entire match)
print(matchobj.group(1)) # 213 (the first parenthesized group)
The regular expression spec is a declarative language that is implemented by many programming languages (JavaScript, Java, Ruby, Perl, etc.). To fully understand and use regular expressions, you will need to complete a course or tutorial that covers them in detail.
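A few more common moves, as a sketch rather than a full treatment: .groups() returns all parenthesized groups at once, re.findall() returns every match in a string, and re.search() returns None when nothing matches. The sample text and phone numbers are made up.

```python
import re

line = 'call 213-298-1990 or 646-555-0123'
pattern = r'(\d{3})-(\d{3})-(\d{4})'     # {3} means "exactly three of these"

# search() finds the first match only
matchobj = re.search(pattern, line)
whole = matchobj.group(0)
parts = matchobj.groups()
print(whole)             # 213-298-1990
print(parts)             # ('213', '298', '1990')

# findall() returns every match in the string
all_numbers = re.findall(r'\d{3}-\d{3}-\d{4}', line)
print(all_numbers)       # ['213-298-1990', '646-555-0123']

# when there is no match, search() returns None
no_match = re.search(pattern, 'no digits here')
print(no_match)          # None
```

Since a failed search returns None, it is usually wise to test the match object before calling .group() on it.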
The textwrap module allows you to wrap text at a certain width.
import textwrap
text = "This is some really long text that we would like to wrap. Wouldn't you know it, there's a module for that! "
# returns a list of lines
# text is limited to 10 characters width
items = textwrap.wrap(text, 10)
# join lines together into multi-line string with new width
print('\n'.join(items))
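As a shortcut, textwrap.fill() does the wrapping and the joining in one step, returning a single multi-line string. A brief sketch, with a width of 20 chosen arbitrarily:

```python
import textwrap

text = "This is some really long text that we would like to wrap."

# wrap and join in one call; returns one string containing newlines
wrapped = textwrap.fill(text, width=20)
print(wrapped)

# confirm that no line is wider than the requested width
longest = max(len(line) for line in wrapped.split('\n'))
print(longest <= 20)     # True
```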
The pandas module enables table manipulations similar to those done by Excel and relational databases.
The central object offered by pandas is the DataFrame, a 2-dimensional tabular structure similar to an Excel spreadsheet (columns and rows, with column and row labels). This module must be installed separately.
pandas can read and write to and from a multitude of formats:
import pandas as pd
import sqlite3
# read from multiple formats to a DataFrame
df = pd.read_csv('dated_file.csv')
# df = pd.read_excel('dated_file.xls')
# df = pd.read_json('dated_file.json')
# write DataFrame to multiple formats
df.to_csv('new_file.csv')
# df.to_excel('new_file.xls')
# df.to_json('new_file.json')
# read from database through query
conn = sqlite3.connect('testdb.db')
df = pd.read_sql('SELECT * FROM test', conn)
pandas can perform 'where clause' style selections, sum or average columns, and perform GROUPBY database-style aggregations:
df = pd.read_csv('dated_file.csv')
# select rows thru a filter
df2 = df[ df.iloc[:, 3] > 18 ] # all rows where the value in the 4th column (position 3) is > 18
# sum, average, etc. a column
df.tax.mean() # average values in 'tax' column
df.revenue.sum() # sum values in 'revenue' column
# create a new column
df['col99'] = df.col1 + df.revenue # new column sums 'col1' and 'revenue' field from each line
# groupby aggregation
dfgb = df.groupby('state').revenue.sum() # show sum of revenue for each state
pandas is tightly integrated with matplotlib, a full featured plotting library. The resulting images can be displayed in a Jupyter notebook, or saved as an image file.
# groupby bar chart
dfgb.plot.bar()
# weather temp line chart
weather_df.temp.plot.line()