Python 3

Introduction; Installations and Setup

class goals

learn Python Fundamentals from the ground up
learn practical skills that model what Python programmers do every day
get to know our new partner, the Python Interpreter
think like a coder: learn specific strategies for debugging

about python

Python's popularity is due to its elegance and simplicity.

first released in 1991 by Guido van Rossum
one of the most popular languages in use today
designed to be readable, simple, and even beautiful
emphasis on explicitness, consistency and practicality
it is an elegant language -- usually only one way to do something

the zen of python

Do other languages have a manifesto like this one?

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

about me: David Blaikie

I am dedicated to student success.

software developer, release engineer, teacher
worked in the IT industry since 1998
worked at Google, AppNexus and DoubleClick (advertising tech)
taught at New York University School of Professional Studies since 2000
taught at NASA, U.S. Navy, Cisco, Intuit, Salesforce, and many others

about you: welcome!

Prior exposure to Python is helpful, but not required.

You do not have to know anything about Python or programming, but some personal qualities will be very helpful. These are "soft skills" that will benefit you greatly as you proceed:

being observant: considering carefully what you are seeing -- error messages and program output
curiosity: wondering why things work the way they do
patience: considering what you are seeing before trying your program again
thoughtfulness: realizing that you are discovering a system of logic that follows specific rules. Our most important work will be in discovering how Python responds in different situations
wisdom: seeing mistakes as a learning opportunity, and not something only to be avoided
taking ownership: an "owner" wants the language to work for themselves, not just to pass a test

three technical requirements to write and run programs

If you already have an editor and Python installed, you do not need to add these.

editor: PyCharm (or you may use another editor if you are familiar with it)
python: Python distribution from python.org
class files: unzip and open folder in PyCharm (see instructions)

Please keep in mind that if you are already able to write and run Python programs, you only need to add the class files.

the course materials

The zip file contains all files needed for our course exercises.

Please look for the file called python_data.zip in your course files.
Unzip the folder so that it has the following structure:

python_data/
├── session_00_test_project/
├── session_01_objects_types/
├──── inclass_exercises/
│       ├── inclass_1.1.py
│       ├── inclass_1.2.py
│       ├── ..etc..
│       ├── inclass_1.6_lab.py
│       ├── inclass_1.7_lab.py
│       ├── ..etc..
├──── notebooks_inclass_warmup/
├── session_02_funcs_condits_methods/
├──── inclass_exercises/
│       ├── inclass_2.1.py
│       ├── inclass_2.2.py
│       ├── ..etc..
├── session_03_strings_lists_files/
├── session_04_containers_lists_sets/
├── ..etc..
└── session_10_classes/

Place this folder in a location where you can find it
Use PyCharm to open the folder (a project)
Link the project to your version of Python

Your New Partner: the Python Interpreter

what do computers do?

Computers can do many different things for us.

Think about what our computers do for us:

perform numeric calculations
analyze, gather, compose, edit text
move files around, search files
download web pages
send email
process image or sound files
play videos and music
display images
operate storage devices to save files
operate hardware like printers, light switches, automobiles, drones, etc.

what do computers really do?

At base, computers really only do three things.

store data in memory and perform calculations
send messages over a network
operate devices

Python can do many things, but we will focus on the first item -- working with data. The main purpose of any programming language is to allow us to store data in memory and then process that data according to our needs.

programming languages

A programming language like Python is designed to allow us to give instructions to our computer.

your computer understands "machine language", which is a "lower level" programming language
however, machine code is challenging to write
"high level" programming languages were devised to make it easier to communicate with a machine
Python, Java, C, C++, JavaScript, PHP, Ruby, and C# are all "general purpose" languages
languages like SQL, HTML, CSS, etc. are "domain specific" languages, designed for a specific purpose

the Python Interpreter

The Interpreter is Python Itself.

the Interpreter is the program that processes our Python code
it is what we mean when we use action words: "Python runs the program"; "Python prints to the screen"; "Python raises an error" -- all of these refer to the Interpreter
when we run a Python program, the Interpreter reads our Python code and translates it into machine instructions
because it translates "Python" into "Machine", we call it an Interpreter
we should think of the Interpreter as our new coding partner. It will execute our code, and tell us when and how we have made an error
the true purpose of any study of Python is to understand the Interpreter

evaluate - compile - run

When we run a Python program, the Interpreter takes these three steps.

first, the Interpreter reads the Python code stored in a file
it validates the code, checking for errors in syntax
(a SyntaxError occurs when elements of our code are missing or misplaced, like a missing period at the end of a sentence.)
code syntax must be perfect for the code to run
after validating the code, the Interpreter converts it into bytecode
it then executes ("runs") the bytecode, statement by statement until complete
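
The validate, compile and run steps can be observed directly with Python's built-in compile() and exec() functions. This is only a minimal sketch for illustration: the source strings and the '<example>' label are assumptions made for the demo, and everyday programs rarely call these functions themselves.

```python
# Source code held in a string instead of a file (an assumption for this sketch)
good_source = "result = 2 + 3"

# Steps 1 and 2: validate the syntax and convert the code into a code object (bytecode)
code_obj = compile(good_source, '<example>', 'exec')

# Step 3: execute the bytecode; the names it creates are stored in this dict
namespace = {}
exec(code_obj, namespace)
print(namespace['result'])            # 5

# Code with broken syntax fails at the validation step, before anything runs
bad_source = "print('hello, world!)"
try:
    compile(bad_source, '<example>', 'exec')
    syntax_ok = True
except SyntaxError:
    syntax_ok = False

print(syntax_ok)                      # False
```

Note that the broken code never prints anything: the SyntaxError is detected before execution begins.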

what the interpreter can do

Python is very smart in some ways.

execute code very quickly, sometimes instantly
execute any valid instruction
allocate as much memory as we need for a task
tell us immediately when something goes wrong

what the interpreter can't do

Python is not smart in some ways, too!

understand what we're trying to do or what our program means
tell us when we're doing something inefficiently or incorrectly
explain every error to us or explain exactly what went wrong
tell us what we need to do to fix errors

how to respond to exceptions (errors)

We should seek to understand what the Interpreter is telling us.

most of us think of errors only as problems
when something goes wrong, our first instinct is to just fix the error any way we can and move on
however, exceptions are learning opportunities

This learning is not just about making programs work -- it's about understanding the interpreter -- what it can and can't do.

Executing Programs and Using the Lab Exercises

creating a new script (.py file) in PyCharm

A folder in PyCharm is known as a 'project'.

Open a Folder, which will correspond to a new workspace.

Choose File > Open Folder. A dialog opens to select a folder.
Navigate to your unzipped class folder and select, then click Open.
A new folder is shown at the left margin with your folder name as the workspace name

Add a new file.

Roll over the workspace name on the left, and click the 'new file' icon (the page icon with a 'plus' at the lower left)
Type the name of your file (with a .py extension) and hit [Enter]. The file appears on the left, and the text area for the file appears in the main window on the right, with the first line highlighted and numbered 1.

Create a 'hello, world!' script.

Type the following:

print('hello, world!')
print()

Take care when reproducing the above script - every character must be in its place. (The print() at the end is to clarify the Terminal output.) Next, we'll execute the script.

executing a script

PyCharm may be able to run your script, or some configuration may be required.

Attempt to run your script.

Double-click the .py file. The code from the file appears in a larger window on the right.
In the code window, right-click and choose Run [filename].
You may see output at the bottom, including the 'hello, world!' text.
If you see an error popup indicating that a version of Python must be installed: click 'Select Python Interpreter' at the far lower right, and choose your preferred version of Python. Your version must be numbered 3.8 or greater. Make sure not to select 2.7!
If you do not see your version of Python listed, please let me know.

when programs run without error

'Without error' means Python did everything you asked.

On my Mac, I see this output:

hello, world!

Process finished with exit code 0

When you see the terminal prompt repeated, it means that the script has completed executing

when exceptions occur

An 'exception' is when Python cannot, or will not, do everything you asked in your program.

To demonstrate an exception, I removed one character from my code. Here is the result:

   File "/Users/david/test_project/test.py", line 2
     print('hello, world!)
          ^
SyntaxError: unterminated string literal (detected at line 2)

How should we read our exception?

First, look at the CamelCase error type: SyntaxError. This tells us the category of error we are witnessing. This error type indicates that one or more characters are missing or in the wrong place.
Second, look at the line highlighted by Python: Line 2 (the first print() statement). This is where the error occurred.
Third, ask yourself why this problem was detected on this line.
I can't emphasize enough the importance of going straight to the line indicated by Python, and determining why this error occurred there

Throughout this course I will repeatedly stress that you must identify the exception type, pinpoint the error to the line, and seek to understand the error in terms of the exception type, and where Python says it occurred.

the SyntaxError exception

Some element of the code is misplaced or missing.

print('hello, world!)
print()

File "/Users/david/test_project/test.py", line 2
  print('hello, world!)
        ^
SyntaxError: unterminated string literal (detected at line 2)

How do we respond to a SyntaxError? First by understanding that there's something missing or out of place in the syntax (the proper placement of language elements -- brackets, braces, parentheses, quotes, etc.) We look at the syntax on the line, and compare it to similar examples in other code that we've seen. Careful comparison between our code and working code will usually show us what's missing or misplaced. In the example above, the first print() statement is missing a quotation mark. It might be hard to see at first, but eventually you will develop "eyes" for this kind of error.

writing code: comments and blank lines

Use hash marks to comment individual lines; blank lines are ignored.

# this program adds numbers
var1 = 5
var2 = 2

var3 = var1 + var2        # add these numbers together

# these lines modify the value further
# var3 = var3 * 2
# var3 = var3 / 20

print(var3)

When Python reads this code, it begins by ignoring blank lines
It also ignores any text to the right of a hash mark (#)
Text marked with '#' is called a 'comment'
We may want to comment in order to note something in the code (lines 1, 5 and 7), or to disable certain lines temporarily (lines 8 and 9)
It is a common practice to enable and disable some statements in our code while testing

Using the Lab Exercises

We will use some exercises for demos in class; you will use them to practice your skills and prepare for tests.

python_data/
├── session_00_test_project/
├── session_01_objects_types/
├──── inclass_exercises/
│       ├── inclass_1.1.py
│       ├── inclass_1.2.py
│       ├── ..etc..
│       ├── inclass_1.6_lab.py
│       ├── inclass_1.7_lab.py
│       ├── ..etc..
├──── notebooks_inclass_warmup/
├── session_02_funcs_condits_methods/
├──── inclass_exercises/
│       ├── inclass_2.1.py
│       ├── inclass_2.2.py
│       ├── ..etc..
├── session_03_strings_lists_files/
├── session_04_containers_lists_sets/
├── ..etc..
└── session_10_classes/

The exercises come in two forms:

the journey exercises (named inclass_2.1.py, inclass_2.2.py, etc.)
the lab exercises (named inclass_2.10_lab.py, inclass_2.11_lab.py, etc.)
we will work through many of the journey exercises as we discover and discuss new features of the language
you will have the opportunity to practice your skills using the lab exercises
the solutions to the exercises are on the website.

Creating and Identifying Objects by Type

the variable

A variable is a name assigned ("bound") to an object.

xx = 10               # assign 10 to xx
yy = 2

zz = xx * yy          # compute 10 * 2 and assign integer 20 to variable zz

print(zz)              # print 20 to screen

xx is a variable bound to 10
= is an assignment operator assigning 10 to xx
yy is another variable bound to 2
* is a multiplication operator computing its operands (10 and 2)
zz is bound to the product, 20
print() is a function that renders its argument to the screen

the literal: a value typed into our code

early on we need to distinguish between a variable and a literal.

xx = 10               # assign 10 to xx
yy = 2

zz = xx * yy          # compute 10 * 2 and assign integer 20 to variable zz

print(zz)             # print 20 to screen

can you name the 3 variables and 2 literals in this code?

variables: xx, yy, zz. These are names that have been assigned a value.
literals: 10, 2. These are values that have been typed directly into our code.

the object

An object is a data value of a particular type.

Every data value in Python is an object.

var_int = 100                  # assign integer object 100 to variable var_int

var2_float = 100.0             # assign float object 100.0 to variable var2_float

var3_str = 'hello!'            # assign str object 'hello!' to variable var3_str

# NOTE:  'hash mark' comments are ignored by Python.

At every point you must be aware of the type and value of every object in your code.

object types for this session

The three object types we'll look at in this unit are int, float and str. They are the "atoms" of Python's data model.

data type	known as	description	example value
int	integer	a whole number	5
float	float	a floating-point number	5.03
str	string	a character sequence, i.e. text	'hello, world!'

sidebar: string literal syntax

A string literal can be enquoted in 3 ways -- all produce a string.

s1 = 'hello, quote'
s2 = "hello, quote"

# multi-line strings can be expressed with triple-quotes
s3 = """hello, quote
Sincerely, Python"""


s4 = 'He said "yes!"'               # using single quotes to enquote double quotes
s5 = "Don't worry about that."      # using double quotes to enquote a single quote

double and single quotes are identical in purpose and meaning
this allows us to easily put a single quote in a string (use double) or double quote in a string (use single)
style-wise, we usually prefer single quotes, but the choice is yours
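
A quick check can confirm that the quote style does not change the value. This is a small sketch reusing the strings above; the == comparison it relies on is covered later in this session.

```python
s1 = 'hello, quote'
s2 = "hello, quote"

same = s1 == s2                # the quote marks are not part of the value
print(same)                    # True

s4 = 'He said "yes!"'          # the double quotes ARE part of this value
print(s4)                      # He said "yes!"
print(len(s4))                 # 14
```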

identifying type through syntax

The way a value is written in the code determines its type.

It's vital that we always be aware of type.

myint = 5             # written as a whole number:  int
myfloat = 5.0         # written with a decimal point:  float
mystr = '5.0'         # written with quotes:  str

Other languages (like Java and C) use explicit type declarations to indicate type, for example int a = 5. But Python does not do this.

can we identify type through printing?

Printing is usually not enough to determine type, since a string can look like any object.

myint = 5
myfloat = 5.0
mystr = '5.0'

print(myint)         # 5
print(myfloat)       # 5.0
print(mystr)         # 5.0

mystr looks like a float, but it is a str.

identifying type through the type() function

If we're not sure, we can always have Python tell us an object's type.

myint = 5
myfloat = 5.0
mystr = '5.0'

print(type(myint))         # <class 'int'>
print(type(myfloat))       # <class 'float'>
print(type(mystr))         # <class 'str'>

python is strongly typed

This means that what an object can do is defined by its type.

a = 5            # int, 5
b = 10.0         # float, 10.0
c = '10.0'       # str, '10.0'

x = a + b        # 15.0           (adding int to float)

y = a + c        # TypeError      (cannot add int to str!)

Even though the value '10.0' looks like a number, it is of type str. Python will not add an int to a str.
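
One way to resolve the TypeError is to convert one operand so that both have compatible types. The sketch below uses the float() and str() conversion functions, which are introduced later in this session; the variable names y_num and y_str are made up for the example.

```python
a = 5                  # int
c = '10.0'             # str

# y = a + c            # would raise TypeError: an int and a str cannot be added

y_num = a + float(c)   # convert the str to a float first, then add
y_str = str(a) + c     # or convert the int to a str, then concatenate

print(y_num)           # 15.0
print(y_str)           # 510.0
```

Which conversion to use depends on whether a number or text is wanted as the result.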

variable names

You must follow correct style even though Python does not always require it.

name = 'Joe'
age = 29

my_wordy_variable = 100

student3 = 'jg61'

a variable name must use lowercase letters and the underscore
it may include numbers, but not as the first character in the name
you must not use capital letters, although Python will accept them
within these rules, you may name your variables anything you'd like

Math and String Operators

+, -, *, /: math operators

Math operators behave as you might expect.

var_int = 5
var2_float = 10.3

var3_float = var_int + var2_float    # int plus a float:  15.3, a float

var4_float = var3_float - 0.3    # float minus a float:  15.0, a float

var5_float = var4_float / 3      # float divided by an int:  5.0, a float

identifying type through an operation

Every operation or function call results in a predictable type.

With two integers, the result is an integer. If a float is involved, the result is always a float.

vari = 7
vari2 = 3
varf = 3.0

var3 = vari * vari2    # 21, an int.

var4 = vari + varf     # 10.0, a float

When an integer is divided by another integer, the result is always a float.

var = 7
var2 = 3

var3 = var / var2      # 2.3333333333333335, a float

we usually don't worry too much about ints vs. floats, because they work well together
however, we do want to start thinking about type
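
The type() function can confirm these rules. A short sketch, with variable names chosen only for this example:

```python
int_product = 7 * 3          # int times int
mixed_sum = 7 + 3.0          # int plus float
int_quotient = 6 / 3         # int divided by int

print(int_product)           # 21
print(type(int_product))     # <class 'int'>

print(mixed_sum)             # 10.0
print(type(mixed_sum))       # <class 'float'>

print(int_quotient)          # 2.0
print(type(int_quotient))    # <class 'float'>
```

Note that 6 / 3 produces the float 2.0 even though the division comes out even.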

** exponentiation operator

The exponentiation operator (**) raises its left operand to the power of its right operand and returns the result as a float or int.

var = 11 ** 2     # "eleven raised to the 2nd power (squared)"
print(var)        # 121

var = 3 ** 4
print(var)        # 81

% modulus operator

The modulus operator (%) shows the remainder that would result from division of two numbers.

var = 11 % 2      # "eleven modulo two"
print(var)        # 1   (11/2 has a remainder of 1)


var2 = 10 % 2     # "ten modulo two"
print(var2)       # 0   (10/2 divides evenly:  remainder of 0)

modulus shows the remainder of a division
modulus with 2 is useful because it shows us whether a number is even or odd
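
Modulus works with any divisor, not just 2. A brief sketch with arbitrarily chosen numbers:

```python
print(7 % 2)          # 1   (7 is odd)
print(8 % 2)          # 0   (8 is even)

leftover = 17 % 5     # 5 goes into 17 three times (15), leaving 2
print(leftover)       # 2
```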

+ operator with strings: concatenation

The plus operator (+) with two strings returns a concatenated string.

aa = 'Hello, '
bb = 'World!'

cc = aa + bb     # 'Hello, World!'

Note that this is the same operator (+) that is used with numbers for summing. Python uses the type of the operands (values on either side of the operator) to determine behavior and result.

* operator with one string and one integer: string repetition

The "string repetition operator" (*) creates a new string with the operand string repeated the number of times indicated by the other operand:

aa = '!'
bb = 5

cc = aa * bb       # '!!!!!'

Note that this is the same operator (*) that is used with numbers for multiplication. Python uses the type of the operands to determine behavior and result.

python's "overloaded" operator +

Object types determine behavior.

int or float "added" to int or float: addition

tt = 5            # assign an integer value to tt
zz = 10.0         # assign a float value to zz

qq = tt + zz      # compute 5 plus 10 and assign float 15.0 to qq

str "added" to str: concatenation

kk = '5'          # assign a str value (quotes mean str) to kk
rr = '10.0'       # assign a str value to rr

mm = kk + rr      # concatenate '5' and '10.0'
                  # to construct a new str object, assign to mm

print(mm)         # 510.0 (a str)

the plus operator serves double duty depending on what types are used
we call this type of behavior 'polymorphism'

python's "overloaded" operator *

Again, object types determine behavior.

int or float "multiplied" by int or float: multiplication

tt = 5            # assign an integer value to tt
zz = 10           # assign an integer value to zz

qq = tt * zz      # compute 5 times 10 and assign integer 50 to qq
print(qq)         # 50, an int

str "multiplied" by int: string repetition

aa = '5'
bb = 3

cc = aa * bb      # '555'

once again, object types determine what is possible (and not possible) in the language
this is why it's so important to know the type of every variable
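
The two overloaded operators can be combined in one expression. In this sketch (the names stars and framed are invented for the example), Python chooses each behavior by looking at the operand types:

```python
stars = '*' * 3                          # str * int:  repetition
framed = stars + ' hello ' + stars       # str + str:  concatenation
count = 3 * 3                            # int * int:  multiplication

print(stars)                             # ***
print(framed)                            # *** hello ***
print(count)                             # 9
```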

Built-In Functions

built-in functions

Built-in functions activate functionality when they are called.

aa = 'hello'        # str, 'hello'

bb = len(aa)        # pass string object aa as an argument to function len(),
                    # which returns an integer object as a return value.

print(bb)            # int, 5

All functions are called: the parentheses after the function name indicate the call.
All functions take argument(s) and return return value(s).
The argument (or comma-separated list of arguments) is placed in parentheses.
The return value of the function call can be assigned to a new variable. (It can also be printed or used in an expression.)

len() function

The len() function takes a string argument and returns an integer -- the length of (number of characters in) the string.

varx = 'hello, world!'

vary = len(varx)        # 13

round() function

The round() function takes a float argument and returns the value rounded to the specified number of decimal places.

aa = 5.9583

bb = round(aa, 2)   # 5.96

cc = round(aa)      # 6

with two arguments, the 2nd argument determines the number of decimal places
with one argument, round() rounds to the nearest integer and returns an int

float precision and the round() function

Some floating-point operations will result in a number with a small remainder:

x = 0.1 + 0.2
print(x)          # 0.30000000000000004  (should be 0.3?)

y = 0.1 + 0.1 + 0.1 - 0.3
print(y)               # 5.551115123125783e-17  (should be 0.0?)

The solution is to round the result:

x = 0.1 + 0.2     # 0.30000000000000004

z = round(x, 1)
print(z)          # 0.3
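
The small remainder matters most when floats are compared. The sketch below shows the problem and two possible remedies: rounding before comparing, or the standard library's math.isclose() function, which is not otherwise covered in these slides.

```python
import math

x = 0.1 + 0.2

print(x == 0.3)                  # False: x is really 0.30000000000000004
print(round(x, 1) == 0.3)        # True:  round before comparing
print(math.isclose(x, 0.3))      # True:  compare with a small tolerance
```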

input() function

This function allows us to enter data into the program through the keyboard.

cc = input('enter name:  ')    # program pauses!  Now the user types something

print(cc)                      # [a string, whatever the user typed]

the input() function takes a string message as argument
it displays the string message, and pauses execution
the user (the person running the program) may then enter characters from the keyboard
after the user presses [Enter], input() returns a string containing the typed characters

exit() function: terminate the program

The exit() function terminates execution immediately. An optional string argument can be passed as an error message.

aa = input('to quit, press "q" ')
if aa == 'q':
    exit(0)                           # 0 indicates a successful termination (no error)

if aa == '':                          # if user typed nothing and hit [Return]
    exit('error:  input required')    # string argument passed to exit()
                                      # indicates an error led to termination

Note: the above examples make use of if, which we will cover in a later lesson.

exit() to manipulate execution during development

This function can be used as a temporary stop to the program if we'd like to isolate some statements.

We can also use exit() to simply stop program execution in order to debug:

aa = '55'
bb = float(aa)
print('type of bb is:')
print(type(bb))
exit()                  # we inserted this to stop the code
                        # from continuing; we'll remove it later

cc = bb * 2             # because of exit() above, this code
                        # will not be reached

int() "conversion" function

This function converts an appropriate value to the int type.

# str -> int
aa = '55'
bb = int(aa)         # 55 (an int)
print(type(bb))      # <class 'int'>

# float -> int
var = 5.95
var2 = int(var)      # 5:  the rest is lopped off (not rounded)

The conversion functions are named after their types -- they take an appropriate value as argument and return an object of that type.
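
What counts as an "appropriate value" deserves a closer look. As a sketch of one common surprise: int() will not accept a str that contains a decimal point, so such a string can be converted to float first. The variable names here are chosen only for this example.

```python
text = '5.95'

# int(text) would raise ValueError, because the str contains a decimal point

as_float = float(text)       # str -> float:  5.95
as_int = int(as_float)       # float -> int:  5 (lopped off, not rounded)

print(as_int)                # 5
print(int(-5.95))            # -5  (lopping off moves toward zero)
print(round(5.95))           # 6   (compare:  round() goes to the nearest integer)
```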

float() "conversion" function

This function converts an appropriate value to the float type.

# int -> float
xx = 5
yy = float(xx)      # 5.0

# str -> float
var = '5.95'
var2 = float(var)   # 5.95 (a float)

str() "conversion" function

This function converts any value to the str type.

var = 5
var2 = 5.5

svar = str(var)     # '5'
svar2 = str(var2)   # '5.5'

print(len(svar))    # 1
print(len(svar2))   # 3

conversion challenge: treating a string like a number

Because Python is strongly typed, conversions can be necessary.

Numeric data sometimes arrives as strings (e.g. from input() or a file). Use int() or float() to convert to numeric types.

aa = input('enter number and I will double it:  ')

print(type(aa))         # <class 'str'>

num_aa = int(aa)        # int() takes the user's input as an argument
                        # and returns an integer

print(num_aa * 2)       # prints the user's number doubled

You can use int() and float() to convert strings to numbers.

avoid improvising syntax!

It's important for early coders to follow existing syntax and not make up their own.

Imagine that you would like to find the length of a string. What do you do? Some students begin writing code off the top of their heads, even though they are not completely familiar with the right syntax.

they may write something like this...

var = 'hello'

mylen = var.len()      # or mylen = length('var')
                       # or mylen = lenth(var)

...and then run it, only to get a strange error that's difficult to diagnose.

using existing examples of a feature to write new code using it

When you want to use a Python feature, you must follow an existing example -- you must not improvise!

Let's say you have a string and you'd like to get its length:

s = "this is a string I'd like to measure"     # determine length (36)

You look up the function in a reference, like pythonreference.com:

mylen = len('hello')

Then you use the feature syntax very carefully:

slen = len(s)         # int, 36

However, your code will be slightly different from the example code:

the variable names you use will usually be different
you may use a variable in place of a literal, or a literal in place of a variable

review: distinguish between variables and literals

early on we need to distinguish between a variable and a literal.

xx = 10
yy = 2

zz = xx * yy

print(zz)

can you name the 3 variables and 2 literals in this code?
variables: xx, yy, zz. These are names that have been assigned a value.
literals: 10, 2. These are values that have been typed directly into our code.

taking care not to confuse a string literal and a variable name

Here's a common error that beginners make - try to avoid it!

Going back to our previous example - you'd like to use len() to measure this string:

s = "this is a string I'd like to measure"     # determine length (36)

You look up the function in a reference, like pythonreference.com:

mylen = len('hello')

You have been told to make your syntax match the example's. But should you do this?

slen = len('s')            # int, 1

You were expecting a length of 36, but you got a length of 1. Can you see why? The variable s points to a long string. The literal 's' is just a one-character string. In trying to match the example code, you may have thought you needed to keep the quotes. The takeaway is this: anyplace a literal is used, a variable can be used instead; and anyplace a variable is used, a literal can be used instead.

Conditionals and Blocks; Object Methods

conditionals: if/elif/else and while

All programs must make decisions during execution.

Consider these decisions by programs you know:

text editor: does the file to be read exist?
ATM: is the security PIN valid?
website: is the email address in the proper form?
game: did the player's score beat the high score?

Conditional statements allow any program to make the decisions it needs to do its work.

'if' statement

The if statement executes code in its block only if the test is True.

aa = input('please enter a positive integer: ')
int_aa = int(aa)

if int_aa < 0:                          # test:  is this a True statement?

    print('error:  input invalid')      # block (2 lines) -- lines are
    exit()                              # executed only if test is True

d_int_aa = int_aa * 2                   # double the value
print('your value doubled is ' + str(d_int_aa))

The two components of an if statement are the test and the block. The test determines whether the block will be executed.

'else' statement

An else statement will execute its block if the if test before it was not True.

xx = input('enter an even or odd number:  ')
yy = int(xx)

if yy % 2 == 0:                    # can 2 divide into yy evenly?
    print(xx + ' is even')
    print('congratulations.')

else:
    print(xx + ' is odd')
    print('you are odd too.')

Therefore we can say that only one block of an if/else statement will execute.

'elif' statement

elif is also used with if (and optionally else): you can chain additional conditions for other behavior.

zz = input('type an integer and I will tell you its sign:  ')
zyz = int(zz)

if zyz > 0:
    print('that number is positive')

elif zyz < 0:
    print('that number is negative')

else:
    print('0 is neutral')

if can be used alone, with elif, with else, or with both
else is not required when using if
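
More than one elif can be chained. Python tests each condition from the top and runs only the first block whose test is True. A minimal sketch, using a hard-coded score (a hypothetical value standing in for user input):

```python
score = 85                       # hypothetical value for this sketch

if score >= 90:
    grade = 'A'
elif score >= 80:                # first True test:  this block runs
    grade = 'B'
elif score >= 70:                # also True for 85, but never reached
    grade = 'C'
else:
    grade = 'F'

print(grade)                     # B
```

Because testing stops at the first True test, the order of the tests matters.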

the python code block

A code block is marked by indented lines. The end of the block is marked by a line that returns to the prior indent.

xx = input('enter an even or odd number:  ')  # not in any block
yy = int(xx)                                      # ditto


if yy % 2 == 0:                         # the start of the 'if' block
    print('your number is even')
    print('even is cool')               # last line of the 'if' block


else:                                   # the start of the 'else' block
    print('your number is odd')
    print('you are cool')               # last line of the 'else' block


print('thanks for playing "even/odd number"')      # not in any block

Note also that a block is preceded by an unindented line that ends in a colon.

nested blocks increase indent

Blocks can be nested within one another. A nested block (a "block within a block") simply moves the code block further to the right.

var_a = int(input('enter a number: '))
var_b = int(input('enter another number:  '))

if var_b >= var_a:                         # compare int values for truth
    print("the test was true")
    print("var b is at least as large")

    if var_a == var_b:                     # if the two values are equivalent
        print('the two values are equivalent')

    print("now we're in the outer block but not in the inner block")

print('this gets printed in any case (i.e., not part of either block)')

Complex decision trees using 'if' and 'else' are the basis for most programs.

comparison operators with numbers

>, <, <=, >= tests with numbers work as you might expect.

var = 5
var2 = 3.3

if var >= var2:
    print('var is greater or equal')

if var == var2:
    print('they are equivalent')

'==' with strings

With strings, this operator tests to see if two strings are identical.

var = 'hello'
var2 = 'hello'

if var == var2:
    print('these are equivalent strings')

The same 'equivalence' operator is used for numbers and strings.
Compare this to the 'polymorphic' behavior of '+' and '*'.

the 'in' operator with strings

'in' with strings allows you to see if a 'substring' appears within a string.

article = 'The market rallied, buoyed by a rise in Samsung Electronics.  The other...'

if 'Samsung' in article:
    print('Samsung was found')

'in' tests to see if one string can be found in another (a 'substring')
like the other comparison operators, this one returns a bool value

'and' "compound" test

Python uses the operator and to combine tests: both must be True.

The 'and' compound statement: if both tests are True, the entire statement is True.

xx = input('what is your ID?  ')
yy = input('what is your pin?  ')

if xx == 'dbb212' and yy == '3859':
    print('you are a validated user')
else:
    print('you are not validated')

Note the lack of parentheses around the tests -- if the syntax is unambiguous, Python will understand. We can use parentheses to clarify compound statements like these, but they often aren't necessary. You should avoid parentheses wherever you can.

'or' "compound" test

Python uses the operator or to combine tests: either can be True.

The 'or' compound statement: if either test is True, the entire statement is True.

aa = input('please enter "q" or "quit" to quit: ')
if aa == 'q' or aa == 'quit':
    exit()
print('continuing...')

testing a variable against two values

Both sides of an 'or' or 'and' must be complete tests.

if aa == 'q' or aa == 'quit':          # not "if aa == 'q' or 'quit'"
    exit()

Note the 'or' test above -- we would not say if aa == 'q' or 'quit'; this would always succeed (for reasons discussed later).
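The difference between the two forms can be checked directly. This is a minimal sketch that assumes a made-up input value of 'no'; bool() is used here only to show how an if statement would treat each result.

```python
aa = 'no'                                # made-up input: neither 'q' nor 'quit'

wrong = aa == 'q' or 'quit'              # right side is just the string 'quit'
right = aa == 'q' or aa == 'quit'        # both sides are complete tests

print(bool(wrong))                       # True  -- an 'if' would always succeed
print(right)                             # False -- the user did not ask to quit
```

The incomplete form does not raise an error, which is why this mistake is easy to miss.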

testing a variable against multiple values

We can also test a variable against multiple values by using in with a list (more on lists next week):

if aa in ['q', 'quit']:
    exit()

negating an 'if' test with 'not'

You can negate a test with the not keyword.

var_a = 5
var_b = 10

if not var_a > var_b:
    print("var_a is not larger than var_b (well - it isn't).")

Of course this particular test can also be expressed by replacing the comparison operator > with <=, but when we learn about new True/False condition types we'll see how this operator can come in handy.

boolean (bool) values True and False

True and False are boolean values (type bool), and are produced by expressions that can be seen as True or False.

aa = 3
bb = 5

if aa > bb:
    print("that is true")

Tests are actually expressions that resolve to True or False, which are values of boolean type:

var = 5
var2 = 10
xx = (5 > 3)
print(xx)            # True
print(type(xx))      # <class 'bool'>

Note that we would almost never assign comparisons like these to variables, but we are doing so here to illustrate that they resolve to boolean values.

The 'while' Statement and Looping

the concept of incrementing

We reassign the value of an integer to effect an incrementing.

x = 0         # int, 0

x = x + 1     # int, 1
x = x + 1     # int, 2
x = x + 1     # int, 3

print(x)      # 3

For each of the three incrementing statements above, a new value equal to the value of x plus 1 is created, and then assigned back to x. The previous value of x is replaced with the new, incremented value. Incrementing is most often used for counting within loops -- see next.

while loops

A while test causes Python to loop through a block repetitively, as long as the test is True.

This program prints each number from 0 through 4:

cc = 0                 # initialize a counter

while cc < 5:          # "if test is True, enter the block"
    print(cc)
    cc = cc + 1        # "increment" cc:  add 1 to its current value
                       # WHEN WE REACH THE END OF THE BLOCK,
                       # JUMP BACK TO THE while TEST

print('done')

The block is executing the print and cc = cc + 1 lines multiple times - again and again until the test becomes False. Of course, the value being tested must change as the loop progresses - otherwise the loop will cycle indefinitely (infinite loop).

understanding while loops

while loops have 3 components: the test, the block, and the automatic return.

cc = 10

while cc > 0:         # the test (if True, enter the block)

       print(cc)      # the block (execute as regular Python statements)
       cc = cc - 1

  # the automatic return [invisible!]
  # (at end of block, go back to the test)

print('done')

can you tell just from reading what this code prints?
to do so, you'll need to keep track of the value of cc and calculate its changes as the code block is executed repetitively
you must be able to "execute" this code in your head
this takes some practice but isn't complicated

loop control: "break"

break is used to exit a loop regardless of the test condition.

xx = 0
while xx < 10:
    answer = input("do you want loop to break? ")
    if answer == 'y':
        break             # drop down below the block
    print('Hello, User')
    xx = xx + 1
    print('I have now greeted you ' + str(xx) + ' times')

print("ok, I'm done")

loop control: "continue"

The continue statement jumps program flow to the next loop iteration.

x = 0
while x < 10:
    x = x + 1
    if x % 2 != 0:             # will be True if x is odd
        continue               # jump back up to the test and test again
    print(x)

Note that print(x) will not be executed if the continue statement comes first. Can you figure out what this program prints?

the "while True" loop

while with True and break provide us with a handy way to keep looping until we feel like stopping.

while True:
    var = input('please enter a positive integer:  ')
    if int(var) > 0:
        break
    else:
        print('sorry, try again')

print('thanks for the integer!')

Note the use of True in a while expression: since True is always True, this test will always be True and will cause program flow to enter (and re-enter) the block. Therefore the break statement is essential to keep this loop from looping indefinitely.

debugging loops: the "fog of code"

Use print() statements to give visibility to your code execution.

The output of the code should be the sum of all numbers from 0-10, or 55:

revcounter = 0
while revcounter < 10:

    varsum = 0
    revcounter = revcounter + 1
    varsum = varsum + revcounter

    print("loop iteration complete")
    print("revcounter value: ", revcounter)
    print("varsum value: ", varsum)
    input('pausing...')
    print()
    print()

print(varsum)                        # 10

I've added quite a few statements, but if you run this example you will be able to get a hint as to what is happening:

loop iteration complete
revcounter value:  1
varsum value:  1
pausing...                          # here I hit [Return] to continue


loop iteration complete
revcounter value:  2
varsum value:  2
pausing...                          # [Return]

So the solution is to initialize varsum before the loop and not inside of it:

revcounter = 0
varsum = 0
while revcounter < 10:

    revcounter = revcounter + 1
    varsum = varsum + revcounter

print(varsum)

This outcome makes more sense. We might want to check the total to be sure, but it looks right. The hardest part of learning how to code is in designing a solution. This is also the hardest part to teach! But the last thing you want to do in response is to guess repeatedly. Instead, please examine the outcome of your code through print statements, see what's happening in each step, then compare this to what you think should be happening. Eventually you'll start to see what you need to do. Step-by-baby-step!

Object Methods

object methods

Objects are capable of behaviors, which are expressed as methods.

Use object methods to process object values

var = 'Hello, World!'
var2 = var.replace('World', 'Mars')      # replace substring, return a str
print(var2)                              # Hello, Mars!

Methods are type-specific functions that are used only with a particular type.

methods vs. functions

Compare method syntax to function syntax.

mystr = 'HELLO'

x = len(mystr)          # int, 5

y = mystr.count('L')    # int, 2

print(y)                # 2

Methods and functions are both called (using the parentheses after the name of the function or method). Both also may take an argument and/or may return a return value.

string method: .upper()

This "transforming" method returns a new string with a string's value uppercased.

upper() string method

var = 'hello'
newvar = var.upper()

print(newvar)                   # 'HELLO'

string method: .lower()

This "transforming" method returns a new string with a string's value lowercased.

lower() string method

var = 'Hello There'
newvar = var.lower()

print(newvar)                   # 'hello there'

string method: .replace()

This "transforming" method returns a new string based on an old string, with specified text replaced.

var = 'My name is Joe'

newvar = var.replace('Joe', 'Greta')    # str, 'My name is Greta'

print(newvar)                            # My name is Greta

This method takes two arguments, the search string and replace string.

string method: .isdigit()

This "inspector" method returns True if a string is all digits.

mystring = '12345'
if mystring.isdigit():
    print("that string is all numeric characters")

if not mystring.isdigit():
    print("that string is not all numeric characters")

Since it returns True or False, inspector methods like isdigit() are used in an if or while expression. To test the reverse (i.e. "not all digits"), use if not before the method call.
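As a sketch of how this inspector behaves on different inputs, the strings below are made-up examples rather than course data. Note that a minus sign or a decimal point is not a digit, so isdigit() returns False for those strings.

```python
all_digits = '12345'.isdigit()      # True
has_letter = '12a45'.isdigit()      # False
negative = '-7'.isdigit()           # False: '-' is not a digit
decimal = '3.5'.isdigit()           # False: '.' is not a digit
empty = ''.isdigit()                # False: there are no characters at all

print(all_digits, has_letter, negative, decimal, empty)
```

This matters when validating input() values before calling int(): a string that passes isdigit() can be converted safely.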

string method: .endswith()

This "inspector" method returns True if a string ends with a substring.

bb = 'This is a sentence.'
if bb.endswith('.'):
    print("that line had a period at the end")

string method: .startswith()

This "inspector" method returns True if the string starts with a substring.

cc = input('yes? ')
if cc.startswith('y') or cc.startswith('Y'):
    print('thanks!')
else:
    print("ok, I guess not.")

string method: .count()

This "inspector" method returns a count of occurrences of a substring within a string.

aa = 'count the substring within this string'
bb = aa.count('in')
print(bb)             # 3 (the number of times 'in' appears in the string)

string method: .find()

This "inspector" method returns the character position of a substring within a string.

xx = 'find the name in this string'
yy = xx.find('name')
print(yy)             # 9 -- the 10th character in xx

f'' strings for string formatting

An f'' string allows us to embed any value such as numbers into a new, completed string.

aa = 'Jose'
var = 34

# 2 variables embedded in 2 {} tokens
bb = f'{aa} is {var} years old.'

print(bb)                                  # Jose is 34 years old.

f'' strings are the preferred way to combine strings with numbers
you should not convert and concatenate, i.e. aa + ' is ' + str(var)
it's also recommended not to use commas, i.e. print(aa, 'is', var) because there is less control
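To compare the two approaches side by side, here is a small sketch using the same values as above. The variable names by_fstring and by_concat are made up for this illustration.

```python
aa = 'Jose'
var = 34

by_fstring = f'{aa} is {var} years old.'               # value is converted for us
by_concat = aa + ' is ' + str(var) + ' years old.'     # we must call str() ourselves

print(by_fstring)                  # Jose is 34 years old.
print(by_fstring == by_concat)     # True -- same result, less typing
```

Leaving out str() in the concatenation version raises a TypeError, which is one reason the f'' string is easier to get right.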

f'' string format codes

Format codes placed after a colon inside the {} token control padding, justification and numeric display.

overview of formatting

# text padding and justification
:<15       # left justify within a width of 15
:>10       # right justify within a width of 10
:^8        # center within a width of 8

# numeric formatting
:f         # as float (6 places)
:.2f       # as float (2 places)
:,         # thousands separators (commas)
:,.2f      # thousands separators with float to 2 places

examples

x = 34563.999999

f'hi:  {x:<30}'      # 'hi:  34563.999999                  '

f'hi:  {x:>30}'      # 'hi:                    34563.999999'

f'hi:  {x:^30}'      # 'hi:           34563.999999         '

f'hi:  {x:f}'        # 'hi:  34563.999999'

f'hi:  {x:.2f}'      # 'hi:  34564.00'

f'hi:  {x:,}'        # 'hi:  34,563.999999'

f'hi:  {x:,.2f}'     # 'hi:  34,564.00'

Please note that f'' strings are available only as of Python 3.6.
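Width and numeric codes can be combined in one token, which is handy for lining up columns in a report. The names and amounts below are made up for this sketch.

```python
name1 = 'Haddad'
amount1 = 239.5
name2 = 'Westfield'
amount2 = 1053.25

# name left-justified in 12 characters, amount right-justified in 12
line1 = f'{name1:<12}{amount1:>12,.2f}'
line2 = f'{name2:<12}{amount2:>12,.2f}'

print(line1)       # Haddad            239.50
print(line2)       # Westfield       1,053.25
```

Both lines come out 24 characters wide, so the amounts line up on the right.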

sidebar: method and function return values in an expression; combining expressions

The return value of an expression can be used in another expression.

letters = "aabbcdefgafbdchabacc"

vara = letters.count("a")         # 5

varb = len(letters)               # 20

varc = vara / varb                # 5 / 20, or 0.25

vard = varc * 100                 # 25.0


print(letters.count("a") / len(letters) * 100)  # 25.0 (statements combined)

the first 4 statements calculate the percentage of a's in the string
the last statement does the same operations in one statement
combining statements is optional, but can be fun
shorter code is usually better, as long as it is clear

Data Parsing & Extraction: String Methods

our first data format: csv

The CSV format will allow us to explore Python's text parsing tools.

comma-separated values file (CSV)

    19260701,0.09,0.22,0.30,0.009
    19260702,0.44,0.35,0.08,0.009
    19270103,0.97,0.21,0.24,0.010

data is commonly organized in tabular form: columns and rows
examples: Excel spreadsheet, CSV file, relational database
CSV stands for "comma-separated values"
CSV is used throughout the world to post, transmit and store data
in this lesson we will 'parse' CSV data (i.e., divide into usable pieces)
in the process, we'll learn Python's tools for reading file data and parsing strings
much of the data we are called to work with comes to us as strings

CSV structure: "fields" and "records"

Tables consist of records (rows) and fields (column values).

Tabular text files are organized into rows and columns.

comma-separated values file (CSV)

    19260701,0.09,0.22,0.30,0.009
    19260702,0.44,0.35,0.08,0.009
    19270103,0.97,0.21,0.24,0.010
    19270104,0.30,0.15,0.73,0.010
    19280103,0.43,0.90,0.20,0.010
    19280104,0.14,0.47,0.01,0.010

space-separated values file

    19260701    0.09    0.22    0.30   0.009
    19260702    0.44    0.35    0.08   0.009
    19270103    0.97    0.21    0.24   0.010
    19270104    0.30    0.15    0.73   0.010
    19280103    0.43    0.90    0.20   0.010
    19280104    0.14    0.47    0.01   0.010

note the delimiters may be commas, colons, tabs, or any other character
in addition, the delimiter may be "spaces", in other words one or more space characters
the delimiter is necessary to maintain the structure, but also must be removed during parsing
our job will be to turn the CSV into "fields", i.e. separated data values on each line

table data in text files

Text files are just sequences of characters. Commas and newline characters separate the data.

If we print a CSV text file, we may see this:

    19260701,0.09,0.22,0.30,0.009
    19260702,0.44,0.35,0.08,0.009
    19270103,0.97,0.21,0.24,0.010
    19270104,0.30,0.15,0.73,0.010
    19280103,0.43,0.90,0.20,0.010
    19280104,0.14,0.47,0.01,0.010

However, here's what a text file really looks like under the hood:

19260701,0.09,0.22,0.30,0.009\n19260702,0.44,0.35,0.08,
0.009\n19270103,0.97,0.21,0.24,0.010\n19270104,0.30,0.15,
0.73,0.010\n19280103,0.43,0.90,0.20,0.010\n19280104,0.14,
0.47,0.01,0.010

the newline character separates the records in a CSV file
the delimiter (in this case, a comma) separates the fields
the newline character is a signal to your display program to drop down a line and continue display on the next line

tabular data: looping, parsing and summarizing

Looping through file line strings, we can split and isolate fields on each line.

The process:

   1) Open the file for reading.
   2) Use a for loop to read each line of the file, one at a time. Each line will be represented as a string.
   3) Remove the newline from the end of each string with .rstrip().
   4) Divide the string into fields using .split().
   5) Read a value from one of the fields, representing the data we want.
   6) As the loop progresses, build a sum of values from each line.

We will begin by reviewing each feature necessary to complete this work, and then we will begin to put it all together.
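As a preview, the strip, split, select and sum steps can be sketched on a list of strings that stands in for the lines of a file. The list is an assumption made so the sketch runs without a file on disk; the rows are the first three from the sample CSV above, and the second field is the one being summed.

```python
# stand-in for the lines of a file (a real program would use open() and loop over the file)
lines = ['19260701,0.09,0.22,0.30,0.009\n',
         '19260702,0.44,0.35,0.08,0.009\n',
         '19270103,0.97,0.21,0.24,0.010\n']

total = 0.0                       # summing variable, initialized before the loop

for line in lines:                # each line is a string
    line = line.rstrip()          # remove the newline
    fields = line.split(',')      # divide into a list of field strings
    value = float(fields[1])      # second field, converted from str to float
    total = total + value         # build the sum

print(round(total, 2))            # 1.5
```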

string method: .rstrip()

This method can remove any character from the right side of a string.

When no argument is passed, the newline character (or any "whitespace" character) is removed from the end of the line:

line_from_file = 'jw234,Joe,Wilson\n'

stripped = line_from_file.rstrip()      # str, 'jw234,Joe,Wilson'

When a string argument is passed, any trailing characters found in that string are removed from the end of the line:

line_from_file = 'I have something to say.'

stripped = line_from_file.rstrip('.')   # str, 'I have something to say'
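A few more cases, sketched with made-up strings, show that .rstrip() only touches the right side and that it keeps removing characters until it reaches one that is not in the argument.

```python
padded = '   jw234,Joe,Wilson  \n'
right_clean = padded.rstrip()          # '   jw234,Joe,Wilson' (leading spaces are kept)

shout = 'Stop that!!!'
calm = shout.rstrip('!')               # 'Stop that' (all three '!' are removed)

messy = 'the end.;,'
tidy = messy.rstrip(',;.')             # 'the end' (any of the 3 characters is removed)

print(right_clean)
print(calm)
print(tidy)
```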

string method: .split() with a delimiter

This method divides a delimited string into a list.

line_from_file = 'jw234:Joe:Wilson:Smithtown:NJ:2015585894\n'

xx = line_from_file.split(':')

print(xx)                         # ['jw234', 'Joe', 'Wilson',
                                  #  'Smithtown', 'NJ', '2015585894\n']

string method: .split() without a delimiter

We can also think of a string as delimited by spaces.

gg = 'this is a file    with    some     whitespace'

hh = gg.split()                   # splits on any "whitespace character"

print(hh)                         # ['this', 'is', 'a', 'file',
                                  #  'with', 'some', 'whitespace']

If no delimiter is supplied, the string is split on whitespace.
Also note that all whitespace is removed - any consecutive spaces are treated as one.
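Splitting with no argument is not the same as splitting on a single space. The sketch below uses a made-up string with runs of spaces to show the difference.

```python
gg = 'this  is   spaced'          # 2 spaces, then 3 spaces

no_arg = gg.split()               # runs of whitespace count as one delimiter
one_space = gg.split(' ')         # every single space is a delimiter

print(no_arg)                     # ['this', 'is', 'spaced']
print(one_space)                  # ['this', '', 'is', '', '', 'spaced']
```

The empty strings in the second result are the "fields" found between consecutive spaces, which is rarely what we want for space-separated data.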

Data Parsing & Extraction: List Operations and String Slicing

lists and list subscripting

Subscripting allows us to select individual elements of a list.

fields = ['jw234', 'Joe', 'Wilson', 'Smithtown', 'NJ', '2015585894']

var = fields[0]           # 'jw234'
var2 = fields[4]          # 'NJ'
var3 = fields[-1]         # '2015585894' (-1 means last index)

the list is a sequence of objects of any type (here they are strings)
subscripting means accessing an individual item within the list
square brackets specify an element index, starting at 0
the last index can also be specified using -1
indices count from 0 at the start, or -1 at the end

lists: slicing

Slicing allows us to select multiple items from a list.

letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
first_four = letters[0:4]
print(first_four)                     # ['a', 'b', 'c', 'd']

# no upper bound takes us to the end
print(letters[5:])                    # ['f', 'g', 'h']

Here are the rules for slicing:

   1) the first index is 0
   2) the lower bound is the 1st element to be included
   3) the upper bound is one higher than the last element to be included
   4) no upper bound means "to the end"
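The rules can be tried out on the same list. This sketch also uses two forms not listed above, a missing lower bound and a negative lower bound, which work by the same logic.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

middle = letters[2:5]          # index 2 up to (not including) index 5
first_three = letters[:3]      # no lower bound means "from the start"
last_two = letters[-2:]        # negative lower bound counts from the end

print(middle)                  # ['c', 'd', 'e']
print(first_three)             # ['a', 'b', 'c']
print(last_two)                # ['g', 'h']
```

A useful check: the number of items in a slice equals the upper bound minus the lower bound (5 - 2 = 3 items).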

strings: slicing

Slicing a string selects characters the way that slicing a list selects items.

mystr = '2014-03-13 15:33:00'
year =  mystr[0:4]               # '2014'
month = mystr[5:7]               # '03'
day =   mystr[8:10]              # '13'

Again, please review the rules for slicing:

   1) the first index is 0 (first character)
   2) the lower bound is the 1st character to be included
   3) the upper bound is one higher than the last character to be included
   4) no upper bound means "to the end"
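The same timestamp string can be sliced further to pull out the time fields. The variable names below are made up for this sketch.

```python
mystr = '2014-03-13 15:33:00'

date_part = mystr[:10]         # '2014-03-13'
hour = mystr[11:13]            # '15'
minute = mystr[14:16]          # '33'
second = mystr[17:]            # '00'

print(date_part, hour, minute, second)
```

This works because every field sits at a fixed position in the string; for data whose fields vary in width, .split() is the better tool.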

the IndexError exception

An IndexError exception indicates use of an index for a list element that doesn't exist.

mylist = ['a', 'b', 'c']

print(mylist[5])            # IndexError:  list index out of range

Since mylist does not contain a sixth item (i.e., at index 5), Python tells us it cannot complete this operation.
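One way to avoid the exception, sketched here with a made-up index value, is to compare the index against len() before subscripting. Note also that a slice, unlike a subscript, does not raise IndexError when its bounds run past the end.

```python
mylist = ['a', 'b', 'c']
index = 5                          # made-up index that is too large

if index < len(mylist):            # valid indices are 0 through len - 1
    item = mylist[index]
else:
    item = 'no such item'

print(item)                        # no such item

tail = mylist[1:10]                # slice bounds may run past the end
print(tail)                        # ['b', 'c']
```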

Data Parsing & Extraction: File Operations and the 'for' Loop

the 'for' loop with a list

'for' with a list repeats its block as many times as there are items in the list.

mylist = [1, 2, 'b']

for var in mylist:       # 1
    print(var)           # ===
    print('===')         # 2
                         # ===
print('done')            # b
                         # ===
                         # done

Similar to a while block, the for block repeats the contents of its block multiple times, but only as many times as there are items in the list. The control variable var is reassigned for each iteration of the loop. This means that if the list has 3 items, the loop executes 3 times and var is reassigned a new value 3 times.

review: the concept of incrementing

We reassign the value of an integer to effect an incrementing.

x = 0         # int, 0

x = x + 1     # int, 1
x = x + 1     # int, 2
x = x + 1     # int, 3

print(x)      # 3

using a 'for' loop to count list items

An integer, updated for each iteration, can be used to count iterations.

mylist = [1, 2, 'b']

my_counter = 0

for var in mylist:
    my_counter = my_counter + 1

print(f'count:  {my_counter} items')   # count:  3 items

The value of my_counter is initialized at 0 before the loop begins. Then, since the incrementing line my_counter = my_counter + 1 is inside the loop, the value of my_counter goes up once with each iteration. Please note that the len() function can count list items more efficiently, but we are using a counter to demonstrate the counter technique, which can be used in situations where len() can't be used, as when we count lines.

using a 'for' loop to sum list items

An integer, updated for each iteration, can be used to sum values.

mylist = [1, 2, 3]

my_sum = 0

for val in mylist:
    my_sum = my_sum + val

print(f'sum:  {my_sum}')     # sum: 6  (value of 1 + 2 + 3)

The value of my_sum is initialized at 0 before the loop begins. Then, since the summing line my_sum = my_sum + val is inside the loop, the value of my_sum goes up by val with each iteration. Please note that the sum() function can sum list items more efficiently, but we are using a summing variable to demonstrate the summing technique, which can be used in situations where sum() can't be used, as when we are summing values from a file.
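The counting and summing techniques can share one loop, and for a list the results can be checked against len() and sum(). This is a small sketch; the average variable is an addition made for illustration.

```python
mylist = [1, 2, 3]

my_counter = 0
my_sum = 0

for val in mylist:
    my_counter = my_counter + 1      # count this item
    my_sum = my_sum + val            # add this item's value

average = my_sum / my_counter

print(my_counter == len(mylist))     # True
print(my_sum == sum(mylist))         # True
print(average)                       # 2.0
```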

opening and reading a file with 'for'

'for' with a file repeats its block as many times as there are lines in the file.

fh = open('students.txt')              # file object allows
                                       # looping through a
                                       # series of strings

for xx in fh:                          # xx is a string, a line of the file
    print(xx)                           # prints each line of students.txt

fh.close()                             # close the file

"xx" is called a control variable, and it is automatically reassigned to each line in the file as a string. break and continue work with for as well as while loops. Again, the control variable xx is reassigned for each iteration of the loop. This means that if the file has 5 lines, the loop executes 5 times and xx is reassigned a new value 5 times.

reading a file with 'with'

A file is automatically closed upon exiting the 'with' block.

A 'best practice' is to open files using a 'with' block. When execution leaves the block, the file is automatically closed.

with open('pyku.txt') as fh:
    for line in fh:
        print(line)

## at this point (outside the with block), filehandle fh has been closed.

open 'read' files do not block other processes from accessing the file
however, we are sometimes concerned with good file "housekeeping"
it's always best to close a file as soon as you are done with it
when writing to a file, this need becomes more critical
with saves us from having to remember to close the file
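The automatic close can be observed through the file object's closed attribute. This sketch writes its own small file into a scratch folder (using the standard tempfile and os modules, which are not covered in this session) so that it does not depend on any existing file; the file name is made up.

```python
import os
import tempfile

folder = tempfile.mkdtemp()                      # scratch folder for the demo
path = os.path.join(folder, 'pyku_demo.txt')     # made-up file name

with open(path, 'w') as wfh:                     # 'with' also works for writing
    wfh.write('line one\n')
    wfh.write('line two\n')

line_count = 0
with open(path) as fh:
    for line in fh:
        print(line.rstrip())
        line_count = line_count + 1
    inside = fh.closed                           # False: still inside the block

print(inside)                                    # False
print(fh.closed)                                 # True: the block has ended
```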

summarizing: csv parsing with 'for' looping and string parsing

Here we put together all features learned in this session.

fh = open('revenue.csv')          # 'file' object

counter = 0
summer = 0.0

for line in fh:                   # str, "Haddad's,PA,239.50\n"

    line = line.rstrip()          # str, "Haddad's,PA,239.50"
    fieldlist = line.split(',')   # list, ["Haddad's", 'PA', '239.50']

    rev_val = fieldlist[2]        # str, '239.50'   (value from first line)
    f_rev = float(rev_val)        # float, 239.5

    counter = counter + 1
    summer = summer + f_rev

fh.close()

print(f'counter:  {counter}')     # 7 (number of lines in file)
print(f'summer:   {summer}')      # 662.01000001  (sum of all 3rd col values in file)

This example puts together everything we learned in this session. Each line is a string, which gets stripped, split into fields and then the last item in the line converted to float. We then use a summing variable to sum up the values found on each line.

If we wish now we can derive an average value by dividing summer by counter.

(Note that the tiny remainder is expected and can be rounded to 2 places.)
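A short sketch of the average and rounding steps follows. It assumes the counter and summer values shown in the output comments above rather than reading revenue.csv.

```python
counter = 7                      # values taken from the comments above
summer = 662.01000001

average = summer / counter

print(round(summer, 2))                 # 662.01
print(f'total:    {summer:.2f}')        # total:    662.01
print(f'average:  {average:.2f}')       # average:  94.57
```

round() produces a new float, while the :.2f format code only changes how the value is displayed.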

sidebar: writing and appending to files using the file object

Files can be opened for writing or appending; we use the file object and the file write() method.

fh = open('new_file.txt', 'w')
fh.write("here's a line of text\n")
fh.write('I add the newlines explicitly if I want to write to the file\n')
fh.close()

fh = open('new_file.txt')
lines = fh.readlines()
print(lines)
  # ["here's a line of text\n",
  #  'I add the newlines explicitly if I want to write to the file\n']

fh.close()

Note that we are explicitly adding newlines to the end of each line. The write() method doesn't do this for us.
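Appending uses the mode 'a' in place of 'w'. The sketch below writes into a scratch folder (using the standard tempfile and os modules, an assumption made so that no existing file is overwritten) and then adds a second line without erasing the first.

```python
import os
import tempfile

folder = tempfile.mkdtemp()                    # scratch folder for the demo
path = os.path.join(folder, 'new_file.txt')

fh = open(path, 'w')             # 'w' creates the file, or empties an existing one
fh.write('first line\n')
fh.close()

fh = open(path, 'a')             # 'a' keeps existing text and adds to the end
fh.write('second line\n')
fh.close()

fh = open(path)
lines = fh.readlines()
fh.close()

print(lines)                     # ['first line\n', 'second line\n']
```

Opening the same file a second time with 'w' would have discarded 'first line'.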

Optional: modules for accessing databases, CSV, SQL, JSON and the internet

Importing Python Modules

A module is Python code (a code library) that we can import and use in our own code -- to do specific types of tasks.

import csv           # make csv (a library module) part of our code

fh = open('thisfile.csv')
reader = csv.reader(fh)

for row in reader:
    print(row)

Once a module is imported, its Python code is made available to our code. We can then call specialized functions and use objects to accomplish specialized tasks. Python's module support is profound and extensive. Modules can do powerful things, like manipulate image or sound files, munge and process huge blocks of data, do statistical modeling and visualization (charts) and much, much, much more.

The Python 3 Standard Library documentation can be found at https://docs.python.org/3/library/index.html
Python 2 Standard Library: https://docs.python.org/2.7/library/index.html

CSV

The CSV module parses CSV files, splitting the lines for us. We read the CSV object in the same way we would a file object.

import csv
fh = open('students.txt', newline='')  # newline='' lets csv handle line endings
reader = csv.reader(fh)

next(reader)              # skip one row (useful for header lines)

for record in reader:     # loop through each row
    print(f'id:{record[0]};  fname:{record[1]}; lname: {record[2]}')

fh.close()

This module takes into account more advanced CSV formatting, such as quotation marks (which are used to allow commas within data). In Python 3 the file must be opened in text mode (not 'rb'). The newline='' argument to open() is recommended because some CSV files, such as those exported from Excel, use Windows line endings (\r\n), which can otherwise confuse the csv reader.
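To see why quotation marks matter, the sketch below compares .split(',') with csv.reader on a made-up row whose name field contains a comma. A list of strings stands in for the file, since csv.reader accepts any sequence of line strings.

```python
import csv

lines = ['id,name,state',
         'jw234,"Wilson, Joe",NJ']      # made-up rows; the name contains a comma

naive = lines[1].split(',')             # splits inside the quoted field
print(naive)                            # ['jw234', '"Wilson', ' Joe"', 'NJ']

reader = csv.reader(lines)
next(reader)                            # skip the header row
record = next(reader)                   # read the first data row
print(record)                           # ['jw234', 'Wilson, Joe', 'NJ']
```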

Writing is similarly easy:

import csv
wfh = open('some.csv', 'w', newline='')
writer = csv.writer(wfh)
writer.writerow(['some', 'values', "boy, don't you like long field values?"])
writer.writerows([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']])
wfh.close()

Please be advised that you may not see writes to a file until you close the file with wfh.close() or until the program ends execution. (newline='' is necessary when opening the write file to neutralize an issue in Windows regarding the '\r\n' line ending that Windows uses. While not needed on Mac or Linux, this added argument does no harm.)

sqlite3: local file-based relational database

An sqlite3 lightweight database instance is built into Python and accessible through SQL statements. It can act as a simple storage solution, or can be used to prototype database interactivity in your Python script and later be ported to a production database like MySQL, Postgres or Oracle.

Keep in mind that the interface to your relational database will be the same as or similar to the one presented here with the file-based one.

import sqlite3
conn = sqlite3.connect('example.db')  # a db connection object

c = conn.cursor()                     # a cursor object for issuing queries

Once a cursor object is established, SQL can be used to write to or read from the database:

c.execute('''CREATE TABLE stocks
             (date text, trans text, symbol text, qty real, price real)''')

Note that sqlite3 datatypes are nonstandard and don't reflect types found in databases such as MySQL:

INTEGER: all int types (TINYINT, BIGINT, INT, etc.)
REAL: FLOAT, DOUBLE, REAL, etc.
NUMERIC: DECIMAL, BOOLEAN, DATE, DATETIME, NUMERIC
TEXT: CHAR, VARCHAR, etc.
BLOB: BLOB (non-typed (binary) data, usually large)

Insert a row of data

c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")

Larger example that inserts many records at a time

purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
             ('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
             ('2006-04-06', 'SELL', 'IBM', 500, 53.00),
            ]
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)

Commit the changes -- this saves the inserted rows to the database file

conn.commit()

Retrieve single row of data

t = ('RHAT',)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)

tuple_row = c.fetchone()
print(tuple_row)               # ('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)

Retrieve multiple rows of data

for tuple_row in c.execute('SELECT * FROM stocks ORDER BY price'):
    print(tuple_row)

### ('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)
### ('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0)
### ('2006-04-06', 'SELL', 'IBM', 500.0, 53.0)
### ('2006-04-05', 'BUY', 'MSFT', 1000.0, 72.0)

Close the database

conn.close()
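The steps above can be run end to end without leaving a file behind by connecting to the special name ':memory:', which creates a temporary database held in memory. This is a sketch for experimenting; the SELECT with a WHERE clause on trans is an illustration added here.

```python
import sqlite3

conn = sqlite3.connect(':memory:')     # temporary database, no file created
c = conn.cursor()

c.execute('CREATE TABLE stocks (date text, trans text, symbol text, qty real, price real)')

purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
             ('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
             ('2006-04-06', 'SELL', 'IBM', 500, 53.00)]
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)
conn.commit()

# the ? token is filled in from the tuple, as in the single-row example
c.execute('SELECT symbol, price FROM stocks WHERE trans=? ORDER BY price', ('BUY',))
rows = c.fetchall()                    # list of tuples, one per matching row
print(rows)                            # [('IBM', 45.0), ('MSFT', 72.0)]

conn.close()
```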

Using the requests Module to Make an HTTP Browser Request

A Python program can take the place of a browser, requesting and downloading CSV, HTML pages and other files.

Your Python program can work like a web spider (for example visiting every page on a website looking for particular data or compiling data from the site), can visit a page repeatedly to see if it has changed, can visit a page once a day to compile information for that day, etc.

Basic Example: Download and Save Data

import requests

url = 'https://www.python.org/dev/peps/pep-0020/'   # the Zen of Python (PEP 20)

response = requests.get(url)     # a response object

text = response.text             # text of response


# writing the response to a local file -
# you can open this file in a browser to see it
wfh = open('pep_20.html', 'w')
wfh.write(text)
wfh.close()

More Complex Example: Send Headers, Parameters, Body; Receive Status, Headers, Body

import requests

url = 'http://davidbpython.com/cgi-bin/http_reflect'   # my reflection program

div_bar = '=' * 10


# headers, parameters and message data to be passed to request
header_dict =  { 'Accept': 'text/plain' }          # change to 'text/html' for an HTML response
param_dict =   { 'key1': 'val1', 'key2': 'val2' }
data_dict =    { 'text1': "We're all out of gouda." }


# a GET request (change to .post for a POST request)
response = requests.get(url, headers=header_dict,
                             params=param_dict,
                             data = data_dict)


response_status = response.status_code   # status of the response (OK, Not Found, etc.)

response_headers = response.headers      # headers sent by the server

response_text = response.text            # body sent by server


# outputting response elements (status, headers, body)

# response status
print(f'{div_bar} response status {div_bar}\n')
print(response_status)
print(); print()

# response headers
print(f'{div_bar} response headers {div_bar}\n')
for key in response_headers:
    print(f'{key}:  {response_headers[key]}\n')
print()

# response body
print(f'{div_bar} response body {div_bar}\n')
print(response_text)

Note that if import requests raises a ModuleNotFoundError exception, requests must be installed:

Mac: open the Terminal program and issue this command:

pip3 install requests

Windows: open the Command Prompt program and issue the following command:

pip install requests

If you have any problems with these commands, please let me know!

Using requests to read CSV and JSON Data

Specific techniques for reading the most common data formats.

CSV: feed string response to .splitlines(), then to csv.reader:

import requests
import csv

url = 'path to csv file'

response = requests.get(url)
text = response.text

lines = text.splitlines()
reader = csv.reader(lines)

for row in reader:
    print(row)
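To see why csv.reader is preferred over a plain split on commas, here is a small sketch that runs without a network connection. The text variable stands in for response.text, and its contents are hypothetical sample data.

```python
import csv

# stand-in for response.text (hypothetical sample data)
text = 'company,state,revenue\n"Haddad\'s, Inc.",PA,239.50\nWestfield,NJ,53.90\n'

lines = text.splitlines()          # list of str, one per line
reader = csv.reader(lines)         # csv.reader accepts any iterable of lines

rows = list(reader)
print(rows[1])                     # ["Haddad's, Inc.", 'PA', '239.50']
```

The quoted company name contains a comma, yet it arrives as a single field; a plain .split(',') would have cut it in two.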

JSON: the response object has built-in support through its .json() method:

import requests

url = 'path to json file'

response = requests.get(url)

obj = response.json()

print(type(obj))          # <class 'dict'> (or list, depending on the JSON)
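The type of the returned object depends on the JSON text itself. The sketch below uses the standard json module on two strings that stand in for response.text (hypothetical data); response.json() parses the response body in much the same way.

```python
import json

# stand-ins for response.text (hypothetical sample data)
object_text = '{"symbol": "IBM", "price": 53.0}'
array_text = '[1, 2, 3]'

obj = json.loads(object_text)      # JSON object -> dict
arr = json.loads(array_text)       # JSON array  -> list

print(type(obj))                   # <class 'dict'>
print(type(arr))                   # <class 'list'>
print(obj['price'])                # 53.0
```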

Alternative to requests: the urllib module

If the requests module cannot be installed, this module is part of the standard distribution.

urllib is a full-featured module for making web requests. Although the requests module is strongly favored by some for its simplicity, it has not yet been added to the Python standard library.

The urlopen() function takes a url and returns a file-like object that can be read with read(), just like a file:

import urllib.request
my_url = 'http://www.yahoo.com'
readobj = urllib.request.urlopen(my_url)  # return a 'file-like' object
text = readobj.read()                     # read into a 'byte string'
# text = text.decode('utf-8')             # optional, sometimes required:
                                          # decode as a 'str' (see below)
readobj.close()

Alternatively, you can call readlines() on the object (keep in mind that many objects that can deliver file-like string output can be read with this same-named method):

for line in readobj.readlines():
  print(line)
readobj.close()

Parsing CSV Files

Downloaded CSV files should be parsed with the csv module, as CSV can be more complex than just comma separators.

The csv.reader() function usually requires a file object, but we can also pass a list of lines to it:

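import csv
import urllib.request

# if an SSL error occurs, add context=ctx (see "SSL Certificate Error")
readobj = urllib.request.urlopen(my_url)                # file-like object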
text = readobj.read()                                   # bytes, entire download
text = text.decode('utf-8')                             # str, entire download
lines = text.splitlines()                               # list of str (lines)

reader = csv.reader(lines)

for row in reader:
    print(row)

For discussion of potential issues with using urllib, please see the unit titled "Supplementary Modules: CSV, SQL, JSON and the Internet".

POTENTIAL ERRORS AND REMEDIES WITH urllib

TypeError mentioning 'bytes' -- sample exception messages:

TypeError: can't use a string pattern on a bytes-like object
TypeError: must be str, not bytes
TypeError: can't concat bytes to str

These errors indicate that you tried to use a byte string where a str is appropriate.

The urlopen() response usually comes to us as a special object called a byte string. In order to work with the response as a string, we can use the decode() method to convert it into a string with an encoding.

text = text.decode('utf-8')
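A short sketch of the error and its remedy, runnable without a network connection. The raw variable stands in for the bytes returned by readobj.read(); its contents are hypothetical.

```python
raw = 'café,3.50\n'.encode('utf-8')   # stand-in for readobj.read()

print(type(raw))                      # <class 'bytes'>

try:
    raw.split(',')                    # str argument used on a bytes object
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

text = raw.decode('utf-8')            # bytes -> str
fields = text.rstrip().split(',')
print(fields)                         # ['café', '3.50']
```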

'utf-8' is the most common encoding, although others ('ascii', 'utf-16', 'utf-32' and more) may be required. I have found that we do not always need to convert (depending on what you will be doing with the returned string), which is why I commented out the line in the first example.

SSL Certificate Error

Many websites enable SSL security and require a web request to accept and validate an SSL certificate (certifying the identity of the server). urllib by default requires SSL certificate security, but it can be bypassed (keep in mind that this may be a security risk).

import ssl
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

my_url = 'https://www.nytimes.com'      # SSL applies to https:// addresses
readobj = urllib.request.urlopen(my_url, context=ctx)

Encoding Parameters: urllib.parse.urlencode()

When including parameters in our requests, we must encode them into our request URL. The urlencode() function does this nicely:

import urllib.request, urllib.parse

params = urllib.parse.urlencode({'choice1': 'spam and eggs',
                                 'choice2': 'spam, spam, bacon and spam'})
print("encoded query string: ", params)

this prints:

encoded query string:  choice1=spam+and+eggs&choice2=spam%2C+spam%2C+bacon+and+spam
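The encoded string is normally attached to a URL after a question mark. The sketch below builds such a URL and then decodes the query string again with parse_qs() to confirm nothing was lost; base_url is a hypothetical address used only for illustration.

```python
from urllib.parse import urlencode, parse_qs

base_url = 'http://example.com/search'          # hypothetical URL
params = urlencode({'choice1': 'spam and eggs',
                    'choice2': 'spam, spam, bacon and spam'})

full_url = base_url + '?' + params              # query string follows the '?'
print(full_url)

decoded = parse_qs(params)                      # decode back into a dict of lists
print(decoded['choice1'])                       # ['spam and eggs']
```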

Filepaths for Locating Files

Locating Files with Filepaths

Filepaths pinpoint the location of any file.

Your computer's filesystem contains files and folders, arranged in a tree (folders and files within folders within other folders, etc.) In this session we'll look at how we can open files anywhere on the filesystem tree. Here's a sample tree for us to work with, containing both files (ending in .txt) and python scripts (ending in .py). (This tree and files are replicated in your data folder for this session.)

dir1
├── file1.txt
├── test1.py
│
├── dir2a
│   ├── file2a.txt
│   ├── test2a.py
│   │
│   ├── dir3a
│   │   ├── file3a.txt
│   │   ├── test3a.py
│   │   │
│   │   └── dir4
│   │       ├── file4.txt
│   │       └── test4.py
└── dir2b
    ├── file2b.txt
    ├── test2b.py
    │
    └── dir3b
       ├── file3b.txt
       └── test3b.py

When our script is located in the same directory as a file we want to open, we can give Python the name of the file, and it will find it in this same directory.

""" test2b.py:  open and read a file """

fh = open('file2b.txt')   # OS looks for file in present working directory
print(fh.read())          # this is file 2b - note that it is in same directory as script

This works because test2b.py and file2b.txt are in the same directory.

However, if our script is in a different location from the file we want to open, we have a problem -- the OS won't be able to find the file.

""" test3a.py:  open and read a file """

fh = open('file2b.txt')    # raises a FileNotFoundError exception
                           # (OS looks for file in the pwd (dir3a)
                           # but doesn't find it)

The file exists, but it is in a different directory. The OS can't find the file because it needs to be told in which directory it should look for the file. So, if we are running our script from a different location than the file we wish to open, we must use a relative path or an absolute path to show the OS where the file is located.
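When a bare filename is not found, it helps to ask Python where it is actually looking. This is a small diagnostic sketch using the standard os module; the filename is the one from the example above, and the output depends on where the script is run.

```python
import os

pwd = os.getcwd()                          # present working directory, as a str
print(pwd)

full_path = os.path.abspath('file2b.txt')  # where open('file2b.txt') would look
print(full_path)

print(os.path.exists(full_path))           # True only if the file is really there
```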

Relative vs. Absolute Paths

There are two different ways of expressing a file's location.

Again, let's use the sample tree that can be found in your session folder:

dir1
├── file1.txt
├── test1.py
│
├── dir2a
│   ├── file2a.txt
│   ├── test2a.py
│   │
│   ├── dir3a
│   │   ├── file3a.txt
│   │   ├── test3a.py
│   │   │
│   │   └── dir4
│   │       ├── file4.txt
│   │       └── test4.py
└── dir2b
    ├── file2b.txt
    ├── test2b.py
    │
    └── dir3b
       ├── file3b.txt
       └── test3b.py

Absolute path: this is one that locates a file from the root of the filesystem. It lists each of the directories that lead from the root to the directory that holds the file.

In Windows, absolute paths begin with a drive letter, usually C:\:

""" test3a.py:  open and read a file """

filepath = r'C:\Users\david\Downloads\python_data\session_03_strings_lists_files\dir1\dir2b\file2b.txt'
fh = open(filepath)

print(fh.read())

(Note that r'' should be used with any Windows paths that contain backslashes.)

On the Mac, absolute paths begin with a forward slash:

""" test3a.py:  open and read a file """

filepath = '/Users/david/Downloads/python_data/session_03_strings_lists_files/dir1/dir2b/file2b.txt'
fh = open(filepath)

print(fh.read())

(The above paths assume that the python_data folder is in the Downloads directory; you may have placed yours elsewhere on your system. Of course, the above paths also assume that my home directory is called david/; yours is likely different.)

Relative path: this is one that locates a file or folder in relation to the present working directory.

A relative path is read as an extension of the present working directory. The below path assumes that our present working directory is /Users/david/Downloads/python_data/dir1/dir2a:

""" test2a.py:  open and read a file """

filepath = 'dir3a/dir4/file4.txt'   # starts from /Users/david/Downloads/python_data/dir1/dir2a
fh = open(filepath)

print(fh.read())

When we use a relative path, we can think of it as extending the pwd. So the whole path is:

/Users/david/Downloads/python_data/dir1/dir2a/dir3a/dir4/file4.txt

Therefore, in order to use a relative path, you must first ascertain your present working directory in the filesystem. Only then can you know the relative path needed to find the file you are looking for.

Special Note: Windows paths and the "raw string"

Note that Windows paths featuring backslashes should use r'' ("raw string"), in which a backslash is taken literally rather than as the start of an escape sequence such as \n (newline). This is not required on Macs or on paths without backslashes. For simplicity, you can substitute forward slashes in Windows paths, and the file will still be found. Using forward slashes is probably the easiest way to work with Windows paths in Python.
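To see what a raw string changes, compare the same path written three ways. The path itself is hypothetical; the point is only how Python reads the backslash.

```python
plain = 'dir1\new.txt'      # \n is read as a single newline character
raw = r'dir1\new.txt'       # backslash kept as a literal backslash
forward = 'dir1/new.txt'    # forward slashes need no special handling

print(len(plain))           # 11
print(len(raw))             # 12
print('\n' in plain)        # True
```

The plain string is one character shorter because the backslash and the n were merged into a newline, so it no longer names the intended file.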

Locating a File in a Parent Directory

We use .. to signify the parent; this can be used in a relative filepath.

Again, let's use the sample tree that is replicated in your session folder:

dir1
├── file1.txt
├── test1.py
│
├── dir2a
│   ├── file2a.txt
│   ├── test2a.py
│   │
│   ├── dir3a
│   │   ├── file3a.txt
│   │   ├── test3a.py
│   │   │
│   │   └── dir4
│   │       ├── file4.txt
│   │       └── test4.py
└── dir2b
    ├── file2b.txt
    ├── test2b.py
    │
    └── dir3b
       ├── file3b.txt
       └── test3b.py

What if we are in dir3a running file test3a.py but want to access file file1.txt?

Think of .. (two dots) as representing the parent directory:

""" test3a.py:  open and read a file """

filepath = '../../file1.txt'  # reads from /Users/david/Downloads/python_data/dir1
fh = open(filepath)

print(fh.read())

As with all relative paths, you must first consider the location from which you are running the script, then the location of the file you're trying to open. If we are in the dir3a directory when we run test3a.py, then we are two directories "below" the dir1 directory. The first .. takes us to the dir2a directory. The second .. takes us to the dir1 directory. We can then access file1.txt from there.

Going up, then down

What if we wanted to go from dir2a to dir2b? They are at the same level; in other words, neither is above or below the other.

The answer is to go up to the parent, then down to the other child:

""" test2a.py:  open and read a file """

filepath = '../dir2b/file2b.txt'
fh = open(filepath)

print(fh.read())

.. takes us to the dir1 directory. dir2b can be accessed from that directory.
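One way to check a path containing .. is to let Python work out the final location. This sketch uses the standard posixpath module (forward-slash rules on any operating system) and assumes the working directory shown; it only manipulates the path text and does not touch the filesystem.

```python
import posixpath    # forward-slash path rules, regardless of operating system

pwd = '/Users/david/Downloads/python_data/dir1/dir2a'   # assumed working directory

joined = posixpath.join(pwd, '../dir2b/file2b.txt')     # extend the pwd
resolved = posixpath.normpath(joined)                   # collapse the '..' step

print(resolved)     # /Users/david/Downloads/python_data/dir1/dir2b/file2b.txt
```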

Containers: More List Operations

using containers to collect data

Containers are Python objects that can contain other objects.

a container is an object that can contain other objects
we collect values (numbers and strings) from a data source and store them in a container to manipulate and analyze data
the four Python containers are list, tuple, set and dict
each one stores data in a different way that makes it convenient for us to manipulate and analyze

containers allow for manipulation and analysis

Once collected, values in a container can be sorted or filtered (i.e. selected) according to whatever rules we choose. A collection of numeric values offers many new opportunities for analysis:

median (the "middle" value in a sorted list)
standard deviation (the average distance of the values from an average value)
top 5 or bottom 3, and the average of those
dividing values into "percentiles"

A collection of string values allows us to perform text analysis:

frequency of a word
position of a word within a text
whether a word in one collection is present in another collection
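A few of these analyses can be sketched in a handful of lines once the values are in a list. The numbers and the sentence below are sample data, and statistics is a standard library module not otherwise covered here.

```python
import statistics

values = [4, 9, 1.2, -5, 200, 20]

middle = statistics.median(values)          # average of the two middle values here
top3 = sorted(values)[-3:]                  # three largest values
top3_avg = sum(top3) / len(top3)            # average of those three

words = 'and at night and whenever I got a chance'.split()
and_count = words.count('and')              # frequency of a word
night_pos = words.index('night')            # position of a word (counting from 0)

print(middle, top3, and_count, night_pos)   # 6.5 [9, 20, 200] 2 2
```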

container objects: list, tuple, set, dict

Compare and contrast the characteristics of each container.

mylist =  ['a', 'b', 'c', 'd', 1, 2, 3]

mytuple = ('a', 'b', 'c', 'd', 1, 2, 3)

myset =   {'a', 'b', 'c', 'd', 1, 2, 3}

mydict =  {'a': 1, 'b': 2, 'c': 3, 'd': 4}

list: ordered, mutable sequence of objects
tuple: ordered, immutable sequence of objects
set: unordered, mutable, unique collection of objects
dict: mutable collection of key-value pairs with unique keys; keeps insertion order as of Python 3.7 (discussed upcoming)

this is just an overview, to show you where we are going
the containers employ different syntax: each uses a different kind of bracket (dicts are a bit more elaborate)
do not attempt to memorize the differences now

review: the list container object

A list is an ordered sequence of values.

var = []                     # initialize an empty list

var2 = [1, 2, 3, 'a', 'b']   # initialize a list of values

lists are the most commonly used container
they store values (items) in a sequence
they allow us to access an item by position
they can be sorted, sliced and looped through ('for' loop, coming up)

review: subscripting a list

Subscripting allows us to read individual items from a list.

mylist = [1, 2, 3, 'a', 'b']       # initialize a list of values

xx = mylist[2]                     # 3

yy = mylist[-1]                    # 'b'

indexing starts at 0
so index 1 is the 2nd item, index 2 is the 3rd item, etc.
indexing can also be counted from the end, at -1
so index -1 is the last item, index -2 is the 2nd to last, etc.

review: slicing a list

Slicing a list returns a new list.

var2 = [1, 2, 3, 'a', 'b']            # initialize a list of values

sublist1 = var2[0:3]                  # [1, 2, 3]

sublist2 = var2[2:4]                  # [3, 'a']

sublist3 = var2[3:]                   # ['a', 'b']

Remember the rules of slicing, similar to strings:

indexing begins at 0, so 0 is the first item
the "upper bound" (2nd integer) is an index one greater than the item that will be returned (non-inclusive)
to slice off the end, leave off the upper bound

finding an item within a list

The 'in' operator works with lists similar to how it works with strings.

mylist = [1, 2, 3, 'a', 'b']

if 'b' in mylist:                        # this is True for mylist
    print("'b' can be found in mylist")

print('b' in mylist)                      # "True":  the 'in' operator
                                         # actually returns True or False

summary functions: len(), sum(), max(), min()

Summary functions offer a speedy answer to basic analysis questions: how many? How much? Highest value? Lowest value?

mylist = [1, 3, 5, 7, 9]        # initialize a list

print(len(mylist))               # 5 (count of items)
print(sum(mylist))               # 25 (sum of values)
print(min(mylist))               # 1 (smallest value)
print(max(mylist))               # 9 (largest value)

sorting a list

sorted() returns a new list of sorted values.

mylist = [4, 9, 1.2, -5, 200, 20]

smyl = sorted(mylist)                     # [-5, 1.2, 4, 9, 20, 200]

concatenating two lists with +

Concatenation works in the same way as strings.

var = ['a', 'b', 'c']
var2 = ['d', 'e', 'f']

var3 = var + var2                  # ['a', 'b', 'c', 'd', 'e', 'f']

adding (appending) an item to a list

var = []

var.append(4)                # Note well! call is not assigned
var.append(5.5)              # list is changed in-place

print(var)                                # [4, 5.5]

It is the nature of a list to hold these items in order as they were added.

the AttributeError exception

An AttributeError exception occurs when calling a method on an object type that doesn't support that method.

mylines = ['line1\n', 'line2\n', 'line3\n']

mylines = mylines.rstrip()         # AttributeError:
                                   # 'list' object has no attribute 'rstrip'

although .rstrip() is a method, we can refer to it as an attribute
an attribute is any name that appears after an object and a dot (object.attribute).
the attribute is often a method, though it may point at any type of object (str, int, list, etc.)

the AttributeError when using .append()

This exception may sometimes result from a misuse of the append() method, which returns None.

mylist = ['a', 'b', 'c']

# oops:  returns None -- call to append() should not be assigned
mylist = mylist.append('d')

mylist = mylist.append('e')        # AttributeError:  'NoneType'
                                   # object has no attribute 'append'

since .append() isn't designed to return an object, it returns None
if we assign to the list variable, it replaces the list with None
the next time we try to use .append() we are attempting to call it on None
Python's error message says "the None object doesn't have an .append() method"

avoiding the incorrect use of .append()

mylist = ['a', 'b', 'c']

mylist.append('d')                 # now mylist equals ['a', 'b', 'c', 'd']

because .append() does not return a useable value, we should not assign from the call
simply call the method and understand that the list will change in-place

sidebar: removing a container element

There are a number of additional list methods to manipulate a list, though they are less often used.

mylist = ['a', 'hello', 5, 9]

popped = mylist.pop(0)         # str, 'a'
                               # (argument specifies the index  of the item to remove)

mylist.remove(5)               # remove an element by value

print(mylist)               # ['hello', 9]

mylist.insert(0, 10)

print(mylist)               # [10, 'hello', 9]

Containers: Tuples and Sets

tuples and sets: like lists but different

It's helpful to contrast these containers and lists.

tuples are like lists, but read-only
sets are like lists, but the items are unordered and unique (no duplicates in a set)

It's easy to remember how to use one of these containers by considering how they differ in behavior.

the tuple container object

A tuple is an immutable ordered sequence of values.

var2 = (1, 2, 3, 'a', 'b')   # initialize a tuple of values

immutable means the tuple cannot be changed once initialized
but the best way to think about tuples is that they are identical to lists, except that they are read-only -- they can't be changed
any value can be placed within a tuple, and they do not need to be of the same type

subscripting a tuple

Subscripting allows us to read individual items from a tuple.

mytuple = (1, 2, 3, 'a', 'b')       # initialize a tuple of values

xx = mytuple[3]                     # 'a'

Note that indexing starts at 0, so index 1 is the 2nd item, index 2 is the 3rd item, etc.

slicing a tuple

Slicing a tuple returns a new tuple.

var2 = (1, 2, 3, 'a', 'b')             # initialize a tuple of values

subtuple1 = var2[0:3]                  # (1, 2, 3)

subtuple2 = var2[2:4]                  # (3, 'a')

subtuple3 = var2[3:]                   # ('a', 'b')

Remember the rules of slicing, same as lists and strings:

indexing begins at 0, so 0 is the first item
the "upper bound" (2nd integer) is an index one greater than the item that will be returned (non-inclusive)
to slice off the end, leave off the upper bound

concatenating two tuples with +

Concatenation works in the same way as lists and strings.

var = ('a', 'b', 'c')
var2 = ('d', 'e', 'f')

var3 = var + var2                  # ('a', 'b', 'c', 'd', 'e', 'f')

"set" container object

A set is an unordered, unique collection of values.

Initialize a Set

myset = set()                  # initialize an empty set (note: empty curly
                               # braces are reserved for dicts)

myset = {'a', 9999, 4.3, 'a'}  # initialize a set with elements

print(myset)                   # {9999, 4.3, 'a'}

note that the set has changed the order of items (this may vary from time to time)
note also that the duplicate 'a' has been eliminated
this illustrates the two salient characteristics of a set
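Both characteristics make a set handy for removing duplicates from a list. A minimal sketch with sample data:

```python
states = ['PA', 'NY', 'NJ', 'NY', 'PA', 'NY']

unique_states = set(states)          # duplicates are dropped

print(len(states))                   # 6
print(len(unique_states))            # 3
print(sorted(unique_states))         # ['NJ', 'NY', 'PA']
```

Printing sorted(unique_states) rather than the set itself gives the same order on every run.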

adding an item to a set

myset = set()                  # initialize an empty set

myset.add(4.3)                 # note well: method call is not assigned
myset.add('a')

print(myset)                   # {'a', 4.3}    (order is not
                               #                necessarily maintained)

getting information about a set or tuple

# Get Length of a set or tuple (compare to len() of a list or string)
myset = {1, 2, 3, 'a', 'b'}

yy = len(myset)              # 5 (# of elements in myset)


# Test for membership in a set or tuple
mytuple = (1, 2, 3, 'a', 'b')

if 'b' in mytuple:                        # this is True for mytuple
    print("'b' can be found in mytuple")

print('b' in mytuple)                      # "True":  the 'in' operator
                                          # actually returns True or False

note that these operations are identical to those for a list
Python tries as much as possible to unify operations between similar objects

looping through a set or tuple

The 'for' loop allows us to traverse a set or tuple and work with each item.

mytuple = (1, 2, 3, 'a', 'b')            # could also be a set here

for var in mytuple:
    print(var)                            # prints 1, then 2, then 3,
                                         # then a, then b

Whether a list, set or tuple, these operations work in the same way.

summary functions: len(), sum(), max(), min()

Summary functions offer a speedy answer to basic analysis questions: how many? How much? Highest value? Lowest value?

Whether a set or tuple, these operations work in the same way.

mytuple = (1, 3, 5, 7, 9)       # initialize a tuple
myset =   {1, 3, 5, 7, 9}       # initialize a set

print(len(mytuple))              # 5  (count of items)
print(sum(myset))                # 25 (sum of values)
print(min(myset))                # 1 (smallest value)
print(max(mytuple))              # 9 (largest value)

sorting a set or tuple

Regardless of type, sorted() returns a list of sorted values.

mytuple = (4, 9, 1.2, -5, 200, 20)       # could also be a set here

smyl = sorted(mytuple)                   # [-5, 1.2, 4, 9, 20, 200]

Whether a set or tuple, these operations work in the same way.
Note that unlike other operations discussed here, sorted() returns a list. This is always the case, whatever type of container is sorted.
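A quick check of that behavior, with the result converted back to a tuple for cases where a tuple is needed:

```python
mytuple = (4, 9, 1.2, -5)
myset = {4, 9, 1.2, -5}

from_tuple = sorted(mytuple)         # list, even though the source is a tuple
from_set = sorted(myset)             # list, even though the source is a set

print(type(from_tuple))              # <class 'list'>
print(from_tuple == from_set)        # True

sorted_tuple = tuple(from_tuple)     # convert back if a tuple is needed
print(sorted_tuple)                  # (-5, 1.2, 4, 9)
```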

Building Up Containers from File

introduction: building up containers from file

This technique forms the core of much of what we do.

In order to work with data, the usual steps are:

read a data source, such as file, database, or network response
select the data that we'd like to work with
add that data to one or more containers
use the container to analyze the data
use the container to produce new data or new values
store the results in another data source

We call this process Extract-Transform-Load, or ETL. ETL is at the heart of what core Python does best.
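The steps listed above can be sketched in a few lines. To keep the example self-contained, io.StringIO objects stand in for the input and output files, and the three sample rows are hypothetical.

```python
import io

# stand-in for open('revenue.csv'); sample rows are hypothetical
source = io.StringIO("Haddad's,PA,239.50\nWestfield,NJ,53.90\nThe Store,NJ,211.50\n")

nj_revenues = []                              # container for selected values
for line in source:                           # read the data source
    name, state, revenue = line.rstrip().split(',')
    if state == 'NJ':                         # select the rows we want
        nj_revenues.append(float(revenue))    # add them to the container

total = sum(nj_revenues)                      # analyze the container

report = io.StringIO()                        # stand-in for an output file
report.write(f'NJ total: {total:.2f}\n')      # store the result
print(report.getvalue())                      # NJ total: 265.40
```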

looping through a data source and building up a list

This "summary algorithm" is very similar to building a float sum from a file source.

build a list of company names

company_list = []                             # initialize an empty list
fh = open('revenue.csv')                      # 'file' object

for line in fh:                               # str, "Haddad's,PA,239.50"

    elements = line.split(',')                # list, ["Haddad's", 'PA', '239.50']
    company_list.append(elements[0])          # add the name for this row
                                              # to company_list

print(company_list)       # list, ["Haddad's", 'Westfield', 'The Store', "Hipster's",
                          #        'Dothraki Fashions', "Awful's", 'The Clothiers']

fh.close()

Just as we did when counting lines of a file or summing up values, we can use a 'for' loop over a file to collect values.

looping through a data source and building up a unique set

This "summary algorithm" uses a set to collect unique items from repeating data.

state_set = set()                       # initialize an empty set
fh = open('revenue.csv')                # 'file' object

for line in fh:                         # str, "Haddad's,PA,239.50"

    elements = line.split(',')          # list, ["Haddad's", 'PA', '239.50']
    state_set.add(elements[1])          # add the state for this row
                                        # to state_set

print(state_set)       # set, {'PA', 'NY', 'NJ'}   (your order may be different)

chosen_state = input('enter a state:  ')

if chosen_state in state_set:
    print('that state was found in the file')
else:
    print('that state was not found')

fh.close()

the state value in the 2nd column has many state values repeated
if our only purpose is to know which states appear in the file, we can simply add all state values to the set and know that duplicates will be removed
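A list and a set can also work together: the list keeps every value, the set tells us which distinct values exist. This sketch counts how many rows each state has, using an io.StringIO object with hypothetical rows as a stand-in for the file.

```python
import io

# stand-in for open('revenue.csv'); sample rows are hypothetical
source = io.StringIO("Haddad's,PA,239.50\nWestfield,NJ,53.90\nThe Store,NJ,211.50\n")

state_list = []
for line in source:
    elements = line.rstrip().split(',')
    state_list.append(elements[1])       # every state, duplicates included

state_set = set(state_list)              # unique states only

counts = []
for state in sorted(state_set):          # sorted() gives a predictable order
    counts.append((state, state_list.count(state)))

print(counts)                            # [('NJ', 2), ('PA', 1)]
```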

treating a file as a list

Data files can be rendered as lists of lines, and slicing can manipulate them holistically rather than by using a counter.

fh = open('student_db.txt')
file_lines_list = fh.readlines()          # a list of lines in the file
print(file_lines_list)
      # [ "id:address:city:state:zip",
      #   "jk43:23 Marfield Lane:Plainview:NY:10023",
      #   "axe99:315 W. 115th Street, Apt. 11B:New York:NY:10027",
      #   "jab44:23 Rivington Street, Apt. 3R:New York:NY:10002" ... (list continues) ]

wanted_lines = file_lines_list[1:]        # take all but 1st element
                                          # (i.e., 1st line)
for line in wanted_lines:
    print(line.rstrip())                   # jk43:23 Marfield Lane:
                                          # Plainview:NY:10023

                                          # axe99:315 W. 115th Street,
                                          # Apt. 11B:New York:NY:10027

                                          # jab44:23 Rivington Street,
                                          # Apt. 3R:New York:NY:10002

                                          # etc.
fh.close()

in this example, we want to skip the 'header' line of the file
rather than count the lines and skip line 1, we simply treat the entire file as a list and slice the list as desired - in this case, to slice only the data lines (2nd to end)

slicing and dicing a file: the line, word, character count (1/3)

Once we have read a file as a single string, we can "chop it up" any way we like.

# read(): file text as a single string
fh = open('guido.txt')          # 'file' object
text = fh.read()                # read() method called on
                                # file object returns a string

fh.close()                      # close the file

print(text)
print(len(text))                 # 207 (number of characters in the file)

    # single string, entire text:

    # 'For three months I did my day job, \nand at night and
    #  whenever I got a \nchance I kept working on Python.  \n
    #  After three months I was to the \npoint where I could
    #  tell people, \n"Look here, this is what I built."'

once the file is read as a string, we can do all kinds of string operations
'in' (to find a substring)
.replace()
.count(), etc.
we can also process the string data further -- see next
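The string operations named above can be tried on a shortened copy of the text. The string below stands in for the result of fh.read():

```python
text = ('For three months I did my day job, \nand at night and '
        'whenever I got a \nchance I kept working on Python.')

has_python = 'Python' in text             # substring test
and_count = text.count('and')             # occurrences of 'and'
one_line = text.replace('\n', '')         # remove the newlines

print(has_python)                         # True
print(and_count)                          # 2
print('\n' in one_line)                   # False
```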

slicing and dicing a file: splitting a string into words (2/3)

String .split() on a whole file string returns a list of words.

file_text = """For three months I did my day job,
and at night and whenever I got a
chance I kept working on Python.
After three months I was to the
point where I could tell people,
"Look here, this is what I built." """

words = file_text.split()      # split entire file on whitespace (spaces or newlines)

print(words)
    # ['For', 'three', 'months', 'I', 'did', 'my', 'day', 'job,',
    #  'and', 'at', 'night', 'and', 'whenever', 'I', 'got', 'a',
    #  'chance', 'I', 'kept', 'working', 'on', 'Python.', 'After',
    #  'three', 'months', 'I', 'was', 'to', 'the', 'point', 'where',
    #  'I', 'could', 'tell', 'people,', '"Look', 'here,', 'this',
    #  'is', 'what', 'I', 'built."']

print(len(words))       # 42 (number of words in the file)

the "triple-quoted string" above is also called a "multi-line" string
it is the same data form that we get from the file .read() method, previously
.split() splits on whitespace, which separates each word
(newlines, which separate each line, are also considered whitespace)
we now have the opportunity to count the words, sort the words, subscript the first or last word, etc.

slicing and dicing a file: the line, word, character count (3/3)

String .splitlines() will split any string on the newlines, delivering a list of lines from the file.

file_text = """For three months I did my day job,
and at night and whenever I got a
chance I kept working on Python.
After three months I was to the
point where I could tell people,
"Look here, this is what I built." """

lines = file_text.splitlines()

print(lines)

    # ['For three months I did my day job,', 'and at night and whenever I got a',
    #  'chance I kept working on Python.', 'After three months I was to the',
    #  'point where I could tell people,', '"Look here, this is what I built." ']

print(len(lines))          # 6 (number of lines in the file)

.splitlines() divides the multi-line string into a list of lines
this has the effect of delivering an entire file as a list of lines, but with the newlines removed
because these lines are in a list, we now have the opportunity to count the lines, slice the lines, subscript the first or last line, etc.

Summary: 3 ways to read strings from a file

for: read line-by-line (a newline ('\n') marks the end of each line)

fh = open('students.txt')        # file object allows looping
                                 # through a series of strings
for my_file_line in fh:          # my_file_line is a string
    print(my_file_line)           # prints each line of students.txt

fh.close()                       # close the file

read(): read entire file as a single string

fh = open('students.txt')  # file object allows reading
text = fh.read()                 # read() method called on file
                                 # object returns a string
fh.close()                       # close the file

print(text)                       # entire text as a single string

readlines(): read as a list of strings (each string a line)

fh = open('students.txt')
file_lines = fh.readlines()      # file.readlines() returns
                                 # a list of strings
fh.close()                       # close the file

print(file_lines)                 # entire text as a list of lines

when we read a file, we can choose from these 3 approaches
we cannot do more than one read on a file, so we choose one
which we choose depends on what we want to do -- analyze the file as a whole (.read() or .readlines()) or process it one line at a time (for)
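The reason only one approach is used per open file is that reading moves a position marker to the end of the file, so a second read finds nothing left. The sketch below shows this with an io.StringIO object standing in for an open file; seek(0), which rewinds the marker, is an extra not otherwise covered here.

```python
import io

fh = io.StringIO('line1\nline2\nline3\n')   # stand-in for an open file

first = fh.read()          # entire text
second = fh.read()         # nothing left to read: empty string

fh.seek(0)                 # move back to the start of the file
lines = fh.readlines()     # now the data can be read again

print(repr(second))        # ''
print(len(lines))          # 3
```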

sidebar: writing to a file

We won't need to write to a file in this course, but it's important to know how.

wfh = open('newfile.txt', 'w')              # open for writing
                                            # (will overwrite an existing file)
wfh.write('this is a line of text\n')
wfh.write('this is a line of text\n')
wfh.write('this is a line of text\n')
wfh.close()

note the second argument to open(): this determines the 'mode'

the mode may be 'r' (reading, the default), 'w' (writing, first deletes the file if it exists) or 'a' (appending, adds to the end of the file)

note also that the newline character ('\n') must be added to the end; the .write() function is unlike print() in that it does not add a newline automatically
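
To see the difference between 'w' and 'a', here is a small sketch that writes a file, appends to it, and reads it back. The file path is a throwaway one created just for this illustration:

```python
import os
import tempfile

# a throwaway file path (hypothetical, for illustration only)
path = os.path.join(tempfile.mkdtemp(), 'newfile.txt')

wfh = open(path, 'w')            # 'w' creates the file (or empties it)
wfh.write('first line\n')
wfh.close()

afh = open(path, 'a')            # 'a' keeps what is there and adds on
afh.write('second line\n')
afh.close()

fh = open(path)
lines = fh.read().splitlines()
fh.close()

print(lines)                     # ['first line', 'second line']
```

Had we opened the file with 'w' the second time, only 'second line' would remain.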

sidebar: the range() function

This function allows us to iterate over an integer sequence.

counter = range(10)
for i in counter:
    print(i)               # prints integers 0 through 9

for i in range(3, 8):      # prints integers 3 through 7
    print(i)

If we need a literal list of integers, we can simply pass the iterable to list():

intlist = list(range(5))
print(intlist)             # [0, 1, 2, 3, 4]

Dictionaries: Lookup Tables

dictionaries

A dictionary (or dict) is a collection of unique key/value pairs of objects.

mydict = {}                           # empty dict
mydict = {'a':1, 'b':2, 'c':3}        # dict with str keys and int values
print(mydict['a'])                    # look up 'a' to get 1

each item is a pair

the pair consists of a key and a value

the keys in a dict are unique

each key is associated with a value

a dict is addressable by key, in the same way that a list is addressable by index

example uses: dictionaries

Pairs describe data relationships that we often want to consider:

companies paired with annual revenue for each company

employees paired with contact information for each employee

students paired with grade point averages

dates paired with the high temperature for each

web pages paired with the number of times each was accessed

You yourself may consider data in pairs, even in your personal life:

home projects and the amount of time you think each might take

different items you might want to buy at the grocery and the price for each

stores and their distance from your house

restaurants and their ratings

your siblings and their names

your family members and their ages

types of dictionaries

There are a few main ways dictionaries are used:

a lookup table: pairing clients with their addresses allows you to look up the address of any client

a ranking: pairing companies with their market capitalizations allows you to rank them by market cap

an aggregation: pairing each city with the number of students listed from that city allows you to see which cities have the most

initialize a dict

Dicts are marked by curly braces. Keys and values are separated with colons.

mydict = {}                           # empty dict
mydict = {'a':1, 'b':2, 'c':3}        # dict with str keys and int values

add a key/value pair to a dict

We use subscript syntax to assign a value to a key.

mydict = {'a':1, 'b':2, 'c':3}

mydict['d'] = 4                 # setting a new key and value

print(mydict)                   # {'a': 1, 'b': 2, 'c': 3, 'd': 4}

retrieve a value from a dict using a key

We also use subscript syntax to retrieve a value.

mydict = {'a':1, 'b':2, 'c':3, 'd': 4}

dval = mydict['d']                 # value for 'd' is 4

xxx = mydict['c']                  # value for 'c' is 3

You might notice that this subscripting is very close in syntax to list subscripting. The only difference is that instead of an integer index we are using the dict key (most often a string).

the KeyError exception

This exception is raised when we request a key that does not exist in the dict.

mydict = {'a': 1, 'b': 2, 'c': 3}

val = mydict['d']       # KeyError:  'd'

Like the IndexError exception, which is raised if we ask for a list item that doesn't exist, KeyError is raised if we ask for a dict key that doesn't exist.

check for key membership

If we're not sure whether a key is in the dict, before we subscript we can check to confirm.

mydict = {'a': 1, 'b': 2, 'c': 3}

if 'a' in mydict:
    print("'a' is a key in mydict")
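
The membership check pairs naturally with the lookup itself. A minimal sketch, using a made-up list of keys to try:

```python
mydict = {'a': 1, 'b': 2, 'c': 3}

results = []
for key in ['a', 'd']:               # 'd' is not in the dict
    if key in mydict:
        results.append(mydict[key])  # safe: we know the key is there
    else:
        results.append('not found')  # no KeyError is raised

print(results)                       # [1, 'not found']
```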

Dictionaries: Rankings

dictionary rankings

Dictionaries can be sorted by value to produce a ranking.

dictionaries, particularly ones that have numeric values, can be sorted by value
if the value is a quantitative measure for each key, the dict can be used as a ranking
examples are: students and their grade point averages, companies and their market caps, sports teams and their wins, competing products and their prices

loop through dict keys and values

We loop through keys and then use subscripting to get values.

mydict = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

for key in mydict:         # a
    val =  mydict[key]
    print(key)             # a
    print(val)             # 1
    print()
                           # b
                           # 2

                           # (etc.)

Note that plain 'for' looping over a dict delivers the keys:

for key in mydict:
    print(key)             # prints a, then b, then c...

review: sorting any container with sorted()

With any container or iterable (list, tuple, file), sorted() returns a list of sorted elements.

namelist = ['jo', 'pete', 'michael', 'zeb', 'avram']

slist = sorted(namelist)          # ['avram', 'jo', 'michael', 'pete', 'zeb']

Remember that no matter what container is passed to sorted(), the function returns a list. Also remember that the reverse=True argument to sorted() can be used to sort the items in reverse order.
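
A quick sketch of both points, reusing the list from above plus a made-up tuple:

```python
namelist = ['jo', 'pete', 'michael', 'zeb', 'avram']

rlist = sorted(namelist, reverse=True)
print(rlist)              # ['zeb', 'pete', 'michael', 'jo', 'avram']

print(namelist)           # the original list is unchanged

mytuple = ('b', 'c', 'a')
tlist = sorted(mytuple)   # a tuple goes in, a list comes out
print(tlist)              # ['a', 'b', 'c']
```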

sorting a dict (sorting its keys)

sorted() returns a sorted list of a dict's keys.

bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}

sorted_keys = sorted(bowling_scores)

print(sorted_keys)     # [ 'alice', 'jeb', 'mike', 'zeb' ]

for key in sorted_keys:
    print(f'{key}={bowling_scores[key]}')

sorting a dictionary's keys by its values

A special "sort criteria" argument can cause Python to sort a dict's keys by its values.

bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}

sorted_keys = sorted(bowling_scores, key=bowling_scores.get)

print(sorted_keys)                 # ['zeb', 'jeb', 'alice', 'mike']

for player in sorted_keys:
    print(f"{player} scored {bowling_scores[player]}")

        ##  zeb scored 98
        ##  jeb scored 123
        ##  alice scored 184
        ##  mike scored 202

The key= argument allows us to specify an alternate criterion by which to sort the keys. The .get() method takes a key and returns its value from the dict, which is what we are asking sorted() to do with each key when sorting by value. However, this complex sorting is a more advanced topic than we can cover here.
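
For a ranking we usually want the highest value first. One way to get that, sketched with the same bowling scores, is to combine key= with reverse=True and then slice the result:

```python
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}

# sort the keys by their values, highest first
ranked = sorted(bowling_scores, key=bowling_scores.get, reverse=True)
print(ranked)                      # ['mike', 'alice', 'jeb', 'zeb']

top_two = ranked[0:2]              # a sorted list can be sliced
for player in top_two:
    print(f"{player} scored {bowling_scores[player]}")
```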

assign multiple values to individual variables

multi-target assignment performs the assignments in one statement

csv_line = "Haddad's,PA,239.50"

row = csv_line.split(',')        # ["Haddad's", 'PA', '239.50']

codata = ["Haddad's", 'PA', '239.50']

company, state, revenue = codata

print(company)       # "Haddad's"
print(revenue)       # 239.50

csv_line = 'jk43:23 Marfield Ln.:Plainview:NY:10024'

stuid, street, city, state, zipcode = csv_line.split(':')   # 'zipcode' avoids hiding the built-in zip()

print(stuid)      # 'jk43'
print(city)       # 'Plainview'

build up a dict from two fields in a file

As with all containers, we loop through a data source, select and add to a dict.

ids_states = {}                # initialize an
                               # empty dict

fh = open('student_db.txt')
for line in fh:
    stuid, street, city, state, zipcode = line.rstrip().split(':')

    ids_states[stuid] = state  # key id is paired to
                               # student's state


print("here is the state for student 'jb29':  ")
print(ids_states['jb29'])       #  NJ

fh.close()

Dictionaries: Aggregations

dict aggregations

A "counting" or "summing" dictionary answers the question "how many of each" or "how much of each".

Aggregations may answer the following questions:

how many students are from each state or country? (count)
how many cars are sold by each automaker? (count)
what is the total $ sales generated by each sales associate? (sum)
what is the total number of hours billed to each client? (sum)

The dict is used to store this information. Each unique key in the dict will be associated with a count or a sum, depending on how many we found in the data source or the sum of values associated with each key in the data source.

building a counting dict

A "counting" dict increments the value associated with each key, and adds keys as new ones are found.

Customarily we loop through data, using the dictionary to keep a tally as we encounter items.

state_count = {}                     # initialize an empty dict

fh = open('revenue.csv')

for line in fh:

    items = line.split(',')       # ["Haddad's", 'PA', '239.50']
    state = items[1]              # str, 'PA'

    if state not in state_count:
        state_count[state] = 0

    state_count[state] = state_count[state] + 1


print(state_count)                # {'PA': 2, 'NJ': 2, 'NY': 3}

print("here is the count of states from revenue.csv:  ")
for state in state_count:
    print(f"{state}:  {state_count[state]} occurrences")

print("here is the count for 'NY':  ")
print(state_count['NY'])                   # 3

fh.close()

building a summing dict

A "summing" dict sums the value associated with each key, and adds keys as new ones are found.

As with a counting dict, we loop through data, using the dictionary to keep a tally as we encounter items.

state_sum = {}                     # initialize an empty dict

fh = open('revenue.csv')

for line in fh:

    items = line.split(',')          # ["Haddad's", 'PA', '239.50']
    state = items[1]                 # str, 'PA'
    value = float(items[2])          # float, 239.5

    if state not in state_sum:
        state_sum[state] = 0

    state_sum[state] = state_sum[state] + value


print(state_sum)      # {'PA': 263.45, 'NJ': 265.4, 'NY': 133.16}

print("here is the sum for 'NY':  ")
print(state_sum['NY'])                 # 133.16

fh.close()

dictionary size with len()

len() counts the pairs in a dict.

mydict = {'a': 1, 'b': 2, 'c': 3}

print(len(mydict))                 # 3 (number of keys in dict)

sidebar: dict .get() method

This method may be used to retrieve a value without checking the dict to see if the key exists.

mydict = {'a': 1, 'b': 2, 'c': 3}

xx = mydict.get('a', 0)          # 1 (key exists so paired value is returned)

yy = mydict.get('zzz', 0)        # 0 (key does not exist so the
                                 #    default value is returned)

You may use any value as the default. This method is sometimes used as an alternative to testing for a key in a dict before reading it -- avoiding the KeyError exception that occurs when trying to read a nonexistent key.
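
For example, a counting dict can be built without the "if key not in dict" test by letting .get() supply the starting value of 0. This is a sketch with made-up sample data rather than a file:

```python
states = ['PA', 'NY', 'NJ', 'NY', 'PA', 'NY']    # made-up sample data

state_count = {}
for state in states:
    # .get() returns the current count, or 0 if the key is new
    state_count[state] = state_count.get(state, 0) + 1

print(state_count)        # {'PA': 2, 'NY': 3, 'NJ': 1}
```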

sidebar: obtaining keys of a dict

The .keys() method gives access to the keys in a dict.

mydict = {'a': 1, 'b': 2, 'c': 3}

these_keys = mydict.keys()

for key in these_keys:
    print(key)

print(list(these_keys))            # ['a', 'b', 'c']

sidebar: obtaining values of a dict

The .values() method gives access to the values in a dict.

mydict = {'a': 1, 'b': 2, 'c': 3}

values = list(mydict.values())     # [1, 2, 3]

if 3 in mydict.values():
    print("3 was found")

for value in mydict.values():
    print(value)

The values cannot be used to get the keys - it's a one-way lookup from the keys. However, we might want to check for membership in the values, or sort or sum the values, or some other less-used approach.
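
A short sketch of summing and sorting the values, using the bowling scores from earlier:

```python
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}

scores = bowling_scores.values()

total = sum(scores)          # 607
highest = max(scores)        # 202
ordered = sorted(scores)     # [98, 123, 184, 202]

print(total, highest, ordered)
```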

sidebar: using the dict .items() method

.items() gives key/value pairs as 2-item tuples.

mydict = {'a': 1, 'b': 2, 'c': 3}

print(list(mydict.items()))         # [('a', 1), ('b', 2), ('c', 3)]

for key, value in mydict.items():
    print(key, value)               # a 1
                                    # b 2
                                    # c 3

.items() is usually used as another approach for looping through a dict. With each iteration of 'for', or each item when converted to a list, we see a 2-item tuple. The first item is a key, and the second a value. When looping with 'for', since each iteration produces a 2-item (key/value) tuple, we can assign the key and value to variable names and use them immediately, rather than resorting to subscripting. This is usually easier and it is also more efficient.

sidebar: working with dict items()

dict items() can give us a list of 2-item tuples. dict() can convert this list back to a dictionary.

mydict = {'a': 1, 'b': 2, 'c': 3}
these_items = list(mydict.items())    # [('a', 1), ('b', 2), ('c', 3)]

newdict = dict(these_items)

print(newdict)                        # {'a': 1, 'b': 2, 'c': 3}

2-item tuples can be sorted and sliced, so they are a handy alternate structure.
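
Here is one way this might look: sort the tuples by value, slice off the top two, and convert the slice back to a dict. The helper function second_item() is a made-up name for this sketch; it simply returns the value from each (key, value) tuple so that sorted() can sort by it:

```python
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}

def second_item(pair):
    """ hypothetical helper: return the value from a (key, value) tuple """
    return pair[1]

items = list(bowling_scores.items())

by_score = sorted(items, key=second_item, reverse=True)
top_two = by_score[0:2]          # slice the sorted list of tuples

print(top_two)                   # [('mike', 202), ('alice', 184)]
print(dict(top_two))             # {'mike': 202, 'alice': 184}
```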

sidebar: converting parallel lists to tuples

zip() zips up parallel lists into tuples; dict() can convert this to a dict.

list1 = ['a', 'b', 'c', 'd']
list2 = [ 1,   2,   3,   4 ]

tupes = list(zip(list1, list2))

print(tupes)          # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
print(dict(tupes))    # {'a': 1,    'b': 2,   'c': 3,   'd': 4}

Occasionally we are faced with two lists that relate to each other on a 1-to-1 basis... or, we sometimes even shape our data into this form. Parallel lists like these can be zipped into multi-item tuples.

Exception Trapping

exception trapping: handling errors after they occur

Introduction: unanticipated vs. anticipated errors

Think of errors as being of two general kinds -- unanticipated and anticipated:

unanticipated errors happen due to errors in our code, which we usually find during development. We respond to these by fixing our code so the errors don't occur.
anticipated errors are ones that we know could occur due to external circumstances

Examples of anticipated errors:

we ask the user for a number to convert to int() but they give us a non-number
we try to open a file, but it has been moved or deleted
we try to connect to a database, but it is down

KeyError: when a dictionary key cannot be found.

If the user enters a key that is not in the dict, we can expect this error.

mydict = {'1972': 3.08, '1973': 1.01, '1974': -1.09}

uin = input('please enter a year: ')         # user enters 9999

print(f'mktrf for {uin} is {mydict[uin]}')

  #  Traceback (most recent call last):
  #    File "/Users/david/test.py", line 5, in <module>
  #      print(f'mktrf for {uin} is {mydict[uin]}')
  #                                  ~~~~~~^^^^^
  #  KeyError: '9999'

ValueError: when the wrong value is used with a function or statement.

If we ask the user for a number, we can anticipate that they might not give us one.

uin = input('please enter an integer:  ')

intval = int(uin)                           # user enters 'hello'

print(f'{uin} doubled is {intval*2}')

  #  Traceback (most recent call last):
  #    File "/Users/david/test.py", line 3, in <module>
  #      intval = int(uin)                           # user enters 'hello'
  #               ^^^^^^^^
  #  ValueError: invalid literal for int() with base 10: 'hello'

FileNotFoundError: when a file can't be found.

If we attempt to open a file that has been moved or deleted, we can expect this error.

filename = 'thisfile.txt'

fh = open(filename)

  #  Traceback (most recent call last):
  #    File "/Users/david/test.py", line 3, in <module>
  #      fh = open(filename)
  #           ^^^^^^^^^^^^^^
  #  FileNotFoundError: [Errno 2] No such file or directory: 'thisfile.txt'

handling errors approach: "asking for permission"

Up to now we have managed anticipated errors by testing to make sure an action will be successful.

Examples of testing for anticipated errors:

we test the user's input to see that it's a number before attempting to convert to int()
we check to see if a file exists before attempting to open it
we check to see if a database is online before we try to connect to it

So far we have been dealing with anticipated errors by checking first -- for example, using .isdigit() to make sure a user's input is all digits before converting to int().
However, there is an alternative to "asking for permission": begging for forgiveness.
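
As a reminder, the "asking for permission" approach looks something like this sketch. The list sample_inputs is a stand-in for what a user might type, so the example can run without input():

```python
sample_inputs = ['42', 'hello', '7']     # stand-ins for user input

doubled = []
for uin in sample_inputs:
    if uin.isdigit():                    # ask permission: all digits?
        doubled.append(int(uin) * 2)     # safe to convert
    else:
        print(f"sorry, '{uin}' is not an integer")

print(doubled)                           # [84, 14]
```

One limitation of this test is that a perfectly valid integer string like '-5' does not pass .isdigit(), because of the minus sign.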

handling errors approach: "begging for forgiveness"

the try block and except block

try:
    uin = input('please enter an integer:  ')   # user enters 'hello'
    intval = int(uin)                           # int() raises a ValueError
                                                # ('hello' is not a valid value)

    print(f'{uin} doubled is {intval*2}')

except ValueError:
    exit('sorry, I needed an int')   # the except block cancels the
                                     # ValueError and takes action

Trapping exceptions is an alternative to testing ahead of time: taking action after an anticipated error occurs ("Begging for Forgiveness" rather than "Asking for Permission")
we trap the exception with try, then we handle it with except
the try: block will contain statements from which a potential error condition is anticipated
the except: block will identify the anticipated exception and contain statements to be executed if the exception occurs

the procedure for setting up exception handling

It's important to witness the exception and where it occurs before attempting to trap it.

It's strongly recommended that you follow a specific procedure in order to trap an exception:

allow the exception to occur
note the exception type and line number where it occurs
wrap the line that caused the error in a try: block
wrap statements you would like to be executed if the error occurs in an except: block
test that when the exception is raised, the except block is executed
test that when the exception is not raised, the except block is not executed
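
Applied to the FileNotFoundError we saw earlier, the end result of this procedure might look like the sketch below. The path points into a freshly created, empty temporary directory, so the file is assumed not to exist:

```python
import os
import tempfile

# a file inside a brand-new empty directory, so it cannot exist
filename = os.path.join(tempfile.mkdtemp(), 'thisfile.txt')

try:
    fh = open(filename)          # the line that raised the exception
    text = fh.read()
    fh.close()
    status = 'file was read'
except FileNotFoundError:        # the exception type we noted
    status = f'sorry, could not find {filename}'

print(status)
```

To complete the last two steps of the procedure, we would run this once with a missing file and once with a file that exists, and confirm that the except: block runs only in the first case.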

trapping multiple exceptions

Multiple exceptions can be trapped using a tuple of exception types.

companies = ['Alpha', 'Beta', 'Gamma']

user_index = input('please enter a ranking:  ')   # user enters '4' or 'hello'

try:
    list_idx = int(user_index) - 1

    print(f'company at ranking {user_index} is {companies[list_idx]}')

except (ValueError, IndexError):
    exit(f'please enter a ranking from 1 to {len(companies)}')

Here we trap two anticipated errors: if the user types a non-number and a ValueError exception is raised, or an invalid list index and an IndexError is raised, the except: block will be executed.

chaining except: blocks

The same try: block can be followed by multiple except: blocks, which we can use to specialize our response to the exception type.

companies = ['Alpha', 'Beta', 'Gamma']

user_index = input('please enter a ranking:  ')   # user enters '4'

try:
    list_idx = int(user_index) - 1

    print(f'company at ranking {user_index} is {companies[list_idx]}')

except ValueError:
    exit('please enter a numeric ranking')

except IndexError:
    exit(f'max ranking is {len(companies)}')

The exception raised will be matched against each type in order, and the first matching except: block will be executed.

avoiding except: and except Exception:

When we don't specify an exception, Python will trap any exception. This is a bad practice.

ui = input('please enter a number: ')

try:
    fval = float(ui)
except:                  # AVOID!!  Should be 'except ValueError:'
    exit('please enter a number - thank you')

Why is this a bad practice?

except: or except Exception: can trap any type of error, so an unexpected error could go undetected
except: or except Exception: does not specify which type of exception was expected, so it is less clear to the reader

There are certain limited circumstances under which we might use except: by itself, or except Exception. These might include wrapping the whole program execution in a try: block and trapping any exception that is raised so the error can be logged and the program doesn't need to exit as a result.
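
A rough sketch of that "last resort" pattern follows. The function main() and its contents are hypothetical, standing in for an entire program, and writing to sys.stderr stands in for a real log:

```python
import sys

def main():
    # hypothetical program body; the bad conversion stands in
    # for any unexpected error deep inside the program
    return int('not a number')

try:
    main()
    outcome = 'finished normally'
except Exception as exc:
    # record the exception type and message so the error is not lost
    outcome = f'{type(exc).__name__}: {exc}'
    sys.stderr.write(f'logged error:  {outcome}\n')

print(outcome)
```

Note that the error is still reported, not silenced. The point is to record it in one place, not to hide it.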

Command Line: Moving Around and Executing a Script

The Command Line

The Command Line (also known as "Command Prompt" or "Terminal Prompt") gives us access to the Operating System's files and programs.

Before the graphical user interface was invented, programmers used a text-based interface called the command line to run programs and read and write files. Programmers still make heavy use of the command line because it provides a much more efficient way to communicate with the operating system than Windows File Explorer or Mac Finder. It is the "power user's" way of talking to the OS, and it should be considered essential for anyone wanting to develop their programming skills. To reach the command line, you must search for and open one of these programs:

On Windows -- search for Command Prompt:

Microsoft Windows [Version 10.0.18363.1016]          # these 2 lines may look different
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Users\david>                       < -- command line

On Mac -- search for Terminal:

Last login: Thu Sep  3 13:46:14 on ttys001

Davids-MBP-3:~ david %                 < -- command line

Your command line will look similar to those shown above, but will have different names and directory paths (for example, your username instead of 'david'). Your prompt may also feature a dollar sign ($) instead of a percent sign. After opening the command line program on your computer, note the blinking cursor: this is the OS awaiting your next command.

The Present Working Directory (pwd)

Your command line session works from one directory location at a time.

When you first launch the command line program, you are placed at a specific directory within your filesystem. We call this the "present working directory". You may "move around" the system, and when you do, your pwd will change. By default, your initial pwd is your home directory -- the directory at which all your individual files are stored. This directory is usually named after your username, and can be found at /Users/[username] or C:\Users\[username].

On Windows: Your present working directory is always displayed as the command prompt.

C:\Users\david>

On Mac: Your present working directory can be shown by using the pwd command:

Davids-MBP-3:~ david % pwd
/Users/david

As we move around the filesystem, we will see the present working directory change. You must always be mindful of the pwd as it is your current location and it will affect how you can access other files and programs in the filesystem.

Listing files in the present working directory: 'ls' or 'dir'

We can list out the contents (files and folders) of any directory.

On Mac, use the 'ls' command to see the files and folders in the present working directory:

Davids-MBP-3:~ david % ls

Applications
Desktop
Documents
Downloads
Dropbox
Library
Movies
Music
Public
PycharmProjects
Sites
archive
ascii_test.py
requests_demo.py
static.zip

On Windows, use the 'dir' command to see the files and folders in the present working directory:

C:\Users\david> dir

 Volume Serial Number is 0246-9FF7

 Directory of C:\Users\david

08/29/2020  11:37 AM    <DIR>          .
08/29/2020  11:37 AM    <DIR>          ..
05/29/2020  06:27 PM    <DIR>          .astropy
05/29/2020  06:35 PM    <DIR>          .config
05/29/2020  06:36 PM    <DIR>          .matplotlib
08/07/2020  10:33 AM             1,460 .python_history
08/29/2020  11:28 AM    <DIR>          3D Objects
08/29/2020  11:28 AM    <DIR>          Contacts
08/29/2020  12:50 PM    <DIR>          Desktop
08/29/2020  11:28 AM    <DIR>          Documents
09/02/2020  10:25 AM    <DIR>          Downloads
08/29/2020  11:28 AM    <DIR>          Favorites
08/29/2020  11:28 AM    <DIR>          Links
08/29/2020  11:28 AM    <DIR>          Music
08/29/2020  11:29 AM    <DIR>          OneDrive
08/29/2020  11:28 AM    <DIR>          Pictures
08/29/2020  12:46 PM    <DIR>          PycharmProjects
08/29/2020  11:28 AM    <DIR>          Saved Games
08/29/2020  11:28 AM    <DIR>          Searches
08/29/2020  11:28 AM    <DIR>          Videos
               1 File(s)          1,460 bytes
              20 Dir(s)   7,049,539,584 bytes free

Moving Around the Directory Tree With 'cd'

The 'change directory' command moves us 'up' or 'down' the tree.

To move around the filesystem (i.e. to change the present working directory), we use the cd ("change directory") command. In the examples below, note how the present working directory changes after we move. [Please note: in the paths below you'll see that my class project directory python_data_ipy/ is in my Downloads/ directory (i.e., at /Users/david/Downloads/python_data_ipy). If you want your output and directory moves to match mine, you can put yours there -- or you can substitute your own directory path for the one I'm using.]

on Mac:

Davids-MBP-3:~ david % pwd
/Users/david

Davids-MBP-3:~ david % cd Downloads

Davids-MBP-3:Downloads david % pwd
/Users/david/Downloads

on Windows:

C:\Users\david> cd Downloads
C:\Users\david\Downloads>

So using the ls or dir command together with the cd command, we can travel from directory to directory, listing out the contents of each directory to decide where to go next (for Windows in the below examples, simply substitute the dir command for ls -- also note that Windows output for dir will look different than below):

Davids-MBP-3:Downloads david % pwd
/Users/david/Downloads

Davids-MBP-3:Downloads david % ls       dir on Windows
python_data_ipy
[... likely other files/folders as well ...]

Davids-MBP-3:Downloads david % cd python_data_ipy

Davids-MBP-3:python_data_ipy david % pwd
/Users/david/Downloads/python_data_ipy

Davids-MBP-3:python_data_ipy david % ls       dir on Windows

session_00_test_project
session_01_objects_types
session_02_funcs_condits_methods
session_03_strings_lists_files
session_04_containers_lists_sets
session_05_dictionaries
session_06_multis
session_07_functions_power_tools
session_08_files_dirs_stdout
session_09_funcs_modules
session_10_classes
username.txt

Davids-MBP-3:python_data_ipy david % cd session_06_multis/

Davids-MBP-3:session_06_multis david % ls    dir on Windows
warmup_exercises
inclass_exercises
notebooks_inclass_warmup
[...several more files and folders, may be in a different order...]

Davids-MBP-3:session_06_multis david % cd inclass_exercises

Davids-MBP-3:inclass_exercises david % ls    dir on Windows
inclass_6.1.py
inclass_6.2.py
inclass_6.3.py
inclass_6.4.py
inclass_6.5.py
inclass_6.6.py
...

Davids-MBP-3:inclass_exercises david % pwd
/Users/david/Downloads/python_data_ipy/session_06_multis/inclass_exercises

The 'parent directory'

The '..' (double dot) indicates the parent and can move us one directory "up".

As you saw, we can move "down" the directory tree by using the name of the next directory -- this extends the path (Mac paths will of course look different; use pwd to confirm your present working directory):

C:\Users\david> cd Desktop

C:\Users\david\Desktop>

But if we'd like to travel up the directory tree, we use the special directory shortcut .. which signifies the parent directory:

C:\Users\david\Desktop> cd ..

C:\Users\david> cd ..

C:\Users>

We can also travel directly to an inner folder by using the full path. In order to complete the next exercise, I'll travel to an inner folder within my project directory (again, yours may be different depending on where you put the project folder):

C:\Users> cd david\Downloads\python_data_ipy\session_06_multis\inclass_exercises

Executing a Python Script from the Command Line

This is the "true" way to ask Python to execute our script.

Every developer should be able to execute scripts through the command line, without having to use an IDE like PyCharm or Jupyter. If you are in the same directory as the script, you can execute a program by running Python and telling Python the name of the script:

On Windows:

C:\Users\david\Downloads\python_data_ipy\session_06_multis\inclass_exercises> python inclass_6.1.py

On Mac:

Davids-MBP-3:inclass_exercises david % python3 inclass_6.1.py

Unless you've changed it, you won't see any result from running this program, because it does not print anything. Make a change and run it again to see the result! Each week we'll try to spend a few minutes traveling to and executing one or more Python programs from the command line.

The JSON File Format and Multidimensional Containers

the json file format

JavaScript Object Notation is a simple "data interchange" format for sending or storing structured data as text.

JSON is used by many web APIs and other programs for communicating data
it is "lightweight", meaning it does not require overhead as with a database server
it is text-based, so it can be easily stored to file or transmitted over a network
it was originally created for JavaScript, but became popular for use with any language
it is more flexible than CSV in describing many-to-one relationships
it uses lists and dictionaries to describe data

a sample json file

Fortunately for us, JSON resembles Python in many ways, making it easy to read and understand.

{
   "key1":  ["a", "b", "c"],
   "key2":  {
              "innerkey1": 5,
              "innerkey2": "woah"
            },
   "key3":  false,
   "key4":  null
}

JSON numbers, strings, lists and dictionaries use the same format as Python objects
JSON requires double quotes around strings
the format uses true and false (lowercased) as boolean values
it uses null as its None value
when converted, the file will become a Python container with all values converted to Python objects

reading a structure from a json file

The json.load() function decodes the contents of a JSON file.

Here's a program to read the structure shown earlier, read from a file that contains it:

import json                 # we use this module to read JSON

fh = open('sample.json')
mys = json.load(fh)         # load from a file, convert into Python container
fh.close()

print(type(mys))            # dict (the outer container of this struct)

print(mys['key2']['innerkey2'])     # woah

reading a structure from a json string

The json.loads() function decodes the contents of a JSON string.

In this example, we show what we must do if we don't have access to a file, but instead receive the data as a string, as is the case with a web request.

import json                 # we use this module to read JSON
import requests             # use 'pip install' to install


response = requests.get('https://davidbpython.com/mystruct.json')

text = response.text        # file data as a single string

mys = json.loads(text)      # load from a string, convert into Python container

print(type(mys))            # dict (the outer container of this struct)

print(mys['key2']['innerkey2'])     # woah

The requests module allows your Python program to act like a web browser, making web requests (i.e., with a URL) and downloading the response. If you try this program and receive a ModuleNotFoundError, you must run pip install requests at the command line to install it.
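
json.loads() can also be tried without a network connection by handing it a literal string. This sketch uses the sample structure from earlier and shows how the JSON values come out as Python objects:

```python
import json

# the sample structure from earlier, as a Python string
text = '''
{
   "key1":  ["a", "b", "c"],
   "key2":  {"innerkey1": 5, "innerkey2": "woah"},
   "key3":  false,
   "key4":  null
}
'''

mys = json.loads(text)

print(type(mys['key1']))            # <class 'list'>
print(mys['key2']['innerkey1'])     # 5
print(mys['key3'])                  # False  (JSON false)
print(mys['key4'])                  # None   (JSON null)
```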

printing a complex object readably: writing to a string

A nested object can be confusing to read.

If we have a multidimensional object that is squished together and hard to read, we can use json.dumps() with indent=4.

import json

obj = {'a': {'x': 1, 'y': 2, 'z': 3}, 'b': {'x': 1, 'y': 2, 'z': 3}, 'c': {'x': 1, 'y': 2, 'z': 3} }

print(json.dumps(obj, indent=4))

this prints:

{
    "a": {
        "x": 1,
        "y": 2,
        "z": 3
    },
    "b": {
        "x": 1,
        "y": 2,
        "z": 3
    },
    "c": {
        "x": 1,
        "y": 2,
        "z": 3
    }
}

sidebar: writing an object to json file

We can use json.dump() to write to a JSON file.

Dumping a Python structure to JSON

import json

wfh = open('newfile.json', 'w')  # open file for writing

obj = {'a': 1, 'b': 2}

json.dump(obj, wfh)

wfh.close()
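
To confirm what was written, we can dump a structure and then load it back. This is a sketch using a throwaway file path created just for the illustration:

```python
import json
import os
import tempfile

# a throwaway file path (hypothetical, for illustration only)
path = os.path.join(tempfile.mkdtemp(), 'newfile.json')

obj = {'a': 1, 'b': [True, None]}

wfh = open(path, 'w')
json.dump(obj, wfh)              # Python structure -> JSON text in file
wfh.close()

fh = open(path)
raw = fh.read()                  # the JSON text as it sits in the file
fh.close()

fh = open(path)
back = json.load(fh)             # JSON text -> Python structure
fh.close()

print(raw)                       # {"a": 1, "b": [true, null]}
print(back == obj)               # True
```

Note how True and None were written as true and null, then converted back on the way in.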

Introduction to User-Defined Functions

Proper Code Organization

Core principles

Here are the main components of a properly formatted program:

Triple-quoted string at top of script: "docstring" with description, author, date, etc.
imports: all imports go at the top unless they are expensive imports that may be used only inside some functions
global constants: ALL UPPERCASE variable names of values that are not expected to change and will be available everywhere
functions: all functions appear together before any "main body" code
a "main" function (optional): the "gateway" function that leads to all functions; the program could be "restarted" by calling this function
if __name__ == '__main__': in the "global" or "main body" space (meaning outside of any function), a "module gate" with a test that will be True only if the script was run directly, and False if the script was imported as a module

""" tip_calculator.py -- calculate tip for a restaurant bill
    Author:  David Blaikie dbb212@nyu.edu
    Last modified:  9/19/2017
"""

import os              # part of Python distribution (installed with Python)
import sys             # part of Python distribution (installed with Python)
import pandas as pd    # installed "3rd party" modules
import myownmod as mm  # "local" module (part of local codebase)


# constant message strings are not required to be placed
# here, but in professional programs they are kept
# separate from the logic, often in separate "config" files
MSG1 = 'A {}% tip (${}) was added to the bill, for a total of ${}.'
MSG2 = 'With {} in your party, each person must pay ${}.'


# sys.argv[0] is the program's pathname (e.g. /this/that/other.py)
# os.path.basename() returns just the program name (e.g. other.py)
USAGE_STRING = f"Usage:  {os.path.basename(sys.argv[0])}   [total amount] [# in party] [tip percentage]"


def usage(msg):
    """ print an error message and the usage string, then exit

    Args:     msg (str):  an error message
    Returns:  None (exits from here)
    Raises:   N/A (does not explicitly raise an exception)

    """
    sys.stderr.write(f'Error:  {msg}\n')
    sys.exit(USAGE_STRING)


def validate_normalize_input(args):
    """ verify command-line input

    Args:     args (list):  the command-line arguments (program name excluded)

    Returns:
        bill_amt (float):  the bill amount
        party_size (int):  the number of people
        tip_pct (float):   the percent tip to be applied (e.g. 15 for 15%)

    Raises:  N/A (does not explicitly raise an exception)

    """
    if len(args) != 3:
        usage('please enter all required arguments')

    try:
        bill_amt = float(args[0])
        party_size = int(args[1])
        tip_pct = float(args[2])
    except ValueError:
        usage('arguments must be numbers')

    return bill_amt, party_size, tip_pct


def perform_calculations(bill_amt, party_size, tip_pct):
    """
    calculate tip amount, total bill and person's share

    Args:
        bill_amt (float):  the total bill
        party_size (int):  the number in party
        tip_pct (float):  the tip percentage (e.g. 15 for 15%)

    Returns:
        tip_amt (float):  the tip in $
        total_bill (float):  the bill including tip
        person_share (float):  equal share of bill per person

    Raises:
        N/A (does not specifically raise an exception)
    """

    tip_amt = bill_amt * tip_pct * .01
    total_bill = bill_amt + tip_amt
    person_share = total_bill / party_size

    return tip_amt, total_bill, person_share


def report_results(pct, tip_amt, total_bill, size, person_share):
    """ print results in formatted strings

    Args:
        pct (float):  the tip percentage (e.g. 15 for 15%)
        tip_amt (float):  the tip in $
        total_bill (float):  the bill including tip
        size (int):  the party size
        person_share (float):  equal share of bill per person
    Returns:
        None (prints result)

    Raises:
        N/A
    """

    print(MSG1.format(pct, tip_amt, total_bill))
    print(MSG2.format(size, person_share))


def main(args):
    """ execute script

    Args:     args (list):  the command-line arguments
    Returns:  None
    Raises:   N/A

    """

    bill, size, pct = validate_normalize_input(args)
    tip_amt, total_bill, person_share = perform_calculations(bill, size,
                                                             pct)

    report_results(pct, tip_amt, total_bill, size, person_share)


if __name__ == '__main__':            # 'main body' code

    main(sys.argv[1:])

The code inside the if __name__ == '__main__' block is intended to be the call that starts the program. If this Python script is imported, the main() function will not be called, because the if test will only be true if the script is executed, and will not be true if it is imported. We do this in order to allow the script's functions to be imported and used without actually running the script -- we may want to test the script's functions (unit testing) or make use of a function from the script in another program. Whether we intend to import a script or not, it is considered a "best practice" to build all of our programs in this way -- with a "main body" of statements collected under function main(), and the call to main() inside the if __name__ == '__main__' gate. This structure will be required for all assignments submitted for the remainder of the course.

user-defined functions

A user-defined function is a block of code that can be executed by name.

def add(val1, val2):
    valsum = val1 + val2
    return valsum

ret = add(5, 10)           # int, 15

ret2 = add(0.3, 0.9)       # float, 1.2

A function is a block of code:

that can be executed by name ("calling" the function)
that can be executed with different inputs
that can be executed repeatedly

user defined functions: calling the function

A user-defined function is simply a named code block that can be executed any number of times.

def print_hello():
    print("Hello, World!")

print_hello()             # prints 'Hello, World!'
print_hello()             # prints 'Hello, World!'
print_hello()             # prints 'Hello, World!'

we are calling the function 3 times in this code
a function call is marked by parentheses after the function name -- this means that the function block is run

user defined functions: arguments

The argument is the input to a function.

def print_hello(greeting, person):              # note we do not
    full_greeting = f'{greeting}, {person}!'    # refer to 'name1'
    print(full_greeting)                        # 'place2', etc.
                                                # inside the function
name1 = 'Hello'
place1 = 'World'

print_hello(name1, place1)             # prints 'Hello, World!'


name2 = 'Bonjour'
place2 = 'Python'

print_hello(name2, place2)             # prints 'Bonjour, Python!'

we are calling the function, passing two arguments (inputs) to the call
the arguments are given new names in the function definition (greeting, person), and the function refers to them by these names
(the argument objects that were passed are assigned to the argument names -- nothing is copied; they are the same objects with new names)

user defined functions: function return values

A function's return value is passed back from the function using the return statement.

def print_hello(greeting, person):
  full_greeting = f'{greeting}, {person}!'
  return full_greeting

msg = print_hello('Bonjour', 'parrot')

print(msg)                                       # 'Bonjour, parrot!'

now instead of printing the greeting, we are returning it
a value is returned from the function using the return statement
the value returned is assigned to the variable on the left side of the call (msg =)

Set Operations, List Comprehensions

Advanced Container Processing

In this unit we will complete our tour of the core Python data processing features.

So far we have explored the reading and parsing of data; the loading of data into built-in structures; and the aggregation and sorting of these structures. This unit explores advanced tools for container processing. List comprehensions and set comparisons are two "power tools" that do things we have already been able to do -- looping through a list and doing the same thing to each element, looping through and selecting items from a list, and comparing two collections to see what is common or different between them -- but more concisely.

set operations

a = {'a', 'b', 'c'}
b = {'b', 'c', 'd'}
print(a.difference(b))            # {'a'}
print(a.union(b))                 # {'a', 'b', 'c', 'd'}
print(a.intersection(b))          # {'b', 'c'}
print(a.symmetric_difference(b))  # {'a', 'd'}

list comprehensions

a = ['hello', 'there', 'harry']
print([ var.upper() for var in a if var.startswith('h') ])
                           # ['HELLO', 'HARRY']

ternary assignment

rev_sort = True if user_input == 'highest' else False

pos_val = x if x >= 0 else x * -1

conditional assignment

val = this or that       # this if this is truthy, else that
val = this and that      # this if this is falsy, else that
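
A short sketch of how or and and choose a value may make this concrete. The variable names below are made up for illustration. The key point is that both operators return one of their operands, not necessarily True or False.

```python
# 'or' returns the first truthy operand (or the last operand if none is truthy)
user_name = ''
display_name = user_name or 'Anonymous'     # '' is falsy, so the default is used

# 'and' returns the first falsy operand (or the last operand if all are truthy)
items = ['a', 'b']
first_item = items and items[0]             # list is non-empty, so items[0] is used

empty = []
first_of_empty = empty and empty[0]         # [] is falsy, so [] itself is returned

print(display_name)      # Anonymous
print(first_item)        # a
print(first_of_empty)    # []
```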

Container processing: Set Comparisons

We have used the set to create a unique collection of objects. The set also allows comparisons of sets of objects. Methods like set.union (complete member list of two or more sets), set.difference (elements found in this set not found in another set) and set.intersection (elements common to both sets) are fast and simple to use.

set_a = {1, 2, 3, 4}
set_b = {3, 4, 5, 6}

print(set_a.union(set_b))           # {1, 2, 3, 4, 5, 6}  (same as set_a | set_b)
print(set_a.difference(set_b))      # {1, 2}              (same as set_a - set_b)
print(set_a.intersection(set_b))    # {3, 4}              (same as set_a & set_b)
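
Sets also support operator forms of these methods. The sketch below uses invented data (a class roster and a list of submissions) to show one way the comparisons might be applied; sorted() is used only to make the printed order predictable, since sets are unordered.

```python
enrolled = {'ana', 'ben', 'cho', 'dee'}
submitted = {'ben', 'dee', 'eli'}

missing = enrolled - submitted          # difference
both = enrolled & submitted             # intersection
everyone = enrolled | submitted         # union
only_one = enrolled ^ submitted         # symmetric difference

print(sorted(missing))     # ['ana', 'cho']
print(sorted(both))        # ['ben', 'dee']
print(sorted(everyone))    # ['ana', 'ben', 'cho', 'dee', 'eli']
print(sorted(only_one))    # ['ana', 'cho', 'eli']
```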

List comprehensions: filtering a container's elements

List comprehensions abbreviate simple loops into one line.

Consider this loop, which filters a list so that it contains only positive integer values:

myints = [0, -1, -5, 7, -33, 18, 19, 55, -100]
myposints = []
for el in myints:
  if el > 0:
    myposints.append(el)

print(myposints)                   # [7, 18, 19, 55]

This loop can be replaced with the following one-liner:

myposints = [ el for el in myints if el > 0 ]

See how the looping and test in the first loop are distilled into the one line? The first el is the element that will be added to myposints - list comprehensions automatically build new lists and return them when the looping is done.

The operation is the same, but the order of operations in the syntax is different:

# this is pseudo code
# target list = item for item in source list if test

Hmm, this makes a list comprehension less intuitive than a loop. However, once you learn how to read them, list comprehensions can actually be easier and quicker to read - primarily because they are on one line. This is an example of a filtering list comprehension - it allows some, but not all, elements through to the new list.

List comprehensions: transforming a container's elements

Consider this loop, which doubles each value in a list:

nums = [1, 2, 3, 4, 5]
dblnums = []
for val in nums:
  dblnums.append(val*2)

print(dblnums)                          # [2, 4, 6, 8, 10]

This loop can be distilled into a list comprehension thusly:

dblnums = [ val * 2 for val in nums ]

This transforming list comprehension transforms each value in the source list before sending it to the target list:

# this is pseudo code
# target list = item transform for item in source list

We can of course combine filtering and transforming:

vals = [0, -1, -5, 7, -33, 18, 19, 55, -100]
doubled_pos_vals = [ i*2 for i in vals if i > 0 ]
print(doubled_pos_vals)                # [14, 36, 38, 110]

List comprehensions: examples

If they only replace simple loops that we already know how to do, why do we need list comprehensions? As mentioned, once you are comfortable with them, list comprehensions are much easier to read and comprehend than traditional loops. They say in one statement what loops need several statements to say - and reading multiple lines certainly takes more time and focus to understand.

Some common operations can also be accomplished in a single line. In this example, we produce a list of lines from a file, stripped of whitespace:

stripped_lines = [ i.rstrip() for i in open(r'FF_daily.txt').readlines() ]

Here, we're only interested in lines of a file that begin with the desired year (1972):

totals = [ i for i in open('FF_daily.txt').readlines() if i.startswith('1972') ]

If we want the MktRF values (the leftmost floating-point value on each line) for our desired year, we could gather the bare amounts this way:

mktrf_vals = [ float(i.split()[1]) for i in open('FF_daily.txt').readlines() if i.startswith('1972') ]

And in fact we can do part of an earlier assignment in one line -- the sum of MktRF values for a year:

mktrf_sum = sum([ float(i.split()[1]) for i in open('FF_daily.txt').readlines() if i.startswith('1972') ])

From experience I can tell you that familiarity with these forms makes it very easy to construct and also to decode them very quickly - much more quickly than a 4-6 line loop.
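
The one-liners above leave the file to be closed by Python at some later point. A variation, sketched here under the assumption of a small made-up data file (not the real FF_daily.txt), opens the file in a with block and loops over the file object directly, which avoids reading all lines into a list first.

```python
import os
import tempfile

# hypothetical sample data: a date, then whitespace-separated float columns
sample = (
    "19711231    0.10    0.02\n"
    "19720103    0.50   -0.01\n"
    "19720104   -0.25    0.03\n"
    "19730102    1.00    0.00\n"
)

data_path = os.path.join(tempfile.mkdtemp(), 'sample_daily.txt')
with open(data_path, 'w') as wfh:
    wfh.write(sample)

with open(data_path) as fh:             # closed automatically after the block
    vals_1972 = [float(line.split()[1]) for line in fh
                 if line.startswith('1972')]

print(vals_1972)         # [0.5, -0.25]
print(sum(vals_1972))    # 0.25
```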

List Comprehensions with Dictionaries

Remember that dictionaries can be expressed as a list of 2-element tuples, converted using items(). Such a list of 2-element tuples can be converted back to a dictionary with dict():

mydict =  {'a': 5, 'b': 0, 'c': -3, 'd': 2, 'e': 1, 'f': 4}

my_items = list(mydict.items())      # my_items is now [('a',5), ('b',0), ('c',-3), ('d',2), ('e',1), ('f',4)]
mydict2 = dict(my_items)       # mydict2 is now   {'a':5,   'b':0,   'c':-3,   'd':2,   'e':1,   'f':4}

It becomes very easy to filter or transform a dictionary using this structure. Here, we're filtering a dictionary by value - accepting only those pairs whose value is larger than 0:

mydict = {'a': 5, 'b': 0, 'c': -3, 'd': 2, 'e': -22, 'f': 4}
filtered_dict = dict([ (i, j) for (i, j) in mydict.items() if j > 0 ])

Here we're switching the keys and values in a dictionary, and assigning the resulting dict back to mydict, thus seeming to change it in-place:

mydict = dict([ (j, i) for (i, j) in mydict.items() ])

Python database modules (for example, sqlite3) typically return each row of results as a tuple. Here we're pulling two of the three values from each row and folding them into a dictionary.

# 'tuple_db_results' simulates what a database returns
tuple_db_results = [
  ('joe', 22, 'clerk'),
  ('pete', 34, 'salesman'),
  ('mary', 25, 'manager'),
]

names_jobs = dict([ (name, role) for name, age, role in tuple_db_results ])
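
Python also has a dedicated dictionary comprehension syntax, which builds the dict directly without the intermediate list of tuples. The sketch below repeats the filtering and key/value swapping examples in that form; the name swapped is introduced here for illustration.

```python
mydict = {'a': 5, 'b': 0, 'c': -3, 'd': 2, 'e': -22, 'f': 4}

# filter by value: keep only pairs whose value is larger than 0
filtered_dict = {key: val for key, val in mydict.items() if val > 0}

# swap keys and values
swapped = {val: key for key, val in mydict.items()}

print(filtered_dict)     # {'a': 5, 'd': 2, 'f': 4}
print(swapped[5])        # a
```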

The Command Prompt: Program Arguments

sys.argv to capture command line arguments

sys.argv is a list that holds string arguments entered at the command line

a python script get_args.py

import sys                           # import the sys library

print('first arg: ' + sys.argv[1])   # print first command line arg
print('second arg: ' + sys.argv[2])  # print second command line arg

running the script from the command line, with two arguments

$ python get_args.py hello there
first arg: hello
second arg: there

sys.argv is a list that is automatically provided by the sys module.
This list contains any string arguments to the program that were entered at the command line by the user.
If the user does not type arguments at the command line, then they will not be added to the sys.argv list.

The default item in sys.argv: the program name

sys.argv[0] will always contain the name of our program.

sys.argv[0] always contains the name of the program itself
or, it may contain the pathname that was used to execute the program (i.e., directory path to the program filename)
even if no arguments are passed at the command line, sys.argv always holds this one value

a python script print_args.py

import sys
print(sys.argv)

running the script from the command line (passing 3 arguments)

$ python print_args.py hello there budgie
['print_args.py', 'hello', 'there', 'budgie']

running the script from the command line (passing no arguments)

$ python print_args.py
['print_args.py']

IndexError with sys.argv (when user passes no argument)

Since we read arguments from a list, we can trigger an IndexError if we try to read an argument that wasn't passed.

a python script addtwo.py

import sys

firstint = int(sys.argv[1])
secondint = int(sys.argv[2])

mysum = firstint + secondint

print(f'the sum of the two values is {mysum}')

passing 2 arguments

$ python addtwo.py 5 10
the sum of the two values is 15

passing no arguments

$ python addtwo.py
Traceback (most recent call last):
  File "addtwo.py", line 3, in <module>
    firstint = int(sys.argv[1])
IndexError: list index out of range

How to handle this exception? Test the len() of sys.argv, or trap the exception.
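
Both approaches can be combined, as in the sketch below. The helper name parse_two_ints is hypothetical; it takes the argument list as a parameter so that it can be tried out without actually running from the command line.

```python
def parse_two_ints(argv):
    """ return (first, second) as ints, or None if argv is unusable

    argv is expected to look like sys.argv: program name first
    """
    if len(argv) != 3:                   # program name + 2 arguments
        return None
    try:
        return int(argv[1]), int(argv[2])
    except ValueError:                   # an argument was not an integer
        return None


print(parse_two_ints(['addtwo.py', '5', '10']))    # (5, 10)
print(parse_two_ints(['addtwo.py']))               # None
print(parse_two_ints(['addtwo.py', '5', 'ten']))   # None
```

In a real script, the call would be parse_two_ints(sys.argv), followed by an error message and exit if the result is None.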

File Tests and Manipulations

os.path.isfile() and os.path.isdir()

With these we can see whether a file is a plain file, or a directory.

import os                         # os ('operating system') module talks
                                  # to the os (for file access & more)
mydirectory = '/Users/david'

items = os.listdir(mydirectory)

for item in items:

    item_path = os.path.join(mydirectory, item)

    if os.path.isdir(item_path):
        print(f"{item}:  directory")
    elif os.path.isfile(item_path):
        print(f"{item}:  file")
                                     # photos:  directory
                                     # backups:  directory
                                     # college_letter.docx:  file
                                     # notes.txt:  file
                                     # finances.xlsx:  file

.isdir() returns True if the listing is a directory
.isfile() returns True if the listing is a file

os.path.exists()

This function tests to see if a file or directory exists on the filesystem.

import os

fn = input('please enter a file or directory name:  ')
if not os.path.exists(fn):
    print('item does not exist')

elif os.path.isfile(fn):
    print('item is a file')

elif os.path.isdir(fn):
    print('item is a directory')

Keep in mind that if an item doesn't even exist, .isdir() and .isfile() will return False
This could lead you to believe that a non-existent entity is a dir based on getting False from .isfile(), or a file based on getting False from .isdir()
These methods will also return False if the item is another kind of entity -- depending on your operating system, a link, or block device, or a socket may be encountered
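
One way to avoid drawing the wrong conclusion from a False result is to check existence first and keep an explicit "other" case. The function name classify_path and the temporary files below are assumptions made for this sketch.

```python
import os
import tempfile


def classify_path(path):
    """ return 'missing', 'file', 'directory' or 'other' for a path """
    if not os.path.exists(path):
        return 'missing'
    if os.path.isfile(path):
        return 'file'
    if os.path.isdir(path):
        return 'directory'
    return 'other'                      # e.g. a socket or device


tmp_dir = tempfile.mkdtemp()
tmp_file = os.path.join(tmp_dir, 'notes.txt')
with open(tmp_file, 'w') as wfh:
    wfh.write('hello')

print(classify_path(tmp_dir))                              # directory
print(classify_path(tmp_file))                             # file
print(classify_path(os.path.join(tmp_dir, 'nothing')))     # missing
```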

read file size with os.path.getsize()

os.path.getsize() takes a filename and returns the size of the file in bytes

import os                        # os ('operating system') module
                                 # talks to the os (for file access & more)
mydirectory = '/Users/david'

items = os.listdir(mydirectory)

for item in items:
    item_path = os.path.join(mydirectory, item)
    item_size = os.path.getsize(item_path)
    print(f"{item_path}:  {item_size} bytes")

Remember, as before, that when looping through a directory, Python won't be able to find a file unless its path is prepended. This is why os.path.join() is so important.

moving or renaming a file

moving and renaming a file are essentially the same thing

import os

filename = 'file1.txt'
new_filename = 'newname.txt'

os.rename(filename, new_filename)

import os

filename = 'file1.txt'      # or could be a filepath including directory
move_to_dir = 'old/'

os.rename(filename, os.path.join(move_to_dir, filename))  # file1.txt, old/file1.txt

in the first example, we simply rename a file
in the second example, we retain the old filename, but give it a new path. This is the same as moving the file to a new location - in a sense, we're renaming the path.

copying or backing up a file

import shutil

filename = 'file1.txt'
backup_filename = 'file1.txt_bk'        # must be a filepath, including filename

shutil.copyfile(filename, backup_filename)

import shutil

filename = 'file1.txt'
target_dir = 'backup'                   # can be a filepath or just a directory name

shutil.copy(filename, target_dir)  # dst can be a folder; use shutil.copy2() to also copy metadata

shutil ("shell utilities") is a module for doing all kinds of operations on files, directories, and the like
shutil.copy() can copy a file to a new name or pathname, or to a new directory
it's important to note that with shutil.copyfile() and shutil.copy(), file "metadata" such as the modification date will not be copied with the file; shutil.copy2() attempts to preserve it
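
The difference in metadata handling can be observed directly. This sketch assumes a throwaway temporary directory and sets an old modification time on the source file (the specific timestamp is arbitrary) so that a preserved time is easy to tell apart from a fresh one.

```python
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'file1.txt')
with open(src, 'w') as wfh:
    wfh.write('some data')

# give the source an old modification time so the difference shows
old_time = 978307200                       # seconds since the epoch (early 2001)
os.utime(src, (old_time, old_time))

plain_copy = os.path.join(tmp_dir, 'file1_copy.txt')
meta_copy = os.path.join(tmp_dir, 'file1_copy2.txt')

shutil.copy(src, plain_copy)     # contents and permission bits only
shutil.copy2(src, meta_copy)     # also tries to preserve timestamps

print(int(os.path.getmtime(meta_copy)) == old_time)     # True
print(int(os.path.getmtime(plain_copy)) == old_time)    # False
```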

creating a directory: os.mkdir()

This function is named after the unix utility mkdir.

import os

os.mkdir('newdir')

removing a directory or filetree: os.rmdir() and shutil.rmtree()

If your directory has files, shutil.rmtree must be used.

import os
import shutil

os.mkdir('newdir')

wfh = open('newdir/newfile.txt', 'w')  # creating a file in the dir
wfh.write('some data')
wfh.close()

os.rmdir('newdir')        # OSError: [Errno 66] Directory not empty: 'newdir'

shutil.rmtree('newdir')   # success

if a directory is empty, os.rmdir() can remove it
if the directory is not empty, this function will fail
in that case, shutil.rmtree() will succeed
obviously, care must be taken before removing an entire file tree!
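
The sequence above stops at the os.rmdir() error if run as a script. A variation that runs to completion might trap the error and then fall back to shutil.rmtree(), as sketched here. The temporary parent directory is an assumption added so that nothing in your own folders is touched.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                    # throwaway parent directory
newdir = os.path.join(base, 'newdir')
os.mkdir(newdir)

with open(os.path.join(newdir, 'newfile.txt'), 'w') as wfh:
    wfh.write('some data')

try:
    os.rmdir(newdir)                         # fails: directory is not empty
except OSError as exc:
    print(f'os.rmdir() failed: {type(exc).__name__}')
    shutil.rmtree(newdir)                    # removes the directory and contents

print(os.path.exists(newdir))                # False
```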

copying a filetree

import shutil

shutil.copytree('olddir', 'newdir')

Regardless of what files and folders are in the directory to be copied, all files and folders (and indeed all folders and files within) will be copied to the new name or location.

File and Directory Listings

writing to files using the file object

Opening an existing file for writing truncates the file.

fh = open('new_file.txt', 'w')
fh.write("here's a line of text\n")
fh.write('I add the newlines explicitly if I want to write to the file\n')
fh.close()

Files can be opened for writing or appending, but not usually for both.
Note that we are explicitly adding newlines to the end of each line. The write() method doesn't do this for us.

appending to files

Appending is usually used for log files.

fh = open('new_file.txt', 'a')
fh.write("here's a line of text\n")
fh.write('I add the newlines explicitly if I want to write to the file\n')
fh.close()

Again, note that we are explicitly adding newlines to the end of each line.
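
To see the effect of append mode ('a'), the sketch below opens the same file three times and adds one line each time. The log file name and messages are invented for illustration. With 'w' in place of 'a', only the last line would survive.

```python
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), 'app.log')

for message in ['started', 'working', 'finished']:
    with open(log_path, 'a') as fh:      # 'a' creates the file if needed
        fh.write(message + '\n')         # each open adds to the end

with open(log_path) as fh:
    lines = fh.read().splitlines()

print(lines)                             # ['started', 'working', 'finished']
```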

show the present/current working directory.

The pwd is the location from which we run our programs.

import os

cwd = os.getcwd()        # str (your current directory)

print(cwd)

a sample file tree

this tree can be found among your course files.

dir1
├── file1.txt
├── test1.py
│
├── dir2a
│   ├── file2a.txt
│   ├── test2a.py
│   │
│   ├── dir3a
│   │   ├── file3a.txt
│   │   ├── test3a.py
│   │   │
│   │   └── dir4
│   │       ├── file4.txt
│   │       └── test4.py
└── dir2b
    ├── file2b.txt
    ├── test2b.py
    │
    └── dir3b
       ├── file3b.txt
       └── test3b.py

relative filepaths

These paths locate files relative to the present working directory.

If the file you want to open is in the present working directory (usually the directory you are in when you run the script), use the filename alone:

fh = open('filename.txt')

relative filepaths: parent directory

To reach the parent directory, prepend the filename with ../

fh = open('../filename.txt')

relative filepaths: child directory

To reach the child directory, prepend the filename with the name of the child directory.

fh = open('<childdir>/filename.txt')

DO NOT INCLUDE ANGLE BRACKETS. <childdir> should be completely replaced with the child directory name.

relative filepaths: sibling directory

To reach a sibling directory, prepend the filename with ../ and the name of the sibling directory.

fh = open('../<siblingdir>/filename.txt')

DO NOT INCLUDE ANGLE BRACKETS. <siblingdir> should be completely replaced with the sibling directory name. To reach a sibling directory, we must go "up, then down" by using ../ to go to the parent, then the sibling directory name to go down into the sibling.
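
When a relative path does not find the file you expect, it can help to see the absolute path that Python would actually use. The sketch below uses os.path.abspath(), which resolves a relative path against the present working directory. The file and directory names are placeholders, and no file needs to exist for this to work.

```python
import os

cwd = os.getcwd()

same_dir = os.path.abspath('filename.txt')
parent = os.path.abspath('../filename.txt')
sibling = os.path.abspath('../otherdir/filename.txt')   # 'otherdir' is a placeholder

print(same_dir)      # the cwd, followed by filename.txt
print(parent)        # the parent of the cwd, followed by filename.txt
print(sibling)       # the parent of the cwd, then otherdir, then filename.txt

print(same_dir == os.path.join(cwd, 'filename.txt'))                  # True
print(parent == os.path.join(os.path.dirname(cwd), 'filename.txt'))   # True
```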

absolute filepaths

These paths locate files from the root of the filesystem.

In Windows, absolute paths begin with a drive letter, usually C:\:

""" test3a.py:  open and read a file """

filepath = r'C:\Users\david\Desktop\python_data\dir1\file1.txt'
fh = open(filepath)

print(fh.read())

(Note that r'' should be used with any Windows paths that contain backslashes.)

On the Mac, absolute paths begin with a forward slash:

""" test3a.py:  open and read a file """

filepath = '/Users/david/Desktop/python_data/dir1/file1.txt'
fh = open(filepath)

print(fh.read())

(The above paths assume that the python_data folder is in the Desktop directory; you may have placed yours elsewhere on your system. Of course, the above paths also assume that my home directory is called david/; yours is likely different.)

os.path.join()

This function joins together directory and file strings with slashes appropriate to the current operating system.

import os

dirname = '/Users/david'
filename = 'journal.txt'

filepath = os.path.join(dirname, filename)   # '/Users/david/journal.txt'

filepath2 = os.path.join(dirname, 'backup', filename)  # '/Users/david/backup/journal.txt'

this function inserts slashes (when needed) in between directory name and filename elements, joining them together into one string
the slash inserted will be forward slash on Mac/Linux, and backslash on Windows machines
Keep in mind these are only strings, and this is a string join method. Its work is not directly related to any files or directories.

os.listdir(): list a directory

os.listdir() can read the contents of any directory.

import os

mydirectory = '/Users/david'

items = os.listdir(mydirectory)

for item in items:                                # 'photos'

    item_path = os.path.join(mydirectory, item)

    print(item_path)   # /Users/david/photos
                      # /Users/david/backups
                      # /Users/david/college_letter.docx
                      # /Users/david/notes.txt
                      # /Users/david/finances.xlsx

Note the os.path.join() call. This is a standard algorithm for looping through a directory -- each item must be joined to the directory to ensure that the filepath is correct.

exceptions for missing or incorrect files or directories

Several exceptions can indicate a file or directory misfire.

exception type	example trigger
FileNotFoundError	attempt to open a file not in this location
FileExistsError	attempt to create a directory (or in some cases a file) that already exists
IsADirectoryError	attempt to open() a path that is actually a directory
NotADirectoryError	attempt to os.listdir() a path that is not a directory
PermissionError	attempt to read or write a file or directory for which you lack permissions
WindowsError, OSError	these exception types are sometimes raised in place of one or more of the above when on a Windows computer (all of the above are subclasses of OSError; WindowsError is a Windows-only alias for OSError)
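
A sketch of how these exceptions might be trapped follows. The function name read_text and the returned messages are assumptions for illustration. The specific exception is listed first, with OSError (the parent class of the file-related exceptions) as a catch-all, since the exact type raised can vary by operating system.

```python
import os
import tempfile


def read_text(path):
    """ return the file's text, or a short error description """
    try:
        with open(path) as fh:
            return fh.read()
    except FileNotFoundError:
        return 'error: no such file'
    except OSError as exc:               # parent class of the file-related exceptions
        return f'error: {type(exc).__name__}'


tmp_dir = tempfile.mkdtemp()
print(read_text(os.path.join(tmp_dir, 'missing.txt')))   # error: no such file
print(read_text(tmp_dir))    # an error naming the exception (varies by OS)
```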

traversing a directory tree with os.walk()

os.walk() visits every directory in a directory tree so we can list files and folders.

import os
root_dir = '/Users/david'
for root, dirs, files in os.walk(root_dir):

    for tdir in dirs:                    # loop through dirs in this directory
        print(os.path.join(root, tdir))  # print full path to tdir

    for tfile in files:                  # loop through files in this dir
        print(os.path.join(root, tfile)) # print full path to file

os.walk() traverses an entire directory tree
at each iteration of the 'for' loop, it visits one node (i.e., directory)
starting with the supplied directory (above, root_dir), it visits each subdirectory, traveling to every node beneath the root

At each iteration, these three variables are assigned these values:

root: str, the "node" or directory currently being read
dirs: list, names of directories found in the current directory
files: list, names of files found in the current directory
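
As a small worked example, os.walk() can be used to collect every file of a given type anywhere in a tree. The sketch builds its own tiny throwaway tree (the file and folder names are made up) so that the result is predictable.

```python
import os
import tempfile

# build a small tree:  file1.txt, sub/file2.txt, sub/test2.py
root_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(root_dir, 'sub'))
for rel_path in ['file1.txt', os.path.join('sub', 'file2.txt'),
                 os.path.join('sub', 'test2.py')]:
    with open(os.path.join(root_dir, rel_path), 'w') as wfh:
        wfh.write('x')

txt_files = []
for root, dirs, files in os.walk(root_dir):
    for tfile in files:
        if tfile.endswith('.txt'):
            txt_files.append(os.path.join(root, tfile))   # full path to file

print(len(txt_files))                                   # 2
print(sorted(os.path.basename(p) for p in txt_files))   # ['file1.txt', 'file2.txt']
```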

More About User-Defined Functions

user-defined functions and code organization

User-defined functions help us organize our code -- and our thinking.

Let's now return to functions from the point of view of code organization. Functions are useful because they:

are separate from, and do not interfere with, the rest of the code
help us implement a modular program design
allow us to test the function separately ("unit testing")
help us avoid repetition in our code
help organize our thinking

review: function block, argument and return value

def add(val1, val2):
    mysum = val1 + val2
    return mysum

a = add(5, 10)      # int, 15

b = add(0.2, 0.2)   # float, 0.4

Review what we've learned about functions:

when we call the function, program execution jumps up to the function block, executes the block, and then returns
the arguments are the two values passed to the function call (inside the parentheses)
inside the function, the arguments are assigned to variables val1 and val2
the return value inside the function is mysum
once we have returned from the function, the return value is assigned to the variable on the left side of the call, i.e. a or b above

functions without a return statement return None

When a function does not return anything, it returns None.

def do(arg):
    print(f'{arg} doubled is {arg * 2}')
    # no return statement returns None

x = do(5)        # (prints '5 doubled is 10')

print(x)         # None

note that this function does not have a return statement
even so, we are assigning the return value to a variable (x =)
x would normally be assigned whatever was returned from the function
since the function does not explicitly return anything, it returns a default value (None)
None is the "value that means no value": in other languages it may be called "NULL", "undefined", "void"

Actually, since do() does not return anything useful, then we should not call it with an assignment (i.e., x = above), because no useful value will be returned. If you should call a function and find that its return value is None, it often means that it was not meant to be assigned because there is no useful return value.
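
A common place to run into this is the built-in list method sort(), which changes the list in place and returns None. The sketch below contrasts it with sorted(), which returns a new list.

```python
nums = [3, 1, 2]

result = nums.sort()        # sorts in place; there is no useful return value
print(result)               # None
print(nums)                 # [1, 2, 3]

new_list = sorted(nums, reverse=True)   # sorted() returns a new list instead
print(new_list)             # [3, 2, 1]
```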

the None object type

The None value is the "value that means 'no value'".

zz = None

print(zz)        # None
print(type(zz))  # <class 'NoneType'>

aa = 'None'      # a string -- not the None value!

None is a distinct value: it is expressed by None (no quotes)
None is a value of its own type: NoneType
it is capitalized in the same way as the boolean values True and False
None represents "nothing", "empty", "void" or "undefined"
the null/empty/void value is present in most other programming languages
it is most often used to say "nothing here", "nothing was found", "the request came up empty" and other such meanings
the value must not have quotes -- quotes would be a 4-character string, not None
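
A function can return None deliberately to signal "nothing was found". In this sketch (the function name find_first_negative is made up), the caller tests the result with is None, which is the conventional test because values like 0 or an empty string are also "falsy" but are not None.

```python
def find_first_negative(values):
    """ return the first negative number, or None if there is none """
    for val in values:
        if val < 0:
            return val
    return None


found = find_first_negative([4, 0, 7])

if found is None:                 # 'is None' is the usual test
    print('nothing found')
else:
    print(f'found {found}')

print(find_first_negative([4, -2, -9]))     # -2
```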

function argument type: positional

Positional arguments are required to be passed, and assigned by position.

def greet(firstname, lastname):
    print(f"Hello, {firstname} {lastname}!")

greet('Joe', 'Wilson')   # passed two arguments:  correct

greet('Marie')           # TypeError: greet() missing 1 required positional argument: 'lastname'

positional arguments are the ones we have used up until now
the number of arguments shown in the definition, as well as those in the call, must match
(side note) There is no type requirement for arguments to a function. Python will accept whatever objects are passed.

function argument type: keyword

Keyword args are not required, and if not passed take a default value.

def greet(lastname, firstname='Citizen'):
    print(f"Hello, {firstname} {lastname}!")

greet('Kim', firstname='Joe')   # Hello, Joe Kim!

greet('Kim')                    # Hello, Citizen Kim!

this function has one positional (lastname) and one keyword argument (firstname='Citizen')
in the def, the keyword argument specifies a default value
in the first call, the positional and keyword arguments are passed
in the second call, the keyword argument is not passed, and so the function supplies the default value for that variable
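
A few more call styles, sketched with a hypothetical make_greeting() that returns its string and has a second keyword argument (punctuation) added for illustration: keyword arguments can be passed in any order, and they can also be filled by position.

```python
def make_greeting(lastname, firstname='Citizen', punctuation='!'):
    return f"Hello, {firstname} {lastname}{punctuation}"

print(make_greeting('Kim'))                                     # Hello, Citizen Kim!
print(make_greeting('Kim', punctuation='.', firstname='Joe'))   # Hello, Joe Kim.
print(make_greeting('Kim', 'Joe'))                              # Hello, Joe Kim!
```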

User-Defined Function Variable Scoping

variable name scoping: the local variable

Variable names initialized inside a function are local to the function.

def myfunc():
    a = 10
    return a

var = myfunc()
print(var)          # 10
print(a)            # NameError ('a' does not exist here)

variable a does not exist because it was defined/assigned inside the function
assignment inside the function makes the variable local to the function
we call this behavior scoping and say that a is scoped to the function
(it is true that the value 10 survives outside the function, but scoping refers to names, not objects)

variable name scoping: the global variable

Any variable defined outside a function is global.

var = 'hello global'

def myfunc():
    print(var)

myfunc()                  # hello global

any non-local variable defined in our code is global
globals are available both inside and outside a function
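
One consequence worth seeing: assigning to a name inside a function creates a local variable, even when a global of the same name exists. In this sketch (show_local is a made-up name), the global is left unchanged.

```python
var = 'hello global'

def show_local():
    var = 'hello local'      # assignment creates a new, local 'var'
    return var

print(show_local())          # hello local
print(var)                   # hello global (unchanged)
```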

"pure" functions

Functions that do not touch outside variables, and do not create "side effects" (for example, calling exit(), print() or input()), are considered "pure" -- and are preferred.

"Pure" functions have the following characteristics:

pure functions do not read from or write to "outside" variables (instead, they work only with arguments passed to the function)
pure functions do not call input() from inside the function
pure functions do not call print() (instead, they return values to be printed outside the function)
pure functions do not call exit() (instead, they use the raise statement to signal errors - discussed later in this course)

"pure" functions: working only with "inside" (local) variables

"Outside" (Global) variables are ones defined outside the function -- they should be avoided.

wrong way: referring to an outside variable inside a function

val = '5'                   # defined outside any function

def doubleit():
    dval = int(val) * 2     # BAD:  refers to "global" variable 'val'
    return dval

new_val = doubleit()

right way: passing outside variables as arguments

val = '5'                   # defined outside any function

def doubleit(arg):
    dval = int(arg) * 2     # GOOD:  refers to same value as 'val',
    return dval             #        but accessed through local
                            #        argument 'arg'

new_val = doubleit(val)     # passing variable to function -
                            #   correct way to get a value into the function

using an outside variable creates a "dependency" between the outside variable and the function: if the variable changes, the behavior of the function changes
the outside variable could be defined in another part of the program, where we can't see it and may have lost mental track of it
one exception to this rule is the use of constants -- values that are intended never to be changed -- these could be used inside a function without this risk
however, passing the value to the function is the best approach -- this means that we can see explicitly (as well as being able to test) what value is going into the function

"pure" functions: avoiding "side-effects"

print(), input(), exit() all "touch" the outside world and in many cases should be avoided inside functions.

a "side-effect" is something that happens outside the function, but as a result of calling the function
print(): this function reflects values to the screen
input(): this function takes input from the keyboard
exit(): this function terminates program execution altogether

Although it is of course possible (and sometimes practical) to use these built-in functions inside our function, we should avoid them if we are interested in making a function "pure".

"pure" functions: why prefer them?

Here are some positive reasons to strive for purity.

You may notice that these "impure" practices do not cause errors. So why should we avoid them?

pure functions are easier to maintain and extend
pure functions are more modular and thus make it easier to control our programs
pure functions can be tested in isolation
pure functions make code more reliable and less prone to error
pure functions make errors easier to trace and fix

The above rationales will become clearer as you write longer programs.
As your programs become more complex, you will be confronted with more complex errors that are sometimes difficult to trace.
Over time you'll realize that the best practice of using pure functions enhances the "quality" of your code -- making it easier to write, maintain, extend and understand the programs you create.
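
To illustrate "tested in isolation", here is a minimal sketch using a simplified version of the doubleit() function from earlier slides. Because the function depends only on its argument, a test is nothing more than a call with a known input compared against the expected result:

```python
def doubleit(arg):
    """Pure: depends only on its argument and returns a value."""
    return int(arg) * 2

# no setup is needed: no globals to prepare, no keyboard input to fake,
# no screen output to capture
assert doubleit('5') == 10
assert doubleit('0') == 0
assert doubleit('-3') == -6

print('all tests passed')
```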

"pure" functions: using 'raise' instead of exit() inside functions

exit() should not be called inside a function.

def doubleit(arg):
    if not arg.isdigit():
        raise ValueError('arg must be all digits')   # GOOD:  error signaled with raise
    dval = int(arg) * 2
    return dval

val = input('what is your value? ')
new_val = doubleit(val)

this function requires input of a certain value - a string that is all digits
if the wrong value is passed, the function needs to respond
"pure" functions avoid exit() because it is a side-effect
the proper way to signal an error is with the raise statement

signalling errors (exceptions) with 'raise'

'raise' creates an error condition (exception) that usually terminates program execution.

Python uses exceptions to signal that it cannot continue
this may be because it doesn't understand, or can't do or won't do what we request
when writing functions, we prefer to signal an error using the raise statement rather than by calling exit() - this is how all built-in functions behave
the responsibility to exit should rest with the calling code, not with a function

To raise an exception, we simply follow raise with the type of error we would like to raise, and an optional message:

raise IndexError('I am now raising an IndexError exception')

You may raise any existing exception (you may even define your own). Here is a list of common exceptions:

Exception Type	Reason
TypeError	the wrong type used in an expression
ValueError	the wrong value used in an expression
FileNotFoundError	a file or directory is requested that doesn't exist
IndexError	use of an index for a nonexistent list/tuple item
KeyError	a requested key does not exist in the dictionary
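
The sketch below shows one way the calling code can take responsibility for an error raised inside a function. It assumes the try/except statement, which may be covered in a later unit; the main() wrapper is a hypothetical name used only for this illustration.

```python
import sys

def doubleit(arg):
    if not arg.isdigit():
        raise ValueError('arg must be all digits')
    return int(arg) * 2

def main(val):
    # the calling code, not doubleit(), decides what an error means
    try:
        return doubleit(val)
    except ValueError as err:
        print(f'bad input: {err}', file=sys.stderr)
        return None         # a real script might choose to exit here

good = main('21')
bad = main('hello')

print(good)                 # 42
print(bad)                  # None
```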

global variables and function "purity"

Globals should be used inside functions only in select circumstances.

STATE_TAX = .05    # ALL CAPS designates a "constant"


def calculate_bill(bill_amount, tip_pct):

    tax = bill_amount * STATE_TAX     # float, 5.0
    tip = bill_amount * tip_pct       # float, 20.0

    total_amount = bill_amount + tax + tip   # float, 125.0

    return total_amount


total = calculate_bill(100, .20)      # float, 125.0

here we're using global STATE_TAX inside the function
ALL_CAPS names indicate that we don't intend to change this value
because the value isn't changing, we can use it inside the function knowing what its value will be
if the global were to change, that would change the behavior of this function and lead to bugs that are hard to track down
except in very special cases, do not change a global variable (for example, a list) from inside a function, as this can create very challenging debugging situations

the four variable scopes: l-e-g-b

Four kinds of variables: (L)ocal, (E)nclosing, (G)lobal and (B)uiltin.

filename = 'pyku.txt'           # 'filename':  global

                                # 'get_text':  global (function name is a
                                #                      variable as well)
def get_text(fname):            # 'fname':     local
    fh = open(fname)            # 'fh':        local; 'open':  builtin
    text = fh.read()            # 'text':      local
    return text

txt = get_text(filename)        # 'txt':       global
print(txt)                      # 'print':     builtin

builtin variables are those defined by python: print(), len(), etc.
enclosing variables (not shown here) would be those variables defined inside a function that are used inside a nested function, i.e., a function defined inside another function
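
For completeness, here is a minimal sketch of an enclosing variable. The names make_counter, next_value, start and step are invented for this example; the point is only that the nested function can read names belonging to the function that contains it.

```python
def make_counter(start):            # 'start': local to make_counter
    step = 1                        # 'step':  local to make_counter

    def next_value(n):              # 'n':     local to next_value
        return start + n * step     # 'start', 'step':  enclosing

    return next_value

counter = make_counter(10)          # 'counter':  global
print(counter(3))                   # 13
print(len('abc'))                   # 3   ('len':  builtin)
```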

proper code organization

Core principles.

Here are the main components of a properly formatted program:

Triple-quoted string at top of script: "docstring" with description, author, date, etc.
imports: all imports go at the top unless they are expensive imports that may be used only inside some functions
global constants: ALL UPPERCASE variable names of values that are not expected to change and will be available everywhere
functions: all functions appear together before any "main body" code
a "main" function (optional): the "gateway" function that leads to all functions; the program could be "restarted" by calling this function
if __name__ == '__main__': in the "global" or "main body" space (meaning outside of any function), a "module gate" with a test that will be True only if the script was run directly, and False if the script was imported as a module

See the tip_calculator.py file in your files directory for an example and notes below.
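
The layout described above might look roughly like the following skeleton. This is not the tip_calculator.py file itself, only a hypothetical sketch that reuses the calculate_bill() example from an earlier slide; the file name tip_sketch.py is an assumption.

```python
"""
tip_sketch.py -- hypothetical example of program layout.
Author and date would normally be listed here.
"""

import math                         # imports go at the top

STATE_TAX = 0.05                    # global constant (ALL CAPS)


def calculate_bill(bill_amount, tip_pct):
    """Return the bill total including tax and tip."""
    tax = bill_amount * STATE_TAX
    tip = bill_amount * tip_pct
    return math.fsum([bill_amount, tax, tip])


def main():
    """Gateway function: the program starts (or restarts) here."""
    total = calculate_bill(100, 0.20)
    print(f'total: {total}')
    return total


if __name__ == '__main__':          # True only when run directly
    main()
```

When this file is imported as a module, the functions and the constant become available but main() is not called.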

Modules in Python

importing built-in modules

Python comes with hundreds of preinstalled modules.

import sys             # find and import the sys module
import json            # find and import the json module

print(sys.copyright)   # the .copyright attribute points
                       # to a string with the copyright notice

      # Copyright (c) 2001-2023 Python...


obj = json.loads('{"a": 1, "b": 2}')   # the .loads attribute points to
                                       # a function that reads str JSON data

print(type(obj))    # <class 'dict'>

modules are files that contain reusable Python code for use in our programs
modules are made available through an import statement
we often import several modules in a given script
following import we can access the module code through its attributes
each attribute points to a function, string, list, etc. that is a global variable in the module

other module import patterns

These patterns are purely for convenience when needed.

Abbreviating the name of a module:

import json as js           # 'json' is now referred to as 'js'

obj = js.loads('{"a": 1, "b": 2}')

Importing a module variable into our program directly:

from json import loads      # making the 'loads' function part of the global namespace

obj = loads('{"a": 1, "b": 2}')

Please note that this does not import only a part of the module: the entire module code is still imported.
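
One way to confirm this, sketched here with the sys.modules dictionary (Python's record of modules that have been loaded):

```python
import sys
from json import loads

# 'loads' is now a global name in our program
obj = loads('{"a": 1, "b": 2}')
print(obj)                      # {'a': 1, 'b': 2}

# the whole json module was still loaded by Python,
# even though we asked for only one name from it
print('json' in sys.modules)    # True
```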

built-in module examples

Each module has a specific focus.

The sys module has functions that let us work with python's interpreter and how it interacts with the operating system
The os module has functions that let us work with the operating system's files, folders and other processes
The datetime module has functions that let us easily calculate dates in the future or past, or compare two dates
The urllib.request module has functions that let us make HTTP requests over the internet

user-defined modules

A module of our own design may be saved as a .py file.

messages.py: a simple Python module that prints messages

import sys

def print_warning(msg):
    print(f'Warning!  {msg}')

test.py: a Python script that imports messages.py

import messages

# accessing the print_warning() function
messages.print_warning('Look out!')   # Warning!  Look out!

we can also build our own modules; one way is by creating a .py file with module code
upon import, the entire module is read, compiled and executed
global variables in the module then become attributes of the module

module search path

Python must be told where to find our own custom modules.

To view the currently used module search paths, we can use sys.path

import sys

print(sys.path)        # shows a list of strings, each a directory
                       # where modules can be found

the sys.path list contains all the directories that Python searches, in order, when it imports a module
to add our own folders (i.e., containing modules we'd like to import) we can augment this list with the PYTHONPATH environment variable (next)
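
As a rough illustration of how sys.path controls imports, the sketch below writes a tiny module into a temporary folder and adds that folder to sys.path at run time. This is an alternative shown only for demonstration; demo_messages and warning() are made-up names, and the change to sys.path lasts only for the current run.

```python
import importlib
import os
import sys
import tempfile

# create a throwaway folder holding a tiny module
# ('demo_messages' is a made-up name for this sketch)
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'demo_messages.py'), 'w') as fh:
    fh.write("def warning(msg):\n    return f'Warning!  {msg}'\n")

# Python searches the folders in sys.path in order; adding our
# folder to the list makes the module importable for this run only
sys.path.append(module_dir)
importlib.invalidate_caches()       # make sure the new file is noticed

import demo_messages

print(demo_messages.warning('Look out!'))    # Warning!  Look out!
```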

setting the PYTHONPATH system environment variable

Like the PATH for programs, this variable tells Python where to find modules.

when we import a module, Python needs to find it
modules may be located in any of several directories (stored in sys.path)
to extend this list and add our own directories, we add them to the PYTHONPATH
when Python starts up to run a program, it looks for the PYTHONPATH variable and if found, adds the paths specified there to sys.path
for more specific instructions, please see your supplementary documents.

the python standard distribution of modules

Modules included with Python are installed when Python is installed -- they are always available.

Python provides hundreds of supplementary modules to perform myriad tasks. The modules do not need to be installed because they come bundled in the Python distribution, that is they are installed at the time that Python itself is installed. The documentation for the standard library is part of the official Python docs.

various string-related services
specialized containers (type-specific lists and dicts, pseudohashes, etc.)
math calculations and number generation
file and directory manipulation
persistence (saving data on disk)
data compression and archiving (e.g., creating zip files)
encryption
networking and interprocess (program-to-program) communication
internet tasks: web server, web client, email, file transfer, etc.
XML and HTML parsing
multimedia: audio and image file manipulation
GUI (graphical user interface) development
code testing
etc...

PyPI: the python package index

This index contains links to all modules ever added by anyone to the index.

Search for any module's home page at the PyPI website:

https://pypi.python.org/pypi

PyPI is the definitive index of modules written in Python
there are more than 70,000 projects uploaded there, from serious modules used by millions of developers to half-baked ideas that someone decided to share prematurely
usually, we will not search for modules here; we will hear about a module from an online article or through a search

finding third-party modules

Take some care when installing modules -- it is possible to install nefarious code.

We generally find third-party modules by doing web searches, or from colleagues
When we find a module that meets our needs, we should do some research to make sure it's the one we want and need
Although it's very rare, sometimes bad actors have created modules designed to pass viruses
These modules are named similarly to popular modules
We must always be careful when selecting a module to install

- [demo: searching for powerpoint module, verifying ]

installing modules

Third-party modules must be downloaded and installed into your Python distribution.

Commands to use at the command line:

pip search pandas         # no longer works: PyPI has disabled this command; search the PyPI website instead
pip install pandas        # installs pandas

third-party modules are not part of the Python distribution but have been created for the public to use
thousands of these modules are free and available on demand
the pip utility is installed along with Python
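
Before installing, it can be handy to check from inside Python whether a module is already available. The helper below is a small sketch (is_installed is a hypothetical name) built on the standard library's importlib.util.find_spec():

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can find the named top-level module."""
    return importlib.util.find_spec(module_name) is not None

print(is_installed('json'))                 # True  (standard library)
print(is_installed('no_such_module_xyz'))   # False
```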

Featured module: math

The math module handles advanced math calculations.

These calculations include factorials, ceiling and floor, and logarithmic, geometric, and trigonometric values (sin, cos, tan, etc.).

A quick look at the module's attributes gives us an idea of what is included:

import math

print(dir(math))

   # ['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__',
   #  'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign',
   #  'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial',
   #  'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose',
   #  'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2',
   #  'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan',
   #  'tanh', 'tau', 'trunc']

For example, here are some simple geometry calculations using math:

import math

print(math.pi)                           # 3.141592653589793

radius = 3
circumference = 2 * math.pi * radius     # 18.84955592153876

area = math.pi * radius * radius         # 28.274333882308138
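
A few more of the functions listed above, shown as a brief sketch:

```python
import math

print(math.sqrt(16))             # 4.0
print(math.ceil(4.2))            # 5
print(math.floor(4.8))           # 4
print(math.factorial(5))         # 120
print(math.hypot(3, 4))          # 5.0    (length of the hypotenuse)
print(math.degrees(math.pi))     # 180.0  (radians to degrees)
```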

Featured module: statistics

This module provides basic statistical analysis.

Some of our earliest exercises calculated mean, median, and standard deviation. These operations are more easily performed through this module's functions.

import statistics as stats                       # set a convenient name for the module

values = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6, 6]

# average value
meanval = stats.mean(values)                     # 4.083333333333333

# "middle" value in a list of sorted values (list does not need to be sorted)
medianval = stats.median(values)                 # 4.0

# sample standard deviation: a measure of how far values typically fall from the mean
standev = stats.stdev(values)                    # 1.781640374554423

# square of the standard deviation
varianceval = stats.variance(values)             # 3.1742424242424243

# most common value
modeval = stats.mode(values)                     # 6

Featured module: string

This module provides useful strings of characters.

import string

print(string.ascii_letters)       # abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

print(string.ascii_lowercase)     # abcdefghijklmnopqrstuvwxyz

print(string.ascii_uppercase)     # ABCDEFGHIJKLMNOPQRSTUVWXYZ

print(string.digits)              # 0123456789

print(string.hexdigits)           # 0123456789abcdefABCDEF

print(string.punctuation)         # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

print(string.whitespace)          # ' \t\n\r\x0b\x0c'   (prints as invisible characters)
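
These strings are mostly useful for membership tests. As a small sketch (the sample text is invented), we can filter the characters of a string against them:

```python
import string

text = "Hello, World! It's 2024."

# keep only the characters that are not punctuation
no_punct = ''.join(ch for ch in text if ch not in string.punctuation)
print(no_punct)                  # Hello World Its 2024

# collect only the digits
digits = ''.join(ch for ch in text if ch in string.digits)
print(digits)                    # 2024
```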

Featured module: zipfile

The zipfile module builds, unpacks and inspects .zip archives.

import zipfile as zp

myzip = zp.ZipFile('myzip.zip', 'w')

# add names of files (of course these must exist)
myzip.write('file1.txt')
myzip.write('file2.pdf')
myzip.write('file3.doc')

myzip.close()                     # builds and writes zip file

print('done')

After running the above code and referencing real files, check the session files directory -- you should see a new .zip file added. You can also use zipfile to unpack and check the manifest (contents) of a zip file.
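
Reading an archive might look like the sketch below. To keep it self-contained it first builds a small zip in a temporary folder (the file names are placeholders), then lists the manifest, reads one member, and unpacks everything.

```python
import os
import tempfile
import zipfile as zp

zip_path = os.path.join(tempfile.mkdtemp(), 'demo.zip')

# build a small archive from in-memory text so the sketch is self-contained
with zp.ZipFile(zip_path, 'w') as myzip:
    myzip.writestr('file1.txt', 'first file')
    myzip.writestr('file2.txt', 'second file')

# reopen the archive for reading
with zp.ZipFile(zip_path) as myzip:
    names = myzip.namelist()                 # the manifest
    print(names)                             # ['file1.txt', 'file2.txt']

    text = myzip.read('file1.txt').decode('utf-8')
    print(text)                              # first file

    out_dir = tempfile.mkdtemp()
    myzip.extractall(out_dir)                # unpack everything

print(sorted(os.listdir(out_dir)))           # ['file1.txt', 'file2.txt']
```

Using with closes the archive automatically, so no explicit close() call is needed.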

Featured module: time

The time module handles time-related tasks such as reading the current time, doing simple time arithmetic, and sleeping for a period of time.

time can be used to sleep (or pause execution) for a set number of seconds:

import time

# pause execution # of seconds
time.sleep(5)

We can also use time to show the current time:

# current time and date
print(time.ctime())                 # Sat May 23 17:10:55 2020

At a very basic level it's possible to manipulate time through arithmetic (though complex calculations of date and time are more easily handled with the datetime module).

# read current time in seconds
secs = time.time()                  # 1590257729.297496  (seconds since the epoch, with a fractional part)

# calculate 24 hours, in seconds (subtract 86,400 seconds)
yestersecs = secs - (60 * 60 * 24)

# show the current time minus 24 hours
print(time.ctime(yestersecs))          # Fri May 22 17:10:55 2020

# a "time struct"
print(time.localtime(yestersecs))
                                    # time.struct_time(tm_year=2020, tm_mon=5, tm_mday=22,
                                    # tm_hour=17, tm_min=10, tm_sec=55, tm_wday=4,
                                    # tm_yday=143, tm_isdst=1)

The "time struct" is a custom object that provides day of week, day of year and whether the time reflects daylight savings.
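
The fields of a time struct can be read as attributes. This sketch uses time.gmtime(0), the start of the Unix epoch in UTC, so that the values are the same on every run:

```python
import time

# gmtime(0) is the start of the Unix epoch, in UTC
epoch = time.gmtime(0)

print(epoch.tm_year)     # 1970
print(epoch.tm_mon)      # 1
print(epoch.tm_wday)     # 3   (Monday is 0, so 3 is Thursday)
print(epoch.tm_yday)     # 1

# strftime() renders a time struct as a formatted string
print(time.strftime('%Y-%m-%d %H:%M:%S', epoch))    # 1970-01-01 00:00:00
```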

Featured module: datetime

The datetime module handles the calculation of dates and times, reading dates from string in any format, and writing dates to string in any format.

import datetime as dt


# build a 'date' object from year, month, day
mydate1 = dt.date(2019, 9, 3)


# build a 'date' object representing today
mydate2 = dt.date.today()


# build a datetime object from year, month, day, hour, minute and second
mydatetime1 = dt.datetime(2019, 9, 3, 12, 5, 30)


# build a datetime object representing right now
mydatetime2 = dt.datetime.now()


# build a datetime object from a formatted string
mydatetime3 = dt.datetime.strptime('2019-03-03', '%Y-%m-%d')


# build a "timedelta" (time interval) object:  3 days, 2 hours
myinterval = dt.timedelta(days=3, seconds=7200)


# date objects and intervals can be calculated like math
newdate = mydatetime3 + myinterval

print(newdate)                                # 2019-03-06 02:00:00


# render a date object in a string format
print(newdate.strftime('%Y-%m-%d  (%H:%M)'))  # 2019-03-06  (02:00)
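
Going the other direction, subtracting one date from another produces a timedelta. A short sketch with arbitrarily chosen dates:

```python
import datetime as dt

start = dt.date(2019, 9, 3)
end = dt.date(2019, 12, 25)

delta = end - start                  # a timedelta object
print(delta.days)                    # 113

one_week_later = start + dt.timedelta(weeks=1)
print(one_week_later)                # 2019-09-10
print(one_week_later.weekday())      # 1   (Monday is 0, so 1 is Tuesday)
```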

Featured module: random

The random module generates pseudorandom numbers.

'Pseudorandom' means that computers, being deterministic, are not capable of true randomness. The module produces number sequences that appear random, although the same starting "seed" always produces the same sequence.

import random

# random float from 0.0 up to (but not including) 1.0
myfloat = random.random()        # 0.22845730036901912


# random integer from 1 to 10 (both included)
num = random.randint(1, 10)


# random choice from a list
x = ['a', 'b', 'c']
choice = random.choice(x)        # 'b'
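
The deterministic nature of the generator can be seen by setting the seed. In this sketch (the seed value 42 is arbitrary), the same seed produces the same "random" sequence twice, which is useful when results need to be reproducible:

```python
import random

random.seed(42)                       # fix the starting point
first = [random.randint(1, 10) for _ in range(5)]

random.seed(42)                       # same seed again
second = [random.randint(1, 10) for _ in range(5)]

print(first == second)                # True
```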

Featured module: csv

The csv module reads and writes CSV files.

import csv

# reading a CSV file
fh = open('dated_file.csv')
reader = csv.reader(fh)

for row in reader:
    print(row)

fh.close()

# writing to a CSV file
wfh = open('newfile.csv', 'w', newline='')
writer = csv.writer(wfh)

writer.writerow(['a', 'b', 'c'])
writer.writerow(['d', 'e', 'f'])
writer.writerow(['g', 'b', 'i'])

wfh.close()                 # required - otherwise you may not see the writes

newline='' is needed when opening the file to neutralize an issue on Windows with the '\r\n' line ending that Windows uses. While not needed on Mac or Linux, this added argument does no harm. As with all file writing, it's essential to close a write filehandle; otherwise, you may not see the writes in the file until after the program exits. (In a Jupyter notebook or the Python interactive interpreter, writes to an unclosed file may not appear until the interpreter is closed.)
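
One way to make sure a file is always closed is the with statement, sketched below. The file is written to a temporary folder so the example can run anywhere; the path and rows are placeholders.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'newfile.csv')

rows = [['a', 'b', 'c'], ['d', 'e', 'f']]

# the file is closed automatically at the end of the 'with' block
with open(path, 'w', newline='') as wfh:
    writer = csv.writer(wfh)
    writer.writerows(rows)          # write all rows at once

with open(path, newline='') as fh:
    read_back = list(csv.reader(fh))

print(read_back)        # [['a', 'b', 'c'], ['d', 'e', 'f']]
```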

Featured module: sqlite3

The sqlite3 module allows file-based writing and reading of relational tables.

# connecting
import sqlite3

conn = sqlite3.connect('mydatabase.db')     # open an existing, or create a new file

cur = conn.cursor()


# creating a table (raises an error if the table already exists)
cur.execute("CREATE TABLE mytable (name TEXT, years INT, balance FLOAT)")


# insert rows into a table
rows = [
  [ 'Joe', 23, 23.9],
  [ 'Marie', 19, 7.95 ],
  [ 'Zoe', 29, 17.5 ]
]

for row in rows:
    cur.execute("INSERT INTO mytable VALUES (?, ?, ?)", row)

conn.commit()                                # essential to see the write


# selecting data from a table
cur = conn.cursor()

cur.execute('SELECT name, years, balance FROM mytable')

for row in cur:
    print(row)            # ('Joe', 23, 23.9)
                          # ('Marie', 19, 7.95)
                          # ('Zoe', 29, 17.5)
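
The same ? placeholders work in SELECT statements. The sketch below uses an in-memory database (':memory:') so that it leaves no file behind; the table and values mirror the example above.

```python
import sqlite3

conn = sqlite3.connect(':memory:')      # temporary in-memory database
cur = conn.cursor()

cur.execute("CREATE TABLE mytable (name TEXT, years INT, balance FLOAT)")
cur.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [('Joe', 23, 23.9), ('Marie', 19, 7.95), ('Zoe', 29, 17.5)],
)
conn.commit()

# the ? placeholder keeps the value separate from the SQL text
cur.execute(
    "SELECT name, balance FROM mytable WHERE years > ? ORDER BY name",
    (20,),
)
results = cur.fetchall()                # a list of tuples
print(results)                          # [('Joe', 23.9), ('Zoe', 17.5)]

conn.close()
```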

Featured module: requests

The requests module makes HTTP requests over the internet.

requests (which must be installed separately) is generally preferred over urllib, which comes installed with the standard distribution of Python. requests simply provides a more convenient interface, i.e. more convenient commands to accomplish the same tasks.

import requests

# make URL request; download the response
response = requests.get('http://www.nytimes.com')

# the HTTP response code (200 OK, 404 not found, 500 error, etc.)
status_code = response.status_code

# the text of the response
page_text =   response.text

# response.text is already a decoded str; the raw bytes are in response.content
page_bytes = response.content

print(f'status code:  {status_code}')
print('======================= page text =======================')
print(page_text)

Featured module: urllib

If requests is not available on your system, urllib provides similar functionality.

import urllib.request

# make URL request; download the text of the response
read_object = urllib.request.urlopen('http://www.nytimes.com')

# a file-like object, can also 'for' loop or use .readlines()
text = read_object.read()

# decoding the text of the response (if necessary)
text = text.decode('utf-8')

SSL Certificate Error

Many websites enable SSL security and require a web request to accept and validate an SSL certificate (certifying the identity of the server). urllib by default requires SSL certificate security, but it can be bypassed (keep in mind that this may be a security risk).

import ssl
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

my_url = 'https://www.nytimes.com'     # certificates apply only to https URLs
read_object = urllib.request.urlopen(my_url, context=ctx)

Featured module: bs4 (Beautiful Soup)

The bs4 module can parse HTML to extract data from web pages.

This module must be installed separately.

import bs4

fh = open('dormouse.html')
text = fh.read()
fh.close()

soup = bs4.BeautifulSoup(text, 'html.parser')


# show all plain text in a page
print(soup.get_text())


# retrieve first tag with this name (a <title> tag)
tag = soup.title


# same, using .find()
tag = soup.find('title')


# find all <a> tags with specific attributes (e.g., <a href="mysite" id="link1">)
link1_a_tags = soup.find_all('a', {'id': 'link1'})


# find all <a> tags (hyperlinks)
tags = soup.find_all('a')

Featured module: re (regular expressions)

The re module can recognize patterns in text and extract portions of text based on patterns.

import re

line = 'a phone number:  213-298-1990'

matchobj = re.search(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)', line)

print(matchobj.group(0))   # 213-298-1990  (the entire match)
print(matchobj.group(1))   # 213           (the first parenthesized group)

The regular expression spec is a declarative language that is implemented by many programming languages (JavaScript, Java, Ruby, Perl, etc.). To fully understand and use regular expressions, you will need to complete a course or tutorial that covers them in detail.
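
As a slightly fuller sketch (the sample text is invented), re.findall() collects every match in a string, and re.search() returns None when there is no match, so its result should be tested before use:

```python
import re

text = 'office: 213-298-1990, mobile: 917-555-0142'

pattern = r'(\d{3})-(\d{3})-(\d{4})'     # {3} means "exactly three"

# findall() returns one tuple of groups per match
matches = re.findall(pattern, text)
print(matches)      # [('213', '298', '1990'), ('917', '555', '0142')]

# search() returns None when nothing matches
matchobj = re.search(pattern, 'no number here')
print(matchobj is None)     # True
```

The r prefix makes the pattern a raw string, so backslashes reach the re module unchanged.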

Featured module: textwrap

The textwrap module allows you to wrap text at a certain width.

import textwrap

text = "This is some really long text that we would like to wrap.  Wouldn't you know it, there's a module for that!  "


# returns a list of lines
# text is limited to 10 characters width
items = textwrap.wrap(text, 10)


# join lines together into multi-line string with new width
print('\n'.join(items))
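
Two related functions, shown as a brief sketch: textwrap.fill() does the wrap and the join in one step, and textwrap.shorten() truncates text to a width. The width values here are arbitrary.

```python
import textwrap

text = "This is some really long text that we would like to wrap."

# fill() returns a single string with newlines already inserted
wrapped = textwrap.fill(text, width=20)
print(wrapped)

# every line fits within the requested width
lines = wrapped.split('\n')
print(all(len(line) <= 20 for line in lines))    # True

# shorten() cuts the text to fit, ending with a placeholder
short = textwrap.shorten(text, width=30, placeholder='...')
print(short)
```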

pandas for table manipulation

The pandas module enables table manipulations similar to those done in Excel or relational databases.

The central object offered by pandas is the DataFrame, a 2-dimensional tabular structure similar to an Excel spreadsheet (columns and rows, with column and row labels). This module must be installed separately.

pandas can read and write to and from a multitude of formats

import pandas as pd
import sqlite3

# read from multiple formats to a DataFrame
df = pd.read_csv('dated_file.csv')
# df = pd.read_excel('dated_file.xls')
# df = pd.read_json('dated_file.json')

# write DataFrame to multiple formats
df.to_csv('new_file.csv')
# df.to_excel('new_file.xls')
# df.to_json('new_file.json')


# read from database through query
conn = sqlite3.connect('testdb.db')
df = pd.read_sql('SELECT * FROM test', conn)

pandas can perform 'where clause' style selections, sum or average columns, and perform GROUPBY database-style aggregations:

df = pd.read_csv('dated_file.csv')


# select rows thru a filter
df2 = df[ df.iloc[:, 3] > 18 ]      # all rows where the value in the 4th column (position 3) is > 18


# sum, average, etc. a column
df.tax.mean()                         # average values in 'tax' column
df.revenue.sum()                      # sum values in 'revenue' column


# create a new column
df['col99'] = df.col1 + df.revenue   # new column sums 'col1' and 'revenue' field from each line


# groupby aggregation
dfgb = df.groupby('state').revenue.sum()       # sum of revenue for each state

pandas is tightly integrated with matplotlib, a full featured plotting library. The resulting images can be displayed in a Jupyter notebook, or saved as an image file.

# groupby bar chart
dfgb.plot.bar()

# weather temp line chart (assumes a DataFrame 'weather_df' with a 'temp' column)
weather_df.temp.plot.line()

Useful Modules

Useful Modules: Introduction

This slide deck contains basic documentation on some of the most useful modules in the Python standard distribution. There are many more!

As you know, a module is Python code stored in a separate file or files that we can import into our code, to help us do specialized work. The Python documentation lists modules that come installed with Python (collectively, these modules are known as the "Standard Library"). Every module demonstrated below has many features and options. You can refer to documentation, or an article or blog post, to learn more about each.

Featured module: math

The math module handles advanced math calculations.

These calculations include factorials, ceiling and floor, and logarithmic, geometric, and trigonometric values (sin, cos, tan, etc.).

A quick look at the module's attributes gives us an idea of what is included:

import math

print(dir(math))

   # ['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__',
   #  'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign',
   #  'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial',
   #  'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose',
   #  'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2',
   #  'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan',
   #  'tanh', 'tau', 'trunc']

For example, here are some simple geometry calculations using math:

import math

print(math.pi)                           # 3.141592653589793

radius = 3
circumference = 2 * math.pi * radius     # 18.84955592153876

area = math.pi * radius * radius         # 28.274333882308138

Featured module: statistics

This module provides basic statistical analysis.

Some of our earliest exercises calculated mean, median, and standard deviation. These operations are more easily performed through this module's functions.

import statistics as stats                       # set a convenient name for the module

values = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6, 6]

# average value
meanval = stats.mean(values)                     # 4.083333333333333

# "middle" value in a list of sorted values (list does not need to be sorted)
medianval = stats.median(values)                 # 4.0

# sample standard deviation: a measure of how far values typically fall from the mean
standev = stats.stdev(values)                    # 1.781640374554423

# square of the standard deviation
varianceval = stats.variance(values)             # 3.1742424242424243

# most common value
modeval = stats.mode(values)                     # 6

Featured module: string

This module provides useful strings of characters.

import string

print(string.ascii_letters)       # abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

print(string.ascii_lowercase)     # abcdefghijklmnopqrstuvwxyz

print(string.ascii_uppercase)     # ABCDEFGHIJKLMNOPQRSTUVWXYZ

print(string.digits)              # 0123456789

print(string.hexdigits)           # 0123456789abcdefABCDEF

print(string.punctuation)         # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

print(string.whitespace)          # ' \t\n\r\x0b\x0c'   (prints as invisible characters)

Featured module: zipfile

The zipfile module builds, unpacks and inspects .zip archives.

import zipfile as zp

myzip = zp.ZipFile('myzip.zip', 'w')

# add names of files (of course these must exist)
myzip.write('file1.txt')
myzip.write('file2.pdf')
myzip.write('file3.doc')

myzip.close()                     # builds and writes zip file

print('done')

After running the above code and referencing real files, check this unit's files directory -- you should see a new .zip file added. You can also use zipfile to unpack and check the manifest (contents) of a zip file.

Featured module: time

The time module handles time-related tasks such as reading the current time, doing simple time arithmetic, and sleeping for a period of time.

time can be used to sleep (or pause execution) for a set number of seconds:

import time

# pause execution # of seconds
time.sleep(5)

We can also use time to show the current time:

# current time and date
print(time.ctime())                 # Sat May 23 17:10:55 2020

At a very basic level it's possible to manipulate time through arithmetic (though complex calculations of date and time are more easily handled with the datetime module).

# read current time in seconds
secs = time.time()                  # 1590257729.297496  (seconds since the epoch, with a fractional part)

# calculate 24 hours, in seconds (subtract 86,400 seconds)
yestersecs = secs - (60 * 60 * 24)

# show the current time minus 24 hours
print(time.ctime(yestersecs))          # Fri May 22 17:10:55 2020

# a "time struct"
print(time.localtime(yestersecs))
                                    # time.struct_time(tm_year=2020, tm_mon=5, tm_mday=22,
                                    # tm_hour=17, tm_min=10, tm_sec=55, tm_wday=4,
                                    # tm_yday=143, tm_isdst=1)

The "time struct" is a custom object that provides day of week, day of year and whether the time reflects daylight savings.

Featured module: datetime

The datetime module handles the calculation of dates and times, reading dates from string in any format, and writing dates to string in any format.

import datetime as dt


# build a 'date' object from year, month, day
mydate1 = dt.date(2019, 9, 3)


# build a 'date' object representing today
mydate2 = dt.date.today()


# build a datetime object from year, month, day, hour, minute and second
mydatetime1 = dt.datetime(2019, 9, 3, 12, 5, 30)


# build a datetime object representing right now
mydatetime2 = dt.datetime.now()


# build a datetime object from a formatted string
mydatetime3 = dt.datetime.strptime('2019-03-03', '%Y-%m-%d')


# build a "timedelta" (time interval) object:  3 days, 2 hours
myinterval = dt.timedelta(days=3, seconds=7200)


# date objects and intervals can be calculated like math
newdate = mydatetime3 + myinterval

print(newdate)                                # 2019-03-06 02:00:00


# render a date object in a string format
print(newdate.strftime('%Y-%m-%d  (%H:%M)'))  # 2019-03-06  (02:00)
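Date arithmetic also works in the other direction: subtracting one date from another produces a timedelta. A brief sketch with two arbitrary example dates:

```python
import datetime as dt

start = dt.date(2019, 1, 1)
end = dt.date(2019, 9, 3)

# subtracting two date objects gives a timedelta (an interval)
gap = end - start

print(gap.days)              # 245
print(type(gap).__name__)    # timedelta

# dates can also be compared directly
print(start < end)           # True
```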

Featured module: random

The random module generates pseudorandom numbers.

'Pseudorandom' means that computers, being deterministic, are not capable of true randomness. Instead, the module uses an algorithm to produce number sequences that appear random.

import random

# random float from 0.0 up to (but not including) 1.0
myfloat = random.random()        # 0.22845730036901912


# random integer from 1 to 10 (both included)
num = random.randint(1, 10)


# random choice from a list
x = ['a', 'b', 'c']
choice = random.choice(x)        # 'b'
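One way to see that the numbers are pseudorandom is to seed the generator: the same seed produces the same sequence every time. A minimal sketch (the seed value 42 is arbitrary):

```python
import random

random.seed(42)                       # start the generator from a known state
first = [random.randint(1, 10) for _ in range(5)]

random.seed(42)                       # reset to the same state
second = [random.randint(1, 10) for _ in range(5)]

print(first == second)                # True
```

Seeding is handy for testing, since it makes a "random" program repeatable.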

Featured module: csv

The csv module reads and writes CSV files.

import csv

# reading a CSV file
fh = open('dated_file.csv', newline='')
reader = csv.reader(fh)

for row in reader:
    print(row)

fh.close()

# writing to a CSV file
wfh = open('newfile.csv', 'w', newline='')
writer = csv.writer(wfh)

writer.writerow(['a', 'b', 'c'])
writer.writerow(['d', 'e', 'f'])
writer.writerow(['g', 'h', 'i'])

wfh.close()                 # essential - otherwise you may not see the writes until the program exits
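The same write and read steps can be sketched with 'with' blocks, which close each file automatically. The file name demo.csv and its contents are made up for illustration, and the file is written to a temporary directory:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.csv')   # hypothetical file name

# the file is closed (and the writes flushed) when the 'with' block ends
with open(path, 'w', newline='') as wfh:
    writer = csv.writer(wfh)
    writer.writerow(['name', 'years'])
    writer.writerow(['Joe', 23])

with open(path, newline='') as fh:
    rows = list(csv.reader(fh))

print(rows)           # [['name', 'years'], ['Joe', '23']]
```

Note that the integer 23 comes back as the string '23': csv.reader returns every field as a string.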

Featured module: sqlite3

The sqlite3 module allows file-based writing and reading of relational tables.

# connecting
import sqlite3

conn = sqlite3.connect('mydatabase.db')     # open an existing, or create a new file

cur = conn.cursor()


#creating a table
cur.execute("CREATE TABLE mytable (name TEXT, years INT, balance FLOAT)")


# insert rows into a table
rows = [
  [ 'Joe', 23, 23.9],
  [ 'Marie', 19, 7.95 ],
  [ 'Zoe', 29, 17.5 ]
]

for row in rows:
    cur.execute("INSERT INTO mytable VALUES (?, ?, ?)", row)

conn.commit()                                # essential to see the write


# selecting data from a table
cur = conn.cursor()

cur.execute('SELECT name, years, balance FROM mytable')

for row in cur:
    print(row)            # ('Joe', 23, 23.9)
                          # ('Marie', 19, 7.95)
                          # ('Zoe', 29, 17.5)
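To experiment without creating a file, sqlite3 can also hold a database in memory. The sketch below reuses the table from above and filters rows with a WHERE clause; the cutoff of 20 is arbitrary:

```python
import sqlite3

conn = sqlite3.connect(':memory:')      # temporary database, gone when closed
cur = conn.cursor()

cur.execute("CREATE TABLE mytable (name TEXT, years INT, balance FLOAT)")

# executemany() runs the same INSERT once for each row
cur.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                [('Joe', 23, 23.9), ('Marie', 19, 7.95), ('Zoe', 29, 17.5)])
conn.commit()

# the ? placeholder keeps the value separate from the SQL text
cur.execute("SELECT name FROM mytable WHERE years > ? ORDER BY name", (20,))
names = [row[0] for row in cur.fetchall()]

print(names)                            # ['Joe', 'Zoe']

conn.close()
```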

Featured module: requests

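The requests module makes HTTP requests and downloads the responses.

This module must be installed separately.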

import requests

# make URL request; download the response
response = requests.get('http://www.nytimes.com')

# the HTTP response code (200 OK, 404 not found, 500 error, etc.)
status_code = response.status_code

# the text of the response (already decoded to a str)
page_text = response.text

# the raw, undecoded bytes of the response (if needed)
page_bytes = response.content

print(f'status code:  {status_code}')
print('======================= page text =======================')
print(page_text)

Featured module: urllib

If requests is not available on your system, urllib provides similar functionality.

import urllib.request

# make URL request; download the text of the response
read_object = urllib.request.urlopen('http://www.nytimes.com')

# a file-like object, can also 'for' loop or use .readlines()
text = read_object.read()

# decoding the text of the response (if necessary)
text = text.decode('utf-8')

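# if certificate verification fails for an https URL, it can be switched off
# (insecure -- suitable for testing only)
import ssl
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

my_url = 'https://www.nytimes.com'
read_object = urllib.request.urlopen(my_url, context=ctx)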

Featured module: bs4 (Beautiful Soup)

The bs4 module can parse HTML to extract data from web pages.

This module must be installed separately.

import bs4

fh = open('dormouse.html')
text = fh.read()
fh.close()

soup = bs4.BeautifulSoup(text, 'html.parser')


# show all plain text in a page
print(soup.get_text())


# retrieve first tag with this name (a <title> tag)
tag = soup.title


# same, using .find()
tag = soup.find('title')


# find all <a> tags with specific attributes (<a href="mysite" id="link1">)
link1_a_tags = soup.find_all('a', {'id': 'link1'})


# find all <a> tags (hyperlinks)
tags = soup.find_all('a')

Featured module: re (regular expressions)

The re module can recognize patterns in text and extract portions of text based on patterns.

import re

line = 'a phone number:  213-298-1990'

matchobj = re.search(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)', line)

print(matchobj.group(0))   # 213-298-1990   (the entire match)
print(matchobj.group(1))   # 213            (the first parenthesized group)
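Two details are worth a quick sketch: .groups() returns all of the parenthesized groups at once, and re.search() returns None when the pattern is not found, so it is safest to test the result before using it. The second test string below is made up for illustration.

```python
import re

line = 'a phone number:  213-298-1990'

# \d{3} is shorthand for \d\d\d
matchobj = re.search(r'(\d{3})-(\d{3})-(\d{4})', line)

if matchobj:
    print(matchobj.groups())          # ('213', '298', '1990')

# no phone number here, so search() returns None
nomatch = re.search(r'(\d{3})-(\d{3})-(\d{4})', 'no number on this line')
print(nomatch)                        # None
```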

Featured module: subprocess

The subprocess module allows your program to launch other programs / applications.

import subprocess


# execute another program; it shares this script's STDIN and STDOUT
subprocess.call(['ls', 'path/to/my/dir'])


# execute another Python script
subprocess.call(['python', 'hello.py'])


# execute another program and capture output (returned as bytes)
out = subprocess.check_output(['python', 'hello.py'])
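Captured output arrives as bytes and usually needs decoding. The sketch below runs a one-line Python command instead of hello.py so that it is self-contained, and uses sys.executable (the interpreter running the current script) in case no 'python' command is on the PATH:

```python
import subprocess
import sys

# run a tiny Python program and capture what it prints
out = subprocess.check_output([sys.executable, '-c', 'print("hello")'])

print(type(out))                      # <class 'bytes'>

# decode the bytes to a str and strip the trailing newline
text = out.decode('utf-8').strip()
print(text)                           # hello
```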

Featured module: textwrap

The textwrap module allows you to wrap text at a certain width.

import textwrap

text = "This is some really long text that we would like to wrap.  Wouldn't you know it, there's a module for that!  "


# returns a list of lines
# text is limited to 10 characters width
items = textwrap.wrap(text, 10)


# join lines together into multi-line string with new width
print('\n'.join(items))
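To check the effect of the width argument, we can print the length of each wrapped line. This sketch uses a width of 20, an arbitrary choice, and also shows textwrap.fill(), which wraps and joins in one step:

```python
import textwrap

text = "This is some really long text that we would like to wrap."

# each item in the list is one line, no longer than 20 characters
lines = textwrap.wrap(text, 20)

for line in lines:
    print(f'{len(line):2}  {line}')

# fill() is a shortcut for joining the wrapped lines with newlines
filled = textwrap.fill(text, 20)
print(filled == '\n'.join(lines))     # True
```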

pandas for table manipulation

The pandas module enables table manipulations similar to those done by Excel or relational databases.

pandas can read and write to and from a multitude of formats:

import pandas as pd
import sqlite3

# read from multiple formats to a DataFrame
df = pd.read_csv('dated_file.csv')
# df = pd.read_excel('dated_file.xls')
# df = pd.read_json('dated_file.json')

# write DataFrame to multiple formats
df.to_csv('new_file.csv')
# df.to_excel('new_file.xls')
# df.to_json('new_file.json')


# read from database through query
conn = sqlite3.connect('testdb.db')
df = pd.read_sql('SELECT * FROM test', conn)

pandas can perform 'where clause' style selections, sum or average columns, and perform GROUPBY database-style aggregations:

df = pd.read_csv('dated_file.csv')


# select rows through a filter
df2 = df[ df.iloc[:, 3] > 18 ]      # all rows where the 4th column (index 3) is > 18


# sum, average, etc. a column
df.tax.mean()                         # average values in 'tax' column
df.revenue.sum()                      # sum values in 'revenue' column


# create a new column
df['col99'] = df.col1 + df.revenue   # new column sums 'col1' and 'revenue' field from each line


# groupby aggregation
dfgb = df.groupby('state').revenue.sum()    # show sum of revenue for each state

pandas is tightly integrated with matplotlib, a full featured plotting library. The resulting images can be displayed in a Jupyter notebook, or saved as an image file.

# groupby bar chart
dfgb.plot.bar()

# weather temp line chart (assumes a DataFrame named weather_df with a 'temp' column)
weather_df.temp.plot.line()

User-Defined Classes and Object-Oriented Programming

Introduction: Classes

Classes allow us to create a custom type of object -- that is, an object with its own behaviors and its own ways of storing data. Consider that each of the objects we've worked with previously has its own behavior, and stores data in its own way: dicts store pairs, sets store unique values, lists store sequential values, etc. An object's behaviors can be seen in its methods, as well as in how it responds to operations such as subscripting and arithmetic or comparison operators. An object's data is simply the data contained in the object or that the object represents: a string's characters, a list's object sequence, etc.

Objectives for this Unit: Classes

Understand what classes, instances and attributes are and why they are useful
Create our own classes -- our own object types
Set attributes in instances and read attributes from instances
Define methods in classes that can be used by an instance
Define instance initializers with __init__()
Use getter and setter methods to enforce encapsulation
Understand class inheritance
Understand polymorphism

Class Example: the date and timedelta object types

First let's look at object types that demonstrate the convenience and range of behaviors of objects.

A date object can be set to any date and knows how to calculate dates into the future or past. To change the date, we use a timedelta object, which can be set to an "interval" of days to be added to or subtracted from a date object.

from datetime import date, timedelta

dt = date(1926, 12, 30)         # create a new date object set to 12/30/1926
td = timedelta(days=3)          # create a new timedelta object:  3 day interval

dt = dt + timedelta(days=3)     # add the interval to the date object:  produces a new date object

print(dt)                        # '1927-01-02' (3 days after the original date)


dt2 = date.today()              # as of this writing:  set to 2016-08-01
dt2 = dt2 + timedelta(days=1)   # add 1 day to today's date

print(dt2)                       # '2016-08-02'

print(type(dt))                  # <class 'datetime.date'>
print(type(td))                  # <class 'datetime.timedelta'>

Class Example: the proposed server object type

Now let's imagine a useful object -- this proposed class will allow you to interact with a server programmatically. Each server object represents a server that you can ping, restart, copy files to and from, etc.

from sysadmin import Server         # 'sysadmin' is a proposed (hypothetical) module


s1 = Server('blaikieserv')

if s1.ping():
    print('{} is alive '.format(s1.hostname))

s1.restart()                       # restarts the server

s1.copyfile_up('myfile.txt')       # copies a file to the server
s1.copyfile_down('yourfile.txt')   # copies a file from the server

print(s1.uptime())                  # blaikieserv has been alive for 2 seconds

A class block defines an instance "factory" which produces instances of the class.

Method calls on the instance refer to functions defined in the class.

class Greeting:
    """ greets the user """

    def greet(self):
        print('hello, user!')


c = Greeting()

c.greet()                    # hello, user!

print(type(c))                # <class '__main__.Greeting'>

Each instance is of a type named after the class. In this way, class and type are almost synonymous.

Each instance holds an attribute dictionary

Data is stored in each instance through its attributes, which can be written and read just like dictionary keys and values.

class Something:
    """ just makes 'Something' objects """

obj1 = Something()
obj2 = Something()

obj1.var = 5             # set attribute 'var' to int 5
obj1.var2 = 'hello'      # set attribute 'var2' to str 'hello'

obj2.var = 1000          # set attribute 'var' to int 1000
obj2.var2 = [1, 2, 3, 4] # set attribute 'var2' to list [1, 2, 3, 4]


print(obj1.var)           # 5
print(obj1.var2)          # hello

print(obj2.var)           # 1000
print(obj2.var2)          # [1, 2, 3, 4]

obj2.var2.append(5)      # appending to the list stored to attribute var2

print(obj2.var2)          # [1, 2, 3, 4, 5]

In fact the attribute dictionary is a real dict, stored within a "magic" attribute of the instance:

print(obj1.__dict__)      # {'var': 5, 'var2': 'hello'}

print(obj2.__dict__)      # {'var': 1000, 'var2': [1, 2, 3, 4, 5]}
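Because the attribute dictionary is a real dict, attributes and dict keys are two views of the same data. A short sketch, using the built-in functions vars(), getattr() and setattr():

```python
class Something:
    """ just makes 'Something' objects """

obj = Something()
obj.var = 5

# vars(obj) returns the very same dict as obj.__dict__
print(vars(obj) is obj.__dict__)   # True

# writing a key into the dict creates an attribute
obj.__dict__['var2'] = 'hello'
print(obj.var2)                    # hello

# getattr() and setattr() read and write attributes by string name
setattr(obj, 'var3', [1, 2])
print(getattr(obj, 'var3'))        # [1, 2]

print(obj.__dict__)                # {'var': 5, 'var2': 'hello', 'var3': [1, 2]}
```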

The class also holds an attribute dictionary

Data can also be stored in a class through class attributes or through variables defined in the class.

class MyClass:
    """ The MyClass class holds some data """

    var = 10              # set a variable in the class (a class variable)


MyClass.var2 = 'hello'    # set an attribute directly in the class object

print(MyClass.var)         # 10      (attribute was set as variable in class block)
print(MyClass.var2)        # 'hello' (attribute was set as attribute in class object)

print(MyClass.__dict__)    # {'var': 10,
                          #  '__module__': '__main__',
                          #  '__doc__': ' The MyClass class holds some data ',
                          #  'var2': 'hello'}
                          # (abridged: Python 3 wraps this in a mappingproxy and
                          #  adds a few more automatic entries)

The additional __module__ and __doc__ attributes are added automatically: __module__ names the module in which the class was defined (here '__main__', meaning the script being run); __doc__ is a special string reserved for documentation on the class.

object.attribute lookup tries to read from object, then from class

If an attribute can't be found in an object, it is searched for in the class.

class MyClass:
    classval = 10         # class attribute

a = MyClass()
b = MyClass()

b.classval = 99         # instance attribute of same name

print(a.classval)        # 10 - still class attribute
print(b.classval)        # 99 - instance attribute

del b.classval          # delete instance attribute

print(b.classval)        # 10 -- now back to class attribute

print(MyClass.classval)  # 10 -- class attributes are accessible through Class as well
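The lookup order can be made visible by inspecting the two attribute dictionaries. A small sketch using the same class:

```python
class MyClass:
    classval = 10         # class attribute

b = MyClass()

# before any assignment, the name lives only in the class
print('classval' in b.__dict__)         # False
print('classval' in MyClass.__dict__)   # True
print(b.classval)                       # 10 - found in the class

# assigning through the instance stores the value in the instance
b.classval = 99
print(b.__dict__)                       # {'classval': 99}
print(MyClass.classval)                 # 10 - class attribute is unchanged
```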

Method calls pass the instance as first (implicit) argument, called self

Object methods or instance methods allow us to work with the instance's data.

class Do:
    def printme(self):
        print(self)      # <__main__.Do object at 0x1006de910>

x = Do()

print(x)                 # <__main__.Do object at 0x1006de910>
x.printme()

Note that x and self have the same hex code. This indicates that they are the very same object.
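Another way to see the implicit argument: calling a method on the instance is equivalent to calling the function through the class and passing the instance explicitly. In this sketch the method name whoami is made up for illustration:

```python
class Do:
    def whoami(self):
        return self          # hand back whatever was passed as self

x = Do()

print(x.whoami() is x)       # True - self is the instance x
print(Do.whoami(x) is x)     # True - same call, with x passed explicitly
```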

Instance methods / object methods and instance attributes: changing instance "state"

Since instance methods pass the instance, and we can store values in instance attributes, we can combine these to have a method modify an instance's values.

class Sum:
    def add(self, val):
        if not hasattr(self, 'x'):
            self.x = 0
        self.x = self.x + val

myobj = Sum()
myobj.add(5)
myobj.add(10)

print(myobj.x)      # 15

Instances are often modified using getter and setter methods

These methods are used to read and write instance attributes in a controlled way.

class Counter:
    def setval(self, val):     # arguments are:  the instance, and the value to be set
        if not isinstance(val, int):
            raise TypeError('arg must be an int')

        self.value = val        # set the value in the instance's attribute

    def getval(self):          # only one argument:  the instance
        return self.value       # return the instance attribute value

    def increment(self):
        self.value = self.value + 1

a = Counter()
b = Counter()

a.setval(10)       # although we pass one argument, the implied first argument is a itself

a.increment()
a.increment()

print(a.getval())   # 12


b.setval('hello')  # TypeError
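Python also offers properties as an alternative to explicit getter and setter methods: the same validation runs, but the caller uses plain attribute syntax. This is a different technique from the one shown above, sketched here with an internal attribute named _value (an illustrative choice):

```python
class Counter:
    _value = 0                     # class-level default; the underscore marks
                                   # the name as "internal" by convention

    @property
    def value(self):               # runs when a.value is read
        return self._value

    @value.setter
    def value(self, val):          # runs when a.value = ... is assigned
        if not isinstance(val, int):
            raise TypeError('value must be an int')
        self._value = val          # stored in the instance

a = Counter()
a.value = 10
a.value = a.value + 1
print(a.value)                     # 11

try:
    a.value = 'hello'
    message = None
except TypeError as e:
    message = str(e)

print(message)                     # value must be an int
```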

__init__() is automagically called when a new instance is created

The initializer of an instance allows us to set the initial attribute values of the instance.

class MyCounter:
    def __init__(self, initval):   # self is implied 1st argument (the instance)
        try:
            initval = int(initval)     # test initval to be an int,
        except (ValueError, TypeError):  # set to 0 if incorrect
            initval = 0
        self.value = initval         # initval was passed to the constructor

    def increment_val(self):
        self.value = self.value + 1

    def get_val(self):
        return self.value

a = MyCounter(0)
b = MyCounter(100)

a.increment_val()
a.increment_val()
a.increment_val()

b.increment_val()
b.increment_val()

print(a.get_val())    # 3
print(b.get_val())    # 102

Classes can be organized into an inheritance tree

When a class inherits from another class, attribute lookups can pass to the parent class when accessed from the child.

class Animal:
    def __init__(self, name):
        self.name = name
    def eat(self, food):
        print('{} eats {}'.format(self.name, food))

class Dog(Animal):
    def fetch(self, thing):
        print('{} goes after the {}!'.format(self.name, thing))

class Cat(Animal):
    def swatstring(self):
        print('{} shreds the string!'.format(self.name))
    def eat(self, food):
        if food in ['cat food', 'fish', 'chicken']:
            print('{} eats the {}'.format(self.name, food))
        else:
            print('{}:  snif - snif - snif - nah...'.format(self.name))

d = Dog('Rover')
c = Cat('Atilla')

d.eat('wood')                 # Rover eats wood
c.eat('dog food')             # Atilla:  snif - snif - snif - nah...
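A child class that overrides a method can still reach the parent's version with super(). The sketch below is a variation on the classes above (the methods return strings instead of printing, to keep it short) and also shows the lookup order Python follows:

```python
class Animal:
    def __init__(self, name):
        self.name = name
    def eat(self, food):
        return '{} eats {}'.format(self.name, food)

class Cat(Animal):
    def eat(self, food):
        if food in ['cat food', 'fish', 'chicken']:
            return super().eat(food)       # reuse the parent's method
        return '{}:  snif - snif - snif - nah...'.format(self.name)

c = Cat('Atilla')

print(c.eat('fish'))                       # Atilla eats fish
print(c.eat('dog food'))                   # Atilla:  snif - snif - snif - nah...

# a Cat is also an Animal
print(isinstance(c, Animal))               # True

# the order in which classes are searched for an attribute
order = [k.__name__ for k in Cat.__mro__]
print(order)                               # ['Cat', 'Animal', 'object']
```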

Conceptually similar methods can be unified through polymorphism

Same-named methods in two different classes can share a conceptual similarity.

class Animal:
    def __init__(self, name):
        self.name = name
    def eat(self, food):
        print('{} eats {}'.format(self.name, food))

class Dog(Animal):
    def fetch(self, thing):
        print('{} goes after the {}!'.format(self.name, thing))
    def speak(self):
        print('{}:  Bark!  Bark!'.format(self.name))

class Cat(Animal):
    def swatstring(self):
        print('{} shreds the string!'.format(self.name))
    def eat(self, food):
        if food in ['cat food', 'fish', 'chicken']:
            print('{} eats the {}'.format(self.name, food))
        else:
            print('{}:  snif - snif - snif - nah...'.format(self.name))
    def speak(self):
        print('{}:  Meow!'.format(self.name))

for a in (Dog('Rover'), Dog('Fido'), Cat('Fluffy'), Cat('Precious'), Dog('Rex'), Cat('Kittypie')):
    a.speak()

                   # Rover:  Bark!  Bark!
                   # Fido:  Bark!  Bark!
                   # Fluffy:  Meow!
                   # Precious:  Meow!
                   # Rex:  Bark!  Bark!
                   # Kittypie:  Meow!

Static Methods and Class Methods

A class method can be called through the instance or the class, and passes the class as the first argument. We use these methods to do class-wide work, such as counting instances or maintaining a table of variables available to all instances.

A static method can be called through the instance or the class, but knows nothing about either. In this way it is like a regular function -- it takes no implicit argument. We can think of these as 'helper' functions that just do some utility work and don't need to involve either class or instance.

class MyClass:

    def myfunc(self):
        print("myfunc:  arg is {}".format(self))

    @classmethod
    def myclassfunc(klass):      # spelled 'klass' because 'class' is a reserved word (the usual convention is 'cls')
        print("myclassfunc:  arg is {}".format(klass))

    @staticmethod
    def mystaticfunc():
        print("mystaticfunc: (no arg)")

a = MyClass()

a.myfunc()             # myfunc:  arg is <__main__.MyClass object at 0x6c210>

MyClass.myclassfunc()  # myclassfunc:  arg is <class '__main__.MyClass'>
a.myclassfunc()        # [ same ]

a.mystaticfunc()       # mystaticfunc: (no arg)

Here is an example from Learning Python, which counts instances that are constructed:

class Spam:

    numInstances = 0

    def __init__(self):
        Spam.numInstances += 1

    @staticmethod
    def printNumInstances():
        print("instances created:  ", Spam.numInstances)

s1 = Spam()
s2 = Spam()
s3 = Spam()

Spam.printNumInstances()        # instances created:   3
s3.printNumInstances()          # instances created:   3
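The same counter could also be read through a class method, which receives the class itself rather than naming Spam explicitly. A minimal sketch (the method name count is made up for illustration):

```python
class Spam:

    num_instances = 0

    def __init__(self):
        Spam.num_instances += 1

    @classmethod
    def count(cls):                # cls is the class (here, Spam)
        return cls.num_instances

s1 = Spam()
s2 = Spam()

print(Spam.count())     # 2
print(s1.count())       # 2 - same result when called through an instance
```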

Class methods are often used as class "Factories", producing customized objects based on preset values. Here's an example from the RealPython blog that uses a class method as a factory method to produce variations on a Pizza object:

class Pizza:
    def __init__(self, ingredients):
        self.ingredients = ingredients

    def __repr__(self):
        return f'Pizza({self.ingredients!r})'

    @classmethod
    def margherita(cls):
        return cls(['mozzarella', 'tomatoes'])

    @classmethod
    def prosciutto(cls):
        return cls(['mozzarella', 'tomatoes', 'ham'])


marg = Pizza.margherita()
print(marg.ingredients)       # ['mozzarella', 'tomatoes']

schute = Pizza.prosciutto()
print(schute.ingredients)     # ['mozzarella', 'tomatoes', 'ham']
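One reason the factory calls cls(...) rather than Pizza(...) is that it then works for subclasses too. A sketch with a hypothetical subclass, DeepDishPizza, that is not part of the original example:

```python
class Pizza:
    def __init__(self, ingredients):
        self.ingredients = ingredients

    @classmethod
    def margherita(cls):
        # cls is whichever class the method was called through
        return cls(['mozzarella', 'tomatoes'])

class DeepDishPizza(Pizza):      # hypothetical subclass, for illustration
    pass

p = DeepDishPizza.margherita()

print(type(p).__name__)          # DeepDishPizza
print(p.ingredients)             # ['mozzarella', 'tomatoes']
```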
