Python 3


Introduction; Installations and Setup

class goals

learn Python Fundamentals from the ground up
learn practical skills that model what Python programmers do every day
get to know our new partner, the Python Interpreter
think like a coder: learn specific strategies for debugging

- hello and welcome! this is Python Programming
- course is practically focused
- all examples build towards practical skills
- learn to think like a coder
- important to pay close attention to how we solve problems
- in fact, our main goal in this class is getting to know the Interpreter

about python

Python's popularity is due to its elegance and simplicity.

first released in 1991 by Guido van Rossum
one of the most popular languages in use today
designed to be readable, simple, and even beautiful
emphasis on explicitness, consistency and practicality
it is an elegant language that favors one obvious way to do something

Guido points out that we spend much more time reading code than writing it

the zen of python

This is the manifesto of the Python language.

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
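The same text ships with Python itself. As a small aside beyond the slides, importing the standard-library module named `this` prints it in your terminal:

```python
# importing the standard-library module named `this` prints the Zen of Python
import this
```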

about me: David Blaikie

I am dedicated to student success.

software developer, release engineer, teacher
worked in the IT industry since 1998
worked at Google, AppNexus and DoubleClick (advertising tech)
taught at New York University School of Professional Studies since 2000
taught at NASA, U.S. Navy, Cisco, Intuit, Salesforce, and many others

about you: welcome!

Prior exposure to Python is helpful, but not required.

You do not have to know anything about Python or programming, but some personal qualities will be very helpful. These are "soft skills" that will benefit you greatly as you proceed:

being observant: considering carefully what you are seeing -- error messages and program output
curiosity: wondering why things work the way they do
patience: considering what you are seeing before trying your program again
thoughtfulness: realizing that you are discovering a system of logic that follows specific rules. Our most important work will be in discovering how Python responds in different situations
wisdom: seeing mistakes as a learning opportunity, and not something only to be avoided
taking ownership: an "owner" wants the language to work for themselves, not just to pass a test

thinking like a coder means keeping these qualities in mind throughout this course; especially at the start, I will emphasize these skills

three technical requirements to write and run programs

If you already have an editor and Python installed, you do not need to add the editor or Python.

editor: VS Code (or you may use another editor if you are familiar with it)
python: miniconda Python distribution from Anaconda
class files: download from your course website, unzip

Please keep in mind that if you are already able to write and run Python programs, you only need to add the class files.

configuring VS Code

I personally feel that suggestions and popups are more of a distraction than a help.

VS Code, like most IDEs, tries to be helpful wherever it can
it may make suggestions about your code while you type
these suggestions may relate to your code's logic, style, or other aspects
VS Code and the IntelliSense plugin can be configured to be quieter
it is up to you how to set up your workspace and your IDE

To suppress some suggestions in VS Code:
1. go to your Settings (on Windows, File > Preferences > Settings; on Mac, Code > Settings)
2. in the search blank, type 'suggestions' (no quotes)
3. check the 'Suppress Suggestions' box
4. set all three 'Quick Suggestions' to Value 'off'
5. you may also want to increase your font size: search for 'Font' and set it to a higher number

the course materials

The zip file contains all files needed for our course exercises.

1. Please look for the file called python_data.zip in your course files.
2. Unzip the folder so that it has the following structure:

python_data/
├── 01/
│   ├── 1.1.py
│   ├── 1.2.py
│   ├── ..etc
│   ├── solutions/
├── 02
│   ├── 2.1.py
│   ├── 2.2.py
│   ├── ..etc
│   ├── solutions/
├── 03
├── 04
├── ..etc
├── 13
└── dir1

3. Place this folder in a location where you can find it.
4. Later in this session we'll open and explore the folder.

Your New Partner: the Python Interpreter

what do computers do?

Computers can do many different things for us.

Think about what our computers do for us:

perform numeric calculations
analyze, gather, compose, edit text
move files around, search files
download web pages
send email
process image or sound files
play videos and music
display images
operate storage devices to save files
operate hardware like printers, light switches, automobiles, drones, etc.

what do computers really do?

At base, computers really only do three things.

store data in memory and perform calculations
send messages over a network
operate devices

Python can do many things, but we will focus on the first item -- working with data. The main purpose of any programming language is to allow us to store data in memory and then to process that data according to our needs.

programming languages

A programming language like Python is designed to allow us to give instructions to our computer.

your computer understands "machine language", which is a "lower level" programming language
however, machine code is challenging to write
"high level" programming languages were devised to make it easier to communicate with a machine
Python, Java, C, C++, JavaScript, PHP, Ruby, and C# are all "general purpose" languages
languages like SQL, HTML, CSS, etc. are "domain specific" languages, designed for a specific purpose

the Python Interpreter

The Interpreter is Python itself.

the Interpreter is the program that processes our Python code
it is what we mean when we use action words: "Python runs the program"; "Python prints to the screen"; "Python raises an error" -- all of these refer to the Interpreter
when we run a Python program, the Interpreter reads our Python code and translates it into machine instructions
because it translates "Python" into "Machine", we call it an Interpreter
we should think of the Interpreter as our new coding partner. It will execute our code, and tell us when and how we have made an error
the true purpose of any study of Python is to understand the Interpreter

evaluate - compile - run

When we run a Python program, the Interpreter takes these three steps.

first, the Interpreter reads the Python code stored in a file
it validates the code, checking for errors in syntax
(a SyntaxError occurs when our code is missing or misplacing elements, like a missing period at the end of a sentence.)
code syntax must be perfect for the code to run
after validating the code, the Interpreter converts it into bytecode
it then executes ("runs") the bytecode, statement by statement until complete
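A rough way to watch these steps from Python itself is with the built-in compile() and exec() functions and the standard-library dis module. This is only a sketch for illustration: when you run a .py file, the Interpreter does all of this for you, and the bytecode listing printed by dis differs between Python versions.

```python
import dis

source = "print('hello, world!')"

# steps 1 and 2: read and validate the source, then convert it to bytecode
# (compile() returns a code object that holds the bytecode)
code = compile(source, '<example>', 'exec')

# show the bytecode instructions; the listing varies by Python version
dis.dis(code)

# step 3: execute ("run") the bytecode
exec(code)          # prints: hello, world!

# code with a syntax error fails at the compile step, before anything runs
bad_source = "print('hello, world!)"
compile_failed = False
try:
    compile(bad_source, '<example>', 'exec')
except SyntaxError as err:
    compile_failed = True
    print('SyntaxError:', err.msg)
```

Notice that the broken source never reaches exec(): the syntax check happens before any statement runs.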

what the interpreter can do

Python is very smart in some ways.

execute code very quickly, sometimes instantly
execute any valid instruction
allocate as much memory as a task needs (within the limits of our machine)
tell us immediately when something goes wrong

what the interpreter can't do

Python is not smart in some ways, too!

understand what we're trying to do or what our program means
tell us when we're doing something inefficiently or incorrectly
explain every error to us or explain exactly what went wrong
tell us what we need to do to fix errors

how to respond to exceptions (errors)

We should seek to understand what the Interpreter is telling us.

most of us think of errors only as problems
when something goes wrong, our first instinct is to just fix the error any way we can and move on
however, exceptions are learning opportunities

This learning is not just about making programs work -- it's about understanding the interpreter -- what it can and can't do.

Executing Programs and Using the Lab Exercises

creating a new script (.py file) in VS Code

A 'workspace' in VS Code is usually the same as a 'project'.

Open a Folder, which will correspond to a new workspace.

Choose File > Open Folder. A dialog opens to select a folder.
Choose 'New Folder' to create a folder, or select a folder to open
A new workspace is shown at the left margin with your folder name as the workspace name

Add a new file.

Roll over the workspace name on the left, and click the 'new file' icon (the page icon with a 'plus' at the lower left)
Type the name of your file (with a .py extension) and hit [Enter]. The file appears on the left, and the text area for the file appears in the main window on the right, with the first line highlighted and numbered 1.

Create a 'hello, world!' script.

Type the following:

print('hello, world!')
print()

Take care when reproducing the above script: every character must be in its place. (The print() at the end adds a blank line, which makes the Terminal output easier to read.) Next, we'll execute the script.

executing a script

VS Code may be able to run your script, or some configuration may be required.

Attempt to run your script.

At the upper right, click the "run" arrow
You may see output at the bottom, including the 'hello, world!' text. (Look carefully for it, it may be squeezed in between other output.)
If you see an error popup indicating that a version of Python must be installed: click 'Select Python Interpreter' at the far lower right, and choose your preferred version of Python. Your version must be numbered 3.8 or greater. Make sure not to select 2.7!
If you do not see your version of Python listed, please seek assistance from your course manager.

understanding terminal output in VS Code

By default, VS Code passes your code to the Python Interpreter and executes it at the command line.

On my Mac, I see this output:

->  /Users/david/miniconda3/bin/python /Users/david/test_project/test.py
->  (base) DavidBs-MacBook-Pro:test_project david% /Users/david/miniconda3/bin/python /Users/david/test_project/test.py
->  hello, world!

->  (base) DavidBs-MacBook-Pro:test_project david%

(Note that arrows above are just to indicate each line of output - they will not appear in your display.)
Your output will look slightly different than mine; for example, on Windows you're likely to see paths starting with C:\ rather than / (forward slash) on Mac/Linux
The first and second -> lines indicate that I am running Python and feeding my test.py file to it
Below that is the output of the program
Lastly, we see the terminal prompt again, indicating the Terminal is ready for the next command

when programs run without error in the VS Code terminal

'Without error' means Python did everything you asked.

On my Mac, I see this output:

->  /Users/david/miniconda3/bin/python /Users/david/test_project/test.py
->  (base) DavidBs-MacBook-Pro:test_project david% /Users/david/miniconda3/bin/python /Users/david/test_project/test.py
->  hello, world!

->  (base) DavidBs-MacBook-Pro:test_project david%

(Again, the arrows above are just to indicate each line of output - they will not appear in your display)
When you see the terminal prompt repeated, it means that the script has completed executing
Your path to python will be different than mine, and the % may also be $ or > depending on your system's configuration

when exceptions occur

An 'exception' is raised when Python cannot, or will not, do everything you asked in your program, or doesn't understand what the code means (e.g., because of a SyntaxError).

To demonstrate an exception, I removed one character from my code. Here is the result:

->  /Users/david/miniconda3/bin/python /Users/david/test_project/test.py
->  (base) DavidBs-MacBook-Pro:test_project david% /Users/david/miniconda3/bin/python /Users/david/test_project/test.py
->   File "/Users/david/test_project/test.py", line 2
->     print('hello, world!)
->           ^
-> SyntaxError: unterminated string literal (detected at line 2)

(Again, the arrows above are just indicating where each line of output starts - they will not appear in your output.)

How should we read our exception?

First, look at the CamelCase error type: SyntaxError. This tells us the category of error we are witnessing. This error type indicates that one or more characters are missing or in the wrong place.
Second, look at the line highlighted by Python: Line 2 (the first print() statement). This is where the error occurred.
Third, ask yourself why this problem was detected on this line.
I can't emphasize enough the importance of going straight to the line indicated by Python, and determining why this error occurred there

Throughout this course I will repeatedly call upon you to identify the exception type, pinpoint the error to the line, and seek to understand the error in terms of where Python says it occurred.
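As an aside beyond what we need in class, the exception object itself carries the two facts we look for first: its type and its line number. The sketch below compiles a hypothetical two-line script (held in a string rather than a file) and reads both from the resulting SyntaxError:

```python
# a hypothetical two-line script with a missing quotation mark on line 2
script = "print('first line')\nprint('hello, world!)\n"

try:
    compile(script, 'test.py', 'exec')
except SyntaxError as err:
    error_type = type(err).__name__   # the CamelCase error type
    error_line = err.lineno           # the line Python points to
    print(error_type, 'on line', error_line)   # SyntaxError on line 2
```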

the SyntaxError exception

Some element of the code is misplaced or missing.

print('hello, world!)
print()

File "/Users/david/test_project/test.py", line 2
  print('hello, world!)
        ^
SyntaxError: unterminated string literal (detected at line 2)

How do we respond to a SyntaxError? First, by understanding that something is missing or out of place in the syntax (the proper placement of language elements: brackets, braces, parentheses, quotes, etc.). We look at the syntax on the line and compare it to similar examples in other code that we've seen. Careful comparison between our code and working code will usually show us what's missing or misplaced. In the example above, the first print() statement is missing its closing quotation mark. It might be hard to see at first, but eventually you will develop "eyes" for this kind of error.

writing code: comments and blank lines

Use hash marks to comment individual lines; blank lines are ignored.

 1 # this program adds numbers
 2 var1 = 5
 3 var2 = 2
 4
 5 # add these numbers together
 6 var3 = var1 + var2
 7
 8 # these lines are 'commented out'
 9 # var3 = var3 * 2
10 # var3 = var3 / 20
11
12 print(var3)

When Python reads this code, it begins by ignoring blank lines
It also ignores any code to the right of a hash mark (#)
Text marked with '#' is called a 'comment'
We may want to comment in order to note something in the code (lines 1, 5 and 8), or to disable certain lines temporarily (lines 9 and 10)
It is a common practice to enable and disable some statements in our code while testing

Opening the Lab Exercises in VS Code

The exercises should be opened as a single folder in VS Code.

python_data
├── 01
├── 02
│   ├── 2.1.py
│   ├── 2.2.py
│   ├── ..etc
│   ├── 2.6_lab.py
│   ├── 2.7_lab.py
│   ├── ..etc
│   └── solutions/

make sure that your 01, 02, etc. folders are all inside the python_data folder
in VS Code, close any open project
choose File > Open Folder... and select your folder
you should see the PYTHON_DATA workspace opened on the left side of your VS Code window
if you have any difficulty unzipping, finding or opening the python_data folder in VS Code, please contact your course facilitator

Using the Lab Exercises

We will use some exercises for demos in class; you will use them to practice your skills, and prepare for tests.

├── 02
│   ├── 2.1.py       <-- 'journey' exercise
│   ├── 2.2.py
│   ├── ..etc
│   ├── 2.6_lab.py   <-- 'lab' (practice) exercise
│   ├── 2.7_lab.py
│   ├── ..etc
│   └── solutions/

The exercises come in two forms:

the journey exercises (named 2.1.py, 2.2.py, etc.)
the lab exercises (named 2.6_lab.py, 2.7_lab.py, etc.)
we will work through many of the journey exercises as we discover and discuss new features of the language
you will have the opportunity to practice your skills using the lab exercises
each solutions/ folder contains the lab exercise solutions. You are encouraged to try the exercises on your own, and if you get stuck, you may consult the solutions.

Creating and Identifying Objects by Type

the variable

A variable is a name assigned ("assigned" or "bound") to an object.

xx = 10               # assign 10 to xx
yy = 2

zz = xx * yy          # compute 10 * 2 and assign integer 20 to variable zz

print(zz)             # print 20 to screen

xx is a variable, bound to 10
= is an assignment operator, assigning 10 to xx
yy is another variable, bound to 2
* is a multiplication operator, multiplying its operands (10 and 2)
zz is bound to the product, 20
print() is a function that renders its argument to the screen


the literal: a value typed into our code

Early on, we need to distinguish between a variable and a literal.

xx = 10               # assign 10 to xx
yy = 2

zz = xx * yy          # compute 10 * 2 and assign integer 20 to variable zz

print(zz)             # print 20 to screen

can you name the 3 variables and 2 literals in this code?
variables: xx, yy, zz. These are names that have been assigned a value.
literals: 10, 2. These are values that have been typed directly into our code.

the object

An object is a data value of a particular type.

Every data value in Python is an object.

var_int = 100                  # assign integer object 100 to variable var_int

var2_float = 100.0             # assign float object 100.0 to variable var2_float

var3_str = 'hello!'            # assign str object 'hello!' to variable var3_str

At every point you must be aware of the type and value of every object in your code.

object types for this session

The three object types we'll look at in this unit are int, float and str. They are the "atoms" of Python's data model.

data type	known as	description	example value
int	integer	a whole number	5
float	float	a floating-point number	5.03
str	string	a character sequence, i.e. text	'hello, world!'

sidebar: string literal syntax

The string may be bounded by 3 different quotation marks -- all produce a string.

s1 = 'hello, quote'          # single quotes
s2 = "hello, quote"          # double quotes

# triple quotes:  put quotes around multiple lines
s3 = """hello, quote
Sincerely, Python"""


s4 = 'He said "yes!"'               # using single quotes to include double quotes
s5 = "Don't worry about that."      # using double quotes to include a single quote

double and single quotes are identical in purpose and meaning
this allows us to easily put a single quote in a string (use double) or double quote in a string (use single)
style-wise, we usually prefer single quotes, but the choice is yours
triple quotes allow us to put multiple lines in a string
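A quick check that the quoting style changes only how we type the string, not the string we get:

```python
s1 = 'hello, quote'           # single quotes
s2 = "hello, quote"           # double quotes
print(s1 == s2)               # True: the same characters make the same string

s3 = """hello, quote
Sincerely, Python"""
print(s3)                     # prints on two lines: the line break is part of the string

s4 = 'He said "yes!"'
print(s4)                     # He said "yes!"
```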


identifying type through syntax

The way a literal is written in the code determines its type.

It's vital that we always be aware of type.

a = 5.0         # float (written with a decimal point)
b = '5.0'       # str   (written with quotes)
c = 5           # int   (written as a whole number)

Other languages (like Java and C) use explicit type declarations to indicate type, for example int c = 5. But Python does not do this. Instead, it relies on the syntax of the literal (whole number, floating-point, quotation marks, etc.)

can we identify type through printing?

Printing is usually not enough to determine type, since a string can look like any object.

a = 5.0
b = '5.0'
c = 5

print(a)         # 5.0
print(b)         # 5.0
print(c)         # 5

b looks like a float, but it is a str.

identifying type through the type() function

If we're not sure, we can always have Python tell us an object's type.

a = 5.0
b = '5.0'
c = 5

print(type(a))         # <class 'float'>
print(type(b))         # <class 'str'>
print(type(c))         # <class 'int'>

exercise 2.1

python is strongly typed

This means that what an object can do is defined by its type.

a = 5            # int, 5
b = 10.0         # float, 10.0
c = '10.0'       # str, '10.0'

x = a + b        # 15.0           (adding int to float)

y = a + c        # TypeError      (cannot add int to str!)

Each type carries with it specific behaviors that are allowed, and those that are disallowed.
Even though the value '10.0' looks like a number, it is of type str. Python will not add an int to a str.
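To see this without the program stopping, the sketch below catches the TypeError (try/except goes beyond this unit; here it simply lets the script keep running) and then converts the str before adding:

```python
a = 5            # int
c = '10.0'       # str

try:
    y = a + c    # int + str is not allowed
except TypeError as err:
    print('TypeError:', err)

y = a + float(c) # convert the str to a float first
print(y)         # 15.0
```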

variable naming rules

You must follow correct style even though Python does not always require it.

name = 'Joe'
age = 29

my_wordy_variable = 100

student3 = 'jg61'

a variable name must use only lowercase letters, numbers and the underscore
the name may include numbers, but not as the first character in the name
you must not use capital letters, although Python will accept them
within these rules, you may name your variables anything you'd like
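Python can check the hard rules for us (though not our lowercase style rule) with the string method isidentifier(). A small sketch using names from the slide plus two hypothetical ones:

```python
print('student3'.isidentifier())            # True
print('my_wordy_variable'.isidentifier())   # True
print('3student'.isidentifier())            # False: starts with a number
print('MyVariable'.isidentifier())          # True: legal, but breaks our lowercase style
```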

Math and String Operators

+, -, *, /: math operators

Math operators behave as you might expect.

var_int = 5
var2_float = 10.3

var3_float = 5 + 10.3       # int plus a float:  15.3, a float

var4_float = 10.3 - 0.3     # float minus a float:  10.0, a float

var5_float = 15.0 / 3       # float divided by an int:  5.0, a float

The general rule on type is that if a float is involved, the result will be a float.
If only ints are involved, the result will be an int.
However, one exception is division with /: dividing an int by an int always returns a float, regardless of remainder.

Ex. 2.2

identifying type through an operation

Every operation or function call results in a predictable type.

With two integers, the result is integer. If a float is involved, it's always float.

vari = 7
vari2 = 5
varf = 3.0

var3 = vari * vari2    # 35, an int

var4 = vari + varf     # 10.0, a float

However when an integer is divided into another integer, the result is always a float, even if there is no remainder.

var = 6
var2 = 3

var3 = var / var2      # 2.0, a float

division with / always returns a float, so the result type is the same whether or not the numbers divide evenly
we usually don't worry too much about ints vs. floats, because they work well together
however, it is important to always be aware of type when evaluating the result of any operation
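We can confirm each of these rules with type(), as in this short sketch:

```python
product = 7 * 5        # int * int
total = 7 + 3.0        # int + float
quotient = 6 / 3       # int / int

print(type(product))   # <class 'int'>
print(type(total))     # <class 'float'>
print(type(quotient))  # <class 'float'>, even though 6 divides evenly by 3
```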


** exponentiation operator

The exponentiation operator (**) raises its left operand to the power of its right operand and returns the result as a float or int.

var = 11 ** 2     # "eleven raised to the 2nd power (squared)"
print(var)        # 121

var = 3 ** 4
print(var)        # 81
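The int-versus-float rule from the math operators mostly carries over to **, with one wrinkle worth knowing: a negative exponent produces a float even when both operands are ints.

```python
int_result = 2 ** 3        # int ** int
float_result = 2.0 ** 3    # a float is involved
negative_result = 2 ** -1  # negative exponent

print(int_result)          # 8
print(float_result)        # 8.0
print(negative_result)     # 0.5
```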

% Modulus Operator

The modulus operator (%) shows the remainder that would result from division of two numbers.

var = 11 % 2      # "eleven modulo two"
print(var)        # 1   (2 goes into 11 five times, with a remainder of 1)

var2 = 10 % 2     # "ten modulo two"
print(var2)       # 0   (2 goes into 10 five times, with a remainder of 0)

modulus shows the remainder of a division
modulus with 2 can be useful because it shows us whether a number is even or odd
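A minimal sketch of the even/odd check (the variable names are just for illustration):

```python
even_number = 14
odd_number = 15

print(even_number % 2)   # 0: no remainder, so 14 is even
print(odd_number % 2)    # 1: remainder of 1, so 15 is odd
```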

+ operator with strings: concatenation

The plus operator (+) with two strings returns a concatenated string.

aa = 'Hello, '
bb = 'World!'

cc = aa + bb     # 'Hello, World!'

Note that this is the same operator (+) that is used with numbers for summing. Python uses the type of the operands (values on either side of the operator) to determine behavior and result. Ex. 2.5

* operator with one string and one integer: string repetition

The "string repetition operator" (*) creates a new string with the operand string repeated the number of times indicated by the other operand:

aa = '!'
bb = 5

cc = aa * bb       # '!!!!!'

Note that this is the same operator (*) that is used with numbers for multiplication. Python uses the type of the operands to determine behavior and result. Ex. 2.6

+ operator "overloading"

Object types determine behavior.

int or float "added" to int or float: addition

tt = 5            # assign an integer value to tt
zz = 10.0         # assign a float value to zz

qq = tt + zz      # compute 5 plus 10 and assign float 15.0 to qq

str "added" to str: concatenation

kk = '5'          # assign a str value (quotes mean str) to kk
rr = '10.0'       # assign a str value to rr

mm = kk + rr      # concatenate '5' and '10.0'
                  # to construct a new str object, assign to mm

print(mm)         # 510.0

the plus operator serves double duty depending on what types are used
we call this type of behavior 'overloaded'
this behavior is also related to an object-oriented concept: 'polymorphism'

* operator "overloading"

Again, object types determine behavior.

int or float "multiplied" by int or float: multiplication

tt = 5            # int, 5
zz = 10           # int, 10

qq = tt * zz      # int, 50 (5 * 10)
print(qq)         # 50

str "multiplied" by int: string repetition

aa = '5'          # str, '5'
bb = 3            # int, 3

cc = aa * bb      # str, '555' ('5' * 3)

once again, object types determine what is possible (and not possible) in the language
this is why it's so important to know the type of every variable


studying for the quizzes and the midterm and final exam

the exercises and weekly assignments are practice for the exams

the midterm and final will consist of short programs to write and test
you will be graded on the proper use of coding features
you will not be permitted to use features that have not been discussed in the course
the best way to study is to complete each of the weekly exercises
if you have limited time, you can concentrate on the lab exercises (those named as _lab.py)
make note of the features used, and how particular tasks are done
the problems given in the midterm and final will use the same features as those needed for the exercises
again, the coding assignments in the midterm and final must be done using features we have covered
you will not get credit for working programs that use outside features!

Built-In Functions

built-in functions

Built-in functions activate functionality when they are called.

aa = 'hello'        # str, 'hello'

bb = len(aa)        # pass string object aa as an argument to function len(),
                    # which returns an integer object as a return value.

print(bb)            # int, 5

All functions are called: the parentheses after the function name indicate the call.
Most functions take one or more arguments and return a value.
The argument (or comma-separated list of arguments) is placed in parentheses.
The return value of the function call can be assigned to a new variable. (It can also be printed or used in an expression.)

len() function

The len() function takes a string argument and returns an integer -- the length of (number of characters in) the string.

varx = 'hello, world!'

vary = len(varx)        # int, 13


round() function

The round() function takes a float argument and returns the number rounded to the specified number of decimal places (a float), or to the nearest whole number (an int) if no decimal places are given.

aa = 5.9583

bb = round(aa, 2)     # float, 5.96

cc = round(aa)        # int, 6

with two arguments, the 2nd argument determines the number of decimal places
with one argument, rounds to the nearest integer
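One detail that often surprises beginners: when a value is exactly halfway between two whole numbers, round() goes to the even neighbor ("banker's rounding") rather than always rounding up. A short sketch:

```python
half_down = round(2.5)   # halfway: goes to the even neighbor, 2
half_up = round(3.5)     # halfway: goes to the even neighbor, 4
normal = round(2.6)      # not halfway: rounds to the nearest, 3

print(half_down, half_up, normal)   # 2 4 3
```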

float precision and the round() function

Some floating-point operations will result in a number with a small remainder:

x = 0.1 + 0.2
print(x)            # 0.30000000000000004  (should be 0.3?)

y = 0.1 + 0.1 + 0.1 - 0.3
print(y)            # 5.551115123125783e-17  (should be 0.0?)

This remainder reflects the limited precision of floating-point numbers: most decimal fractions, such as 0.1, cannot be stored exactly in binary, so tiny errors appear in the results. Some programs (like Excel) may hide this by rounding what they display.

The solution when using Python is to round any result:

x = 0.1 + 0.2       # 0.30000000000000004

z = round(x, 1)
print(z)            # 0.3
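The remainder matters most when comparing values. This sketch uses the == comparison operator (which asks "are these equal?") to show the difference rounding makes:

```python
x = 0.1 + 0.2
print(x)                    # 0.30000000000000004
print(x == 0.3)             # False: the tiny remainder makes the values differ
print(round(x, 1) == 0.3)   # True: rounding removes the remainder
```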

input() function

This function allows us to enter data into the program through the keyboard.

cc = input('enter name:  ')    # program pauses!  Now the user types something

print(cc)                      # [a string, whatever the user typed]

the input() function takes a string argument
it then displays the string, and pauses execution (i.e., it waits)
the user (the person running the program) may then enter characters from the keyboard
after the user presses [Enter], input() returns a string containing the typed characters

exit() function: terminate the program

The exit() function terminates execution immediately. An optional string argument can be passed as an error message.

exit(0)             # 0 indicates a successful termination (no error)

exit('error!  here is a message')     # string argument passed to exit()
                                      # indicates an error led to termination

exit() to manipulate execution during development

This function can be used as a temporary stop to the program if we'd like to isolate some statements.

We can also use exit() to simply stop program execution in order to debug:

aa = '55'
bb = float(aa)
print('type of bb is:')
print(type(bb))

exit()                  # we inserted this to stop the code
                        # from continuing; we'll remove it later

cc = bb * 2             # because of exit() above, this code
                        # will not be reached

int() "conversion" function

This function can convert a str or float to the int type.

# str -> int
aa = '55'
bb = int(aa)         # int, 55
print(type(bb))      # <class 'int'>

# float -> int
var = 5.95
var2 = int(var)      # int, 5: the decimal part is lopped off (not rounded)

The conversion functions are named after their types -- they take an appropriate value as argument and return an object of that type.
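One catch worth knowing: int() accepts a string that looks like a whole number, but not one with a decimal point. A string like '5.95' can go through float() first. A minimal sketch:

```python
whole = int('55')                  # 55: the string holds a whole number
truncated = int(float('5.95'))     # 5: str -> float -> int (decimal part dropped)

try:
    int('5.95')                    # a decimal point is not allowed here
except ValueError as err:
    print('ValueError:', err)
```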

float() "conversion" function

This function converts an int or str to the float type.

# int -> float
xx = 5
yy = float(xx)       # float, 5.0

# str -> float
var = '5.95'
var2 = float(var)    # float, 5.95

str() "conversion" function

This function converts any value to the str type.

var = 5              # int, 5
var2 = 5.5           # float, 5.5

svar = str(var)      # str, '5'
svar2 = str(var2)    # str, '5.5'

Any object type can be converted to str. ex. 2.12 - 2.16
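A common reason to call str(): the + operator won't join a str and an int, so we convert the number first. A minimal sketch (the variable names are just for illustration):

```python
age = 29
message = 'age: ' + str(age)   # str + str: concatenation
print(message)                 # age: 29
```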

conversion challenge: treating a string like a number

Because Python is strongly typed, conversions can be necessary.

Numeric data sometimes arrives as strings (e.g. from input() or a file). Use int() or float() to convert to numeric types.

aa = input('enter number and I will double it:  ')

print(type(aa))         # <class 'str'>

num_aa = int(aa)        # int() takes the string as an argument
                        # and returns an integer

print(num_aa * 2)       # prints the input number doubled

You can use int() and float() to convert strings to numbers.
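Because input() pauses for the keyboard, the sketch below uses a hard-coded string to stand in for what the user might type. It shows why the conversion matters: without int(), the * operator repeats the string instead of multiplying.

```python
aa = '21'            # stands in for the string input() would return

num_aa = int(aa)     # str -> int
print(num_aa * 2)    # 42: numeric multiplication

print(aa * 2)        # 2121: string repetition, because aa is still a str
```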

beginner's tip: avoid improvising syntax!

Just starting out, some students improvise syntax that doesn't exist.

Imagine that you would like to find the length of a string. What do you do? Some students begin writing code from memory, even though they are not completely familiar with the right syntax.

they may write something like this...

var = 'hello'

mylen = var.len()      # or mylen = length('var')
                       # or mylen = lenth(var)

...and then run it, only to get a strange error that's difficult to diagnose. The solution is to never improvise syntax. Instead, always start with an existing example.

beginner's tip: use existing examples of a feature to write new code using it

When you want to use a Python feature, you must follow an existing example!

Let's say you have a string and you'd like to get its length:

s = "this is a string I'd like to measure"

You look up the function in a reference, like pythonreference.com:

mylen = len('hello')

Then you use the feature syntax very carefully:

slen = len(s)           # int, 36

However, the code you write may be slightly different from the example code:

the variable names and/or values you use will usually be different
you may have a variable in your code where the example uses a literal, or a literal in your code where the example uses a variable


review: distinguish between variables and literals

Early on we need to distinguish between a variable and a literal.

xx = 10          # int, 10
yy = 2           # int, 2

zz = xx * yy     # int, 20

print(zz)

can you name the 3 variables and 2 literals in this code?
variables: xx, yy, zz. These are names that have been assigned a value.
literals: 10, 2. These are values that have been typed directly into our code.
what about 20? It's the value for zz, but it is not written directly in our code, so it is not a literal.

example: confusing a string literal with a variable name

Here's an example of this common error that beginners make - try to avoid it!

Going back to our previous example - you'd like to use len() to measure this string:

s = "this is a string I'd like to measure"

You look up the function in a reference, like pythonreference.com:

mylen = len('hello')

You have been told to make your syntax match the example's. But should you do this?

slen = len('s')            # int, 1

You were expecting a length of 36, but you got a length of 1. Can you see why? The variable s points to a long string; the literal string 's' is just a one-character string. In trying to match the example code, you may have thought you needed to match the quotes as well. But the example uses a literal where your code has a variable, and the two are interchangeable. The takeaway is this: anyplace a literal is used, a variable can be used instead; and anyplace a variable is used, a literal can be used instead. Ex. 2.17 and 2.18 practice keeping literals and variables distinct.
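A side-by-side sketch makes the difference concrete. The variable and the literal it holds give the same answer. The quoted name gives a different one.

```python
s = "this is a string I'd like to measure"

print(len(s))                                       # 36: the string s points to
print(len("this is a string I'd like to measure"))  # 36: same value, typed as a literal
print(len('s'))                                     # 1:  the one-character literal 's'
```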

Conditionals and Blocks; Object Methods

conditionals: if/elif/else and while

All programs must make decisions during execution.

Consider these decisions by programs you know:

text editor: does the file to be read exist? If no, create a new one.
ATM: is the security PIN valid? If no, display error.
website: is the email address in the proper form? If yes, submit form.
game: did the player's score beat the high score? If yes, replace high score.

Each program must decide what to do next; conditional statements allow any program to make the decisions it needs to do its work.

'if' statement

The if statement executes code in its block only if the test is True.

aa = input('please enter a positive integer: ')
int_aa = int(aa)

if int_aa < 0:                          # test:  is this a True statement?
    print('error:  input invalid')      # block (2 lines) -- lines are
    exit()                              # executed only if test is True

d_int_aa = int_aa * 2
print('your value doubled is ' + str(d_int_aa))

The two components of an if statement are the test and the block. The test determines whether the block will be executed.

'else' statement

An else statement will execute its block if the if test before it was not True.

xx = input('enter a value less than 100:  ')
yy = int(xx)

if yy < 100:
    print(xx + ' is a valid number')
    print('congratulations.')

else:
    print(xx + ' is too high')
    print('please re-run and try again.')

Since else means "otherwise", we can say that only one block of an if/else statement will execute.

'elif' statement

elif is also used with if (and optionally else): it chains additional tests onto an if statement. Python checks the tests in order and runs only the block of the first test that is True.

zz = input('type an integer and I will tell you its sign:  ')
zyz = int(zz)

if zyz > 0:
    print('that number is positive')

elif zyz < 0:
    print('that number is negative')

else:
    print('0 is neutral')

if can be used alone, with elif, with else, or with both
else is not required when using if
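elif can be repeated as many times as needed, and only the first True test's block runs. The sketch below shows this with hypothetical grade cutoffs.

```python
score = 85

if score >= 90:
    grade = 'A'
elif score >= 80:
    grade = 'B'         # the first True test, so this block runs
elif score >= 70:
    grade = 'C'         # 85 >= 70 is also True, but this test is never checked
else:
    grade = 'F'

print(grade)            # B
```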

the python code block

A code block is marked by indented lines. The end of the block is marked by a line that returns to the prior indent.

xx = input('enter a value less than 100:  ')      # not in any block
yy = int(xx)                                      # not in any block

if yy < 100:                               # the start of the 'if' block
    print(xx + ' is a valid number')
    print('congratulations.')              # last line of the 'if' block

else:                                      # the start of the 'else' block
    print(xx + ' is too high')
    print('please re-run and try again.')  # last line of the 'else' block

Note also that a block is preceded by a line that ends in a colon (here, the if and else lines), indented one level less than the block itself.

nested blocks increase indent

Blocks can be nested within one another. A nested block (a "block within a block") is indented further to the right.

var_a = int(input('enter a number: '))
var_b = int(input('enter another number:  '))

if var_b >= var_a:                                  # 'outer' block
    print("the test was true")
    print("var b is at least as large")

    if var_a == var_b:                              # 'inner' block
        print('the two values are equivalent')

    print("in outer block, not in the inner block")  # back in 'outer' block

print('this gets printed in any case (i.e., not part of either block)')

Decision trees using 'if' and 'else' are a part of most programs.

comparison operators with numbers

>, <, <=, >= and == tests with numbers work as you might expect, and an int can be compared directly with a float.

var = 5
var2 = 3.3

if var >= var2:
    print('var is greater or equal')

if var == var2:
    print('they are equivalent')

== with strings

With strings, this operator tests to see if two strings are identical.

var = 'hello'
var2 = 'hello'

if var == var2:
    print('these are equivalent strings')

The same 'equivalence' operator is used for numbers and strings.
Compare this to the 'polymorphic' behavior of '+' and '*'.

the in operator with strings

in with strings allows you to see if a 'substring' appears within a string.

article = 'The market rallied, buoyed by a rise in Samsung Electronics.'

if 'Samsung' in article:
    print('Samsung was found')

in tests to see if one string can be found in another (a 'substring').
Like the other comparison operators, this one returns True or False.
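A few more in tests on the same article show two details. The match is case-sensitive, and any run of characters counts, even part of a word. The not in form, which reverses the test, is an addition here.

```python
article = 'The market rallied, buoyed by a rise in Samsung Electronics.'

print('Samsung' in article)     # True
print('samsung' in article)     # False: the test is case-sensitive
print('Sam' in article)         # True: any substring matches, even part of a word
print('Apple' not in article)   # True: 'not in' reverses the test
```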

and "compound" test

Python uses the operator and to combine tests: both must be True.

With the and compound test, the entire test is True only if both tests are True.

xx = input('what is your ID?  ')
yy = input('what is your pin?  ')

if xx == 'dbb212' and yy == '3859':
    print('you are a validated user')
else:
    print('you are not validated')

Note the lack of parentheses around the tests: when the syntax is unambiguous, Python will understand. We can use parentheses to clarify compound tests like these, but they often aren't necessary. Beginners sometimes think they need to put parentheses around values; leave them out wherever they aren't needed.

or "compound" test

Python uses the operator or to combine tests: the entire expression is True if either test is True.

aa = input('please enter "q" or "quit" to quit: ')

if aa == 'q' or aa == 'quit':
    exit()

print('continuing...')

Again, note the lack of parentheses around the tests; as with and, they aren't needed here.
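Parentheses do matter when and and or are mixed in a single test, because and is evaluated before or. The minimal sketch below uses literal True/False values to isolate that rule.

```python
# 'and' is evaluated before 'or', much as * is evaluated before +
print(True or False and False)      # True:  read as True or (False and False)
print((True or False) and False)    # False: parentheses force the 'or' first
```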

testing a variable against two values

Both sides of an 'or' or 'and' must be complete tests.

if aa == 'q' or aa == 'quit':          # not "if aa == 'q' or 'quit'"
    exit()

Note the 'or' test above -- we would not say if aa == 'q' or 'quit'; this would always succeed (for reasons discussed later).
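Here is a sketch of what Python actually does with the incorrect form. It assumes aa holds a value other than 'q'. The right side of or is the bare string 'quit', and a non-empty string counts as True in a test.

```python
aa = 'hello'

# Python reads this as (aa == 'q') or ('quit')
result = aa == 'q' or 'quit'
print(result)          # quit  (the comparison was False, so 'quit' is returned)
print(bool(result))    # True: a non-empty string counts as True in a test

# the correct version compares aa against each value
print(aa == 'q' or aa == 'quit')    # False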

testing a variable against multiple values

We can also test a variable against multiple values by using in with a list (more on lists next week):

if aa in ['q', 'quit']:
    exit()

negating an if test with not

You can negate a test with the not keyword.

var_a = 5
var_b = 10

if not var_a > var_b:
    print("var_a is not larger than var_b (well - it isn't).")

Of course this particular test can also be expressed by replacing the comparison operator > with <=, but when we learn about new True/False condition types we'll see how this operator can come in handy.

boolean (bool) values True and False

True and False are boolean values (type bool); they are produced by tests such as comparisons.

aa = 3
bb = 5

if aa > bb:
    print("that is true")

Tests are actually expressions that resolve to True or False, which are values of boolean type:

var = 5
var2 = 10
xx = (var < var2)    # a comparison: is 5 less than 10?
print(xx)            # True
print(type(xx))      # <class 'bool'>

Note that we would almost never assign comparisons like these to variables, but we are doing so here to illustrate that they resolve to boolean values. ex 3.1 - 3.9

The while Block Statement and Looping Blocks

the concept of incrementing

Incrementing means increasing by one.

x = 0         # int, 0

x = x + 1     # int, 1
x = x + 1     # int, 2     (can also say x += 1)
x = x + 1     # int, 3

print(x)      # 3

For each of the three incrementing statements above, a new value equal to the value of x plus 1 is computed, and then assigned back to x.
The previous value of x is replaced with the new, incremented value.
Incrementing is most often used for counting within loops -- see next.
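The shorthand x += 1 mentioned in the comment above also exists for other operators, and the amount need not be 1. A minimal sketch:

```python
x = 0
x += 1        # same as x = x + 1; x is now 1
x += 1        # x is now 2
x -= 1        # same as x = x - 1; x is now 1
x += 5        # increase by any amount; x is now 6
print(x)      # 6
```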

while looping block

A while block with a test causes Python to loop through a block repetitively, as long as the test is True.

This program prints each number from 0 through 4:

cc = 0                 # initialize a counter

while cc < 5:          # if test is True, enter the block; if False, drop below
    print(cc)
    cc = cc + 1        # increment cc:  add 1 to its current value

    # WHEN WE REACH THE END OF THE BLOCK,
    # JUMP BACK TO THE while TEST

print('done')

The block is executing the print() and cc = cc + 1 lines multiple times - again and again until the test becomes False. Of course, the value being tested (cc) must change as the loop progresses - otherwise the loop will cycle indefinitely (infinite loop).

understanding while looping blocks

while loop statements have 3 components: the test, the block, and the automatic return.

cc = 10

while cc > 0:         # the TEST (if True, enter the block)

    print(cc)      # the BLOCK (execute as regular Python statements)
    cc = cc - 1

    # the AUTOMATIC RETURN [invisible!]
    # (at end of block, go back to the test)

print('done')

can you tell just from reading what this code prints?
to do so, you'll need to keep track of two things simultaneously: program flow (i.e., which line is executed next, then which after that, etc.) and the changing value of cc, and calculate its changes as the code block is executed repetitively
you must be able to "execute" this code in your head
this takes some practice, but isn't complicated

loop control: break

break is used to exit a loop regardless of the test condition.

xx = 0
print('Hello, User')

while xx < 10:

    answer = input("do you want the loop to break? ")

    if answer == 'y':
        break                  # drop down below the block

    xx = xx + 1
    print('I have now greeted you ' + str(xx) + ' times')


print("ok, I'm done")

loop control: continue

The continue statement jumps program flow to next loop iteration.

x = 0

while x < 10:

    x = x + 1

    if x % 2 != 0:             # will be True if x is odd
        continue               # jump back up to the test and test again

    print(x)

Note that print(x) will not be executed if the continue statement comes first. Can you figure out what this program prints?

the while True looping block

while with True and break gives us a handy way to keep looping until we decide to stop, at any point in the block.

while True:

    var = input('please enter a positive integer:  ')

    if int(var) > 0:
        break

    else:
        print('sorry, try again')


print('thanks for the integer!')

Note the use of True in a while expression: since True is always True, the while test will always be True, and will cause program flow to enter (and re-enter) the block every time execution returns to the top. Therefore the break statement is essential to keep this block from looping indefinitely. ex 3.17 - 3.22

debugging loops: the "fog of code"

What do we do when we get bad output, but with no error messages?

The output of the code should be the sum of all numbers from 0-10 (i.e. 55), but instead it is 10:

revcounter = 0
while revcounter < 10:

    varsum = 0
    revcounter = revcounter + 1
    varsum = varsum + revcounter

    print("loop iteration complete")
    print("revcounter value: ", revcounter)
    print("varsum value: ", varsum)
    input('pausing...')
    print()
    print()

print(varsum)                         # 10

Why is it not working? You may see it right away, but I'd like you to imagine that this code is a lot more complicated, such that it won't be easy to see the reason. You may be tempted to tinker with the code to see whether you can get the correct output, but it's important to understand that we need to be more methodical. We do this with print() statements: the prints inside the loop report the values of revcounter and varsum at every iteration, so we can compare what the code actually does with what we expect it to do.

Object Methods

object methods

Objects are capable of behaviors, which are expressed as methods.

var = 'Hello, World!'

var2 = var.replace('World', 'Mars')      # replace substring, return a str

print(var2)                              # Hello, Mars!

Methods are type-specific functions that are used only with a particular type. They work with the object itself, and any values passed as arguments.
We commonly say "call the x method on this object".
Methods use the object itself as their implied first value. Any action the method takes will make use of, or modify, the object on which it is called.
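String methods can only "make use of" their object. A string cannot be modified, so methods like .replace() return a new string and leave the original as it was. A quick check:

```python
var = 'Hello, World!'
var2 = var.replace('World', 'Mars')

print(var)     # Hello, World!  (the original string is unchanged)
print(var2)    # Hello, Mars!
```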

methods vs. functions

Compare method syntax to function syntax.

mystr = 'HELLO'

x = len(mystr)            # function len() (stands alone)

y = mystr.count('L')      # method .count() (attached to the string variable)

Methods and functions are both called (using the parentheses after the name of the function or method). Both also may take an argument and/or may return a return value.

string method: .upper()

This "transforming" method returns a new string with a string's value uppercased.

var = 'hello'
newvar = var.upper()        # str, 'HELLO'

print(newvar)               # HELLO

This method does not take an explicit argument, because it works with the string object itself.

string method: .lower()

This "transforming" method returns a new string with a string's value lowercased.

var = 'Hello There'
newvar = var.lower()        # str, 'hello there'

print(newvar)               # hello there

This method does not take an explicit argument, because it works with the string object itself.

string method: .replace()

This "transforming" method returns a new string based on an old string, with specified text replaced.

var = 'My name is Marie'

newvar = var.replace('Marie', 'Greta')    # str, 'My name is Greta'

print(newvar)                             # My name is Greta

This method takes two arguments, the search string and replace string.
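.replace() changes every occurrence by default. It also accepts an optional third argument, a count, that limits how many occurrences are replaced from the left. A small sketch:

```python
var = 'one, two, three'

print(var.replace(', ', ' and '))      # one and two and three
print(var.replace(', ', ' and ', 1))   # one and two, three  (first occurrence only)
```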

string method: .isdigit()

This "inspector" method returns True if a string is all digits.

mystring = '12345'
if mystring.isdigit():
    print("that string is all numeric characters")

if not mystring.isdigit():
    print("that string is not all numeric characters")

Since they return True or False, inspector methods like isdigit() are typically used as the test in an if or while statement. To test the reverse (i.e. "not all digits"), use if not before the method call.
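Before using .isdigit() as a guard ahead of int(), note what it rejects. A minus sign and a decimal point are not digits, so the check only admits non-negative whole numbers.

```python
print('12345'.isdigit())   # True
print('-5'.isdigit())      # False: '-' is not a digit
print('3.5'.isdigit())     # False: '.' is not a digit
print(''.isdigit())        # False: an empty string has no digits
```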

string method: .endswith()

This "inspector" method returns True if a string ends with a substring.

bb = 'This is a sentence.'
if bb.endswith('.'):
    print("that line had a period at the end")

string method: .startswith()

This "inspector" method returns True if the string starts with a substring.

cc = input('yes? ')
if cc.startswith('y') or cc.startswith('Y'):
    print('thanks!')
else:
    print("ok, I guess not.")
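Instead of testing both 'y' and 'Y', you could lowercase the response first and test once. In the sketch below, the method calls are chained: .startswith() works on the string that .lower() returns. A fixed string stands in for the input() response.

```python
cc = 'Yes please'          # stands in for the input() response

if cc.lower().startswith('y'):     # .lower() returns 'yes please'
    answer = 'thanks!'
else:
    answer = 'ok, I guess not.'

print(answer)              # thanks!
```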

string method: .count()

This "inspector" method returns a count of occurrences of a substring within a string.

aa = 'count the substring within this string'
bb = aa.count('in')
print(bb)             # 3 (the number of times 'in' appears in the string)

string method: .find()

This "inspector" method returns the character position of a substring within a string.

xx = 'find the name in this string'
yy = xx.find('name')
print(yy)             # 9 -- the 10th character in xx
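Two more behaviors of .find() are worth knowing. It reports only the first match, and it returns -1 (not an error) when the substring isn't there. A quick sketch:

```python
xx = 'find the name in this string'

print(xx.find('name'))     # 9
print(xx.find('zebra'))    # -1: substring not found
print(xx.find('in'))       # 1: the first match, inside the word 'find'
```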

ex. 3.27 - 3.28

f'' strings for string formatting

An f'' string allows us to embed any value (such as numbers) into a new, completed string.

aa = 'Jose'
var = 34

bb = f'{aa} is {var} years old.'

print(bb)                                  # Jose is 34 years old.

f'' strings are the preferred way to combine strings with numbers
you do not need to convert or concatenate
it's also recommended not to print with commas, as in print(aa, 'is', var); this is best used for diagnostic output, not final output

Ex. 3.29

f'' string format codes

There are numerous options for justifying, formatting numbers, and more.

overview of formatting

# text padding and justification
:<15       # left justify width
:>10       # right justify width
:^8        # center justify width

# numeric formatting
:f         # as float (6 decimal places)
:.2f       # as float (2 decimal places)
:,         # thousands separators (commas)
:,.2f      # thousands separators, with float rounded to 2 places

f'' string format code examples

There are many more options; you can search online for details.

examples

x = 34563.999999

f'hi:  {x:<30}'      # 'hi:  34563.999999                  '

f'hi:  {x:>30}'      # 'hi:                    34563.999999'

f'hi:  {x:^30}'      # 'hi:           34563.999999         '

f'hi:  {x:f}'        # 'hi:  34563.999999'

f'hi:  {x:.2f}'      # 'hi:  34564.00'

f'hi:  {x:,}'        # 'hi:  34,563.999999'

f'hi:  {x:,.2f}'     # 'hi:  34,564.00'

Please note that f'' strings are available only as of Python 3.6. ex 3.29
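Format codes can be combined to line up columns, and the braces can hold an expression as well as a variable. The sketch below uses hypothetical values. The | characters mark where each padded field ends.

```python
name = 'Jose'
amount = 1234.5

line = f'{name:<10}|{amount:>12,.2f}|'
print(line)         # Jose      |    1,234.50|

doubled = f'doubled: {amount * 2:,.2f}'
print(doubled)      # doubled: 2,469.00
```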

sidebar: method and function return values in an expression; combining expressions

The return value of an expression can be used in another expression.

letters = "aabbcdefgafbdchabacc"

vara = letters.count("a")         # 5

varb = len(letters)               # 20

varc = vara / varb                # 5 / 20, or 0.25

vard = varc * 100                 # 25.0


print(letters.count("a") / len(letters) * 100)  # 25.0 (statements combined)

the first 4 statements calculate the percentage of a's in the string
the last statement does these same operations in one statement
combining statements is optional, but can be fun
shorter code is usually better, as long as it is clear

a note on style in your homework submissions

Professional coders respect good style because it makes code easier to read.

good and proper style is important within the developer community
proper style is easier to read
it is also the mark of professionalism
code with bad style just looks wrong to an experienced developer
therefore if you don't employ good style, your code appears amateurish
a style guide is included in your supplementary materials
the guide covers the bare basics of well styled code
if you get into the right habit now, you'll always be creating professionally styled code
if you are interested in even more style conventions, check out the document online called PEP 8 (Python style guide)

Data Parsing & Extraction: String Methods

our first data format: csv

The CSV format will allow us to explore Python's text parsing tools.

    19260701,0.09,0.22,0.30,0.009
    19260702,0.44,0.35,0.08,0.009
    19270103,0.97,0.21,0.24,0.010

data is commonly organized in tabular form: columns and rows
examples: Excel spreadsheet, CSV file, relational database
CSV stands for "comma-separated values"
CSV is used throughout the world to post, transmit and store data
in this lesson we will 'parse' CSV data (i.e., divide into usable pieces)
in the process, we'll learn Python's tools for reading file data and parsing strings
much of the data we are called upon to work with comes to us as strings

CSV structure: "fields" and "records"

Tables consist of records (rows) and fields (column values).

Tabular text files are organized into rows and columns.

comma-separated values file (CSV)

    19260701,0.09,0.22,0.30,0.009
    19260702,0.44,0.35,0.08,0.009
    19270103,0.97,0.21,0.24,0.010
    19270104,0.30,0.15,0.73,0.010
    19280103,0.43,0.90,0.20,0.010
    19280104,0.14,0.47,0.01,0.010

space-separated values file

    19260701   -0.09    0.22    0.30   0.009
    19260702    0.44    0.35   -0.08   0.009
    19270103    0.97   -0.21    0.24  -0.010
    19270104    0.30   -0.15    0.73   0.010
    19280103   -0.43    0.90    0.20   0.010
    19280104    0.14    0.47    0.01  -0.010

note that the delimiter may be a comma, colon, tab, or any other non-alphanumeric character
the delimiter may also be "whitespace", i.e. one or more spaces
the delimiter is necessary to maintain the structure, but also must be removed during parsing
our job will be to turn the CSV into "fields", i.e. separated data values on each line


table data in text files

Text files are just sequences of characters. Commas and newline characters separate the data.

If we print a CSV text file, we may see this:

    19260701,0.09,0.22,0.30,0.009
    19260702,0.44,0.35,0.08,0.009
    19270103,0.97,0.21,0.24,0.010
    19270104,0.30,0.15,0.73,0.010
    19280103,0.43,0.90,0.20,0.010
    19280104,0.14,0.47,0.01,0.010

However, here's what a text file really looks like under the hood:

19260701,0.09,0.22,0.30,0.009\n19260702,0.44,0.35,0.08,
0.009\n19270103,0.97,0.21,0.24,0.010\n19270104,0.30,0.15,
0.73,0.010\n19280103,0.43,0.90,0.20,0.010\n19280104,0.14,
0.47,0.01,0.010

the newline character separates the records in a CSV file
the delimiter (in this case, a comma) separates the fields
the newline character is not visible when printed: it is a signal to your printer or display program to drop down a line and continue printing or displaying characters on the next line
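Both structural characters can be seen at work on a short string holding the first two rows above. This sketch uses .split() and [0] subscripting, which are covered in the upcoming slides. The empty string at the end of records is the "piece" after the final newline.

```python
data = '19260701,0.09,0.22,0.30,0.009\n19260702,0.44,0.35,0.08,0.009\n'

records = data.split('\n')      # the newline separates records
print(records)    # ['19260701,0.09,0.22,0.30,0.009', '19260702,0.44,0.35,0.08,0.009', '']

fields = records[0].split(',')  # the comma separates fields
print(fields)     # ['19260701', '0.09', '0.22', '0.30', '0.009']
```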

tabular data: looping, parsing and summarizing

Looping through file line strings, we can split and isolate fields on each line.

The process:

1. Open the file for reading.

fh = open('myfile.csv')

2. Use a for loop to read each line of the file, one at a time. Each line will be represented as a string.

for line in fh:

3. Remove the newline from the end of each string with .rstrip().

    line = line.rstrip()

4. Divide (using .split()) the string into fields.

    fields = line.split(',')

5. Read a value from one of the fields, representing the data we want.

    val = fields[4]

6. As the loop progresses, build a sum of values from each line.

    mysum = mysum + float(val)

We will begin by reviewing each feature necessary to complete this work, and then we will put it all together.

string method: .rstrip()

This method removes whitespace, or any characters you specify, from the right side of a string.

When no argument is passed, the newline character (or any "whitespace" character) is removed from the end of the line:

line_from_file = 'jw234,Joe,Wilson\n'

stripped = line_from_file.rstrip()      # str, 'jw234,Joe,Wilson'

When a string argument is passed, trailing occurrences of that character are removed from the end of the line instead:

line_from_file = 'I have something to say.'

stripped = line_from_file.rstrip('.')   # str, 'I have something to say'

Whitespace characters are any characters that don't print directly, but we may see their presence: space, tab, or newline characters are whitespace.
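Some further .rstrip() behaviors are shown below. Every trailing occurrence is removed, the left side is never touched, and the argument is treated as a set of characters rather than a word. .strip(), which trims both sides, is a related method not covered on this slide.

```python
aa = 'Wait...'.rstrip('.')          # str, 'Wait'  (every trailing '.' is removed)
bb = '  padded  \n'.rstrip()        # str, '  padded'  (left side untouched)
cc = '  padded  \n'.strip()         # str, 'padded'  (.strip() trims both sides)
dd = 'banana'.rstrip('an')          # str, 'b'  (any trailing 'a' or 'n' is removed)
```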

string method: .split() with a delimiter

This method divides a delimited string into a list.

line_from_file = 'jw234:Joe:Wilson:Smithtown:NJ:2015585894\n'

xx = line_from_file.split(':')

print(xx)                         # ['jw234', 'Joe', 'Wilson',
                                  #  'Smithtown', 'NJ', '2015585894\n']

When we pass a delimiter string like the colon, ':', Python steps through the string character-by-character, looking for that character. When it finds one, the characters leading up to it (since the start of the string or the previous delimiter) become a new item in the resulting list.
It then continues searching the string for the next instance of the delimiter. Each delimiter marks the end of another item in the resulting list.
A list is an object that can contain a sequence of other objects. We'll learn more about lists in the next lesson.
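Notice that the last field above still ends in '\n'. Stripping the line before splitting avoids this, and the two method calls can be chained. The [-1] subscript used here appears on the next slide.

```python
line_from_file = 'jw234:Joe:Wilson:Smithtown:NJ:2015585894\n'

fields = line_from_file.rstrip().split(':')   # strip first, then split

print(fields[-1])      # 2015585894  (no trailing newline this time)
print(len(fields))     # 6
```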

string method: .split() without a delimiter

We can also think of a string as delimited by spaces.

gg = 'this is a file    with    some     whitespace'

hh = gg.split()                   # splits on any "whitespace character"

print(hh)                         # ['this', 'is', 'a', 'file',
                                  #  'with', 'some', 'whitespace']

If no delimiter is supplied, the string is split on whitespace.
Whitespace characters are any characters that don't print directly, but we may see their presence: space, tab, or newline characters are whitespace.
Also note that all whitespace is removed - any consecutive spaces are treated as one.
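Compare this with passing a single space explicitly. With ' ' as the delimiter, each extra space produces an empty-string item, which is usually not what you want for space-separated data.

```python
gg = 'this is a file    with    some     whitespace'

print(gg.split(' '))   # ['this', 'is', 'a', 'file', '', '', '', 'with', ...]
print(gg.split())      # ['this', 'is', 'a', 'file', 'with', 'some', 'whitespace']
```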

ex 4.1 - 4.2 (skipping 4.3, 4.4, slicing)

Data Parsing & Extraction: List Operations and String Slicing

lists and list subscripting

Subscripting allows us to select individual items from a list.

fields = ['jw234', 'Joe', 'Wilson', 'Smithtown', 'NJ', '2015585894']

var = fields[0]           # 'jw234', 1st item
var2 = fields[4]          # 'NJ', 5th item
var3 = fields[-1]         # '2015585894' (-1 means last item)

the list is a sequence of objects of any type (here they are strings)
subscripting means accessing an individual item within the list
square brackets specify an item index, starting at 0
the last index can also be specified using -1
indices count from the beginning at 0, or from the end at -1

Ex. 4.5

list slicing

Slicing allows us to select multiple items from a list.

letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
first_four = letters[0:4]
print(first_four)                     # ['a', 'b', 'c', 'd']

# no upper bound takes us to the end
print(letters[5:])                    # ['f', 'g', 'h']

Here are the rules for slicing:

the first index is 0
the lower bound is the index of the 1st item to be included
the upper bound is one higher than the index of the last item to be included
no upper bound means "to the end"
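The same rules work with an omitted lower bound and with negative indices, which count from the end. A short sketch on the same list:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

print(letters[:3])      # ['a', 'b', 'c']   no lower bound means "from the start"
print(letters[-3:])     # ['f', 'g', 'h']   negative bounds count from the end
print(letters[2:-2])    # ['c', 'd', 'e', 'f']
```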

string slicing

Slicing a string selects characters the way that slicing a list selects items.

mystr = '20140313 15:33:00'
year =  mystr[0:4]               # '2014'
month = mystr[4:6]               # '03'
day =   mystr[6:8]               # '13'

Again, please review the rules for slicing:

the first index is 0
the lower bound is the index of the 1st item to be included
the upper bound is one higher than the index of the last item to be included
no upper bound means "to the end"
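The time portion of the same string can be sliced the same way, and the pieces converted to numbers. In this sketch, index 8 is the space and indices 11 and 14 are the colons. The minutes-since-midnight calculation is only an illustration.

```python
mystr = '20140313 15:33:00'

hour =   mystr[9:11]          # '15'
minute = mystr[12:14]         # '33'
second = mystr[15:]           # '00'  (no upper bound: to the end)

print(int(hour) * 60 + int(minute))   # 933: minutes since midnight
```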

Now we can go back to ex. 4.3 - 4.4.

the IndexError exception

An IndexError exception indicates use of an index for a list item that doesn't exist.

mylist = ['a', 'b', 'c']

print(mylist[5])            # IndexError:  list index out of range

Since mylist does not contain a sixth item (i.e., at index 5), Python tells us it cannot complete this operation.
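One way to avoid the error is to check the index against len() before subscripting. The sketch below assumes a non-negative index, since valid indices run from 0 through len(mylist) - 1. The variable idx is hypothetical.

```python
mylist = ['a', 'b', 'c']
idx = 5                      # hypothetical index to look up

if idx < len(mylist):        # valid non-negative indices: 0 through len(mylist) - 1
    message = mylist[idx]
else:
    message = f'no item at index {idx}; the list has {len(mylist)} items'

print(message)               # no item at index 5; the list has 3 items
```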

Data Parsing & Extraction: File Operations and the for Looping Statement

the for loop block statement with a list

for with a list repeats its block as many times as there are items in the list.

mylist = [1, 2, 'b']

for myvar in mylist:     # myvar is assigned the next item (first iteration: 1)
    print(myvar)         # 1 on the first iteration
    print('===')         # ===
print('done')

The above code produces this output:

# 1
# ===
# 2
# ===
# b
# ===
# done

Similar to a while block, the for looping block statement repeats the contents of its block multiple times (looping).
However, the for block will repeat only as many times as there are items in the list.
myvar is called the control variable.
The control variable is reassigned the next value from the list for each iteration of the loop.
This means that if the list has 3 items, the loop executes 3 times and myvar is reassigned a new value 3 times.
Special note: the variable myvar may be given any name - it is a variable like any other that you might create and use.

Ex. 4.12

review: the concept of incrementing

We increment an integer by reassigning it: the new value is the old value plus one.

x = 0         # int, 0

x = x + 1     # int, 1
x = x + 1     # int, 2     (can also say x += 1)
x = x + 1     # int, 3

print(x)      # 3

For each of the three incrementing statements above, a new value equal to the value of x plus 1 is computed, and then assigned back to x.
The previous value of x is replaced with the new, incremented value.
Incrementing is most often used for counting within loops -- see next.

using a for loop to count list items

An integer, incremented once for each iteration, can be used to count iterations.

mylist = [1, 2, 'b']

my_counter = 0

for thisvar in mylist:
    my_counter = my_counter + 1

print(f'count:  {my_counter} items')   # count:  3 items

The value of my_counter is initialized at 0 before the loop begins.
Then, since the incrementing line my_counter = my_counter + 1 is inside the looping block, the value of my_counter goes up once with each iteration.
(Please note that the len() function can count list items more efficiently, but we are using a counter to demonstrate the counter technique, which can be used in situations where len() can't be used -- as when looping through a file -- discussed shortly.)

using a for loop to sum list items

A numeric variable, updated for each iteration, can be used to sum up the values that the loop encounters.

mylist = [1, 2, 3]

my_sum = 0

for val in mylist:
    my_sum = my_sum + val

print(f'sum:  {my_sum}')     # sum: 6  (value of 1 + 2 + 3)

The value of my_sum is initialized at 0 before the loop begins.
Then, since the summing line my_sum = my_sum + val is inside the looping block, my_sum grows by the current item's value with each iteration.
(Please note that the sum() function can sum list items more efficiently, but we are using a summing variable to demonstrate the summing technique, which can be used in situations where sum() can't be used, as when we are summing values from a file -- discussed shortly.)
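When the values are already in a list, the built-ins do the same work in one step each. Dividing one by the other gives an average, as this short sketch shows:

```python
mylist = [1, 2, 3]

total = sum(mylist)            # 6, same result as the summing loop
count = len(mylist)            # 3, same result as the counting loop
print(total / count)           # 2.0  (the average)
```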

4.12 - 4.13

the open() function and the 'file' object

The 'file' object represents a connection to a file that is saved on disk.

fh = open('students.txt')     # a 'file' object

print(type(fh))               # <class '_io.TextIOWrapper'>

The open() function causes Python to ask the operating system to access the file.
If the file exists and is readable, the operating system will create a connection to the file and give Python access to it.
We will be able to use this object to read the file data into our program.
(The actual type of the object is _io.TextIOWrapper, but we will call it a 'file' object.)

reading a file with the for statement

A for loop over a 'file' object repeats its block once for each line in the file.

fh = open('students.txt')              # file object allows looping
                                       # through a series of strings

for xx in fh:                          # xx is a string, a line from the file;
    print(xx)                          # this prints each line of students.txt

fh.close()                             # close the file

xx is the control variable, and it is automatically assigned each line in the file as a string (usually ending in a newline character).
Again, the control variable xx is reassigned for each iteration of the loop.
This means that if the file has 5 lines, the loop executes 5 times and xx is reassigned a new value (a new line of the file) 5 times.
break and continue work with for as well as while loops.
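Each line delivered by the loop keeps its trailing newline, and print() adds another one, so the loop above prints a blank line after each line. The sketch below makes the newline visible with repr() and removes it with .rstrip(). It uses io.StringIO as a stand-in for open('students.txt'), an assumption made so the example runs without the file. A StringIO object can be looped over line by line just like a 'file' object.

```python
import io

# io.StringIO stands in for open('students.txt'); the contents are hypothetical
fh = io.StringIO('Alice\nBob\nCarol\n')

for xx in fh:
    print(repr(xx))          # 'Alice\n', then 'Bob\n', then 'Carol\n'
    print(xx.rstrip())       # Alice, then Bob, then Carol (no blank lines)

fh.close()
```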

Ex. 4.14

summarizing: csv parsing with for looping and string parsing

Here we put together all features learned in this session.

fh = open('revenue.csv')          # 'file' object

counter = 0
summer = 0.0

for line in fh:                   # str, "Haddad's,PA,239.50\n"  (first line from file)

    line = line.rstrip()          # str, "Haddad's,PA,239.50"
    fieldlist = line.split(',')   # list, ["Haddad's", 'PA', '239.50']

    rev_val = fieldlist[2]        # str, '239.50'
    f_rev = float(rev_val)        # float, 239.5

    counter = counter + 1         # incrementing once for each iteration
    summer = summer + f_rev       # adding the value found at each iteration to a sum

fh.close()

print(f'counter:  {counter}')     # 7 (number of lines in file)
print(f'summer:   {summer}')      # 662.01000001  (sum of all 3rd col values in file)

This example puts together everything we learned in this session.
Each line is a string, which gets stripped, split into fields and then the last item in the line converted to float.
We then use a summing variable to sum up the values found on each line.
If we wish, at the end we can derive an average value by dividing summer by counter.
(Note that the tiny remainder is a normal floating-point rounding artifact, and the result can be rounded to 2 places with round().)
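Here is a sketch of the averaging and rounding step. It assumes three hypothetical lines in the same company,state,revenue format as revenue.csv. A list of strings stands in for the 'file' object, since looping over it works the same way.

```python
# hypothetical lines in the revenue.csv format (a stand-in for the file)
lines = ["Haddad's,PA,239.50\n", 'Westfield,NJ,53.90\n', 'The Store,NJ,211.50\n']

counter = 0
summer = 0.0

for line in lines:
    fieldlist = line.rstrip().split(',')     # ["Haddad's", 'PA', '239.50']
    counter = counter + 1                    # count the lines
    summer = summer + float(fieldlist[2])    # sum the 3rd column

average = summer / counter                   # may carry a tiny float remainder
print(f'average:  {round(average, 2)}')      # average:  168.3
```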

Ex 4.28

sidebar: writing and appending to files using the file object

Files can be opened for writing or appending; we use the 'file' object and the file .write() method.

fh = open('new_file.txt', 'w')
fh.write("here's a line of text\n")
fh.write('I add the newlines explicitly if I want to write to the file\n')
fh.close()

Note that we are explicitly adding newlines to the end of each line -- the write() method doesn't do this for us.

Containers: More List Operations

using containers to collect data

Containers are Python objects that can contain other objects.

a container is an object that can contain other objects
we collect values (numbers and strings) from a data source and store them in a container to manipulate and analyze the data
the four Python containers are list, tuple, set and dict
each one stores data in a different way that makes it convenient for us to manipulate and analyze

containers allow for manipulation and analysis

Once collected, values in a container can be sorted or filtered (i.e. selected) according to whatever rules we choose. A collection of integer or floating-point values offers many opportunities for analysis. We can calculate:

the median (the "middle" value in a sorted list)
the standard deviation (roughly, how far the values typically are from the average value)
the top 5 or bottom 3, and the average of those
the division of values into "quartiles" or "percentiles"

A collection of string values allows us to perform text analysis:

the frequency of a word
the position of a word within a text
whether a word in one collection is present in another collection

container object summary: list, tuple, set, dict

Compare and contrast the characteristics of each container.

mylist =  ['a', 'b', 'c', 'd', 1, 2, 3]

mytuple = ('a', 'b', 'c', 'd', 1, 2, 3)

myset =   {'a', 'b', 'c', 'd', 1, 2, 3}

mydict =  {'a': 1, 'b': 2, 'c': 3, 'd': 4}

list: ordered, mutable sequence of objects
tuple: ordered, immutable sequence of objects
set: unordered, mutable collection of unique objects
dict: mutable collection of object key-value pairs, with unique keys; keys keep their insertion order in Python 3.7+ (discussed upcoming)

Note the different brackets used for each container (dicts are a little more elaborate).
This is just an overview, to show you where we are going -- no need to learn these characteristics now.

review: the list container object

A list is an ordered sequence of values.

var = []                     # initialize an empty list

var2 = [1, 2, 3, 'a', 'b']   # initialize a list of values

the list is the most commonly used container
it stores values (items) in a sequence
it allows us to access an item by position
lists can be sorted, sliced and looped through ('for' loop, coming up)

review: subscripting a list

Subscripting allows us to read individual items from a list.

mylist = [1, 2, 3, 'a', 'b']       # list

xx = mylist[2]                     # 3

yy = mylist[-1]                    # 'b'

indexing starts at 0
so index 1 is the 2nd item, index 2 is the 3rd item, etc.
indexing can also be counted from the end, at -1
so index -1 is the last item, index -2 is the 2nd to last item, etc.

review: slicing a list

Slicing a list returns a new list.

var2 = [1, 2, 3, 'a', 'b']            # list

sublist1 = var2[0:3]                  # [1, 2, 3]

sublist2 = var2[2:4]                  # [3, 'a']

sublist3 = var2[3:]                   # ['a', 'b']

Remember the rules of slicing:

indexing begins at 0, so 0 is the first item
the "upper bound" (2nd integer) is one greater than the index of the last item returned (the upper bound is non-inclusive)
to slice off the end, leave off the upper bound

in operator: finding an item within a list

The in operator returns True if an item is in the list.

mylist = [1, 2, 3, 'a', 'b']             # list

if 'b' in mylist:                        # this is True for mylist
    print("'b' can be found in mylist")

print('b' in mylist)                     # True:  the 'in' operator
                                         # actually returns True or False

Ex. 5.1

summary functions: len(), sum(), max(), min()

Summary functions offer a speedy answer to basic analysis questions: how many? How much? Highest value? Lowest value?

mylist = [1, 3, 5, 7, 9]        # list

print(len(mylist))               # 5 (count of items)
print(sum(mylist))               # 25 (sum of values)
print(min(mylist))               # 1 (smallest value)
print(max(mylist))               # 9 (largest value)

sorting a list

sorted() returns a new list of sorted values.

mylist = [4, 9, 1.2, -5, 200, 20]

smyl = sorted(mylist)              # list, [-5, 1.2, 4, 9, 20, 200]

Ex. 5.2

concatenating two lists with +

List concatenation works in the same way as it does with strings.

var = ['a', 'b', 'c']
var2 = ['d', 'e', 'f']

var3 = var + var2            # list, ['a', 'b', 'c', 'd', 'e', 'f']

adding (appending) an item to a list

var = []

var.append(4)                # Note well! call is not assigned
var.append(5.5)              # list is changed in-place

print(var)                    # [4, 5.5]

A list holds its items in the order in which they were added.
Note that the list is changed in place:
the call to .append() is not assigned back to var.

5.11

the AttributeError exception

An AttributeError exception occurs when calling a method on an object type that doesn't support that method.

mylines = ['line1\n', 'line2\n', 'line3\n']

mylines = mylines.rstrip()         # AttributeError:
                                   # 'list' object has no attribute 'rstrip'

Debugging:

the problem with the last line here is that .rstrip() is a string method, but it's being called on a list
most AttributeError exceptions are raised when we call a method on the wrong object
when you get an AttributeError, check the object type and whether that type supports the method

Understanding the name AttributeError:

although .rstrip() is a method, we can refer to it more generally as an attribute
an attribute is any name that appears after an object and a dot (object.attribute).
the attribute is often a method, though it may point at any type of object (str, int, list, etc.)
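One way to debug the example above is to check the type of the object, then call the string method on each string in the list instead of on the list itself. Here is a minimal sketch. The try/except block is used only to capture and display the error message.

```python
mylines = ['line1\n', 'line2\n', 'line3\n']

print(type(mylines))        # <class 'list'>  (a list has no .rstrip() method)
print(type(mylines[0]))     # <class 'str'>   (a str does)

try:
    mylines.rstrip()                    # the mistake from the example above
except AttributeError as err:
    message = str(err)
    print(message)          # 'list' object has no attribute 'rstrip'

stripped = []
for line in mylines:
    stripped.append(line.rstrip())      # call the str method on each str item

print(stripped)             # ['line1', 'line2', 'line3']
```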

the AttributeError when using .append()

This exception sometimes results from misusing the append() method: the result of a call to .append() should not be assigned to any variable.

mylist = ['a', 'b', 'c']

# oops:  returns None -- call to append() should not be assigned
mylist = mylist.append('d')

mylist = mylist.append('e')        # AttributeError:  'NoneType'
                                   # object has no attribute 'append'

.append() isn't designed to return a value, so it returns the default value None
(None is a value that means 'null' or 'empty' -- it is Python's way of saying 'nothing here')
if we assign to the list variable, it replaces the list with None
the next time we try to use .append() we are attempting to call it on None
Python's error message is actually saying "the None object doesn't have an .append() method"

the correct use of .append()

Just remember that we don't assign from .append().

mylist = ['a', 'b', 'c']

mylist.append('d')                 # now mylist equals ['a', 'b', 'c', 'd']

because .append() does not return a usable value, we should not assign from the call
simply call the method and understand that the list will change in place
lists are mutable, meaning that they can be changed in place
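To see why we don't assign from .append(), this sketch captures its return value once, purely for demonstration:

```python
mylist = ['a', 'b', 'c']

result = mylist.append('d')   # assigned here only to show what .append() returns

print(result)                 # None
print(mylist)                 # ['a', 'b', 'c', 'd']  (changed in place)
```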

sidebar: removing a container item

There are a number of additional list methods to manipulate a list, though they are less often used.

mylist = ['a', 'hello', 5, 9]

popped = mylist.pop(0)      # str, 'a'
                            # (argument specifies the index of the item to remove)

mylist.remove(5)            # remove an item by value
print(mylist)               # ['hello', 9]

mylist.insert(0, 10)        # insert 10 at index 0
print(mylist)               # [10, 'hello', 9]

Containers: Tuples and Sets

tuples and sets: like lists but different

It's helpful to contrast these containers with lists.

tuples are like lists, but read-only
sets are like lists, but the items are unordered and unique (no duplicates in a set)

It's easy to remember how to use one of these containers by considering how they differ in behavior.

the tuple container object

A tuple is an immutable, ordered sequence of values.

var2 = (1, 2, 3, 'a', 'b')     # initialize a tuple of values

immutable means the tuple cannot be changed once initialized
the best way to think about tuples is that they are identical to lists in every way, except that they are read-only -- they can't be changed
any value can be included within a tuple
as with lists, items in a tuple do not need to be of the same type

subscripting a tuple

Subscripting allows us to read individual items from a tuple.

mytuple = (1, 2, 3, 'a', 'b')       # initialize a tuple of values

xx = mytuple[3]                     # 'a'

Note that as with lists, indexing starts at 0, so index 1 is the 2nd item, index 2 is the 3rd item, etc.

slicing a tuple

Slicing a tuple returns a new tuple.

var2 = (1, 2, 3, 'a', 'b')             # initialize a tuple of values

subtuple1 = var2[0:3]                  # (1, 2, 3)

subtuple2 = var2[2:4]                  # (3, 'a')

subtuple3 = var2[3:]                   # ('a', 'b')

Remember the rules of slicing, which are the same as lists and strings:

indexing begins at 0, so 0 is the first item
the "upper bound" (2nd integer) is one greater than the index of the last item returned (the upper bound is non-inclusive)
to slice off the end, leave off the upper bound

concatenating two tuples with +

Concatenation works in the same way as with lists and strings.

var = ('a', 'b', 'c')
var2 = ('d', 'e', 'f')

var3 = var + var2                  # ('a', 'b', 'c', 'd', 'e', 'f')

Ex. 5.12

set container object

A set is an unordered, unique collection of values.

myset = set()                  # initialize an empty set (note that empty
                               # curly braces are reserved for dicts)

myset = {'a', 9999, 4.3, 'a'}  # initialize a set with items

print(myset)                   # {9999, 4.3, 'a'}

note that the set has changed the order of items (this may vary from time to time)
note also that the duplicate 'a' has been eliminated
this illustrates the two salient characteristics of a set: items are unordered and unique

adding an item to a set

The set changes in place; any duplicate item will be ignored.

myset = set()        # initialize an empty set

myset.add(4.3)       # note well!  we do not assign back to myset
myset.add('a')
myset.add('a')

print(myset)         # {'a', 4.3}    (order is not
                     #                necessarily maintained)

Note that 'a' was added twice, but only appears once.
Also note that we do not assign back to myset -- this is because sets are also mutable, so the .add() changes the set in-place.

getting information about a set or tuple

Here are len() and in used with a set and a tuple.

# get the length of a set or tuple (compare to len() of a list or string)
myset = {1, 2, 3, 'a', 'b'}

yy = len(myset)                # 5


# test for membership in a set or tuple
mytuple = (1, 2, 3, 'a', 'b')

if 'b' in mytuple:                        # bool, True
    print("'b' can be found in mytuple")

print('b' in mytuple)                     # "True":  the 'in' operator
                                          # actually returns True or False

note that these operations are identical to those for a list
(Python tries as much as possible to unify operations between similar objects)

looping through a set or tuple

The 'for' loop allows us to traverse a set or tuple and work with each item.

mytuple = (1, 2, 3, 'a', 'b')            # could also be a set here

for var in mytuple:
    print(var)                           # prints 1, then 2, then 3,
                                         # then a, then b

Whether a list, set or tuple, looping works in the same way (though a set delivers its items in no guaranteed order).

summary functions: len(), sum(), max(), min()

These functions also work as they do with lists.

Whether a set or tuple, these operations work in the same way.

mytuple = (1, 3, 5, 7, 9)       # initialize a tuple
myset =   {1, 3, 5, 7, 9}       # initialize a set

print(len(mytuple))             # 5  (count of items)
print(sum(myset))               # 25 (sum of values)

print(min(myset))               # 1 (smallest value)
print(max(mytuple))             # 9 (largest value)

sorting a set or tuple

Regardless of type, sorted() returns a list of sorted values.

mytuple = (4, 9, 1.2, -5, 200, 20)       # could also be a set here

smyl = sorted(mytuple)                   # [-5, 1.2, 4, 9, 20, 200]

Whether a list, tuple or set, these operations work in the same way.
Note that unlike other operations discussed here, sorted() returns a list. This is the case no matter what type of object is sorted.

Ex. 5.13

why do we need sets?

The set's duplicate elimination behavior gives us certain advantages.

As we saw, sets have 2 important characteristics:

all items are unique
items are not in any particular order

How can we use a set?

eliminate duplicates from repeating data
check for membership with 'in' (is it there?)
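Here is a short sketch of both uses, with a hypothetical list of repeating state values. Passing a list to set() builds a set from its items and drops the duplicates.

```python
# hypothetical repeating data, e.g. a state column read from a file
states = ['PA', 'NJ', 'NY', 'PA', 'NY', 'NJ', 'NY']

unique_states = set(states)      # duplicates are eliminated

print(len(states))               # 7
print(len(unique_states))        # 3
print('NY' in unique_states)     # True
print('CA' in unique_states)     # False
print(sorted(unique_states))     # ['NJ', 'NY', 'PA']
```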

Building Up Containers from File

introduction: building up containers from file

This technique forms the core of much of what we do in Python.

In order to work with data, the usual steps are:

read a data source, such as a file, database, or network response
select the data that we'd like to work with
add that data to one or more containers
use the container to analyze the data
use the container to produce new data or new values
store the results in another data source

We call this process Extract-Transform-Load, or ETL. ETL is at the heart of what core Python does best.

looping through a data source and building up a list

Similar to the counting and summing algorithm, this one collects values instead.

build a list of company names

company_list = []                        # empty list
fh = open('revenue.csv')                 # 'file' object

for line in fh:                          # str, "Haddad's,PA,239.50\n"
    line = line.rstrip()                 # str, "Haddad's,PA,239.50"

    items = line.split(',')              # list, ["Haddad's", 'PA', '239.50']

    company_list.append(items[0])        # list, ["Haddad's"]


print(company_list)   # ["Haddad's", 'Westfield', 'The Store', "Hipster's",
                      #  'Dothraki Fashions', "Awful's", 'The Clothiers']

fh.close()
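The same loop can collect numbers for analysis: the data is added to a container, and the container is then used to analyze it. This sketch gathers the revenue column as floats and then applies the summary functions. io.StringIO stands in for open('revenue.csv'), and the rows are hypothetical.

```python
import io

# io.StringIO stands in for open('revenue.csv'); the rows are hypothetical
fh = io.StringIO("Haddad's,PA,239.50\nWestfield,NJ,53.90\nThe Store,NJ,211.50\n")

rev_list = []                              # empty list

for line in fh:
    items = line.rstrip().split(',')       # ["Haddad's", 'PA', '239.50']
    rev_list.append(float(items[2]))       # collect the 3rd column as a float

fh.close()

print(rev_list)           # [239.5, 53.9, 211.5]
print(max(rev_list))      # 239.5
print(sorted(rev_list))   # [53.9, 211.5, 239.5]
```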

Ex. 5.14 - 5.15

looping through a data source and building up a unique set

This program uses a set to collect unique items from repeating data.

state_set = set()                       # empty set
fh = open('revenue.csv')                # 'file' object

for line in fh:                         # str, "Haddad's,PA,239.50\n"

    items = line.split(',')             # list, ["Haddad's", 'PA', '239.50\n']
    state_set.add(items[1])             # set, {'PA'}

print(state_set)       # set, {'PA', 'NY', 'NJ'}   (your order may be different)

chosen_state = input('enter a state:  ')

if chosen_state in state_set:
    print(f'{chosen_state} found in the file')
else:
    print(f'{chosen_state} not found')

fh.close()

the state value in the 2nd column has multiple state values repeated
if our only purpose is to know which states appear in the file, we can simply add all state values to the set and know that duplicates will be removed

5.22 & 5.23

reading a file with 'with'

A file is automatically closed upon exiting the 'with' block.

A 'best practice' is to open files using a 'with' block. When execution leaves the block, the file is automatically closed.

with open('pyku.txt') as fh:
    for line in fh:
        print(line)

# At this point (once outside the with block), filehandle fh
# has been closed.  There is no need to call fh.close().

it's considered "good housekeeping" to close a file as soon as we are done with it
files are usually closed by calling the .close() method on the file object
however, we may easily forget to close our files
the with block allows us to open a file and have it automatically close when we leave the block
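Here is a sketch that confirms the automatic close by checking the file object's .closed attribute inside and after the block. To stay self-contained, it first writes a one-line stand-in for pyku.txt into a temporary directory. The temporary path is an assumption of this example.

```python
import os
import tempfile

# write a hypothetical one-line pyku.txt into a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'pyku.txt')
with open(path, 'w') as wfh:
    wfh.write("We're out of gouda.\n")

with open(path) as fh:
    for line in fh:
        print(line.rstrip())       # We're out of gouda.
    print(fh.closed)               # False  (still inside the block)

print(fh.closed)                   # True   (closed on leaving the block)
```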

However, we should understand the minimal cost of not closing our files:

a file opened for reading does not block others from opening it
there are minimal resources used in keeping a file open
anytime a script exits, its open files are closed automatically

A file open for writing should be closed as soon as possible. The data may not appear in the file until it has been closed.

4.15

slicing and dicing a file: the line, word, character count (1/3)

Once we have read a file as a single string, we can "chop it up" any way we like.

# read(): file text as a single string
fh = open('guido.txt')          # 'file' object
text = fh.read()                # read() method called on
                                # file object returns a string

fh.close()                      # close the file

print(text)
print(len(text))                 # 207 (number of characters in the file)

    # single string, entire text:

    # 'For three months I did my day job, \nand at night and
    #  whenever I got a \nchance I kept working on Python.  \n
    #  After three months I was to the \npoint where I could
    #  tell people, \n"Look here, this is what I built."'

once the file is read as a string, we can do all kinds of string operations
'in' (to find a substring)
.replace()
.count(), etc.
we can also process the string data further -- see next
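For example, suppose the first two lines of guido.txt are held in a string, standing in for the result of fh.read(). A few of those string operations look like this:

```python
# a stand-in for the string returned by fh.read()
text = 'For three months I did my day job,\nand at night and whenever I got a\n'

print('day job' in text)               # True
print(text.count('and'))               # 2
print(text.count('\n'))                # 2 (one newline per line)
print(text.replace('night', 'NIGHT'))  # same text, with 'night' in capitals
```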

slicing and dicing a file: splitting a string into words (2/3)

String .split() on a whole file string returns a list of words.

file_text = """For three months I did my day job,
and at night and whenever I got a
chance I kept working on Python.
After three months I was to the
point where I could tell people,
"Look here, this is what I built." """

words = file_text.split()      # split entire file on whitespace (spaces or newlines)

print(words)
    # ['For', 'three', 'months', 'I', 'did', 'my', 'day', 'job,',
    #  'and', 'at', 'night', 'and', 'whenever', 'I', 'got', 'a',
    #  'chance', 'I', 'kept', 'working', 'on', 'Python.', 'After',
    #  'three', 'months', 'I', 'was', 'to', 'the', 'point', 'where',
    #  'I', 'could', 'tell', 'people,', '"Look', 'here,', 'this',
    #  'is', 'what', 'I', 'built."']

print(len(words))       # 42 (number of words in the file)

the "triple-quoted string" above is also called a "multi-line" string
it has the same form as the string returned by the file .read() method, shown previously
.split() splits on whitespace, which separates each word
(newlines, which separate each line, are also considered whitespace)
we now have the opportunity to count the words, sort the words, subscript the first or last word, etc.
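For instance, here is a sketch that uses just the first two lines of the same text:

```python
file_text = """For three months I did my day job,
and at night and whenever I got a"""

words = file_text.split()      # split on spaces and newlines

print(len(words))              # 16 (number of words)
print(words[0])                # For  (first word)
print(words[-1])               # a    (last word)
print(words.count('I'))        # 2    (occurrences of the word 'I')
```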

slicing and dicing a file: splitting a string into lines (3/3)

String .splitlines() will split any string on the newlines, delivering a list of lines from the file.

file_text = """For three months I did my day job,
and at night and whenever I got a
chance I kept working on Python.
After three months I was to the
point where I could tell people,
"Look here, this is what I built."
"""

lines = file_text.splitlines()

print(lines)

    # ['For three months I did my day job,', 'and at night and whenever I got a',
    #  'chance I kept working on Python.', 'After three months I was to the',
    #  'point where I could tell people,', '"Look here, this is what I built."']

print(len(lines))          # 6 (number of lines in the file)

.splitlines() divides the multi-line string into a list of lines
this has the effect of delivering an entire file as a list of lines, but with the newlines removed
because these lines are in a list, we now have the opportunity to count the lines, slice the lines, subscript the first or last line, etc.

"whole file" parsing: reading a file as a list of lines

Calling .splitlines() on the string returned by .read() delivers the file as a list of lines.

fh = open('pyku.txt')           # 'file' object

file_text = fh.read()           # entire file as a single string

lines = file_text.splitlines()

print(lines)

    # ["We're out of gouda.", 'That parrot has ceased to be.',
    #  'Spam, spam, spam, spam, spam.']

print(len(lines))          # 3 (number of lines in the file)

.splitlines() divides the multi-line string into a list of lines
this has the effect of delivering an entire file as a list of lines, but with the newlines removed
because these lines are in a list, we now have the opportunity to count the lines, slice the lines, subscript the first or last line, etc.

Ex. 5.27 -> 5.29

Summary: 3 ways to read strings from a file

for: read one line at a time (a newline ('\n') marks the end of each line)

fh = open('students.txt')        # file object allows looping
                                 # through a series of strings
for my_file_line in fh:          # my_file_line is a string
    print(my_file_line)           # prints each line of students.txt

fh.close()                       # close the file

read(): read entire file as a single string

fh = open('students.txt')  # file object allows reading
text = fh.read()                 # read() method called on file
                                 # object returns a string
fh.close()                       # close the file

print(text)                       # entire text as a single string

readlines(): read as a list of strings (each string a line)

fh = open('students.txt')
file_lines = fh.readlines()      # file.readlines() returns
                                 # a list of strings
fh.close()                       # close the file

print(file_lines)                 # entire text as a list of lines

when we read a file, we can choose from these 3 approaches
once a file has been read through, a second read returns nothing (unless we reopen the file), so we choose one approach
which we choose depends on what we want to do -- process the file line by line (for) or analyze it as a whole (.read() or .readlines())
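Here is a sketch of why a second read comes back empty. The file object remembers its position, and after one full read it sits at the end. io.StringIO stands in for open('students.txt') with hypothetical contents. Its .seek(0) moves back to the start, which is the same effect as reopening the file.

```python
import io

fh = io.StringIO('Alice\nBob\n')   # hypothetical contents of students.txt

first = fh.read()          # reads everything; position is now at the end
second = fh.read()         # nothing left to read

print(repr(first))         # 'Alice\nBob\n'
print(repr(second))        # ''

fh.seek(0)                 # back to the start (reopening the file does the same)
lines = fh.readlines()
print(lines)               # ['Alice\n', 'Bob\n']

fh.close()
```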

sidebar: writing to a file

We don't have occasion to write to a file in this course, but it's important to know how.

wfh = open('newfile.txt', 'w')         # open for writing
                                       # (will overwrite an existing file)
wfh.write('this is a line of text\n')
wfh.write('this is a line of text\n')
wfh.write('this is a line of text\n')
wfh.close()

note the second argument to open(): this determines the 'mode'

the mode may be 'r' (reading, the default), 'w' (writing, first deletes the file if it exists) or 'a' (appending, adds to the end of the file)

note also that the newline character ('\n') must be added to the end of each line; the .write() method is unlike print() in that it does not add a newline automatically
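Here is a sketch that contrasts the 'w' and 'a' modes. It writes to a file in a temporary directory, a path chosen for this example so that nothing existing is overwritten.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'newfile.txt')   # temporary location

wfh = open(path, 'w')            # 'w': create (or overwrite) the file
wfh.write('first line\n')
wfh.close()

afh = open(path, 'a')            # 'a': add to the end of the existing file
afh.write('second line\n')
afh.close()

fh = open(path)                  # 'r' (reading) is the default mode
text = fh.read()
fh.close()

print(text)                      # first line, then second line
```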

sidebar: the range() function

This function allows us to iterate over an integer sequence.

counter = range(10)

for i in counter:
    print(i)                 # prints integers 0 through 9

for i in range(3, 8):        # prints integers 3 through 7
    print(i)

If we need a literal list of integers, we can simply pass the range object to list():

intlist = list(range(5))

print(intlist)               # [0, 1, 2, 3, 4]

Dictionaries: Lookup Tables

dictionaries

A dictionary (or dict) is a collection of unique key/value pairs of objects.

mydict = {}                          # empty dict

mydict = {'a':1, 'b':2, 'c':3}       # dict with str keys and int values

val = mydict['a']                    # look up 'a'; returns 1

each item is a pair

the pair consists of a key and a value

the keys in a dict are unique

each key is associated with a value

a dict is addressable by key, in the same way that a list is addressable by index

in other words, we "look up" the value for any given key by using a subscript

example uses: dictionaries

Pairs describe data relationships that we often want to consider:

companies paired with annual revenue for each company

employees paired with contact information for each employee

students paired with grade point averages

dates paired with the high temperature for each

web pages paired with the number of times each was accessed

You yourself may consider data in pairs, even in your personal life:

home projects and the amount of time you think each might take

different items you might want to buy at the grocery and the price for each

stores and their distance from your house

restaurants and their ratings

your family members and their ages

types of dictionaries

There are three main ways dictionaries are used.

a lookup table: pairing clients with their addresses allows you to look up the address of any client

a ranking: pairing companies with their market capitalizations allows you to rank them by market cap

an aggregation: pairing each city with the number of students listed from that city allows you to see which cities have the most students

initialize a dict

Dicts are marked by curly braces. Keys and values are separated with colons.

mydict = {}                          # empty dict

mydict = {'a':1, 'b':2, 'c':3}       # dict with str keys and int values

add a key/value pair to a dict

We use subscript syntax to assign a value to a key.

mydict = {'a':1, 'b':2, 'c':3}

mydict['d'] = 4       # setting a new key and value

print(mydict)         # {'a': 1, 'b': 2, 'c': 3, 'd': 4}

retrieve a value from a dict using a key

We also use subscript syntax to retrieve a value.

mydict = {'a':1, 'b':2, 'c':3, 'd': 4}

dval = mydict['d']       # value for 'd' is 4

xxx = mydict['c']        # value for 'c' is 3

You might notice that this subscripting is very close in syntax to list subscripting. The only difference is that instead of an integer index we are using the dict key (most often a string).

the KeyError exception

This exception is raised when we request a key that does not exist in the dict.

mydict = {'a': 1, 'b': 2, 'c': 3}

val = mydict['d']       # KeyError:  'd'

Like the IndexError exception, which is raised if we ask for a list item that doesn't exist, KeyError is raised if we ask for a dict key that doesn't exist.

check for key membership

If we're not sure whether a key is in the dict, before we subscript we can check to confirm.

mydict = {'a': 1, 'b': 2, 'c': 3}

if 'a' in mydict:
    print("'a' is a key in mydict")
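Pairing in with an else branch lets us fall back to a default value instead of raising KeyError. Here is a minimal sketch. Using None as the fallback is a choice made for this example.

```python
mydict = {'a': 1, 'b': 2, 'c': 3}

wanted = 'd'                 # a key that is not in the dict

if wanted in mydict:
    val = mydict[wanted]     # safe: we know the key exists
else:
    val = None               # avoids the KeyError; None marks 'no value'

print(val)                   # None
```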

Ex. 6.1 - 6.4

Dictionaries: Rankings

dictionary rankings

Dictionaries can be sorted by value to produce a ranking.

dictionaries, particularly ones that have numeric values, can be sorted by value
if the value is a quantitative measure for each key, the dict can be used as a ranking
examples are: students and their grade point averages, companies and their market caps, sports teams and their wins, competing products and their prices

loop through dict keys and values

We loop through keys and then use subscripting to get values.

mydict = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

for key in mydict:         # a
    val =  mydict[key]

    print(key)             # a
    print(val)             # 1
    print()
                           # b
                           # 2

                           # (continues with 'c' and 'd')

Note that plain 'for' looping over a dict delivers the keys.
We then use each key as a subscript to get its value.

Ex. 6.8

review: sorting any container with sorted()

With any container or iterable (list, tuple, file), sorted() returns a list of sorted items.

namelist = ['banana', 'apple', 'dates', 'cherry']

slist = sorted(namelist, reverse=True)

print(slist)          # ['dates', 'cherry', 'banana', 'apple']

Remember that no matter what container is passed to sorted(), the function returns a list. Also remember that the reverse=True argument to sorted() can be used to sort the items in reverse order.

sorting a dict (sorting its keys)

sorted() returns a sorted list of a dict's keys.

bowling_scores = {'bob': 123, 'zeb': 98, 'mike': 202, 'alice': 184}

sorted_keys = sorted(bowling_scores)

print(sorted_keys)       # [ 'alice', 'bob', 'mike', 'zeb' ]

Ex. 6.9

sorting a dictionary's keys by its values

A special argument to sorted() can cause Python to sort a dict's keys by its values.

bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}

sorted_keys = sorted(bowling_scores, key=bowling_scores.get)

print(sorted_keys)                 # ['zeb', 'jeb', 'alice', 'mike']

for player in sorted_keys:
    print(f"{player} scored {bowling_scores[player]}")

        ##  zeb scored 98
        ##  jeb scored 123
        ##  alice scored 184
        ##  mike scored 202

The key= argument tells sorted() how to sort the keys: sorted() calls the function given as key= on each dict key and sorts by the result. Passing the dict's .get method (discussed later) means each key is compared by its value (bowling_scores.get('zeb') returns 98), so the keys are sorted by value instead of by key.

Ex. 6.10
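Adding reverse=True puts the highest value first, which turns the dict into a ranking. Here is a sketch that lists the top two bowlers from the same dict:

```python
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}

# sort the keys by their values, highest value first
ranked = sorted(bowling_scores, key=bowling_scores.get, reverse=True)

print(ranked)                # ['mike', 'alice', 'jeb', 'zeb']

for player in ranked[0:2]:   # slice off the top two
    print(f'{player} scored {bowling_scores[player]}')

        ##  mike scored 202
        ##  alice scored 184
```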

assign multiple values to individual variables

multi-target assignment allows us to "unpack" the values in a container.

If the container on the right has 3 values, we may unpack them to three named variables.

company, state, revenue = ["Haddad's", 'PA', '239.50']

print(company)      # Haddad's
print(revenue)      # 239.50

If the values we want are in a CSV line, we can split them into a list and assign them using multi-target assignment, all in one statement.

csv_line = "Haddad's,PA,239.50"

company, state, revenue = csv_line.split(',')

print(company)      # Haddad's
print(state)        # PA

Ex. 6.14

build up a dict from two fields in a file

As with all containers, we loop through a data source, select and add to a dict.

ids_states = {}                # empty dict

fh = open('student_db.txt')

for line in fh:
    line = line.rstrip()       # remove the trailing newline
    stuid, street, city, state, zipcode = line.split(':')

    ids_states[stuid] = state  # key id is paired to
                               # student's state


print("here is the state for student 'jb29':  ")
print(ids_states['jb29'])      #  NJ

fh.close()

Ex. 6.15

Dictionaries: Aggregations

dict aggregations

A "counting" or "summing" dictionary answers the question "how many of each" or "how much of each".

Aggregations may answer the following questions:

How many students are from each state or country? (count)
How many cars are sold by each automaker? (count)
What is the total $ sales generated by each sales associate? (sum)
What is the total number of hours billed to each client? (sum)

The dict is used to store this information. Each unique key in the dict is paired with either a count (how many times that key appeared in the data source) or a sum (the total of the values associated with that key in the data source).

building a counting dict

A "counting" dict increments the value associated with each key, and adds keys as new ones are found.

state_count = {}                  # empty dict

fh = open('revenue.csv')

for line in fh:                   # str, "Haddad's,PA,239.50\n"

    items = line.split(',')       # list, ["Haddad's", 'PA', '239.50\n']
    state = items[1]              # str, 'PA'

    if state not in state_count:
        state_count[state] = 0

    state_count[state] = state_count[state] + 1

print(state_count)                # {'PA': 2, 'NJ': 2, 'NY': 3}
fh.close()

as we loop through the file, we look at the 2nd value in the line, the state
we will see several of the states repeated
if the state is not found in the dict, we add it, with a 0 as value
then (whether the state is newly added or not) we increment the value associated with that state
at the end, we have a count of the # of occurrences of each state
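To try the counting pattern without revenue.csv, here is a minimal sketch that loops over a hypothetical list of lines instead of a file. The data is made up for illustration. The second loop uses collections.Counter from the standard library for comparison. That is an alternative technique, not the one shown above: a Counter returns 0 for a missing key, so no "not in" test is needed.

```python
from collections import Counter

# hypothetical stand-in for the lines of revenue.csv
lines = [
    "Haddad's,PA,239.50\n",
    "Westfield,NJ,53.90\n",
    "The Store,NJ,211.50\n",
    "Hipstercity,NY,11.98\n",
]

state_count = {}
for line in lines:
    state = line.split(',')[1]        # 2nd field: the state
    if state not in state_count:
        state_count[state] = 0        # first time we see this state
    state_count[state] = state_count[state] + 1

print(state_count)                    # {'PA': 1, 'NJ': 2, 'NY': 1}

# the same count with collections.Counter (missing keys start at 0)
state_counter = Counter()
for line in lines:
    state_counter[line.split(',')[1]] += 1

print(dict(state_counter))            # {'PA': 1, 'NJ': 2, 'NY': 1}
```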

Ex. 6.16

building a summing dict

A "summing" dict sums the value associated with each key, and adds keys as new ones are found.

state_sum = {}                  # empty dict

fh = open('revenue.csv')        # 'file' object

for line in fh:                 # str, "Haddad's,PA,239.50\n"

    items = line.split(',')     # list, ["Haddad's", 'PA', '239.50\n']
    state = items[1]            # str, 'PA'
    value = float(items[2])     # float, 239.5

    if state not in state_sum:
        state_sum[state] = 0

    state_sum[state] = state_sum[state] + value

print(state_sum)      # {'PA': 263.45, 'NJ': 265.4, 'NY': 133.16}

fh.close()

the summing dictionary is very similar to the counting dict
the only difference is that we are summing values by state rather than counting

dictionary size with len()

len() counts the pairs in a dict.

mydict = {'a': 1, 'b': 2, 'c': 3}

print(len(mydict))                 # 3 (number of keys in dict)

Ex. 6.21

sidebar: dict .get() method

This method may be used to retrieve a value without checking the dict to see if the key exists.

mydict = {'a': 1, 'b': 2, 'c': 3}

xx = mydict.get('a', 0)          # 1 (key exists so paired value is returned)

yy = mydict.get('zzz', 0)        # 0 (key does not exist so the
                                 #    default value is returned)

.get() works like dict subscripting - given a key, it returns the value for that key
however, if the key is missing, .get() returns a default value
the second argument to .get() is the default value (if it is omitted, the default is None)
this method is sometimes used to avoid the KeyError exception that occurs when trying to read a nonexistent key
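A .get() call with a default of 0 can replace the "if key not in dict" test in the counting and summing dicts shown earlier. Here is a minimal sketch. The (state, revenue) pairs are hypothetical and stand in for the file data.

```python
# hypothetical (state, revenue) pairs standing in for revenue.csv
sales = [('PA', 239.50), ('NJ', 53.90), ('NJ', 211.50), ('NY', 11.98)]

state_count = {}
state_sum = {}

for state, value in sales:
    # .get() returns 0 the first time a state is seen, so no 'if' test is needed
    state_count[state] = state_count.get(state, 0) + 1
    state_sum[state] = state_sum.get(state, 0) + value

print(state_count)                  # {'PA': 1, 'NJ': 2, 'NY': 1}
print(round(state_sum['NJ'], 2))    # 265.4
```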

Ex. 6.22

sidebar: obtaining keys in a dict

The .keys() method gives access to the keys in a dict.

mydict = {'a': 1, 'b': 2, 'c': 3}

these_keys = mydict.keys()

for key in these_keys:
    print(key)

print(list(these_keys))            # ['a', 'b', 'c']

the object returned from .keys() is known as a view (a dict_keys object), not a list
looping through the object retrieves the keys, in the order in which they were added to the dict
in order to see all of them in a list, we must pass the object to the list() function
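A view is not a separate copy of the keys, so it reflects later changes to the dict. A list made with list() is a snapshot that does not change. Here is a minimal sketch:

```python
mydict = {'a': 1, 'b': 2, 'c': 3}

these_keys = mydict.keys()        # dict_keys view of mydict

mydict['d'] = 4                   # add a key after creating the view

print(list(these_keys))           # ['a', 'b', 'c', 'd']
print('d' in these_keys)          # True

key_list = list(mydict.keys())    # a list is a separate snapshot
mydict['e'] = 5

print(len(key_list), len(mydict)) # 4 5
```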

sidebar: obtaining values in a dict

The .values() method gives access to the values in a dict.

mydict = {'a': 1, 'b': 2, 'c': 3}

values = list(mydict.values())     # [1, 2, 3]

if 2 in mydict.values():
    print("2 was found")

for value in mydict.values():
    print(value)

a value cannot be used to look up its key (a dict is looked up by key only)
however, we might want to check whether a value is present
we might also want to sort or sum the values
(.values() also returns a view, so we can use list() to retrieve them in one list)
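Built-in functions such as sum(), max() and sorted() accept the values view directly. Here is a minimal sketch using the bowling_scores dict from earlier:

```python
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}

scores = bowling_scores.values()

print(sum(scores))               # 607
print(max(scores))               # 202
print(sorted(scores))            # [98, 123, 184, 202]
print(202 in scores)             # True
```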

sidebar: using the dict .items() method

.items() gives key/value pairs as 2-item tuples.

mydict = {'a': 1, 'b': 2, 'c': 3}

for key, value in mydict.items():
    print(key, value)               # a 1
                                    # b 2
                                    # c 3

print(list(mydict.items()))         # [('a', 1), ('b', 2), ('c', 3)]

.items() is usually used as another approach for looping through a dict
each item returned is a 2-item tuple, with the first item as the key and the second as the value
when looping with 'for', we can assign the tuple's two items (key and value) to variable names and use them immediately, rather than resorting to subscripting. This is usually easier and it is also more efficient.

Ex. 6.23

sidebar: working with dict items()

dict .items() gives us 2-item tuples, and list() turns them into a list of 2-item tuples. dict() can convert this list back to a dictionary.

mydict = {'a': 1, 'b': 2, 'c': 3}
these_items = list(mydict.items())    # [('a', 1), ('b', 2), ('c', 3)]

some_items = these_items[0:2]         # [('a', 1), ('b', 2)]

newdict = dict(some_items)

print(newdict)                        # {'a': 1, 'b': 2}

A list of 2-item tuples can be sorted and sliced, so it is a handy alternate structure.
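Here is a minimal sketch of that idea: sort the (key, value) tuples, slice off the first two, and convert the result back with dict(). The argument key=lambda pair: pair[1] is a small inline function (any one-argument function would do). It tells sorted() to compare the second item of each tuple, which is the value.

```python
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}

pairs = list(bowling_scores.items())     # list of (name, score) tuples

# by default, tuples are compared by their first item (the key)
print(sorted(pairs)[0])                  # ('alice', 184)

# sort by the second item (the score), highest first, and keep the top two
top_two = sorted(pairs, key=lambda pair: pair[1], reverse=True)[:2]

print(dict(top_two))                     # {'mike': 202, 'alice': 184}
```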

sidebar: converting parallel lists to tuples

zip() zips up parallel lists into tuples; dict() can convert this to dict.

list1 = ['a', 'b', 'c', 'd']
list2 = [ 1,   2,   3,   4 ]

tupes = list(zip(list1, list2))

print(tupes)          # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
print(dict(tupes))    # {'a': 1,    'b': 2,   'c': 3,   'd': 4}

Occasionally we are faced with two lists that relate to each other on a 1-to-1 basis, or we sometimes even shape our data into this form. Parallel lists like these can be zipped into multi-item tuples.
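zip() stops at the end of the shortest list. If the two lists are not the same length, the extra items are dropped silently. Here is a minimal sketch with hypothetical lists, where one score is missing:

```python
names = ['jeb', 'zeb', 'mike', 'alice']
scores = [123, 98, 202]                     # one score is missing

pairs = list(zip(names, scores))
print(pairs)          # [('jeb', 123), ('zeb', 98), ('mike', 202)]

scores_by_name = dict(zip(names, scores))   # zip straight into a dict
print(scores_by_name['mike'])               # 202
print('alice' in scores_by_name)            # False
```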

The JSON File Format and Multidimensional Containers

the JSON file format

JavaScript Object Notation is a simple "data interchange" format for sending or storing structured data as text.

JSON is used by many web APIs and other programs for communicating data
it is "lightweight", meaning it does not require overhead (as with a database server)
it is text-based, so it can be easily stored to file or transmitted over a network
it was originally created for JavaScript, but became popular for use with any language
it is more flexible than CSV in describing many-to-one relationships
it uses lists and dictionaries to describe data
in many ways it resembles Python: JSON arrays and objects are written much like Python lists and dicts

a sample json file

Fortunately for us, JSON resembles Python in many ways, making it easy to read and understand.

contents of file sample.json

{
   "key1":  ["a", "b", "c"],
   "key2":  {
              "innerkey1": 5,
              "innerkey2": "woah"
            },
   "key3":  false,
   "key4":  null
}

JSON containers use the same syntax as Python lists and dictionaries
this makes it easy for Python programmers to read and understand them
any container may contain other containers (dicts inside dicts, lists inside dicts, etc.)
this allows for the expression of complex ("many-to-one") data relationships
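The resemblance is close but not exact. JSON strings must use double quotes, and JSON's true, false and null become Python's True, False and None when decoded. Here is a minimal sketch using json.loads(), which is covered below, on a small made-up JSON string in the style of sample.json ("key5" is hypothetical):

```python
import json

json_text = '{"key3": false, "key4": null, "key5": true}'

data = json.loads(json_text)          # decode the JSON text into Python objects

print(data)                           # {'key3': False, 'key4': None, 'key5': True}
print(json.dumps(data))               # {"key3": false, "key4": null, "key5": true}
```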

reading a structure from a json file

The json.load() function decodes the contents of a JSON file.

import json                 # this module is used to read JSON files

fh = open('sample.json')

mys = json.load(fh)         # load data objects from the file,
                            # convert into Python objects

fh.close()

print(type(mys))            # <class 'dict'> (the outer container of this struct)

print(mys['key2']['innerkey2'])     # woah

Ex. 7.1

reading a structure from a json string

The json.loads() function decodes the contents of a JSON string.

import json                   # we use this module to read JSON
import requests               # use 'pip install' to install


response = requests.get('https://davidbpython.com/mystruct.json')

text = response.text          # str, entire file data

mys = json.loads(text)        # read string, convert into Python container

print(type(mys))              # <class 'dict'> (the outer container of this struct)

print(mys['key2']['innerkey2'])    # woah

data retrieved from the internet comes to us as strings
in order to read it, we use json.loads()
this function reads from a string rather than a file

About requests:

like a web browser, your Python program can request and receive data
requests is a module that helps facilitate web requests
in order to use it, you may need to install it: at the command line, type conda install requests (or pip install requests)

Ex. 7.2

printing a complex object readably: writing to a string

A nested object can be confusing to read.

If we have a multidimensional object that is squished together and hard to read, we can use json.dumps() with indent=4:

import json

obj = {'a': {'x': 1, 'y': 2, 'z': 3}, 'b': {'x': 1, 'y': 2, 'z': 3}, 'c': {'x': 1, 'y': 2, 'z': 3} }

print(json.dumps(obj, indent=4))

this prints:

{
    "a": {
        "x": 1,
        "y": 2,
        "z": 3
    },
    "b": {
        "x": 1,
        "y": 2,
        "z": 3
    },
    "c": {
        "x": 1,
        "y": 2,
        "z": 3
    }
}

sidebar: writing an object to json file

We can use json.dump() to write to a JSON file.

import json

wfh = open('newfile.json', 'w')    # open file for writing

obj = {'a': 1, 'b': 2}

json.dump(obj, wfh)

wfh.close()

we must open the file for writing with 'w'
we must also remember to close the file
running this program, you should see file newfile.json appear in your project folder
(keep in mind that if newfile.json already exists, it will be overwritten)
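JSON has fewer types than Python, so an object does not always come back unchanged after it is written and read again. Tuples are written as JSON arrays and read back as lists, and non-string dict keys are converted to strings. Here is a minimal sketch using the string functions json.dumps() and json.loads(). json.dump() and json.load() with a file convert objects in the same way.

```python
import json

obj = {1: ('a', 'b'), 'flag': True}

text = json.dumps(obj)               # encode to a JSON string
print(text)                          # {"1": ["a", "b"], "flag": true}

restored = json.loads(text)          # decode it again
print(restored)                      # {'1': ['a', 'b'], 'flag': True}
print(restored == obj)               # False (int key became str, tuple became list)
```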

Reading Multidimensional Containers: Subscripting

pinpointing a specific value within a structure

We can use subscripts to "travel to" a value within a multidimensional container.

value_table =       [
                       [ 1, 2, 3 ],
                       [ 10, 20, 30 ],
                       [ 100, 200, 300 ]
                    ]

val1 = value_table[0][0]       # int, 1
val2 = value_table[0][2]       # int, 3
val3 = value_table[2][2]       # int, 300

If we are intending to retrieve a single value from a multidimensional, we can use subscripts to pinpoint the target value.
This "outer" list has 3 items -- each item is a list, and each list represents a row of data. Each "inner" list has 3 items.
In this case, the calculation is simple - count indices to the "row" you want, then count indices to the item.
(Remember that indexing begins at 0.)

pinpointing a value within a list of dicts

In a list of dicts, each item is a dict.

lod = [
    { 'fname': 'Ally',
      'lname': 'Kane'   },
    { 'fname': 'Bernie',
      'lname': 'Bain'   },
    { 'fname': 'Josie',
      'lname': 'Smith'  }
]

val = lod[2]['lname']         # 'Smith'

val2 = lod[0]['fname']        # 'Ally'

The "outer" list contains 3 items, each item a dictionary
Each "inner" dictionary has identical keys 'fname' and 'lname'
Our first subscript will be a list index that locates the dict we need, and the 2nd will be a dict key to get the value within that dict.

pinpointing a value in a dict of dicts

A dict of dicts has string keys and dict values.

dod = {
    'ak23':  { 'fname': 'Ally',
               'lname': 'Kane' },
    'bb98':  { 'fname': 'Bernie',
               'lname': 'Bain' },
    'js7':   { 'fname': 'Josie',
               'lname': 'Smith' },
}

val = dod['ak23']['fname']     # 'Ally'

val2 = dod['js7']['lname']     # 'Smith'

The "outer" dict has 3 keys, each associated with a dictionary.
Each "inner" dictionary has identical keys 'fname' and 'lname'.
So our first subscript will be a dict key that locates the "inner" dict we need, and the 2nd will be another dict key to get the value within that dict.

Ex. 7.8 -> 7.10

Reading Multidimensional Containers: Looping

looping through a struct to read each "inner" structure

We begin by identifying the "inner" structures; 'for' looping takes us to each one in turn.

value_table =       [
                       [ 1, 2, 3 ],
                       [ 10, 20, 30 ],
                       [ 100, 200, 300 ]
                    ]

for inner_list in value_table:    # list, [ 1, 2, 3 ]

    print(inner_list[0])          # 1
                                  # 10
                                  # 100

in this program we are looking to retrieve a value from each inner item of a multidimensional container
we first identify that each inner item is a list
we can use a 'for' loop to assign each "inner" list to the control variable (here we have chosen to call it inner_list)
next we can use subscripts to access a value within each "inner" list
it is recommended to name your variables descriptively, for example inner_list -- this will keep the identity of each variable clear to you as you work

looping through and accessing values within a list of dicts

In a list of dicts, each item is a dict.

lod = [
    { 'fname': 'Ally',
      'lname': 'Kane'   },
    { 'fname': 'Bernie',
      'lname': 'Bain'   },
    { 'fname': 'Josie',
      'lname': 'Smith'  }
]

for inner_dict in lod:
    print(inner_dict['fname'])         # Ally
    print(inner_dict['lname'])         # Kane
    print()

                                       # Bernie
                                       # Bain

                                       # Josie
                                       # Smith

we first identify this struct as a list of dicts
looping through the "outer" list, we see that each item is a dict
each "inner" dict has identical keys 'fname' and 'lname'
as we loop through the list, each "inner" dict is assigned to the control variable (inner_dict); we can then access whatever values we wish by using simple dict subscripting on the "inner" dict

looping through and accessing values within a dict of dicts

In a dict of dicts, looping through retrieves each key, and we must subscript to retrieve the "inner" dict.

dod = {
    'ak23':  { 'fname': 'Ally',
               'lname': 'Kane' },
    'bb98':  { 'fname': 'Bernie',
               'lname': 'Bain' },
    'js7':   { 'fname': 'Josie',
               'lname': 'Smith' },
}

for id_key in dod:
    inner_dict = dod[id_key]

    print(inner_dict['fname'])        # Ally
    print(inner_dict['lname'])        # Kane
    print()

we first identify this struct as a dict of dicts
looping through the "outer" dict, each key in the dict is assigned to the control variable (id_key)
we then must use the key to get the value, which is the "inner_dict" for that key
we can then subscript this "inner" dict for the values that we need

Ex. 7.16 -> 7.18 also to discuss building a struct from file (Ex. 7.25 -> 7.30)
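A dict of dicts is often built from a file. Here is a minimal sketch, assuming colon-delimited lines in the same id:street:city:state:zip layout as student_db.txt. The lines below are hypothetical stand-ins for the file, and the inner keys 'city', 'state' and 'zip' are chosen for illustration. Looping with .items() gives each key and its "inner" dict together, so no separate subscript is needed.

```python
# hypothetical lines in id:street:city:state:zip layout
lines = [
    'jb29:23 Main St:Trenton:NJ:08608\n',
    'ak23:9 Elm Ave:Albany:NY:12207\n',
]

students = {}                                   # dict of dicts

for line in lines:
    stuid, street, city, state, zip_code = line.rstrip().split(':')
    students[stuid] = {'city': city, 'state': state, 'zip': zip_code}

for stuid, inner_dict in students.items():      # key and "inner" dict together
    print(stuid, inner_dict['city'], inner_dict['state'])
                                                # jb29 Trenton NJ
                                                # ak23 Albany NY
```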

Introduction to User-Defined Functions

user-defined functions

A user-defined function is a block of code that can be executed by name.

def add(val1, val2):
    valsum = val1 + val2
    return valsum

ret = add(5, 10)           # int, 15

ret2 = add(0.3, 0.9)       # float, 1.2

A function is a block of code:

that can be executed by name ("calling" the function)
that may be executed with different input values
that can be executed multiple times in a program

user defined functions: calling the function

Calling means activating the function and running its code.

def print_hello():
    print("Hello, World!")

print_hello()             # prints 'Hello, World!'
print_hello()             # prints 'Hello, World!'
print_hello()             # prints 'Hello, World!'

we are calling the function 3 times in this code
a function call is marked by parentheses after the function name -- this means that the function block will be run

user defined functions: arguments

The arguments are the inputs to a function.

def print_hello(greeting, person):              # note that we do not
    full_greeting = f'{greeting}, {person}!'    # refer to 'name1',
    print(full_greeting)                        # 'place1', etc.
                                                # inside the function
name1 = 'Hello'
place1 = 'World'

print_hello(name1, place1)             # prints 'Hello, World!'


name2 = 'Bonjour'
place2 = 'Python'

print_hello(name2, place2)             # prints 'Bonjour, Python!'

we are calling the function, passing two arguments (inputs) to the call
the function definition gives the arguments new names (greeting, person), and the function refers to them by these names
(the objects that were passed are not copied: greeting and person are new names for the very same objects)
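The parameter name refers to the same object that was passed in. Because of this, a change made inside the function to a mutable object such as a list is visible to the caller. Here is a minimal sketch. The function name add_item is hypothetical.

```python
def add_item(items, new_item):      # items refers to the caller's list
    items.append(new_item)          # modifies that same list object

shopping = ['eggs', 'milk']

add_item(shopping, 'bread')

print(shopping)                     # ['eggs', 'milk', 'bread']
```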

user defined functions: function return values

A function's return value is passed back from the function using the return statement.

def print_hello(greeting, person):
    full_greeting = f'{greeting}, {person}!'
    return full_greeting

msg = print_hello('Bonjour', 'parrot')

print(msg)                                       # Bonjour, parrot!

now instead of printing the greeting, we are returning it
a value is returned from the function using the return statement
the returned value is assigned to the variable on the left side of the call (here, msg)
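Returning a value lets a function package up a whole algorithm and hand back the result. Here is a minimal sketch that wraps the counting-dict pattern from earlier in a function. The name count_items is hypothetical.

```python
def count_items(items):
    """Return a dict mapping each item to the number of times it appears."""
    counts = {}
    for item in items:
        if item not in counts:
            counts[item] = 0
        counts[item] = counts[item] + 1
    return counts                   # the finished dict goes back to the caller

state_count = count_items(['PA', 'NJ', 'NJ', 'NY', 'NY', 'NY'])
print(state_count)                  # {'PA': 1, 'NJ': 2, 'NY': 3}
```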

Ex. 9.1 - 9.5

Exception Trapping

exception trapping: handling exceptions after they are raised

Introduction: unanticipated vs. anticipated exceptions

Think of exceptions that we see raised by Python (SyntaxError, IndexError, etc.) as being of two general kinds -- unanticipated and anticipated:

unanticipated exceptions happen due to errors in our code, which we usually find during development. We respond to these by fixing our code so that the exceptions aren't raised.
anticipated exceptions are ones that we know could be raised due to external circumstances. We respond to these by writing code to avoid them, or deal with them if they should be raised.

Examples of anticipated exceptions:

we ask the user for a number to convert to int() but they give us a non-number
we try to open a file, but it has been moved or deleted
we try to connect to a database, but it is down

In each of these cases, we know the exception could be raised, and so we write code to try to avoid the exception, or to deal with it if it does.

KeyError: when a dictionary key cannot be found

If the user enters a key, but it can't be found in the dict.

mydict = {'1972': 3.08, '1973': 1.01, '1974': -1.09}

uin = input('please enter a year: ')         # user enters '2116'

print(f'value for {uin} is {mydict[uin]}')

  #  Traceback (most recent call last):
  #    File "/Users/david/test.py", line 5, in <module>
  #      print(f'value for {uin} is {mydict[uin]}')
  #                                  ~~~~~~^^^^^
  #  KeyError: '2116'

here is a simple example of an anticipatable exception
the outcome depends on the user: if they don't give us a valid year, we know what will happen -- a KeyError exception

ValueError: when the wrong value is used with a function or statement.

If we ask the user for a number, but they give us something else.

uin = input('please enter an integer:  ')

intval = int(uin)                           # user enters 'hello'

print(f'{uin} doubled is {intval*2}')

  #  Traceback (most recent call last):
  #    File "/Users/david/test.py", line 3, in <module>
  #      intval = int(uin)                           # user enters 'hello'
  #               ^^^^^^^^
  #  ValueError: invalid literal for int() with base 10: 'hello'

Here we know what will happen if the user doesn't enter all digits: a ValueError

FileNotFoundError: when a file can't be found.

If we attempt to open a file, but it has been moved or deleted.

filename = 'thisfile.txt'

fh = open(filename)

  #  Traceback (most recent call last):
  #    File "/Users/david/test.py", line 3, in <module>
  #      fh = open(filename)
  #           ^^^^^^^^^^^^^^
  #  FileNotFoundError: [Errno 2] No such file or directory: 'thisfile.txt'

one approach to managing exceptions: "asking for permission"

Up to now we have managed anticipated exceptions by testing to make sure an action will be successful.

Examples of testing for anticipated exceptions:

we test the user's input to see that it's a number before attempting to convert to int()
we check to see if a file exists before attempting to open it
we check to see if a database is online before we try to connect to it

So far we have been dealing with anticipated exceptions by checking first -- for example, using .isdigit() to make sure a user's input is all digits before converting to int().
However, there is an alternative to "asking for permission": begging for forgiveness.

another approach to managing exceptions: "begging for forgiveness"

The try block can trap exceptions and the except block can deal with them.

try:
    uin = input('please enter an integer:  ')   # user enters 'hello'
    intval = int(uin)                           # int() raises a ValueError
                                                # ('hello' is not a valid value)

    print(f'{uin} doubled is {intval*2}')

except ValueError:
    exit('sorry, I needed an int')   # the except block cancels the
                                     # ValueError and takes action

trapping exceptions is an alternative to testing ahead of time: taking action after an anticipated exception is raised ("Begging for Forgiveness" rather than "Asking for Permission")
we trap the exception with try, then we handle it with except
the try: block will contain statements from which a potential raised exception is anticipated
the except: block will identify the anticipated exception and contain statements to be executed if the exception is raised
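The two approaches can be compared side by side. Here is a minimal sketch with two hypothetical helper functions. Both return None instead of calling exit(), so they can be tried without input().

```python
def to_int_permission(text):
    # "asking for permission": test first, convert only if the test passes
    if text.isdigit():
        return int(text)
    return None

def to_int_forgiveness(text):
    # "begging for forgiveness": try the conversion, trap the ValueError
    try:
        return int(text)
    except ValueError:
        return None

print(to_int_permission('42'), to_int_forgiveness('42'))        # 42 42
print(to_int_permission('hello'), to_int_forgiveness('hello'))  # None None
print(to_int_permission('-5'), to_int_forgiveness('-5'))        # None -5
```

The last line shows one practical difference. .isdigit() rejects a negative number such as '-5', even though int() accepts it. The test in the "permission" version is therefore stricter than the conversion it protects.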

the procedure for setting up exception handling

It's important to witness the exception and where it is raised before attempting to trap it.

It's strongly recommended that you follow a specific procedure in order to trap an exception:

allow the exception to be raised
note the exception type and line number where it was raised
wrap the line that caused the exception in a try: block
follow with an except: block, containing statements to be executed if the exception is raised
test that when the exception is raised, the except block is executed
test that when the exception is not raised, the except block is not executed

Ex. 9.12 - 9.13

trapping multiple exceptions

Multiple exceptions can be trapped using a tuple of exception types.

companies = ['Alpha', 'Beta', 'Gamma']

user_index = input('please enter a ranking:  ')   # user enters '4' or 'hello'

try:
    list_idx = int(user_index) - 1

    print(f'company at ranking {user_index} is {companies[list_idx]}')

except (ValueError, IndexError):
    exit(f'please enter a ranking from 1 to {len(companies)}')

the except block traps two exceptions
if the user types a non-number, a ValueError exception is raised
if the user enters a nonexistent index, an IndexError is raised
in either case the except: block will be executed

chaining except: blocks

The same try: block can be followed by multiple except: blocks, which we can use to specialize our response to the exception type.

companies = ['Alpha', 'Beta', 'Gamma']

user_index = input('please enter a ranking:  ')   # user enters '4'

try:
    list_idx = int(user_index) - 1

    print(f'company at ranking {user_index} is {companies[list_idx]}')

except ValueError:
    exit('please enter a numeric ranking')

except IndexError:
    exit(f'max ranking is {len(companies)}')

The exception raised will be matched against each except: type in order, and the first matching block will be executed.

Ex. 9.14
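Here is a minimal sketch of the same chained except: blocks inside a hypothetical function, company_at(). It returns a message instead of calling exit(), so each path can be checked.

```python
companies = ['Alpha', 'Beta', 'Gamma']

def company_at(ranking):
    try:
        list_idx = int(ranking) - 1
        return f'company at ranking {ranking} is {companies[list_idx]}'
    except ValueError:                  # ranking was not a number
        return 'please enter a numeric ranking'
    except IndexError:                  # ranking was past the end of the list
        return f'max ranking is {len(companies)}'

print(company_at('2'))        # company at ranking 2 is Beta
print(company_at('hello'))    # please enter a numeric ranking
print(company_at('4'))        # max ranking is 3
```

Note that a ranking of '0' raises no exception at all. It produces index -1, and negative indices count from the end of the list, so 'Gamma' is returned. Catching that case would take an explicit range check.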

avoiding except: and except Exception:

When we don't specify an exception, Python will trap any exception. This is a bad practice.

ui = input('please enter a number: ')

try:
    fval = float(ui)
except:                  # AVOID!!  Should be 'except ValueError:'
    exit('please enter a number - thank you')

Why is this a bad practice?

except: or except Exception: can trap almost any type of exception, so an unexpected exception could go undetected (a bare except: even traps KeyboardInterrupt, which is raised when the user presses Ctrl-C)
except: or except Exception: does not specify which type of exception was expected, so it is less clear to the reader

(There are certain limited circumstances under which we might use except: by itself, or except Exception:. One common practice is to place the entire program execution in a try: block and to trap any exception that is raised, so the exception can be logged and the program doesn't need to exit as a result.)

Set Operations and List Comprehensions

container processing: set comparisons

Set comparisons make it easy to compare 2 sets for membership.

set_a = {1, 2, 3, 4}
set_b = {3, 4, 5, 6}

print(set_a.union(set_b))           # {1, 2, 3, 4, 5, 6}  (everything in either set)

print(set_a.difference(set_b))      # {1, 2}              (in set_a but not in set_b)

print(set_a.intersection(set_b))    # {3, 4}     (what is common between them?)

set comparisons aren't essential, as we could accomplish them with a simple algorithm using 'for' loops
however, set comparisons are very convenient to use
they are also more efficient to run
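The same comparisons can also be written with operators: | (union), - (difference), & (intersection) and ^ (symmetric difference, meaning items in one set or the other but not both). Here is a minimal sketch comparing two hypothetical attendance rosters. sorted() is used only because a set does not guarantee the order in which its items print.

```python
monday = {'ak23', 'bb98', 'js7'}
tuesday = {'bb98', 'js7', 'mw41'}

print(sorted(monday | tuesday))   # ['ak23', 'bb98', 'js7', 'mw41']  (attended either day)
print(sorted(monday - tuesday))   # ['ak23']                         (Monday only)
print(sorted(monday & tuesday))   # ['bb98', 'js7']                  (both days)
print(sorted(monday ^ tuesday))   # ['ak23', 'mw41']                 (exactly one day)
```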

Ex. 9.15 - 9.17

"transforming" list comprehension

List comprehensions build a new list based on an existing list.

This list comprehension doubles each value in the nums list

nums = [1, 2, 3, 4, 5]

dblnums = [      val * 2      for val in nums     ]
   #            transform       'for' loop

print(dblnums)            # [2, 4, 6, 8, 10]

for val in nums is the 'for' loop: this specifies what we are looping through, and the name of the control variable
val * 2 is the transform: each item is passed through this expression, and the result is added to the new list (dblnums).
(Note that the spacing in the example above is not required but added for clarity and commenting.)
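To show what the comprehension does, here is a minimal sketch that builds the same list with a conventional 'for' loop and .append(), then checks that the two results match:

```python
nums = [1, 2, 3, 4, 5]

# the 'for' loop version
dblnums_loop = []
for val in nums:
    dblnums_loop.append(val * 2)

# the list comprehension version
dblnums = [val * 2 for val in nums]

print(dblnums)                    # [2, 4, 6, 8, 10]
print(dblnums_loop == dblnums)    # True
```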

Ex. 9.18

"filtering" list comprehension

List comprehensions can also select values to place in the new list.

This list comprehension selects only those values above 35 degrees Celsius:

daily_temps = [26.1, 31.0, 38.4, 36.1, 38.3, 34.1, 32.7, 33.3]

hitemps = [       t          for t in daily_temps        if t > 35     ]
    #          transform          'for' loop               filter

print(hitemps)          # [38.4, 36.1, 38.3]

the first t in "t for t" is the transform: in this case, the value is not changed
the if expression is the filter: only values that pass this test will be included in the resulting list

Ex. 9.19

combining a transforming with a filtering list comprehension

We can choose to filter or transform or both.

This list comprehension selects values above 35C and converts them to Fahrenheit:

daily_temps = [26.1, 31.0, 38.4, 36.1, 38.3, 34.1, 32.7, 33.3]

f_hitemps = [ round((t * 9/5) + 32, 1)     for t in daily_temps     if t > 35 ]
     #              transform                'for' loop               filter

print(f_hitemps)          # [101.1, 97.0, 100.9]

Ex. 9.19

list comprehensions: examples

List comprehensions are a powerful convenience, but they are never required.

Some common operations can be accomplished in a single line. In this example, we produce a list of lines from a file, stripped of trailing whitespace (including the newline).

stripped_lines = [ i.rstrip() for i in open('pyku.txt').readlines() ]

We can even combine expressions for some fancy footwork

totals = [  float(i.split(',')[2])
            for i in open('revenue.csv')
            if i.split(',')[1] == 'NY'    ]

This last example borders on the overcomplicated -- if we are trying to do too much with a list comprehension, we might be better off with a conventional 'for' loop.

list comprehensions: why?

A list comprehension is a single statement.

since a list comprehension is one statement, it can go in places where a 'for' loop algorithm cannot
it can be used as a return value
it can be assigned to a variable
it can be the value in a dictionary
it can be made the argument to a function
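Two of these uses are sketched below: a list comprehension as a function's return value, and as the argument to a function. The function name high_temps is hypothetical. The data is the daily_temps list from the earlier example.

```python
def high_temps(temps, threshold):
    # the list comprehension is the return value
    return [t for t in temps if t > threshold]

daily_temps = [26.1, 31.0, 38.4, 36.1, 38.3, 34.1, 32.7, 33.3]

print(high_temps(daily_temps, 35))              # [38.4, 36.1, 38.3]

# the list comprehension as the argument to a function
print(len([t for t in daily_temps if t > 35]))  # 3
```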

sidebar: list comprehensions with dictionaries

Since dicts can be converted to and from 2-item tuples, we can manipulate them using list comprehensions.

Recall that dict .items() returns a view of 2-item tuples (which list() converts to a list), and that the dict() constructor uses the same 2-item tuples to build a dict.

mydict =  {'a': 5, 'b': 1, 'c': -3}

# dict -> list of tuples
my_items = list(mydict.items())      # list, [('a', 5), ('b', 1), ('c', -3)]

# list of tuples -> dict
mydict2 = dict(my_items)       # dict, {'a':5,   'b':1,   'c':-3}

Here's an example: filtering a dictionary by value - accepting only those pairs whose value is larger than 0:

mydict = {'a': 5, 'b': 1, 'c': -3}

filtered_dict = dict([ (i, j)
                       for (i, j) in mydict.items()
                       if j > 0 ])

           # {'a': 5, 'b': 1}
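The same dict() plus list comprehension technique can also transform a dict instead of filtering it. Here is a minimal sketch that changes the values in one case and the keys in the other:

```python
mydict = {'a': 5, 'b': 1, 'c': -3}

# transform the values: double each one, keeping the same keys
doubled = dict([(key, val * 2) for (key, val) in mydict.items()])
print(doubled)                # {'a': 10, 'b': 2, 'c': -6}

# transform the keys: uppercase each one, keeping the same values
upper_keys = dict([(key.upper(), val) for (key, val) in mydict.items()])
print(upper_keys)             # {'A': 5, 'B': 1, 'C': -3}
```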

The Command Prompt: Moving Around and Looking

the command prompt

The Command Prompt includes powerful commands for working with files and programs.

the command prompt (or Terminal prompt, or command line) is a text-based interface to the operating system
the command prompt was the original interface to your computer, before Windows File Explorer or the Mac Finder GUIs we use today
the command prompt is still used extensively by programmers and IT professionals to run programs and read and write files

opening the command prompt: Windows

In your Windows search, look for one of the following, and open it:

Anaconda Prompt (preferred, if you have it installed)
PowerShell
Command Prompt

You should see something similar to the following:

C:\Users\david>                       <-- command line

After opening this window, note the blinking cursor: this is your computer's operating system, awaiting your next command. (Please note that there may be small differences between your output and this illustration; these can usually be ignored.)

opening the command prompt: Mac or Linux*

In your Spotlight search, look for Terminal, and open it.

You should see something similar to the following:

Last login: Thu Sep  3 13:46:14 on ttys001

Davids-MBP-3:~ david$                 <-- command line

After opening the command prompt program on your computer, note the blinking cursor: this is your computer's operating system awaiting your next command. (Please note that there may be small differences between your output and this illustration; these can usually be ignored.)

Also note that the Apple Macintosh runs "on top of" and is actually powered by the Unix operating system; when we open a Terminal window, we are talking directly to Unix. Linux is a Unix-like operating system, so the commands we explore work the same way on either.

the present working directory (pwd)

Your command line session is located at one particular directory on the file tree at any given time.

On Windows, the pwd is automatically displayed at the prompt:

C:\Users\david>

On Mac/Linux, type pwd and hit [Enter]:

Davids-MBP-3:~ david$ pwd
/Users/david

we can move around the filesystem, and will see the present working directory change
your pwd is the reference point for any files we might want to open, and for any other directories we might want to travel to
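A running Python program has a present working directory too. A relative filename passed to open(), such as 'revenue.csv', is looked up there. Here is a minimal sketch using the standard os module. os.getcwd() plays the role of pwd, and os.listdir() plays the role of dir or ls. The printed paths and names depend on where you run it.

```python
import os

cwd = os.getcwd()                # like 'pwd': the present working directory
print(cwd)

entries = os.listdir(cwd)        # like 'dir' / 'ls': names found in that directory
print(sorted(entries)[:5])       # the first few names, alphabetically

# a relative filename such as 'revenue.csv' is resolved against cwd
print(os.path.join(cwd, 'revenue.csv'))
```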

listing files in a directory: Windows

dir is the command to list the contents of a directory.

Type dir and hit [Enter]:

C:\Users\david> dir

 Volume Serial Number is 0246-9FF7

 Directory of C:\Users\david

08/29/2020  11:37 AM    <DIR>          .
08/29/2020  11:37 AM    <DIR>          ..
08/29/2020  11:28 AM    <DIR>          Contacts
08/29/2020  12:50 PM    <DIR>          Desktop
... etc ...

The contents of the directory include all files and folders that can be found in it.

listing files in a directory: Mac

ls is the command to list the contents of a directory.

Type ls and hit [Enter]:

Davids-MBP-3:~ david$ ls

Applications          Downloads         Movies
Desktop               Dropbox           Music
Documents             Library           Public
... etc ...

The contents of the directory include all files and folders that can be found in it.

visualizing the directory tree

Starting from the root, each folder may have files and other folders within.

C:\Users
├── david           <--- my pwd when I open my Terminal
│   ├── Desktop
│   │   └── python_data
│   │        ├── 00
│   │        ├── 01
│   │        │    ├─ 1.1.py
│   │        │    ├─ 1.2.py
│   │        │    ├─ 1.3.py
│   │        │    etc.
│   │        ├── 02
│   │        │    ├─ 2.1.py
│   │        │    ├─ 2.2.py
│   │        │    ├─ 2.3.py
             etc.

Most of us are familiar with the file tree, since we see it in one form or another when we browse files on our computers.
When we are moving around the file tree, it is best to imagine where we are in the file system - in which directory, and at what level.
(The above structure shows that I have placed my python_data/ folder in Desktop/ -- you may have placed it elsewhere.)

moving around the directory tree with 'cd'

cd stands for 'change directory'. This command works for both Windows and Mac.

on Mac/Linux:

Davids-MBP-3:~ david$ pwd
/Users/david

Davids-MBP-3:~ david$ cd Desktop

Davids-MBP-3:~ david$ pwd
/Users/david/Desktop

on Windows:

C:\Users\david> cd Desktop
C:\Users\david\Desktop>

moving to the child directory

To visit a directory "below" where we are, we simply name the child dir.

We move "down" the directory tree by using the name of the next directory -- this extends the path:

C:\Users\david> cd Desktop

C:\Users\david\Desktop> cd python_data

C:\Users\david\Desktop\python_data> cd 02

C:\Users\david\Desktop\python_data\02>

We can also travel multiple levels by specifying a longer path:

C:\Users\david> cd Desktop\python_data\02

C:\Users\david\Desktop\python_data\02>

(Please note that on Windows we use the backslash separator (\); on Mac it is the forward slash (/).)

moving to the parent directory

The '..' (double dot) indicates the parent directory and can move us one directory "up".

If we'd like to travel up the directory tree, we use the special .. directory value, which signifies the parent directory:

C:\Users\david\Desktop\python_data\02> cd ..

C:\Users\david\Desktop\python_data> cd ..

C:\Users\david\Desktop> cd ..

C:\Users\david>

We can also travel multiple levels by chaining several .. entries together:

C:\Users\david\Desktop\python_data\02> cd ..\..\..

C:\Users\david>

(Please note that on Windows we use the backslash separator (\); on Mac it is the forward slash (/).)

using ls or dir with cd

These two commands together allow us to explore our filesystem.

Here is an example journey through some folders, viewing the contents of each folder as we move (this shows Mac/Linux output, but in Windows you may replace ls with dir):

Davids-MBP-3:Desktop david$ pwd
/Users/david/Desktop

Davids-MBP-3:Desktop david$ ls
python_data

Davids-MBP-3:Desktop david$ cd python_data

Davids-MBP-3:python_data david$ pwd
/Users/david/Desktop/python_data

Davids-MBP-3:python_data david$ ls
00
01
02
... etc.

Davids-MBP-3:python_data david$ cd 02

Davids-MBP-3:02 david$ ls
2.1.py       2.2.py       2.3.py       2.4.py       2.5.py
2.6.py       2.7.py       2.8.py       2.9.py       2.10.py

Davids-MBP-3:02 david$ pwd
/Users/david/Desktop/python_data/02

The Command Prompt: Executing Python Programs

verifying the PATH environment variable for Python

To execute scripts from the command line, we must ensure that the OS can find Python.

Please begin by opening a new Terminal or Command Prompt window. At the prompt, type python -V (make sure it is a capital V). (Please note that your prompt may look different than mine.)

Python can be found: python version is displayed

C:\Users\david> python -V
Python 3.11.5

Python can be found, but at the wrong version:

david@192 ~ % python -V
Python 2.7.16

Python can't be found:

david@192 ~ %  python -V
'python' is not a recognized...   or   'python': command not found...

If your path is not set correctly to a 3.x version of Python, you can find instructions on setting it in the supplementary documents for this class. You'll need to set the PATH to point to Python to continue with the remaining steps in this lesson. You may also contact your course manager for assistance.

executing a python script from the command line

Here we ask Python to run a script directly (without our IDE's help).

If you are in the same directory as the script, you can execute a program by running Python and telling Python the name of the script:

On Windows:

C:\Users\david\Desktop\python_data\02> python 2.1.py

On Mac:

Davids-MBP-3:02 david% python 2.1.py

Please note: if your prompt looks like this: >>>, you have entered the python interactive prompt. Type quit() and hit [Enter] to leave it. If there are any issues finding Python, please contact your course manager for assistance.

the STDIN, STDOUT and STDERR data streams

Your output goes to the screen, but in truth, it's going to "standard out".

when we use print(), we expect the string to be printed to the screen
the truth is that any output goes to something called a data stream
a data stream is simply a 'pipe' carrying data out of (or into) your program
once it leaves your program, the data is handled by the operating system
the standard output data stream is called "standard out", abbreviated as STDOUT
a similar output data stream, "standard error" or STDERR, is used when your program is delivering error output (for example, the message passed to exit(), as in exit('error: missing argument'))
a third data stream, "standard in" or STDIN, can be fed into our programs
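A minimal sketch of the two output streams, using a hypothetical process() function: print() accepts a file= argument, so the same call can target STDOUT or STDERR. If a script like this were run as python script.py > out.txt, only the STDOUT line would land in the file; the STDERR line would still appear on the screen.

```python
import sys

def process(value, out=sys.stdout, err=sys.stderr):
    """Hypothetical example: report a doubled value, or complain about a negative one."""
    if value < 0:
        print(f'error: negative value {value}', file=err)   # error output goes to STDERR
        return False
    print(f'result: {value * 2}', file=out)                  # normal output goes to STDOUT
    return True

process(5)     # writes "result: 10" to STDOUT
process(-1)    # writes "error: negative value -1" to STDERR
```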

redirecting the STDOUT data stream to a file

STDOUT can be redirected to other places besides the screen.

hello.py: print a greeting

print('hello, world!')

redirecting STDOUT to a file at the command line (Windows or Mac):

mycomputer% python hello.py                # default:  to the screen
hello, world!

mycomputer% python hello.py > newfile.txt  # redirect to a file (not the screen)
                                           # (we see no output)

mycomputer% cat newfile.txt       # Mac:  cat spits out a file's contents
hello, world!

C:\> type newfile.txt             # Windows: type spits out a file's contents
hello, world!

STDOUT is the "place" or "stream" that data goes to when we print it
the > symbol can be used to redirect this stream to a file

redirecting the STDOUT data stream to STDIN of another program

The 'pipe' character (|) can connect two programs: the output of the first program is redirected as the input of the second

Mac: direct output to the wc command (count lines, words and characters)

mycomputer% python hello.py | wc

       1       2      14                   # the output of wc

Windows: direct output to find command (count lines):

mycomputer% python hello.py | find /c /v ""

1

hello.py prints to STDOUT (usually the screen)
the pipe (|) connects this output to the STDIN of wc or find
wc is a Unix utility that counts lines, words and characters in text
find /c /v "" does similar work in Windows, counting lines

reading and redirecting the STDIN data stream

STDIN is the 'input pipe' to our program (usually the keyboard, but can be redirected to read from a file or other program).

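readfile.py: print the lines read from STDIN

import sys

for line in sys.stdin.readlines():   # list of all lines read from STDIN
    print(line, end='')              # each line already ends with a newline

# filetext = sys.stdin.read()        # alternative: read all of STDIN as one string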

A program like the above could be called this way, directing a file into our program's STDIN:

mycomputer% python readfile.py < file_to_be_read.txt

We can of course also direct the output of a program into our program's STDIN through use of a pipe:

mycomputer% ls -l | python readfile.py
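To see how a program can treat STDIN as just another stream of text, here is a sketch of a hypothetical count_stream() function that does roughly what wc does (wc counts bytes, so results may differ for non-ASCII text). Because it only calls .read(), it can be tried with an in-memory io.StringIO instead of a real pipe.

```python
import io

def count_stream(stream):
    """Return (lines, words, characters) read from a text stream, roughly like wc."""
    text = stream.read()                  # read everything the stream holds
    return text.count('\n'), len(text.split()), len(text)

# an in-memory stream standing in for STDIN
print(count_stream(io.StringIO('hello, world!\n')))   # (1, 2, 14)

# in a script (after import sys) you would pass the real STDIN:
#     print(*count_stream(sys.stdin))
```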

The Command Prompt: Program Arguments

sys.argv to capture command line arguments

sys.argv is a list that holds string arguments entered at the command line

a python script get_args.py

import sys                           # import the sys module

print('first arg: ' + sys.argv[1])   # print first command line arg
print('second arg: ' + sys.argv[2])  # print second command line arg

running the script from the command line, with two arguments

% python get_args.py hello there
first arg: hello
second arg: there

sys (a module object) gives us access to various resources provided by the Python system, and our operating system.
sys.argv is a list that is automatically provided by the sys module.
This list contains any string arguments to the program that were entered at the command line by the user.
If the user does not type arguments at the command line, then they will not be added to the sys.argv list.

Please note: if your prompt looks like this: >>>, you have entered the python interactive prompt. Type quit() and hit [Enter] to leave it.

The default item in sys.argv: the program name

sys.argv[0] will always contain the name of our program.

a python script print_args.py

import sys
print(sys.argv)

running the script from the command line (passing 3 arguments)

% python print_args.py hello there budgie
['print_args.py', 'hello', 'there', 'budgie']

running the script from the command line (passing no arguments)

% python print_args.py
['print_args.py']

sys.argv[0] always contains the name of the program itself
or, it may contain the pathname that was used to execute the program (i.e., directory path to the program filename)
even if no arguments are passed at the command line, sys.argv always holds this one value

IndexError with sys.argv (when user passes no argument)

Since we read arguments from a list, we can trigger an IndexError if we try to read an argument from sys.argv that wasn't passed at the command line.

a python script addtwo.py

import sys

firstint = int(sys.argv[1])
secondint = int(sys.argv[2])

mysum = firstint + secondint

print(f'the sum of the two values is {mysum}')

passing 2 arguments

% python addtwo.py 5 10
the sum of the two values is 15

passing no arguments

% python addtwo.py
Traceback (most recent call last):
  File "addtwo.py", line 3, in <module>
    firstint = int(sys.argv[1])
IndexError: list index out of range

How to handle this exception? Test the len() of sys.argv, or trap the IndexError exception.
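Both approaches can be sketched side by side. The functions add_two() and add_two_try() and the usage message are hypothetical; each takes a list shaped like sys.argv, so the logic can be tried without a command line.

```python
def add_two(args):
    """Test the length first: args is a list shaped like sys.argv."""
    if len(args) < 3:                       # program name + two numbers expected
        return 'usage: python addtwo.py INT INT'
    return f'the sum of the two values is {int(args[1]) + int(args[2])}'

def add_two_try(args):
    """Same task, but trap the IndexError instead of testing len()."""
    try:
        firstint = int(args[1])
        secondint = int(args[2])
    except IndexError:                      # fewer than two arguments were passed
        return 'usage: python addtwo.py INT INT'
    return f'the sum of the two values is {firstint + secondint}'

# in addtwo.py you would call print(add_two(sys.argv)); literal lists are used here
print(add_two(['addtwo.py', '5', '10']))   # the sum of the two values is 15
print(add_two_try(['addtwo.py']))          # usage: python addtwo.py INT INT
```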

File and Directory Listings

writing to files using the file object

To open a file for writing, use the 2nd argument 'w'.

fh = open('new_file.txt', 'w')
fh.write("here's a line of text\n")
fh.write('I add the newlines explicitly if I want to write to the file\n')
fh.close()

Files can be opened for writing or appending, but not for both at once.
Note that we are explicitly adding newlines to the end of each line. The write() method doesn't do this for us.
Special note: opening an existing file for writing truncates (erases) the file, so use with care!

appending to files using the file object

To open a file for appending, use the 2nd argument 'a'.

fh = open('new_file.txt', 'a')
fh.write("20250505 1203   something happened\n")
fh.close()

Appending is commonly used for log files, which are updated by adding new lines to the end of the file.
Again, note that we are explicitly adding newlines to the end of each line.
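The difference between 'w' and 'a' is easiest to see side by side. This sketch works in a temporary directory (created with the standard tempfile module, so no real files are touched; read_text() is a hypothetical helper) and shows that 'a' keeps the existing lines while a second 'w' erases them.

```python
import os
import shutil
import tempfile

def read_text(path):
    """Return the whole contents of a text file."""
    fh = open(path)
    text = fh.read()
    fh.close()
    return text

work_dir = tempfile.mkdtemp()                  # scratch directory for the demo
path = os.path.join(work_dir, 'new_file.txt')

fh = open(path, 'w')                           # 'w' creates the file
fh.write('first line\n')
fh.close()

fh = open(path, 'a')                           # 'a' adds to the end
fh.write('appended line\n')
fh.close()
after_append = read_text(path)                 # 'first line\nappended line\n'

fh = open(path, 'w')                           # 'w' again truncates the file
fh.write('only line\n')
fh.close()
after_rewrite = read_text(path)                # 'only line\n'

shutil.rmtree(work_dir)                        # clean up the scratch directory
```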

use os.getcwd() to show the present/current working directory.

The program below assumes we are starting in our home directory:

import os                # os ('operating system') module talks
                         # to the os (for file access & more)

cwd = os.getcwd()        # str, '/Users/david'

print(cwd)

The 'os' module gives us access to the operating system and its environment (files, folders, programs, etc.).
Again, the pwd is the location where we 'are' currently.
An important note: by default, VS Code uses the workspace folder as the pwd when you run programs through the IDE, whereas PyCharm uses the directory the script is in.

a sample file tree

We'll use this tree to explore relative filepaths.

dir1
├── file1.txt
├── test1.py
│
├── dir2a
│   ├── file2a.txt
│   ├── test2a.py
│   │
│   ├── dir3a
│   │   ├── file3a.txt
│   │   ├── test3a.py
│   │   │
│   │   └── dir4
│   │       ├── file4.txt
│   │       └── test4.py
└── dir2b
    ├── file2b.txt
    ├── test2b.py
    │
    └── dir3b
       ├── file3b.txt
       └── test3b.py

this tree can be found among your course files

relative filepaths

These paths locate files relative to the present working directory.

If the file you want to open is in the present working directory (for example, the directory you ran the script from), use the filename alone:

fh = open('filename.txt')

relative filepaths: parent directory

To reach a file in the parent directory, prepend the filename with ../

fh = open('../filename.txt')

relative filepaths: child directory

To reach a file in a child directory, prepend the filename with the name of the child directory and a slash.

fh = open('childdir/filename.txt')

relative filepaths: sibling directory

To reach a file in a sibling directory, prepend the filename with ../ and the name of the sibling directory.

fh = open('../childdir/filename.txt')

To reach a sibling directory, we must go "up, then down" by using ../ to go to the parent, then the sibling directory name to go down to the child.
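One way to check where a relative path leads is to join it to the pwd and let os.path.normpath() collapse the .. parts. A sketch, assuming (hypothetically) that the sample tree sits at /Users/david/dir1 and that the pwd is dir2a; outputs are Mac/Linux style, and Windows would show backslashes. The resolve() helper is also hypothetical.

```python
import os

def resolve(relpath, pwd):
    """Return the absolute path that relpath points to when the pwd is pwd."""
    return os.path.normpath(os.path.join(pwd, relpath))

pwd = '/Users/david/dir1/dir2a'                  # hypothetical pwd (Mac/Linux style)

print(resolve('file2a.txt', pwd))                # /Users/david/dir1/dir2a/file2a.txt
print(resolve('../file1.txt', pwd))              # /Users/david/dir1/file1.txt
print(resolve('dir3a/file3a.txt', pwd))          # /Users/david/dir1/dir2a/dir3a/file3a.txt
print(resolve('../dir2b/file2b.txt', pwd))       # /Users/david/dir1/dir2b/file2b.txt
```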

absolute filepaths

These paths locate files from the root of the filesystem.

In Windows, absolute paths begin with a drive letter, usually C:\:

""" test3a.py:  open and read a file """

filepath = r'C:\Users\david\Desktop\python_data\dir1\file1.txt'
fh = open(filepath)

print(fh.read())

(Note that any Windows path containing backslashes should be written as a raw string, r'', in our Python code; otherwise Python may read sequences such as \n or \t in the path as special characters.)
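A short illustration of why the r matters, using a hypothetical path that happens to contain \n and \t:

```python
plain = 'C:\new\temp.txt'   # no r: \n becomes a newline and \t a tab!
raw = r'C:\new\temp.txt'    # raw string: every backslash is kept

print(len(plain))   # 13
print(len(raw))     # 15
print(raw)          # C:\new\temp.txt
```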

On the Mac, absolute paths begin with a forward slash:

""" test3a.py:  open and read a file """

filepath = '/Users/david/Desktop/python_data/dir1/file1.txt'
fh = open(filepath)

print(fh.read())

(The above paths assume that the python_data folder is in the Desktop directory; you may have placed yours elsewhere on your system. Of course, the above paths also assume that my home directory is called david/; yours is likely different.)

os.path.join()

This function joins together directory and file strings with slashes appropriate to the current operating system.

import os

dirname = '/Users/david'
filename = 'journal.txt'

filepath = os.path.join(dirname, filename)             # '/Users/david/journal.txt'

filepath2 = os.path.join(dirname, 'backup', filename)  # '/Users/david/backup/journal.txt'

this function inserts slashes (when needed) in between directory name and filename items, joining them together into one string
the slash inserted will be forward slash on Mac/Linux, and backslash on Windows machines
Keep in mind that these are only strings, and os.path.join() only manipulates them: it does not read, create, or check any existing files or directories.
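The per-OS behavior can be seen directly: in CPython, os.path is the posixpath module on Mac/Linux and the ntpath module on Windows, and both can be imported on any system. A sketch:

```python
import ntpath      # the Windows flavor of os.path
import os
import posixpath   # the Mac/Linux flavor of os.path

print(posixpath.join('/Users/david', 'backup', 'journal.txt'))
# /Users/david/backup/journal.txt
print(ntpath.join(r'C:\Users\david', 'backup', 'journal.txt'))
# C:\Users\david\backup\journal.txt

# only strings are joined: nothing is checked on disk
print(os.path.join('no_such_dir', 'no_such_file.txt'))
```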

os.listdir(): list a directory

os.listdir() can read the contents of any directory.

import os

mydirectory = '/Users/david'

items = os.listdir(mydirectory)

for item in items:                                # 'photos'

    item_path = os.path.join(mydirectory, item)

    print(item_path)  # /Users/david/photos
                      # /Users/david/backups
                      # /Users/david/college_letter.docx
                      # /Users/david/notes.txt
                      # /Users/david/finances.xlsx

Note the os.path.join() call. This is a standard algorithm for looping through a directory -- each item must be joined to the directory to ensure that the filepath is correct.
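os.listdir() returns bare names, in no guaranteed order. A small sketch (a hypothetical list_paths() helper, run against a scratch directory from tempfile) that joins each name to its directory and sorts the result:

```python
import os
import tempfile

def list_paths(directory):
    """Return the full path of every item in directory, sorted by name."""
    return [os.path.join(directory, item) for item in sorted(os.listdir(directory))]

demo_dir = tempfile.mkdtemp()                        # empty scratch directory
for name in ['notes.txt', 'finances.xlsx']:
    open(os.path.join(demo_dir, name), 'w').close()  # create empty files
os.mkdir(os.path.join(demo_dir, 'photos'))           # and one subdirectory

for path in list_paths(demo_dir):
    print(path)
```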

exceptions for missing or incorrect files or directories

Several exceptions can indicate a file or directory misfire.

exception type	triggered by
FileNotFoundError	attempt to open a file not in this location
FileExistsError	attempt to create a directory (or in some cases a file) that already exists
IsADirectoryError	attempt to open() a path that is a directory rather than a file
NotADirectoryError	attempt to os.listdir() a path that is not a directory
PermissionError	attempt to read or write a file or directory for which you lack permission
OSError (WindowsError)	the parent class of all of the above; on Windows, WindowsError is another name for OSError, and it may be raised in place of one of the more specific exceptions

depending on your OS type and Python version, you may see any of these exceptions
exceptions like FileNotFoundError are very specific, while OSError is more general: because the specific exceptions are subclasses of OSError, catching OSError also catches any of them (missing file, missing directory, attempting to open a directory, etc.)
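A sketch of trapping these exceptions from most specific to most general (read_or_report() is a hypothetical helper). The final OSError clause acts as a safety net: on Windows, for example, opening a directory typically raises PermissionError rather than IsADirectoryError, and that lands there.

```python
import os
import tempfile

def read_or_report(path):
    """Return the file's text, or a short description of what went wrong."""
    try:
        fh = open(path)
    except FileNotFoundError:
        return 'no such file'
    except IsADirectoryError:
        return 'that is a directory'
    except OSError as err:               # parent class of all of the above
        return f'other OS error: {err}'
    text = fh.read()
    fh.close()
    return text

scratch = tempfile.mkdtemp()
print(read_or_report(os.path.join(scratch, 'missing.txt')))   # no such file
print(read_or_report(scratch))       # Mac/Linux: that is a directory
```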

traversing a directory tree with os.walk()

os.walk() visits every directory in a directory tree so we can list files and folders.

import os
root_dir = '/Users/david'
for root, dirs, files in os.walk(root_dir):

    for tdir in dirs:                    # loop through dirs in this directory
        print(os.path.join(root, tdir))  # print full path to tdir

    for tfile in files:                  # loop through files in this dir
        print(os.path.join(root, tfile)) # print full path to file

os.walk() traverses an entire directory tree
at each iteration of the 'for' loop, it visits one node (i.e., directory)
starting with the supplied directory (above, root_dir), it visits each subdirectory, traveling to every node beneath the root (again, node means particular directory)

At each iteration, these three variables are assigned these values:

root: string, the "node" or directory currently being read
dirs: list, names of directories found in the current directory
files: list, names of files found in the current directory
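A hedged sketch of a hypothetical walk_paths() helper: it builds a small piece of the sample dir1 tree in a scratch directory (via tempfile and os.makedirs()) and collects every path os.walk() reports, made relative to the starting directory with os.path.relpath().

```python
import os
import tempfile

def walk_paths(root_dir):
    """Return every directory and file under root_dir, as paths relative to it."""
    found = []
    for root, dirs, files in os.walk(root_dir):
        for name in dirs + files:                  # both lists hold bare names...
            full_path = os.path.join(root, name)   # ...so join each one to root
            found.append(os.path.relpath(full_path, root_dir))
    return sorted(found)

# build a small piece of the sample tree in a scratch directory
dir1 = tempfile.mkdtemp()
os.makedirs(os.path.join(dir1, 'dir2a', 'dir3a'))
for rel in ['file1.txt', os.path.join('dir2a', 'file2a.txt'),
            os.path.join('dir2a', 'dir3a', 'file3a.txt')]:
    open(os.path.join(dir1, rel), 'w').close()

for path in walk_paths(dir1):
    print(path)
```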

File Tests and Manipulations

os.path.isfile() and os.path.isdir()

With these functions we can see whether a path refers to a plain file or to a directory.

import os                         # os ('operating system') module talks
                                  # to the os (for file access & more)
mydirectory = '/Users/david'

items = os.listdir(mydirectory)   # list of strings, files found in this directory

for item in items:                # str, first file or dir found in directory

    item_path = os.path.join(mydirectory, item)  # join directory name and file or dir

    if os.path.isdir(item_path):
        print(f"{item}:  directory")
    elif os.path.isfile(item_path):
        print(f"{item}:  file")
                                     # photos:  directory
                                     # backups:  directory
                                     # college_letter.docx:  file
                                     # notes.txt:  file
                                     # finances.xlsx:  file

again, os.path.join() is here connecting the directory being listed to a file or directory found in it, to construct a path to the file or directory
.isdir() returns True if the listing is a directory
.isfile() returns True if the listing is a file

os.path.exists()

This function tests to see if a file or directory exists on the filesystem.

import os

fn = input('please enter a file or directory name:  ')
if not os.path.exists(fn):
    print('item does not exist')

elif os.path.isfile(fn):
    print('item is a file')

elif os.path.isdir(fn):
    print('item is a directory')

Keep in mind that if an item doesn't even exist, .isdir() and .isfile() will return False
This could lead you to believe that a non-existent entity is a dir based on getting False from .isfile(), or a file based on getting False from .isdir()
Therefore if we're getting False from .isdir() or .isfile(), we should check to see if the item even exists.
.isfile() and .isdir() will both return False if the item is another kind of entity: depending on your operating system, you may encounter entries such as sockets, device files, or broken links.
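The checks above can be ordered so that a False is never misread. A sketch of a hypothetical classify() helper, tried against a scratch directory:

```python
import os
import tempfile

def classify(path):
    """Describe a path as 'missing', 'directory', 'file', or 'other'."""
    if not os.path.exists(path):      # check existence first...
        return 'missing'
    if os.path.isdir(path):           # ...so a False below means "not this kind"
        return 'directory'
    if os.path.isfile(path):
        return 'file'
    return 'other'                    # exists, but is neither (e.g. a socket)

scratch = tempfile.mkdtemp()
a_file = os.path.join(scratch, 'notes.txt')
open(a_file, 'w').close()

print(classify(scratch))                               # directory
print(classify(a_file))                                # file
print(classify(os.path.join(scratch, 'nope.txt')))     # missing
```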

read file size with os.path.getsize()

os.path.getsize() takes a filename and returns the size of the file in bytes

import os

mydirectory = '/Users/david'

items = os.listdir(mydirectory)

for item in items:
    item_path = os.path.join(mydirectory, item)
    item_size = os.path.getsize(item_path)
    print(f"{item_path}:  {item_size} bytes")

Remember, as before, that when looping through a directory, Python won't be able to find a file unless its path is prepended. This is why os.path.join() is so important.
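Putting listdir(), join(), isfile() and getsize() together, here is a sketch of a hypothetical total_file_size() helper. The demo files are written in binary mode so that their sizes are exactly the bytes written.

```python
import os
import tempfile

def total_file_size(directory):
    """Add up the sizes, in bytes, of the plain files directly inside directory."""
    total = 0
    for item in os.listdir(directory):
        item_path = os.path.join(directory, item)   # without this, getsize() can't find the file
        if os.path.isfile(item_path):               # skip subdirectories
            total += os.path.getsize(item_path)
    return total

scratch = tempfile.mkdtemp()
for name, data in [('a.txt', b'abc'), ('b.txt', b'hello')]:
    fh = open(os.path.join(scratch, name), 'wb')    # binary mode: exactly these bytes
    fh.write(data)
    fh.close()
os.mkdir(os.path.join(scratch, 'subdir'))

print(total_file_size(scratch))   # 8
```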

moving or renaming a file

moving and renaming a file are essentially the same thing

import os

filename = 'file1.txt'
new_filename = 'newname.txt'

os.rename(filename, new_filename)

import os

filename = 'file1.txt'      # or could be a filepath including directory
move_to_dir = 'old/'

# renaming file1.txt to old/file1.txt
os.rename(filename, os.path.join(move_to_dir, filename))

in the first example, we simply rename a file
in the second example, we retain the old filename, but give it a new path. This is the same as moving the file to a new location - in a sense, we're renaming the path.
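The second example can be tried safely in a scratch directory. One assumption worth knowing: the target directory must already exist, since os.rename() does not create it.

```python
import os
import tempfile

work_dir = tempfile.mkdtemp()                    # scratch directory for the demo
move_to_dir = os.path.join(work_dir, 'old')
os.mkdir(move_to_dir)                            # os.rename() will not create it for us

filename = os.path.join(work_dir, 'file1.txt')
open(filename, 'w').close()                      # an empty file to move

# same base name, new directory: "renaming the path" moves the file
new_path = os.path.join(move_to_dir, os.path.basename(filename))
os.rename(filename, new_path)

print(os.path.exists(filename), os.path.exists(new_path))   # False True
```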

copying or backing up a file

import shutil                      # the 'shell utilities' module

filename = 'file1.txt'
backup_filename = 'file1.txt_bk'   # must be a filepath, including filename

shutil.copyfile(filename, backup_filename)

import shutil

filename = 'file1.txt'
target_dir = 'backup'              # can be a filepath or just a directory name

shutil.copy(filename, target_dir)  # dst can be a folder; shutil.copy2() also copies metadata

shutil ("shell utilities") is a module for doing all kinds of operations on files, directories, and the like
shutil.copy() can copy a file to a new name or pathname, or to a new directory
it's important to note that shutil.copy() does not copy the file "metadata" (such as the modification date) along with the file; shutil.copy2() attempts to preserve it
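A sketch that makes the metadata difference visible: it gives a scratch file an old, fixed modification time with os.utime(), then copies it both ways and compares the times.

```python
import os
import shutil
import tempfile

work_dir = tempfile.mkdtemp()                       # scratch directory for the demo
src = os.path.join(work_dir, 'file1.txt')
fh = open(src, 'w')
fh.write('some data')
fh.close()

old_time = 1_000_000_000                            # a fixed timestamp in September 2001
os.utime(src, (old_time, old_time))                 # set access and modification times

plain_copy = shutil.copy(src, os.path.join(work_dir, 'copy.txt'))
meta_copy = shutil.copy2(src, os.path.join(work_dir, 'copy2.txt'))

print(int(os.path.getmtime(plain_copy)) == old_time)   # False: modified "now"
print(int(os.path.getmtime(meta_copy)) == old_time)    # True: copy2 kept the time
```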

creating a directory: os.mkdir()

This function is named after the unix utility mkdir.

import os

os.mkdir('newdir')

A new directory will be created if one does not already exist; if it does exist, os.mkdir() raises FileExistsError.
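When a script may be run more than once, that FileExistsError can be trapped. A sketch with a hypothetical make_dir() helper and a scratch location:

```python
import os
import tempfile

def make_dir(path):
    """Hypothetical helper: create path, returning False if it already exists."""
    try:
        os.mkdir(path)
    except FileExistsError:        # raised when the directory is already there
        return False
    return True

target = os.path.join(tempfile.mkdtemp(), 'newdir')
print(make_dir(target))   # True:  directory created
print(make_dir(target))   # False: already exists
```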

removing a directory or file tree: os.rmdir() and shutil.rmtree()

If your directory is not empty, shutil.rmtree must be used.

import os
import shutil

os.mkdir('newdir')

wfh = open('newdir/newfile.txt', 'w')  # creating a file in the dir
wfh.write('some data')
wfh.close()

os.rmdir('newdir')        # OSError: [Errno 66] Directory not empty: 'newdir'

shutil.rmtree('newdir')   # success

if a directory is empty, os.rmdir() can remove it
if the directory is not empty, this function will fail
in that case, shutil.rmtree() will succeed
.rmtree() can also remove entire trees of directories and files
obviously, care must be taken before removing an entire file tree!

copying a file tree

Again, take care when working with entire trees!

import shutil

shutil.copytree('olddir', 'newdir')

Regardless of what files and folders are in the directory to be copied, all files and folders (and indeed all child folders and files within those) will be copied to the new name or location.
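A sketch in a scratch directory. One detail to keep in mind: by default shutil.copytree() refuses to copy onto a destination that already exists and raises FileExistsError instead.

```python
import os
import shutil
import tempfile

scratch = tempfile.mkdtemp()
olddir = os.path.join(scratch, 'olddir')
os.makedirs(os.path.join(olddir, 'child'))
open(os.path.join(olddir, 'child', 'data.txt'), 'w').close()

newdir = os.path.join(scratch, 'newdir')
shutil.copytree(olddir, newdir)          # newdir must not exist yet

print(os.path.exists(os.path.join(newdir, 'child', 'data.txt')))   # True

second_copy_failed = False
try:
    shutil.copytree(olddir, newdir)      # second copy to the same place
except FileExistsError:
    second_copy_failed = True
print(second_copy_failed)                # True
```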

Interacting with External Processes

the operating system (OS) manages files and processes

Through interacting with the OS, we can manage files and launch other programs.

the "wider world" of our program is the world of the OS (operating system)
the OS is responsible for managing the files that are stored on our computer's disk
Python has functions to copy, delete, and move files
the OS is also responsible for managing the many programs that run on our computer simultaneously (processes)
Python has functions to launch and communicate with external processes
Python is a favored language among sysops (systems administrators) because of its wide feature set in this area

the subprocess module

This module can launch external programs from your Python script.

The subprocess module allows us to:

spawn new processes (i.e., run external programs)
connect to the processes' input/output/error pipes (STDOUT, STDIN, etc.)
obtain the processes' return codes when finished

subprocess.call()

Executes a command and outputs to STDOUT.

for Mac/Linux, using ls:

import subprocess

subprocess.call(['ls', '-l'])      # -l means 'long listing'

for Windows, using dir:

import subprocess

subprocess.call(['dir', '/b'], shell=True)  # /b means 'bare listing'

here, the .call() function is executing the ls and dir commands
the first string is the command, and additional strings specify arguments to these commands
the programs ls and dir output file listings
the output of these calls will go to STDOUT (usually, the screen)
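the program name and argument(s) are placed in a list for security purposes: without shell=True, each item is passed directly to the program rather than being interpreted by a shell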
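.call() also returns the launched program's return code (0 usually means success). To keep the sketch portable, it launches a second copy of Python itself via sys.executable rather than ls or dir:

```python
import subprocess
import sys

# sys.executable is the path of the Python interpreter running this script,
# so these commands work the same on Windows, Mac and Linux
ok = subprocess.call([sys.executable, '-c', 'print("hello from a child process")'])
failed = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(3)'])

print(ok)       # 0: the child finished normally
print(failed)   # 3: the exit status passed to sys.exit()
```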

subprocess.call(): redirecting output

The output of the called program can be directed to a file or other process.

sending output of command to a write file (and error output to STDOUT)

import subprocess
import sys

wfh = open('outfile.txt', 'w')
subprocess.call(['ls', '-l'], stdout=wfh, stderr=sys.stdout)
wfh.close()

feeding the contents of a file to the input of the wc command

fh = open('pyku.txt')
subprocess.call(['wc'], stdin=fh)
fh.close()

as mentioned, wc is the 'word count' program on Unix: it returns the number of lines, words and characters in the file
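The same stdout= redirection, sketched portably: the child is a Python one-liner (launched via sys.executable) instead of ls, and the output file lives in a scratch directory.

```python
import os
import subprocess
import sys
import tempfile

out_path = os.path.join(tempfile.mkdtemp(), 'outfile.txt')

wfh = open(out_path, 'w')
subprocess.call([sys.executable, '-c', 'print("line one"); print("line two")'],
                stdout=wfh)                # the child's STDOUT goes into the file
wfh.close()

fh = open(out_path)
content = fh.read()
fh.close()
print(content)                             # line one, then line two
```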

subprocess.call(): executing through the shell with shell=True

The shell means the program that runs your Command or Terminal Prompt.

import subprocess

subprocess.call('dir /b', shell=True)

shell=True will execute directly through the operating system's shell (think of the shell as the "public face" of the os)
this means that certain behaviors of the Command Prompt, like glob expansion (replacing the * with all files in the directory) or environment variables will be activated
when using shell=True, the command must appear in a single string
However, if there is any chance the string could contain inputs from the user or the internet, do not use shell=True. This has been the source of some notable hacks in the past.
(note that dir is Windows-specific -- use ls on Mac/Linux)

subprocess.check_output()

This function executes a command and returns its output as a byte string, rather than sending it to STDOUT.

(using the dir Windows command)

import subprocess

var = subprocess.check_output(["dir", "."], shell=True)   # dir is built into the shell
var = var.decode('utf-8')
print(var)                   # prints the file listing for the current directory

(using the wc Mac/Linux command)

out = subprocess.check_output(['wc', 'pyku.txt'])
out = out.decode('utf-8')
print(out)                  #        3     15     80 pyku.txt

the output of .check_output() is a bytestring, or 'encoded' string of bytes
working with a bytestring means we must decode it to characters to use it as a conventional string
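A portable sketch of the bytes-then-decode pattern, again using a Python one-liner as the child. The text=True argument (Python 3.7+) asks subprocess to do the decoding, using the locale's default encoding.

```python
import subprocess
import sys

cmd = [sys.executable, '-c', 'print("hello")']

raw = subprocess.check_output(cmd)           # bytes, e.g. b'hello\n'
text = raw.decode('utf-8')                   # decode bytes into a str
print(text.strip())                          # hello

text2 = subprocess.check_output(cmd, text=True)    # already a str
print(type(raw).__name__, type(text2).__name__)    # bytes str
```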

forking child processes with multiprocessing

forking allows a running program to execute multiple copies of itself simultaneously.

when a program forks, a new process is created that is a duplicate of the original process
the "child" process can be dispatched by the "parent" process to do some additional work while the parent continues with its own work
this technique is used in situations in which a single script process would take too long to complete, for example sending email messages (which requires the program to wait on the email sending program)
by dividing up the work, multiple processes can get some jobs done much more quickly

multiprocessing example

forking allows a running program to execute multiple copies of itself simultaneously.

from multiprocessing import Process
import os
import time

def info(title):                # function for a process to identify itself
    print(title)
    if hasattr(os, 'getppid'):  # only available on Unix
        print('parent process:', os.getppid())
    print('process id:', os.getpid())

def func(childnum):                # function for child to execute
    info(f'|Child Process {childnum}|')
    print('now taking on time consuming task...')
    time.sleep(3)
    print(f'{childnum}:  done')
    print()

if __name__ == '__main__':

    info('|Parent Process|')
    print(); print()
    procs = []
    for num in range(3):
        p = Process(target=func, args=(num,))
                                         # a new process object
                                         # target is function func
        p.start()                        # new process is spawned
        procs.append(p)                  # collecting list of Process objects

    for p in procs:
        p.join()                         # parent waits for child to return
    print('parent concludes')

multiprocessing output and discussion

Because multiple processes are spawned, we must imagine each one as a separately running program.

the if __name__ == '__main__' line ensures that any new process spawned will not execute the code inside the 'if' block
Process() defines a new child process, i.e. a new program
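each new child process runs the same program as the original, but begins executing at a different point: depending on the platform's start method, it starts either as a copy of the parent process ('fork', traditionally the default on Linux) or as a fresh Python interpreter that re-imports the script ('spawn', the default on Windows and macOS)
the target=func argument names the function that should be run by the child process, and args=(num,) supplies the arguments to be passed to that function ((num,) is a 1-item tuple)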
p.start() starts ("spawns") a child process
after all 3 processes have been spawned, the program calls .join() on each of them. This method causes the parent to wait until that process concludes.
the net result is that the parent will not terminate execution until all of the child processes have concluded. This is considered good housekeeping - the parent should always wait for the child (otherwise a "zombie process" may be created).

Looking closely at the output, note that all three processes executed (not necessarily in order) and that the parent didn't continue until it had heard back from each of them

|Parent Process|
parent process: 92180
process id: 92316

|Child Process 0|
parent process: 92316
process id: 92318
now taking on time consuming task...
|Child Process 2|
parent process: 92316
process id: 92320
now taking on time consuming task...
|Child Process 1|
parent process: 92316
process id: 92319
now taking on time consuming task...
0:  done
2:  done
1:  done
parent concludes

More About User-Defined Functions

user-defined functions and code organization

User-defined functions help us organize our code -- and our thinking.

Let's now return to functions from the point of view of code organization. Functions are useful because they:

are separate from, and do not interfere with, the rest of the code
help us implement a modular program design
allow us to test the function separately ("unit testing")
help us avoid repetition in our code
help organize our thinking

review: function block, argument and return value

def add(val1, val2):
    mysum = val1 + val2
    return mysum

a = add(5, 10)      # int, 15

b = add(0.2, 0.2)   # float, 0.4

Review what we've learned about functions:

when we call the function, program execution jumps up to the function block, executes the block, and then returns
the arguments here are the two values passed to the function call (inside the parentheses), i.e. 5, 10 or 0.2, 0.2
inside the function, the arguments are assigned to variables val1 and val2
the return value inside the function is mysum
once we have returned from the function, the return value is assigned to the variable on the left side of the call, i.e. a or b above

Ex. 12.1 - 12.4

functions without a return statement return None

When a function does not return anything, it returns None.

def do(arg):
    print(f'{arg} doubled is {arg * 2}')
    # no return statement returns None

x = do(5)        # (prints '5 doubled is 10')

print(x)         # None

note that this function does not have a return statement
despite this, we are (erroneously) assigning the return value to a variable (x =)
x would normally be assigned whatever was returned from the function
since the function does not explicitly return anything, it returns a default value (None)
None in Python is the "value that means no value"

Actually, since do() does not return anything useful, we should not call it with an assignment (i.e., x = above). If you call a function and find that its return value is None, it often means that the function has no useful return value and was not meant to be used in an assignment. Ex. 12.5 - 12.6

the None object type

The None value is the "value that means no value".

zz = None

print(zz)        # None
print(type(zz))  # <class 'NoneType'>

aa = 'None'      # oops, this is a string -- not the None value!

None is a distinct value: it is expressed by None (no quotes)
None is a value of its own type: NoneType
it is capitalized the same way as the boolean values True and False
None represents "nothing", "empty", "void" or "undefined"
the null/empty/void value is present in most other programming languages
it is most often used to say "nothing here", "nothing was found", "the request came up empty" and other such meanings
the value must not have quotes around it -- quotes would be a 4-character string, not None
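A minimal sketch of None used as a "nothing was found" result. The function name find_position() is hypothetical; the `is None` test is the idiomatic way to check for None, and the last line shows that the string 'None' is not the None value.

```python
def find_position(items, target):
    """Return the index of target in items, or None if it is not there."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return None                      # explicit "nothing was found"


pos = find_position(['a', 'b', 'c'], 'z')

if pos is None:                      # 'is None' is the usual test for None
    print('not found')               # not found

print(None == 'None')                # False: a string is not the None value
```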

function argument type: positional

Positional arguments are required to be passed, and assigned by position.

def greet(firstname, lastname):
    print(f"Hello, {firstname} {lastname}!")

greet('Joe', 'Wilson')   # passed two arguments:  correct

greet('Marie')           # TypeError: greet() missing 1 required positional argument: 'lastname'

positional arguments are the ones we have used up until now
the number of arguments shown in the definition must match those in the call
(There is no type requirement for arguments to a function. Python will accept whatever objects are passed.)
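A short sketch of the other two rules above: passing too many positional arguments is also a TypeError, and Python does not check argument types. Here greet() returns its message instead of printing it, and a try/except block catches the error so the script can continue.

```python
def greet(firstname, lastname):
    return f"Hello, {firstname} {lastname}!"


print(greet(1, 2.5))                 # Hello, 1 2.5!  (no type checking)

try:
    greet('Joe', 'Wilson', 'Jr.')    # one argument too many
except TypeError as err:
    print(err)                       # greet() takes 2 positional arguments but 3 were given
```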

function argument type: keyword

Keyword arguments are optional; if one is not passed, the function uses its default value.

def greet(lastname, firstname='Citizen'):
    print(f"Hello, {firstname} {lastname}!")

greet('Kim', firstname='Joe')   # Hello, Joe Kim!

greet('Kim')                    # Hello, Citizen Kim!

this function has one positional (lastname) and one keyword argument (firstname='Citizen')
in the def, the keyword argument specifies a default value
in the first call, the positional and keyword arguments are passed
in the second call, the keyword argument is not passed, and so the function supplies the default value for that variable ('Citizen')
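A sketch of two more ways to call the same function (returning the string rather than printing it): a parameter with a default can also be filled by position, and any argument may be passed by name, in which case the order does not matter.

```python
def greet(lastname, firstname='Citizen'):
    return f"Hello, {firstname} {lastname}!"


print(greet('Kim'))                            # Hello, Citizen Kim!
print(greet('Kim', 'Joe'))                     # Hello, Joe Kim!  (default overridden by position)
print(greet(firstname='Joe', lastname='Kim'))  # Hello, Joe Kim!  (named, in any order)
```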

Ex. 12.7 - 12.8

User-Defined Function Variable Scoping

variable name scoping: the local variable

Variable names initialized inside a function are local to the function.

def myfunc():
    tee = 10
    return tee

var = myfunc()

print(var)          # 10

print(tee)          # NameError ('tee' does not exist here)

variable tee does not exist outside the function because it was defined (i.e., assigned) inside the function
assignment inside the function makes the variable local to the function
we call this behavior scoping, and say that tee is scoped to the function
(it is true that the value 10 survives outside the function, but scoping refers to variable names, not objects)

Ex. 12.9

variable name scoping: the global variable

Any variable defined in our code outside a function is global.

var = 'hello global'      # global variable

def myfunc():
    print(var)            # this global is available here

myfunc()                  # hello global

any non-local variable defined in our code is global
globals are available both inside and outside a function
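Reading a global inside a function works, but assigning to the same name creates a separate local variable. A minimal sketch (shadow() and broken() are hypothetical names): the global is untouched, and reading the name before the assignment in the same function raises UnboundLocalError, because Python treats the name as local for the whole function.

```python
var = 'hello global'


def shadow():
    var = 'hello local'      # assignment creates a NEW local 'var'
    return var


def broken():
    print(var)               # fails: 'var' is local here because of the line below
    var = 'changed'


print(shadow())              # hello local
print(var)                   # hello global  (the global was not changed)

try:
    broken()
except UnboundLocalError as err:
    print(type(err).__name__)    # UnboundLocalError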

Ex. 12.10 - 12.11

"pure" functions

Functions that do not touch outside variables, and do not create "side effects" (for example, calling exit(), print() or input()), are considered "pure" -- and are preferred.

"Pure" functions have the following characteristics:

pure functions do not read from or write to "outside" variables (instead, they work only with arguments passed to the function)
pure functions do not call input() from inside the function (instead, they work with the arguments passed to the function)
pure functions do not call print() (instead, they return values to be printed outside the function)
pure functions do not call exit() (instead, they use the raise statement to signal errors - discussed later in this course)

"pure" functions: working only with "inside" (local) variables

"Outside" (Global) variables are ones defined outside the function -- they should be avoided.

wrong way: referring to an outside variable inside a function

val = '5'                   # defined outside any function

def doubleit():
    dval = int(val) * 2     # BAD:  function refers to "global" variable 'val'
    return dval

new_val = doubleit()

right way: passing outside variables as arguments

val = '5'                   # defined outside any function

def doubleit(arg):
    dval = int(arg) * 2     # GOOD:  refers to the same value '5',
    return dval             #        but accessed through local
                            #        argument 'arg'

new_val = doubleit(val)     # passing variable to function -
                            #   correct way to get a value into the function

using an outside variable creates a "dependency" between the outside variable and the function: if the variable changes, the behavior of the function changes
the outside variable could be defined in another part of the program, where we can't see it and may have lost mental track of it
one exception to this rule is the use of constants -- values that are intended never to be changed -- these could be used inside a function without ambiguity (discussed later)
however, passing the value to the function is the best approach -- this means that we can see explicitly (as well as being able to test) what value is going into the function

"pure" functions: avoiding "side-effects"

print(), input(), exit() all "touch" the outside world and in many cases should be avoided inside functions.

a "side-effect" is something that happens outside the function, but as a result of calling the function
print(): this function writes values to the screen
input(): this function takes input from the keyboard
exit(): this function terminates program execution altogether

Although it is of course possible (and sometimes practical) to use these built-in functions inside our function, we should avoid them if we are interested in making a function "pure". It should also be noted that it's fine to use any of these in a function during development - they are all useful development tools.
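As a sketch, the do() function from the start of this section can be made "pure" by building and returning its message rather than printing it; the caller then decides whether to print. The name describe_double() is hypothetical.

```python
def describe_double(arg):
    """Return a message describing arg doubled (no print() inside)."""
    return f'{arg} doubled is {arg * 2}'


msg = describe_double(5)     # no side effect happens here
print(msg)                   # 5 doubled is 10  (the side effect happens outside)
```

Because the function only returns a value, it can be checked directly (for example, describe_double(5) == '5 doubled is 10') without watching the screen.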

"pure" functions: using raise instead of exit() inside functions

exit() should not be called inside a function.

def doubleit(arg):
    if not arg.isdigit():
        raise ValueError('arg must be all digits')   # GOOD:  error signaled with raise
    dval = int(arg) * 2
    return dval

val = input('what is your value? ')
new_val = doubleit(val)

this function requires input of a certain value - a string that is all digits
if the wrong value is passed, the function needs to respond
"pure" functions avoid exit() because it is a side-effect
the proper way to signal an error is with the raise statement
raise will literally raise an exception, the same way Python does

signalling errors (exceptions) with raise

raise creates an error condition (exception) that usually terminates program execution.

Python uses exceptions to signal that it cannot continue
this may be because it doesn't understand, or can't do, or won't do what we request
when writing functions, we prefer to signal an error using the raise statement rather than by calling exit() - this is how all built-in functions behave
the responsibility to exit should rest with the calling code, not with a function

To raise an exception, we simply follow raise with the type of error we would like to raise, and an optional message:

raise ValueError('please use a correct value')

You may raise any existing exception (you may even define your own). Here is a list of common exceptions:

Exception Type	Reason
TypeError	the wrong type used in an expression
ValueError	the wrong value used in an expression
FileNotFoundError	a file or directory is requested that doesn't exist
IndexError	use of an index for a nonexistent list/tuple item
KeyError	a requested key does not exist in the dictionary
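To illustrate that the responsibility to exit rests with the calling code, here is a minimal sketch using a try/except block (a statement not otherwise covered in this section): doubleit() only raises, and the hypothetical main() decides to exit. sys.exit() called with a string prints that message to stderr and exits with status 1.

```python
import sys


def doubleit(arg):
    if not arg.isdigit():
        raise ValueError('arg must be all digits')   # signal the error; don't exit
    return int(arg) * 2


def main(val):
    try:
        return doubleit(val)
    except ValueError as err:
        sys.exit(f'error: {err}')    # the calling code decides to exit


print(main('5'))                     # 10
```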

Ex. 12.12 - 12.13

global variables and function "purity"

Globals should be used inside functions only in select circumstances.

STATE_TAX = .05    # ALL CAPS designates a "constant"


def calculate_bill(bill_amount, tip_pct):

    tax = bill_amount * STATE_TAX     # float, 5.0
    tip = bill_amount * tip_pct       # float, 20.0

    total_amount = bill_amount + tax + tip   # float, 125.0

    return total_amount


total = calculate_bill(100, .20)      # float, 125.0

here we're using global STATE_TAX inside the function
ALL_CAPS names indicate that we don't intend to change this value
because the value isn't changing, we can safely use it inside the function, knowing what its value will be
if the global were to change, that would change the behavior of this function and lead to bugs that are hard to track down
except in very special cases, you must not make changes to a global variable (for example, a list) inside a function, as this can cause very challenging debugging situations
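A small sketch of why changing a global inside a function is risky: the same call with the same argument gives a different answer each time, because the function depends on (and changes) outside state. The pure version takes the list as an argument and returns a new list. Both function names are hypothetical.

```python
results = []                          # global list


def add_result_impure(value):
    results.append(value)             # BAD: silently changes the global list
    return len(results)


def add_result(items, value):
    return items + [value]            # GOOD: returns a new list; 'items' is unchanged


print(add_result_impure('x'))         # 1
print(add_result_impure('x'))         # 2  (same call, different answer)

print(add_result([], 'x'))            # ['x']
print(add_result([], 'x'))            # ['x']  (same call, same answer)
```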

"pure" functions: why prefer them?

Here are some positive reasons to strive for purity.

You may have noticed that these "impure" practices do not cause Python errors. So why should we avoid them?

pure functions are easier to maintain and extend
pure functions are more modular and thus make it easier to control our programs
pure functions can be tested in isolation
pure functions make code more reliable and less prone to error
pure functions make errors easier to trace and fix

The above perspective will become clearer as you write longer programs. As your programs become more complex, you will be confronted with more complex errors that are sometimes difficult to trace. Over time you'll realize that the best practice of using pure functions enhances the "quality" of your code -- making it easier to write, maintain, extend and understand the programs you create. Again, please note that during development it is perfectly allowable to call print(), exit() or input() from inside a function. We may also decide on our own that this is all right in shorter programs, or ones that we are working on in isolation. It is with longer programs and collaborative projects that purity becomes more important.

proper code organization

Let's discuss some essential elements of a program.

Here are the main components of a properly formatted program. Please see the tip_calculator.py file in your files directory for an example:

Triple-quoted string at top of script: "docstring" with description, author, date, etc.
imports: all imports go at the top except in certain circumstances
global constants: ALL UPPERCASE variable names of values that are not expected to change and will be available everywhere
functions: all functions appear together before any "main body" code
a "main" function (optional): the "gateway" function that leads to all functions; the program could be "restarted" by calling this function
if __name__ == '__main__': in the "global" or "main body" space (meaning outside of any function), a "module gate" with a test that will be True only if the script was run directly, and False if the script was imported as a module (this is discussed in detail in the 'Modules' section)
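Since tip_calculator.py itself is not reproduced here, the following is only a hypothetical skeleton showing the order of the components listed above; the file name, function names and values are illustrative assumptions.

```python
"""
tip_calculator_sketch.py: calculate a restaurant bill with tax and tip
(hypothetical example of program layout)
"""

# (imports, if any, go here at the top)

STATE_TAX = .05                          # global constant: not expected to change


def calculate_bill(bill_amount, tip_pct):
    """Return the bill total including tax and tip."""
    tax = bill_amount * STATE_TAX
    tip = bill_amount * tip_pct
    return bill_amount + tax + tip


def main(bill_amount=100, tip_pct=.20):
    """Gateway function: calls the other functions and reports the result."""
    total = calculate_bill(bill_amount, tip_pct)
    print(f'total: {total:.2f}')


if __name__ == '__main__':               # True only when the script is run directly
    main()
```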

review of tip_calculator.py

the four variable scopes: L-E-G-B

Four kinds of variables: (L)ocal, (E)nclosing, (G)lobal and (B)uiltin.

filename = 'pyku.txt'       # 'filename':  global

                            # 'get_text':  global (function name is a
                            #                      variable as well)
def get_text(fname):        # 'fname':     local
    fh = open(fname)        # 'fh':        local; 'open':  builtin
    text = fh.read()        # 'text':      local
    return text

txt = get_text(filename)    # 'txt':       global
print(txt)                  # 'print':     builtin

Local variables are defined inside a function, and are limited to it
Enclosing variables (not shown here) would be those variables defined inside a function that are used inside a nested function, i.e., a function defined inside another function
Global variables are defined in our program, but outside any function
Builtin variables are those predefined by Python: print(), len(), etc.
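The example above has no nested function, so here is a minimal sketch of an (E)nclosing variable; outer() and inner() are hypothetical names. Python looks a name up in the local scope first, then in any enclosing function, then in the globals, and finally in the builtins.

```python
name = 'global name'                  # (G)lobal


def outer():
    name = 'enclosing name'           # local to outer(); (E)nclosing for inner()

    def inner():
        size = len(name)              # 'size': (L)ocal; 'len': (B)uiltin;
                                      # 'name': not local, so found in outer()
        return f'{name} ({size} chars)'

    return inner()


print(outer())    # enclosing name (14 chars)
print(name)       # global name
```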

Modules in Python

importing built-in modules

Python comes with hundreds of preinstalled modules.

import sys             # find and import the sys module
import json            # find and import the json module

print(sys.copyright)   # the .copyright attribute points
                       # to a string with the copyright notice

      # Copyright (c) 2001-2023 Python...


obj = json.loads('{"a": 1, "b": 2}')   # the .loads attribute points to
                                       # a function that reads str JSON data

print(type(obj))    # <class 'dict'>

modules are files that contain Python code for use in our programs
modules are made available through an import statement
we often import several modules in a given script
following import, we can access the module code through its attributes
each attribute points to a function, string, list, etc. that is a global variable in the module

other module import patterns

These patterns are purely for convenience when needed.

Abbreviating the name of a module:

import json as js           # 'json' is now referred to as 'js'

obj = js.loads('{"a": 1, "b": 2}')

Importing a module variable into our program directly:

from json import loads      # making the 'loads' function part of the global namespace

obj = loads('{"a": 1, "b": 2}')

Please note that this does not import only a part of the module: the entire module code is still evaluated and imported.
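A quick way to check this: sys.modules is a dict of every module loaded so far. After from json import loads, the json module has been loaded, but only the name loads has been added to our own namespace.

```python
import sys
from json import loads

obj = loads('{"a": 1, "b": 2}')

print('json' in sys.modules)      # True:  the whole json module was loaded
print('loads' in globals())       # True:  'loads' is a global name in our script
print('json' in globals())        # False: the name 'json' itself was not defined
```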

built-in module examples

Each module has a specific focus.

The sys module has functions that let us work with Python's interpreter, and how it works with the operating system
The os module has functions that let us work with the operating system's files, folders and other processes
The datetime module has functions that let us easily calculate a date into the future or past, or compare two dates
The urllib.request module has functions that let us easily make HTTP requests over the internet
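A brief taste of three of these modules (urllib.request is left out so that the example does not need a network connection): sys reports on the running interpreter, os.path builds file paths in the operating system's own format, and datetime does date arithmetic.

```python
import os
import sys
from datetime import date, timedelta

print(sys.version_info.major)                   # 3
print(os.path.join('data', 'records.csv'))      # data/records.csv (data\records.csv on Windows)
print(date(2024, 2, 27) + timedelta(days=3))    # 2024-03-01 (2024 is a leap year)
```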

the python standard distribution of modules

Modules included with Python are installed when Python is installed -- they are always available.

Python provides hundreds of supplementary modules to perform myriad tasks. The modules do not need to be installed because they come bundled in the Python distribution -- that is, they are installed at the time that Python itself is installed. The documentation for the standard library is part of the official Python documentation (search for "Python documentation"). Python modules cover a wide range of tasks and specialized purposes:

various string-related services
specialized containers (type-specific arrays, named tuples, default and counting dicts, etc.)
math calculations and number generation
file and directory manipulation
persistence (saving data on disk)
data compression and archiving (e.g., creating zip files)
encryption
networking and interprocess (program-to-program) communication
internet tasks: web server, web client, email, file transfer, etc.
XML and HTML parsing
multimedia: audio and image file manipulation
GUI (graphical user interface) development
code testing
etc...

finding third-party modules

Take some care when installing modules -- it is possible to install malicious code.

We generally find third-party modules by doing web searches, or from colleagues
When we find a module that meets our needs, we should do some research to make sure it's the one we want and need -- check online for references to it, examine the project's home page, make sure it is in active development
Although it's very rare, sometimes bad actors have created modules designed to pass viruses. These modules may be named similarly to popular modules
We must always be careful when selecting a module to install

- [demo: searching for powerpoint module, verifying ]

installing modules

Third-party modules must be downloaded and installed into your Python distribution.

command to use to install a 3rd party module:

david@192 ~ % pip install pandas

or, use this command if you have installed Anaconda or Miniconda Python:

david@192 ~ % conda install pandas     # installs pandas

third-party modules are not part of the Python distribution but have been created for the public to use
thousands of these modules are free and available on demand
the pip utility is installed along with Python
if you have installed Anaconda or Miniconda, try the conda utility first, because you may find a module that is quality-controlled to work with your distribution

PyPI: the python package index

This index contains links to all modules ever added by anyone to the index.

Search for any module's home page at the PyPI website:

https://pypi.python.org/pypi

PyPI is the definitive index of modules written in Python
there are more than 70,000 projects uploaded there, from serious modules used by millions of developers, to half-baked ideas that someone decided to share prematurely
usually, we will not search for modules here; we will hear about a module from an online article or through a search

user-defined modules

A module of our own design may be saved as a .py file.

messages.py: a simple Python module that prints messages

import sys

def print_warning(msg):
    print(f'Warning!  {msg}')

test.py: a Python script that imports messages.py

import messages

# accessing the print_warning() function
messages.print_warning('Look out!')   # Warning!  Look out!

we can also build our own modules; one way is by creating a .py file with module code
upon import, the entire module is read, compiled and executed
global variables in the module then become attributes of the module

Ex. 12.19 - 12.21

module search path

Python must be told where to find our own custom modules.

To view the currently used module search paths, we can use sys.path

import sys

print(sys.path)        # shows a list of strings, each a directory
                       # where modules can be found

the sys.path list contains all paths that your distribution of Python uses to store modules
to add our own folders (i.e., containing modules we'd like to import) we can augment this list with the PYTHONPATH environment variable (next)
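A self-contained sketch that shows several of these ideas at once: it writes a small module (the hypothetical demo_messages.py) into a temporary folder, appends that folder to sys.path at run time, and imports it. The print() at the module's top level runs during import, and the module's global names become its attributes. Appending to sys.path in code affects only the current run; PYTHONPATH is the more permanent approach.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# hypothetical module source, written to a new temporary folder
module_dir = tempfile.mkdtemp()
Path(module_dir, 'demo_messages.py').write_text(
    "PREFIX = 'Warning!'\n"
    "print('demo_messages is being executed')\n"
    "\n"
    "def print_warning(msg):\n"
    "    print(f'{PREFIX}  {msg}')\n"
)

sys.path.append(module_dir)            # add the folder to the module search path
importlib.invalidate_caches()          # make sure the newly written file can be found

import demo_messages                   # demo_messages is being executed

demo_messages.print_warning('Look out!')   # Warning!  Look out!
print(demo_messages.PREFIX)                # Warning!  (a module global is an attribute)
```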

setting the PYTHONPATH system environment variable

Like the PATH for programs, this variable tells Python where to find modules.

when we import a module, Python needs to find it
modules may be located in any of several directories (stored in sys.path)
to extend this list and add our own directories, we add them to the PYTHONPATH
when Python starts up to run a program, it looks for the PYTHONPATH variable and if found, adds the paths specified there to sys.path
for more specific instructions, please see your supplementary documents.
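Setting the variable itself is done in your shell or system settings, but its effect can be checked from Python. This sketch starts a second Python process with PYTHONPATH set to a temporary folder (a stand-in for a folder of your own modules) and confirms that the folder appears in that process's sys.path.

```python
import os
import subprocess
import sys
import tempfile

my_modules = tempfile.mkdtemp()          # stand-in for a folder of your own modules

env = dict(os.environ, PYTHONPATH=my_modules)
result = subprocess.run(
    [sys.executable, '-c', 'import sys; print("\\n".join(sys.path))'],
    env=env, capture_output=True, text=True, check=True,
)

child_path = result.stdout.splitlines()  # the child's sys.path, one entry per line
print(my_modules in child_path)          # True
```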

demo setting the variable

User-Defined Classes: the class Statement and Object Methods

introduction: classes

A class is a definition for a custom type of object.

At the start of this course, we defined an object:

An object is:
--> a unit of data
--> of a particular type
--> with type-specific functionality

user-defined classes allow us to create a custom type of object
the object will have its own behaviors and its own ways of storing data
many modules use custom objects to facilitate access to module functionality
we can create our own object types to provide an object-oriented interface to our own module's functionality

the object-oriented interface

All objects have behaviors called methods and data stored in attributes.

mylist = [1, 2, 3]
mylist.append(4)       # list, [1, 2, 3, 4]
print(mylist[-1])      # int, 4

mystr = 'hello'
ustr = mystr.upper()   # str, 'HELLO'

myint = 5
dblint = myint + 5     # int, 10

we have been working with the OOP interface from the start
any object of a type we have used (list, str, int, etc.) has had data and behaviors designed to work with that data
the behaviors may take the form of methods, or be other operations such as subscripting or use of operators
the combination of data plus behaviors is the essence of the OOP interface

example of a class: the date and timedelta object types

Consider any object you encounter in terms of its data and behaviors.

from datetime import date, timedelta

dt = date(2023, 12, 30)         # new 'date' object for 12/30/2023
td = timedelta(days=3)          # new 'timedelta' object:  3 day interval

dt2 = dt + td                   # new date object:  date + timedelta

print(dt2)                      # 2024-01-02 (3 days later)
print(type(dt2))                # <class 'datetime.date'>

a date object holds the data for a date
it is also capable of calculating dates (using math operators)
this combination of data plus behaviors is the essence of the OOP interface
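A few more of the date type's behaviors, as a short sketch: subtracting two dates produces a timedelta, dates can be compared with the usual operators, and the date's parts are available as attributes.

```python
from datetime import date, timedelta

start = date(2023, 12, 30)
end = start + timedelta(days=3)      # date + timedelta -> date

gap = end - start                    # date - date -> timedelta
print(gap.days)                      # 3
print(end > start)                   # True
print(end.year, end.month, end.day)  # 2024 1 2
```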

designing an object interface: the (proposed) server object type

When creating a new object type, consider what you want it to be and to do.

import sysadmin                    # theoretical / proposed module

s1 = sysadmin.Server('work1',
                     username='user',
                     password='pass')

ms = s1.ping()
print(f'{s1.hostname} pinged at {ms}ms')   # work1 pinged at 43ms

s1.copyfile_up('myfile.txt')       # copies a file up to the server
s1.copyfile_down('yourfile.txt')   # copies a file down from the server

print(s1.uptime())                 # 7920 ('work1' restarted 2 hours, 12 minutes ago)

s1.restart()                       # restarts the server

print(s1.uptime())                 # 2 (work1 restarted 2 seconds ago)

when you consider a module you would like to create, think in terms of the OOP interface
consider what you would want an object to be and to do
this (proposed) server object represents a server. So we define methods that will give us access to the things a server can do.

the class statement

The class block statement is the blueprint for an object type.

class Me:
    pass          # 'pass' marks an empty block


m = Me()          # construct a new 'Me' object

print(type(m))    # <class '__main__.Me'>

a class defines what data an object holds and what it does
with nothing in it (marked by 'pass'), this class statement simply indicates the name
the name of a class is the same as its type (this is a Me class)
when we call the class, it produces an instance (object) of the class
the type() function shows us that this is a Me object
(the '__main__' part refers to the current script, or when imported would be the name of the module)
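A short sketch: every call to the class builds a separate instance, and isinstance() confirms an object's type.

```python
class Me:
    pass


m1 = Me()
m2 = Me()

print(m1 is m2)              # False: each call constructs a new, separate instance
print(isinstance(m1, Me))    # True
print(type(m1).__name__)     # Me
```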

Ex. 13.1

defining a method

A method is a function that is part of a class.

class Say:

    def greet(self):
        print('Hello!')

    def make_greeting(self, name):
        return f'Hello, {name}!'

s = Say()

s.greet()                        # Hello!

g = s.make_greeting('Guido')     # 'Hello, Guido!'

a method is simply a function that belongs to the class
the method is called through the object with "dot" syntax
it is in most other ways like a function - it may take arguments and may return values
(the 'self' argument will be discussed in our next lesson)

13.2 - 13.2

The __init__ Constructor and Object Attributes

method calls pass the instance as first (implicit) argument, called self

Object methods, otherwise known as instance methods, allow us to work with the instance's data.

class Do:

    def printme(self):
        print(self)     # <__main__.Do object at 0x1006de910>


x = Do()

print(x)                # <__main__.Do object at 0x1006de910>

x.printme()             # same as Do.printme(x)

the instance is automatically passed to every method as its first argument, which by convention is named 'self'
the identity of 'self' can be obtained by printing it
printing both 'self' and the instance 'x' reveals the same hex code, which is a kind of identifier for an object
that the codes are the same indicates that self and x are the same object
the method call can be rewritten as Class.method(instance) (see last line); this illustrates that the instance is being passed to the methods
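Rather than comparing printed hex codes, the is operator can confirm the same thing directly. In this sketch the hypothetical method whoami() simply returns self.

```python
class Do:
    def whoami(self):
        return self              # hand the instance back to the caller


x = Do()

print(x.whoami() is x)           # True: 'self' was x
print(Do.whoami(x) is x)         # True: the same call, written explicitly
```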

Ex. 13.7

the __init__() method

This method is automagically called when a new instance is constructed.

class Counter:

    def __init__(self):
        print(f'called __init__ with {self}')


a = Counter()    # called __init__ with <__main__.Counter object at 0x10f8ea590>

print(a)         # <__main__.Counter object at 0x10f8ea590>

calling Counter() automatically calls the __init__() method, if it is defined
if __init__() is not defined, it will not be called
__init__() must be named exactly as shown to be activated
as with all methods, the default first argument is the instance itself

Ex. 13.8

using __init__() to set initial value(s) in the instance

__init__() is used to initialize the instance's attributes.

class Counter:

    def __init__(self, ival):    # ival:  5
        self.count = ival        # sets .count to 5


a = Counter(5)

print(a.count)             # 5

a value (or values) passed to Counter() is passed to __init__()
the value is captured in a 2nd argument because the 1st argument is always the instance (self)
the value is customarily used to initialize the instance
by setting values in __init__(), we can guarantee that the instance has these values before anything else is done with the object

Ex. 13.9

instance methods: changing instance "state"

We can use methods to modify the instance's attributes.

class Counter:

    def __init__(self, ival):    # ival:  0
        self.count = ival        # sets .count to 0

    def increment(self):
        self.count = self.count + 1

    def get_value(self):
        return self.count


a = Counter(0)

a.increment()
a.increment()

print(a.get_value())    # 2

completing our Counter class, we now have an object that can keep track of an integer, and increment it on demand
we can see why self is included in every method -- it is because each method is intended to either add, modify or read attributes from the instance
we call these kinds of methods instance methods
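Because each method works through self, every instance keeps its own state. A brief sketch using the Counter class above, with two independent counters:

```python
class Counter:

    def __init__(self, ival):
        self.count = ival

    def increment(self):
        self.count = self.count + 1

    def get_value(self):
        return self.count


a = Counter(0)
b = Counter(10)

a.increment()
b.increment()
b.increment()

print(a.get_value(), b.get_value())    # 1 12
```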

'setter' and 'getter' Methods and Encapsulation

setter and getter methods

These methods are used to control the reading and writing of instance attributes.

class Counter:

    # a 'setter' method
    def __init__(self, val):
        if not isinstance(val, int):
            raise TypeError('arg must be an int')

        self.value = val       # set the value in the instance's attribute

    # a 'getter' method
    def getval(self):
        return self.value

    def increment(self):
        self.value = self.value + 1


a = Counter(10)

b = Counter('hello')    # TypeError:  arg must be an int

we are often concerned with maintaining the 'integrity' of our instances
this means controlling the data that the instance can hold
a 'setter' method is one that is used to control the setting of an attribute
in the above example, __init__() is used as the setter
the setter determines whether the initial value passed is correct for the object
if it is not, it rejects it (in this case, by raising a TypeError)
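The same validation can live in a dedicated setter method, so that it also protects later changes and not only the initial value. A minimal sketch; setval() is a hypothetical name, and __init__() reuses it.

```python
class Counter:

    def __init__(self, val):
        self.setval(val)                  # reuse the setter's validation

    def setval(self, val):                # a 'setter' method
        if not isinstance(val, int):
            raise TypeError('arg must be an int')
        self.value = val

    def getval(self):                     # a 'getter' method
        return self.value

    def increment(self):
        self.value = self.value + 1


c = Counter(10)
c.setval(20)
c.increment()
print(c.getval())                         # 21

try:
    c.setval('hello')                     # rejected by the setter
except TypeError as err:
    print(err)                            # arg must be an int
```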

Ex. 13.10

breaking encapsulation

Encapsulation means that data integrity is maintained.

class Counter:

    def __init__(self, val):
        if not isinstance(val, int):
            raise TypeError('arg must be an int')

        self.value = val       # set the value in the instance's attribute

    def getval(self):
        return self.value

    def increment(self):
        self.value = self.value + 1


a = Counter(10)

a.value = 'hello'      # <-- here we are breaking encapsulation

a.increment()          # (unexpected TypeError)

in this example, encapsulation is broken on the 2nd to last line
the attribute .value is set to a str, inconsistent with how it is intended to be used
when we attempt to increment, the operation breaks the code
maintaining encapsulation means ensuring that unexpected errors like these do not happen

data object example

Objects are often used to represent a 'record' of data.

file records.csv

Janice,Korz,31
Adam,Elbert,29
Jake,Broom,30
Alice,Kim,41
Amber,Post,50
Eun-Kyung,Choi,33

class Student:
    def __init__(self, fname, lname, age):
        self.first = fname
        self.last = lname
        self.age = age


students = []                         # list to hold Student objects
with open('records.csv') as fh:
    for line in fh:
        fn, ln, ag = line.rstrip('\n').split(',')   # drop the newline, then split
        stu = Student(fn, ln, int(ag))  # construct a new Student (age as an int)
        students.append(stu)            # add to the student list

for stu in students:                  # loop through the list of Student objects
    print(stu.first)                  # print each student's first name

Janice
Adam
Jake
Alice
Amber
Eun-Kyung

each 'record' (line) in the csv file contains data on an individual student
we design the Student object to hold that data
data-based objects like these are quite common

Study Glossary for OOP, Part I

These are OOP terms we introduced in this session.

class	A statement that defines a new type, and the attributes and methods that define the object's data and behavior.
instance	An object of the type defined in the class. Calling the class produces a new instance. The instance will have access to the methods defined in the class, but hold its own data values ("state") in its attributes.
object	See "instance".
method	A function defined in and as part of a class.
attribute	A value associated with an object. An "instance attribute" represents data stored in an instance. A "class attribute" represents data defined in a class, also known as a class variable.
constructor	The __init__ method, which defines the attributes in a new instance that is being created.
initialize	Define first values for any object. __init__ is so called because it sets attribute values in a new instance.
state	Refers to the values stored in the instance. An instance's state is most often changed by the __init__ constructor and "setter" methods, but may be changed at any point.
setter and getter methods	Methods that are designed to either write to or read from an instance.
encapsulation	The technique of controlling instance state (i.e., the values of its attributes) through methods designed for this purpose.

Class Variables / Attributes

class variables / attributes

Class variables are data stored directly in the class; they are actually attributes set in the class.

class MyClass:

    var = 10              # a class variable / attribute


val = MyClass.var         # 10 (retrieve the class attribute)

MyClass.var2 = 'hello'    # set an attribute directly in the class object


obj = MyClass()           # MyClass instance

var is assigned inside the class block; this makes it a class variable
a class variable is actually a class attribute
note how this variable is accessible as an attribute of the class (MyClass.var)
we can also set attributes directly in the class using <class>.<attribute> syntax and they will work in the same way
keep in mind that class variables are neither global nor local; they are assigned using a similar syntax, but they are actually attributes stored in the class itself
this is why when accessing them, we must prepend the class name (MyClass.var)

class variables / attributes are also accessible through its instances

Instances can read class attributes using the same syntax as when reading their own attributes.

class MyClass:
    var = 10              # a class variable / attribute


obj = MyClass()           # a MyClass instance / object


print(obj.var)            # 10

Note that obj.var is the same syntax as the one we use to access instance attributes.

<object>.<attribute> 'cascading' lookup

If an attribute can't be found in an object, it is searched for in the class.

class MyClass:
    classval = 10           # class attribute

    def __init__(self):
        self.instval = 99   # instance attribute

    def get(self):
        return self.instval


a = MyClass()

print(a.instval)            # 99 (found in the instance)
print(a.classval)           # 10 (found in the class)

val = a.get()               # call get() (a method, i.e. a class attribute) through the instance

when we access (or "look up") an attribute from an instance, it may come from one of two places
the attribute may be found in the instance; if not, it may be found in the class
Python uses a 'cascading lookup' to check first in the instance, then the class
this lookup is necessary in part because any method defined in the class is a class variable - so method calls must also find the attribute
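One consequence of the cascading lookup, in a short sketch: assigning through an instance never changes the class attribute. It creates a new instance attribute that "shadows" the class attribute for that instance only. vars() shows the attributes stored in the instance itself.

```python
class MyClass:
    classval = 10                  # class attribute


a = MyClass()
b = MyClass()

a.classval = 5                     # creates a NEW attribute in instance a only

print(a.classval)                  # 5  (found in the instance first)
print(b.classval)                  # 10 (not in b, so found in the class)
print(MyClass.classval)            # 10 (the class attribute is unchanged)
print(vars(a))                     # {'classval': 5}
```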

class variable is shared data for instances

Here's an example of class data that is common to each instance.

class MyClass:
    instance_count = 0

    def __init__(self, letter):
        self.id = letter
        MyClass.instance_count = MyClass.instance_count + 1


a = MyClass('alpha')
b = MyClass('beta')

print(a.id)                 # 'alpha'
print(b.id)                 # 'beta'

print(a.instance_count)     # 2
print(b.instance_count)     # 2

instance_count is a class variable
it is incremented every time an instance is created
instances a and b each have their own .id value
but each instance is also aware of how many instances have been created
(this is not a practical example but demonstrates a class variable in action)
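Why does __init__ write MyClass.instance_count rather than self.instance_count? A sketch with a hypothetical class Counted shows the difference: reading through self finds the class attribute, but assigning through self creates a separate attribute on the instance, so the shared count is never updated.

```python
class Counted:
    instance_count = 0

    def __init__(self):
        # self.instance_count READS the class value (0) via the lookup,
        # but the assignment creates a NEW attribute on the instance
        self.instance_count = self.instance_count + 1


a = Counted()
b = Counted()

print(a.instance_count)          # 1  (instance attribute hides the class one)
print(b.instance_count)          # 1
print(Counted.instance_count)    # 0  (the class attribute never changed)
```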

Ex. 14.1

Class Methods and Static Methods

instance / object methods

Instance methods are designed to read or write attributes in the instance.

class Counter:

    # the initializer sets the starting value
    def __init__(self, val):
        self.value = val       # set the value in the instance's attribute

    # a 'getter' method
    def getval(self):
        return self.value      # read the value in the instance's attribute

    # a method that modifies the instance's attribute
    def increment(self):
        self.value = self.value + 1


a = Counter(10)
a.increment()
print(a.getval())              # 11

all of the methods we have seen so far are instance methods
an instance method is one that works with the instance
instance methods are designed to read or write attributes in the instance
the instance method is marked in the definition by its first parameter, self
since self refers to the instance, and thus gives access to its attributes, it is always the first parameter in the method's parameter list
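To see that self really is the instance, note that calling a method through an instance is equivalent to calling it through the class and passing the instance explicitly. A short sketch with the Counter class above:

```python
class Counter:
    def __init__(self, val):
        self.value = val

    def getval(self):
        return self.value

    def increment(self):
        self.value = self.value + 1


a = Counter(10)

a.increment()              # Python passes a as self automatically
Counter.increment(a)       # the same call, with self passed explicitly

print(a.getval())          # 12
```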

class methods

Class methods are designed to read or write attributes in the class.

class MyClass:
    instance_count = 0

    def __init__(self, letter):
        self.id = letter
        MyClass.instance_count = MyClass.instance_count + 1

    @classmethod
    def reset_instance_count(cls):
        cls.instance_count = 0

a = MyClass('alpha')
b = MyClass('beta')

print(a.instance_count)     # 2

a.reset_instance_count()

print(a.instance_count)     # 0

the @ ("pie") notation above the function is called a decorator
note that reset_instance_count(cls) has a @classmethod decorator above its definition
note also that the argument to reset_instance_count() is cls, not self
cls refers to the class object itself (here it is MyClass)
since this class object has access to all of its attributes, we can modify them inside the class method
again, note that in __init__ we refer to the class by name (MyClass.instance_count): __init__ receives the instance (self), not the class object (cls)
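A class method does not need an instance at all; it can be called on the class directly, and cls is the class whether the call goes through the class or through an instance. In this sketch, which_class() is a hypothetical helper added only to expose what cls is.

```python
class MyClass:
    instance_count = 0

    def __init__(self, letter):
        self.id = letter
        MyClass.instance_count = MyClass.instance_count + 1

    @classmethod
    def reset_instance_count(cls):
        cls.instance_count = 0

    @classmethod
    def which_class(cls):          # hypothetical helper: returns whatever cls is
        return cls


a = MyClass('alpha')

print(MyClass.which_class() is MyClass)   # True
print(a.which_class() is MyClass)         # True: through an instance, cls is still the class

MyClass.reset_instance_count()            # no instance needed
print(MyClass.instance_count)             # 0
```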

Ex. 14.2

static methods

Static methods work with neither the instance nor the class, but may still belong to the class.

import datetime
import time

class MyClass:
    instance_list = []

    def __init__(self, letter):
        self.id = letter
        formatted_time = MyClass.get_now()
        MyClass.instance_list.append((self, formatted_time))

    @staticmethod
    def get_now():
        dt = datetime.datetime.now()   # a datetime object
        ds = dt.strftime('%H:%M:%S')   # '17:06:41' (or current time)
        return ds

a = MyClass('alpha')
time.sleep(1)               # wait 1 second
b = MyClass('beta')

print(a.id)                 # 'alpha'
print(b.id)                 # 'beta'

print(MyClass.instance_list)     # e.g. [(<__main__.MyClass object at 0x...>, '17:06:41'), (<__main__.MyClass object at 0x...>, '17:06:42')]

the static method 'get_now()' returns the current time as a formatted string
it does not require the instance or the class to do its job
so it does not take self or cls as a parameter
the @staticmethod decorator tells Python not to pass the instance or the class automatically, so the method behaves like a plain function stored in the class
(we are recording the time each instance was created possibly for logging or debugging purposes, but its practical use doesn't matter for this example)
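A static method can be called through the class or through an instance with the same result, since nothing is passed in automatically. This sketch uses a hypothetical Temperature class; c_to_f() is a static method because the conversion needs only its argument.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    @staticmethod
    def c_to_f(c):
        # no self or cls: the method works only with its own argument
        return c * 9 / 5 + 32

    def fahrenheit(self):
        # an instance method can call the static method through the class
        return Temperature.c_to_f(self.celsius)


print(Temperature.c_to_f(100))      # 212.0
t = Temperature(0)
print(t.c_to_f(100))                # 212.0 (the instance is not passed in)
print(t.fahrenheit())               # 32.0
```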

Ex. 14.3

Inheritance and Polymorphism

inheritance

Child classes inherit attributes from Parent classes.

class Parent:
    parv = 500

class Child(Parent):           # Child inherits from Parent
    chi_val = 1

    def __init__(self, iv):
        self.iv = iv


c = Child(88)

print(c.iv)           # 88     (retrieved from the instance)

print(c.chi_val)      # 1      (retrieved from the Child class)

print(c.parv)         # 500    (retrieved from the Parent class)

first, note that the Child class definition has Parent in its parentheses: Child inherits from Parent
next, notice that in each of the three print() statements we are accessing an attribute from a different place
attribute lookup looks first in the instance, then in the class, then in any parent classes
class variables defined in a parent class are accessible to any child class or instances of a child class
(these classes do not need to be named Parent and Child - they can have any names)
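Python records the order in which classes are searched in the class's __mro__ ("method resolution order") attribute, and isinstance() confirms that a Child instance also counts as a Parent. A sketch using the classes above:

```python
class Parent:
    parv = 500

class Child(Parent):
    chi_val = 1

    def __init__(self, iv):
        self.iv = iv


c = Child(88)

# after the instance itself, lookup searches these classes in order
print([klass.__name__ for klass in Child.__mro__])   # ['Child', 'Parent', 'object']

print(isinstance(c, Child))        # True
print(isinstance(c, Parent))       # True: a Child instance is also a Parent
print(issubclass(Child, Parent))   # True
```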

inheritance allows us to divide behavior into general, then specific

General behaviors can go in the parent, and more specific in the children.

class Animal:
    def __init__(self, name):
        self.name = name
    def eat(self, food):
        print(f'{self.name} eats the {food}.')

class Dog(Animal):
    def fetch(self, thing):
        print(f'{self.name} brings back the {thing}!')

class Cat(Animal):
    pass                 # 'pass' marks an empty block: Cat adds nothing of its own

d = Dog('Rover')
c = Cat('Fluffy')

d.eat('dog food')        # Rover eats the dog food.
c.eat('cat food')        # Fluffy eats the cat food.

d.fetch('stick')         # Rover brings back the stick!

note that both Dog and Cat inherit from Animal
through inheritance, both Dog and Cat can call Animal.eat()
however Dog is also able to call .fetch(), as this is specific to Dog
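Inheritance only flows from parent to child, so a Cat gets eat() from Animal but not fetch() from its sibling Dog. A sketch using the classes above, with hasattr() checking whether the lookup succeeds:

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def eat(self, food):
        print(f'{self.name} eats the {food}.')

class Dog(Animal):
    def fetch(self, thing):
        print(f'{self.name} brings back the {thing}!')

class Cat(Animal):
    pass


c = Cat('Fluffy')

print(hasattr(c, 'eat'))     # True: inherited from Animal
print(hasattr(c, 'fetch'))   # False: fetch() is defined only in Dog

try:
    c.fetch('stick')
except AttributeError as err:
    print('lookup failed:', err)   # prints the AttributeError message
```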

methods may be specialized in an inheriting class

Child class behavior can build upon the general behavior in the parent.

class Animal:
    def __init__(self, name):
        self.name = name
    def eat(self, food):
        print(f'{self.name} eats the {food}.')

class Dog(Animal):
    pass

class Cat(Animal):
    def eat(self, food):
        if food != 'sushi':
            print('snif - snif - nah!')
        else:
            super().eat(food)        # calls Animal.eat()

d = Dog('Rover')
c = Cat('Fluffy')

d.eat('dog food')        # Rover eats the dog food.
c.eat('cat food')        # snif - snif - nah!
c.eat('sushi')           # Fluffy eats the sushi.

here we see that child class Cat has its own way of eating: it first checks whether the food is desirable, then eats
if the food is approved, it eats by calling Animal.eat()
the super() function gives access to the parent class, so its methods can be called without naming the parent
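super() is also commonly used in a child's __init__, so the parent can set up its attributes before the child adds its own. In this sketch the breed attribute is hypothetical; super().eat(food) behaves the same as Animal.eat(self, food), without naming Animal.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def eat(self, food):
        print(f'{self.name} eats the {food}.')

class Dog(Animal):
    def __init__(self, name, breed):
        super().__init__(name)      # let Animal.__init__ set self.name
        self.breed = breed          # then add the Dog-specific attribute

    def eat(self, food):
        print(f'{self.name} the {self.breed} wags its tail.')
        super().eat(food)           # same as Animal.eat(self, food)


d = Dog('Rover', 'beagle')
d.eat('dog food')
# Rover the beagle wags its tail.
# Rover eats the dog food.
```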

conceptually similar methods can be unified through polymorphism

Same-named methods in two different classes can share a conceptual similarity.

class Animal:
    def __init__(self, name):
        self.name = name

class Dog(Animal):
    def speak(self):
        print(f'{self.name}:  Bark! Bark!')

class Cat(Animal):
    def speak(self):
        print(f'{self.name}:  Meow!')


pet_list = [Dog('Rover'), Cat('Fluffy'), Cat('Precious'), Dog('Rex')]

for pet in pet_list:
    pet.speak()

                   # Rover:  Bark! Bark!
                   # Fluffy:  Meow!
                   # Precious:  Meow!
                   # Rex:  Bark! Bark!

"polymorphism" means "many shapes"
this refers to an arrangement in which instances of different classes do different things when the same-named method is called
if we group similar behaviors under the same method name, it doesn't matter which Animal we're working with, so we don't have to check its type
we can simply get instances of any class that inherits from Animal, and call the same method on each one
we can do this, confident that the instance will do the right thing
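A variation on the example above, as a sketch: here speak() returns a string instead of printing, Animal supplies a general default, and the hypothetical Fish class and chorus() function show that code calling speak() needs no type checks. A subclass without its own speak() falls back to the parent's version through inheritance.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        return f'{self.name}:  ...'          # general default for any Animal

class Dog(Animal):
    def speak(self):
        return f'{self.name}:  Bark! Bark!'

class Cat(Animal):
    def speak(self):
        return f'{self.name}:  Meow!'

class Fish(Animal):
    pass                                     # no speak() of its own: uses Animal.speak()


def chorus(animals):
    # works for any Animal; each instance supplies its own speak()
    return [animal.speak() for animal in animals]


for line in chorus([Dog('Rover'), Cat('Fluffy'), Fish('Nemo')]):
    print(line)
# Rover:  Bark! Bark!
# Fluffy:  Meow!
# Nemo:  ...
```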

Study Glossary for OOP, Part II

These are OOP terms we introduced in this session.

class variable	See "class attribute".
class attribute	A value stored in the class, often defined as a variable within the class block. Class attributes are also accessible through the class's instances.
attribute lookup	Any time an attribute of an instance or a class is accessed, Python must go through a process of trying to find it, as the attribute may be found in the instance, in the class, or in any classes from which its class inherits.
class method	A method that is designed to work with a class' attributes, rather than an instance's attributes. Contrast with instance method.
static method	A method that is not designed to work with either class or instance attributes, but belongs with the class because its functionality is related to what the class does.
instance method	A method that is designed to work with an instance's attributes. Also known as an object method.
object method	See instance method.
inheritance	The mechanism by which a "child" class, and its instances, are able to access attributes of a "parent" class.
polymorphism	The technique of creating two methods in two different classes that do different but conceptually similar things, and giving them the same name. Polymorphic methods can be called on an instance of either class, with the expectation that the functionality appropriate to that instance's type will result.
