Python 3




Introduction; Installations and Setup

class goals


about python

Python's popularity is due to its elegance and simplicity.



the zen of python

Do other languages have a manifesto like this one?


The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


about me: David Blaikie

I am dedicated to student success.



about you: welcome!

Prior exposure to Python is helpful, but not required.


You do not have to know anything about Python or programming, but some personal qualities will be very helpful. These are "soft skills" that will benefit you greatly as you proceed:


three technical requirements to write and run programs

If you already have an editor and Python installed, you do not need to add these.



Please keep in mind that if you are already able to write and run Python programs, you only need to add the class files.


the course materials

The zip file contains all files needed for our course exercises.


  1. Please look for the file called python_data.zip in your course files.
  2. Unzip the folder so that it has the following structure:

     python_data/
     ├── session_00_test_project/
     ├── session_01_objects_types/
     ├──── inclass_exercises/
     │       ├── inclass_1.1.py
     │       ├── inclass_1.2.py
     │       ├── ..etc..
     │       ├── inclass_1.6_lab.py
     │       ├── inclass_1.7_lab.py
     │       ├── ..etc..
     ├──── notebooks_inclass_warmup/
     ├── session_02_funcs_condits_methods/
     ├──── inclass_exercises/
     │       ├── inclass_2.1.py
     │       ├── inclass_2.2.py
     │       ├── ..etc..
     ├── session_03_strings_lists_files/
     ├── session_04_containers_lists_sets/
     ├── ..etc..
     └── session_10_classes/

  3. Place this folder in a location where you can find it.
  4. Use PyCharm to open the folder (a project).
  5. Link the project to your version of Python.




Your New Partner: the Python Interpreter

what do computers do?

Computers can do many different things for us.


Think about what our computers do for us:


what do computers really do?

At base, computers really only do three things.



Python can do many things, but we will focus on the first item -- working with data. The main purpose of any programming language is to allow us to store data in memory and then process that data according to our needs.


programming languages

A programming language like Python is designed to allow us to give instructions to our computer.



the Python Interpreter

The Interpreter is Python Itself.



evaluate - compile - run

When we run a Python program, the Interpreter takes these three steps.



what the interpreter can do

Python is very smart in some ways.



what the interpreter can't do

Python is not smart in some ways, too!



how to respond to exceptions (errors)

We should seek to understand what the Interpreter is telling us.




This learning is not just about making programs work -- it's about understanding the interpreter -- what it can and can't do.




Executing Programs and Using the Lab Exercises

creating a new script (.py file) in PyCharm

A folder in PyCharm is known as a 'project'.


Open a Folder, which will correspond to a new workspace.


Add a new file.

Create a 'hello, world!' script.
print('hello, world!')
print()
Take care when reproducing the above script - every character must be in its place. (The print() at end is to clarify the Terminal output.) Next, we'll execute the script.


executing a script

PyCharm may be able to run your script, or some configuration may be required.


Attempt to run your script.


when programs run without error

'Without error' means Python did everything you asked.


On my Mac, I see this output:

hello, world!

Process finished with exit code 0

When you see the terminal prompt repeated, it means that the script has completed executing.


when exceptions occur

An 'exception' is when Python cannot, or will not, do everything you asked in your program.


To demonstrate an exception, I removed one character from my code. Here is the result:

   File "/Users/david/test_project/test.py", line 2
     print('hello, world!)
          ^
SyntaxError: unterminated string literal (detected at line 2)

How should we read our exception?


Throughout this course I will repeatedly stress that you must identify the exception type, pinpoint the error to the line, and seek to understand the error in terms of the exception type, and where Python says it occurred.


the SyntaxError exception

Some element of the code is misplaced or missing.


print('hello, world!)
print()

File "/Users/david/test_project/test.py", line 2
  print('hello, world!)
        ^
SyntaxError: unterminated string literal (detected at line 2)

How do we respond to a SyntaxError? First by understanding that there's something missing or out of place in the syntax (the proper placement of language elements -- brackets, braces, parentheses, quotes, etc.) We look at the syntax on the line, and compare it to similar examples in other code that we've seen. Careful comparison between our code and working code will usually show us what's missing or misplaced. In the example above, the first print() statement is missing a quotation mark. It might be hard to see at first, but eventually you will develop "eyes" for this kind of error.
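
The two facts we read from an exception (its type and its line) can also be pulled out by Python itself. The following is only a sketch: it uses compile() and try/except, neither of which has been covered yet, and the source string is a made-up stand-in for the broken test.py script, with a comment on line 1 so the broken statement falls on line 2.

```python
# A stand-in for the broken script: line 2 is missing its closing quote.
source = "# test.py\nprint('hello, world!)\nprint()\n"

try:
    compile(source, 'test.py', 'exec')     # ask Python to check the syntax only
except SyntaxError as err:
    error_type = type(err).__name__        # the exception type
    error_line = err.lineno                # the line where Python detected it
    print(error_type)                      # SyntaxError
    print(error_line)                      # 2
```

The exact wording of the message varies between Python versions, so the type and the line number are the reliable things to read first.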


writing code: comments and blank lines

Use hash marks to comment individual lines; blank lines are ignored.


# this program adds numbers
var1 = 5
var2 = 2

var3 = var1 + var2        # add these numbers together

# these lines modify the value further
# var3 = var3 * 2
# var3 = var3 / 20

print(var3)


Using the Lab Exercises

We will use some exercises for demos in class; you will use them to practice your skills and prepare for tests.


python_data/
├── session_00_test_project/
├── session_01_objects_types/
├──── inclass_exercises/
│       ├── inclass_1.1.py
│       ├── inclass_1.2.py
│       ├── ..etc..
│       ├── inclass_1.6_lab.py
│       ├── inclass_1.7_lab.py
│       ├── ..etc..
├──── notebooks_inclass_warmup/
├── session_02_funcs_condits_methods/
├──── inclass_exercises/
│       ├── inclass_2.1.py
│       ├── inclass_2.2.py
│       ├── ..etc..
├── session_03_strings_lists_files/
├── session_04_containers_lists_sets/
├── ..etc..
└── session_10_classes/

The exercises come in two forms:




Creating and Identifying Objects by Type

the variable

A variable is a name assigned ("bound") to an object.


xx = 10               # assign 10 to xx
yy = 2

zz = xx * yy          # compute 10 * 2 and assign integer 20 to variable zz

print(zz)             # print 20 to screen

xx is a variable bound to 10
= is an assignment operator assigning 10 to xx
yy is another variable bound to 2
* is a multiplication operator computing its operands (10 and 2)
zz is bound to the product, 20
print() is a function that renders its argument to the screen


the literal: a value typed into our code

Early on we need to distinguish between a variable and a literal.


xx = 10               # assign 10 to xx
yy = 2

zz = xx * yy          # compute 10 * 2 and assign integer 20 to variable zz

print(zz)             # print 20 to screen
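
A short side-by-side may help here. This is a sketch with illustrative names (product_from_literals and product_from_variables are not part of the class files): the same computation written once with literals and once with variables, plus the difference that quotes make.

```python
# Literals: the values are typed directly into the code.
product_from_literals = 10 * 2

# Variables: names bound to values that were assigned earlier.
xx = 10
yy = 2
product_from_variables = xx * yy

print(product_from_literals)      # 20
print(product_from_variables)     # 20

# Quotes make a string literal, not a variable name.
print('xx')                       # xx
print(xx)                         # 10
```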


the object

An object is a data value of a particular type.


Every data value in Python is an object.


var_int = 100                  # assign integer object 100 to variable var_int

var2_float = 100.0             # assign float object 100.0 to variable var2_float

var3_str = 'hello!'            # assign str object 'hello!' to variable var3_str

# NOTE:  'hash mark' comments are ignored by Python.

At every point you must be aware of the type and value of every object in your code.


object types for this session

The three object types we'll look at in this unit are int, float and str. They are the "atoms" of Python's data model.


data type   known as   description                        example value
int         integer    a whole number                     5
float       float      a floating-point number            5.03
str         string     a character sequence, i.e. text    'hello, world!'


sidebar: string literal syntax

The string has 3 ways to enquote -- all produce a string.


s1 = 'hello, quote'
s2 = "hello, quote"

# multi-line strings can be expressed with triple-quotes
s3 = """hello, quote
Sincerely, Python"""


s4 = 'He said "yes!"'               # using single quotes to enquote double quotes
s5 = "Don't worry about that."      # using double quotes to enquote a single quote
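
To confirm that the quoting style does not change the resulting object, here is a small check. It is a sketch only; the == comparison it uses is covered later in the course, and the variable multi is just an illustrative name.

```python
s1 = 'hello, quote'
s2 = "hello, quote"
s3 = """hello, quote"""

print(s1 == s2)        # True
print(s2 == s3)        # True
print(type(s3))        # <class 'str'>

# A triple-quoted string may span lines; the line break is part of the string.
multi = """line one
line two"""
print(len(multi))      # 17  (8 characters + 1 line break + 8 characters)
```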


identifying type through syntax

The way a value is written in the code determines its type.


It's vital that we always be aware of type.


myint = 5             # written as a whole number:  int
myfloat = 5.0         # written with a decimal point:  float
mystr = '5.0'         # written with quotes:  str

Other languages (like Java and C) use explicit type declarations to indicate type, for example int a = 5. But Python does not do this.


can we identify type through printing?

Printing is usually not enough to determine type, since a string can look like any object.


myint = 5
myfloat = 5.0
mystr = '5.0'

print(myint)         # 5
print(myfloat)       # 5.0
print(mystr)         # 5.0

mystr looks like a float, but it is a str.
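
One way to see the difference when printing is the built-in repr() function, which is not covered in these slides; take this as an optional sketch. repr() returns the value as it would be written in code, so a str shows its quotes.

```python
myfloat = 5.0
mystr = '5.0'

print(myfloat)          # 5.0
print(mystr)            # 5.0    (looks identical)

print(repr(myfloat))    # 5.0
print(repr(mystr))      # '5.0'  (the quotes reveal a str)
```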


identifying type through the type() function

If we're not sure, we can always have Python tell us an object's type.


myint = 5
myfloat = 5.0
mystr = '5.0'

print(type(myint))         # <class 'int'>
print(type(myfloat))       # <class 'float'>
print(type(mystr))         # <class 'str'>

python is strongly typed

This means that what an object can do is defined by its type.


a = 5            # int, 5
b = 10.0         # float, 10.0
c = '10.0'       # str, '10.0'

x = a + b        # 15.0           (adding int to float)

y = a + c        # TypeError      (cannot add int to str!)

Even though the value '10.0' looks like a number, it is of type str. Python will not add an int to a str.
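
The usual way out of this TypeError is to convert one operand so both are of compatible types. A minimal sketch, using the float() and str() conversion functions that are introduced later in this session:

```python
a = 5            # int
c = '10.0'       # str

y = a + float(c)     # convert the str to a float first, then add
z = str(a) + c       # or convert the int to a str, then concatenate

print(y)             # 15.0
print(z)             # 510.0
```

Which conversion is right depends on whether you want a number or text as the result.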


variable names

You must follow correct style even though Python does not always require it.


name = 'Joe'
age = 29

my_wordy_variable = 100

student3 = 'jg61'




Math and String Operators

+, -, *, /: math operators

Math operators behave as you might expect.


var_int = 5
var2_float = 10.3

var3_float = var_int + var2_float    # int plus a float:  15.3, a float

var4_float = var3_float - 0.3    # float minus a float:  15.0, a float

var5_float = var4_float / 3      # float divided by an int:  5.0, a float

identifying type through an operation

Every operation or function call results in a predictable type.


With two integers, the result is an integer (except for division, which always produces a float). If a float is involved, the result is always a float.

vari = 7
vari2 = 3
varf = 3.0

var3 = vari * vari2      # 21, an int

var4 = vari + varf       # 10.0, a float

When an integer is divided by another integer, the result is always a float.

var = 7
var2 = 3

var3 = var / var2      # 2.3333..., a float
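
These rules can be checked directly with type(). The variable names below are illustrative only.

```python
int_times_int = 7 * 3         # 21
int_plus_float = 7 + 3.0      # 10.0
int_div_int = 6 / 3           # 2.0

print(type(int_times_int))    # <class 'int'>
print(type(int_plus_float))   # <class 'float'>
print(type(int_div_int))      # <class 'float'>
print(int_div_int)            # 2.0  (a float, even though 6 divides evenly by 3)
```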


** exponentiation operator

The exponentiation operator (**) raises its left operand to the power of its right operand and returns the result as a float or int.


var = 11 ** 2     # "eleven raised to the 2nd power (squared)"
print(var)        # 121

var = 3 ** 4
print(var)        # 81

% Modulus Operator

The modulus operator (%) shows the remainder that would result from division of two numbers.


var = 11 % 2      # "eleven modulo two"
print(var)        # 1   (11/2 has a remainder of 1)


var2 = 10 % 2     # "ten modulo two"
print(var2)       # 0   (10/2 divides evenly:  remainder of 0)
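
Two everyday uses of the remainder, as a sketch with made-up values:

```python
number = 1234
last_digit = number % 10      # remainder after dividing by 10
print(last_digit)             # 4

minutes = 135
leftover = minutes % 60       # minutes left over after removing whole hours
print(leftover)               # 15
```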


+ operator with strings: concatenation

The plus operator (+) with two strings returns a concatenated string.


aa = 'Hello, '
bb = 'World!'

cc = aa + bb     # 'Hello, World!'

Note that this is the same operator (+) that is used with numbers for summing. Python uses the type of the operands (values on either side of the operator) to determine behavior and result.


* operator with one string and one integer: string repetition

The "string repetition operator" (*) creates a new string with the operand string repeated the number of times indicated by the other operand:

aa = '!'
bb = 5

cc = aa * bb       # '!!!!!'

Note that this is the same operator (*) that is used with numbers for multiplication. Python uses the type of the operands to determine behavior and result.
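
Repetition and concatenation can be combined in one expression. In this sketch (the names title, rule and banner are illustrative), * is applied before +, just as with numbers.

```python
title = 'Report'

rule = '-' * 6                                     # '------'
banner = '=' * 3 + ' ' + title + ' ' + '=' * 3     # repetition happens before concatenation

print(rule)       # ------
print(banner)     # === Report ===
```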


python's "overloaded" operator +

Object types determine behavior.


int or float "added" to int or float: addition

tt = 5            # assign an integer value to tt
zz = 10.0         # assign a float value to zz

qq = tt + zz      # compute 5 plus 10 and assign float 15.0 to qq

str "added" to str: concatenation

kk = '5'          # assign a str value (quotes mean str) to kk
rr = '10.0'       # assign a str value to rr

mm = kk + rr      # concatenate '5' and '10.0'
                  # to construct a new str object, assign to mm

print(mm)         # '510.0'


python's "overloaded" operator *

Again, object types determine behavior.


int or float "multiplied" by int or float: multiplication

tt = 5            # assign an integer value to tt
zz = 10           # assign an integer value to zz

qq = tt * zz      # compute 5 times 10 and assign integer 50 to qq
print(qq)         # 50, an int

str "multiplied" by int: string repetition

aa = '5'
bb = 3

cc = aa * bb      # '555'




Built-In Functions

built-in functions

Built-in functions activate functionality when they are called.


aa = 'hello'        # str, 'hello'

bb = len(aa)        # pass string object aa as an argument to function len(),
                    # which returns an integer object as a return value.

print(bb)            # int, 5


len() function

The len() function takes a string argument and returns an integer -- the length of (number of characters in) the string.


varx = 'hello, world!'

vary = len(varx)        # 13

round() function

The round() function takes a float argument and returns another float, rounded to the specified decimal place. With no second argument, it rounds to the nearest whole number and returns an int.


aa = 5.9583

bb = round(aa, 2)   # 5.96

cc = round(aa)      # 6
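
The type of the result depends on whether the second argument is given, which type() can confirm. The names one_place and whole are illustrative.

```python
aa = 5.9583

one_place = round(aa, 1)      # second argument given: result is a float
whole = round(aa)             # no second argument: result is an int

print(one_place)              # 6.0
print(type(one_place))        # <class 'float'>
print(whole)                  # 6
print(type(whole))            # <class 'int'>
```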


float precision and the round() function

Some floating-point operations will result in a number with a small remainder:

x = 0.1 + 0.2
print(x)          # 0.30000000000000004  (should be 0.3?)

y = 0.1 + 0.1 + 0.1 - 0.3
print(y)               # 5.551115123125783e-17  (should be 0.0?)

The solution is to round the result to the precision you need:

x = 0.1 + 0.2     # 0.30000000000000004

z = round(x, 1)
print(z)          # 0.3
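
The same issue shows up when comparing floats. This sketch uses the == comparison, which is covered in a later section; rounding both sides to the same precision is one simple approach, not the only one.

```python
total = 0.1 + 0.2

exact_match = (total == 0.3)                          # compares the raw floats
rounded_match = (round(total, 2) == round(0.3, 2))    # compares after rounding

print(exact_match)        # False
print(rounded_match)      # True
```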

input() function

This function allows us to enter data into the program through the keyboard.


cc = input('enter name:  ')    # program pauses!  Now the user types something

print(cc)                      # [a string, whatever the user typed]


exit() function: terminate the program

The exit() function terminates execution immediately. An optional string argument can be passed as an error message.


aa = input('to quit, press "q" ')
if aa == 'q':
    exit(0)                           # 0 indicates a successful termination (no error)

if aa == '':                          # if user typed nothing and hit [Return]

    exit('error:  input required')    # string argument passed to exit()
                                      # indicates an error led to termination

Note: the above examples make use of if, which we will cover in a later lesson.


exit() to manipulate execution during development

This function can be used as a temporary stop to the program if we'd like to isolate some statements.


We can also use exit() to simply stop program execution in order to debug:

aa = '55'
bb = float(aa)
print('type of bb is:')
print(type(bb))
exit()                  # we inserted this to stop the code
                        # from continuing; we'll remove it later

cc = bb * 2             # because of exit() above, this code
                        # will not be reached

int() "conversion" function

This function converts an appropriate value to the int type.


# str -> int
aa = '55'
bb = int(aa)         # 55 (an int)
print(type(bb))      # <class 'int'>

# float -> int
var = 5.95
var2 = int(var)      # 5:  the rest is lopped off (not rounded)


The conversion functions are named after their types -- they take an appropriate value as argument and return an object of that type.
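
"Appropriate" matters: int() accepts a string only if it is written as a whole number, so a string like '5.95' has to pass through float() first. A minimal sketch:

```python
text = '5.95'

as_float = float(text)       # str -> float
as_int = int(as_float)       # float -> int (the fraction is dropped, not rounded)

print(as_float)              # 5.95
print(as_int)                # 5

# int(text) would raise a ValueError here,
# because '5.95' is not written as a whole number.
```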


float() "conversion" function

This function converts an appropriate value to the float type.


# int -> float
xx = 5
yy = float(xx)      # 5.0

# str -> float
var = '5.95'
var2 = float(var)   # 5.95 (a float)



str() "conversion" function

This function converts any value to the str type.


var = 5
var2 = 5.5

svar = str(var)     # '5'
svar2 = str(var2)   # '5.5'

print(len(svar))    # 1
print(len(svar2))   # 3

conversion challenge: treating a string like a number

Because Python is strongly typed, conversions can be necessary.


Numeric data sometimes arrives as strings (e.g. from input() or a file). Use int() or float() to convert to numeric types.


aa = input('enter number and I will double it:  ')

print(type(aa))         # <class 'str'>

num_aa = int(aa)        # int() takes the user's input as an argument
                        # and returns an integer

print(num_aa * 2)       # prints the user's number doubled

You can use int() and float() to convert strings to numbers.
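
Conversions often run in both directions within one program: strings to numbers for the math, then a number back to a string for the output. In this sketch the two string values are hypothetical and stand in for what input() would return.

```python
price_text = '19.99'         # stands in for input('enter price:  ')
quantity_text = '3'          # stands in for input('enter quantity:  ')

total = float(price_text) * int(quantity_text)      # convert, then compute

message = 'total: ' + str(round(total, 2))          # convert back to str to concatenate
print(message)                                      # total: 59.97
```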


avoid improvising syntax!

It's important for early coders to follow existing syntax and not make up their own.


Imagine that you would like to find the length of a string. What do you do? Some students begin writing code off the top of their head, even though they are not completely familiar with the right syntax.


They may write something like this...

var = 'hello'

mylen = var.len()      # or mylen = length('var')
                       # or mylen = lenth(var)

...and then run it, only to get a strange error that's difficult to diagnose.
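
For reference, here is what two of those improvised attempts actually raise. This is a sketch only: it uses try/except, which has not been covered yet, purely so that both errors can be shown in one run.

```python
var = 'hello'

try:
    mylen = var.len()                    # str objects have no .len() method
except AttributeError as err:
    first_error = type(err).__name__

try:
    mylen = length(var)                  # there is no function named length()
except NameError as err:
    second_error = type(err).__name__

print(first_error)                       # AttributeError
print(second_error)                      # NameError
```

Neither exception type mentions len(), which is why these errors are hard to diagnose without a working example to compare against.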


using existing examples of a feature to write new code using it

When you want to use a Python feature, you must follow an existing example -- you must not improvise!


Let's say you have a string and you'd like to get its length:

s = "this is a string I'd like to measure"     # determine length (36)

You look up the function in a reference, like pythonreference.com:

mylen = len('hello')

Then you use the feature syntax very carefully:

slen = len(s)         # int, 36

However, your code will be slightly different from the example code:


review: distinguish between variables and string literals

Early on we need to distinguish between a variable and a literal.


xx = 10               # assign 10 to xx
yy = 2

zz = xx * yy          # compute 10 * 2 and assign integer 20 to variable zz

print(zz)             # print 20 to screen


taking care not to confuse a string literal and a variable name

Here's a common error that beginners make - try to avoid it!


Going back to our previous example - you'd like to use len() to measure this string:

s = "this is a string I'd like to measure"     # determine length (36)

You look up the function in a reference, like pythonreference.com:

mylen = len('hello')

You have been told to make your syntax match the example's. But should you do this?

slen = len('s')            # int, 1

You were expecting a length of 36, but you got a length of 1. Can you see why? The variable s points to a long string. The literal 's' is just a one-character string. In trying to match the example code, you may have thought you needed the quotes as well.

The takeaway is this: anyplace a literal is used, a variable can be used instead; and anyplace a variable is used, a literal can be used instead.




Conditionals and Blocks; Object Methods

conditionals: if/elif/else and while

All programs must make decisions during execution.


Consider these decisions by programs you know:


Conditional statements allow any program to make the decisions it needs to do its work.


'if' statement

The if statement executes code in its block only if the test is True.


aa = input('please enter a positive integer: ')
int_aa = int(aa)

if int_aa < 0:                          # test:  is this a True statement?

    print('error:  input invalid')      # block (2 lines) -- lines are
    exit()                              # executed only if test is True

d_int_aa = int_aa * 2                   # double the value
print('your value doubled is ' + str(d_int_aa))

The two components of an if statement are the test and the block. The test determines whether the block will be executed.


'else' statement

An else statement will execute its block if the if test before it was not True.


xx = input('enter an even or odd number:  ')
yy = int(xx)

if yy % 2 == 0:                    # can 2 divide into yy evenly?
    print(xx + ' is even')
    print('congratulations.')

else:
    print(xx + ' is odd')
    print('you are odd too.')

Therefore we can say that only one block of an if/else statement will execute.


'elif' statement

elif is also used with if (and optionally else): you can chain additional conditions for other behavior.


zz = input('type an integer and I will tell you its sign:  ')
zyz = int(zz)

if zyz > 0:
    print('that number is positive')

elif zyz < 0:
    print('that number is negative')

else:
    print('0 is neutral')
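
With several elif tests, Python checks them from top to bottom and runs only the first block whose test is True. A sketch with a hypothetical score (the value and the grade cutoffs are made up for illustration):

```python
score = 72          # hypothetical value, standing in for converted user input

if score >= 90:
    grade = 'A'
elif score >= 70:   # first True test: this block runs
    grade = 'B'
elif score >= 50:   # also True for 72, but never reached
    grade = 'C'
else:
    grade = 'F'

print(grade)        # B
```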


the python code block

A code block is marked by indented lines. The end of the block is marked by a line that returns to the prior indent.


xx = input('enter an even or odd number:  ')  # not in any block
yy = int(xx)                                      # ditto


if yy % 2 == 0:                         # the start of the 'if' block
    print('your number is even')
    print('even is cool')               # last line of the 'if' block


else:                                   # the start of the 'else' block
    print('your number is odd')
    print('you are cool')               # last line of the 'else' block


print('thanks for playing "even/odd number"')      # not in any block

Note also that a block is preceded by an unindented line that ends in a colon.


nested blocks increase indent

Blocks can be nested within one another. A nested block (a "block within a block") simply moves the code block further to the right.


var_a = int(input('enter a number: '))
var_b = int(input('enter another number:  '))

if var_b >= var_a:                         # compare int values for truth
    print("the test was true")
    print("var b is at least as large")

    if var_a == var_b:                     # if the two values are equivalent
        print('the two values are equivalent')

    print("now we're in the outer block but not in the inner block")

print('this gets printed in any case (i.e., not part of either block)')

Complex decision trees using 'if' and 'else' are the basis for most programs.


comparison operators with numbers

>, <, <=, >= tests with numbers work as you might expect.


var = 5
var2 = 3.3

if var >= var2:
    print('var is greater or equal')

if var == var2:
    print('they are equivalent')

'==' with strings

With strings, this operator tests to see if two strings are identical.


var = 'hello'
var2 = 'hello'

if var == var2:
    print('these are equivalent strings')


the 'in' operator with strings

'in' with strings allows you to see if a 'substring' appears within a string.


article = 'The market rallied, buoyed by a rise in Samsung Electronics.  The other...'

if 'Samsung' in article:
    print('Samsung was found')
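
Keep in mind that the match is case-sensitive. One common workaround, sketched here with the .lower() string method from a later section, is to lowercase the text before testing.

```python
article = 'The market rallied, buoyed by a rise in Samsung Electronics.'

exact = 'Samsung' in article                 # same capitalization
wrong_case = 'samsung' in article            # different capitalization
normalized = 'samsung' in article.lower()    # compare in lowercase

print(exact)          # True
print(wrong_case)     # False
print(normalized)     # True
```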


'and' "compound" test

Python uses the operator and to combine tests: both must be True.


In an 'and' compound statement, if both tests are True, the entire statement is True.


xx = input('what is your ID?  ')
yy = input('what is your pin?  ')

if xx == 'dbb212' and yy == '3859':
    print('you are a validated user')
else:
    print('you are not validated')

Note the lack of parentheses around the tests -- if the syntax is unambiguous, Python will understand. We can use parentheses to clarify compound statements like these, but they often aren't necessary. You should avoid parentheses wherever you can.


'or' "compound" test

Python uses the operator or to combine tests: either can be True.


In an 'or' compound statement, if either test is True, the entire statement is True.


aa = input('please enter "q" or "quit" to quit: ')
if aa == 'q' or aa == 'quit':
    exit()
print('continuing...')

Note the lack of parentheses around the tests -- if the syntax is unambiguous, Python will understand. We can use parentheses to clarify compound statements like these, but they often aren't necessary. You should avoid parentheses wherever you can.


testing a variable against two values

Both sides of an 'or' or 'and' must be complete tests.


if aa == 'q' or aa == 'quit':          # not "if aa == 'q' or 'quit'"
    exit()

Note the 'or' test above -- we would not say if aa == 'q' or 'quit'; this would always succeed (for reasons discussed later).
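
As a preview of those reasons: in the incomplete form, the right side of 'or' is just the string 'quit', and any non-empty string counts as True in a test. This sketch uses the built-in bool() function to show how each form is judged; the names wrong and right are illustrative.

```python
aa = 'no'                                   # the user did NOT ask to quit

wrong = bool(aa == 'q' or 'quit')           # right side is only the string 'quit'
right = (aa == 'q' or aa == 'quit')         # both sides are complete tests

print(wrong)      # True   (the test "succeeds" no matter what aa is)
print(right)      # False
```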


testing a variable against multiple values

We can also test a variable against multiple values by using in with a list (more on lists next week):

if aa in ['q', 'quit']:
    exit()

negating an 'if' test with 'not'

You can negate a test with the not keyword.


var_a = 5
var_b = 10

if not var_a > var_b:
    print("var_a is not larger than var_b (well - it isn't).")

Of course this particular test can also be expressed by replacing the comparison operator > with <=, but when we learn about new True/False condition types we'll see how this operator can come in handy.


boolean (bool) values True and False

True and False are boolean values (type bool), and are produced by expressions that can be seen as True or False.


aa = 3
bb = 5

if aa > bb:
    print("that is true")

Tests are actually expressions that resolve to True or False, which are values of boolean type:

var = 5
var2 = 10
xx = (var2 > var)    # is 10 greater than 5?
print(xx)            # True
print(type(xx))      # <class 'bool'>

Note that we would almost never assign comparisons like these to variables, but we are doing so here to illustrate that they resolve to boolean values.




The 'while' Statement and Looping

the concept of incrementing

We increment an integer by reassigning its variable to the current value plus 1.


x = 0         # int, 0

x = x + 1     # int, 1
x = x + 1     # int, 2
x = x + 1     # int, 3

print(x)      # 3

For each of the three incrementing statements above, a new value that equals the value of x plus 1 is created, and then assigned back to x. The previous value of x is replaced with the new, incremented value. Incrementing is most often used for counting within loops -- see next.
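
Python also has a shorthand for this pattern, the += operator. It is not used in the slides above; it is shown here only so you will recognize it in other people's code.

```python
x = 0

x += 1        # same effect as x = x + 1
x += 1
x += 1

print(x)      # 3
```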


while loops

A while test causes Python to loop through a block repetitively, as long as the test is True.


This program prints each number from 0 through 4:

cc = 0                 # initialize a counter

while cc < 5:          # "if test is True, enter the block"
    print(cc)
    cc = cc + 1        # "increment" cc:  add 1 to its current value
                       # WHEN WE REACH THE END OF THE BLOCK,
                       # JUMP BACK TO THE while TEST

print('done')

The block is executing the print and cc = cc + 1 lines multiple times - again and again until the test becomes False. Of course, the value being tested must change as the loop progresses - otherwise the loop will cycle indefinitely (infinite loop).
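
The tested value does not have to be a simple counter. In this sketch (value and steps are illustrative names), the loop keeps doubling a number until the test becomes False, and a separate counter records how many passes that took.

```python
value = 1
steps = 0

while value < 1000:         # loop as long as value is still below 1000
    value = value * 2       # the tested value changes on every pass
    steps = steps + 1       # count the passes

print(value)                # 1024
print(steps)                # 10
```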


understanding while loops

while loops have 3 components: the test, the block, and the automatic return.


cc = 10

while cc > 0:         # the test (if True, enter the block)

    print(cc)         # the block (execute as regular Python statements)
    cc = cc - 1

    # the automatic return [invisible!]
    # (at end of block, go back to the test)

print('done')


loop control: "break"

break is used to exit a loop regardless of the test condition.


xx = 0
while xx < 10:
    answer = input("do you want loop to break? ")
    if answer == 'y':
        break             # drop down below the block
    print('Hello, User')
    xx = xx + 1
    print('I have now greeted you ' + str(xx) + ' times')

print("ok, I'm done")

loop control: "continue"

The continue statement jumps program flow to the next loop iteration.


x = 0
while x < 10:
    x = x + 1
    if x % 2 != 0:             # will be True if x is odd
        continue               # jump back up to the test and test again
    print(x)

Note that print(x) will not be executed if the continue statement comes first. Can you figure out what this program prints?


the "while True" loop

while with True and break provide us with a handy way to keep looping until we feel like stopping.


while True:
    var = input('please enter a positive integer:  ')
    if int(var) > 0:
        break
    else:
        print('sorry, try again')

print('thanks for the integer!')

Note the use of True in a while expression: since True is always True, this test will always be True, and cause program flow to enter (and re-enter) the block. Therefore the break statement is essential to keep this loop from looping indefinitely.


debugging loops: the "fog of code"

Use print() statements to give visibility to your code execution.


The output of the code should be the sum of all numbers from 0-10, or 55:

revcounter = 0
while revcounter < 10:

    varsum = 0
    revcounter = revcounter + 1
    varsum = varsum + revcounter

    print("loop iteration complete")
    print("revcounter value: ", revcounter)
    print("varsum value: ", varsum)
    input('pausing...')
    print()
    print()

print(varsum)                        # 10

I've added quite a few statements, but if you run this example you will be able to get a hint as to what is happening:

loop iteration complete
revcounter value:  1
varsum value:  1
pausing...                          # here I hit [Return] to continue


loop iteration complete
revcounter value:  2
varsum value:  2
pausing...                          # [Return]

So the solution is to initialize varsum before the loop and not inside of it:

revcounter = 0
varsum = 0
while revcounter < 10:

    revcounter = revcounter + 1
    varsum = varsum + revcounter

print(varsum)

This outcome makes more sense. We might want to check the total to be sure, but it looks right. The hardest part of learning how to code is in designing a solution. This is also the hardest part to teach! But the last thing you want to do in response is to guess repeatedly. Instead, please examine the outcome of your code through print statements, see what's happening in each step, then compare this to what you think should be happening. Eventually you'll start to see what you need to do. Step-by-baby-step!
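
To check the total, one option is to compare it against an independent calculation. This sketch uses the built-in sum() and range() functions, which have not been covered yet; the name expected is illustrative.

```python
revcounter = 0
varsum = 0
while revcounter < 10:
    revcounter = revcounter + 1
    varsum = varsum + revcounter

expected = sum(range(1, 11))     # 1 + 2 + ... + 10, computed a different way

print(varsum)                    # 55
print(varsum == expected)        # True
```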




Object Methods

object methods

Objects are capable of behaviors, which are expressed as methods.


Use object methods to process object values

var = 'Hello, World!'
var2 = var.replace('World', 'Mars')      # replace substring, return a str
print(var2)                              # Hello, Mars!

Methods are type-specific functions that are used only with a particular type.
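To see what "type-specific" means in practice, we can ask whether an object has a given method. The sketch below uses the built-in hasattr() function; the variable names are made up for this example.

```python
text = 'hello'
number = 5

str_has_upper = hasattr(text, 'upper')      # a str object has an upper() method
int_has_upper = hasattr(number, 'upper')    # an int object does not

print(str_has_upper)     # True
print(int_has_upper)     # False
print(text.upper())      # HELLO
```

Calling number.upper() would raise an AttributeError, since upper() belongs to str and not to int.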


methods vs. functions

Compare method syntax to function syntax.


mystr = 'HELLO'

x = len(mystr)          # int, 5

y = mystr.count('L')    # int, 2

print(y)                # 2

Methods and functions are both called (using the parentheses after the name of the function or method). Both also may take an argument and/or may return a return value.


string method: .upper()

This "transforming" method returns a new string with a string's value uppercased.


upper() string method


var = 'hello'
newvar = var.upper()

print(newvar)                   # 'HELLO'

string method: .lower()

This "transforming" method returns a new string with a string's value lowercased.


lower() string method


var = 'Hello There'
newvar = var.lower()

print(newvar)                   # 'hello there'

string method: .replace()

This "transforming" method returns a new string based on an old string, with specified text replaced.


var = 'My name is Joe'

newvar = var.replace('Joe', 'Greta')    # str, 'My name is Greta'

print(newvar)                            # My name is Greta

This method takes two arguments, the search string and replace string.


string method: .isdigit()

This "inspector" method returns True if a string is all digits.


mystring = '12345'
if mystring.isdigit():
    print("that string is all numeric characters")

if not mystring.isdigit():
    print("that string is not all numeric characters")

Since they return True or False, inspector methods like isdigit() are typically used in an if or while expression. To test the reverse (i.e. "not all digits"), use if not before the method call.
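A few sample values show where isdigit() returns True and where it does not. The variable names a through d are used only for this sketch.

```python
a = '12345'.isdigit()     # True:  every character is a digit
b = '12a45'.isdigit()     # False: 'a' is not a digit
c = '-5'.isdigit()        # False: the minus sign is not a digit
d = ''.isdigit()          # False: an empty string has no digits at all

print(a, b, c, d)         # True False False False

if a:
    print(int('12345') + 1)   # 12346 (safe to convert, since the test passed)
```

Note that a negative number or a float written as a string will not pass this test, so isdigit() is best suited to checking for non-negative whole numbers.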


string method: .endswith()

This "inspector" method returns True if a string ends with a substring.


bb = 'This is a sentence.'
if bb.endswith('.'):
    print("that line had a period at the end")

string method: .startswith()

This "inspector" method returns True if the string starts with a substring.


cc = input('yes? ')
if cc.startswith('y') or cc.startswith('Y'):
    print('thanks!')
else:
    print("ok, I guess not.")
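The two startswith() calls above can be combined in a couple of ways. In this sketch a fixed string stands in for the value returned by input(), and the name reply is introduced only for illustration.

```python
answer = 'Yes please'                    # stands in for input('yes? ')

if answer.lower().startswith('y'):       # lowercase first, then test once
    reply = 'thanks!'
else:
    reply = 'ok, I guess not.'

print(reply)                             # thanks!

# startswith() also accepts a tuple of prefixes to try
print(answer.startswith(('y', 'Y')))     # True
```

Lowercasing first is a common choice because it also handles answers like 'YES' without listing every variation.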

string method: .count()

This "inspector" method returns a count of occurrences of a substring within a string.


aa = 'count the substring within this string'
bb = aa.count('in')
print(bb)             # 3 (the number of times 'in' appears in the string)

string method: .find()

This "inspector" method returns the character position of a substring within a string.


xx = 'find the name in this string'
yy = xx.find('name')
print(yy)             # 9 -- index 9, the 10th character in xx
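It helps to know what find() returns when the substring is not there at all: the value -1. A short sketch, with the names pos and missing chosen for this example:

```python
xx = 'find the name in this string'

pos = xx.find('name')          # 9
missing = xx.find('zebra')     # -1 means "not found"

if missing == -1:
    print("'zebra' was not found")

print(xx[pos:pos + 4])         # name
```

Because -1 is also a valid index (the last character), it is worth testing for -1 before using the result as a subscript.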

f'' strings for string formatting

An f'' string allows us to embed any value such as numbers into a new, completed string.


aa = 'Jose'
var = 34

# the 2 {} tokens are replaced by the values of aa and var
bb = f'{aa} is {var} years old.'

print(bb)                                  # Jose is 34 years old.


f'' string format codes

A format code, placed after a colon inside the {} token, controls the width, justification and numeric precision of the embedded value.


overview of formatting

# text padding and justification
# :<15     # left justify width
# :>10     # right justify width
# :^8      # center justify width

# numeric formatting
# :f       # as float (6 places)
# :.2f     # as float (2 places)
# :,       # 000's commas
# :,.2f    # 000's commas with float to 2 places

examples

x = 34563.999999

f'hi:  {x:<30}'      # 'hi:  34563.999999                  '

f'hi:  {x:>30}'      # 'hi:                    34563.999999'

f'hi:  {x:^30}'      # 'hi:           34563.999999         '

f'hi:  {x:f}'        # 'hi:  34563.999999'

f'hi:  {x:.2f}'      # 'hi:  34564.00'

f'hi:  {x:,}'        # 'hi:  34,563.999999'

f'hi:  {x:,.2f}'     # 'hi:  34,564.00'

Please note that f'' strings are available only as of Python 3.6.
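Width, justification and numeric codes can be combined in a single token, which is handy for lining up columns. The values and the names name, qty, price and row below are invented for this sketch.

```python
name = 'Widget'
qty = 3
price = 1234.5

# 10 wide left-justified, 5 wide right-justified,
# 12 wide right-justified with commas and 2 decimal places
row = f'{name:<10}{qty:>5}{price:>12,.2f}'

print(row)          # Widget        3    1,234.50
print(len(row))     # 27 (10 + 5 + 12)
```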


sidebar: method and function return values in an expression; combining expressions

The return value of an expression can be used in another expression.


letters = "aabbcdefgafbdchabacc"

vara = letters.count("a")         # 5

varb = len(letters)               # 20

varc = vara / varb                # 5 / 20, or 0.25

vard = varc * 100                 # 25.0


print(letters.count("a") / len(letters) * 100)  # 25.0 (statements combined)




Data Parsing & Extraction: String Methods

our first data format: csv

The CSV format will allow us to explore Python's text parsing tools.


comma-separated values file (CSV)

    19260701,0.09,0.22,0.30,0.009
    19260702,0.44,0.35,0.08,0.009
    19270103,0.97,0.21,0.24,0.010


CSV structure: "fields" and "records"

Tables consist of records (rows) and fields (column values).


Tabular text files are organized into rows and columns.


comma-separated values file (CSV)

    19260701,0.09,0.22,0.30,0.009
    19260702,0.44,0.35,0.08,0.009
    19270103,0.97,0.21,0.24,0.010
    19270104,0.30,0.15,0.73,0.010
    19280103,0.43,0.90,0.20,0.010
    19280104,0.14,0.47,0.01,0.010

space-separated values file

    19260701    0.09    0.22    0.30   0.009
    19260702    0.44    0.35    0.08   0.009
    19270103    0.97    0.21    0.24   0.010
    19270104    0.30    0.15    0.73   0.010
    19280103    0.43    0.90    0.20   0.010
    19280104    0.14    0.47    0.01   0.010


table data in text files

Text files are just sequences of characters. Commas and newline characters separate the data.


If we print a CSV text file, we may see this:

    19260701,0.09,0.22,0.30,0.009
    19260702,0.44,0.35,0.08,0.009
    19270103,0.97,0.21,0.24,0.010
    19270104,0.30,0.15,0.73,0.010
    19280103,0.43,0.90,0.20,0.010
    19280104,0.14,0.47,0.01,0.010

However, here's what a text file really looks like under the hood:

19260701,0.09,0.22,0.30,0.009\n19260702,0.44,0.35,0.08,
0.009\n19270103,0.97,0.21,0.24,0.010\n19270104,0.30,0.15,
0.73,0.010\n19280103,0.43,0.90,0.20,0.010\n19280104,0.14,
0.47,0.01,0.010


tabular data: looping, parsing and summarizing

Looping through file line strings, we can split and isolate fields on each line.


The process:

  1. Open the file for reading.
  2. Use a for loop to read each line of the file, one at a time. Each line will be represented as a string.
  3. Remove the newline from the end of each string with .rstrip().
  4. Divide (using .split()) the string into fields.
  5. Read a value from one of the fields, representing the data we want.
  6. As the loop progresses, build a sum of values from each line.

We will begin by reviewing each feature necessary to complete this work, and then we will begin to put it all together.
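As a preview, the steps of the process can be sketched in a few lines. This is only an illustration: io.StringIO stands in for an opened file so that the example runs without any data file, and the three sample lines and the choice of the second field are assumptions.

```python
import io

# io.StringIO behaves like an opened file (step 1) for the purposes of this sketch
fh = io.StringIO('19260701,0.09,0.22\n19260702,0.44,0.35\n19270103,0.97,0.21\n')

total = 0.0
for line in fh:                  # step 2: one line (a str) at a time
    line = line.rstrip()         # step 3: remove the newline
    fields = line.split(',')     # step 4: list of str
    value = float(fields[1])     # step 5: second field, converted to float
    total = total + value        # step 6: running sum

print(round(total, 2))           # 1.5  (0.09 + 0.44 + 0.97)
```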


string method: .rstrip()

This method can remove any character from the right side of a string.


When no argument is passed, the newline character (or any "whitespace" character) is removed from the end of the line:

line_from_file = 'jw234,Joe,Wilson\n'

stripped = line_from_file.rstrip()      # str, 'jw234,Joe,Wilson'

When a string argument is passed, any of the characters in that string are removed from the end of the line:

line_from_file = 'I have something to say.'

stripped = line_from_file.rstrip('.')   # str, 'I have something to say'
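One detail worth testing for yourself: rstrip() keeps removing characters for as long as the end of the string matches, and it only looks at the characters named in the argument. The sample line and the names a, b and c are made up for this sketch.

```python
line = 'I have something to say...\n'

a = line.rstrip()            # whitespace only: 'I have something to say...'
b = line.rstrip('.')         # unchanged: the last character is '\n', not '.'
c = line.rstrip('.\n')       # periods and newlines: 'I have something to say'

print(repr(a))
print(repr(b))
print(repr(c))
```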

string method: .split() with a delimiter

This method divides a delimited string into a list.


line_from_file = 'jw234:Joe:Wilson:Smithtown:NJ:2015585894\n'

xx = line_from_file.split(':')

print(xx)                         # ['jw234', 'Joe', 'Wilson',
                                  #  'Smithtown', 'NJ', '2015585894\n']

string method: .split() without a delimiter

We can also think of a string as delimited by spaces.


gg = 'this is a file    with    some     whitespace'

hh = gg.split()                   # splits on any "whitespace character"

print(hh)                         # ['this', 'is', 'a', 'file',
                                  #  'with', 'some', 'whitespace']
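Two practical points about .split() can be seen in a short sketch: stripping the line first keeps the newline out of the last field, and splitting on a single space is not the same as splitting with no argument. The string 'two  spaces' is invented for this example.

```python
line_from_file = 'jw234:Joe:Wilson:Smithtown:NJ:2015585894\n'

fields = line_from_file.rstrip().split(':')
print(fields[-1])                 # 2015585894 (no trailing newline)

gg = 'two  spaces'                # note the two spaces in the middle
print(gg.split())                 # ['two', 'spaces']
print(gg.split(' '))              # ['two', '', 'spaces']
```

With an explicit ' ' delimiter, each space is treated as a separator, so two spaces in a row produce an empty string between them.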




Data Parsing & Extraction: List Operations and String Slicing

lists and list subscripting

Subscripting allows us to select individual elements of a list.


fields = ['jw234', 'Joe', 'Wilson', 'Smithtown', 'NJ', '2015585894']

var = fields[0]           # 'jw234'
var2 = fields[4]          # 'NJ'
var3 = fields[-1]         # '2015585894' (-1 means last index)


lists: slicing

Slicing allows us to select multiple items from a list.


letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
first_four = letters[0:4]
print(first_four)                     # ['a', 'b', 'c', 'd']

# no upper bound takes us to the end
print(letters[5:])                    # ['f', 'g', 'h']

Here are the rules for slicing:


   1) the first index is 0
   2) the lower bound is the 1st element to be included
   3) the upper bound is one higher than the last element to be included
   4) no upper bound means "to the end"

strings: slicing

Slicing a string selects characters the way that slicing a list selects items.


mystr = '2014-03-13 15:33:00'
year =  mystr[0:4]               # '2014'
month = mystr[5:7]               # '03'
day =   mystr[8:10]              # '13'

Again, please review the rules for slicing:


   1) the first index is 0 (first character)
   2) the lower bound is the 1st character to be included
   3) the upper bound is one higher than the last character to be included
   4) no upper bound means "to the end"
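The same rules cover a few more cases that come up often: leaving out the lower bound, and using a negative lower bound to count from the end. A sketch using the same timestamp string (the names date_part, time_part and seconds are chosen for this example):

```python
mystr = '2014-03-13 15:33:00'

date_part = mystr[:10]       # '2014-03-13'  (no lower bound means "from the start")
time_part = mystr[11:]       # '15:33:00'    (no upper bound means "to the end")
seconds = mystr[-2:]         # '00'          (negative index counts from the end)

print(date_part)
print(time_part)
print(seconds)
```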

the IndexError exception

An IndexError exception indicates use of an index for a list element that doesn't exist.


mylist = ['a', 'b', 'c']

print(mylist[5])            # IndexError:  list index out of range

Since mylist does not contain a sixth item (i.e., at index 5), Python tells us it cannot complete this operation.
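One way to avoid the exception is to compare the index with the length of the list before subscripting. This is a sketch that assumes a non-negative index; the names index and item are introduced for illustration.

```python
mylist = ['a', 'b', 'c']
index = 5

if index < len(mylist):          # valid indices are 0 through len(mylist) - 1
    item = mylist[index]
else:
    item = None
    print(f'index {index} is out of range for a list of {len(mylist)} items')

print(item)                      # None
```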




Data Parsing & Extraction: File Operations and the 'for' Loop

the 'for' loop with a list

'for' with a list repeats its block as many times as there are items in the list.


mylist = [1, 2, 'b']

for var in mylist:       # 1
    print(var)           # ===
    print('===')         # 2
                         # ===
print('done')            # b
                         # ===
                         # done

Similar to a while block, the for block repeats the contents of its block multiple times, but it does so only as many times as there are items in the list. The control variable var is reassigned for each iteration of the loop. This means that if the list has 3 items, the loop executes 3 times and var is reassigned a new value 3 times.


review: the concept of incrementing

We increment an integer by reassigning it to its current value plus one.


x = 0         # int, 0

x = x + 1     # int, 1
x = x + 1     # int, 2
x = x + 1     # int, 3

print(x)      # 3

For each of the three incrementing statements above, a new value that equals the value of x plus 1 is created, and then assigned back to x. The previous value of x is replaced with the new, incremented value. Incrementing is most often used for counting within loops -- see next.


using a 'for' loop to count list items

An integer, updated for each iteration, can be used to count iterations.


mylist = [1, 2, 'b']

my_counter = 0

for var in mylist:
    my_counter = my_counter + 1

print(f'count:  {my_counter} items')   # count:  3 items

The value of my_counter is initialized at 0 before the loop begins. Then, since the incrementing line my_counter = my_counter + 1 is inside the loop, the value of my_counter goes up once with each iteration. Please note that the len() function can count list items more efficiently, but we are using a counter to demonstrate the counter technique, which can be used in situations where len() can't be used, as when we count lines.


using a 'for' loop to sum list items

A number, updated for each iteration, can be used to build a running sum of values.


mylist = [1, 2, 3]

my_sum = 0

for val in mylist:
    my_sum = my_sum + val

print(f'sum:  {my_sum}')     # sum:  6  (value of 1 + 2 + 3)

The value of my_sum is initialized at 0 before the loop begins. Then, since the summing line my_sum = my_sum + val is inside the loop, the value of my_sum goes up by the value of val with each iteration. Please note that the sum() function can sum list items more efficiently, but we are using a summing variable to demonstrate the summing technique, which can be used in situations where sum() can't be used, as when we are summing values from a file.


opening and reading a file with 'for'

'for' with a file repeats its block as many times as there are lines in the file.


fh = open('students.txt')              # file object allows
                                       # looping through a
                                       # series of strings

for xx in fh:                          # xx is a string, a line of the file
    print(xx)                           # prints each line of students.txt

fh.close()                             # close the file

"xx" is called a control variable, and it is automatically assigned each line in the file, as a string, in turn. break and continue work with for loops as well as with while loops. Again, the control variable xx is reassigned for each iteration of the loop. This means that if the file has 5 lines, the loop executes 5 times and xx is reassigned a new value 5 times.


reading a file with 'with'

A file is automatically closed upon exiting the 'with' block.


A 'best practice' is to open files using a 'with' block. When execution leaves the block, the file is automatically closed.

with open('pyku.txt') as fh:
    for line in fh:
        print(line)

## at this point (outside the with block), filehandle fh has been closed.
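We can confirm that the file really is closed after the 'with' block by checking the file object's closed attribute. To keep this sketch self-contained, it first writes a small throwaway file into a temporary folder; the two-line contents and the list named lines are assumptions made for the example.

```python
import os
import tempfile

# create a small throwaway file so the example can run anywhere
path = os.path.join(tempfile.mkdtemp(), 'pyku.txt')
with open(path, 'w') as wfh:
    wfh.write('line one\nline two\n')

lines = []
with open(path) as fh:
    for line in fh:
        lines.append(line.rstrip())   # drop the newline before storing

print(lines)         # ['line one', 'line two']
print(fh.closed)     # True: the file was closed when the block ended
```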


summarizing: csv parsing with 'for' looping and string parsing

Here we put together all features learned in this session.


fh = open('revenue.csv')          # 'file' object

counter = 0
summer = 0.0

for line in fh:                   # str, "Haddad's,PA,239.50\n"

    line = line.rstrip()          # str, "Haddad's,PA,239.50"
    fieldlist = line.split(',')   # list, ["Haddad's", 'PA', '239.50']

    rev_val = fieldlist[2]        # str, '239.50'   (value from first line)
    f_rev = float(rev_val)        # float, 239.5

    counter = counter + 1
    summer = summer + f_rev

fh.close()

print(f'counter:  {counter}')     # 7 (number of lines in file)
print(f'summer:   {summer}')      # 662.01000001  (sum of all 3rd col values in file)



sidebar: writing and appending to files using the file object

Files can be opened for writing or appending; we use the file object and the file write() method.


fh = open('new_file.txt', 'w')
fh.write("here's a line of text\n")
fh.write('I add the newlines explicitly if I want to write to the file\n')
fh.close()

fh = open('new_file.txt')
lines = fh.readlines()
print(lines)
  # ["here's a line of text\n",
  #  'I add the newlines explicitly if I want to write to the file\n']

fh.close()

Note that we are explicitly adding newlines to the end of each line. The write() method doesn't do this for us.
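Appending works the same way as writing, except that the file is opened with 'a' instead of 'w', so existing contents are kept. In this sketch the file is created in a temporary folder (an assumption, so the example does not touch your own files).

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'new_file.txt')

fh = open(path, 'w')             # 'w' creates the file (or empties an existing one)
fh.write('first line\n')
fh.close()

fh = open(path, 'a')             # 'a' adds to the end of the existing file
fh.write('second line\n')
fh.close()

fh = open(path)
contents = fh.read()
fh.close()

print(contents)                  # first line, then second line
```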




Optional: modules for accessing databases, CSV, SQL, JSON and the internet

Importing Python Modules

A module is Python code (a code library) that we can import and use in our own code -- to do specific types of tasks.


import csv           # make csv (a library module) part of our code

fh = open('thisfile.csv')
reader = csv.reader(fh)

for row in reader:
    print(row)

Once a module is imported, its Python code is made available to our code. We can then call specialized functions and use objects to accomplish specialized tasks. Python's module support is profound and extensive. Modules can do powerful things, like manipulate image or sound files, munge and process huge blocks of data, do statistical modeling and visualization (charts) and much, much, much more.

The Python 3 Standard Library documentation can be found at https://docs.python.org/3/library/index.html
Python 2 Standard Library: https://docs.python.org/2.7/library/index.html


CSV

The csv module parses CSV files, splitting the lines for us. We read the csv reader object in the same way we would a file object.


import csv
fh = open('students.txt', newline='')  # text mode; newline='' lets csv handle line endings
reader = csv.reader(fh)

next(reader)              # skip one row (useful for header lines)

for record in reader:     # loop through each row
    print(f'id:{record[0]};  fname:{record[1]}; lname: {record[2]}')

fh.close()

This module takes into account more advanced CSV formatting, such as quotation marks (which are used to allow commas within data). In Python 3 the file must be opened in text mode (not 'rb'). Passing newline='' to open() lets the csv reader handle line endings itself, which matters when the csv file comes from Excel, which outputs newlines in the Windows format (\r\n).


Writing is similarly easy:

import csv
wfh = open('some.csv', 'w', newline='')
writer = csv.writer(wfh)
writer.writerow(['some', 'values', "boy, don't you like long field values?"])
writer.writerows([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']])
wfh.close()

Please be advised that you will not see writes to a file until you close the file with wfh.close() or until the program ends execution. (newline='' is necessary when opening the write file to neutralize an issue in Windows regarding the '\r\n' line ending that Windows uses. While not needed on Mac or Linux, this added argument does no harm.)
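To see how the csv module handles a comma inside a field, we can write to an in-memory buffer and look at the exact text produced. In this sketch io.StringIO stands in for a file, and the field values are invented.

```python
import csv
import io

buffer = io.StringIO()                       # in-memory stand-in for a file
writer = csv.writer(buffer)
writer.writerow(['id', 'comment'])
writer.writerow(['jw234', 'hello, world'])   # this field contains a comma

text = buffer.getvalue()
print(repr(text))      # 'id,comment\r\njw234,"hello, world"\r\n'

rows = list(csv.reader(text.splitlines()))
print(rows)            # [['id', 'comment'], ['jw234', 'hello, world']]
```

The writer adds quotation marks around the field with the comma, and the reader removes them again, so the data survives the round trip intact.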


sqlite3: local file-based relational database

An sqlite3 lightweight database instance is built into Python and accessible through SQL statements. It can act as a simple storage solution, or can be used to prototype database interactivity in your Python script and later be ported to a production database like MySQL, Postgres or Oracle.


Keep in mind that the interface to your relational database will be the same or similar to the one presented here with the file-based one.


import sqlite3
conn = sqlite3.connect('example.db')  # a db connection object

c = conn.cursor()                     # a cursor object for issuing queries

Once a cursor object is established, SQL can be used to write to or read from the database:

c.execute('''CREATE TABLE stocks
             (date text, trans text, symbol text, qty real, price real)''')

Note that sqlite3 datatypes are nonstandard and don't reflect types found in databases such as MySQL:

INTEGER: all int types (TINYINT, BIGINT, INT, etc.)
REAL: FLOAT, DOUBLE, REAL, etc.
NUMERIC: DECIMAL, BOOLEAN, DATE, DATETIME, NUMERIC
TEXT: CHAR, VARCHAR, etc.
BLOB: BLOB (non-typed (binary) data, usually large)


Insert a row of data

c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")

Larger example that inserts many records at a time

purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
             ('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
             ('2006-04-06', 'SELL', 'IBM', 500, 53.00),
            ]
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)

Commit the changes -- this saves the inserted rows permanently to the database file

conn.commit()

Retrieve single row of data

t = ('RHAT',)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)

tuple_row = c.fetchone()
print(tuple_row)               # ('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)

Retrieve multiple rows of data

for tuple_row in c.execute('SELECT * FROM stocks ORDER BY price'):
    print(tuple_row)

### ('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)
### ('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0)
### ('2006-04-06', 'SELL', 'IBM', 500.0, 53.0)
### ('2006-04-05', 'BUY', 'MSFT', 1000.0, 72.0)

Close the database

conn.close()
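For experimenting, the whole sequence can be run against a temporary in-memory database by connecting to the special name ':memory:', so that no example.db file is created. This is a condensed sketch of the steps above with only two sample rows.

```python
import sqlite3

conn = sqlite3.connect(':memory:')     # temporary database, nothing written to disk
c = conn.cursor()

c.execute('CREATE TABLE stocks (date text, trans text, symbol text, qty real, price real)')

purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
             ('2006-04-06', 'SELL', 'IBM', 500, 53.00)]
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)
conn.commit()

# the ? placeholder is filled in from the tuple, as in the examples above
c.execute('SELECT symbol, price FROM stocks WHERE trans=?', ('SELL',))
rows = c.fetchall()                    # list of tuples, one per matching row
print(rows)                            # [('IBM', 53.0)]

conn.close()
```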

Using the requests Module to Make an HTTP Browser Request

A Python program can take the place of a browser, requesting and downloading CSV, HTML pages and other files.


Your Python program can work like a web spider (for example visiting every page on a website looking for particular data or compiling data from the site), can visit a page repeatedly to see if it has changed, can visit a page once a day to compile information for that day, etc.


Basic Example: Download and Save Data

import requests

url = 'https://www.python.org/dev/peps/pep-0020/'   # the Zen of Python (PEP 20)

response = requests.get(url)     # a response object

text = response.text             # text of response


# writing the response to a local file -
# you can open this file in a browser to see it
wfh = open('pep_20.html', 'w')
wfh.write(text)
wfh.close()

More Complex Example: Send Headers, Parameters, Body; Receive Status, Headers, Body

import requests

url = 'http://davidbpython.com/cgi-bin/http_reflect'   # my reflection program

div_bar = '=' * 10


# headers, parameters and message data to be passed to request
header_dict =  { 'Accept': 'text/plain' }          # change to 'text/html' for an HTML response
param_dict =   { 'key1': 'val1', 'key2': 'val2' }
data_dict =    { 'text1': "We're all out of gouda." }


# a GET request (change to .post for a POST request)
response = requests.get(url, headers=header_dict,
                             params=param_dict,
                             data = data_dict)


response_status = response.status_code   # status of the response (OK, Not Found, etc.)

response_headers = response.headers      # headers sent by the server

response_text = response.text            # body sent by server


# outputting response elements (status, headers, body)

# response status
print(f'{div_bar} response status {div_bar}\n')
print(response_status)
print(); print()

# response headers
print(f'{div_bar} response headers {div_bar}\n')
for key in response_headers:
    print(f'{key}:  {response_headers[key]}\n')
print()

# response body
print(f'{div_bar} response body {div_bar}\n')
print(response_text)

Note that if import requests raises a ModuleNotFoundError exception, requests must be installed:

Mac: open the Terminal program and issue this command: pip3 install requests
Windows: open the Command Prompt program and issue the following command: pip install requests

If you have any problems with these commands, please let me know!


Using requests to read CSV and JSON Data

Specific techniques for reading the most common data formats.


CSV: feed string response to .splitlines(), then to csv.reader:

import requests
import csv

url = 'path to csv file'

response = requests.get(url)
text = response.text

lines = text.splitlines()
reader = csv.reader(lines)

for row in reader:
    print(row)

JSON: requests accesses built-in support:

import requests

url = 'path to json file'

response = requests.get(url)

obj = response.json()

print(type(obj))          # <class 'dict'>
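The conversion that response.json() performs can be tried without a network connection by using the standard json module on a string. The payload below is a made-up stand-in for the text of a response.

```python
import json

# hypothetical response text; a real server would supply its own data
payload = '{"name": "Joe", "scores": [90, 85], "active": true}'

obj = json.loads(payload)       # JSON text converted to Python objects

print(type(obj))                # <class 'dict'>
print(obj['scores'][0])         # 90
print(obj['active'])            # True (JSON true becomes Python True)
```

Keep in mind that the type of the result depends on the data: JSON that begins with a square bracket is converted to a list rather than a dict.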

Alternative to requests: the urllib module

If the requests module cannot be installed, this module is part of the standard distribution.


urllib is a full-featured module for making web requests. Although the requests module is strongly favored by some for its simplicity, it has not yet been added to the Python builtin distribution.


The urlopen method takes a url and returns a file-like object that can be read() as a file:

import urllib.request
my_url = 'http://www.yahoo.com'
readobj = urllib.request.urlopen(my_url)  # return a 'file-like' object
text = readobj.read()                     # read into a 'byte string'
# text = text.decode('utf-8')             # optional, sometimes required:
                                          # decode as a 'str' (see below)
readobj.close()

Alternatively, you can call readlines() on the object (keep in mind that many objects that can deliver file-like string output can be read with this same-named method):

for line in readobj.readlines():
  print(line)
readobj.close()

Parsing CSV Files

Downloaded CSV files should be parsed with the CSV module, as CSV can be more complex than just comma separators.


The csv.reader() function usually requires a file object, but we can also pass a list of lines to it:

import csv
import urllib.request

readobj = urllib.request.urlopen(my_url)                # 'file-like' object (my_url as defined earlier)
text = readobj.read()                                   # bytes, entire download
text = text.decode('utf-8')                             # str, entire download
lines = text.splitlines()                               # list of str (lines)

reader = csv.reader(lines)

for row in reader:
    print(row)

For discussion of potential issues with using urllib, please see the unit titled "Supplementary Modules: CSV, SQL, JSON and the Internet".


POTENTIAL ERRORS AND REMEDIES WITH urllib


TypeError mentioning 'bytes' -- sample exception messages:

TypeError: can't use a string pattern on a bytes-like object
TypeError: must be str, not bytes
TypeError: can't concat bytes to str

These errors indicate that you tried to use a byte string where a str is appropriate.


The urlopen() response usually comes to us as a special object called a byte string. In order to work with the response as a string, we can use the decode() method to convert it into a string with an encoding.

text = text.decode('utf-8')
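The difference between a byte string and a str can be seen without making a web request. In this sketch the bytes value is typed in by hand as a stand-in for what read() returns.

```python
raw = b'caf\xc3\xa9,1.50\n'          # bytes; \xc3\xa9 is the utf-8 encoding of 'é'

text = raw.decode('utf-8')           # str, 'café,1.50\n'
fields = text.rstrip().split(',')    # str methods now work as usual

print(type(raw))                     # <class 'bytes'>
print(type(text))                    # <class 'str'>
print(fields)                        # ['café', '1.50']
```

Calling raw.split(',') instead would raise one of the TypeErrors listed above, because ',' is a str and raw is a byte string.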

'utf-8' is the most common encoding, although others ('ascii', 'utf-16', 'utf-32' and more) may be required. I have found that we do not always need to convert (depending on what you will be doing with the returned string) which is why I commented out the line in the first example.


SSL Certificate Error

Many websites enable SSL security and require a web request to accept and validate an SSL certificate (certifying the identity of the server). urllib by default requires SSL certificate security, but it can be bypassed (keep in mind that this may be a security risk).


import ssl
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

my_url = 'https://www.nytimes.com'     # the SSL context applies to https addresses
readobj = urllib.request.urlopen(my_url, context=ctx)

Encoding Parameters: urllib.parse.urlencode()

When including parameters in our requests, we must encode them into our request URL. The urlencode() method does this nicely:


import urllib.request, urllib.parse

params = urllib.parse.urlencode({'choice1': 'spam and eggs',
                                 'choice2': 'spam, spam, bacon and spam'})
print("encoded query string: ", params)

This prints (on a single line):

encoded query string:
choice1=spam+and+eggs&choice2=spam%2C+spam%2C+bacon+and+spam
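The encoded string is normally attached to a URL after a question mark. The sketch below builds such a URL and then decodes the query string again with urllib.parse.parse_qs(); the address http://example.com/search and the parameter names are placeholders.

```python
import urllib.parse

base_url = 'http://example.com/search'          # placeholder address
params = urllib.parse.urlencode({'q': 'spam and eggs', 'page': 2})

full_url = base_url + '?' + params
print(full_url)       # http://example.com/search?q=spam+and+eggs&page=2

decoded = urllib.parse.parse_qs(params)         # back to a dict of lists of str
print(decoded)        # {'q': ['spam and eggs'], 'page': ['2']}
```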



Filepaths for Locating Files

Locating Files with Filepaths

Filepaths pinpoint the location of any file.


Your computer's filesystem contains files and folders, arranged in a tree (folders and files within folders within other folders, etc.) In this session we'll look at how we can open files anywhere on the filesystem tree. Here's a sample tree for us to work with, containing both files (ending in .txt) and python scripts (ending in .py). (This tree and files are replicated in your data folder for this session.)

dir1
├── file1.txt
├── test1.py
│
├── dir2a
│   ├── file2a.txt
│   ├── test2a.py
│   │
│   ├── dir3a
│   │   ├── file3a.txt
│   │   ├── test3a.py
│   │   │
│   │   └── dir4
│   │       ├── file4.txt
│   │       └── test4.py
└── dir2b
    ├── file2b.txt
    ├── test2b.py
    │
    └── dir3b
       ├── file3b.txt
       └── test3b.py


When our script is located in the same directory as a file we want to open, we can give Python the name of the file, and it will find it in this same directory.

""" test2b.py:  open and read a file """

fh = open('file2b.txt')   # OS looks for file in present working directory
print(fh.read())          # this is file 2b - note that it is in same directory as script

This works because test2b.py and file2b.txt are in the same directory.


However, if our script is in a different location from the file we want to open, we have a problem -- the OS won't be able to find the file.

""" test3a.py:  open and read a file """

fh = open('file2b.txt')    # raises a FileNotFoundError exception
                           # (OS looks for file in the pwd (dir3a)
                           # but doesn't find it)

The file exists, but it is in a different directory. The OS can't find the file because it needs to be told in which directory it should look for the file. So, if we are running our script from a different location than the file we wish to open, we must use a relative path or an absolute path to show the OS where the file is located.


Relative vs. Absolute Paths

There are two different ways of expressing a file's location.


Again, let's use the sample tree that can be found in your session folder:

dir1
├── file1.txt
├── test1.py
│
├── dir2a
│   ├── file2a.txt
│   ├── test2a.py
│   │
│   ├── dir3a
│   │   ├── file3a.txt
│   │   ├── test3a.py
│   │   │
│   │   └── dir4
│   │       ├── file4.txt
│   │       └── test4.py
└── dir2b
    ├── file2b.txt
    ├── test2b.py
    │
    └── dir3b
       ├── file3b.txt
       └── test3b.py

Absolute path: this is one that locates a file from the root of the filesystem. It lists each of the directories that lead from the root to the directory that holds the file.


In Windows, absolute paths begin with a drive letter, usually C:\:

""" test3a.py:  open and read a file """

filepath = r'C:\Users\david\Downloads\python_data\session_03_strings_lists_files\dir1\dir2b\file2b.txt'
fh = open(filepath)

print(fh.read())

(Note that r'' should be used with any Windows paths that contain backslashes.)


On the Mac, absolute paths begin with a forward slash:

""" test3a.py:  open and read a file """

filepath = '/Users/david/Downloads/python_data/session_03_strings_lists_files/dir1/dir2b/file2b.txt'
fh = open(filepath)

print(fh.read())

(The above paths assume that the python_data folder is in the Downloads directory; you may have placed yours elsewhere on your system. Of course, the above paths also assume that my home directory is called david/; yours is likely different.)


Relative Path: locate a file or folder in relation to the present working directory


A relative path is read as an extension of the present working directory. The below path assumes that our present working directory is /Users/david/Downloads/python_data/dir1/dir2a:

""" test2a.py:  open and read a file """

filepath = 'dir3a/dir4/file4.txt'   # starts from /Users/david/Downloads/python_data/dir1/dir2a
fh = open(filepath)

print(fh.read())

When we use a relative path, we can think of it as extending the pwd. So the whole path is:

/Users/david/Downloads/python_data/dir1/dir2a/dir3a/dir4/file4.txt

Therefore, in order to use a relative path, you must first ascertain your present working directory in the filesystem. Only then can you know the relative path needed to find the file you are looking for.


Special Note: Windows paths and the "raw string"

Note that Windows paths featuring backslashes should use r'' ("raw string"), in which a backslash is not seen as an escape sequence such as \n (newline). This is not required on Macs or on paths without backslashes. For simplicity, you can substitute forward slashes in Windows paths, and Python will translate the slashes for you. Using forward slashes is probably the easiest way to work with Windows paths in Python.
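To find out your present working directory from inside a script, and to see the full path that a relative path stands for, the os module can help. A small sketch (the output depends on where you run it, so no output is shown for the first two print statements):

```python
import os

pwd = os.getcwd()                           # present working directory, as a str
relative = 'dir3a/dir4/file4.txt'
absolute = os.path.abspath(relative)        # the pwd extended by the relative path

print(pwd)
print(absolute)
print(os.path.exists(absolute))             # False unless that file is really there
```

Printing os.getcwd() at the top of a script is a quick way to diagnose a FileNotFoundError caused by running from an unexpected location.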


Locating a File in a Parent Directory

We use .. to signify the parent; this can be used in a relative filepath.


Again, let's use the sample tree that is replicated in your session folder:

dir1
├── file1.txt
├── test1.py
│
├── dir2a
│   ├── file2a.txt
│   ├── test2a.py
│   │
│   ├── dir3a
│   │   ├── file3a.txt
│   │   ├── test3a.py
│   │   │
│   │   └── dir4
│   │       ├── file4.txt
│   │       └── test4.py
└── dir2b
    ├── file2b.txt
    ├── test2b.py
    │
    └── dir3b
       ├── file3b.txt
       └── test3b.py

What if we are in dir3a running file test3a.py but want to access file file1.txt?


Think of .. (two dots) as representing the parent directory:

""" test3a.py:  open and read a file """

filepath = '../../file1.txt'  # reads from /Users/david/Downloads/python_data/dir1
fh = open(filepath)

print(fh.read())

As with all relative paths, you must first consider the location from which you are running the script, then the location of the file you're trying to open. If we are in the dir3a directory when we run test3a.py, then we are two directories "below" the dir1 directory. The first .. takes us to the dir2a directory. The second .. takes us to the dir1 directory. We can then access the file1.txt file from there.


Going up, then down

What if we wanted to go from dir2a to dir2b? They are at the same level; in other words, they are neither above nor below each other.


The answer is to go up to the parent, then down to the other child:

""" test2a.py:  open and read a file """

filepath = '../dir2b/file2b.txt'
fh = open(filepath)

print(fh.read())

.. takes us to the dir1 directory. dir2b can be accessed from that directory.
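To see how a relative path containing .. extends the pwd, we can combine the two as text. This is a minimal sketch that assumes the pwd is the dir2a folder under the sample location used in these slides; posixpath is the standard library module for Mac/Linux-style paths, and no file is actually opened.

```python
import posixpath

# assumed present working directory (the folder holding test2a.py)
pwd = '/Users/david/Downloads/python_data/dir1/dir2a'

relative = '../dir2b/file2b.txt'

# join the pwd and the relative path, then resolve the '..'
full = posixpath.normpath(posixpath.join(pwd, relative))

print(full)     # /Users/david/Downloads/python_data/dir1/dir2b/file2b.txt
```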




Containers: More List Operations

using containers to collect data

Containers are Python objects that can contain other objects.



containers allow for manipulation and analysis

Once collected, values in a container can be sorted or filtered (i.e. selected) according to whatever rules we choose. A collection of numeric values offers many new opportunities for analysis:


A collection of string values allows us to perform text analysis:


container objects: list, set, tuple

Compare and contrast the characteristics of each container.


mylist =  ['a', 'b', 'c', 'd', 1, 2, 3]

mytuple = ('a', 'b', 'c', 'd', 1, 2, 3)

myset =   {'a', 'b', 'c', 'd', 1, 2, 3}

mydict =  {'a': 1, 'b': 2, 'c': 3, 'd': 4}



review: the list container object

A list is an ordered sequence of values.


var = []                     # initialize an empty list

var2 = [1, 2, 3, 'a', 'b']   # initialize a list of values


review: subscripting a list

Subscripting allows us to read individual items from a list.


mylist = [1, 2, 3, 'a', 'b']       # initialize a list of values

xx = mylist[2]                     # 3

yy = mylist[-1]                    # 'b'


review: slicing a list

Slicing a list returns a new list.


var2 = [1, 2, 3, 'a', 'b']            # initialize a list of values

sublist1 = var2[0:3]                  # [1, 2, 3]

sublist2 = var2[2:4]                  # [3, 'a']

sublist3 = var2[3:]                   # ['a', 'b']

Remember the rules of slicing, similar to strings:


finding an item within a list

The 'in' operator works with lists much as it works with strings.


mylist = [1, 2, 3, 'a', 'b']

if 'b' in mylist:                        # this is True for mylist
    print("'b' can be found in mylist")

print('b' in mylist)                      # "True":  the 'in' operator
                                         # actually returns True or False

summary functions: len(), sum(), max(), min()

Summary functions offer a speedy answer to basic analysis questions: how many? How much? Highest value? Lowest value?


mylist = [1, 3, 5, 7, 9]        # initialize a list

print(len(mylist))               # 5 (count of items)
print(sum(mylist))               # 25 (sum of values)
print(min(mylist))               # 1 (smallest value)
print(max(mylist))               # 9 (largest value)

sorting a list

sorted() returns a new list of sorted values.


mylist = [4, 9, 1.2, -5, 200, 20]

smyl = sorted(mylist)                     # [-5, 1.2, 4, 9, 20, 200]

concatenating two lists with +

Concatenation works in the same way as strings.


var = ['a', 'b', 'c']
var2 = ['d', 'e', 'f']

var3 = var + var2                  # ['a', 'b', 'c', 'd', 'e', 'f']

adding (appending) an item to a list

var = []

var.append(4)                # Note well! call is not assigned
var.append(5.5)              # list is changed in-place

print(var)                                # [4, 5.5]

It is the nature of a list to hold these items in order as they were added.


the AttributeError exception

An AttributeError exception occurs when calling a method on an object type that doesn't support that method.


mylines = ['line1\n', 'line2\n', 'line3\n']

mylines = mylines.rstrip()         # AttributeError:
                                   # 'list' object has no attribute 'rstrip'


the AttributeError when using .append()

This exception may sometimes result from a misuse of the append() method, which returns None.


mylist = ['a', 'b', 'c']

# oops:  returns None -- call to append() should not be assigned
mylist = mylist.append('d')

mylist = mylist.append('e')        # AttributeError:  'NoneType'
                                   # object has no attribute 'append'


avoiding the incorrect use of .append()

mylist = ['a', 'b', 'c']

mylist.append('d')                 # now mylist equals ['a', 'b', 'c', 'd']


sidebar: removing a container element

There are a number of additional list methods to manipulate a list, though they are less often used.


mylist = ['a', 'hello', 5, 9]

popped = mylist.pop(0)         # str, 'a'
                               # (argument specifies the index  of the item to remove)

mylist.remove(5)               # remove an element by value

print(mylist)               # ['hello', 9]

mylist.insert(0, 10)

print(mylist)               # [10, 'hello', 9]



Containers: Tuples and Sets

tuples and sets: like lists but different

It's helpful to contrast these containers and lists.



It's easy to remember how to use one of these containers by considering how they differ in behavior.


the tuple container object

A tuple is an immutable ordered sequence of values.


var2 = (1, 2, 3, 'a', 'b')   # initialize a tuple of values


subscripting a tuple

Subscripting allows us to read individual items from a tuple.


mytuple = (1, 2, 3, 'a', 'b')       # initialize a tuple of values

xx = mytuple[3]                     # 'a'

Note that indexing starts at 0, so index 1 is the 2nd item, index 2 is the 3rd item, etc.


slicing a tuple

Slicing a tuple returns a new tuple.


var2 = (1, 2, 3, 'a', 'b')             # initialize a tuple of values

subtuple1 = var2[0:3]                  # (1, 2, 3)

subtuple2 = var2[2:4]                  # (3, 'a')

subtuple3 = var2[3:]                   # ('a', 'b')

Remember the rules of slicing, same as lists and strings:


concatenating two tuples with +

Concatenation works in the same way as lists and strings.


var = ('a', 'b', 'c')
var2 = ('d', 'e', 'f')

var3 = var + var2                  # ('a', 'b', 'c', 'd', 'e', 'f')

"set" container object

A set is an unordered, unique collection of values.


Initialize a Set

myset = set()                  # initialize an empty set (note that empty curly
                               # braces are reserved for dicts)

myset = {'a', 9999, 4.3, 'a'}  # initialize a set with elements

print(myset)                   # {9999, 4.3, 'a'}


adding an item to a set

myset = set()                  # initialize an empty set

myset.add(4.3)                 # note well method call not assigned
myset.add('a')

print(myset)                   # {'a', 4.3}    (order is not
                               #                necessarily maintained)

getting information about a set or tuple

# Get Length of a set or tuple (compare to len() of a list or string)
myset = {1, 2, 3, 'a', 'b'}

yy = len(myset)              # 5 (# of elements in myset)


# Test for membership in a set or tuple
mytuple = (1, 2, 3, 'a', 'b')

if 'b' in mytuple:                        # this is True for mytuple
    print("'b' can be found in mytuple")

print('b' in mytuple)                      # "True":  the 'in' operator
                                          # actually returns True or False


looping through a set or tuple

The 'for' loop allows us to traverse a set or tuple and work with each item.


mytuple = (1, 2, 3, 'a', 'b')            # could also be a set here

for var in mytuple:
    print(var)                            # prints 1, then 2, then 3,
                                         # then a, then b


summary functions: len(), sum(), max(), min()

Summary functions offer a speedy answer to basic analysis questions: how many? How much? Highest value? Lowest value?


Whether a set or tuple, these operations work in the same way.


mytuple = (1, 3, 5, 7, 9)       # initialize a tuple
myset =   {1, 3, 5, 7, 9}       # initialize a set

print(len(mytuple))              # 5  (count of items)
print(sum(myset))                # 25 (sum of values)
print(min(myset))                # 1 (smallest value)
print(max(mytuple))              # 9 (largest value)

sorting a set or tuple

Regardless of type, sorted() returns a list of sorted values.


mytuple = (4, 9, 1.2, -5, 200, 20)       # could also be a set here

smyl = sorted(mytuple)                   # [-5, 1.2, 4, 9, 20, 200]




Building Up Containers from File

introduction: building up containers from file

This technique forms the core of much of what we do.


In order to work with data, the usual steps are:


We call this process Extract-Transform-Load, or ETL. ETL is at the heart of what core Python does best.


looping through a data source and building up a list

This "summary algorithm" is very similar to building a float sum from a file source.


build a list of company names

company_list = []                             # initialize an empty list
fh = open('revenue.csv')                      # 'file' object

for line in fh:                               # str, 'Haddad's,PA,239.50'

    elements = line.split(',')                # list, ["Haddad's", 'PA', '239.50']
    company_list.append(elements[0])          # add the name for this row
                                              # to company_list

print(company_list)       # list, ["Haddad's", 'Westfield', 'The Store', "Hipster's",
                          #        'Dothraki Fashions', "Awful's", 'The Clothiers']

fh.close()

Just as we did when counting lines of a file or summing up values, we can use a 'for' loop over a file to collect values.


looping through a data source and building up a unique set

This "summary algorithm" uses a set to collect unique items from repeating data.


state_set = set()                       # initialize an empty set
fh = open('revenue.csv')                # 'file' object

for line in fh:                         # str, 'Haddad's,PA,239.50'

    elements = line.split(',')          # list, ["Haddad's", 'PA', '239.50']
    state_set.add(elements[1])          # add the state for this row
                                        # to state_set

print(state_set)       # set, {'PA', 'NY', 'NJ'}   (your order may be different)

chosen_state = input('enter a state:  ')

if chosen_state in state_set:
    print('that state was found in the file')
else:
    print('that state was not found')

fh.close()
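Both summary algorithms can be tried without the data file. The sketch below substitutes io.StringIO (an in-memory stand-in for the file object) for open('revenue.csv'). Only the first row comes from the slides; the other two rows are invented sample data.

```python
import io

# in-memory stand-in for open('revenue.csv');
# rows 2 and 3 are invented for illustration
fh = io.StringIO("Haddad's,PA,239.50\nWestfield,NJ,53.90\nThe Store,NJ,211.50\n")

company_list = []                     # will hold every company name
state_set = set()                     # will hold each state only once

for line in fh:
    elements = line.rstrip().split(',')   # strip the newline, then split on commas
    company_list.append(elements[0])
    state_set.add(elements[1])

fh.close()

print(company_list)          # ["Haddad's", 'Westfield', 'The Store']
print(sorted(state_set))     # ['NJ', 'PA']
```

The list keeps all three names in file order, while the set keeps 'NJ' only once even though it appears in two rows.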


treating a file as a list

Data files can be rendered as lists of lines, and slicing can manipulate them holistically rather than by using a counter.


fh = open('student_db.txt')
file_lines_list = fh.readlines()          # a list of lines in the file
print(file_lines_list)
      # [ "id:address:city:state:zip",
      #   "jk43:23 Marfield Lane:Plainview:NY:10023",
      #   "axe99:315 W. 115th Street, Apt. 11B:New York:NY:10027",
      #   "jab44:23 Rivington Street, Apt. 3R:New York:NY:10002" ... (list continues) ]

wanted_lines = file_lines_list[1:]        # take all but 1st element
                                          # (i.e., 1st line)
for line in wanted_lines:
    print(line.rstrip())                   # jk43:23 Marfield Lane:
                                          # Plainview:NY:10023

                                          # axe99:315 W. 115th Street,
                                          # Apt. 11B:New York:NY:10027

                                          # jab44:23 Rivington Street,
                                          # Apt. 3R:New York:NY:10002

                                          # etc.
fh.close()


slicing and dicing a file: the line, word, character count (1/3)

Once we have read a file as a single string, we can "chop it up" any way we like.


# read(): file text as a single string
fh = open('guido.txt')          # 'file' object
text = fh.read()                # read() method called on
                                # file object returns a string

fh.close()                      # close the file

print(text)
print(len(text))                 # 207 (number of characters in the file)

    # single string, entire text:

    # 'For three months I did my day job, \nand at night and
    #  whenever I got a \nchance I kept working on Python.  \n
    #  After three months I was to the \npoint where I could
    #  tell people, \n"Look here, this is what I built."'


slicing and dicing a file: splitting a string into words (2/3)

String .split() on a whole file string returns a list of words.


file_text = """For three months I did my day job,
and at night and whenever I got a
chance I kept working on Python.
After three months I was to the
point where I could tell people,
"Look here, this is what I built." """

words = file_text.split()      # split entire file on whitespace (spaces or newlines)

print(words)
    # ['For', 'three', 'months', 'I', 'did', 'my', 'day', 'job,',
    #  'and', 'at', 'night', 'and', 'whenever', 'I', 'got', 'a',
    #  'chance', 'I', 'kept', 'working', 'on', 'Python.', 'After',
    #  'three', 'months', 'I', 'was', 'to', 'the', 'point', 'where',
    #  'I', 'could', 'tell', 'people,', '"Look', 'here,', 'this',
    #  'is', 'what', 'I', 'built."']

print(len(words))       # 42 (number of words in the file)


slicing and dicing a file: the line, word, character count (3/3)

String .splitlines() will split any string on the newlines, delivering a list of lines from the file.


file_text = """For three months I did my day job,
and at night and whenever I got a
chance I kept working on Python.
After three months I was to the
point where I could tell people,
"Look here, this is what I built." """

lines = file_text.splitlines()

print(lines)

    # ['For three months I did my day job, ', 'and at night and whenever I got a ',
    #  'chance I kept working on Python.  ', 'After three months I was to the ',
    #  'point where I could tell people, ', '"Look here, this is what I built."']

print(len(lines))          # 6 (number of lines in the file)
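The three counts can be taken from the same string in one short program. This is a minimal sketch that uses only the first three lines of the sample text, so the numbers differ from the full-file counts shown in these slides.

```python
file_text = """For three months I did my day job,
and at night and whenever I got a
chance I kept working on Python."""

char_count = len(file_text)                  # characters, including newlines
word_count = len(file_text.split())          # split on any whitespace
line_count = len(file_text.splitlines())     # split on newlines

print(line_count)       # 3
print(word_count)       # 22
print(char_count)
```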


Summary: 3 ways to read strings from a file

for: read line by line (newline ('\n') marks the end of each line)

fh = open('students.txt')        # file object allows looping
                                 # through a series of strings
for my_file_line in fh:          # my_file_line is a string
    print(my_file_line)           # prints each line of students.txt

fh.close()                       # close the file

read(): read entire file as a single string

fh = open('students.txt')  # file object allows reading
text = fh.read()                 # read() method called on file
                                 # object returns a string
fh.close()                       # close the file

print(text)                       # entire text as a single string

readlines(): read as a list of strings (each string a line)

fh = open('students.txt')
file_lines = fh.readlines()      # file.readlines() returns
                                 # a list of strings
fh.close()                       # close the file

print(file_lines)                 # entire text as a list of lines


sidebar: writing to a file

We don't have call to write to a file in this course, but it's important to know how.


wfh = open('newfile.txt', 'w')    # open for writing
                                  # (will overwrite an existing file)

wfh.write('this is a line of text\n')
wfh.write('this is a line of text\n')
wfh.write('this is a line of text\n')

wfh.close()
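One way to confirm what was written is to read the file back. The sketch below writes into a temporary folder created with the standard library's tempfile module, an assumption made here so that no existing file is overwritten; the line text is invented.

```python
import os
import tempfile

# a fresh temporary folder, so nothing existing is overwritten
folder = tempfile.mkdtemp()
filepath = os.path.join(folder, 'newfile.txt')

wfh = open(filepath, 'w')          # open for writing
wfh.write('first line\n')          # write() does not add the newline for us
wfh.write('second line\n')
wfh.close()                        # closing makes sure the text reaches the file

fh = open(filepath)                # reopen for reading
lines = fh.readlines()
fh.close()

print(lines)                       # ['first line\n', 'second line\n']
```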


sidebar: the range() function

This function allows us to iterate over an integer sequence.


counter = range(10)
for i in counter:
    print(i)                        # prints integers 0 through 9

for i in range(3, 8):               # prints integers 3 through 7
    print(i)

If we need a literal list of integers, we can simply pass the iterable to list():

intlist = list(range(5))
print(intlist)                      # [0, 1, 2, 3, 4]



Dictionaries: Lookup Tables

dictionaries

A dictionary (or dict) is a collection of unique key/value pairs of objects.


mydict = {}                      # empty dict

mydict = {'a':1, 'b':2, 'c':3}   # dict with str keys and int values

print(mydict['a'])               # look up 'a' to get 1


example uses: dictionaries

Pairs describe data relationships that we often want to consider:


You yourself may consider data in pairs, even in your personal life:


types of dictionaries

There are a few main ways dictionaries are used:


initialize a dict

Dicts are marked by curly braces. Keys and values are separated with colons.


mydict = {}                        # empty dict

mydict = {'a':1, 'b':2, 'c':3}     # dict with str keys and int values

add a key/value pair to a dict

We use subscript syntax to assign a value to a key.


mydict = {'a':1, 'b':2, 'c':3}

mydict['d'] = 4                 # setting a new key and value

print(mydict)                   # {'a': 1, 'b': 2, 'c': 3, 'd': 4}

retrieve a value from a dict using a key

We also use subscript syntax to retrieve a value.


mydict = {'a':1, 'b':2, 'c':3, 'd': 4}

dval = mydict['d']                 # value for 'd' is 4

xxx = mydict['c']                  # value for 'c' is 3

You might notice that this subscripting is very close in syntax to list subscripting. The only difference is that instead of an integer index we are using the dict key (most often a string).


the KeyError exception

This exception is raised when we request a key that does not exist in the dict.


mydict = {'a': 1, 'b': 2, 'c': 3}

val = mydict['d']       # KeyError:  'd'

Like the IndexError exception, which is raised if we ask for a list item that doesn't exist, KeyError is raised if we ask for a dict key that doesn't exist.


check for key membership

If we're not sure whether a key is in the dict, before we subscript we can check to confirm.


mydict = {'a': 1, 'b': 2, 'c': 3}

if 'a' in mydict:
    print("'a' is a key in mydict")
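The membership check is most useful when it decides between subscripting and some fallback action. A minimal sketch, using a short list of sample keys in place of user input:

```python
mydict = {'a': 1, 'b': 2, 'c': 3}

results = []

for key in ['a', 'd']:                 # sample keys standing in for user input
    if key in mydict:                  # safe to subscript: the key exists
        results.append(mydict[key])
    else:                              # subscripting here would raise KeyError
        results.append('no such key')

print(results)                         # [1, 'no such key']
```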



Dictionaries: Rankings

dictionary rankings

Dictionaries can be sorted by value to produce a ranking.



loop through dict keys and values

We loop through keys and then use subscripting to get values.


mydict = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

for key in mydict:         # a
    val =  mydict[key]
    print(key)             # a
    print(val)             # 1
    print()
                           # b
                           # 2

                           # (etc.)

Note that plain 'for' looping over a dict delivers the keys:

for key in mydict:
    print(key)             # prints a, then b, then c...

review: sorting any container with sorted()

With any container or iterable (list, tuple, file), sorted() returns a list of sorted elements.


namelist = ['jo', 'pete', 'michael', 'zeb', 'avram']

slist = sorted(namelist)          # ['avram', 'jo', 'michael', 'pete', 'zeb']

Remember that no matter what container is passed to sorted(), the function returns a list. Also remember that the reverse=True argument to sorted() can be used to sort the items in reverse order.
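Both points can be checked quickly. This short sketch reuses values from earlier slides (the set of names is a shortened version of the name list above).

```python
mytuple = (4, 9, 1.2, -5, 200, 20)
myset = {'jo', 'pete', 'avram'}

descending = sorted(mytuple, reverse=True)   # highest to lowest
names = sorted(myset)                        # a set goes in, a list comes out

print(descending)       # [200, 20, 9, 4, 1.2, -5]
print(names)            # ['avram', 'jo', 'pete']
print(type(names))      # <class 'list'>
```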


sorting a dict (sorting its keys)

sorted() returns a sorted list of a dict's keys.


bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}

sorted_keys = sorted(bowling_scores)

print(sorted_keys)     # [ 'alice', 'jeb', 'mike', 'zeb' ]

for key in sorted_keys:
    print(f'{key}={bowling_scores[key]}')

sorting a dictionary's keys by its values

A special "sort criterion" argument can cause Python to sort a dict's keys by its values.


bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}

sorted_keys = sorted(bowling_scores, key=bowling_scores.get)

print(sorted_keys)                 # ['zeb', 'jeb', 'alice', 'mike']

for player in sorted_keys:
    print(f"{player} scored {bowling_scores[player]}")

        ##  zeb scored 98
        ##  jeb scored 123
        ##  alice scored 184
        ##  mike scored 202

The key= argument allows us to specify an alternate criterion by which to sort the keys. The .get() method takes a key and returns its value from the dict, which is what we are asking sorted() to do with each key when sorting by value. However, this kind of complex sorting is a more advanced topic than we can cover here.
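A ranking usually lists the highest value first. Combining key= with the reverse=True argument gives that order, and a slice then selects the top entries. This is a sketch using the same bowling_scores dict; the name top_two is introduced here only for illustration.

```python
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}

# sort the keys by their values, highest score first
ranking = sorted(bowling_scores, key=bowling_scores.get, reverse=True)

print(ranking)                     # ['mike', 'alice', 'jeb', 'zeb']

top_two = ranking[0:2]             # slice the sorted list for the top 2

for player in top_two:
    print(f"{player} scored {bowling_scores[player]}")

        ##  mike scored 202
        ##  alice scored 184
```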


assign multiple values to individual variables

Multi-target assignment performs the assignments in one statement.


csv_line = "Haddad's,PA,239.50"

codata = csv_line.split(',')     # ["Haddad's", 'PA', '239.50']

company, state, revenue = codata

print(company)       # "Haddad's"
print(revenue)       # '239.50' (still a str)

csv_line = 'jk43:23 Marfield Ln.:Plainview:NY:10024'

# 'zipcode' rather than 'zip', which is the name of a built-in function
stuid, street, city, state, zipcode = csv_line.split(':')

print(stuid)      # 'jk43'
print(city)       # 'Plainview'

build up a dict from two fields in a file

As with all containers, we loop through a data source, select and add to a dict.


ids_states = {}                # initialize an
                               # empty dict

fh = open('student_db.txt')
for line in fh:
    # 'zipcode' rather than 'zip', which is the name of a built-in function
    stuid, street, city, state, zipcode = line.split(':')

    ids_states[stuid] = state  # key id is paired to
                               # student's state


print("here is the state for student 'jb29':  ")
print(ids_states['jb29'])       #  NJ

fh.close()



Dictionaries: Aggregations

dict aggregations

A "counting" or "summing" dictionary answers the question "how many of each" or "how much of each".


Aggregations may answer the following questions:


The dict is used to store this information. Each unique key in the dict will be associated with a count or a sum, depending on how many we found in the data source or the sum of values associated with each key in the data source.


building a counting dict

A "counting" dict increments the value associated with each key, and adds keys as new ones are found.


Customarily we loop through data, using the dictionary to keep a tally as we encounter items.


state_count = {}                     # initialize an empty dict

fh = open('revenue.csv')

for line in fh:

    items = line.split(',')       # ["Haddad's", 'PA', '239.50']
    state = items[1]              # str, 'PA'

    if state not in state_count:
        state_count[state] = 0

    state_count[state] = state_count[state] + 1


print(state_count)                # {'PA': 2, 'NJ': 2, 'NY': 3}

print("here is the count of states from revenue.csv:  ")
for state in state_count:
    print(f"{state}:  {state_count[state]} occurrences")

print("here is the count for 'NY':  ")
print(state_count['NY'])                   # 3

fh.close()

building a summing dict

A "summing" dict sums the value associated with each key, and adds keys as new ones are found.


As with a counting dict, we loop through data, using the dictionary to keep a tally as we encounter items.


state_sum = {}                     # initialize an empty dict

fh = open('revenue.csv')

for line in fh:

    items = line.split(',')          # ["Haddad's", 'PA', '239.50']
    state = items[1]                 # str, 'PA'
    value = float(items[2])          # float, 239.5

    if state not in state_sum:
        state_sum[state] = 0

    state_sum[state] = state_sum[state] + value


print(state_sum)      # {'PA': 263.45, 'NJ': 265.4, 'NY': 133.16}

print("here is the sum for 'NY':  ")
print(state_sum['NY'])                 # 133.16

fh.close()

dictionary size with len()

len() counts the pairs in a dict.


mydict = {'a': 1, 'b': 2, 'c': 3}

print(len(mydict))                 # 3 (number of keys in dict)

sidebar: dict .get() method

This method may be used to retrieve a value without checking the dict to see if the key exists.


mydict = {'a': 1, 'b': 2, 'c': 3}

xx = mydict.get('a', 0)          # 1 (key exists so paired value is returned)

yy = mydict.get('zzz', 0)        # 0 (key does not exist so the
                                 #    default value is returned)

You may use any value as the default. This method is sometimes used as an alternative to testing for a key in a dict before reading it -- avoiding the KeyError exception that occurs when trying to read a nonexistent key.
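For example, .get() with a default of 0 can stand in for the "if state not in state_count" test in a counting dict. This is a sketch of that alternative, not a replacement for the approach shown earlier; the list of states is invented sample data standing in for the file.

```python
states = ['PA', 'NJ', 'NY', 'NY', 'PA', 'NJ', 'NY']   # invented sample data

state_count = {}                     # initialize an empty dict

for state in states:
    # .get() returns the current count, or 0 if the key is new
    state_count[state] = state_count.get(state, 0) + 1

print(state_count)                   # {'PA': 2, 'NJ': 2, 'NY': 3}
```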


sidebar: obtaining keys of a dict

The .keys() method gives access to the keys in a dict.


mydict = {'a': 1, 'b': 2, 'c': 3}

these_keys = mydict.keys()

for key in these_keys:
    print(key)

print(list(these_keys))            # ['a', 'b', 'c']

sidebar: obtaining values of a dict

The .values() method gives access to the values in a dict.


mydict = {'a': 1, 'b': 2, 'c': 3}

values = list(mydict.values())     # [1, 2, 3]

if 3 in mydict.values():
    print("3 was found")

for value in mydict.values():
    print(value)

The values cannot be used to get the keys - it's a one-way lookup from the keys. However, we might want to check for membership in the values, or sort or sum the values, or some other less-used approach.


sidebar: using the dict .items() method

.items() gives key/value pairs as 2-item tuples.


mydict = {'a': 1, 'b': 2, 'c': 3}

print(list(mydict.items()))         # [('a', 1), ('b', 2), ('c', 3)]

for key, value in mydict.items():
    print(key, value)               # a 1
                                    # b 2
                                    # c 3

.items() is usually used as another approach for looping through a dict. With each iteration of 'for', or each item when converted to a list, we see a 2-item tuple. The first item is a key, and the second a value. When looping with 'for', since each iteration produces a 2-item (key/value) tuple, we can assign the key and value to variable names and use them immediately, rather than resorting to subscripting. This is usually easier and it is also more efficient.


sidebar: working with dict items()

dict items() can give us a list of 2-item tuples. dict() can convert this list back to a dictionary.


mydict = {'a': 1, 'b': 2, 'c': 3}
these_items = list(mydict.items())    # [('a', 1), ('b', 2), ('c', 3)]

newdict = dict(these_items)

print(newdict)                        # {'a': 1, 'b': 2, 'c': 3}

2-item tuples can be sorted and sliced, so they are a handy alternate structure.
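A short sketch of that idea: sorting the tuples (which, without a key= argument, compares the first item of each tuple, here the dict key), slicing the sorted list, and converting the slice back to a dict. The dict is deliberately written out of order, and the name first_two is introduced only for this example.

```python
mydict = {'c': 3, 'a': 1, 'b': 2}

these_items = sorted(mydict.items())    # [('a', 1), ('b', 2), ('c', 3)]

first_two = these_items[0:2]            # [('a', 1), ('b', 2)]

newdict = dict(first_two)

print(newdict)                          # {'a': 1, 'b': 2}
```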


sidebar: converting parallel lists to tuples

zip() zips up parallel lists into tuples; dict() can convert this to dict.


list1 = ['a', 'b', 'c', 'd']
list2 = [ 1,   2,   3,   4 ]

tupes = list(zip(list1, list2))

print(tupes)          # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
print(dict(tupes))    # {'a': 1,    'b': 2,   'c': 3,   'd': 4}

Occasionally we are faced with two lists that relate to each other on a 1-to-1 basis... or, we sometimes even shape our data into this form. Parallel lists like these can be zipped into multi-item tuples.




Exception Trapping

exception trapping: handling errors after they occur

Introduction: unanticipated vs. anticipated errors


Think of errors as being of two general kinds -- unanticipated and anticipated:


Examples of anticipated errors:


KeyError: when a dictionary key cannot be found.

If the user enters a key that is not in the dict, we can expect this error.


mydict = {'1972': 3.08, '1973': 1.01, '1974': -1.09}

uin = input('please enter a year: ')         # user enters 9999

print(f'mktrf for {uin} is {mydict[uin]}')

  #  Traceback (most recent call last):
  #    File "/Users/david/test.py", line 5, in <module>
  #      print(f'mktrf for {uin} is {mydict[uin]}')
  #                                  ~~~~~~^^^^^
  #  KeyError: '9999'

ValueError: when the wrong value is used with a function or statement.

If we ask the user for a number, but anticipate they might not give us one.


uin = input('please enter an integer:  ')

intval = int(uin)                           # user enters 'hello'

print(f'{uin} doubled is {intval*2}')

  #  Traceback (most recent call last):
  #    File "/Users/david/test.py", line 3, in <module>
  #      intval = int(uin)                           # user enters 'hello'
  #               ^^^^^^^^
  #  ValueError: invalid literal for int() with base 10: 'hello'

FileNotFoundError: when a file can't be found.

If we attempt to open a file but it has been moved or deleted.


filename = 'thisfile.txt'

fh = open(filename)

  #  Traceback (most recent call last):
  #    File "/Users/david/test.py", line 3, in <module>
  #      fh = open(filename)
  #           ^^^^^^^^^^^^^^
  #  FileNotFoundError: [Errno 2] No such file or directory: 'thisfile.txt'

handling errors approach: "asking for permission"

Up to now we have managed anticipated errors by testing to make sure an action will be successful.


Examples of testing for anticipated errors:


So far we have been dealing with anticipated errors by checking first -- for example, using .isdigit() to make sure a user's input is all digits before converting to int().
However, there is an alternative to "asking for permission": begging for forgiveness.
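As a reminder of what "asking for permission" looks like, here is a sketch with one check for each of the three anticipated errors above. The values assigned to year and uin are samples standing in for input(), and the message variables are names introduced only for this example.

```python
import os

mydict = {'1972': 3.08, '1973': 1.01, '1974': -1.09}

year = '2116'                      # sample value standing in for input()
if year in mydict:                 # check the key first: avoids KeyError
    year_msg = f'mktrf for {year} is {mydict[year]}'
else:
    year_msg = f'{year} is not in the dict'

uin = 'hello'                      # sample value standing in for input()
if uin.isdigit():                  # check for all digits first: avoids ValueError
    num_msg = f'{uin} doubled is {int(uin) * 2}'
else:
    num_msg = 'sorry, I needed an int'

filename = 'thisfile.txt'
if os.path.isfile(filename):       # check the file first: avoids FileNotFoundError
    file_msg = 'file found'
else:
    file_msg = 'file not found'

print(year_msg)                    # 2116 is not in the dict
print(num_msg)                     # sorry, I needed an int
print(file_msg)
```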


handling errors approach: "begging for forgiveness"

the try block and except block


try:
    uin = input('please enter an integer:  ')   # user enters 'hello'
    intval = int(uin)                           # int() raises a ValueError
                                                # ('hello' is not a valid value)

    print(f'{uin} doubled is {intval*2}')

except ValueError:
    exit('sorry, I needed an int')   # the except block cancels the
                                     # ValueError and takes action


the procedure for setting up exception handling

It's important to witness the exception and where it occurs before attempting to trap it.


It's strongly recommended that you follow a specific procedure in order to trap an exception:

  1. allow the exception to occur
  2. note the exception type and line number where it occurs
  3. wrap the line that caused the error in a try: block
  4. wrap statements you would like to be executed if the error occurs in an except: block
  5. test that when the exception is raised, the except block is executed
  6. test that when the exception is not raised, the except block is not executed
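
Steps 5 and 6 can be sketched by running the same try/except once with a good value and once with a bad one. The list of sample inputs stands in for two separate runs with input(), and results is a name used only for this example.

```python
results = []

for uin in ['42', 'hello']:            # sample inputs: one valid, one invalid
    try:
        intval = int(uin)              # raises ValueError for 'hello'
        results.append(intval * 2)     # reached only if int() succeeded
    except ValueError:
        results.append('sorry, I needed an int')

print(results)                         # [84, 'sorry, I needed an int']
```

The first value shows the except block being skipped; the second shows it being executed.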


trapping multiple exceptions

Multiple exceptions can be trapped using a tuple of exception types.


companies = ['Alpha', 'Beta', 'Gamma']

user_index = input('please enter a ranking:  ')   # user enters '4' or 'hello'

try:
    list_idx = int(user_index) - 1

    print(f'company at ranking {user_index} is {companies[list_idx]}')

except (ValueError, IndexError):
    exit(f'please enter a ranking from 1 to {len(companies)}')


Here we trap two anticipated errors: if the user types a non-number and a ValueError exception is raised, or a ranking beyond the end of the list and an IndexError is raised, the except: block will be executed.


chaining except: blocks

The same try: block can be followed by multiple except: blocks, which we can use to specialize our response to the exception type.


companies = ['Alpha', 'Beta', 'Gamma']

user_index = input('please enter a ranking:  ')   # user enters '4'

try:
    list_idx = int(user_index) - 1

    print(f'company at ranking {user_index} is {companies[list_idx]}')

except ValueError:
    exit('please enter a numeric ranking')

except IndexError:
    exit(f'max ranking is {len(companies)}')


The exception raised will be matched against each type in turn, and the first matching except: block will be executed.


avoiding except: and except Exception:

When we don't specify an exception, Python will trap any exception. This is a bad practice.


ui = input('please enter a number: ')

try:
    fval = float(ui)
except:                  # AVOID!!  Should be 'except ValueError:'
    exit('please enter a number - thank you')


Why is this a bad practice?

  1. except: or except Exception: can trap any type of error, so an unexpected error could go undetected
  2. except: or except Exception: does not specify which type of exception was expected, so it is less clear to the reader


There are certain limited circumstances under which we might use except: by itself, or except Exception:. These might include wrapping the whole program execution in a try: block and trapping any exception that is raised so the error can be logged and the program doesn't need to exit as a result.
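
A minimal sketch of that limited use, assuming the program's work lives in a function named main() (a hypothetical name) and that the standard library's logging module is used to record the error:

```python
import logging

logging.basicConfig(level=logging.ERROR)

def main():
    """ hypothetical program body that hits an unexpected error """
    mydict = {'a': 1}
    return mydict['zzz']               # raises KeyError

try:
    main()
    status = 'ok'
except Exception as err:               # deliberate catch-all at the top level only
    # record the exception type and message instead of letting the program crash
    logging.error('unexpected %s: %s', type(err).__name__, err)
    status = 'failed'

print(status)                          # failed
```

The catch-all is acceptable here only because the error is logged rather than hidden; inside the program itself, specific exception types are still preferred.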




    Command Line: Moving Around and Executing a Script

    The Command Line

    The Command Line (also known as "Command Prompt" or "Terminal Prompt") gives us access to the Operating System's files and programs.


    Before the graphical user interface was invented, programmers used a text-based interface called the command line to run programs and read and write files. Programmers still make heavy use of the command line because it provides a much more efficient way to communicate with the operating system than Windows File Explorer or Mac Finder. It is the "power user's" way of talking to the OS, and it should be considered essential for anyone wanting to develop their programming skills. To reach the command line, you must search for and open one of these programs:


    On Windows -- search for Command Prompt:

    Microsoft Windows [Version 10.0.18363.1016]          # these 2 lines may look different
    (c) 2019 Microsoft Corporation. All rights reserved.
    
    C:\Users\david>                       < -- command line

     


    On Mac -- search for Terminal:

    Last login: Thu Sep  3 13:46:14 on ttys001
    
    Davids-MBP-3:~ david %                 < -- command line

    Your command line will look similar to those shown above, but will have different names and directory paths (for example, your username instead of 'david'). Your prompt may also feature a dollar sign ($) instead of a percent sign. After opening the command line program on your computer, note the blinking cursor: this is the OS awaiting your next command.


    The Present Working Directory (pwd)

    Your command line session works from one directory location at a time.


    When you first launch the command line program, you are placed at a specific directory within your filesystem. We call this the "present working directory". You may "move around" the system, and when you do, your pwd will change. By default, your initial pwd is your home directory -- the directory at which all your individual files are stored. This directory is usually named after your username, and can be found at /Users/[username] or C:\Users\[username].


    On Windows: Your present working directory is always displayed as the command prompt:


    C:\Users\david>


    On Mac: Your present working directory can be shown by using the pwd command:


    Davids-MBP-3:~ david % pwd
    /Users/david

    As we move around the filesystem, we will see the present working directory change. You must always be mindful of the pwd as it is your current location and it will affect how you can access other files and programs in the filesystem.


    Listing files in the present working directory: 'ls' or 'dir'

    We can list out the contents (files and folders) of any directory.


    On Mac, use the 'ls' command to see the files and folders in the present working directory:

    Davids-MBP-3:~ david % ls
    
    Applications
    Desktop
    Documents
    Downloads
    Dropbox
    Library
    Movies
    Music
    Public
    PycharmProjects
    Sites
    archive
    ascii_test.py
    requests_demo.py
    static.zip



    On Windows, use the 'dir' command to see the files and folders in the present working directory:

    C:\Users\david> dir
    
     Volume Serial Number is 0246-9FF7
    
     Directory of C:\Users\david
    
    08/29/2020  11:37 AM    <DIR>          .
    08/29/2020  11:37 AM    <DIR>          ..
    05/29/2020  06:27 PM    <DIR>          .astropy
    05/29/2020  06:35 PM    <DIR>          .config
    05/29/2020  06:36 PM    <DIR>          .matplotlib
    08/07/2020  10:33 AM             1,460 .python_history
    08/29/2020  11:28 AM    <DIR>          3D Objects
    08/29/2020  11:28 AM    <DIR>          Contacts
    08/29/2020  12:50 PM    <DIR>          Desktop
    08/29/2020  11:28 AM    <DIR>          Documents
    09/02/2020  10:25 AM    <DIR>          Downloads
    08/29/2020  11:28 AM    <DIR>          Favorites
    08/29/2020  11:28 AM    <DIR>          Links
    08/29/2020  11:28 AM    <DIR>          Music
    08/29/2020  11:29 AM    <DIR>          OneDrive
    08/29/2020  11:28 AM    <DIR>          Pictures
    08/29/2020  12:46 PM    <DIR>          PycharmProjects
    08/29/2020  11:28 AM    <DIR>          Saved Games
    08/29/2020  11:28 AM    <DIR>          Searches
    08/29/2020  11:28 AM    <DIR>          Videos
                   1 File(s)          1,460 bytes
                  20 Dir(s)   7,049,539,584 bytes free

    Moving Around the Directory Tree With 'cd'

    The 'change directory' command moves us 'up' or 'down' the tree.


    To move around the filesystem (i.e. to change the present working directory), we use the cd ("change directory") command. In the examples below, note how the present working directory changes after we move. [Please note: in the paths below you'll see that my class project directory python_data_ipy/ is in my Downloads/ directory (i.e., at /Users/david/Downloads/python_data_ipy). If you want your output and directory moves to match mine, you can put yours there -- or you can substitute your own directory path for the one I'm using.]


    on Mac:

    Davids-MBP-3:~ david % pwd
    /Users/david
    
    Davids-MBP-3:~ david % cd Downloads
    
    Davids-MBP-3:Downloads david % pwd
    /Users/david/Downloads

    on Windows:

    C:\Users\david> cd Downloads
    C:\Users\david\Downloads>

    So using the ls or dir command together with the cd command, we can travel from directory to directory, listing out the contents of each directory to decide where to go next (for Windows in the below examples, simply substitute the dir command for ls -- also note that Windows output for dir will look different than below):

    Davids-MBP-3:Downloads david % pwd
    /Users/david/Downloads
    
    Davids-MBP-3:Downloads david % ls       < -- dir on Windows
    python_data_ipy
    [... likely other files/folders as well ...]
    
    Davids-MBP-3:Downloads david % cd python_data_ipy
    
    Davids-MBP-3:python_data_ipy david % pwd
    /Users/david/Downloads/python_data_ipy
    
    Davids-MBP-3:python_data_ipy david % ls       < -- dir on Windows
    
    session_00_test_project
    session_01_objects_types
    session_02_funcs_condits_methods
    session_03_strings_lists_files
    session_04_containers_lists_sets
    session_05_dictionaries
    session_06_multis
    session_07_functions_power_tools
    session_08_files_dirs_stdout
    session_09_funcs_modules
    session_10_classes
    username.txt
    
    Davids-MBP-3:python_data_ipy david % cd session_06_multis/
    
    Davids-MBP-3:session_06_multis david % ls    < -- dir on Windows
    warmup_exercises
    inclass_exercises
    notebooks_inclass_warmup
    [...several more files and folders, may be in a different order...]
    
    Davids-MBP-3:session_06_multis david % cd inclass_exercises
    
    Davids-MBP-3:inclass_exercises david % ls    < -- dir on Windows
    inclass_6.1.py
    inclass_6.2.py
    inclass_6.3.py
    inclass_6.4.py
    inclass_6.5.py
    inclass_6.6.py
    ...
    
    Davids-MBP-3:inclass_exercises david % pwd
    /Users/david/Downloads/python_data_ipy/session_06_multis/inclass_exercises
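
    The same moves can be made from inside a Python program. This is a rough sketch using the standard os module; the directories are created in a temporary folder purely for illustration and are not the actual course files.

```python
import os
import tempfile

start_dir = os.getcwd()                          # like 'pwd'

with tempfile.TemporaryDirectory() as base:      # throwaway directory for the demo
    os.makedirs(os.path.join(base, 'session_06_multis', 'inclass_exercises'))

    os.chdir(base)                               # like 'cd <directory>'
    listing = os.listdir('.')                    # like 'ls' or 'dir'

    os.chdir('session_06_multis')                # move "down" one level
    inner_listing = os.listdir('.')

    os.chdir('..')                               # move "up" to the parent
    back_at_base = os.path.samefile(os.getcwd(), base)

    os.chdir(start_dir)                          # leave before the temp dir is removed

print(listing)            # ['session_06_multis']
print(inner_listing)      # ['inclass_exercises']
print(back_at_base)       # True
```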

    The 'parent directory'

    The '..' (double dot) indicates the parent and can move us one directory "up".


    As you saw, we can move "down" the directory tree by using the name of the next directory -- this extends the path (Mac paths will of course look different; use pwd to confirm your present working directory):

    C:\Users\david> cd Desktop
    
    C:\Users\david\Desktop>

    But if we'd like to travel up the directory tree, we use the special directory shortcut .. which signifies the parent directory:

    C:\Users\david\Desktop> cd ..
    
    C:\Users\david> cd ..
    
    C:\Users>

    We can also travel directly to an inner folder by using the full path. In order to complete the next exercise, I'll travel to an inner folder within my project directory (again, yours may be different depending on where you put the project folder):

    C:\Users> cd david\Downloads\python_data_ipy\session_06_multis\inclass_exercises

    Executing a Python Script from the Command Line

    This is the "true" way to ask Python to execute our script.


    Every developer should be able to execute scripts through the command line, without having to use an IDE like PyCharm or Jupyter. If you are in the same directory as the script, you can execute a program by running Python and telling Python the name of the script:


    On Windows:

    C:\Users\david\Downloads\python_data_ipy\session_06_multis\inclass_exercises> python inclass_6.1.py

    On Mac:

    Davids-MBP-3:inclass_exercises david % python3 inclass_6.1.py

    Unless you've changed it, you won't see any result from running this program, because it does not print anything. Make a change and run it again to see the result! Each week we'll try to spend a few minutes traveling to and executing one or more Python programs from the command line.




    The JSON File Format and Multidimensional Containers

    the json file format

    JavaScript Object Notation is a simple "data interchange" format for sending or storing structured data as text.



    a sample json file

    Fortunately for us, JSON resembles Python in many ways, making it easy to read and understand.


    {
       "key1":  ["a", "b", "c"],
       "key2":  {
                  "innerkey1": 5,
                  "innerkey2": "woah"
                },
       "key3":  false,
       "key4":  null
    }


    reading a structure from a json file

    The json.load() function decodes the contents of a JSON file.


    Here's a program that reads the structure shown earlier from a file that contains it:

    import json                 # we use this module to read JSON
    
    fh = open('sample.json')
    mys = json.load(fh)         # load from a file, convert into Python container
    fh.close()
    
    print(type(mys))              # dict (the outer container of this struct)
    
    print(mys['key2']['innerkey2'])     # woah
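
    As a variation, the file can be opened in a with block so that it is closed automatically. This sketch first writes the sample structure to a temporary file (an assumption made only so that the example is self-contained), then shows how JSON's false and null arrive in Python as False and None.

```python
import json
import os
import tempfile

sample_text = ('{"key1": ["a", "b", "c"], '
               '"key2": {"innerkey1": 5, "innerkey2": "woah"}, '
               '"key3": false, "key4": null}')

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample.json')

    with open(path, 'w') as wfh:        # create the file for this demo
        wfh.write(sample_text)

    with open(path) as fh:              # file is closed when the block ends
        mys = json.load(fh)

print(type(mys))                        # <class 'dict'>
print(mys['key1'][0])                   # a
print(mys['key3'], mys['key4'])         # False None
```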
    

    reading a structure from a json string

    The json.loads() function decodes the contents of a JSON string.


    In this example, we show what we must do if we don't have access to a file, but instead receive the data as a string, as is the case with a web request.


    import json                 # we use this module to read JSON
    import requests             # use 'pip install' to install
    
    
    response = requests.get('https://davidbpython.com/mystruct.json')
    
    text = response.text        # file data as a single string
    
    mys = json.loads(text)      # load from a string, convert into Python container
    
    print(type(mys))            # dict (the outer container of this struct)
    
    print(mys['key2']['innerkey2'])     # woah
    

    The requests module allows your Python program to act like a web browser, making web requests (i.e., with a URL) and downloading the response. If you try this program and receive a ModuleNotFoundError, you must run pip install requests at the command line to install it.
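
    Text received from elsewhere is not always valid JSON. The sketch below works on string literals rather than a live web request (an assumption, so that it runs without a network connection) and traps json.JSONDecodeError, the exception json.loads() raises for malformed input. The helper name parse_json is hypothetical.

```python
import json

good_text = '{"key2": {"innerkey2": "woah"}}'
bad_text = '{"key2": '                     # truncated, so not valid JSON


def parse_json(text):
    """Return the decoded structure, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:    # a subclass of ValueError
        print(f'could not decode JSON: {err}')
        return None


mys = parse_json(good_text)
print(mys['key2']['innerkey2'])            # woah

broken = parse_json(bad_text)              # prints a message, returns None
```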


    printing a complex object readably: writing to a string

    A nested object can be confusing to read.


    If we have a multidimensional object that is squished together and hard to read, we can use json.dumps() with indent=4:

    import json
    
    obj = {'a': {'x': 1, 'y': 2, 'z': 3}, 'b': {'x': 1, 'y': 2, 'z': 3}, 'c': {'x': 1, 'y': 2, 'z': 3} }
    
    print(json.dumps(obj, indent=4))
    

    this prints:

    {
        "a": {
            "x": 1,
            "y": 2,
            "z": 3
        },
        "b": {
            "x": 1,
            "y": 2,
            "z": 3
        },
        "c": {
            "x": 1,
            "y": 2,
            "z": 3
        }
    }

    sidebar: writing an object to json file

    We can use json.dump() to write to a JSON file.


    Dumping a Python structure to JSON


    import json
    
    wfh = open('newfile.json', 'w')  # open file for writing
    
    obj = {'a': 1, 'b': 2}
    
    json.dump(obj, wfh)
    
    wfh.close()
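
    To check that nothing is lost on the way to disk, the structure can be written and then read back. This is only a sketch: the file is placed in a temporary directory and the sample object is invented for the demonstration.

```python
import json
import os
import tempfile

obj = {'a': 1, 'b': [True, None, 'text']}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'newfile.json')

    with open(path, 'w') as wfh:
        json.dump(obj, wfh, indent=4)      # indent is optional; it aids reading

    with open(path) as fh:
        raw_text = fh.read()               # the JSON text as stored on disk

restored = json.loads(raw_text)

print(raw_text)                            # note 'true' and 'null' in the JSON text
print(restored == obj)                     # True
```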
    



    Introduction to User-Defined Functions

    Proper Code Organization

    Core principles


    Here are the main components of a properly formatted program:


    """ tip_calculator.py -- calculate tip for a restaurant bill
        Author:  David Blaikie dbb212@nyu.edu
        Last modified:  9/19/2017
    """
    
    import os              # part of Python distribution (installed with Python)
    import sys             # part of Python distribution (installed with Python)
    import pandas as pd    # installed "3rd party" modules
    import myownmod as mm  # "local" module (part of local codebase)
    
    
    # constant message strings are not required to be placed
    # here, but in professional programs they are kept
    # separate from the logic, often in separate "config" files
    MSG1 = 'A {}% tip (${}) was added to the bill, for a total of ${}.'
    MSG2 = 'With {} in your party, each person must pay ${}.'
    
    
    # sys.argv[0] is the program's pathname (e.g. /this/that/other.py)
    # os.path.basename() returns just the program name (e.g. other.py)
    USAGE_STRING = f'Usage:  {os.path.basename(sys.argv[0])}   [total amount] [# in party] [tip percentage]'
    
    
    def usage(msg):
        """ print an error message, usage: string and exit
    
        Args:     msg (str):  an error message
        Returns:  None (exits from here)
        Raises:   N/A (does not explicitly raise an exception)
    
        """
        sys.stderr.write(f'Error:  {msg}\n')
        exit(USAGE_STRING)
    
    
    def validate_normalize_input(args):
        """ verify command-line input
    
        Args:     args (list):  the command-line arguments (without the program name)
    
        Returns:
            bill_amt (float):  the bill amount
            party_size (int):  the number of people
            tip_pct (float):   the percent tip to be applied, in 100’s
    
        Raises:  N/A (does not explicitly raise an exception)
    
        """
        if len(args) != 3:
            usage('please enter all required arguments')
    
        try:
            bill_amt = float(args[0])
            party_size = int(args[1])
            tip_pct = float(args[2])
        except ValueError:
            usage('arguments must be numbers')
    
        return bill_amt, party_size, tip_pct
    
    
    def perform_calculations(bill_amt, party_size, tip_pct):
        """
        calculate tip amount, total bill and person's share
    
        Args:
            bill_amt (float):  the total bill
            party_size (int):  the number in party
            tip_pct (float):  the tip percentage in 100’s
    
        Returns:
            tip_amt (float):  the tip in $
            total_bill (float):  the bill including tip
            person_share (float):  equal share of bill per person
    
        Raises:
            N/A (does not specifically raise an exception)
        """
    
        tip_amt = bill_amt * tip_pct * .01
        total_bill = bill_amt + tip_amt
        person_share = total_bill / party_size
    
        return tip_amt, total_bill, person_share
    
    
    def report_results(pct, tip_amt, total_bill, size, person_share):
        """ print results in formatted strings
    
        Args:
            pct (float):  the tip percentage in 100’s
            tip_amt (float):  the tip in $
            total_bill (float):  the bill including tip
            size (int):  the party size
            person_share (float):  equal share of bill per person
        Returns:
            None (prints result)
    
        Raises:
            N/A
        """
    
        print(MSG1.format(pct, tip_amt, total_bill))
        print(MSG2.format(size, person_share))
    
    
    def main(args):
        """ execute script
    
        Args:     args (list):  the command-line arguments
        Returns:  None
        Raises:   N/A
    
        """
    
        bill, size, pct = validate_normalize_input(args)
        tip_amt, total_bill, person_share = perform_calculations(bill, size,
                                                                 pct)
    
        report_results(pct, tip_amt, total_bill, size, person_share)
    
    
    if __name__ == '__main__':            # 'main body' code
    
        main(sys.argv[1:])
    

    The code inside the if __name__ == '__main__' block is intended to be the call that starts the program. If this Python script is imported, the main() function will not be called, because the if test will only be true if the script is executed, and will not be true if it is imported. We do this in order to allow the script's functions to be imported and used without actually running the script -- we may want to test the script's functions (unit testing) or make use of a function from the script in another program. Whether we intend to import a script or not, it is considered a "best practice" to build all of our programs in this way -- with a "main body" of statements collected under function main(), and the call to main() inside the if __name__ == '__main__' gate. This structure will be required for all assignments submitted for the remainder of the course.
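
    The effect of the gate can be observed directly. In this sketch a small hypothetical module, split_bill.py, is written to a temporary directory and then imported: its function becomes available, but its main() is not called, because inside an imported module __name__ holds the module's name rather than '__main__'. The module and function names are invented for the demonstration.

```python
import importlib
import os
import sys
import tempfile

module_source = '''
def person_share(total_bill, party_size):
    return total_bill / party_size

def main():
    print('main() was called')

if __name__ == '__main__':
    main()
'''

with tempfile.TemporaryDirectory() as tmpdir:
    with open(os.path.join(tmpdir, 'split_bill.py'), 'w') as wfh:
        wfh.write(module_source)

    sys.path.insert(0, tmpdir)               # let 'import' find the temp file
    split_bill = importlib.import_module('split_bill')
    sys.path.remove(tmpdir)

# importing did not print 'main() was called', but the function is usable
share = split_bill.person_share(120.0, 4)
print(split_bill.__name__)                   # split_bill  (not '__main__')
print(share)                                 # 30.0
```

    This is the same mechanism a unit test relies on when it imports a script's functions without running the script.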


    user-defined functions

    A user-defined function is a block of code that can be executed by name.


    def add(val1, val2):
        valsum = val1 + val2
        return valsum
    
    ret = add(5, 10)           # int, 15
    
    ret2 = add(0.3, 0.9)       # float, 1.2
    

    A function is a block of code:


    user defined functions: calling the function

    A user-defined function is simply a named code block that can be executed any number of times.


    def print_hello():
        print("Hello, World!")
    
    print_hello()             # prints 'Hello, World!'
    print_hello()             # prints 'Hello, World!'
    print_hello()             # prints 'Hello, World!'
    


    user defined functions: arguments

    The argument is the input to a function.


    def print_hello(greeting, person):              # note we do not
        full_greeting = f'{greeting}, {person}!'    # refer to 'name1'
        print(full_greeting)                        # 'place2', etc.
                                                    # inside the function
    name1 = 'Hello'
    place1 = 'World'
    
    print_hello(name1, place1)             # prints 'Hello, World!'
    
    
    name2 = 'Bonjour'
    place2 = 'Python'
    
    print_hello(name2, place2)             # prints 'Bonjour, Python!'
    


    user defined functions: function return values

    A function's return value is passed back from the function using the return statement.


    def print_hello(greeting, person):
      full_greeting = f'{greeting}, {person}!'
      return full_greeting
    
    msg = print_hello('Bonjour', 'parrot')
    
    print(msg)                                       # 'Bonjour, parrot!'
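
    Returning a value is different from printing it. In this short sketch (function names invented for the comparison), the first function returns a string that the caller can keep using, while the second only prints; a function with no return statement hands back None.

```python
def make_greeting(greeting, person):
    """Build the greeting and hand it back to the caller."""
    return f'{greeting}, {person}!'


def show_greeting(greeting, person):
    """Print the greeting; with no return statement, None comes back."""
    print(f'{greeting}, {person}!')


returned = make_greeting('Bonjour', 'parrot')
printed = show_greeting('Bonjour', 'parrot')     # prints 'Bonjour, parrot!'

print(returned.upper())     # BONJOUR, PARROT!
print(printed)              # None
```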
    




    Set Operations, List Comprehensions

    Advanced Container Processing

    In this unit we will complete our tour of the core Python data processing features.


    So far we have explored the reading and parsing of data; the loading of data into built-in structures; and the aggregation and sorting of these structures. This unit explores advanced tools for container processing. List comprehensions and set comparisons are two "power tools" that do things we have been able to do before, only more concisely: looping through a list and doing the same thing to each element, looping through and selecting items from a list, and comparing two collections to see what is common or different between them.


    set operations

    a = {'a', 'b', 'c'}
    b = {'b', 'c', 'd'}
    print(a.difference(b))            # {'a'}
    print(a.union(b))                 # {'a', 'b', 'c', 'd'}
    print(a.intersection(b))          # {'b', 'c'}
    print(a.symmetric_difference(b))  # {'a', 'd'}
    

    list comprehensions

    a = ['hello', 'there', 'harry']
    print([ var.upper() for var in a if var.startswith('h') ])
                               # ['HELLO', 'HARRY']
    

    ternary assignment

    rev_sort = True if user_input == 'highest' else False
    
    pos_val = x if x >= 0 else x * -1
    

    conditional assignment

    val = this or that       # this if this is truthy, else that
    val = this and that      # this if this is falsy, else that
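
    A common use of or is to supply a default when a value is empty. The sketch below uses invented names; note that any falsy value (an empty string, 0, an empty list, None) triggers the fallback, which may or may not be what is wanted.

```python
def display_name(user_input):
    """Fall back to a default when the input is empty (falsy)."""
    return user_input or 'anonymous'


print(display_name('david'))      # david
print(display_name(''))           # anonymous

items = []
first = items and items[0]        # [] is falsy, so items[0] is never evaluated
print(first)                      # []
```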
    

    Container processing: Set Comparisons

    We have used the set to create a unique collection of objects. The set also allows comparisons of sets of objects. Methods like set.union (complete member list of two or more sets), set.difference (elements found in this set not found in another set) and set.intersection (elements common to both sets) are fast and simple to use.


    set_a = {1, 2, 3, 4}
    set_b = {3, 4, 5, 6}
    
    print(set_a.union(set_b))           # {1, 2, 3, 4, 5, 6}  (set_a | set_b)
    print(set_a.difference(set_b))      # {1, 2}              (set_a - set_b)
    print(set_a.intersection(set_b))    # {3, 4}     (what is common between them?)
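
    Each of these methods also has an operator form. The sketch below shows the equivalents and then a small practical comparison; the enrolled and submitted lists are invented sample data.

```python
set_a = {1, 2, 3, 4}
set_b = {3, 4, 5, 6}

union = set_a | set_b               # same as set_a.union(set_b)
difference = set_a - set_b          # same as set_a.difference(set_b)
common = set_a & set_b              # same as set_a.intersection(set_b)
either_not_both = set_a ^ set_b     # same as set_a.symmetric_difference(set_b)

print(sorted(union))                # [1, 2, 3, 4, 5, 6]
print(sorted(either_not_both))      # [1, 2, 5, 6]

# which names appear in one list but are missing from the other?
enrolled = ['ann', 'bob', 'cy']
submitted = ['bob', 'cy', 'dee']
missing = set(enrolled) - set(submitted)
print(missing)                      # {'ann'}
```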
    

    List comprehensions: filtering a container's elements

    List comprehensions abbreviate simple loops into one line.


    Consider this loop, which filters a list so that it contains only positive integer values:

    myints = [0, -1, -5, 7, -33, 18, 19, 55, -100]
    myposints = []
    for el in myints:
      if el > 0:
        myposints.append(el)
    
    print(myposints)                   # [7, 18, 19, 55]
    

    This loop can be replaced with the following one-liner:

    myposints = [ el for el in myints if el > 0 ]
    

    See how the looping and test in the first loop are distilled into the one line? The first el is the element that will be added to myposints - list comprehensions automatically build new lists and return them when the looping is done.


    The operation is the same, but the order of operations in the syntax is different:

    # this is pseudo code
    # target list = item for item in source list if test
    

    Hmm, this makes a list comprehension less intuitive than a loop. However, once you learn how to read them, list comprehensions can actually be easier and quicker to read - primarily because they are on one line. This is an example of a filtering list comprehension - it allows some, but not all, elements through to the new list.


    List comprehensions: transforming a container's elements

    Consider this loop, which doubles each value in a list:


    nums = [1, 2, 3, 4, 5]
    dblnums = []
    for val in nums:
      dblnums.append(val*2)
    
    print(dblnums)                          # [2, 4, 6, 8, 10]
    

    This loop can be distilled into a list comprehension thusly:

    dblnums = [ val * 2 for val in nums ]
    

    This transforming list comprehension transforms each value in the source list before sending it to the target list:

    # this is pseudo code
    # target list = item transform for item in source list
    

    We can of course combine filtering and transforming:

    vals = [0, -1, -5, 7, -33, 18, 19, 55, -100]
    doubled_pos_vals = [ i*2 for i in vals if i > 0 ]
    print(doubled_pos_vals)                # [14, 36, 38, 110]
    

    List comprehensions: examples

    If they only replace simple loops that we already know how to do, why do we need list comprehensions? As mentioned, once you are comfortable with them, list comprehensions are much easier to read and comprehend than traditional loops. They say in one statement what loops need several statements to say - and reading multiple lines certainly takes more time and focus to understand.


    Some common operations can also be accomplished in a single line. In this example, we produce a list of lines from a file, stripped of whitespace:

    stripped_lines = [ i.rstrip() for i in open('FF_daily.txt').readlines() ]
    

    Here, we're only interested in lines of a file that begin with the desired year (1972):

    totals = [ i for i in open('FF_daily.txt').readlines() if i.startswith('1972') ]
    

    If we want the MktRF values (the leftmost floating-point value on each line) for our desired year, we could gather the bare amounts this way:

    mktrf_vals = [ float(i.split()[1]) for i in open('FF_daily.txt').readlines() if i.startswith('1972') ]
    

    And in fact we can do part of an earlier assignment in one line -- the sum of MktRF values for a year:

    mktrf_sum = sum([ float(i.split()[1]) for i in open('FF_daily.txt').readlines() if i.startswith('1972') ])
    

    From experience I can tell you that familiarity with these forms makes it very easy to construct and also to decode them very quickly - much more quickly than a 4-6 line loop.
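
    The one-liners above leave the file open until Python cleans it up. A variation, sketched here, opens the file in a with block and loops over the file object directly. The three data rows are invented stand-ins that only assume the layout used above (a date first, then whitespace-separated numbers); they are not real values from FF_daily.txt.

```python
import os
import tempfile

# invented sample rows: date first, then whitespace-separated float columns
sample_rows = [
    '19720103    0.50    0.10   -0.20\n',
    '19720104    0.25   -0.30    0.40\n',
    '19730102   -1.00    0.20    0.10\n',
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'FF_daily.txt')
    with open(path, 'w') as wfh:
        wfh.writelines(sample_rows)

    with open(path) as fh:                 # closed automatically afterwards
        mktrf_vals = [ float(line.split()[1]) for line in fh
                       if line.startswith('1972') ]

mktrf_sum = sum(mktrf_vals)

print(mktrf_vals)      # [0.5, 0.25]
print(mktrf_sum)       # 0.75
```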


    List Comprehensions with Dictionaries

    Remember that dictionaries can be expressed as a list of 2-element tuples, converted using items(). Such a list of 2-element tuples can be converted back to a dictionary with dict():


    mydict =  {'a': 5, 'b': 0, 'c': -3, 'd': 2, 'e': 1, 'f': 4}
    
    my_items = list(mydict.items())      # my_items is now [('a',5), ('b',0), ('c',-3), ('d',2), ('e',1), ('f',4)]
    mydict2 = dict(my_items)       # mydict2 is now   {'a':5,   'b':0,   'c':-3,   'd':2,   'e':1,   'f':4}
    

    It becomes very easy to filter or transform a dictionary using this structure. Here, we're filtering a dictionary by value - accepting only those pairs whose value is larger than 0:

    mydict = {'a': 5, 'b': 0, 'c': -3, 'd': 2, 'e': -22, 'f': 4}
    filtered_dict = dict([ (i, j) for (i, j) in mydict.items() if j > 0 ])
    

    Here we're switching the keys and values in a dictionary, and assigning the resulting dict back to mydict, thus seeming to change it in-place:

    mydict = dict([ (j, i) for (i, j) in mydict.items() ])
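
    Python also has a dedicated dict comprehension syntax, which is not used in the examples above but produces the same results without the intermediate list of tuples. A brief sketch of the same filter and swap:

```python
mydict = {'a': 5, 'b': 0, 'c': -3, 'd': 2, 'e': -22, 'f': 4}

# filter: keep only pairs whose value is larger than 0
filtered_dict = { key: val for key, val in mydict.items() if val > 0 }

# transform: swap keys and values
swapped = { val: key for key, val in mydict.items() }

print(filtered_dict)     # {'a': 5, 'd': 2, 'f': 4}
print(swapped[-22])      # e
```

    Swapping is only safe when the values are unique and hashable; duplicate values would collapse into a single key.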
    

    Python database modules such as sqlite3 return each row of a result as a tuple. Here we're pulling two of the three values from each row and folding them into a dictionary.

    # 'tuple_db_results' simulates what a database returns
    tuple_db_results = [
      ('joe', 22, 'clerk'),
      ('pete', 34, 'salesman'),
      ('mary', 25, 'manager'),
    ]
    
    names_jobs = dict([ (name, role) for name, age, role in tuple_db_results ])
    



    The Command Prompt: Program Arguments

    sys.argv to capture command line arguments

    sys.argv is a list that holds string arguments entered at the command line


    a python script get_args.py

    import sys                           # import the sys library
    
    print('first arg: ' + sys.argv[1])   # print first command line arg
    print('second arg: ' + sys.argv[2])  # print second command line arg
    

    running the script from the command line, with two arguments

    $ python get_args.py hello there
    first arg: hello
    second arg: there


    The default item in sys.argv: the program name

    sys.argv[0] will always contain the name of our program.



    a python script print_args.py

    import sys
    print(sys.argv)
    

    running the script from the command line (passing 3 arguments)

    $ python print_args.py hello there budgie
    ['print_args.py', 'hello', 'there', 'budgie']

    running the script from the command line (passing no arguments)

    $ python print_args.py
    ['print_args.py']

    IndexError with sys.argv (when user passes no argument)

    Since we read arguments from a list, we can trigger an IndexError if we try to read an argument that wasn't passed.


    a python script addtwo.py

    import sys
    
    firstint = int(sys.argv[1])
    secondint = int(sys.argv[2])
    
    mysum = firstint + secondint
    
    print(f'the sum of the two values is {mysum}')
    

    passing 2 arguments

    $ python addtwo.py 5 10
    the sum of the two values is 15

    passing no arguments

    $ python addtwo.py
    Traceback (most recent call last):
      File "addtwo.py", line 3, in <module>
        firstint = int(sys.argv[1])
    IndexError: list index out of range

    How to handle this exception? Test the len() of sys.argv, or trap the exception.
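
    Both approaches can be combined, as in this sketch: the length test guards against missing arguments, and a try: block traps arguments that are not whole numbers. The function name read_two_ints is hypothetical, and the argument lists are simulated so that the example runs without a command line; a real script would pass sys.argv.

```python
def read_two_ints(argv):
    """Return two ints from an argv-style list, or None if input is unusable."""
    if len(argv) != 3:                     # program name + 2 arguments
        return None
    try:
        return int(argv[1]), int(argv[2])
    except ValueError:                     # an argument was not a whole number
        return None


print(read_two_ints(['addtwo.py', '5', '10']))     # (5, 10)
print(read_two_ints(['addtwo.py']))                # None
print(read_two_ints(['addtwo.py', '5', 'ten']))    # None
```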




    File Tests and Manipulations

    os.path.isfile() and os.path.isdir()

    With these we can see whether a path refers to a plain file or to a directory.


    import os                         # os ('operating system') module talks
                                      # to the os (for file access & more)
    mydirectory = '/Users/david'
    
    items = os.listdir(mydirectory)
    
    for item in items:
    
        item_path = os.path.join(mydirectory, item)
    
        if os.path.isdir(item_path):
            print(f"{item}:  directory")
        elif os.path.isfile(item_path):
            print(f"{item}:  file")
                                         # photos:  directory
                                         # backups:  directory
                                         # college_letter.docx:  file
                                         # notes.txt:  file
                                         # finances.xlsx:  file
    


    os.path.exists()

    This function tests to see if a file exists on the filesystem.


    import os
    
    fn = input('please enter a file or directory name:  ')
    if not os.path.exists(fn):
        print('item does not exist')
    
    elif os.path.isfile(fn):
        print('item is a file')
    
    elif os.path.isdir(fn):
        print('item is a directory')
    


    read file size with os.path.getsize()

    os.path.getsize() takes a filename and returns the size of the file in bytes


    import os                        # os ('operating system') module
                                     # talks to the os (for file access & more)
    mydirectory = '/Users/david'
    
    items = os.listdir(mydirectory)
    
    for item in items:
        item_path = os.path.join(mydirectory, item)
        item_size = os.path.getsize(item_path)
        print(f"{item_path}:  {item_size} bytes")
    


    moving or renaming a file

    moving and renaming a file are essentially the same thing


    import os
    
    filename = 'file1.txt'
    new_filename = 'newname.txt'
    
    os.rename(filename, new_filename)
    

    import os
    
    filename = 'file1.txt'      # or could be a filepath including directory
    move_to_dir = 'old/'
    
    os.rename(filename, os.path.join(move_to_dir, filename))  # file1.txt, old/file1.txt
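
    The move can be verified with the file tests from earlier. This sketch works inside a temporary directory so that no real files are touched; the file and directory names mirror the example above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    old_path = os.path.join(tmpdir, 'file1.txt')
    archive_dir = os.path.join(tmpdir, 'old')
    new_path = os.path.join(archive_dir, 'file1.txt')

    with open(old_path, 'w') as wfh:
        wfh.write('some data')
    os.mkdir(archive_dir)                   # target directory must already exist

    os.rename(old_path, new_path)           # "move" = rename to a path in another dir

    still_at_old = os.path.exists(old_path)
    now_at_new = os.path.isfile(new_path)
    size = os.path.getsize(new_path)

print(still_at_old, now_at_new, size)       # False True 9
```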
    


    copying or backing up a file

    import shutil
    
    filename = 'file1.txt'
    backup_filename = 'file1.txt_bk'        # must be a filepath, including filename
    
    shutil.copyfile(filename, backup_filename)
    

    import shutil
    
    filename = 'file1.txt'
    target_dir = 'backup'                   # can be a filepath or just a directory name
    
    shutil.copy(filename, target_dir)  # dst can be a folder; shutil.copy2() also preserves file metadata
    


    creating a directory: os.mkdir()

    This function is named after the unix utility mkdir.


    import os
    
    os.mkdir('newdir')
    

    removing a directory or filetree: os.rmdir() and shutil.rmtree()

    os.rmdir() removes only an empty directory; if your directory has files, shutil.rmtree() must be used.


    import os
    import shutil
    
    os.mkdir('newdir')
    
    wfh = open('newdir/newfile.txt', 'w')  # creating a file in the dir
    wfh.write('some data')
    wfh.close()
    
    os.rmdir('newdir')        # OSError: [Errno 66] Directory not empty: 'newdir'
    

    shutil.rmtree('newdir')   # success
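
    The two calls can be compared in one run by trapping the OSError. This is a sketch that works in a scratch directory created with tempfile.mkdtemp(), so nothing outside it is removed.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                     # scratch area for the demo
newdir = os.path.join(base, 'newdir')
os.mkdir(newdir)

with open(os.path.join(newdir, 'newfile.txt'), 'w') as wfh:
    wfh.write('some data')

try:
    os.rmdir(newdir)                          # fails: directory is not empty
    rmdir_worked = True
except OSError as err:
    rmdir_worked = False
    print(f'could not remove: {err}')

shutil.rmtree(newdir)                         # removes the directory and its contents
gone = not os.path.exists(newdir)

shutil.rmtree(base)                           # clean up the scratch area
print(rmdir_worked, gone)                     # False True
```

    Because shutil.rmtree() deletes everything beneath the path it is given, it is worth double-checking the path before calling it.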
    


    copying a filetree

    import shutil
    
    shutil.copytree('olddir', 'newdir')
    

    Regardless of what files and folders are in the directory to be copied, all files and folders (and indeed all folders and files within) will be copied to the new name or location.




    File and Directory Listings

    writing to files using the file object

    Opening an existing file for writing truncates the file.


    fh = open('new_file.txt', 'w')
    fh.write("here's a line of text\n")
    fh.write('I add the newlines explicitly if I want to write to the file\n')
    fh.close()
    


    appending to files

    Appending is usually used for log files.


    fh = open('new_file.txt', 'a')
    fh.write("here's a line of text\n")
    fh.write('I add the newlines explicitly if I want to write to the file\n')
    fh.close()
    

    Again, note that we are explicitly adding newlines to the end of each line.
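
    The difference between the 'w' and 'a' modes shows up when the same file is opened more than once. A small sketch, using a temporary directory and invented line contents:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'new_file.txt')

    with open(path, 'w') as fh:         # 'w' creates or truncates the file
        fh.write('first line\n')

    with open(path, 'w') as fh:         # 'w' again: the earlier line is lost
        fh.write('second line\n')

    with open(path, 'a') as fh:         # 'a' adds to the end instead
        fh.write('third line\n')

    with open(path) as fh:
        lines = fh.readlines()

print(lines)        # ['second line\n', 'third line\n']
```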


    show the present/current working directory.

    The pwd is the location from which we run our programs.


    import os
    
    cwd = os.getcwd()        # str (your current directory)
    
    print(cwd)
    

    a sample file tree

    this tree can be found among your course files.


    dir1
    ├── file1.txt
    ├── test1.py
    │
    ├── dir2a
    │   ├── file2a.txt
    │   ├── test2a.py
    │   │
    │   ├── dir3a
    │   │   ├── file3a.txt
    │   │   ├── test3a.py
    │   │   │
    │   │   └── dir4
    │   │       ├── file4.txt
    │   │       └── test4.py
    └── dir2b
        ├── file2b.txt
        ├── test2b.py
        │
        └── dir3b
           ├── file3b.txt
           └── test3b.py
    


    relative filepaths

    These paths locate files relative to the present working directory.


    If the file you want to open is in the present working directory (usually the directory that holds the script you're executing), use the filename alone:

    fh = open('filename.txt')
    

    relative filepaths: parent directory

    To reach the parent directory, prepend the filename with ../



    fh = open('../filename.txt')
    

    relative filepaths: child directory

    To reach the child directory, prepend the filename with the name of the child directory.


    fh = open('childdir/filename.txt')
    

    relative filepaths: sibling directory

    To reach a sibling directory, prepend the filename with ../ and the name of the sibling directory.


    fh = open('../siblingdir/filename.txt')
    

    To reach a sibling directory, we must go "up, then down" by using ../ to go up to the parent, then the sibling directory name to go down into the sibling.
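
    These "up" and "down" steps can be checked without touching the disk. A small sketch, assuming the sample tree shown earlier sits at /python_data/dir1 (a hypothetical location) and that the present working directory is dir2a. The helper resolve() is made up for this illustration, and posixpath (the forward-slash flavor of os.path) is used so the result is the same on any operating system:

```python
import posixpath   # the forward-slash flavor of os.path, for a predictable result

cwd = '/python_data/dir1/dir2a'        # hypothetical present working directory

def resolve(relpath):
    """Return the absolute path that relpath refers to when starting from cwd."""
    return posixpath.normpath(posixpath.join(cwd, relpath))

same    = resolve('file2a.txt')            # /python_data/dir1/dir2a/file2a.txt
parent  = resolve('../file1.txt')          # /python_data/dir1/file1.txt
child   = resolve('dir3a/file3a.txt')      # /python_data/dir1/dir2a/dir3a/file3a.txt
sibling = resolve('../dir2b/file2b.txt')   # /python_data/dir1/dir2b/file2b.txt

for path in (same, parent, child, sibling):
    print(path)
```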


    absolute filepaths

    These paths locate files from the root of the filesystem.


    In Windows, absolute paths begin with a drive letter, usually C:\:

    """ test3a.py:  open and read a file """
    
    filepath = r'C:\Users\david\Desktop\python_data\dir1\file1.txt'
    fh = open(filepath)
    
    print(fh.read())
    

    (Note that r'' should be used with any Windows paths that contain backslashes.)


    On the Mac, absolute paths begin with a forward slash:

    """ test3a.py:  open and read a file """
    
    filepath = '/Users/david/Desktop/python_data/dir1/file1.txt'
    fh = open(filepath)
    
    print(fh.read())
    

    (The above paths assume that the python_data folder is in the Desktop directory; you may have placed yours elsewhere on your system. Of course, the above paths also assume that my home directory is called david/; yours is likely different.)


    os.path.join()

    This function joins together directory and file strings with slashes appropriate for the current operating system.


    import os
    
    dirname = '/Users/david'
    filename = 'journal.txt'
    
    filepath = os.path.join(dirname, filename)   # '/Users/david/journal.txt'
    
    filepath2 = os.path.join(dirname, 'backup', filename)  # '/Users/david/backup/journal.txt'
    


    os.listdir(): list a directory

    os.listdir() can read the contents of any directory.


    import os
    
    mydirectory = '/Users/david'
    
    items = os.listdir(mydirectory)
    
    for item in items:                                # 'photos'
    
        item_path = os.path.join(mydirectory, item)
    
        print(item_path)   # /Users/david/photos
                          # /Users/david/backups
                          # /Users/david/college_letter.docx
                          # /Users/david/notes.txt
                          # /Users/david/finances.xlsx
    

    Note the os.path.join() call. This is the standard pattern for looping through a directory: os.listdir() returns only the bare names of the items, so each item must be joined to the directory to produce a correct filepath.
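
    The joined path is also what we pass to os.path.isdir() and os.path.isfile() when we need to tell folders from files. A minimal sketch, using a temporary directory with hypothetical item names:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as mydirectory:

    # build a small directory to list (names are hypothetical)
    os.mkdir(os.path.join(mydirectory, 'photos'))
    with open(os.path.join(mydirectory, 'notes.txt'), 'w') as fh:
        fh.write('some notes\n')

    dirs = []
    files = []
    for item in sorted(os.listdir(mydirectory)):     # bare names: 'notes.txt', 'photos'
        item_path = os.path.join(mydirectory, item)  # join before testing the item
        if os.path.isdir(item_path):
            dirs.append(item)
        elif os.path.isfile(item_path):
            files.append(item)

print(dirs)     # ['photos']
print(files)    # ['notes.txt']
```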


    exceptions for missing or incorrect files or directories

    Several exceptions can indicate a file or directory misfire.


    exception type          example trigger
    FileNotFoundError       attempt to open a file that does not exist at this location
    FileExistsError         attempt to create a directory (or in some cases a file) that already exists
    IsADirectoryError       attempt to open() a path that is actually a directory
    NotADirectoryError      attempt to os.listdir() a path that is not a directory
    PermissionError         attempt to read or write a file or directory for which you lack permission
    WindowsError, OSError   sometimes raised in place of one or more of the above when on a Windows computer
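
    Each of these can be trapped with try/except. The sketch below (filenames are hypothetical) triggers three of them on purpose inside a temporary directory. The last block catches OSError, the parent type of all of the above, because the specific type raised for opening a directory differs between operating systems:

```python
import os
import tempfile

caught = []

with tempfile.TemporaryDirectory() as tmpdir:

    missing = os.path.join(tmpdir, 'no_such_file.txt')   # hypothetical filename

    try:
        fh = open(missing)                  # nothing exists at this path
        fh.close()
    except FileNotFoundError:
        caught.append('FileNotFoundError')

    try:
        os.mkdir(tmpdir)                    # this directory already exists
    except FileExistsError:
        caught.append('FileExistsError')

    try:
        fh = open(tmpdir)                   # a directory cannot be opened as a file
        fh.close()
    except OSError as e:
        # IsADirectoryError on Mac/Linux; Windows typically raises
        # PermissionError instead -- both are kinds of OSError
        caught.append(type(e).__name__)

print(caught)
```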


    traversing a directory tree with os.walk()

    os.walk() visits every directory in a directory tree so we can list files and folders.


    import os
    root_dir = '/Users/david'
    for root, dirs, files in os.walk(root_dir):
    
        for tdir in dirs:                    # loop through dirs in this directory
            print(os.path.join(root, tdir))  # print full path to tdir
    
        for tfile in files:                  # loop through files in this dir
            print(os.path.join(root, tfile)) # print full path to file
    


    At each iteration, these three variables are assigned these values:
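
    As a sketch of what the three variables hold on each pass (root is the path of the directory being visited, dirs the names of its subdirectories, files the names of its files), the following builds a small hypothetical tree in a temporary directory and records every iteration:

```python
import os
import tempfile

visited = []

with tempfile.TemporaryDirectory() as root_dir:

    # build:  root_dir/file1.txt  and  root_dir/dir2a/file2a.txt
    os.mkdir(os.path.join(root_dir, 'dir2a'))
    for relpath in ('file1.txt', os.path.join('dir2a', 'file2a.txt')):
        with open(os.path.join(root_dir, relpath), 'w') as fh:
            fh.write('some data\n')

    for root, dirs, files in os.walk(root_dir):
        # root:   str, path of the directory currently being visited
        # dirs:   list of str, names of the subdirectories in root
        # files:  list of str, names of the files in root
        shown_root = os.path.relpath(root, root_dir)     # '.' for the top directory
        visited.append((shown_root, sorted(dirs), sorted(files)))

for entry in visited:
    print(entry)     # ('.', ['dir2a'], ['file1.txt'])
                     # ('dir2a', [], ['file2a.txt'])
```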




    More About User-Defined Functions

    user-defined functions and code organization

    User-defined functions help us organize our code -- and our thinking.


    Let's now return to functions from the point of view of code organization. Functions are useful because they:


    review: function block, argument and return value

    def add(val1, val2):
        mysum = val1 + val2
        return mysum
    
    a = add(5, 10)      # int, 15
    
    b = add(0.2, 0.2)   # float, 0.4
    

    Review what we've learned about functions:


    functions without a return statement return None

    When a function does not return anything, it returns None.


    def do(arg):
        print(f'{arg} doubled is {arg * 2}')
        # no return statement returns None
    
    x = do(5)        # (prints '5 doubled is 10')
    
    print(x)         # None
    


    Since do() does not return anything useful, we should not call it with an assignment (i.e., x = above). If you call a function and find that its return value is None, it often means that the function was not meant to be assigned because it has no useful return value.


    the None object type

    The None value is the "value that means 'no value'".


    zz = None
    
    print(zz)        # None
    print(type(zz))  # <class 'NoneType'>
    
    aa = 'None'      # a string -- not the None value!
    


    function argument type: positional

    Positional arguments are required to be passed, and assigned by position.


    def greet(firstname, lastname):
        print(f"Hello, {firstname} {lastname}!")
    
    greet('Joe', 'Wilson')   # passed two arguments:  correct
    
    greet('Marie')           # TypeError: greet() missing 1 required positional argument: 'lastname'
    


    function argument type: keyword

    Keyword args are not required; if one is not passed, its parameter takes the default value given in the def statement.


    def greet(lastname, firstname='Citizen'):
        print(f"Hello, {firstname} {lastname}!")
    
    greet('Kim', firstname='Joe')   # Hello, Joe Kim!
    
    greet('Kim')                    # Hello, Citizen Kim!
    




    User-Defined Function Variable Scoping

    variable name scoping: the local variable

    Variable names initialized inside a function are local to the function.


    def myfunc():
        a = 10
        return a
    
    var = myfunc()
    print(var)          # 10
    print(a)            # NameError ('a' does not exist here)
    


    variable name scoping: the global variable

    Any variable defined outside a function is global.


    var = 'hello global'
    
    def myfunc():
        print(var)
    
    myfunc()                  # hello global
    


    "pure" functions

    Functions that do not touch outside variables, and do not create "side effects" (for example, calling exit(), print() or input()), are considered "pure" -- and are preferred.


    "Pure" functions have the following characteristics:


    "pure" functions: working only with "inside" (local) variables

    "Outside" (Global) variables are ones defined outside the function -- they should be avoided.


    wrong way: referring to an outside variable inside a function

    val = '5'                   # defined outside any function
    
    def doubleit():
        dval = int(val) * 2     # BAD:  refers to "global" variable 'val'
        return dval
    
    new_val = doubleit()
    

    right way: passing outside variables as arguments

    val = '5'                   # defined outside any function
    
    def doubleit(arg):
        dval = int(arg) * 2     # GOOD:  refers to same value as 'val',
        return dval             #        but accessed through local
                                #        argument 'arg'
    
    new_val = doubleit(val)     # passing variable to function -
                                #   correct way to get a value into the function
    


    "pure" functions: avoiding "side-effects"

    print(), input(), exit() all "touch" the outside world and in many cases should be avoided inside functions.



    Although it is of course possible (and sometimes practical) to use these built-in functions inside our function, we should avoid them if we are interested in making a function "pure".
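
    A small sketch of the difference, using two hypothetical functions: the first prints from inside the function, so its result cannot be reused; the second only returns a value and leaves the printing to the caller.

```python
def report_double_impure(val):
    """Impure: prints its result, so the caller cannot reuse the value."""
    print(f'{val} doubled is {val * 2}')

def double(val):
    """Pure: depends only on its argument and only returns a value."""
    return val * 2

# the caller decides what to do with the returned value
result = double(5)
print(f'5 doubled is {result}')      # 5 doubled is 10

nothing = report_double_impure(5)    # prints, but returns None
```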


    "pure" functions: why prefer them?

    Here are some positive reasons to strive for purity.


    You may notice that these "impure" practices do not cause errors. So why should we avoid them?


    Please note that during development it is perfectly allowable to print(), exit() or input() from inside a function. We may also decide on our own that this is all right in shorter programs, or ones that we are working on in isolation. It is with longer programs and collaborative programs where purity becomes more important.


    "pure" functions: using 'raise' instead of exit() inside functions

    exit() should not be called inside a function.


    def doubleit(arg):
        if not arg.isdigit():
            raise ValueError('arg must be all digits')   # GOOD:  error signaled with raise
        dval = int(arg) * 2
        return dval
    
    val = input('what is your value? ')
    new_val = doubleit(val)
    


    signalling errors (exceptions) with 'raise'

    'raise' creates an error condition (exception) that usually terminates program execution.



    To raise an exception, we simply follow raise with the type of error we would like to raise, and an optional message:

    raise IndexError('I am now raising an IndexError exception')
    

    You may raise any existing exception (you may even define your own). Here is a list of common exceptions:

    Exception Type      Reason
    TypeError           the wrong type used in an expression
    ValueError          the wrong value used in an expression
    FileNotFoundError   a file or directory is requested that doesn't exist
    IndexError          use of an index for a nonexistent list/tuple item
    KeyError            a requested key does not exist in the dictionary
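
    A raised exception does not have to end the program: the calling code can trap it with try/except and decide how to respond. A minimal sketch, reusing the doubleit() idea with hypothetical input values:

```python
def doubleit(arg):
    if not arg.isdigit():
        raise ValueError(f'arg must be all digits, not {arg!r}')
    return int(arg) * 2

results = []
for val in ['5', 'five']:
    try:
        results.append(doubleit(val))
    except ValueError as e:          # the caller decides how to respond
        results.append(f'error: {e}')

print(results)    # [10, "error: arg must be all digits, not 'five'"]
```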


    global variables and function "purity"

    Globals should be used inside functions only in select circumstances.


    STATE_TAX = .05    # ALL CAPS designates a "constant"
    
    
    def calculate_bill(bill_amount, tip_pct):
    
        tax = bill_amount * STATE_TAX     # float, 5.0
        tip = bill_amount * tip_pct       # float, 20.0
    
        total_amount = bill_amount + tax + tip   # float, 125.0
    
        return total_amount
    
    
    total = calculate_bill(100, .20)      # float, 125.0
    


    the four variable scopes: l-e-g-b

    Four kinds of variables: (L)ocal, (E)nclosing, (G)lobal and (B)uiltin.


    filename = 'pyku.txt'        # 'filename':  global
    
                                    # 'get_text':  global (function name is a
                                    #                      variable as well)
    def get_text(fname):            # 'fname':     local
        fh = open(fname)            # 'fh':        local; 'open':  builtin
        text = fh.read()            # 'text':      local
        return text
    
    txt = get_text(filename)        # 'txt':       global
    print(txt)                      # 'print':     builtin
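
    The example above shows local, global and builtin names, but no enclosing one. A minimal sketch of the enclosing scope, using hypothetical names: a function defined inside another function can read the outer function's local variables.

```python
greeting = 'Hello'                  # 'greeting':  global

def make_greeter(punctuation):      # 'punctuation':  local to make_greeter

    def greet(name):                # 'name':  local to greet
        # 'punctuation' is found in the enclosing function's scope,
        # 'greeting' in the global scope, and 'str' among the builtins
        return greeting + ', ' + str(name) + punctuation

    return greet

excited = make_greeter('!')
message = excited('Marie')

print(message)                      # Hello, Marie!
```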
    


    proper code organization

    Core principles.


    Here are the main components of a properly formatted program:


    See the tip_calculator.py file in your files directory for an example and notes below.




    Modules in Python

    importing built-in modules

    Python comes with hundreds of preinstalled modules.


    import sys             # find and import the sys module
    import json            # find and import the json module
    
    print(sys.copyright)   # the .copyright attribute points
                           # to a string with the copyright notice
    
          # Copyright (c) 2001-2023 Python...
    
    
    obj = json.loads('{"a": 1, "b": 2}')   # the .loads attribute points to
                                           # a function that reads str JSON data
    
    print(type(obj))    # <class 'dict'>
    


    other module import patterns

    These patterns are purely for convenience when needed.


    Abbreviating the name of a module:

    import json as js           # 'json' is now referred to as 'js'
    
    obj = js.loads('{"a": 1, "b": 2}')
    

    Importing a module variable into our program directly:

    from json import loads      # making the 'loads' function part of the global namespace
    
    obj = loads('{"a": 1, "b": 2}')
    

    Please note that this does not import only a part of the module: the entire module code is still imported.


    built-in module examples

    Each module has a specific focus.



    user-defined modules

    A module of our own design may be saved as a .py file.


    messages.py: a simple Python module that prints messages

    import sys
    
    def print_warning(msg):
        print(f'Warning!  {msg}')
    

    test.py: a Python script that imports messages.py

    import messages
    
    # accessing the print_warning() function
    messages.print_warning('Look out!')   # Warning!  Look out!
    


    module search path

    Python must be told where to find our own custom modules.


    To view the currently used module search paths, we can use sys.path

    import sys
    
    print(sys.path)        # shows a list of strings, each a directory
                           # where modules can be found
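
    Since sys.path is an ordinary list, one way to make a custom module findable is to append its directory before importing. A sketch under stated assumptions: the module name messages_demo and its warning() function are hypothetical, and the module file is written to a temporary directory just for the demonstration.

```python
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as module_dir:

    # write a tiny module file (the name 'messages_demo' is hypothetical)
    with open(os.path.join(module_dir, 'messages_demo.py'), 'w') as fh:
        fh.write("def warning(msg):\n    return f'Warning!  {msg}'\n")

    sys.path.append(module_dir)          # tell Python to search this directory too

    import messages_demo                 # found through the added search path

    text = messages_demo.warning('Look out!')

    sys.path.remove(module_dir)          # tidy up the search path

print(text)                              # Warning!  Look out!
```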
    


    setting the PYTHONPATH system environment variable

    Like the PATH for programs, this variable tells Python where to find modules.



    the python standard distribution of modules

    Modules included with Python are installed when Python is installed -- they are always available.


    Python provides hundreds of supplementary modules to perform myriad tasks. These modules do not need to be installed separately because they come bundled with the Python distribution. The documentation for the standard library is part of the official Python docs.


    PyPI: the python package index

    This index lists every package that anyone has published to it.


    Search for any module's home page at the PyPI website:

    https://pypi.python.org/pypi


    finding third-party modules

    Take some care when installing modules -- it is possible to install nefarious code.





    installing modules

    Third-party modules must be downloaded and installed into your Python distribution.


    Commands to use at the command line:

    pip search pandas         # searches for pandas in the PyPI repository (PyPI no longer supports this command; search on the PyPI website instead)
    pip install pandas        # installs pandas


    Featured module: math

    The math module handles advanced math calculations.


    These calculations include functions for calculating factorials, ceiling and floor, and logarithmic, geometric, and trigonometric values (sine, cosine, tangent, etc.)


    A quick look at the module's attributes gives us an idea of what is included:

    import math
    
    print(dir(math))
    
       # ['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__',
       #  'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign',
       #  'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial',
       #  'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose',
       #  'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2',
       #  'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan',
       #  'tanh', 'tau', 'trunc']
    

    For example, here are some simple geometry calculations using math:

    import math
    
    print(math.pi)                           # 3.141592653589793
    
    radius = 3
    circumference = 2 * math.pi * radius     # 18.84955592153876
    
    area = math.pi * radius * radius         # 28.274333882308138
    

    Featured module: statistics

    This module provides basic statistical analysis.


    Some of our earliest exercises calculated mean, median, and standard deviation. These operations are more easily performed through this module's functions.


    import statistics as stats                       # set a convenient name for the module
    
    values = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6, 6]
    
    # average value
    meanval = stats.mean(values)                     # 4.083333333333333
    
    # "middle" value once the values are sorted (the list itself does not need to be sorted)
    medianval = stats.median(values)                 # 4.0
    
    # sample standard deviation: a measure of how far values typically fall from the mean
    standev = stats.stdev(values)                    # 1.781640374554423
    
    # square of the standard deviation
    varianceval = stats.variance(values)             # 3.1742424242424243
    
    # most common value
    modeval = stats.mode(values)                     # 6
    

    Featured module: string

    This module provides useful lists of characters.


    import string
    
    print(string.ascii_letters)       # abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
    
    print(string.ascii_lowercase)     # abcdefghijklmnopqrstuvwxyz
    
    print(string.ascii_uppercase)     # ABCDEFGHIJKLMNOPQRSTUVWXYZ
    
    print(string.digits)              # 0123456789
    
    print(string.hexdigits)           # 0123456789abcdefABCDEF
    
    print(string.punctuation)         # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    
    print(string.whitespace)          # ' \t\n\r\x0b\x0c'   (prints as invisible characters)
    

    Featured module: zipfile

    The zipfile module builds, unpacks and inspects .zip archives.


    import zipfile as zp
    
    myzip = zp.ZipFile('myzip.zip', 'w')
    
    # add names of files (of course these must exist)
    myzip.write('file1.txt')
    myzip.write('file2.pdf')
    myzip.write('file3.doc')
    
    myzip.close()                     # builds and writes zip file
    
    print('done')
    

    After running the above code and referencing real files, check the session files directory -- you should see a new .zip file added. You can also use zipfile to unpack and check the manifest (contents) of a zip file.
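
    As a sketch of the unpacking and manifest features mentioned above (file names are hypothetical, and everything happens in a temporary directory), namelist() returns the names stored in an archive and extractall() unpacks them:

```python
import os
import tempfile
import zipfile as zp

with tempfile.TemporaryDirectory() as tmpdir:

    # create a small file and zip it (file names are hypothetical)
    source = os.path.join(tmpdir, 'file1.txt')
    with open(source, 'w') as fh:
        fh.write('some data\n')

    zip_path = os.path.join(tmpdir, 'myzip.zip')
    with zp.ZipFile(zip_path, 'w') as myzip:
        myzip.write(source, arcname='file1.txt')   # arcname: name stored in the archive

    # reopen the archive for reading
    with zp.ZipFile(zip_path) as myzip:
        manifest = myzip.namelist()                 # list of names in the archive

        unpack_dir = os.path.join(tmpdir, 'unpacked')
        myzip.extractall(unpack_dir)                # unpack everything

    with open(os.path.join(unpack_dir, 'file1.txt')) as fh:
        unpacked_text = fh.read()

print(manifest)               # ['file1.txt']
print(repr(unpacked_text))    # 'some data\n'
```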


    Featured module: time

    The time module handles time-related tasks such as reading the current time, calculating with times, and sleeping for a period of time.


    time can be used to sleep (or pause execution) for a set number of seconds:

    import time
    
    # pause execution # of seconds
    time.sleep(5)
    

    We can also use time to show the current time:

    # current time and date
    print(time.ctime())                 # Sat May 23 17:10:55 2020
    

    At a very basic level it's possible to manipulate time through arithmetic (though complex calculations of date and time are more easily handled with the datetime module).

    # read current time in seconds
    secs = time.time()                  # 1590257729.297496  (includes fractions of a second)
    
    # subtract 24 hours (86,400 seconds)
    yestersecs = secs - (60 * 60 * 24)
    
    # show the current time minus 24 hours
    print(time.ctime(yestersecs))          # Fri May 22 17:10:55 2020
    
    # a "time struct"
    print(time.localtime(yestersecs))
                                        # time.struct_time(tm_year=2020, tm_mon=5, tm_mday=22,
                                        # tm_hour=17, tm_min=10, tm_sec=55, tm_wday=4,
                                        # tm_yday=143, tm_isdst=1)
    

    The "time struct" is a custom object that provides day of week, day of year and whether the time reflects daylight savings.


    Featured module: datetime

    The datetime module handles the calculation of dates and times, reading dates from string in any format, and writing dates to string in any format.


    import datetime as dt
    
    
    # build a 'date' object from year, month, day
    mydate1 = dt.date(2019, 9, 3)
    
    
    # build a 'date' object representing today
    mydate2 = dt.date.today()
    
    
    # build a datetime object from year, month, day, hour, minute and second
    mydatetime1 = dt.datetime(2019, 9, 3, 12, 5, 30)
    
    
    # build a datetime object representing right now
    mydatetime2 = dt.datetime.now()
    
    
    # build a datetime object from a formatted string
    mydatetime3 = dt.datetime.strptime('2019-03-03', '%Y-%m-%d')
    
    
    # build a "timedelta" (time interval) object:  3 days, 2 hours
    myinterval = dt.timedelta(days=3, seconds=7200)
    
    
    # date objects and intervals can be calculated like math
    newdate = mydatetime3 + myinterval
    
    print(newdate)                                # 2019-03-06 02:00:00
    
    
    # render a date object in a string format
    print(newdate.strftime('%Y-%m-%d  (%H:%M)'))  # 2019-03-06  (02:00)
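
    Date arithmetic also works in the other direction: subtracting one date from another produces a timedelta. A short sketch with hypothetical dates:

```python
import datetime as dt

start = dt.date(2019, 9, 3)
end = dt.date(2019, 12, 25)

interval = end - start                        # subtracting dates gives a timedelta
days_between = interval.days                  # int, 113

week_later = start + dt.timedelta(weeks=1)    # date, one week after start

print(days_between)                           # 113
print(week_later)                             # 2019-09-10
```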
    

    Featured module: random

    The random module generates pseudorandom numbers.


    'Pseudorandom' means that computers, being deterministic, are not capable of true randomness. The module instead produces number sequences that behave as if they were random.


    import random
    
    # random float from 0.0 up to (but not including) 1.0
    myfloat = random.random()        # 0.22845730036901912
    
    
    # random integer from 1 to 10 (both endpoints included)
    num = random.randint(1, 10)
    
    
    # random choice from a list
    x = ['a', 'b', 'c']
    choice = random.choice(x)        # 'b'
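
    Because the generator is pseudorandom, giving it the same starting "seed" reproduces the same sequence, which can be handy when testing. A minimal sketch (the seed value 42 is arbitrary):

```python
import random

random.seed(42)                         # any fixed value gives a repeatable sequence
first_run = [random.randint(1, 10) for _ in range(5)]

random.seed(42)                         # same seed again ...
second_run = [random.randint(1, 10) for _ in range(5)]

print(first_run == second_run)          # True:  ... same "random" numbers
```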
    

    Featured module: csv

    The csv module reads and writes CSV files.


    import csv
    
    # reading a CSV file
    fh = open('dated_file.csv')
    reader = csv.reader(fh)
    
    for row in reader:
        print(row)
    
    fh.close()
    
    # writing to a CSV file
    wfh = open('newfile.csv', 'w', newline='')
    writer = csv.writer(wfh)
    
    writer.writerow(['a', 'b', 'c'])
    writer.writerow(['d', 'e', 'f'])
    writer.writerow(['g', 'b', 'i'])
    
    wfh.close()                 # required - otherwise you may not see the writes
    

    (newline='' is necessary when opening the file to neutralize an issue in Windows regarding the '\r\n' line ending that Windows uses. While not needed on Mac or Linux, this added argument does no harm.) As with all file writing, it's essential to close a write filehandle; otherwise, you may not see the write in the file until after the program exits. (With Jupyter notebooks or the Python interactive interpreter, changes to an unclosed file may not appear until after the interpreter is closed.)
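
    One way to make sure a filehandle is always closed is to open it in a 'with' block, which closes the file when the block ends. A sketch of a write-then-read round trip under that approach (the filename and rows are hypothetical, and a temporary directory is used):

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'newfile.csv')

    # the file is closed automatically at the end of the 'with' block
    with open(path, 'w', newline='') as wfh:
        writer = csv.writer(wfh)
        writer.writerow(['name', 'years'])
        writer.writerow(['Joe', 23])
        writer.writerow(['Marie', 19])

    with open(path, newline='') as fh:
        rows = list(csv.reader(fh))          # each row is a list of strings

print(rows)     # [['name', 'years'], ['Joe', '23'], ['Marie', '19']]
```

    Note that the numbers come back as strings: the csv module does not convert types when reading.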


    Featured module: sqlite3

    The sqlite3 module allows file-based writing and reading of relational tables.


    # connecting
    import sqlite3
    
    conn = sqlite3.connect('mydatabase.db')     # open an existing, or create a new file
    
    cur = conn.cursor()
    
    
    #creating a table
    cur.execute("CREATE TABLE mytable (name TEXT, years INT, balance FLOAT)")
    
    
    # insert rows into a table
    rows = [
      [ 'Joe', 23, 23.9],
      [ 'Marie', 19, 7.95 ],
      [ 'Zoe', 29, 17.5 ]
    ]
    
    for row in rows:
        cur.execute("INSERT INTO mytable VALUES (?, ?, ?)", row)
    
    conn.commit()                                # essential to see the write
    
    
    # selecting data from a table
    cur = conn.cursor()
    
    cur.execute('SELECT name, years, balance FROM mytable')
    
    for row in cur:
        print(row)            # ('Joe', 23, 23.9)
                              # ('Marie', 19, 7.95)
                              # ('Zoe', 29, 17.5)
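
    The same ? placeholders used for inserting can also supply values to a WHERE clause. A self-contained sketch, assuming an in-memory database (':memory:') so that no file is created; the min_years cutoff is a hypothetical value:

```python
import sqlite3

conn = sqlite3.connect(':memory:')      # a temporary database held in memory
cur = conn.cursor()

cur.execute("CREATE TABLE mytable (name TEXT, years INT, balance FLOAT)")

rows = [('Joe', 23, 23.9), ('Marie', 19, 7.95), ('Zoe', 29, 17.5)]
cur.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
conn.commit()

# the ? placeholder is filled in from the tuple passed as the second argument
min_years = 20
cur.execute("SELECT name, balance FROM mytable WHERE years > ? ORDER BY name",
            (min_years,))
matches = cur.fetchall()                # list of tuples

conn.close()

print(matches)      # [('Joe', 23.9), ('Zoe', 17.5)]
```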
    

    Featured module: requests

    The requests module makes web (HTTP) requests and reads the responses.


    requests (which must be installed separately) is generally preferred over urllib, which comes installed with the standard distribution of Python. requests simply provides a more convenient interface, i.e. more convenient commands to accomplish the same tasks.


    import requests
    
    # make URL request; download the response
    response = requests.get('http://www.nytimes.com')
    
    # the HTTP response code (200 OK, 404 not found, 500 error, etc.)
    status_code = response.status_code
    
    # the text of the response
    page_text =   response.text
    
    # response.text is already decoded to str, so no further decoding is needed
    # (the undecoded bytes are available as response.content)
    
    print(f'status code:  {status_code}')
    print('======================= page text =======================')
    print(page_text)
    

    Featured module: urllib

    If requests is not available on your system, urllib provides similar functionality.


    import urllib.request
    
    # make URL request; returns a file-like response object
    read_object = urllib.request.urlopen('http://www.nytimes.com')
    
    # a file-like object, can also 'for' loop or use .readlines()
    text = read_object.read()
    
    # decoding the text of the response (if necessary)
    text = text.decode('utf-8')
    

    SSL Certificate Error: Many websites enable SSL security and require a web request to accept and validate an SSL certificate (certifying the identity of the server). urllib verifies SSL certificates by default, but this check can be bypassed (keep in mind that this may be a security risk).


    import ssl
    import urllib.request
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    
    my_url = 'http://www.nytimes.com'
    read_object = urllib.request.urlopen(my_url, context=ctx)
    

    Featured module: bs4 (Beautiful Soup)

    The bs4 module can parse HTML to extract data from web pages.


    This module must be installed separately.


    import bs4
    
    fh = open('dormouse.html')
    text = fh.read()
    fh.close()
    
    soup = bs4.BeautifulSoup(text, 'html.parser')
    
    
    # show all plain text in a page
    print(soup.get_text())
    
    
    # retrieve first tag with this name (a <title> tag)
    tag = soup.title
    
    
    # same, using .find()
    tag = soup.find('title')
    
    
    # find all <a> tags with specific attributes (e.g. <a href="mysite" id="link1">)
    link1_a_tags = soup.find_all('a', {'id': 'link1'})
    
    
    # find all <a> tags (hyperlinks)
    tags = soup.find_all('a')
    

    Featured module: re (regular expressions)

    The re module can recognize patterns in text and extract portions of text based on patterns.


    import re
    
    line = 'a phone number:  213-298-1990'
    
    matchobj = re.search(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)', line)
    
    print(matchobj.group(0))   # 213-298-1990  (the entire match)
    print(matchobj.group(1))   # 213           (the first parenthesized group)
    

    The regular expression spec is a declarative language that is implemented by many programming languages (JavaScript, Java, Ruby, Perl, etc.). To fully understand and use regular expressions, you will need to complete a course or tutorial that covers them in detail.
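
    As a short sketch of how groups behave (the sample text is hypothetical): group(0) is the entire match, group(1) and onward are the parenthesized portions, and re.findall() collects the groups from every match in the text.

```python
import re

text = 'home:  213-298-1990   work:  646-555-0142'    # hypothetical sample text

pattern = r'(\d{3})-(\d{3})-(\d{4})'     # raw string; \d{3} means "three digits"

first = re.search(pattern, text)         # finds the first match only
whole_match = first.group(0)             # '213-298-1990'
area_code = first.group(1)               # '213'

all_numbers = re.findall(pattern, text)  # one tuple of groups per match

print(whole_match)
print(area_code)
print(all_numbers)    # [('213', '298', '1990'), ('646', '555', '0142')]
```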


    Featured module: textwrap

    The textwrap module allows you to wrap text at a certain width.


    import textwrap
    
    text = "This is some really long text that we would like to wrap.  Wouldn't you know it, there's a module for that!  "
    
    
    # returns a list of lines
    # text is limited to 10 characters width
    items = textwrap.wrap(text, 10)
    
    
    # join lines together into multi-line string with new width
    print('\n'.join(items))
    

    pandas for table manipulation

    The pandas module enables table manipulations similar to those done by Excel and relational databases.


    The central object offered by pandas is the DataFrame, a 2-dimensional tabular structure similar to an Excel spreadsheet (columns and rows, with column and row labels). This module must be installed separately.


    pandas can read and write to and from a multitude of formats

    import pandas as pd
    import sqlite3
    
    # read from multiple formats to a DataFrame
    df = pd.read_csv('dated_file.csv')
    # df = pd.read_excel('dated_file.xls')
    # df = pd.read_json('dated_file.json')
    
    # write DataFrame to multiple formats
    df.to_csv('new_file.csv')
    # df.to_excel('new_file.xls')
    # df.to_json('new_file.json')
    
    
    # read from database through query
    conn = sqlite3.connect('testdb.db')
    df = pd.read_sql('SELECT * FROM test', conn)
    

    pandas can perform 'where clause' style selections, sum or average columns, and perform GROUPBY database-style aggregations:

    df = pd.read_csv('dated_file.csv')
    
    
    # select rows thru a filter
    df2 = df[ df.iloc[:, 3] > 18 ]      # all rows where the value in the 4th column is > 18
    
    
    # sum, average, etc. a column
    df.tax.mean()                         # average values in 'tax' column
    df.revenue.sum()                      # sum values in 'revenue' column
    
    
    # create a new column
    df['col99'] = df.col1 + df.revenue   # new column sums 'col1' and 'revenue' field from each line
    
    
    # groupby aggregation
    dfgb = df.groupby('state').revenue.sum()    # show sum of revenue for each state
    

    pandas is tightly integrated with matplotlib, a full featured plotting library. The resulting images can be displayed in a Jupyter notebook, or saved as an image file.

    # groupby bar chart
    dfgb.plot.bar()
    
    # weather temp line chart (weather_df: a DataFrame with a 'temp' column)
    weather_df.temp.plot.line()
    



    Useful Modules

    Useful Modules: Introduction

    This slide deck contains basic documentation on some of the most useful modules in the Python standard distribution. There are many more!


    As you know, a module is Python code stored in a separate file or files that we can import into our code, to help us do specialized work. The Python documentation lists modules that come installed with Python (collectively, these modules are known as the "Standard Library"). Every module demonstrated below has many features and options. You can refer to documentation, or an article or blog post, to learn more about each.


    Featured module: math

    The math module handles advanced math calculations.


    These calculations include functions for calculating factorials, ceiling and floor, and logarithmic, geometric, and trigonometric values (sine, cosine, tangent, etc.)


    A quick look at the module's attributes gives us an idea of what is included:

    import math
    
    print(dir(math))
    
       # ['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__',
       #  'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign',
       #  'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial',
       #  'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose',
       #  'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2',
       #  'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan',
       #  'tanh', 'tau', 'trunc']
    

    For example, here are some simple geometry calculations using math:

    import math
    
    print(math.pi)                           # 3.141592653589793
    
    radius = 3
    circumference = 2 * math.pi * radius     # 18.84955592153876
    
    area = math.pi * radius * radius         # 28.274333882308138
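
    As a brief sketch of the other calculations mentioned above (factorials, ceiling and floor, logarithms, trigonometry), the following calls use only functions that appear in the dir(math) listing. The input values are arbitrary examples chosen for illustration:

```python
import math

fact = math.factorial(5)              # 5 * 4 * 3 * 2 * 1
up = math.ceil(4.2)                   # round up to the nearest integer
down = math.floor(4.8)                # round down to the nearest integer
log = math.log10(1000)                # base-10 logarithm
root = math.sqrt(16)                  # square root
hyp = math.hypot(3, 4)                # hypotenuse of a 3-4-5 right triangle
sine = math.sin(math.radians(90))     # trig functions expect radians

print(fact)     # 120
print(up)       # 5
print(down)     # 4
print(log)      # 3.0
print(root)     # 4.0
print(hyp)      # 5.0
print(sine)     # 1.0
```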
    

    Featured module: statistics

    This module provides basic statistical analysis.


    Some of our earliest exercises calculated mean, median, and standard deviation. These operations are more easily performed through this module's functions.


    import statistics as stats                       # set a convenient name for the module
    
    values = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6, 6]
    
    # average value
    meanval = stats.mean(values)                     # 4.083333333333333
    
    # "middle" value in a list of sorted values (list does not need to be sorted)
    medianval = stats.median(values)                 # 4.0
    
    # sample standard deviation: how far values typically fall from the mean
    standev = stats.stdev(values)                    # 1.781640374554423
    
    # square of the standard deviation
    varianceval = stats.variance(values)             # 3.1742424242424243
    
    # most common value
    modeval = stats.mode(values)                     # 6
    

    Featured module: string

    This module provides useful lists of characters.


    import string
    
    print(string.ascii_letters)       # abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
    
    print(string.ascii_lowercase)     # abcdefghijklmnopqrstuvwxyz
    
    print(string.ascii_uppercase)     # ABCDEFGHIJKLMNOPQRSTUVWXYZ
    
    print(string.digits)              # 0123456789
    
    print(string.hexdigits)           # 0123456789abcdefABCDEF
    
    print(string.punctuation)         # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    
    print(string.whitespace)          # ' \t\n\r\x0b\x0c'   (prints as invisible characters)
    

    Featured module: zipfile

    The zipfile module builds, unpacks and inspects .zip archives.


    import zipfile as zp
    
    myzip = zp.ZipFile('myzip.zip', 'w')
    
    # add names of files (of course these must exist)
    myzip.write('file1.txt')
    myzip.write('file2.pdf')
    myzip.write('file3.doc')
    
    myzip.close()                     # builds and writes zip file
    
    print('done')
    

    After running the above code and referencing real files, check this unit's files directory -- you should see a new .zip file added. You can also use zipfile to unpack and check the manifest (contents) of a zip file.
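
    Here is a minimal sketch of checking the manifest and unpacking, as mentioned above. It creates its own sample file in a temporary directory so that it can run anywhere; the file and directory names are assumptions made for illustration:

```python
import os
import tempfile
import zipfile as zp

# work in a temporary directory so no real files are touched
workdir = tempfile.mkdtemp()

# create a small sample file to archive (hypothetical name and contents)
sample_path = os.path.join(workdir, 'file1.txt')
with open(sample_path, 'w') as fh:
    fh.write('hello, zip!')

# build the archive; leaving the 'with' block closes and writes the zip file
zip_path = os.path.join(workdir, 'myzip.zip')
with zp.ZipFile(zip_path, 'w') as myzip:
    myzip.write(sample_path, arcname='file1.txt')

# reopen the archive to read the manifest and unpack the contents
outdir = os.path.join(workdir, 'unpacked')
with zp.ZipFile(zip_path) as myzip:
    names = myzip.namelist()          # list of file names in the archive
    myzip.extractall(outdir)          # unpack everything into outdir

print(names)                          # ['file1.txt']
print(os.listdir(outdir))             # ['file1.txt']
```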


    Featured module: time

    The time module handles time-related functions such as reading the current time, doing simple time calculations, and sleeping for a period of time.


    time can be used to sleep (or pause execution) for a set number of seconds:

    import time
    
    # pause execution # of seconds
    time.sleep(5)
    

    We can also use time to show the current time:

    # current time and date
    print(time.ctime())                 # Sat May 23 17:10:55 2020
    

    At a very basic level it's possible to manipulate time through arithmetic (though complex calculations of date and time are more easily handled with the datetime module).

    # read current time in seconds
    secs = time.time()                  # 1590257729.297496  (includes fractional seconds)
    
    # subtract 24 hours, expressed in seconds (86,400 seconds)
    yestersecs = secs - (60 * 60 * 24)
    
    # show the current time minus 24 hours
    print(time.ctime(yestersecs))          # Fri May 22 17:10:55 2020
    
    # a "time struct"
    print(time.localtime(yestersecs))
                                        # time.struct_time(tm_year=2020, tm_mon=5, tm_mday=22,
                                        # tm_hour=17, tm_min=10, tm_sec=55, tm_wday=4,
                                        # tm_yday=143, tm_isdst=1)
    

    The "time struct" is a custom object that provides the day of the week, the day of the year, and whether the time reflects daylight saving time.
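
    The fields of a time struct can be read as attributes. This sketch uses time.gmtime(), which builds the struct in UTC rather than local time, so the result does not depend on the time zone of the machine running it:

```python
import time

# 86,400 seconds after the epoch: exactly one day after January 1, 1970 (UTC)
ts = time.gmtime(86400)

print(ts.tm_year, ts.tm_mon, ts.tm_mday)   # 1970 1 2
print(ts.tm_wday)                          # 4  (Monday is 0, so 4 is Friday)
print(ts.tm_yday)                          # 2  (second day of the year)
print(ts.tm_isdst)                         # 0  (UTC has no daylight saving time)
```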


    Featured module: datetime

    The datetime module handles the calculation of dates and times, parsing dates from strings in any format, and writing dates to strings in any format.


    import datetime as dt
    
    
    # build a 'date' object from year, month, day
    mydate1 = dt.date(2019, 9, 3)
    
    
    # build a 'date' object representing today
    mydate2 = dt.date.today()
    
    
    # build a datetime object from year, month, day, hour, minute and second
    mydatetime1 = dt.datetime(2019, 9, 3, 12, 5, 30)
    
    
    # build a datetime object representing right now
    mydatetime2 = dt.datetime.now()
    
    
    # build a datetime object from a formatted string
    mydatetime3 = dt.datetime.strptime('2019-03-03', '%Y-%m-%d')
    
    
    # build a "timedelta" (time interval) object:  3 days, 2 hours
    myinterval = dt.timedelta(days=3, seconds=7200)
    
    
    # date objects and intervals can be calculated like math
    newdate = mydatetime3 + myinterval
    
    print(newdate)                                # 2019-03-06 02:00:00
    
    
    # render a date object in a string format
    print(newdate.strftime('%Y-%m-%d  (%H:%M)'))  # 2019-03-06  (02:00)
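
    Date arithmetic also works in the other direction: subtracting one date from another produces a timedelta. A short sketch, with two arbitrary dates chosen for illustration:

```python
import datetime as dt

start = dt.date(2019, 9, 3)
end = dt.date(2019, 12, 25)

# subtracting two date objects produces a timedelta (an interval)
interval = end - start

print(type(interval).__name__)      # timedelta
print(interval.days)                # 113

# adding an interval to a date produces a new date
later = start + dt.timedelta(days=30)
print(later)                        # 2019-10-03
```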
    

    Featured module: random

    The random module generates pseudorandom numbers.


    'Pseudorandom' means that computers, being deterministic, are not capable of true randomness. Instead, the module uses an algorithm to produce number sequences that appear random; starting from the same seed value, it produces the same sequence every time.


    import random
    
    # random float from 0.0 up to (but not including) 1.0
    myfloat = random.random()        # 0.22845730036901912
    
    
    # random integer from 1 to 10 (both endpoints included)
    num = random.randint(1, 10)
    
    
    # random choice from a list
    x = ['a', 'b', 'c']
    choice = random.choice(x)        # 'b'
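
    To see what 'pseudorandom' means in practice, we can set the generator's starting point with random.seed(). This is a small sketch; the seed value 42 is an arbitrary choice:

```python
import random

random.seed(42)                        # start the generator from a known state
first = [random.randint(1, 10) for _ in range(5)]

random.seed(42)                        # reset the generator to the same state
second = [random.randint(1, 10) for _ in range(5)]

# the same seed produces the same "random" sequence
print(first == second)                 # True
```

    Seeding is useful when a program needs repeatable results, for example when testing.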
    

    Featured module: csv

    The csv module reads and writes CSV files.


    import csv
    
    # reading a CSV file
    fh = open('dated_file.csv')
    reader = csv.reader(fh)
    
    for row in reader:
        print(row)
    
    fh.close()
    
    # writing to a CSV file
    wfh = open('newfile.csv', 'w', newline='')
    writer = csv.writer(wfh)
    
    writer.writerow(['a', 'b', 'c'])
    writer.writerow(['d', 'e', 'f'])
    writer.writerow(['g', 'b', 'i'])
    
    wfh.close()                 # essential - otherwise you may not see the writes until the program exits
    

    (newline='' is necessary when opening the file to neutralize an issue in Windows regarding the '\r\n' line ending that Windows uses. While not needed on Mac or Linux, this added argument does no harm.) As with all file writing, it's essential to close a write filehandle; otherwise, you may not see the write in the file until after the program exits.
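
    One way to make sure a filehandle is always closed is to open it in a 'with' block, which closes the file automatically when the block ends. The following sketch writes to a temporary directory (an assumption made so that the example can run anywhere) and then reads the rows back:

```python
import csv
import os
import tempfile

# hypothetical output location inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'newfile.csv')

# the file is closed automatically at the end of the 'with' block
with open(path, 'w', newline='') as wfh:
    writer = csv.writer(wfh)
    writer.writerow(['a', 'b', 'c'])
    writer.writerow(['d', 'e', 'f'])

# read the rows back; each row is a list of strings
with open(path, newline='') as fh:
    rows = list(csv.reader(fh))

print(rows)      # [['a', 'b', 'c'], ['d', 'e', 'f']]
```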


    Featured module: sqlite3

    The sqlite3 module allows file-based writing and reading of relational tables.


    # connecting
    import sqlite3
    
    conn = sqlite3.connect('mydatabase.db')     # open an existing, or create a new file
    
    cur = conn.cursor()
    
    
    # creating a table
    cur.execute("CREATE TABLE mytable (name TEXT, years INT, balance FLOAT)")
    
    
    # insert rows into a table
    rows = [
      [ 'Joe', 23, 23.9],
      [ 'Marie', 19, 7.95 ],
      [ 'Zoe', 29, 17.5 ]
    ]
    
    for row in rows:
        cur.execute("INSERT INTO mytable VALUES (?, ?, ?)", row)
    
    conn.commit()                                # essential to see the write
    
    
    # selecting data from a table
    cur = conn.cursor()
    
    cur.execute('SELECT name, years, balance FROM mytable')
    
    for row in cur:
        print(row)            # ('Joe', 23, 23.9)
                              # ('Marie', 19, 7.95)
                              # ('Zoe', 29, 17.5)
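
    The same ? placeholders can be used to filter a SELECT query. This sketch uses the special name ':memory:' to create a temporary database in memory, so no file is written; the table and rows repeat the example above, and the min_years variable is an assumption added for illustration:

```python
import sqlite3

conn = sqlite3.connect(':memory:')     # temporary in-memory database
cur = conn.cursor()

cur.execute("CREATE TABLE mytable (name TEXT, years INT, balance FLOAT)")

rows = [('Joe', 23, 23.9), ('Marie', 19, 7.95), ('Zoe', 29, 17.5)]
cur.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
conn.commit()

# select only the rows where 'years' is greater than a value we supply
min_years = 20
cur.execute("SELECT name, balance FROM mytable WHERE years > ? ORDER BY name",
            (min_years,))

results = cur.fetchall()               # list of tuples, one per row
print(results)                         # [('Joe', 23.9), ('Zoe', 17.5)]

conn.close()
```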
    

    Featured module: requests

    The requests module makes HTTP requests and downloads the responses.


    requests (which must be installed separately) is generally preferred over urllib, which comes installed with the standard distribution of Python. requests simply provides a more convenient interface, i.e. more convenient commands to accomplish the same tasks.


    import requests
    
    # make URL request; download the response
    response = requests.get('http://www.nytimes.com')
    
    # the HTTP response code (200 OK, 404 not found, 500 error, etc.)
    status_code = response.status_code
    
    # the text of the response (requests has already decoded it to a str)
    page_text = response.text
    
    # the raw, undecoded bytes of the response (if needed)
    page_bytes = response.content
    
    print(f'status code:  {status_code}')
    print('======================= page text =======================')
    print(page_text)
    

    Featured module: urllib

    If requests is not available on your system, urllib provides similar functionality.


    import urllib.request
    
    # make URL request; download the text of the response
    read_object = urllib.request.urlopen('http://www.nytimes.com')
    
    # a file-like object, can also 'for' loop or use .readlines()
    text = read_object.read()
    
    # decoding the text of the response (if necessary)
    text = text.decode('utf-8')
    

    SSL Certificate Error: Many websites enable SSL security and require a web request to accept and validate an SSL certificate (certifying the identity of the server). urllib by default requires SSL certificate security, but it can be bypassed (keep in mind that this may be a security risk).


    import ssl
    import urllib.request
    
    # build a context that skips certificate checks (a security risk)
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    
    my_url = 'https://www.nytimes.com'
    read_object = urllib.request.urlopen(my_url, context=ctx)
    

    Featured module: bs4 (Beautiful Soup)

    The bs4 module can parse HTML to extract data from web pages.


    This module must be installed separately.


    import bs4
    
    fh = open('dormouse.html')
    text = fh.read()
    fh.close()
    
    soup = bs4.BeautifulSoup(text, 'html.parser')
    
    
    # show all plain text in a page
    print(soup.get_text())
    
    
    # retrieve first tag with this name (a <title> tag)
    tag = soup.title
    
    
    # same, using .find()
    tag = soup.find('title')
    
    
    # find all <a> tags with specific attributes (e.g. <a href="mysite" id="link1">)
    link1_a_tags = soup.find_all('a', {'id': 'link1'})
    
    
    # find all <a> tags (hyperlinks)
    tags = soup.find_all('a')
    

    Featured module: re (regular expressions)

    The re module can recognize patterns in text and extract portions of text based on patterns.


    import re
    
    line = 'a phone number:  213-298-1990'
    
    matchobj = re.search(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)', line)
    
    print(matchobj.group(0))   # 213-298-1990  (the entire match)
    print(matchobj.group(1))   # 213           (the first parenthesized group)
    

    Regular expressions are a declarative pattern language supported by many programming languages (JavaScript, Java, Ruby, Perl, etc.). To fully understand and use them, you will need to complete a course or tutorial that covers them in detail.
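
    As a small taste of what else is possible, this sketch finds every phone number in a string. The sample text is made up, and the pattern uses the {n} repeat count (\d{3} means "three digits") as a shorthand for \d\d\d:

```python
import re

text = 'office: 213-298-1990, home: 646-555-0143'

# a raw string (r'...') keeps the backslashes intact for the re module
pattern = r'(\d{3})-(\d{3})-(\d{4})'

# findall() returns one tuple of groups for each match in the text
numbers = re.findall(pattern, text)
print(numbers)                 # [('213', '298', '1990'), ('646', '555', '0143')]

# search() returns a match object for the first match only
matchobj = re.search(pattern, text)
print(matchobj.group(0))       # 213-298-1990
print(matchobj.groups())       # ('213', '298', '1990')
```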


    Featured module: subprocess

    The subprocess module allows your program to launch other programs / applications.


    import subprocess
    
    
    # execute another program; it shares this program's STDIN and STDOUT
    subprocess.call(['ls', 'path/to/my/dir'])
    
    
    # execute another Python script
    subprocess.call(['python', 'hello.py'])
    
    
    # execute another program and capture output
    out = subprocess.check_output(['python', 'hello.py'])
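
    A more recent interface is subprocess.run() (the capture_output argument requires Python 3.7 or later), which returns an object holding the exit code and any captured output. This sketch launches a one-line Python program with -c instead of a script file, so that it does not depend on hello.py existing:

```python
import subprocess
import sys

# sys.executable is the path to the Python interpreter running this script
result = subprocess.run(
    [sys.executable, '-c', 'print("hello from the child")'],
    capture_output=True,      # collect STDOUT and STDERR instead of displaying them
    text=True,                # decode the captured bytes to str
)

print(result.returncode)      # 0  (0 means the program exited without error)
print(result.stdout.strip())  # hello from the child
```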
    

    Featured module: textwrap

    The textwrap module allows you to wrap text at a certain width.


    import textwrap
    
    text = "This is some really long text that we would like to wrap.  Wouldn't you know it, there's a module for that!  "
    
    
    # returns a list of lines
    # text is limited to 10 characters width
    items = textwrap.wrap(text, 10)
    
    
    # join lines together into multi-line string with new width
    print('\n'.join(items))
    

    pandas for table manipulation

    The pandas module enables table manipulations similar to those done in Excel or relational databases.


    The central object offered by pandas is the DataFrame, a 2-dimensional tabular structure similar to an Excel spreadsheet (columns and rows, with column and row labels). This module must be installed separately.


    pandas can read and write to and from a multitude of formats:

    import pandas as pd
    import sqlite3
    
    # read from multiple formats to a DataFrame
    df = pd.read_csv('dated_file.csv')
    # df = pd.read_excel('dated_file.xls')
    # df = pd.read_json('dated_file.json')
    
    # write DataFrame to multiple formats
    df.to_csv('new_file.csv')
    # df.to_excel('new_file.xls')
    # df.to_json('new_file.json')
    
    
    # read from database through query
    conn = sqlite3.connect('testdb.db')
    df = pd.read_sql('SELECT * FROM test', conn)
    

    pandas can perform 'where clause' style selections, sum or average columns, and perform GROUPBY database-style aggregations:

    df = pd.read_csv('dated_file.csv')
    
    
    # select rows thru a filter
    df2 = df[ df.tax > 18 ]     # all rows where 'tax' field is > 18
    
    
    # sum, average, etc. a column
    df.tax.mean()                         # average values in 'tax' column
    df.revenue.sum()                      # sum values in 'revenue' column
    
    
    # create a new column
    df['col99'] = df.col1 + df.revenue   # new column sums 'col1' and 'revenue' field from each line
    
    
    # groupby aggregation
    dfgb = df.groupby('state').revenue.sum()    # show sum of revenue for each state
    

    pandas is tightly integrated with matplotlib, a full featured plotting library. The resulting images can be displayed in a Jupyter notebook, or saved as an image file.

    # groupby bar chart
    dfgb.plot.bar()
    
    # weather temp line chart
    weather_df.temp.plot.line()
    



    User-Defined Classes and Object-Oriented Programming

    Introduction: Classes

    Classes allow us to create a custom type of object -- that is, an object with its own behaviors and its own ways of storing data. Consider that each of the objects we've worked with previously has its own behavior, and stores data in its own way: dicts store pairs, sets store unique values, lists store sequential values, etc. An object's behaviors can be seen in its methods, as well as how it responds to operations like subscript, operators, etc. An object's data is simply the data contained in the object or that the object represents: a string's characters, a list's object sequence, etc.


    Objectives for this Unit: Classes

  1. Understand what classes, instances and attributes are and why they are useful
  2. Create our own classes -- our own object types
  3. Set attributes in instances and read attributes from instances
  4. Define methods in classes that can be used by an instance
  5. Define instance initializers with __init__()
  6. Use getter and setter methods to enforce encapsulation
  7. Understand class inheritance
  8. Understand polymorphism


    Class Example: the date and timedelta object types

    First let's look at object types that demonstrate the convenience and range of behaviors of objects.


    A date object can be set to any date and knows how to calculate dates into the future or past. To change the date, we use a timedelta object, which can be set to an "interval" of days to be added to or subtracted from a date object.


    from datetime import date, timedelta
    
    dt = date(1926, 12, 30)         # create a new date object set to 12/30/1926
    td = timedelta(days=3)          # create a new timedelta object:  3 day interval
    
    dt = dt + td                    # add the interval to the date object:  produces a new date object
    
    print(dt)                        # 1927-01-02 (3 days after the original date)
    
    
    dt2 = date.today()              # as of this writing:  set to 2016-08-01
    dt2 = dt2 + timedelta(days=1)   # add 1 day to today's date
    
    print(dt2)                       # 2016-08-02
    
    print(type(dt))                  # <class 'datetime.date'>
    print(type(td))                  # <class 'datetime.timedelta'>
    

    Class Example: the proposed server object type

    Now let's imagine a useful object -- this proposed class will allow you to interact with a server programmatically. Each server object represents a server that you can ping, restart, copy files to and from, etc.


    import time
    from sysadmin import Server
    
    
    s1 = Server('blaikieserv')
    
    if s1.ping():
        print('{} is alive '.format(s1.hostname))
    
    s1.restart()                       # restarts the server
    
    s1.copyfile_up('myfile.txt')       # copies a file to the server
    s1.copyfile_down('yourfile.txt')   # copies a file from the server
    
    print(s1.uptime())                  # blaikieserv has been alive for 2 seconds
    

    A class block defines an instance "factory" which produces instances of the class.

    Method calls on the instance refer to functions defined in the class.


    class Greeting:
        """ greets the user """
    
        def greet(self):
            print('hello, user!')
    
    
    c = Greeting()
    
    c.greet()                    # hello, user!
    
    print(type(c))                # <class '__main__.Greeting'>
    

    Each class object or instance is of a type named after the class. In this way, class and type are almost synonymous.


    Each instance holds an attribute dictionary

    Data is stored in each instance through its attributes, which can be written and read just like dictionary keys and values.


    class Something:
        """ just makes 'Something' objects """
    
    obj1 = Something()
    obj2 = Something()
    
    obj1.var = 5             # set attribute 'var' to int 5
    obj1.var2 = 'hello'      # set attribute 'var2' to str 'hello'
    
    obj2.var = 1000          # set attribute 'var' to int 1000
    obj2.var2 = [1, 2, 3, 4] # set attribute 'var2' to list [1, 2, 3, 4]
    
    
    print(obj1.var)           # 5
    print(obj1.var2)          # hello
    
    print(obj2.var)           # 1000
    print(obj2.var2)          # [1, 2, 3, 4]
    
    obj2.var2.append(5)      # appending to the list stored to attribute var2
    
    print(obj2.var2)          # [1, 2, 3, 4, 5]
    

    In fact the attribute dictionary is a real dict, stored within a "magic" attribute of the instance:

    print(obj1.__dict__)      # {'var': 5, 'var2': 'hello'}
    
    print(obj2.__dict__)      # {'var': 1000, 'var2': [1, 2, 3, 4, 5]}
    

    The class also holds an attribute dictionary

    Data can also be stored in a class through class attributes or through variables defined in the class.


    class MyClass:
        """ The MyClass class holds some data """
    
        var = 10              # set a variable in the class (a class variable)
    
    
    MyClass.var2 = 'hello'    # set an attribute directly in the class object
    
    print(MyClass.var)         # 10      (attribute was set as variable in class block)
    print(MyClass.var2)        # 'hello' (attribute was set as attribute in class object)
    
    print(MyClass.__dict__)    # mappingproxy({'__module__': '__main__',
                              #  '__doc__': ' The MyClass class holds some data ',
                              #  'var': 10,
                              #  ...
                              #  'var2': 'hello'})
                              # (abridged; exact entries vary by Python version)
    

    The additional __module__ and __doc__ attributes are automatically added: __module__ indicates the active module (here, that the class is defined in the script being run); __doc__ is a special string reserved for documentation on the class.


    object.attribute lookup tries to read from object, then from class

    If an attribute can't be found in an object, it is searched for in the class.


    class MyClass:
        classval = 10         # class attribute
    
    a = MyClass()
    b = MyClass()
    
    b.classval = 99         # instance attribute of same name
    
    print(a.classval)        # 10 - still class attribute
    print(b.classval)        # 99 - instance attribute
    
    del b.classval          # delete instance attribute
    
    print(b.classval)        # 10 -- now back to class attribute
    
    print(MyClass.classval)  # 10 -- class attributes are accessible through Class as well
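
    We can confirm where each value lives by looking at the attribute dictionaries directly. This sketch repeats the class above: the instance that never set the attribute has an empty dictionary, so the lookup falls through to the class.

```python
class MyClass:
    classval = 10         # class attribute

a = MyClass()
b = MyClass()

b.classval = 99           # instance attribute of same name

print(a.__dict__)                        # {}  -- nothing stored in the instance
print(b.__dict__)                        # {'classval': 99}
print('classval' in MyClass.__dict__)    # True -- stored in the class

print(a.classval)                        # 10 -- found in the class
print(b.classval)                        # 99 -- found in the instance first
```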
    

    Method calls pass the instance as first (implicit) argument, called self

    Object methods or instance methods allow us to work with the instance's data.


    class Do:
        def printme(self):
            print(self)      # <__main__.Do object at 0x1006de910>
    
    x = Do()
    
    print(x)                 # <__main__.Do object at 0x1006de910>
    x.printme()
    

    Note that x and self have the same hex code. This indicates that they are the very same object.


    Instance methods / object methods and instance attributes: changing instance "state"

    Since instance methods pass the instance, and we can store values in instance attributes, we can combine these to have a method modify an instance's values.


    class Sum:
        def add(self, val):
            if not hasattr(self, 'x'):
                self.x = 0
            self.x = self.x + val
    
    myobj = Sum()
    myobj.add(5)
    myobj.add(10)
    
    print(myobj.x)      # 15
    

    Instances are often modified using getter and setter methods

    These methods are used to read and write instance attributes in a controlled way.


    class Counter:
        def setval(self, val):     # arguments are:  the instance, and the value to be set
            if not isinstance(val, int):
                raise TypeError('arg must be an int')
    
            self.value = val        # set the value in the instance's attribute
    
        def getval(self):          # only one argument:  the instance
            return self.value       # return the instance attribute value
    
        def increment(self):
            self.value = self.value + 1
    
    a = Counter()
    b = Counter()
    
    a.setval(10)       # although we pass one argument, the implied first argument is a itself
    
    a.increment()
    a.increment()
    
    print(a.getval())   # 12
    
    
    b.setval('hello')  # TypeError
    

    __init__() is automagically called when a new instance is created

    The initializer of an instance allows us to set the initial attribute values of the instance.


    class MyCounter:
        def __init__(self, initval):   # self is implied 1st argument (the instance)
            try:
                initval = int(initval)     # test initval to be an int,
            except ValueError:           # set to 0 if incorrect
                initval = 0
            self.value = initval         # initval was passed to the constructor
    
        def increment_val(self):
            self.value = self.value + 1
    
        def get_val(self):
            return self.value
    
    a = MyCounter(0)
    b = MyCounter(100)
    
    a.increment_val()
    a.increment_val()
    a.increment_val()
    
    b.increment_val()
    b.increment_val()
    
    print(a.get_val())    # 3
    print(b.get_val())    # 102
    

    Classes can be organized into an inheritance tree

    When a class inherits from another class, attribute lookups can pass to the parent class when accessed from the child.


    class Animal:
        def __init__(self, name):
            self.name = name
        def eat(self, food):
            print('{} eats {}'.format(self.name, food))
    
    class Dog(Animal):
        def fetch(self, thing):
            print('{} goes after the {}!'.format(self.name, thing))
    
    class Cat(Animal):
        def swatstring(self):
            print('{} shreds the string!'.format(self.name))
        def eat(self, food):
            if food in ['cat food', 'fish', 'chicken']:
                print('{} eats the {}'.format(self.name, food))
            else:
                print('{}:  snif - snif - snif - nah...'.format(self.name))
    
    d = Dog('Rover')
    c = Cat('Atilla')
    
    d.eat('wood')                 # Rover eats wood
    c.eat('dog food')             # Atilla:  snif - snif - snif - nah...
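
    To see the lookup pass from child to parent, we can inspect the classes themselves. This is a minimal sketch with a pared-down Animal and Dog; here eat() returns the string rather than printing it, a change made only so the result is easy to check:

```python
class Animal:
    def __init__(self, name):
        self.name = name
    def eat(self, food):
        return '{} eats {}'.format(self.name, food)

class Dog(Animal):
    pass                  # defines nothing of its own; everything is inherited

d = Dog('Rover')

# an instance of the child class is also an instance of the parent class
print(isinstance(d, Dog))        # True
print(isinstance(d, Animal))     # True

# __mro__ lists the classes searched, in order, during attribute lookup
print([klass.__name__ for klass in Dog.__mro__])   # ['Dog', 'Animal', 'object']

# eat() is not stored in Dog, so the lookup finds it in Animal
print('eat' in Dog.__dict__)     # False
print('eat' in Animal.__dict__)  # True
print(d.eat('kibble'))           # Rover eats kibble
```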
    

    Conceptually similar methods can be unified through polymorphism

    Same-named methods in two different classes can share a conceptual similarity.


    class Animal:
        def __init__(self, name):
            self.name = name
        def eat(self, food):
            print('{} eats {}'.format(self.name, food))
    
    class Dog(Animal):
        def fetch(self, thing):
            print('{} goes after the {}!'.format(self.name, thing))
        def speak(self):
            print('{}:  Bark!  Bark!'.format(self.name))
    
    class Cat(Animal):
        def swatstring(self):
            print('{} shreds the string!'.format(self.name))
        def eat(self, food):
            if food in ['cat food', 'fish', 'chicken']:
                print('{} eats the {}'.format(self.name, food))
            else:
                print('{}:  snif - snif - snif - nah...'.format(self.name))
        def speak(self):
            print('{}:  Meow!'.format(self.name))
    
    for a in (Dog('Rover'), Dog('Fido'), Cat('Fluffy'), Cat('Precious'), Dog('Rex'), Cat('Kittypie')):
        a.speak()
    
                       # Rover:  Bark!  Bark!
                       # Fido:  Bark!  Bark!
                       # Fluffy:  Meow!
                       # Precious:  Meow!
                       # Rex:  Bark!  Bark!
                       # Kittypie:  Meow!
    

    Static Methods and Class Methods

    A class method can be called through the instance or the class, and passes the class as the first argument. We use these methods to do class-wide work, such as counting instances or maintaining a table of variables available to all instances. A static method can be called through the instance or the class, but knows nothing about either. In this way it is like a regular function -- it takes no implicit argument. We can think of these as 'helper' functions that just do some utility work and don't need to involve either class or instance.


    class MyClass:
    
        def myfunc(self):
            print("myfunc:  arg is {}".format(self))
    
        @classmethod
        def myclassfunc(klass):      # we spell it differently because 'class' will confuse the interpreter
            print("myclassfunc:  arg is {}".format(klass))
    
        @staticmethod
        def mystaticfunc():
            print("mystaticfunc: (no arg)")
    
    a = MyClass()
    
    a.myfunc()             # myfunc:  arg is <__main__.MyClass object at 0x6c210>
    
    MyClass.myclassfunc()  # myclassfunc:  arg is <class '__main__.MyClass'>
    a.myclassfunc()        # [ same ]
    
    a.mystaticfunc()       # mystaticfunc: (no arg)
    

    Here is an example from Learning Python, which counts instances that are constructed:


    class Spam:
    
        numInstances = 0
    
        def __init__(self):
            Spam.numInstances += 1
    
        @staticmethod
        def printNumInstances():
            print("instances created:  ", Spam.numInstances)
    
    s1 = Spam()
    s2 = Spam()
    s3 = Spam()
    
    Spam.printNumInstances()        # instances created:  3
    s3.printNumInstances()          # instances created:  3
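
    The same counter could also be written with a class method, which receives the class as its first argument instead of naming Spam directly. This variation is a sketch and is not taken from Learning Python; the names num_instances and count are assumptions chosen for illustration:

```python
class Spam:

    num_instances = 0                 # class attribute shared by all instances

    def __init__(self):
        Spam.num_instances += 1       # count each new instance

    @classmethod
    def count(cls):                   # cls is the class (here, Spam)
        return cls.num_instances

s1 = Spam()
s2 = Spam()
s3 = Spam()

print(Spam.count())       # 3 -- called through the class
print(s1.count())         # 3 -- called through an instance
```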
    

    Class methods are often used as class "Factories", producing customized objects based on preset values. Here's an example from the RealPython blog that uses a class method as a factory method to produce variations on a Pizza object:


    class Pizza:
        def __init__(self, ingredients):
            self.ingredients = ingredients
    
        def __repr__(self):
            return f'Pizza({self.ingredients!r})'
    
        @classmethod
        def margherita(cls):
            return cls(['mozzarella', 'tomatoes'])
    
        @classmethod
        def prosciutto(cls):
            return cls(['mozzarella', 'tomatoes', 'ham'])
    
    
    marg = Pizza.margherita()
    print(marg.ingredients)       # ['mozzarella', 'tomatoes']
    
    schute = Pizza.prosciutto()
    print(schute.ingredients)     # ['mozzarella', 'tomatoes', 'ham']
    


