Python's popularity is due to its elegance and simplicity.
Do other languages have a manifesto like this one?
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
I am dedicated to student success.
Prior exposure to Python is helpful, but not required.
You do not have to know anything about Python or programming, but some personal qualities will be very helpful. These are "soft skills" that will benefit you greatly as you proceed:
If you already have an editor and Python installed, you do not need to add these.
Please keep in mind that if you are already able to write and run Python programs, you only need to add the class files.
The zip file contains all files needed for our course exercises.
python_data/
├── session_00_test_project/
├── session_01_objects_types/
├──── inclass_exercises/
│     ├── inclass_1.1.py
│     ├── inclass_1.2.py
│     ├── ..etc..
│     ├── inclass_1.6_lab.py
│     ├── inclass_1.7_lab.py
│     ├── ..etc..
├──── notebooks_inclass_warmup/
├── session_02_funcs_condits_methods/
├──── inclass_exercises/
│     ├── inclass_2.1.py
│     ├── inclass_2.2.py
│     ├── ..etc..
├── session_03_strings_lists_files/
├── session_04_containers_lists_sets/
├── ..etc..
└── session_10_classes/
Computers can do many different things for us.
Think about what our computers do for us:
At base, computers really only do three things.
Python can do many things, but we will focus on the first item -- working with data. The main purpose of any programming language is to allow us to store data in memory and then process that data according to our needs.
A programming language like Python is designed to allow us to give instructions to our computer.
The Interpreter is Python Itself.
When we run a Python program, the Interpreter takes these three steps.
Python is very smart in some ways.
Python is not smart in some ways, too!
We should seek to understand what the Interpreter is telling us.
This learning is not just about making programs work -- it's about understanding the interpreter -- what it can and can't do.
A folder in PyCharm is known as a 'project'.
Open a Folder, which will correspond to a new workspace.
Add a new file.
print('hello, world!')
print()
Take care when reproducing the above script: every character must be in its place. (The print() at the end is to clarify the Terminal output.) Next, we'll execute the script.
PyCharm may be able to run your script, or some configuration may be required.
Attempt to run your script.
'Without error' means Python did everything you asked.
On my Mac, I see this output:
hello, world!

Process finished with exit code 0
When you see the terminal prompt repeated, it means that the script has completed executing.
An 'exception' is when Python cannot, or will not, do everything you asked in your program.
To demonstrate an exception, I removed one character from my code. Here is the result:
  File "/Users/david/test_project/test.py", line 2
    print('hello, world!)
          ^
SyntaxError: unterminated string literal (detected at line 2)
How should we read our exception?
Throughout this course I will repeatedly stress that you must identify the exception type, pinpoint the error to the line, and seek to understand the error in terms of the exception type, and where Python says it occurred.
Some element of the code is misplaced or missing.
print('hello, world!)
print()
  File "/Users/david/test_project/test.py", line 2
    print('hello, world!)
          ^
SyntaxError: unterminated string literal (detected at line 2)
How do we respond to a SyntaxError? First by understanding that there's something missing or out of place in the syntax (the proper placement of language elements -- brackets, braces, parentheses, quotes, etc.) We look at the syntax on the line, and compare it to similar examples in other code that we've seen. Careful comparison between our code and working code will usually show us what's missing or misplaced. In the example above, the first print() statement is missing a quotation mark. It might be hard to see at first, but eventually you will develop "eyes" for this kind of error.
Use hash marks to comment individual lines; blank lines are ignored.
# this program adds numbers
var1 = 5
var2 = 2
var3 = var1 + var2 # add these numbers together
# these lines modify the value further
# var3 = var3 * 2
# var3 = var3 / 20
print(var3)
We will use some exercises for demos in class; you will use them to practice your skills and prepare for tests.
python_data/
├── session_00_test_project/
├── session_01_objects_types/
├──── inclass_exercises/
│     ├── inclass_1.1.py
│     ├── inclass_1.2.py
│     ├── ..etc..
│     ├── inclass_1.6_lab.py
│     ├── inclass_1.7_lab.py
│     ├── ..etc..
├──── notebooks_inclass_warmup/
├── session_02_funcs_condits_methods/
├──── inclass_exercises/
│     ├── inclass_2.1.py
│     ├── inclass_2.2.py
│     ├── ..etc..
├── session_03_strings_lists_files/
├── session_04_containers_lists_sets/
├── ..etc..
└── session_10_classes/
The exercises come in two forms:
A variable is a name assigned ("bound") to an object.
xx = 10 # assign 10 to xx
yy = 2
zz = xx * yy # compute 10 * 2 and assign integer 20 to variable zz
print(zz) # print 20 to screen
xx is a variable bound to 10
= is an assignment operator assigning 10 to xx
yy is another variable bound to 2
* is a multiplication operator computing its operands (10 and 2)
zz is bound to the product, 20
print() is a function that renders its argument to the screen.
Early on we need to distinguish between a variable (a name, like xx) and a literal (a value written directly in the code, like 10).
xx = 10 # assign 10 to xx
yy = 2
zz = xx * yy # compute 10 * 2 and assign integer 20 to variable zz
print(zz) # print 20 to screen
An object is a data value of a particular type.
Every data value in Python is an object.
var_int = 100 # assign integer object 100 to variable var_int
var2_float = 100.0 # assign float object 100.0 to variable var2_float
var3_str = 'hello!' # assign str object 'hello!' to variable var3_str
# NOTE: 'hash mark' comments are ignored by Python.
At every point you must be aware of the type and value of every object in your code.
The three object types we'll look at in this unit are int, float and str. They are the "atoms" of Python's data model.
data type | known as | description | example value |
---|---|---|---|
int | integer | a whole number | 5 |
float | float | a floating-point number | 5.03 |
str | string | a character sequence, i.e. text | 'hello, world!' |
The string has 3 ways to enquote -- all produce a string.
s1 = 'hello, quote'
s2 = "hello, quote"
# multi-line strings can be expressed with triple-quotes
s3 = """hello, quote
Sincerely, Python"""
s4 = 'He said "yes!"' # using single quotes to enquote double quotes
s5 = "Don't worry about that." # using double quotes to enquote a single quote
The way a value is written in the code determines its type.
It's vital that we always be aware of type.
myint = 5 # written as a whole number: int
myfloat = 5.0 # written with a decimal point: float
mystr = '5.0' # written with quotes: str
Other languages (like Java and C) use explicit type declarations to indicate type, for example int a = 5. But Python does not do this.
Printing is usually not enough to determine type, since a string can look like any object.
myint = 5
myfloat = 5.0
mystr = '5.0'
print(myint) # 5
print(myfloat) # 5.0
print(mystr) # 5.0
mystr looks like a float, but it is a str.
If we're not sure, we can always have Python tell us an object's type.
myint = 5
myfloat = 5.0
mystr = '5.0'
print(type(myint)) # <class 'int'>
print(type(myfloat)) # <class 'float'>
print(type(mystr)) # <class 'str'>
This means that what an object can do is defined by its type.
a = 5 # int, 5
b = 10.0 # float, 10.0
c = '10.0' # str, '10.0'
x = a + b # 15.0 (adding int to float)
y = a + c # TypeError (cannot add int to str!)
Even though the value '10.0' looks like a number, it is of type str. Python will not add an int to a str.
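To see what the interpreter actually reports for the failing line, here is a minimal sketch. It uses try/except, which has not been covered at this point in the course; it appears here only so the example can run to completion and show the message.

```python
a = 5        # int
c = '10.0'   # str that merely looks like a number

try:
    y = a + c
except TypeError as err:
    # Python names both operand types in the message
    message = str(err)
    print(message)   # unsupported operand type(s) for +: 'int' and 'str'
```

Reading the exception type (TypeError) together with the two types named in the message points directly at the problem: one operand must be converted before the addition can work.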
You must follow correct style even though Python does not always require it.
name = 'Joe'
age = 29
my_wordy_variable = 100
student3 = 'jg61'
Math operators behave as you might expect.
var_int = 5
var2_float = 10.3
var3_float = var_int + var2_float # int plus a float: 15.3, a float
var4_float = var3_float - 0.3 # float minus a float: 15.0, a float
var5_float = var4_float / 3 # float divided by an int: 5.0, a float
Every operation or function call results in a predictable type.
With two integers, the result is an integer (division is the exception, as shown next). If a float is involved, the result is always a float.
vari = 7
vari2 = 3
varf = 3.0
var3 = vari * vari2 # 21, an int
var4 = vari + varf # 10.0, a float
When an integer is divided into another integer, the result is always a float.
var = 7
var2 = 3
var3 = var / var2 # 2.3333, a float
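As a quick check of the division rule, the sketch below divides two integers that divide evenly. The variable name is illustrative only.

```python
# Dividing one int by another with / produces a float,
# even when the division comes out even.
even_split = 6 / 3
print(even_split)         # 2.0
print(type(even_split))   # <class 'float'>
```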
The exponentiation operator (**) raises its left operand to the power of its right operand and returns the result as a float or int.
var = 11 ** 2 # "eleven raised to the 2nd power (squared)"
print(var) # 121
var = 3 ** 4
print(var) # 81
The modulus operator (%) shows the remainder that would result from division of two numbers.
var = 11 % 2 # "eleven modulo two"
print(var) # 1 (11/2 has a remainder of 1)
var2 = 10 % 2 # "ten modulo two"
print(var2) # 0 (10/2 divides evenly: remainder of 0)
The plus operator (+) with two strings returns a concatenated string.
aa = 'Hello, '
bb = 'World!'
cc = aa + bb # 'Hello, World!'
Note that this is the same operator (+) that is used with numbers for summing. Python uses the type of the operands (values on either side of the operator) to determine behavior and result.
The "string repetition operator" (*) creates a new string with the operand string repeated the number of times indicated by the other operand:
aa = '!'
bb = 5
cc = aa * bb # '!!!!!'
Note that this is the same operator (*) that is used with numbers for multiplication. Python uses the type of the operands to determine behavior and result.
Object types determine behavior.
int or float "added" to int or float: addition
tt = 5 # assign an integer value to tt
zz = 10.0 # assign a float value to zz
qq = tt + zz # compute 5 plus 10 and assign float 15.0 to qq
str "added" to str: concatenation
kk = '5' # assign a str value (quotes mean str) to kk
rr = '10.0' # assign a str value to rr
mm = kk + rr # concatenate '5' and '10.0'
# to construct a new str object, assign to mm
print(mm) # '510.0'
Again, object types determine behavior.
int or float "multiplied" by int or float: multiplication
tt = 5 # assign an integer value to tt
zz = 10 # assign an integer value to zz
qq = tt * zz # compute 5 times 10 and assign integer 50 to qq
print(qq) # 50, an int
str "multiplied" by int: string repetition
aa = '5'
bb = 3
cc = aa * bb # '555'
Built-in functions activate functionality when they are called.
aa = 'hello' # str, 'hello'
bb = len(aa) # pass string object aa as an argument to function len(),
# which returns an integer object as a return value.
print(bb) # int, 5
The len() function takes a string argument and returns an integer -- the length of (number of characters in) the string.
varx = 'hello, world!'
vary = len(varx) # 13
The round() function takes a float argument and returns it rounded to the specified decimal place. With only one argument, it rounds to the nearest whole number and returns an int.
aa = 5.9583
bb = round(aa, 2) # 5.96
cc = round(aa) # 6
Some floating-point operations will result in a number with a small remainder:
x = 0.1 + 0.2
print(x) # 0.30000000000000004 (should be 0.3?)
y = 0.1 + 0.1 + 0.1 - 0.3
print(y) # 5.551115123125783e-17 (should be 0.0?)
The solution is to round the result.
x = 0.1 + 0.2 # 0.30000000000000004
z = round(x, 1)
print(z) # 0.3
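The small remainder matters most when comparing values. The following sketch (variable names are illustrative) shows a comparison that fails on the raw result and succeeds once the result is rounded.

```python
x = 0.1 + 0.2

raw_equal = (x == 0.3)                 # compares 0.30000000000000004 to 0.3
rounded_equal = (round(x, 1) == 0.3)   # compares 0.3 to 0.3

print(raw_equal)       # False
print(rounded_equal)   # True
```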
This function allows us to enter data into the program through the keyboard.
cc = input('enter name: ') # program pauses! Now the user types something
print(cc) # [a string, whatever the user typed]
The exit() function terminates execution immediately. An optional string argument can be passed as an error message.
aa = input('to quit, press "q" ')
if aa == 'q':
    exit(0)  # 0 indicates a successful termination (no error)
if aa == '':  # if user typed nothing and hit [Return]
    exit('error: input required')  # string argument passed to exit()
                                   # indicates an error led to termination
Note: the above examples make use of if, which we will cover in a later lesson.
This function can be used as a temporary stop to the program if we'd like to isolate some statements.
We can also use exit() to simply stop program execution in order to debug:
aa = '55'
bb = float(aa)
print('type of bb is:')
print(type(bb))
exit() # we inserted this to stop the code
# from continuing; we'll remove it later
cc = bb * 2 # because of exit() above, this code
# will not be reached
This function converts an appropriate value to the int type.
# str -> int
aa = '55'
bb = int(aa) # 55 (an int)
print(type(bb)) # <class 'int'>
# float -> int
var = 5.95
var2 = int(var) # 5: the rest is lopped off (not rounded)
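To make the difference between lopping off and rounding concrete, this short sketch compares int() with round() on the same value. The names are illustrative.

```python
price = 5.95

truncated = int(price)    # 5: everything after the decimal point is dropped
rounded = round(price)    # 6: the nearest whole number
negative = int(-5.95)     # -5: dropping the decimals moves toward zero

print(truncated, rounded, negative)   # 5 6 -5
```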
The conversion functions are named after their types -- they take an appropriate value as argument and return an object of that type.
This function converts an appropriate value to the float type.
# int -> float
xx = 5
yy = float(xx) # 5.0
# str -> float
var = '5.95'
var2 = float(var) # 5.95 (a float)
This function converts any value to the str type.
var = 5
var2 = 5.5
svar = str(var) # '5'
svar2 = str(var2) # '5.5'
print(len(svar)) # 1
print(len(svar2)) # 3
Because Python is strongly typed, conversions can be necessary.
Numeric data sometimes arrives as strings (e.g. from input() or a file). Use int() or float() to convert to numeric types.
aa = input('enter number and I will double it: ')
print(type(aa)) # <class 'str'>
num_aa = int(aa) # int() takes the user's input as an argument
# and returns an integer
print(num_aa * 2) # prints the user's number doubled
You can use int() and float() to convert strings to numbers.
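The sketch below shows both conversions, plus what happens when a string is not appropriate for the requested type. The variable typed stands in for whatever input() would return, and try/except (not yet covered in the course) is used only so the example can finish running.

```python
typed = '42'               # stands in for a string returned by input()
doubled = int(typed) * 2   # 84
ratio = float('3.5')       # 3.5

try:
    int('3.5')             # int() rejects a string containing a decimal point
    outcome = 'converted'
except ValueError:
    outcome = 'ValueError'

print(doubled, ratio, outcome)   # 84 3.5 ValueError
```

When a numeric string may contain a decimal point, float() is the safer conversion.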
It's important for early coders to follow existing syntax and not make up their own.
Imagine that you would like to find the length of a string. What do you do? Some students begin writing code off the top of their heads, even though they are not completely familiar with the right syntax.
They may write something like this...
var = 'hello'
mylen = var.len() # or mylen = length('var')
# or mylen = lenth(var)
...and then run it, only to get a strange error that's difficult to diagnose.
When you want to use a Python feature, you must follow an existing example -- you must not improvise!
Let's say you have a string and you'd like to get its length:
s = "this is a string I'd like to measure" # determine length (36)
You look up the function in a reference, like pythonreference.com:
mylen = len('hello')
Then you use the feature syntax very carefully:
slen = len(s) # int, 36
However, your code will be slightly different from the example code:
Early on we need to distinguish between a variable and a literal.
xx = 10 # assign 10 to xx
yy = 2
zz = xx * yy # compute 10 * 2 and assign integer 20 to variable zz
print(zz) # print 20 to screen
Here's a common error that beginners make - try to avoid it!
Going back to our previous example - you'd like to use len() to measure this string:
s = "this is a string I'd like to measure" # determine length (36)
You look up the function in a reference, like pythonreference.com:
mylen = len('hello')
You have been told to make your syntax match the example's. But should you do this?
slen = len('s') # int, 1
You were expecting a length of 36, but you got a length of 1. Can you see why? The variable s points to a long string. The literal 's' is just a one-character string. In trying to match the example code, you may have thought you needed the quotes.

The takeaway is this: anyplace a literal is used, a variable can be used instead; and anyplace a variable is used, a literal can be used instead.
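The three calls below put the variable, the quoted variable name, and the full literal side by side. This is only an illustrative sketch; the result names are made up for the example.

```python
s = "this is a string I'd like to measure"

from_variable = len(s)     # measures the string that s points to
from_quoted = len('s')     # measures the one-character literal 's'
from_literal = len("this is a string I'd like to measure")

print(from_variable, from_quoted, from_literal)   # 36 1 36
```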
All programs must make decisions during execution.
Consider these decisions by programs you know:
Conditional statements allow any program to make the decisions it needs to do its work.
The if statement executes code in its block only if the test is True.
aa = input('please enter a positive integer: ')
int_aa = int(aa)
if int_aa < 0:  # test: is this a True statement?
    print('error: input invalid')  # block (2 lines) -- lines are
    exit()  # executed only if test is True
d_int_aa = int_aa * 2  # double the value
print('your value doubled is ' + str(d_int_aa))
The two components of an if statement are the test and the block. The test determines whether the block will be executed.
An else statement will execute its block if the if test before it was not True.
xx = input('enter an even or odd number: ')
yy = int(xx)
if yy % 2 == 0:  # can 2 divide into yy evenly?
    print(xx + ' is even')
    print('congratulations.')
else:
    print(xx + ' is odd')
    print('you are odd too.')
Therefore we can say that only one block of an if/else statement will execute.
elif is also used with if (and optionally else): you can chain additional conditions for other behavior.
zz = input('type an integer and I will tell you its sign: ')
zyz = int(zz)
if zyz > 0:
    print('that number is positive')
elif zyz < 0:
    print('that number is negative')
else:
    print('0 is neutral')
A code block is marked by indented lines. The end of the block is marked by a line that returns to the prior indent.
xx = input('enter an even or odd number: ') # not in any block
yy = int(xx) # ditto
if yy % 2 == 0:  # the start of the 'if' block
    print('your number is even')
    print('even is cool')  # last line of the 'if' block
else:  # the start of the 'else' block
    print('your number is odd')
    print('you are cool')  # last line of the 'else' block
print('thanks for playing "even/odd number"')  # not in any block
Note also that a block is preceded by an unindented line that ends in a colon.
Blocks can be nested within one another. A nested block (a "block within a block") simply moves the code block further to the right.
var_a = int(input('enter a number: '))
var_b = int(input('enter another number: '))
if var_b >= var_a:  # compare int values for truth
    print("the test was true")
    print("var b is at least as large")
    if var_a == var_b:  # if the two values are equivalent
        print('the two values are equivalent')
    print("now we're in the outer block but not in the inner block")
print('this gets printed in any case (i.e., not part of either block)')
Complex decision trees using 'if' and 'else' are the basis for most programs.
>, <, <=, >= tests with numbers work as you might expect.
var = 5
var2 = 3.3
if var >= var2:
    print('var is greater or equal')
if var == var2:
    print('they are equivalent')
With strings, the == operator tests to see if two strings are identical.
var = 'hello'
var2 = 'hello'
if var == var2:
    print('these are equivalent strings')
'in' with strings allows you to see if a 'substring' appears within a string.
article = 'The market rallied, buoyed by a rise in Samsung Electronics. The other...'
if 'Samsung' in article:
    print('Samsung was found')
Python uses the operator and to combine tests: both must be True.
The 'and' compound statement: if both tests are True, the entire statement is True.
xx = input('what is your ID? ')
yy = input('what is your pin? ')
if xx == 'dbb212' and yy == '3859':
    print('you are a validated user')
else:
    print('you are not validated')
Note the lack of parentheses around the tests -- if the syntax is unambiguous, Python will understand. We can use parentheses to clarify compound statements like these, but they often aren't necessary. You should avoid parentheses wherever you can.
Python uses the operator or to combine tests: either can be True.
The 'or' compound statement: if either test is True, the entire statement is True.
aa = input('please enter "q" or "quit" to quit: ')
if aa == 'q' or aa == 'quit':
    exit()
print('continuing...')
Note the lack of parentheses around the tests -- if the syntax is unambiguous, Python will understand. We can use parentheses to clarify compound statements like these, but they often aren't necessary. You should avoid parentheses wherever you can.
Both sides of an 'or' or 'and' must be complete tests.
if aa == 'q' or aa == 'quit':  # not "if aa == 'q' or 'quit'"
    exit()
Note the 'or' test above -- we would not say if aa == 'q' or 'quit'; this would always succeed (for reasons discussed later).
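As a preview of those reasons, the sketch below evaluates both forms with a value that is neither 'q' nor 'quit'. The names wrong and right are illustrative only.

```python
aa = 'keep going'

wrong = (aa == 'q' or 'quit')         # False or 'quit' gives 'quit'
right = (aa == 'q' or aa == 'quit')   # False or False gives False

print(wrong)         # quit
print(bool(wrong))   # True: a non-empty string counts as true in a test
print(right)         # False
```

The incomplete form never compares aa to 'quit' at all; it tests the string 'quit' by itself, which is why an if built on it would always succeed.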
We can also test a variable against multiple values by using in with a list (more on lists next week):
if aa in ['q', 'quit']:
    exit()
You can negate a test with the not keyword.
var_a = 5
var_b = 10
if not var_a > var_b:
    print("var_a is not larger than var_b (well - it isn't).")
Of course this particular test can also be expressed by replacing the comparison operator > with <=, but when we learn about new True/False condition types we'll see how this operator can come in handy.
True and False are boolean values (type bool), and are produced by expressions that can be seen as True or False.
aa = 3
bb = 5
if aa > bb:
    print("that is true")
Tests are actually expressions that resolve to True or False, which are values of boolean type:
var = 5
var2 = 10
xx = (var2 > var) # True: 10 is greater than 5
print(xx) # True
print(type(xx)) # <class 'bool'>
Note that we would almost never assign comparisons like these to variables, but we are doing so here to illustrate that they resolve to boolean values.
We reassign the value of an integer to effect an incrementing.
x = 0 # int, 0
x = x + 1 # int, 1
x = x + 1 # int, 2
x = x + 1 # int, 3
print(x) # 3
For each of the three incrementing statements above, a new value that equals the value of x plus 1 is created, and then assigned back to x. The previous value of x is replaced with the new, incremented value. Incrementing is most often used for counting within loops -- see next.
A while test causes Python to loop through a block repetitively, as long as the test is True.
This program prints each number from 0 through 4.
cc = 0 # initialize a counter
while cc < 5:  # "if test is True, enter the block"
    print(cc)
    cc = cc + 1  # "increment" cc: add 1 to its current value
    # WHEN WE REACH THE END OF THE BLOCK,
    # JUMP BACK TO THE while TEST
print('done')
The block is executing the print and cc = cc + 1 lines multiple times - again and again until the test becomes False. Of course, the value being tested must change as the loop progresses - otherwise the loop will cycle indefinitely (infinite loop).
while loops have 3 components: the test, the block, and the automatic return.
cc = 10
while cc > 0:  # the test (if True, enter the block)
    print(cc)  # the block (execute as regular Python statements)
    cc = cc - 1
    # the automatic return [invisible!]
    # (at end of block, go back to the test)
print('done')
break is used to exit a loop regardless of the test condition.
xx = 0
while xx < 10:
    answer = input("do you want loop to break? ")
    if answer == 'y':
        break  # drop down below the block
    print('Hello, User')
    xx = xx + 1
    print('I have now greeted you ' + str(xx) + ' times')
print("ok, I'm done")
The continue statement jumps program flow to next loop iteration.
x = 0
while x < 10:
    x = x + 1
    if x % 2 != 0:  # will be True if x is odd
        continue  # jump back up to the test and test again
    print(x)
Note that print(x) will not be executed if the continue statement comes first. Can you figure out what this program prints?
while with True and break provide us with a handy way to keep looping until we feel like stopping.
while True:
    var = input('please enter a positive integer: ')
    if int(var) > 0:
        break
    else:
        print('sorry, try again')
print('thanks for the integer!')
Note the use of True in a while expression: since True is always True, this test will always be True, and will cause program flow to enter (and re-enter) the block. Therefore the break statement is essential to keep this loop from looping indefinitely.
Use print() statements to give visibility to your code execution.
The output of the code should be the sum of all numbers from 0-10, or 55:
revcounter = 0
while revcounter < 10:
    varsum = 0
    revcounter = revcounter + 1
    varsum = varsum + revcounter
    print("loop iteration complete")
    print("revcounter value: ", revcounter)
    print("varsum value: ", varsum)
    input('pausing...')
    print()
    print()
print(varsum)  # 10
I've added quite a few statements, but if you run this example you will be able to get a hint as to what is happening:
loop iteration complete
revcounter value:  1
varsum value:  1
pausing...               # here I hit [Return] to continue


loop iteration complete
revcounter value:  2
varsum value:  2
pausing...               # [Return]
So the solution is to initialize varsum before the loop and not inside of it:
revcounter = 0
varsum = 0
while revcounter < 10:
    revcounter = revcounter + 1
    varsum = varsum + revcounter
print(varsum)  # 55
This outcome makes more sense. We might want to check the total to be sure, but it looks right. The hardest part of learning how to code is in designing a solution. This is also the hardest part to teach! But the last thing you want to do in response is to guess repeatedly. Instead, please examine the outcome of your code through print statements, see what's happening in each step, then compare this to what you think should be happening. Eventually you'll start to see what you need to do. Step-by-baby-step!
Objects are capable of behaviors, which are expressed as methods.
Use object methods to process object values
var = 'Hello, World!'
var2 = var.replace('World', 'Mars') # replace substring, return a str
print(var2) # Hello, Mars!
Methods are type-specific functions that are used only with a particular type.
Compare method syntax to function syntax.
mystr = 'HELLO'
x = len(mystr) # int, 5
y = mystr.count('L') # int, 2
print(y) # 2
Methods and functions are both called (using the parentheses after the name of the function or method). Both also may take an argument and/or may return a return value.
This "transforming" method returns a new string with a string's value uppercased.
upper() string method
var = 'hello'
newvar = var.upper()
print(newvar) # 'HELLO'
This "transforming" method returns a new string with a string's value lowercased.
lower() string method
var = 'Hello There'
newvar = var.lower()
print(newvar) # 'hello there'
This "transforming" method returns a new string based on an old string, with specified text replaced.
var = 'My name is Joe'
newvar = var.replace('Joe', 'Greta') # str, 'My name is Greta'
print(newvar) # My name is Greta
This method takes two arguments, the search string and replace string.
This "inspector" method returns True if a string is all digits.
mystring = '12345'
if mystring.isdigit():
    print("that string is all numeric characters")
if not mystring.isdigit():
    print("that string is not all numeric characters")
Since it returns True or False, inspector methods like isdigit() are used in an if or while expression. To test the reverse (i.e. "not all digits"), use if not before the method call.
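One common use of isdigit() is to check a string before converting it with int(). The sketch below assumes made-up variable names and also shows two strings that look numeric to a person but are not "all digits".

```python
candidate = '12345'
if candidate.isdigit():
    number = int(candidate)   # safe: every character is a digit
    print(number * 2)         # 24690

has_point = '12.5'.isdigit()  # False: '.' is not a digit
has_sign = '-12'.isdigit()    # False: '-' is not a digit
print(has_point, has_sign)    # False False
```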
This "inspector" method returns True if a string ends with a substring.
bb = 'This is a sentence.'
if bb.endswith('.'):
    print("that line had a period at the end")
This "inspector" method returns True if the string starts with a substring.
cc = input('yes? ')
if cc.startswith('y') or cc.startswith('Y'):
    print('thanks!')
else:
    print("ok, I guess not.")
This "inspector" method returns a count of occurrences of a substring within a string.
aa = 'count the substring within this string'
bb = aa.count('in')
print(bb) # 3 (the number of times 'in' appears in the string)
This "inspector" method returns the character position of a substring within a string.
xx = 'find the name in this string'
yy = xx.find('name')
print(yy) # 9 -- the 10th character in xx
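When the substring is not present, find() returns -1 rather than raising an exception. A minimal sketch, with an arbitrary search word chosen for illustration:

```python
xx = 'find the name in this string'

found = xx.find('name')      # 9: index where 'name' begins
missing = xx.find('zebra')   # -1: the substring does not appear

if missing == -1:
    print('zebra was not found')
```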
An f'' string allows us to embed any value such as numbers into a new, completed string.
aa = 'Jose'
var = 34
# 2 arguments to replace 2 {} tokens
bb = f'{aa} is {var} years old.'
print(bb) # Jose is 34 years old.
An f'' string allows us to embed any value such as numbers into a new, completed string.
overview of formatting
# text padding and justification
:<15   # left justify width
:>10   # right justify width
:^8    # center justify width
# numeric formatting
:f     # as float (6 places)
:.2f   # as float (2 places)
:,     # 000's commas
:,.2f  # 000's commas with float to 2 places
examples
x = 34563.999999
f'hi: {x:<30}' # 'hi: 34563.999999                  '
f'hi: {x:>30}' # 'hi:                   34563.999999'
f'hi: {x:^30}' # 'hi:          34563.999999         '
f'hi: {x:f}' # 'hi: 34563.999999'
f'hi: {x:.2f}' # 'hi: 34564.00'
f'hi: {x:,}' # 'hi: 34,563.999999'
f'hi: {x:,.2f}' # 'hi: 34,564.00'
Please note that f'' strings are available only as of Python 3.6.
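Padding and numeric formatting can be combined in one f'' string, for example to line up columns. The item and price values below are invented for illustration.

```python
item = 'coffee'
price = 1234.5

# left-justify the text in 10 characters; right-justify the number
# in 12 characters with commas and 2 decimal places
row = f'{item:<10}|{price:>12,.2f}|'
centered = f'{item:^10}'

print(row)        # coffee    |    1,234.50|
print(centered)   # '  coffee  ' (2 spaces on each side)
```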
The return value of an expression can be used in another expression.
letters = "aabbcdefgafbdchabacc"
vara = letters.count("a") # 5
varb = len(letters) # 20
varc = vara / varb # 5 / 20, or 0.25
vard = varc * 100 # 25.0
print(letters.count("a") / len(letters) * 100) # statements combined: 25.0
The CSV format will allow us to explore Python's text parsing tools.
comma-separated values file (CSV)
19260701,0.09,0.22,0.30,0.009
19260702,0.44,0.35,0.08,0.009
19270103,0.97,0.21,0.24,0.010
Tables consist of records (rows) and fields (column values).
Tabular text files are organized into rows and columns.
comma-separated values file (CSV)
19260701,0.09,0.22,0.30,0.009
19260702,0.44,0.35,0.08,0.009
19270103,0.97,0.21,0.24,0.010
19270104,0.30,0.15,0.73,0.010
19280103,0.43,0.90,0.20,0.010
19280104,0.14,0.47,0.01,0.010
space-separated values file
19260701 0.09 0.22 0.30 0.009
19260702 0.44 0.35 0.08 0.009
19270103 0.97 0.21 0.24 0.010
19270104 0.30 0.15 0.73 0.010
19280103 0.43 0.90 0.20 0.010
19280104 0.14 0.47 0.01 0.010
Text files are just sequences of characters. Commas and newline characters separate the data.
If we print a CSV text file, we may see this:
19260701,0.09,0.22,0.30,0.009
19260702,0.44,0.35,0.08,0.009
19270103,0.97,0.21,0.24,0.010
19270104,0.30,0.15,0.73,0.010
19280103,0.43,0.90,0.20,0.010
19280104,0.14,0.47,0.01,0.010
However, here's what a text file really looks like under the hood:
19260701,0.09,0.22,0.30,0.009\n19260702,0.44,0.35,0.08,0.009\n19270103,0.97,0.21,0.24,0.010\n19270104,0.30,0.15,0.73,0.010\n19280103,0.43,0.90,0.20,0.010\n19280104,0.14,0.47,0.01,0.010
Looping through file line strings, we can split and isolate fields on each line.
The process:
1. Open the file for reading.
2. Use a for loop to read each line of the file, one at a time. Each line will be represented as a string.
3. Remove the newline from the end of each string with .rstrip().
4. Divide (using .split()) the string into fields.
5. Read a value from one of the fields, representing the data we want.
6. As the loop progresses, build a sum of values from each line.

We will begin by reviewing each feature necessary to complete this work, and then we will begin to put it all together.
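The six steps can be sketched end to end as follows. This is a minimal sketch under stated assumptions: the filename sample.csv and the choice of the second field are hypothetical, and the sketch first writes three sample rows to a temporary folder so that it can run on its own. In class the file would already exist.

```python
import os
import tempfile

# Setup (assumption): write three sample rows so the sketch is self-contained.
sample_rows = ('19260701,0.09,0.22,0.30,0.009\n'
               '19260702,0.44,0.35,0.08,0.009\n'
               '19270103,0.97,0.21,0.24,0.010\n')
folder = tempfile.mkdtemp()
csv_path = os.path.join(folder, 'sample.csv')   # hypothetical filename
fh = open(csv_path, 'w')
fh.write(sample_rows)
fh.close()

total = 0.0
fh = open(csv_path)              # 1. open the file for reading
for line in fh:                  # 2. loop through the file, one line at a time
    line = line.rstrip()         # 3. remove the newline from the end
    fields = line.split(',')     # 4. divide the line into fields
    value = float(fields[1])     # 5. read the second field as a float
    total = total + value        # 6. build the sum
fh.close()

print(round(total, 2))           # 1.5
```

The total is rounded before printing because summing floats can leave a small remainder.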
This method can remove any character from the right side of a string.
When no argument is passed, the newline character (or any "whitespace" character) is removed from the end of the line:
line_from_file = 'jw234,Joe,Wilson\n'
stripped = line_from_file.rstrip() # str, 'jw234,Joe,Wilson'
When a string argument is passed, that character is removed from the end of the line:
line_from_file = 'I have something to say.'
stripped = line_from_file.rstrip('.') # str, 'I have something to say'
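One detail worth seeing in action: the argument to rstrip() is treated as a set of characters to remove, not as a single suffix. The sketch below uses a made-up sample string to show the difference.

```python
# The sample string is hypothetical; only str.rstrip() itself is being shown.
line = 'total: 100.00...\n'

no_newline = line.rstrip()          # whitespace (including '\n') removed
no_dots = no_newline.rstrip('.')    # every trailing '.' removed
no_zeros = no_dots.rstrip('0.')     # any trailing '0' or '.' removed

print(no_newline)                   # total: 100.00...
print(no_dots)                      # total: 100.00
print(no_zeros)                     # total: 1
```

The last line shows why a multi-character argument needs care: rstrip('0.') keeps removing characters as long as each one is either '0' or '.'.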
This method divides a delimited string into a list.
line_from_file = 'jw234:Joe:Wilson:Smithtown:NJ:2015585894\n'
xx = line_from_file.split(':')
print(xx) # ['jw234', 'Joe', 'Wilson',
# 'Smithtown', 'NJ', '2015585894\n']
We can also think of a string as delimited by spaces.
gg = 'this is a file with some whitespace'
hh = gg.split() # splits on any "whitespace character"
print(hh) # ['this', 'is', 'a', 'file',
# 'with', 'some', 'whitespace']
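As a small illustration of why split() with no argument is usually the better choice for space-separated data, the sketch below compares it with split(' ') on a made-up line that contains runs of several spaces.

```python
# Hypothetical line with two spaces, then three spaces, between fields
row = 'jw234  Joe   Wilson'

by_whitespace = row.split()         # runs of whitespace count as one delimiter
by_one_space = row.split(' ')       # every single space is a delimiter

print(by_whitespace)                # ['jw234', 'Joe', 'Wilson']
print(by_one_space)                 # ['jw234', '', 'Joe', '', '', 'Wilson']
```

With an explicit ' ' delimiter, each extra space produces an empty string in the list.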
Subscripting allows us to select individual elements of a list.
fields = ['jw234', 'Joe', 'Wilson', 'Smithtown', 'NJ', '2015585894']
var = fields[0] # 'jw234'
var2 = fields[4] # 'NJ'
var3 = fields[-1] # '2015585894' (-1 means last index)
Slicing allows us to select multiple items from a list.
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
first_four = letters[0:4]
print(first_four) # ['a', 'b', 'c', 'd']
# no upper bound takes us to the end
print(letters[5:]) # ['f', 'g', 'h']
Here are the rules for slicing:
1) the first index is 0
2) the lower bound is the 1st element to be included
3) the upper bound is one higher than the last element to be included
4) no upper bound means "to the end"
Slicing a string selects characters the way that slicing a list selects items.
mystr = '2014-03-13 15:33:00'
year = mystr[0:4] # '2014'
month = mystr[5:7] # '03'
day = mystr[8:10] # '13'
Again, please review the rules for slicing:
1) the first index is 0 (first character)
2) the lower bound is the 1st character to be included
3) the upper bound is one higher than the last character to be included
4) no upper bound means "to the end"
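The slicing rules can be checked with a few more slices of the same timestamp string. This is only a sketch; the variable names are chosen for illustration.

```python
mystr = '2014-03-13 15:33:00'

date_part = mystr[:10]              # no lower bound means "from the start"
time_part = mystr[11:]              # no upper bound means "to the end"
seconds = mystr[-2:]                # negative index counts from the end
month = mystr[5:7]                  # upper bound 7 is NOT included

print(date_part)                    # 2014-03-13
print(time_part)                    # 15:33:00
print(seconds)                      # 00
print(len(month))                   # 2 (upper bound minus lower bound)
```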
An IndexError exception indicates use of an index for a list element that doesn't exist.
mylist = ['a', 'b', 'c']
print(mylist[5]) # IndexError: list index out of range
Since mylist does not contain a sixth item (i.e., at index 5), Python tells us it cannot complete this operation.
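Two common ways to avoid this exception are sketched below: checking the index against len() first, or catching the exception with try/except. The variable names are illustrative only.

```python
mylist = ['a', 'b', 'c']
index = 5

# Option 1: test the index before using it
if index < len(mylist):
    item = mylist[index]
else:
    item = None                     # no such element

# Option 2: attempt the lookup and handle the failure
try:
    item2 = mylist[index]
except IndexError as err:
    item2 = None
    message = str(err)

print(item)                         # None
print(message)                      # list index out of range
```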
'for' with a list repeats its block as many times as there are items in the list.
mylist = [1, 2, 'b']
for var in mylist:         # 1
    print(var)             # ===
    print('===')           # 2
                           # ===
print('done')              # b
                           # ===
                           # done
Similar to a while block, the for block repeats the contents of its block multiple times, but does so only as many times as there are items in the list. The control variable var is reassigned for each iteration of the loop. This means that if the list has 3 items, the loop executes 3 times and var is reassigned a new value 3 times.
We reassign the value of an integer to effect an incrementing.
x = 0 # int, 0
x = x + 1 # int, 1
x = x + 1 # int, 2
x = x + 1 # int, 3
print(x) # 3
For each of the three incrementing statements above, a new value that equals the value of x plus 1 is created, and then assigned back to x. The previous value of x is replaced with the new, incremented value. Incrementing is most often used for counting within loops -- see next.
An integer, updated for each iteration, can be used to count iterations.
mylist = [1, 2, 'b']
my_counter = 0
for var in mylist:
    my_counter = my_counter + 1
print(f'count: {my_counter} items')     # count: 3 items
The value of my_counter is initialized at 0 before the loop begins. Then, since the incrementing line my_counter = my_counter + 1 is inside the loop, the value of my_counter goes up once with each iteration. Please note that the len() function can count list items more efficiently, but we are using a counter to demonstrate the counter technique, which can be used in situations where len() can't be used, as when we count lines.
A number, updated for each iteration, can be used to sum values.
mylist = [1, 2, 3]
my_sum = 0
for val in mylist:
    my_sum = my_sum + val
print(f'sum: {my_sum}') # sum: 6 (value of 1 + 2 + 3)
The value of my_sum is initialized at 0 before the loop begins. Then, since the summing line my_sum = my_sum + val is inside the loop, the value of my_sum goes up by val with each iteration. Please note that the sum() function can sum list items more efficiently, but we are using a summing variable to demonstrate the summing technique, which can be used in situations where sum() can't be used, as when we are summing values from a file.
'for' with a file repeats its block as many times as there are lines in the file.
fh = open('students.txt') # file object allows
# looping through a
# series of strings
for xx in fh:                    # xx is a string, a line of the file
    print(xx)                    # prints each line of students.txt
fh.close() # close the file
"xx" is called a control variable, and it is automatically assigned each line in the file as a string. break and continue work with for as well as while loops. Again, the control variable xx is reassigned for each iteration of the loop. This means that if the file has 5 lines, the loop executes 5 times and xx is reassigned a new value 5 times.
A file is automatically closed upon exiting the 'with' block.
A 'best practice' is to open files using a 'with' block. When execution leaves the block, the file is automatically closed.
with open('pyku.txt') as fh:
    for line in fh:
        print(line)
## at this point (outside the with block), filehandle fh has been closed.
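The automatic close can be observed through the file object's closed attribute. The sketch below first writes a small throwaway file (its name and contents are made up for this example) so that it can run anywhere.

```python
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained.
# (The folder, file name and contents are made up for this example.)
path = os.path.join(tempfile.mkdtemp(), 'pyku_demo.txt')
with open(path, 'w') as wfh:
    wfh.write('first line\nsecond line\n')

lines = []
with open(path) as fh:
    for line in fh:
        lines.append(line.rstrip())
    closed_inside = fh.closed       # False: still inside the with block

closed_outside = fh.closed          # True: closed on leaving the block

print(lines)                        # ['first line', 'second line']
print(closed_inside, closed_outside)    # False True
```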
Here we put together all features learned in this session.
fh = open('revenue.csv') # 'file' object
counter = 0
summer = 0.0
for line in fh:                    # str, "Haddad's,PA,239.50\n"
    line = line.rstrip()           # str, "Haddad's,PA,239.50"
    fieldlist = line.split(',')    # list, ["Haddad's", 'PA', '239.50']
    rev_val = fieldlist[2]         # str, '239.50' (value from first line)
    f_rev = float(rev_val)         # float, 239.5
    counter = counter + 1
    summer = summer + f_rev
fh.close()
print(f'counter: {counter}') # 7 (number of lines in file)
print(f'summer: {summer}') # 662.01000001 (sum of all 3rd col values in file)
Files can be opened for writing or appending; we use the file object and the file write() method.
fh = open('new_file.txt', 'w')
fh.write("here's a line of text\n")
fh.write('I add the newlines explicitly if I want to write to the file\n')
fh.close()
fh = open('new_file.txt')
lines = fh.readlines()
print(lines)
# ["here's a line of text\n",
# 'I add the newlines explicitly if I want to write to the file\n']
fh.close()
Note that we are explicitly adding newlines to the end of each line. The write() method doesn't do this for us.
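The difference between writing ('w') and appending ('a') can be sketched as follows. The file is created in a temporary folder, and its name and contents are made up for this example.

```python
import os
import tempfile

# Hypothetical file in a temporary folder, so nothing real is overwritten
path = os.path.join(tempfile.mkdtemp(), 'log_demo.txt')

with open(path, 'w') as wfh:        # 'w' creates (or empties) the file
    wfh.write('first\n')

with open(path, 'a') as afh:        # 'a' adds to the end of the file
    afh.write('second\n')

with open(path) as fh:
    after_append = fh.read()

with open(path, 'w') as wfh:        # 'w' again: previous contents are lost
    wfh.write('replaced\n')

with open(path) as fh:
    after_rewrite = fh.read()

print(repr(after_append))           # 'first\nsecond\n'
print(repr(after_rewrite))          # 'replaced\n'
```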
A module is Python code (a code library) that we can import and use in our own code -- to do specific types of tasks.
import csv # make csv (a library module) part of our code
fh = open('thisfile.csv')
reader = csv.reader(fh)
for row in reader:
    print(row)
Once a module is imported, its Python code is made available to our code. We can then call specialized functions and use objects to accomplish specialized tasks. Python's module support is profound and extensive. Modules can do powerful things, like manipulate image or sound files, munge and process huge blocks of data, do statistical modeling and visualization (charts) and much, much, much more. The Python 3 Standard Library documentation can be found at https://docs.python.org/3/library/index.html Python 2 Standard Library: https://docs.python.org/2.7/library/index.html
The CSV module parses CSV files, splitting the lines for us. We read the CSV object in the same way we would a file object.
import csv
fh = open('students.txt', newline='')   # newline='' lets the csv module handle line endings
reader = csv.reader(fh)
next(reader)                            # skip one row (useful for header lines)
for record in reader:                   # loop through each row
    print(f'id:{record[0]}; fname:{record[1]}; lname: {record[2]}')
fh.close()
This module takes into account more advanced CSV formatting, such as quotation marks (which are used to allow commas within data). In Python 3 the file must be opened in text mode (not 'rb'); passing newline='' to open() is recommended by the csv documentation and matters most when the CSV file comes from Excel, which outputs newlines in the Windows format (\r\n) that can otherwise confuse the csv reader.
Writing is similarly easy:
import csv
wfh = open('some.csv', 'w', newline='')
writer = csv.writer(wfh)
writer.writerow(['some', 'values', "boy, don't you like long field values?"])
writer.writerows([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']])
wfh.close()
Please be advised that you may not see writes to a file until you close the file with wfh.close() or until the program ends execution. (newline='' is necessary when opening the write file to neutralize an issue in Windows regarding the '\r\n' line ending that Windows uses. While not needed on Mac or Linux, this added argument does no harm.)
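To see why the csv module is preferable to a plain split(','), the sketch below writes and re-reads a row whose last field contains a comma. It uses io.StringIO as an in-memory stand-in for a file, and the row values are made up.

```python
import csv
import io

buffer = io.StringIO()              # in-memory stand-in for a file
writer = csv.writer(buffer)
writer.writerow(['id', 'name', 'note'])
writer.writerow(['jw234', 'Joe Wilson', 'likes commas, apparently'])

raw_text = buffer.getvalue()        # the CSV text exactly as written

reader = csv.reader(io.StringIO(raw_text))
header = next(reader)               # skip the header row
rows = list(reader)

# A plain split breaks the quoted field into two pieces
naive = raw_text.splitlines()[1].split(',')

print(rows[0])                      # ['jw234', 'Joe Wilson', 'likes commas, apparently']
print(len(rows[0]), len(naive))     # 3 4
```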
An sqlite3 lightweight database instance is built into Python and accessible through SQL statements. It can act as a simple storage solution, or can be used to prototype database interactivity in your Python script and later be ported to a production database like MySQL, Postgres or Oracle.
Keep in mind that the interface to your relational database will be the same as or similar to the one presented here with the file-based one.
import sqlite3
conn = sqlite3.connect('example.db') # a db connection object
c = conn.cursor() # a cursor object for issuing queries
Once a cursor object is established, SQL can be used to write to or read from the database:
c.execute('''CREATE TABLE stocks
(date text, trans text, symbol text, qty real, price real)''')
Note that sqlite3 datatypes are nonstandard and don't reflect types found in databases such as MySQL:
INTEGER: all int types (TINYINT, BIGINT, INT, etc.)
REAL: FLOAT, DOUBLE, REAL, etc.
NUMERIC: DECIMAL, BOOLEAN, DATE, DATETIME, NUMERIC
TEXT: CHAR, VARCHAR, etc.
BLOB: BLOB (non-typed (binary) data, usually large)
Insert a row of data
c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")
Larger example that inserts many records at a time
purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
('2006-04-06', 'SELL', 'IBM', 500, 53.00),
]
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)
Commit the changes -- this saves the inserted rows to the database file
conn.commit()
Retrieve single row of data
t = ('RHAT',)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
tuple_row = c.fetchone()
print(tuple_row)                  # ('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)
Retrieve multiple rows of data
for tuple_row in c.execute('SELECT * FROM stocks ORDER BY price'):
    print(tuple_row)
### ('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)
### ('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0)
### ('2006-04-06', 'SELL', 'IBM', 500.0, 53.0)
### ('2006-04-05', 'BUY', 'MSFT', 1000.0, 72.0)
Close the database
conn.close()
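The whole sequence can be tried without creating a file by connecting to the special name ':memory:', which sqlite3 treats as a temporary in-memory database. The sketch below reuses the stocks table and purchases from this section; the query at the end is just one illustrative example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')      # temporary database, gone after close()
c = conn.cursor()

c.execute('''CREATE TABLE stocks
             (date text, trans text, symbol text, qty real, price real)''')

purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
             ('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
             ('2006-04-06', 'SELL', 'IBM', 500, 53.00)]
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)
conn.commit()

# The ? placeholder keeps the value separate from the SQL text
c.execute('SELECT symbol, qty FROM stocks WHERE trans=? ORDER BY symbol',
          ('BUY',))
buy_rows = c.fetchall()                 # list of tuples, one per matching row

conn.close()

for symbol, qty in buy_rows:
    print(symbol, qty)
```

Passing values through ? placeholders, rather than building the SQL string by hand, lets the database module handle quoting.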
A Python program can take the place of a browser, requesting and downloading CSV, HTML pages and other files.
Your Python program can work like a web spider (for example visiting every page on a website looking for particular data or compiling data from the site), can visit a page repeatedly to see if it has changed, can visit a page once a day to compile information for that day, etc.
Basic Example: Download and Save Data
import requests
url = 'https://www.python.org/dev/peps/pep-0020/' # the Zen of Python (PEP 20)
response = requests.get(url) # a response object
text = response.text # text of response
# writing the response to a local file -
# you can open this file in a browser to see it
wfh = open('pep_20.html', 'w')
wfh.write(text)
wfh.close()
More Complex Example: Send Headers, Parameters, Body; Receive Status, Headers, Body
import requests
url = 'http://davidbpython.com/cgi-bin/http_reflect' # my reflection program
div_bar = '=' * 10
# headers, parameters and message data to be passed to request
header_dict = { 'Accept': 'text/plain' } # change to 'text/html' for an HTML response
param_dict = { 'key1': 'val1', 'key2': 'val2' }
data_dict = { 'text1': "We're all out of gouda." }
# a GET request (change to .post for a POST request)
response = requests.get(url, headers=header_dict,
params=param_dict,
data = data_dict)
response_status = response.status_code # status of the response (OK, Not Found, etc.)
response_headers = response.headers # headers sent by the server
response_text = response.text # body sent by server
# outputting response elements (status, headers, body)
# response status
print(f'{div_bar} response status {div_bar}\n')
print(response_status)
print(); print()
# response headers
print(f'{div_bar} response headers {div_bar}\n')
for key in response_headers:
    print(f'{key}: {response_headers[key]}\n')
print()
# response body
print(f'{div_bar} response body {div_bar}\n')
print(response_text)
Note that if import requests raises a ModuleNotFoundError exception, requests must be installed:
Mac: open the Terminal program and issue this command: pip3 install requests
Windows: open the Command Prompt program and issue the following command: pip install requests
If you have any problems with these commands, please let me know!
Specific techniques for reading the most common data formats.
CSV: feed string response to .splitlines(), then to csv.reader:
import requests
import csv
url = 'path to csv file'
response = requests.get(url)
text = response.text
lines = text.splitlines()
reader = csv.reader(lines)
for row in reader:
    print(row)
JSON: requests accesses built-in support:
import requests
url = 'path to json file'
response = requests.get(url)
obj = response.json()
print(type(obj)) # <class 'dict'>
If the requests module cannot be installed, this module is part of the standard distribution.
urllib is a full-featured module for making web requests. Although the requests module is strongly favored by some for its simplicity, it has not yet been added to the Python standard distribution.
The urlopen method takes a url and returns a file-like object that can be read() as a file:
import urllib.request
my_url = 'http://www.yahoo.com'
readobj = urllib.request.urlopen(my_url) # return a 'file-like' object
text = readobj.read() # read into a 'byte string'
# text = text.decode('utf-8') # optional, sometimes required:
# decode as a 'str' (see below)
readobj.close()
Alternatively, you can call readlines() on the object (keep in mind that many objects that can deliver file-like string output can be read with this same-named method):
for line in readobj.readlines():
    print(line)
readobj.close()
Parsing CSV Files
Downloaded CSV files should be parsed with the CSV module, as CSV can be more complex than just comma separators.
The csv.reader() function usually requires a file object, but we can also pass a list of lines to it:
import csv
import urllib.request
readobj = urllib.request.urlopen(my_url)    # file-like object (my_url as defined above)
text = readobj.read()                       # bytes, entire download
text = text.decode('utf-8')                 # str, entire download
readobj.close()
lines = text.splitlines()                   # list of str (lines)
reader = csv.reader(lines)
for row in reader:
    print(row)
For discussion of potential issues with using urllib, please see the unit titled "Supplementary Modules: CSV, SQL, JSON and the Internet".
POTENTIAL ERRORS AND REMEDIES WITH urllib
TypeError mentioning 'bytes' -- sample exception messages:
TypeError: can't use a string pattern on a bytes-like object
TypeError: must be str, not bytes
TypeError: can't concat bytes to str
These errors indicate that you tried to use a byte string where a str is appropriate.
The urlopen() response usually comes to us as a special object called a byte string. In order to work with the response as a string, we can use the decode() method to convert it into a string with an encoding.
text = text.decode('utf-8')
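The bytes-versus-str distinction can be tried without any network access. In the sketch below a made-up byte string stands in for the value returned by read().

```python
# Hypothetical download: a byte string standing in for readobj.read()
raw = 'café,3.50\n'.encode('utf-8')

try:
    combined = raw + 'more'         # mixing bytes and str is not allowed
except TypeError:
    combined = None

text = raw.decode('utf-8')          # bytes -> str
fields = text.rstrip().split(',')   # str methods now work as usual

print(type(raw))                    # <class 'bytes'>
print(type(text))                   # <class 'str'>
print(len(raw), len(text))          # 11 10 (the accented letter takes 2 bytes)
print(fields)                       # ['café', '3.50']
```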
'utf-8' is the most common encoding, although others ('ascii', 'utf-16', 'utf-32' and more) may be required. I have found that we do not always need to convert (depending on what you will be doing with the returned string) which is why I commented out the line in the first example.
SSL Certificate Error
Many websites enable SSL security and require a web request to accept and validate an SSL certificate (certifying the identity of the server). urllib by default requires SSL certificate security, but it can be bypassed (keep in mind that this may be a security risk).
import ssl
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
my_url = 'http://www.nytimes.com'
readobj = urllib.request.urlopen(my_url, context=ctx)
When including parameters in our requests, we must encode them into our request URL. The urlencode() method does this nicely:
import urllib.request, urllib.parse
params = urllib.parse.urlencode({'choice1': 'spam and eggs',
'choice2': 'spam, spam, bacon and spam'})
print("encoded query string: ", params)
this prints:
encoded query string: choice1=spam+and+eggs&choice2=spam%2C+spam%2C+bacon+and+spam
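As a sketch of how the encoded string is used, the example below attaches it to a URL and then decodes it again with urllib.parse.parse_qs. The base URL is a placeholder, not a real service.

```python
from urllib.parse import urlencode, parse_qs

params = {'choice1': 'spam and eggs',
          'choice2': 'spam, spam, bacon and spam'}

query = urlencode(params)                   # spaces become +, commas become %2C

base_url = 'http://example.com/menu'        # placeholder address (assumption)
full_url = base_url + '?' + query           # query string follows the ?

decoded = parse_qs(query)                   # back to a dict; values are lists

print(full_url)
print(decoded['choice2'])                   # ['spam, spam, bacon and spam']
```

parse_qs returns each value as a list because a key may legally appear more than once in a query string.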
Filepaths pinpoint the location of any file.
Your computer's filesystem contains files and folders, arranged in a tree (folders and files within folders within other folders, etc.) In this session we'll look at how we can open files anywhere on the filesystem tree. Here's a sample tree for us to work with, containing both files (ending in .txt) and python scripts (ending in .py). (This tree and files are replicated in your data folder for this session.)
dir1
├── file1.txt
├── test1.py
│
├── dir2a
│   ├── file2a.txt
│   ├── test2a.py
│   │
│   ├── dir3a
│   │   ├── file3a.txt
│   │   ├── test3a.py
│   │   │
│   │   └── dir4
│   │       ├── file4.txt
│   │       └── test4.py
└── dir2b
    ├── file2b.txt
    ├── test2b.py
    │
    └── dir3b
        ├── file3b.txt
        └── test3b.py
When our script is located in the same directory as a file we want to open, we can give Python the name of the file, and it will find it in this same directory.
""" test2b.py: open and read a file """
fh = open('file2b.txt') # OS looks for file in present working directory
print(fh.read()) # this is file 2b - note that it is in same directory as script
This works because test2b.py and file2b.txt are in the same directory.
However, if our script is in a different location from the file we want to open, we have a problem -- the OS won't be able to find the file.
""" test3a.py: open and read a file """
fh = open('file2b.txt') # raises a FileNotFoundError exception
# (OS looks for file in the pwd (dir3a)
# but doesn't find it)
The file exists, but it is in a different directory. The OS can't find the file because it needs to be told in which directory it should look for the file. So, if we are running our script from a different location than the file we wish to open, we must use a relative path or an absolute path to show the OS where the file is located.
There are two different ways of expressing a file's location.
Again, let's use the sample tree that can be found in your session folder:
dir1
├── file1.txt
├── test1.py
│
├── dir2a
│   ├── file2a.txt
│   ├── test2a.py
│   │
│   ├── dir3a
│   │   ├── file3a.txt
│   │   ├── test3a.py
│   │   │
│   │   └── dir4
│   │       ├── file4.txt
│   │       └── test4.py
└── dir2b
    ├── file2b.txt
    ├── test2b.py
    │
    └── dir3b
        ├── file3b.txt
        └── test3b.py
Absolute path: this is one that locates a file from the root of the filesystem. It lists each of the directories that lead from the root to the directory that holds the file.
In Windows, absolute paths begin with a drive letter, usually C:\:
""" test3a.py: open and read a file """
filepath = r'C:\Users\david\Downloads\python_data\session_03_strings_lists_files\dir1\dir2b\file2b.txt'
fh = open(filepath)
print(fh.read())
(Note that r'' should be used with any Windows paths that contain backslashes.)
On the Mac, absolute paths begin with a forward slash:
""" test3a.py: open and read a file """
filepath = '/Users/david/Downloads/python_data/session_03_strings_lists_files/dir1/dir2b/file2b.txt'
fh = open(filepath)
print(fh.read())
(The above paths assume that the python_data folder is in the Downloads directory; you may have placed yours elsewhere on your system. Of course, the above paths also assume that my home directory is called david/; yours is likely different.)
Relative Path: locate a file or folder in relation to the present working directory
A relative path is read as an extension of the present working directory. The below path assumes that our present working directory is /Users/david/Downloads/python_data/dir1/dir2a:
""" test2a.py: open and read a file """
filepath = 'dir3a/dir4/file4.txt' # starts from /Users/david/Downloads/python_data/dir1/dir2a
fh = open(filepath)
print(fh.read())
When we use a relative path, we can think of it as extending the pwd. So the whole path is:
/Users/david/Downloads/python_data/dir1/dir2a/dir3a/dir4/file4.txt
Therefore, in order to use a relative path, you must first ascertain your present working directory in the filesystem. Only then can you know the relative path needed to find the file you are looking for.
Special Note: Windows paths and the "raw string"
Note that Windows paths featuring backslashes should use r'' ("raw string"), in which a backslash is not seen as the start of an escape sequence such as \n (newline). This is not required on Macs or on paths without backslashes. For simplicity, you can substitute forward slashes in Windows paths, and Python will translate the slashes for you. Using forward slashes is probably the easiest way to work with Windows paths in Python.
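Ascertaining the present working directory, as described above, can be done from Python itself with os.getcwd(). The sketch below shows how a relative path is read as an extension of that directory; the relative path is the one used in this section and does not need to exist for the sketch to run.

```python
import os

pwd = os.getcwd()                       # present working directory (absolute)

# os.path.join uses the right separator for the current operating system
relative = os.path.join('dir3a', 'dir4', 'file4.txt')
full = os.path.join(pwd, relative)      # the relative path "extends" the pwd

print(pwd)
print(full)
print(os.path.isabs(relative))          # False
print(os.path.isabs(full))              # True
```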
We use .. to signify the parent; this can be used in a relative filepath.
Again, let's use the sample tree that is replicated in your session folder:
dir1
├── file1.txt
├── test1.py
│
├── dir2a
│   ├── file2a.txt
│   ├── test2a.py
│   │
│   ├── dir3a
│   │   ├── file3a.txt
│   │   ├── test3a.py
│   │   │
│   │   └── dir4
│   │       ├── file4.txt
│   │       └── test4.py
└── dir2b
    ├── file2b.txt
    ├── test2b.py
    │
    └── dir3b
        ├── file3b.txt
        └── test3b.py
What if we are in dir3a running file test3a.py but want to access file file1.txt?
Think of .. (two dots) as representing the parent directory:
""" test3a.py: open and read a file """
filepath = '../../file1.txt' # reads from /Users/david/Downloads/python_data/dir1
fh = open(filepath)
print(fh.read())
As with all relative paths, you must first consider the location from which you are running the script, then the location of the file you're trying to open. If we are in the dir3a directory when we run test3a.py, then we are two directories "below" the dir1 directory. The first .. takes us to the dir2a directory. The second .. takes us to the dir1 directory. We can then access the file1.txt file from there.
Going up, then down
What if we wanted to go from dir2a to dir2b? They are at the same level; in other words, neither is above or below the other.
The answer is to go up to the parent, then down to the other child:
""" test2a.py: open and read a file """
filepath = '../dir2b/file2b.txt'
fh = open(filepath)
print(fh.read())
.. takes us to the dir1 directory. dir2b can be accessed from that directory.
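How each .. is resolved can be made visible with normpath, which collapses a path without touching the filesystem. The sketch below uses the posixpath module so that it behaves the same on any operating system; the starting directories are the ones assumed in this section.

```python
import posixpath    # forward-slash path rules, regardless of operating system

# From dir3a, go up twice to reach dir1
pwd = '/Users/david/Downloads/python_data/dir1/dir2a/dir3a'
up_twice = posixpath.normpath(posixpath.join(pwd, '../../file1.txt'))

# From dir2a, go up once and then down into the sibling dir2b
pwd2 = '/Users/david/Downloads/python_data/dir1/dir2a'
sibling = posixpath.normpath(posixpath.join(pwd2, '../dir2b/file2b.txt'))

print(up_twice)     # /Users/david/Downloads/python_data/dir1/file1.txt
print(sibling)      # /Users/david/Downloads/python_data/dir1/dir2b/file2b.txt
```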
Containers are Python objects that can contain other objects.
Once collected, values in a container can be sorted or filtered (i.e. selected) according to whatever rules we choose. A collection of numeric values offers many new opportunities for analysis:
A collection of string values allows us to perform text analysis:
Compare and contrast the characteristics of each container.
mylist = ['a', 'b', 'c', 'd', 1, 2, 3]
mytuple = ('a', 'b', 'c', 'd', 1, 2, 3)
myset = {'a', 'b', 'c', 'd', 1, 2, 3}
mydict = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
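A quick way to compare the four containers is to build each from the same values. The sketch below uses made-up sample data.

```python
items = ['b', 'a', 'b', 'c']            # hypothetical sample data

as_list = list(items)                   # keeps order and duplicates
as_tuple = tuple(items)                 # same, but cannot be changed
as_set = set(items)                     # duplicates dropped, no order
as_dict = {'a': 1, 'b': 2, 'c': 3}      # values are looked up by key

print(len(as_list))                     # 4
print(len(as_set))                      # 3 (the repeated 'b' is stored once)
print(as_tuple[0])                      # b
print(as_dict['c'])                     # 3
```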
A list is an ordered sequence of values.
var = [] # initialize an empty list
var2 = [1, 2, 3, 'a', 'b'] # initialize a list of values
Subscripting allows us to read individual items from a list.
mylist = [1, 2, 3, 'a', 'b'] # initialize a list of values
xx = mylist[2] # 3
yy = mylist[-1] # 'b'
Slicing a list returns a new list.
var2 = [1, 2, 3, 'a', 'b'] # initialize a list of values
sublist1 = var2[0:3] # [1, 2, 3]
sublist2 = var2[2:4] # [3, 'a']
sublist3 = var2[3:] # ['a', 'b']
Remember the rules of slicing, similar to strings:
The 'in' operator works with lists similar to how it works with strings.
mylist = [1, 2, 3, 'a', 'b']
if 'b' in mylist:              # this is True for mylist
    print("'b' can be found in mylist")
print('b' in mylist) # "True": the 'in' operator
# actually returns True or False
Summary functions offer a speedy answer to basic analysis questions: how many? How much? Highest value? Lowest value?
mylist = [1, 3, 5, 7, 9] # initialize a list
print(len(mylist)) # 5 (count of items)
print(sum(mylist)) # 25 (sum of values)
print(min(mylist)) # 1 (smallest value)
print(max(mylist)) # 9 (largest value)
sorted() returns a new list of sorted values.
mylist = [4, 9, 1.2, -5, 200, 20]
smyl = sorted(mylist) # [-5, 1.2, 4, 9, 20, 200]
Concatenation works in the same way as strings.
var = ['a', 'b', 'c']
var2 = ['d', 'e', 'f']
var3 = var + var2 # ['a', 'b', 'c', 'd', 'e', 'f']
var = []
var.append(4) # Note well! call is not assigned
var.append(5.5) # list is changed in-place
print(var) # [4, 5.5]
It is the nature of a list to hold these items in order as they were added.
An AttributeError exception occurs when calling a method on an object type that doesn't support that method.
mylines = ['line1\n', 'line2\n', 'line3\n']
mylines = mylines.rstrip() # AttributeError:
# 'list' object has no attribute 'rstrip'
This exception may sometimes result from a misuse of the append() method, which returns None.
mylist = ['a', 'b', 'c']
# oops: returns None -- call to append() should not be assigned
mylist = mylist.append('d')
mylist = mylist.append('e') # AttributeError: 'NoneType'
# object has no attribute 'append'
mylist = ['a', 'b', 'c']
mylist.append('d') # now mylist equals ['a', 'b', 'c', 'd']
There are a number of additional list methods to manipulate a list, though they are less often used.
mylist = ['a', 'hello', 5, 9]
popped = mylist.pop(0) # str, 'a'
# (argument specifies the index of the item to remove)
mylist.remove(5) # remove an element by value
print(mylist) # ['hello', 9]
mylist.insert(0, 10)
print(mylist) # [10, 'hello', 9]
It's helpful to contrast these containers and lists.
It's easy to remember how to use one of these containers by considering how they differ in behavior.
A tuple is an immutable ordered sequence of values.
var2 = (1, 2, 3, 'a', 'b') # initialize a tuple of values
Subscripting allows us to read individual items from a tuple.
mytuple = (1, 2, 3, 'a', 'b') # initialize a tuple of values
xx = mytuple[3] # 'a'
Note that indexing starts at 0, so index 1 is the 2nd item, index 2 is the 3rd item, etc.
Slicing a tuple returns a new tuple.
var2 = (1, 2, 3, 'a', 'b') # initialize a tuple of values
subtuple1 = var2[0:3] # (1, 2, 3)
subtuple2 = var2[2:4] # (3, 'a')
subtuple3 = var2[3:] # ('a', 'b')
Remember the rules of slicing, same as lists and strings:
Concatenation works in the same way as lists and strings.
var = ('a', 'b', 'c')
var2 = ('d', 'e', 'f')
var3 = var + var2 # ('a', 'b', 'c', 'd', 'e', 'f')
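Immutability means a tuple's items cannot be reassigned after it is created. A minimal sketch of what that looks like, and of two ways to get a changed copy instead:

```python
mytuple = (1, 2, 3)

try:
    mytuple[0] = 99                     # item assignment is not allowed
except TypeError as err:
    message = str(err)

as_list = list(mytuple)                 # a list copy can be changed freely
as_list[0] = 99

bigger = mytuple + (4,)                 # concatenation builds a NEW tuple

print(message)      # 'tuple' object does not support item assignment
print(mytuple)      # (1, 2, 3)  (unchanged)
print(as_list)      # [99, 2, 3]
print(bigger)       # (1, 2, 3, 4)
```

Note the trailing comma in (4,): without it, Python reads (4) as the integer 4.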
A set is an unordered, unique collection of values.
Initialize a Set
myset = set()                   # initialize an empty set (note empty curly
                                # braces are reserved for dicts)
myset = {'a', 9999, 4.3, 'a'} # initialize a set with elements
print(myset) # {9999, 4.3, 'a'}
myset = set() # initialize an empty set
myset.add(4.3) # note well method call not assigned
myset.add('a')
print(myset) # {'a', 4.3} (order is not
# necessarily maintained)
# Get Length of a set or tuple (compare to len() of a list or string)
myset = {1, 2, 3, 'a', 'b'}
yy = len(myset) # 5 (# of elements in myset)
# Test for membership in a set or tuple
mytuple = (1, 2, 3, 'a', 'b')
if 'b' in mytuple:             # this is True for mytuple
    print("'b' can be found in mytuple")
print('b' in mytuple) # "True": the 'in' operator
# actually returns True or False
The 'for' loop allows us to traverse a set or tuple and work with each item.
mytuple = (1, 2, 3, 'a', 'b') # could also be a set here
for var in mytuple:
    print(var)                     # prints 1, then 2, then 3,
                                   # then a, then b
Summary functions offer a speedy answer to basic analysis questions: how many? How much? Highest value? Lowest value?
Whether a set or tuple, these operations work in the same way.
mytuple = (1, 3, 5, 7, 9) # initialize a tuple
myset = {1, 3, 5, 7, 9} # initialize a set
print(len(mytuple)) # 5 (count of items)
print(sum(myset)) # 25 (sum of values)
print(min(myset)) # 1 (smallest value)
print(max(mytuple)) # 9 (largest value)
Regardless of type, sorted() returns a list of sorted values.
mytuple = (4, 9, 1.2, -5, 200, 20) # could also be a set here
smyl = sorted(mytuple) # [-5, 1.2, 4, 9, 20, 200]
This technique forms the core of much of what we do.
In order to work with data, the usual steps are:
We call this process Extract-Transform-Load, or ETL. ETL is at the heart of what core Python does best.
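One way to read the three ETL stages in code is sketched below. It uses io.StringIO as a stand-in for a file, and the rows and the chosen state are made up for illustration; mapping "Load" to building a report string is this sketch's own simplification.

```python
import io

# Hypothetical data in the same layout as revenue.csv (name, state, revenue)
source = io.StringIO("Haddad's,PA,239.50\n"
                     "Westfield,NJ,53.90\n"
                     "The Store,NJ,211.50\n")

nj_revenue = []                             # container for selected values

for line in source:                         # Extract: read each line
    fields = line.rstrip().split(',')
    if fields[1] == 'NJ':                   # Transform: select and convert
        nj_revenue.append(float(fields[2]))

# Load: deliver the result (here, a report string)
report = f'NJ stores: {len(nj_revenue)}, total: {sum(nj_revenue):.2f}'
print(report)                               # NJ stores: 2, total: 265.40
```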
This "summary algorithm" is very similar to building a float sum from a file source.
build a list of company names
company_list = [] # initialize an empty list
fh = open('revenue.csv') # 'file' object
for line in fh:                            # str, "Haddad's,PA,239.50\n"
    elements = line.rstrip().split(',')    # list, ["Haddad's", 'PA', '239.50']
    company_list.append(elements[0])       # add the name for this row
                                           # to company_list
print(company_list) # list, ["Haddad's", 'Westfield', 'The Store', "Hipster's",
# 'Dothraki Fashions', "Awful's", 'The Clothiers']
fh.close()
Just as we did when counting lines of a file or summing up values, we can use a 'for' loop over a file to collect values.
This "summary algorithm" uses a set to collect unique items from repeating data.
state_set = set()                          # initialize an empty set
fh = open('revenue.csv') # 'file' object
for line in fh:                            # str, "Haddad's,PA,239.50\n"
    elements = line.rstrip().split(',')    # list, ["Haddad's", 'PA', '239.50']
    state_set.add(elements[1])             # add the state for this row
                                           # to state_set
print(state_set) # set, {'PA', 'NY', 'NJ'} (your order may be different)
chosen_state = input('enter a state: ')
if chosen_state in state_set:
    print('that state was found in the file')
else:
    print('that state was not found')
fh.close()
Data files can be rendered as lists of lines, and slicing can manipulate them holistically rather than by using a counter.
fh = open('student_db.txt')
file_lines_list = fh.readlines() # a list of lines in the file
print(file_lines_list)
# [ "id:address:city:state:zip\n",
#   "jk43:23 Marfield Lane:Plainview:NY:10023\n",
#   "ZXE99:315 W. 115th Street, Apt. 11B:New York:NY:10027\n",
#   "jab44:23 Rivington Street, Apt. 3R:New York:NY:10002\n" ... (list continues) ]
wanted_lines = file_lines_list[1:] # take all but 1st element
# (i.e., 1st line)
for line in wanted_lines:
    print(line.rstrip())           # jk43:23 Marfield Lane:
                                   # Plainview:NY:10023
                                   # ZXE99:315 W. 115th Street,
                                   # Apt. 11B:New York:NY:10027
                                   # jab44:23 Rivington Street,
                                   # Apt. 3R:New York:NY:10002
                                   # etc.
fh.close()
Once we have read a file as a single string, we can "chop it up" any way we like.
# read(): file text as a single string
fh = open('guido.txt') # 'file' object
text = fh.read() # read() method called on
# file object returns a string
fh.close() # close the file
print(text)
print(len(text)) # 207 (number of characters in the file)
# single string, entire text:
# 'For three months I did my day job, \nand at night and
# whenever I got a \nchance I kept working on Python. \n
# After three months I was to the \npoint where I could
# tell people, \n"Look here, this is what I built."'
String .split() on a whole file string returns a list of words.
file_text = """For three months I did my day job,
and at night and whenever I got a
chance I kept working on Python.
After three months I was to the
point where I could tell people,
"Look here, this is what I built." """
words = file_text.split() # split entire file on whitespace (spaces or newlines)
print(words)
# ['For', 'three', 'months', 'I', 'did', 'my', 'day', 'job,',
# 'and', 'at', 'night', 'and', 'whenever', 'I', 'got', 'a',
# 'chance', 'I', 'kept', 'working', 'on', 'Python.', 'After',
# 'three', 'months', 'I', 'was', 'to', 'the', 'point', 'where',
# 'I', 'could', 'tell', 'people,', '"Look', 'here,', 'this',
# 'is', 'what', 'I', 'built."']
print(len(words)) # 42 (number of words in the file)
String .splitlines() will split any string on the newlines, delivering a list of lines from the file.
file_text = """For three months I did my day job,
and at night and whenever I got a
chance I kept working on Python.
After three months I was to the
point where I could tell people,
"Look here, this is what I built." """
lines = file_text.splitlines()
print(lines)
# ['For three months I did my day job, ', 'and at night and whenever I got a ',
# 'chance I kept working on Python. ', 'After three months I was to the ',
# 'point where I could tell people, ', '"Look here, this is what I built." ']
print(len(lines)) # 6 (number of lines in the file)
for: read (newline ('\n') marks the end of a line)
fh = open('students.txt') # file object allows looping
# through a series of strings
for my_file_line in fh:       # my_file_line is a string
    print(my_file_line)       # prints each line of students.txt
fh.close() # close the file
read(): read entire file as a single string
fh = open('students.txt') # file object allows reading
text = fh.read() # read() method called on file
# object returns a string
fh.close() # close the file
print(text) # entire text as a single string
readlines(): read as a list of strings (each string a line)
fh = open('students.txt')
file_lines = fh.readlines() # file.readlines() returns
# a list of strings
fh.close() # close the file
print(file_lines) # entire text as a list of lines
We won't need to write to a file in this course, but it's important to know how.
wfh = open('newfile.txt', 'w') # open for writing
# (will overwrite an existing file)
wfh.write('this is a line of text\n')
wfh.write('this is a line of text\n')
wfh.write('this is a line of text\n')
wfh.close()
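As a hedged aside, many Python programs open files inside a with block so that the file is closed automatically, even if an error occurs partway through. The sketch below assumes a throwaway temporary directory is an acceptable place to write; the path and variable names are illustrative and not part of the course files.

```python
import os
import tempfile

# hypothetical location: a throwaway directory, so no real file is overwritten
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'newfile.txt')

with open(path, 'w') as wfh:            # file is closed when the block ends
    for i in range(3):
        wfh.write('this is a line of text\n')

with open(path) as fh:                  # read the file back to confirm
    lines = fh.readlines()

print(len(lines))                       # 3
```

Note that .write() does not add a newline for us, which is why each string ends with '\n'.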
This function allows us to iterate over an integer sequence.
counter = range(10)
for i in counter:
    print(i)                  # prints integers 0 through 9
for i in range(3, 8):         # prints integers 3 through 7
    print(i)
If we need a literal list of integers, we can simply pass the iterable to list():
intlist = list(range(5))
print(intlist) # [0, 1, 2, 3, 4]
A dictionary (or dict) is a collection of unique key/value pairs of objects.
mydict = {} # empty dict
mydict = {'a':1, 'b':2, 'c':3} # dict with str keys and int values
print(mydict['a']) # look up 'a' to get 1
Pairs describe data relationships that we often want to consider:
You yourself may consider data in pairs, even in your personal life:
There are a few main ways dictionaries are used:
Dicts are marked by curly braces. Keys and values are separated with colons.
initialize a dict
mydict = {} # empty dict
mydict = {'a':1, 'b':2, 'c':3} # dict with str keys and int values
We use subscript syntax to assign a value to a key.
mydict = {'a':1, 'b':2, 'c':3}
mydict['d'] = 4 # setting a new key and value
print(mydict) # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
We also use subscript syntax to retrieve a value.
mydict = {'a':1, 'b':2, 'c':3, 'd': 4}
dval = mydict['d'] # value for 'd' is 4
xxx = mydict['c'] # value for 'c' is 3
You might notice that this subscripting is very close in syntax to list subscripting. The only difference is that instead of an integer index we are using the dict key (most often a string).
This exception is raised when we request a key that does not exist in the dict.
mydict = {'a': 1, 'b': 2, 'c': 3}
val = mydict['d'] # KeyError: 'd'
Like the IndexError exception, which is raised if we ask for a list item that doesn't exist, KeyError is raised if we ask for a dict key that doesn't exist.
If we're not sure whether a key is in the dict, before we subscript we can check to confirm.
mydict = {'a': 1, 'b': 2, 'c': 3}
if 'a' in mydict:
    print("'a' is a key in mydict")
Dictionaries can be sorted by value to produce a ranking.
We loop through keys and then use subscripting to get values.
mydict = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
for key in mydict:
    val = mydict[key]
    print(key)                # a
    print(val)                # 1
    print()
                              # b
                              # 2
                              # (etc.)
Note that plain 'for' looping over a dict delivers the keys:
for key in mydict:
    print(key)                # prints a, then b, then c...
With any container or iterable (list, tuple, file), sorted() returns a list of sorted elements.
namelist = ['jo', 'pete', 'michael', 'zeb', 'avram']
slist = sorted(namelist) # ['avram', 'jo', 'michael', 'pete', 'zeb']
Remember that no matter what container is passed to sorted(), the function returns a list. Also remember that the reverse=True argument to sorted() can be used to sort the items in reverse order.
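Both points can be seen in a short sketch. The tuple below is an illustrative addition; the list is the one used above.

```python
namelist = ['jo', 'pete', 'michael', 'zeb', 'avram']

# reverse=True sorts from highest to lowest
rlist = sorted(namelist, reverse=True)
print(rlist)                    # ['zeb', 'pete', 'michael', 'jo', 'avram']

# a tuple goes in, but a list comes out
nametuple = ('jo', 'pete', 'michael')
slist = sorted(nametuple)
print(slist)                    # ['jo', 'michael', 'pete']
print(type(slist))              # <class 'list'>
```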
sorted() returns a sorted list of a dict's keys.
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}
sorted_keys = sorted(bowling_scores)
print(sorted_keys) # [ 'alice', 'jeb', 'mike', 'zeb' ]
for key in sorted_keys:
    print(f'{key}={bowling_scores[key]}')
A special "sort criteria" argument can cause Python to sort a dict's keys by its values.
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}
sorted_keys = sorted(bowling_scores, key=bowling_scores.get)
print(sorted_keys) # ['zeb', 'jeb', 'alice', 'mike']
for player in sorted_keys:
    print(f"{player} scored {bowling_scores[player]}")
## zeb scored 98
## jeb scored 123
## alice scored 184
## mike scored 202
The key= argument allows us to specify an alternate criterion by which to sort the keys. The .get() method takes a key and returns its value from the dict, which is what we are asking sorted() to do with each key when sorting by value. However, this complex sorting is a more advanced topic than we can cover here.
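As a small sketch of how this combines with reverse=True (the names ranked and top_two are illustrative), we can rank the players from highest score to lowest and slice off the leaders:

```python
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}

# sort keys by their values, highest value first
ranked = sorted(bowling_scores, key=bowling_scores.get, reverse=True)
print(ranked)                   # ['mike', 'alice', 'jeb', 'zeb']

# sorted() returns a list, so we can slice it
top_two = ranked[:2]
print(top_two)                  # ['mike', 'alice']
```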
multi-target assignment performs the assignments in one statement
csv_line = "Haddad's,PA,239.50"
row = csv_line.split(',') # ["Haddad's", 'PA', '239.50']
codata = ["Haddad's", 'PA', '239.50']
company, state, revenue = codata
print(company) # "Haddad's"
print(revenue) # 239.50
csv_line = 'jk43:23 Marfield Ln.:Plainview:NY:10024'
stuid, street, city, state, zipcode = csv_line.split(':') # 'zipcode' avoids hiding the built-in zip()
print(stuid) # 'jk43'
print(city) # 'Plainview'
As with all containers, we loop through a data source, select and add to a dict.
ids_names = {} # initialize an
# empty dict
fh = open('student_db.txt')
for line in fh:
    stuid, street, city, state, zipcode = line.split(':')
    ids_names[stuid] = state             # key id is paired to
                                         # student's state
print("here is the state for student 'jb29': ")
print(ids_names['jb29']) # NJ
fh.close()
A "counting" or "summing" dictionary answers the question "how many of each" or "how much of each".
Aggregations may answer the following questions:
The dict is used to store this information. Each unique key in the dict will be associated with a count or a sum, depending on how many we found in the data source or the sum of values associated with each key in the data source.
A "counting" dict increments the value associated with each key, and adds keys as new ones are found.
Customarily we loop through data, using the dictionary to keep a tally as we encounter items.
state_count = {} # initialize an empty dict
fh = open('revenue.csv')
for line in fh:
    items = line.split(',')              # ["Haddad's", 'PA', '239.50']
    state = items[1]                     # str, 'PA'
    if state not in state_count:
        state_count[state] = 0
    state_count[state] = state_count[state] + 1
print(state_count) # {'PA': 2, 'NJ': 2, 'NY': 3}
print("here is the count of states from revenue.csv: ")
for state in state_count:
    print(f"{state}: {state_count[state]} occurrences")
print("here is the count for 'NY': ")
print(state_count['NY']) # 3
fh.close()
A "summing" dict sums the value associated with each key, and adds keys as new ones are found.
As with a counting dict, we loop through data, using the dictionary to keep a tally as we encounter items.
state_sum = {} # initialize an empty dict
fh = open('revenue.csv')
for line in fh:
    items = line.split(',')              # ["Haddad's", 'PA', '239.50']
    state = items[1]                     # str, 'PA'
    value = float(items[2])              # float, 239.5
    if state not in state_sum:
        state_sum[state] = 0
    state_sum[state] = state_sum[state] + value
print(state_sum) # {'PA': 263.45, 'NJ': 265.4, 'NY': 133.16}
print("here is the sum for 'NY': ")
print(state_sum['NY']) # 133.16
fh.close()
len() counts the pairs in a dict.
mydict = {'a': 1, 'b': 2, 'c': 3}
print(len(mydict)) # 3 (number of keys in dict)
This method may be used to retrieve a value without checking the dict to see if the key exists.
mydict = {'a': 1, 'b': 2, 'c': 3}
xx = mydict.get('a', 0) # 1 (key exists so paired value is returned)
yy = mydict.get('zzz', 0) # 0 (key does not exist so the
# default value is returned)
You may use any value as the default. This method is sometimes used as an alternative to testing for a key in a dict before reading it -- avoiding the KeyError exception that occurs when trying to read a nonexistent key.
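For instance, a counting dict can be written without the "if key not in dict" test by letting .get() supply 0 for keys not yet seen. This is a minimal sketch; the list of states is made-up data standing in for values read from a file.

```python
states = ['PA', 'NJ', 'NY', 'NY', 'PA', 'NJ', 'NY']    # illustrative data

state_count = {}
for state in states:
    # .get() returns the current count, or 0 if the key is new
    state_count[state] = state_count.get(state, 0) + 1

print(state_count)              # {'PA': 2, 'NJ': 2, 'NY': 3}
```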
The .keys() method gives access to the keys in a dict.
mydict = {'a': 1, 'b': 2, 'c': 3}
these_keys = mydict.keys()
for key in these_keys:
    print(key)
print(list(these_keys)) # ['a', 'b', 'c']
The .values() method gives access to the values in a dict.
mydict = {'a': 1, 'b': 2, 'c': 3}
values = list(mydict.values()) # [1, 2, 3]
if 3 in mydict.values():
    print("3 was found")
for value in mydict.values():
    print(value)
The values cannot be used to get the keys - it's a one-way lookup from the keys. However, we might want to check for membership in the values, or sort or sum the values, or some other less-used approach.
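A short sketch of those uses, reusing the bowling scores from earlier (the variable names are illustrative):

```python
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}

scores = bowling_scores.values()

total = sum(scores)             # add up all the values
highest = max(scores)           # largest value
ordered = sorted(scores)        # sorted() returns a list

print(total)                    # 607
print(highest)                  # 202
print(ordered)                  # [98, 123, 184, 202]
print(98 in scores)             # True (membership test on the values)
```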
.items() gives key/value pairs as 2-item tuples.
mydict = {'a': 1, 'b': 2, 'c': 3}
print(list(mydict.items())) # [('a', 1), ('b', 2), ('c', 3)]
for key, value in mydict.items():
    print(key, value)         # a 1
                              # b 2
                              # c 3
.items() is usually used as another approach for looping through a dict. With each iteration of 'for', or each item when converted to a list, we see a 2-item tuple. The first item is a key, and the second a value. When looping with 'for', since each iteration produces a 2-item (key/value) tuple, we can assign the key and value to variable names and use them immediately, rather than resorting to subscripting. This is usually easier and it is also more efficient.
dict items() can give us a list of 2-item tuples. dict() can convert this list back to a dictionary.
mydict = {'a': 1, 'b': 2, 'c': 3}
these_items = list(mydict.items()) # [('a', 1), ('b', 2), ('c', 3)]
newdict = dict(these_items)
print(newdict) # {'a': 1, 'b': 2, 'c': 3}
2-item tuples can be sorted and sliced, so they are a handy alternate structure.
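Here is one possible sketch of that idea: sort the pairs by value, slice off the top two, and convert the slice back to a dict. The helper function by_value is a hypothetical name introduced only for this example.

```python
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'alice': 184}

def by_value(pair):
    """ return the second item (the value) of a (key, value) tuple """
    return pair[1]

# sort the 2-item tuples by value, highest first
pairs = sorted(bowling_scores.items(), key=by_value, reverse=True)
print(pairs)        # [('mike', 202), ('alice', 184), ('jeb', 123), ('zeb', 98)]

top_two = pairs[:2]             # slice the sorted list of tuples
top_dict = dict(top_two)        # convert back to a dict
print(top_dict)                 # {'mike': 202, 'alice': 184}
```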
zip() zips up parallel lists into tuples; dict() can convert this to dict.
list1 = ['a', 'b', 'c', 'd']
list2 = [ 1, 2, 3, 4 ]
tupes = list(zip(list1, list2))
print(tupes) # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
print(dict(tupes)) # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
Occasionally we are faced with two lists that relate to each other on a 1-to-1 basis... or, we sometimes even shape our data into this form. Parallel lists like these can be zipped into multi-item tuples.
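zip() is not limited to two lists. In this sketch, three parallel lists (made-up data) are zipped into 3-item tuples, and two of the lists are paired into a dict:

```python
names = ['Alpha', 'Beta', 'Gamma']          # illustrative data
states = ['NY', 'NJ', 'PA']
revenues = [239.5, 53.9, 11.98]

rows = list(zip(names, states, revenues))
print(rows)     # [('Alpha', 'NY', 239.5), ('Beta', 'NJ', 53.9), ('Gamma', 'PA', 11.98)]

# each 3-item tuple can be unpacked in the 'for' statement
for name, state, revenue in rows:
    print(f'{name} ({state}): {revenue}')

# dict() needs 2-item tuples, so we zip just two of the lists
name_revenue = dict(zip(names, revenues))
print(name_revenue)     # {'Alpha': 239.5, 'Beta': 53.9, 'Gamma': 11.98}
```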
Introduction: unanticipated vs. anticipated errors
Think of errors as being of two general kinds -- unanticipated and anticipated:
Examples of anticipated errors:
If the user enters a key that is not in the dict, we can expect this error.
mydict = {'1972': 3.08, '1973': 1.01, '1974': -1.09}
uin = input('please enter a year: ') # user enters 2116
print(f'mktrf for {uin} is {mydict[uin]}')
# Traceback (most recent call last):
# File "/Users/david/test.py", line 5, in <module>
# print(f'mktrf for {uin} is {mydict[uin]}')
# ~~~~~~^^^^^
# KeyError: '2116'
If we ask the user for a number, but anticipate they might not give us one.
uin = input('please enter an integer: ')
intval = int(uin) # user enters 'hello'
print(f'{uin} doubled is {intval*2}')
# Traceback (most recent call last):
# File "/Users/david/test.py", line 3, in <module>
# intval = int(uin) # user enters 'hello'
# ^^^^^^^^
# ValueError: invalid literal for int() with base 10: 'hello'
If we attempt to open a file but it has been moved or deleted.
filename = 'thisfile.txt'
fh = open(filename)
# Traceback (most recent call last):
# File "/Users/david/test.py", line 3, in <module>
# fh = open(filename)
# ^^^^^^^^^^^^^^
# FileNotFoundError: [Errno 2] No such file or directory: 'thisfile.txt'
Up to now we have managed anticipated errors by testing to make sure an action will be successful.
Examples of testing for anticipated errors:
So far we have been dealing with anticipated errors by checking first -- for example, using .isdigit() to make sure a user's input is all digits before converting to int().
However, there is an alternative to "asking for permission": begging for forgiveness.
the try block and except block
try:
    uin = input('please enter an integer: ')   # user enters 'hello'
    intval = int(uin)                          # int() raises a ValueError
                                               # ('hello' is not a valid value)
    print(f'{uin} doubled is {intval*2}')
except ValueError:
    exit('sorry, I needed an int')             # the except block cancels the
                                               # ValueError and takes action
It's important to witness the exception and where it occurs before attempting to trap it.
It's strongly recommended that you follow a specific procedure in order to trap an exception:
Multiple exceptions can be trapped using a tuple of exception types.
companies = ['Alpha', 'Beta', 'Gamma']
user_index = input('please enter a ranking: ') # user enters '4' or 'hello'
try:
    list_idx = int(user_index) - 1
    print(f'company at ranking {user_index} is {companies[list_idx]}')
except (ValueError, IndexError):
    exit(f'please enter a ranking from 1 to {len(companies)}')
Here we trap two anticipated errors: if the user types a non-number and a ValueError exception is raised, or an invalid list index and an IndexError is raised, the except: block will be executed.
The same try: block can be followed by multiple except: blocks, which we can use to specialize our response to the exception type.
companies = ['Alpha', 'Beta', 'Gamma']
user_index = input('please enter a ranking: ') # user enters '4'
try:
    list_idx = int(user_index) - 1
    print(f'company at ranking {user_index} is {companies[list_idx]}')
except ValueError:
    exit('please enter a numeric ranking')
except IndexError:
    exit(f'max ranking is {len(companies)}')
The exception raised will be matched against each type in order, and the first one that matches will execute its block.
When we don't specify an exception, Python will trap any exception. This is a bad practice.
ui = input('please enter a number: ')
try:
    fval = float(ui)
except:                       # AVOID!! Should be 'except ValueError:'
    exit('please enter a number - thank you')
However, this is a bad practice. Why?
There are certain limited circumstances under which we might use except: by itself, or except Exception. These might include wrapping the whole program execution in a try: block and trapping any exception that is raised so the error can be logged and the program doesn't need to exit as a result.
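A minimal sketch of that "last line of defense" pattern follows, assuming the standard logging module is an acceptable way to record the error. The main() function here is a hypothetical stand-in for a whole program and fails on purpose.

```python
import logging

def main():
    """ hypothetical program body that fails unexpectedly """
    values = [1, 2, 3]
    return values[10]                   # raises IndexError

error_type = None
try:
    main()
except Exception as exc:                # deliberately broad: wraps the whole program
    error_type = type(exc).__name__     # remember what kind of error occurred
    logging.error('unexpected error: %r', exc)

print(error_type)                       # IndexError
```

Because the error is logged with its type and message, the problem is still visible; a bare except: that silently continues would hide it.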
The Command Line (also known as "Command Prompt" or "Terminal Prompt") gives us access to the Operating System's files and programs.
Before the graphical user interface was invented, programmers used a text-based interface called the command line to run programs and read and write files. Programmers still make heavy use of the command line because it provides a much more efficient way to communicate with the operating system than Windows File Explorer or Mac Finder. It is the "power user's" way of talking to the OS, and it should be considered essential for anyone wanting to develop their programming skills. To reach the command line, you must search for and open one of these programs:
On Windows -- search for Command Prompt:
Microsoft Windows [Version 10.0.18363.1016]             # these 2 lines may look different
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Users\david>          < -- command line
On Mac -- search for Terminal:
Last login: Thu Sep 3 13:46:14 on ttys001
Davids-MBP-3:~ david %          < -- command line
Your command line will look similar to those shown above, but will have different names and directory paths (for example, your username instead of 'david'). Your prompt may also feature a dollar sign ($) instead of a percent sign. After opening the command line program on your computer, note the blinking cursor: this is the OS awaiting your next command.
Your command line session works from one directory location at a time.
When you first launch the command line program, you are placed at a specific directory within your filesystem. We call this the "present working directory". You may "move around" the system, and when you do, your pwd will change. By default, your initial pwd is your home directory -- the directory at which all your individual files are stored. This directory is usually named after your username, and can be found at /Users/[username] or C:\Users\[username].
On Windows:
Your present working directory is always displayed as the command prompt.
C:\Users\david>
On Mac:
Your present working directory can be shown by using the pwd command:
Davids-MBP-3:~ david % pwd
/Users/david
As we move around the filesystem, we will see the present working directory change. You must always be mindful of the pwd as it is your current location and it will affect how you can access other files and programs in the filesystem.
We can list out the contents (files and folders) of any directory.
On Mac, use the 'ls' command to see the files and folders in the present working directory:
Davids-MBP-3:~ david % ls
Applications      Desktop           Documents         Downloads
Dropbox           Library           Movies            Music
Public            PycharmProjects   Sites             archive
ascii_test.py     requests_demo.py  static.zip
On Windows, use the 'dir' command to see the files and folders in the present working directory:
C:\Users\david> dir
 Volume Serial Number is 0246-9FF7

 Directory of C:\Users\david

08/29/2020  11:37 AM    <DIR>          .
08/29/2020  11:37 AM    <DIR>          ..
05/29/2020  06:27 PM    <DIR>          .astropy
05/29/2020  06:35 PM    <DIR>          .config
05/29/2020  06:36 PM    <DIR>          .matplotlib
08/07/2020  10:33 AM             1,460 .python_history
08/29/2020  11:28 AM    <DIR>          3D Objects
08/29/2020  11:28 AM    <DIR>          Contacts
08/29/2020  12:50 PM    <DIR>          Desktop
08/29/2020  11:28 AM    <DIR>          Documents
09/02/2020  10:25 AM    <DIR>          Downloads
08/29/2020  11:28 AM    <DIR>          Favorites
08/29/2020  11:28 AM    <DIR>          Links
08/29/2020  11:28 AM    <DIR>          Music
08/29/2020  11:29 AM    <DIR>          OneDrive
08/29/2020  11:28 AM    <DIR>          Pictures
08/29/2020  12:46 PM    <DIR>          PycharmProjects
08/29/2020  11:28 AM    <DIR>          Saved Games
08/29/2020  11:28 AM    <DIR>          Searches
08/29/2020  11:28 AM    <DIR>          Videos
               1 File(s)          1,460 bytes
              20 Dir(s)   7,049,539,584 bytes free
The 'change directory' command moves us 'up' or 'down' the tree.
To move around the filesystem (i.e. to change the present working directory), we use the cd ("change directory") command. In the examples below, note how the present working directory changes after we move. [Please note: in the paths below you'll see that my class project directory python_data_ipy/ is in my Downloads/ directory (i.e., at /Users/david/Downloads/python_data_ipy). If you want your output and directory moves to match mine, you can put yours there -- or you can substitute your own directory path for the one I'm using.]
on Mac:
Davids-MBP-3:~ david % pwd
/Users/david
Davids-MBP-3:~ david % cd Downloads
Davids-MBP-3:Downloads david % pwd
/Users/david/Downloads
on Windows:
C:\Users\david> cd Downloads
C:\Users\david\Downloads>
So using the ls or dir command together with the cd command, we can travel from directory to directory, listing out the contents of each directory to decide where to go next (for Windows in the below examples, simply substitute the dir command for ls -- also note that Windows output for dir will look different than below):
Davids-MBP-3:Downloads david % pwd
/Users/david/Downloads
Davids-MBP-3:Downloads david % ls                      # dir on Windows
python_data_ipy
[... likely other files/folders as well ...]
Davids-MBP-3:Downloads david % cd python_data_ipy
Davids-MBP-3:python_data_ipy david % pwd
/Users/david/Downloads/python_data_ipy
Davids-MBP-3:python_data_ipy david % ls                # dir on Windows
session_00_test_project            session_01_objects_types
session_02_funcs_condits_methods   session_03_strings_lists_files
session_04_containers_lists_sets   session_05_dictionaries
session_06_multis                  session_07_functions_power_tools
session_08_files_dirs_stdout       session_09_funcs_modules
session_10_classes                 username.txt
Davids-MBP-3:python_data_ipy david % cd session_06_multis/
Davids-MBP-3:session_06_multis david % ls              # dir on Windows
warmup_exercises    inclass_exercises    notebooks_inclass_warmup
[...several more files and folders, may be in a different order...]
Davids-MBP-3:session_06_multis david % cd inclass_exercises
Davids-MBP-3:inclass_exercises david % ls              # dir on Windows
inclass_6.1.py    inclass_6.2.py    inclass_6.3.py
inclass_6.4.py    inclass_6.5.py    inclass_6.6.py    ...
Davids-MBP-3:inclass_exercises david % pwd
/Users/david/Downloads/python_data_ipy/session_06_multis/inclass_exercises
The '..' (double dot) indicates the parent and can move us one directory "up".
As you saw, we can move "down" the directory tree by using the name of the next directory -- this extends the path (Mac paths will of course look different; use pwd to confirm your present working directory):
C:\Users\david> cd Desktop
C:\Users\david\Desktop>
But if we'd like to travel up the directory tree, we use the special directory shortcut .. which signifies the parent directory:
C:\Users\david\Desktop> cd ..
C:\Users\david> cd ..
C:\Users>
We can also travel directly to an inner folder by using the full path. In order to complete the next exercise, I'll travel to an inner folder within my project directory (again, yours may be different depending on where you put the project folder):
C:\Users> cd david\Downloads\python_data_ipy\session_06_multis\inclass_exercises
This is the "true" way to ask Python to execute our script.
Every developer should be able to execute scripts through the command line, without having to use an IDE like PyCharm or Jupyter. If you are in the same directory as the script, you can execute a program by running Python and telling Python the name of the script:
On Windows:
C:\Users\david\Downloads\python_data_ipy\session_06_multis\inclass_exercises> python inclass_6.1.py
On Mac:
Davids-MBP-3:inclass_exercises david % python3 inclass_6.1.py
Unless you've changed it, you won't see any result from running this program, because it does not print anything. Make a change and run it again to see the result! Each week we'll try to spend a few minutes traveling to and executing one or more Python programs from the command line.
JavaScript Object Notation is a simple "data interchange" format for sending or storing structured data as text.
Fortunately for us, JSON resembles Python in many ways, making it easy to read and understand.
{
    "key1": ["a", "b", "c"],
    "key2": {
        "innerkey1": 5,
        "innerkey2": "woah"
    },
    "key3": false,
    "key4": null
}
The json.load() function decodes the contents of a JSON file.
Here's a program to read the structure shown earlier, read from a file that contains it:
import json # we use this module to read JSON
fh = open('sample.json')
mys = json.load(fh) # load from a file, convert into Python container
fh.close()
print(type(mys)) # dict (the outer container of this struct)
print(mys['key2']['innerkey2']) # woah
The json.loads() function decodes the contents of a JSON string.
In this example, we show what we must do if we don't have access to a file, but instead receive the data as a string, as is the case with a web request.
import json # we use this module to read JSON
import requests # use 'pip install' to install
response = requests.get('https://davidbpython.com/mystruct.json')
text = response.text # file data as a single string
mys = json.loads(text) # load from a string, convert into Python container
print(type(mys)) # dict (the outer container of this struct)
print(mys['key2']['innerkey2']) # woah
The requests module allows your Python program to act like a web browser, making web requests (i.e., with a URL) and downloading the response. If you try this program and receive a ModuleNotFoundError, you must run pip install at the command line to install it.
A nested object can be confusing to read.
If we have a multidimensional object that is squished together and hard to read, we can use json.dumps() with indent=4.
import json
obj = {'a': {'x': 1, 'y': 2, 'z': 3}, 'b': {'x': 1, 'y': 2, 'z': 3}, 'c': {'x': 1, 'y': 2, 'z': 3} }
print(json.dumps(obj, indent=4))
this prints:
{
    "a": {
        "x": 1,
        "y": 2,
        "z": 3
    },
    "b": {
        "x": 1,
        "y": 2,
        "z": 3
    },
    "c": {
        "x": 1,
        "y": 2,
        "z": 3
    }
}
We can use json.dump() to write to a JSON file.
Dumping a Python structure to JSON
import json
wfh = open('newfile.json', 'w') # open file for writing
obj = {'a': 1, 'b': 2}
json.dump(obj, wfh)
wfh.close()
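The string-based counterparts json.dumps() and json.loads() make it easy to confirm that a structure survives the round trip. This is a small sketch with a made-up structure; note how True and None become JSON's true and null and then come back again.

```python
import json

obj = {'a': [1, 2, 3], 'b': {'x': True, 'y': None}}    # illustrative structure

text = json.dumps(obj)          # Python container -> JSON string
print(text)                     # {"a": [1, 2, 3], "b": {"x": true, "y": null}}

restored = json.loads(text)     # JSON string -> Python container
print(restored == obj)          # True
```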
Core principles
Here are the main components of a properly formatted program:
""" tip_calculator.py -- calculate tip for a restaurant bill
    Author: David Blaikie dbb212@nyu.edu
    Last modified: 9/19/2017
"""
import os                   # part of Python distribution (installed with Python)
import sys                  # part of Python distribution (installed with Python)
import pandas as pd         # installed "3rd party" modules
import myownmod as mm       # "local" module (part of local codebase)
# constant message strings are not required to be placed
# here, but in professional programs they are kept
# separate from the logic, often in separate "config" files
MSG1 = 'A {}% tip (${}) was added to the bill, for a total of ${}.'
MSG2 = 'With {} in your party, each person must pay ${}.'
# sys.argv[0] is the program's pathname (e.g. /this/that/other.py)
# os.path.basename() returns just the program name (e.g. other.py)
USAGE_STRING = f"Usage: {os.path.basename(sys.argv[0])} [total amount] [# in party] [tip percentage]"
def usage(msg):
    """ print an error message and usage: string, then exit
        Args: msg (str): an error message
        Returns: None (exits from here)
        Raises: N/A (does not explicitly raise an exception)
    """
    sys.stderr.write(f'Error: {msg}\n')
    exit(USAGE_STRING)
def validate_normalize_input(args):
    """ verify command-line input
        Args: N/A (reads from sys.argv)
        Returns:
            bill_amt (float): the bill amount
            party_size (int): the number of people
            tip_pct (float): the percent tip to be applied, in 100's
        Raises: N/A (does not explicitly raise an exception)
    """
    if not len(sys.argv) == 4:
        usage('please enter all required arguments')
    try:
        bill_amt = float(sys.argv[1])
        party_size = int(sys.argv[2])
        tip_pct = float(sys.argv[3])
    except ValueError:
        usage('arguments must be numbers')
    return bill_amt, party_size, tip_pct
def perform_calculations(bill_amt, party_size, tip_pct):
    """
    calculate tip amount, total bill and person's share
    Args:
        bill_amt (float): the total bill
        party_size (int): the number in party
        tip_pct (float): the tip percentage in 100's
    Returns:
        tip_amt (float): the tip in $
        total_bill (float): the bill including tip
        person_share (float): equal share of bill per person
    Raises:
        N/A (does not specifically raise an exception)
    """
    tip_amt = bill_amt * tip_pct * .01
    total_bill = bill_amt + tip_amt
    person_share = total_bill / party_size
    return tip_amt, total_bill, person_share
def report_results(pct, tip_amt, total_bill, size, person_share):
    """ print results in formatted strings
    Args:
        pct (float): the tip percentage in 100's
        tip_amt (float): the tip in $
        total_bill (float): the bill including tip
        size (int): the party size
        person_share (float): equal share of bill per person
    Returns:
        None (prints result)
    Raises:
        N/A
    """
    print(MSG1.format(pct, tip_amt, total_bill))
    print(MSG2.format(size, person_share))
def main(args):
    """ execute script
        Args: args (list): the command-line arguments
        Returns: None
        Raises: N/A
    """
    bill, size, pct = validate_normalize_input(args)
    tip_amt, total_bill, person_share = perform_calculations(bill, size,
                                                             pct)
    report_results(pct, tip_amt, total_bill, size, person_share)
if __name__ == '__main__':    # 'main body' code
    main(sys.argv[1:])
The code inside the if __name__ == '__main__' block is intended to be the call that starts the program. If this Python script is imported, the main() function will not be called, because the if test will only be true if the script is executed, and will not be true if it is imported. We do this in order to allow the script's functions to be imported and used without actually running the script -- we may want to test the script's functions (unit testing) or make use of a function from the script in another program. Whether we intend to import a script or not, it is considered a "best practice" to build all of our programs in this way -- with a "main body" of statements collected under function main(), and the call to main() inside the if __name__ == '__main__' gate. This structure will be required for all assignments submitted for the remainder of the course.
User-defined functions are a block of code that can be executed by name.
def add(val1, val2):
    valsum = val1 + val2
    return valsum
ret = add(5, 10) # int, 15
ret2 = add(0.3, 0.9) # float, 1.2
A function is a block of code:
A user-defined function is simply a named code block that can be executed any number of times.
def print_hello():
    print("Hello, World!")
print_hello() # prints 'Hello, World!'
print_hello() # prints 'Hello, World!'
print_hello() # prints 'Hello, World!'
The argument is the input to a function.
def print_hello(greeting, person):              # note we do not
    full_greeting = f'{greeting}, {person}!'    # refer to 'name1',
    print(full_greeting)                        # 'place2', etc.
                                                # inside the function
name1 = 'Hello'
place1 = 'World'
print_hello(name1, place1) # prints 'Hello, World!'
name2 = 'Bonjour'
place2 = 'Python'
print_hello(name2, place2) # prints 'Bonjour, Python!'
A function's return value is passed back from the function using the return statement.
def print_hello(greeting, person):
    full_greeting = f'{greeting}, {person}!'
    return full_greeting
msg = print_hello('Bonjour', 'parrot')
print(msg) # 'Bonjour, parrot!'
In this unit we will complete our tour of the core Python data processing features.
So far we have explored the reading and parsing of data; the loading of data into built-in structures; and the aggregation and sorting of these structures. This unit explores advanced tools for container processing. List comprehensions and set comparisons are two "power tools" that do things we could already do, but more concisely: looping through a list and doing the same thing to each element, looping through a list and selecting items from it, and comparing two collections to see what is common or different between them.
set operations
a = {'a', 'b', 'c'}
b = {'b', 'c', 'd'}
print(a.difference(b)) # {'a'}
print(a.union(b)) # {'a', 'b', 'c', 'd'}
print(a.intersection(b)) # {'b', 'c'}
print(a.symmetric_difference(b)) # {'a', 'd'}
list comprehensions
a = ['hello', 'there', 'harry']
print([ var.upper() for var in a if var.startswith('h') ])
# ['HELLO', 'HARRY']
ternary assignment
rev_sort = True if user_input == 'highest' else False
pos_val = x if x >= 0 else x * -1
conditional assignment
val = this or that # 'this' if this is True else 'that'
val = this and that # 'this' if this is False else 'that'
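To make the conditional assignments concrete, here is a short sketch with made-up values. The "or" form is often used to supply a default, and the "and" form to guard an operation that would fail on an empty value.

```python
user_name = ''                                  # empty string is False in a test
display_name = user_name or 'anonymous'         # falls through to the default
print(display_name)                             # anonymous

scores = [98, 123]
first = scores and scores[0]                    # list is non-empty, so we get scores[0]
print(first)                                    # 98

empty = []
nothing = empty and empty[0]                    # list is empty, so empty[0] is never tried
print(nothing)                                  # []
```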
We have used the set to create a unique collection of objects. The set also allows comparisons of sets of objects. Methods like set.union (complete member list of two or more sets), set.difference (elements found in this set not found in another set) and set.intersection (elements common to both sets) are fast and simple to use.
set_a = {1, 2, 3, 4}
set_b = {3, 4, 5, 6}
print(set_a.union(set_b)) # {1, 2, 3, 4, 5, 6} (set_a | set_b)
print(set_a.difference(set_b)) # {1, 2} (set_a - set_b)
print(set_a.intersection(set_b)) # {3, 4} (what is common between them?)
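Each of these methods also has an operator form. The sketch below checks that the operators give the same results as the methods; sorted() is used when printing only so that the display order is predictable.

```python
set_a = {1, 2, 3, 4}
set_b = {3, 4, 5, 6}

union = set_a | set_b           # same as set_a.union(set_b)
diff = set_a - set_b            # same as set_a.difference(set_b)
common = set_a & set_b          # same as set_a.intersection(set_b)
either = set_a ^ set_b          # same as set_a.symmetric_difference(set_b)

print(sorted(union))            # [1, 2, 3, 4, 5, 6]
print(sorted(diff))             # [1, 2]
print(sorted(common))           # [3, 4]
print(sorted(either))           # [1, 2, 5, 6]
```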
List comprehensions abbreviate simple loops into one line.
Consider this loop, which filters a list so that it contains only positive integer values:
myints = [0, -1, -5, 7, -33, 18, 19, 55, -100]
myposints = []
for el in myints:
    if el > 0:
        myposints.append(el)
print(myposints) # [7, 18, 19, 55]
This loop can be replaced with the following one-liner:
myposints = [ el for el in myints if el > 0 ]
See how the looping and test in the first loop are distilled into the one line? The first el is the element that will be added to myposints - list comprehensions automatically build new lists and return them when the looping is done.
The operation is the same, but the order of operations in the syntax is different:
# this is pseudo code
# target list = item for item in source list if test
Hmm, this makes a list comprehension less intuitive than a loop. However, once you learn how to read them, list comprehensions can actually be easier and quicker to read - primarily because they are on one line. This is an example of a filtering list comprehension - it allows some, but not all, elements through to the new list.
Consider this loop, which doubles each value in a list:
nums = [1, 2, 3, 4, 5]
dblnums = []
for val in nums:
    dblnums.append(val*2)
print(dblnums) # [2, 4, 6, 8, 10]
This loop can be distilled into a list comprehension thusly:
dblnums = [ val * 2 for val in nums ]
This transforming list comprehension transforms each value in the source list before sending it to the target list:
# this is pseudo code
# target list = item transform for item in source list
We can of course combine filtering and transforming:
vals = [0, -1, -5, 7, -33, 18, 19, 55, -100]
doubled_pos_vals = [ i*2 for i in vals if i > 0 ]
print(doubled_pos_vals) # [14, 36, 38, 110]
If they only replace simple loops that we already know how to do, why do we need list comprehensions? As mentioned, once you are comfortable with them, list comprehensions are much easier to read and comprehend than traditional loops. They say in one statement what loops need several statements to say - and reading multiple lines certainly takes more time and focus to understand.
Some common operations can also be accomplished in a single line. In this example, we produce a list of lines from a file, stripped of whitespace:
stripped_lines = [ i.rstrip() for i in open(r'FF_daily.txt').readlines() ]
Here, we're only interested in lines of a file that begin with the desired year (1972):
totals = [ i for i in open('FF_daily.txt').readlines() if i.startswith('1972') ]
If we want the MktRF values (the leftmost floating-point value on each line) for our desired year, we could gather the bare amounts this way:
mktrf_vals = [ float(i.split()[1]) for i in open('FF_daily.txt').readlines() if i.startswith('1972') ]
And in fact we can do part of an earlier assignment in one line -- the sum of MktRF values for a year:
mktrf_sum = sum([ float(i.split()[1]) for i in open('FF_daily.txt').readlines() if i.startswith('1972') ])
From experience I can tell you that familiarity with these forms makes it very easy to construct them and to decode them quickly - much more quickly than a 4-6 line loop.
Remember that dictionaries can be expressed as a list of 2-element tuples, converted using items(). Such a list of 2-element tuples can be converted back to a dictionary with dict():
mydict = {'a': 5, 'b': 0, 'c': -3, 'd': 2, 'e': 1, 'f': 4}
my_items = list(mydict.items()) # my_items is now [('a',5), ('b',0), ('c',-3), ('d',2), ('e',1), ('f',4)]
mydict2 = dict(my_items) # mydict2 is now {'a':5, 'b':0, 'c':-3, 'd':2, 'e':1, 'f':4}
It becomes very easy to filter or transform a dictionary using this structure. Here, we're filtering a dictionary by value - accepting only those pairs whose value is larger than 0:
mydict = {'a': 5, 'b': 0, 'c': -3, 'd': 2, 'e': -22, 'f': 4}
filtered_dict = dict([ (i, j) for (i, j) in mydict.items() if j > 0 ])
Here we're switching the keys and values in a dictionary, and assigning the resulting dict back to mydict, thus seeming to change it in-place:
mydict = dict([ (j, i) for (i, j) in mydict.items() ])
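Python also offers a dictionary comprehension, which builds the dict directly without the intermediate list of tuples. The sketch below repeats the filter and the key/value swap from above in that form; the names filtered_dict and swapped_dict are illustrative only.

```python
mydict = {'a': 5, 'b': 0, 'c': -3, 'd': 2, 'e': -22, 'f': 4}

# filter: keep only pairs whose value is larger than 0
filtered_dict = {key: val for key, val in mydict.items() if val > 0}

# transform: swap keys and values
swapped_dict = {val: key for key, val in mydict.items()}

print(filtered_dict)   # {'a': 5, 'd': 2, 'f': 4}
print(swapped_dict)    # {5: 'a', 0: 'b', -3: 'c', 2: 'd', -22: 'e', 4: 'f'}
```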
The Python database module returns database results as tuples. Here we're pulling two of three values returned from each row and folding them into a dictionary.
# 'tuple_db_results' simulates what a database returns
tuple_db_results = [
    ('joe', 22, 'clerk'),
    ('pete', 34, 'salesman'),
    ('mary', 25, 'manager'),
]
names_jobs = dict([ (name, role) for name, age, role in tuple_db_results ])
sys.argv is a list that holds string arguments entered at the command line
a python script get_args.py
import sys # import the sys library
print('first arg: ' + sys.argv[1]) # print first command line arg
print('second arg: ' + sys.argv[2]) # print second command line arg
running the script from the command line, with two arguments
$ python get_args.py hello there
first arg: hello
second arg: there
sys.argv[0] will always contain the name of our program.
a python script print_args.py
import sys
print(sys.argv)
running the script from the command line (passing 3 arguments)
$ python print_args.py hello there budgie
['print_args.py', 'hello', 'there', 'budgie']
running the script from the command line (passing no arguments)
$ python print_args.py
['print_args.py']
Since we read arguments from a list, we can trigger an IndexError if we try to read an argument that wasn't passed.
a python script addtwo.py
import sys
firstint = int(sys.argv[1])
secondint = int(sys.argv[2])
mysum = firstint + secondint
print(f'the sum of the two values is {mysum}')
passing 2 arguments
$ python addtwo.py 5 10
the sum of the two values is 15
passing no arguments
$ python addtwo.py
Traceback (most recent call last):
  File "addtwo.py", line 3, in <module>
    firstint = int(sys.argv[1])
IndexError: list index out of range
How to handle this exception? Test the len() of sys.argv, or trap the exception.
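Both approaches can be sketched in one short script. This is a minimal illustration, not the course's solution: the helper name add_two and the usage message are assumptions introduced here. The function traps the exception, while the main section tests the len() of sys.argv before calling it.

```python
import sys

def add_two(argv):
    """Return the sum of the two integer arguments, or None if they are unusable."""
    try:
        firstint = int(argv[1])     # IndexError if the argument was not passed
        secondint = int(argv[2])    # ValueError if the argument is not an integer
    except (IndexError, ValueError):
        return None
    return firstint + secondint

if __name__ == '__main__':
    if len(sys.argv) != 3:          # test the length before reading arguments
        print('usage: python addtwo.py <int> <int>')
    else:
        mysum = add_two(sys.argv)
        if mysum is None:
            print('both arguments must be integers')
        else:
            print(f'the sum of the two values is {mysum}')
```

Keeping print() out of the function leaves the decision about what to tell the user with the calling code.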
With os.path.isdir() and os.path.isfile() we can see whether an item is a plain file or a directory.
import os # os ('operating system') module talks
# to the os (for file access & more)
mydirectory = '/Users/david'
items = os.listdir(mydirectory)
for item in items:
    item_path = os.path.join(mydirectory, item)
    if os.path.isdir(item_path):
        print(f"{item}: directory")
    elif os.path.isfile(item_path):
        print(f"{item}: file")
# photos: directory
# backups: directory
# college_letter.docx: file
# notes.txt: file
# finances.xlsx: file
os.path.exists() tests to see if a file or directory exists on the filesystem.
import os
fn = input('please enter a file or directory name: ')
if not os.path.exists(fn):
    print('item does not exist')
elif os.path.isfile(fn):
    print('item is a file')
elif os.path.isdir(fn):
    print('item is a directory')
os.path.getsize() takes a filename and returns the size of the file in bytes
import os # os ('operating system') module
# talks to the os (for file access & more)
mydirectory = '/Users/david'
items = os.listdir(mydirectory)
for item in items:
    item_path = os.path.join(mydirectory, item)
    item_size = os.path.getsize(item_path)
    print(f"{item_path}: {item_size} bytes")
moving and renaming a file are essentially the same thing
import os
filename = 'file1.txt'
new_filename = 'newname.txt'
os.rename(filename, new_filename)
import os
filename = 'file1.txt' # or could be a filepath including directory
move_to_dir = 'old/'
os.rename(filename, os.path.join(move_to_dir, filename)) # file1.txt, old/file1.txt
import shutil
filename = 'file1.txt'
backup_filename = 'file1.txt_bk' # must be a filepath, including filename
shutil.copyfile(filename, backup_filename)
import shutil
filename = 'file1.txt'
target_dir = 'backup' # can be a filepath or just a directory name
shutil.copy(filename, target_dir) # dst can be a folder; shutil.copy2() also preserves file metadata
This function is named after the unix utility mkdir.
import os
os.mkdir('newdir')
If your directory has files, shutil.rmtree must be used.
import os
import shutil
os.mkdir('newdir')
wfh = open('newdir/newfile.txt', 'w') # creating a file in the dir
wfh.write('some data')
wfh.close()
os.rmdir('newdir') # OSError: [Errno 66] Directory not empty: 'newdir'
shutil.rmtree('newdir') # success
import shutil
shutil.copytree('olddir', 'newdir')
Regardless of what files and folders are in the directory to be copied, all files and folders (and indeed all folders and files within) will be copied to the new name or location.
Opening an existing file for writing truncates the file.
fh = open('new_file.txt', 'w')
fh.write("here's a line of text\n")
fh.write('I add the newlines explicitly if I want to write to the file\n')
fh.close()
Appending is usually used for log files.
fh = open('new_file.txt', 'a') # 'a' appends to the end of the file instead of truncating it
fh.write("here's a line of text\n")
fh.write('I add the newlines explicitly if I want to write to the file\n')
fh.close()
Again, note that we are explicitly adding newlines to the end of each line.
The present working directory (pwd) is the location from which we run our programs.
import os
cwd = os.getcwd() # str (your current directory)
print(cwd)
this tree can be found among your course files.
dir1
├── file1.txt
├── test1.py
│
├── dir2a
│   ├── file2a.txt
│   ├── test2a.py
│   │
│   ├── dir3a
│   │   ├── file3a.txt
│   │   ├── test3a.py
│   │   │
│   │   └── dir4
│   │       ├── file4.txt
│   │       └── test4.py
└── dir2b
    ├── file2b.txt
    ├── test2b.py
    │
    └── dir3b
        ├── file3b.txt
        └── test3b.py
These paths locate files relative to the present working directory.
If the file you want to open is in the same directory as the script you're executing, use the filename alone:
fh = open('filename.txt')
To reach the parent directory, prepend the filename with ../:
fh = open('../filename.txt')
To reach the child directory, prepend the filename with the name of the child directory.
fh = open('childdir/filename.txt')
To reach a sibling directory, prepend the filename with ../ and the name of the sibling directory.
fh = open('../siblingdir/filename.txt')
We must go "up, then down": ../ goes up to the parent, then the sibling directory name goes down into the sibling.
These paths locate files from the root of the filesystem.
In Windows, absolute paths begin with a drive letter, usually C:\:
""" test3a.py: open and read a file """
filepath = r'C:\Users\david\Desktop\python_data\dir1\file1.txt'
fh = open(filepath)
print(fh.read())
(Note that r'' should be used with any Windows paths that contain backslashes.)
On the Mac, absolute paths begin with a forward slash:
""" test3a.py: open and read a file """
filepath = '/Users/david/Desktop/python_data/dir1/file1.txt'
fh = open(filepath)
print(fh.read())
(The above paths assume that the python_data folder is in the Desktop directory; you may have placed yours elsewhere on your system. Of course, the above paths also assume that my home directory is called david/; yours is likely different.)
os.path.join() joins together directory and file strings with slashes appropriate for the current operating system.
import os
dirname = '/Users/david'
filename = 'journal.txt'
filepath = os.path.join(dirname, filename) # '/Users/david/journal.txt'
filepath2 = os.path.join(dirname, 'backup', filename) # '/Users/david/backup/journal.txt'
os.listdir() can read the contents of any directory.
import os
mydirectory = '/Users/david'
items = os.listdir(mydirectory)
for item in items: # 'photos'
    item_path = os.path.join(mydirectory, item)
    print(item_path) # /Users/david/photos
    # /Users/david/backups
    # /Users/david/college_letter.docx
    # /Users/david/notes.txt
    # /Users/david/finances.xlsx
Note the os.path.join() call. This is a standard algorithm for looping through a directory -- each item must be joined to the directory to ensure that the filepath is correct.
Several exceptions can indicate a file or directory misfire.
exception type | example trigger |
---|---|
FileNotFoundError | attempt to open a file not in this location |
FileExistsError | attempt to create a directory (or in some cases a file) that already exists |
IsADirectoryError | attempt to open() a path that is actually a directory |
NotADirectoryError | attempt to os.listdir() a path that is not a directory |
PermissionError | attempt to read or write a file or directory to which you haven't the permissions |
WindowsError, OSError | these exception types are sometimes raised in place of one or more of the above when on a Windows computer |
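A minimal sketch of trapping some of these exceptions follows. The helper name read_text and its error messages are assumptions made for illustration; a temporary directory is used so the example can run anywhere.

```python
import os
import tempfile

def read_text(path):
    """Return the file's text, or a short message naming what went wrong."""
    try:
        with open(path) as fh:
            return fh.read()
    except FileNotFoundError:
        return 'error: no such file'
    except IsADirectoryError:
        return 'error: path is a directory'
    except PermissionError:
        # on Windows, opening a directory typically lands here instead
        return 'error: permission denied'

with tempfile.TemporaryDirectory() as tmpdir:
    missing = os.path.join(tmpdir, 'no_such_file.txt')
    missing_result = read_text(missing)    # nothing exists at this path
    dir_result = read_text(tmpdir)         # the path is a directory, not a file

print(missing_result)   # error: no such file
print(dir_result)
```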
os.walk() visits every directory in a directory tree so we can list files and folders.
import os
root_dir = '/Users/david'
for root, dirs, files in os.walk(root_dir):
    for tdir in dirs: # loop through dirs in this directory
        print(os.path.join(root, tdir)) # print full path to tdir
    for tfile in files: # loop through files in this dir
        print(os.path.join(root, tfile)) # print full path to file
At each iteration, these three variables are assigned these values:
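A small sketch can make the three values concrete. It builds a throwaway tree (the names sub, a.txt and b.txt are made up for this illustration) and records what root, dirs and files hold on each pass.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # build a tiny tree: top/a.txt and top/sub/b.txt
    os.mkdir(os.path.join(top, 'sub'))
    for relpath in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(top, relpath), 'w') as fh:
            fh.write('data')

    visited = []
    for root, dirs, files in os.walk(top):
        # root:  str, path of the directory currently being visited
        # dirs:  list of str, names of the subdirectories inside root
        # files: list of str, names of the files inside root
        visited.append((os.path.relpath(root, top), dirs, files))

print(visited)   # [('.', ['sub'], ['a.txt']), ('sub', [], ['b.txt'])]
```

Because dirs and files hold bare names, each one is joined to root to form a usable path, as in the loop above.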
User-defined functions help us organize our code -- and our thinking.
Let's now return to functions from the point of view of code organization. Functions are useful because they:
def add(val1, val2):
    mysum = val1 + val2
    return mysum
a = add(5, 10) # int, 15
b = add(0.2, 0.2) # float, 0.4
Review what we've learned about functions:
When a function does not return anything, it returns None.
def do(arg):
    print(f'{arg} doubled is {arg * 2}')
    # no return statement returns None
x = do(5) # (prints '5 doubled is 10')
print(x) # None
Since do() does not return anything useful, we should not call it with an assignment (i.e., x = above). If you call a function and find that its return value is None, it often means that the function was not meant to be assigned because there is no useful return value.
The None value is the "value that means 'no value'".
zz = None
print(zz) # None
print(type(zz)) # <class 'NoneType'>
aa = 'None' # a string -- not the None value!
Positional arguments are required to be passed, and assigned by position.
def greet(firstname, lastname):
    print(f"Hello, {firstname} {lastname}!")
greet('Joe', 'Wilson') # passed two arguments: correct
greet('Marie') # TypeError: greet() missing 1 required positional argument: 'lastname'
Keyword args are not required, and if not passed take a default value.
def greet(lastname, firstname='Citizen'):
    print(f"Hello, {firstname} {lastname}!")
greet('Kim', firstname='Joe') # Hello, Joe Kim!
greet('Kim') # Hello, Citizen Kim!
Variable names initialized inside a function are local to the function.
def myfunc():
    a = 10
    return a
var = myfunc()
print(var) # 10
print(a) # NameError ('a' does not exist here)
Any variable defined outside a function is global.
var = 'hello global'
def myfunc():
    print(var)
myfunc() # hello global
Functions that do not touch outside variables, and do not create "side effects" (for example, calling exit(), print() or input()), are considered "pure" -- and are preferred.
"Pure" functions have the following characteristics:
"Outside" (Global) variables are ones defined outside the function -- they should be avoided.
wrong way: referring to an outside variable inside a function
val = '5' # defined outside any function
def doubleit():
    dval = int(val) * 2 # BAD: refers to "global" variable 'val'
    return dval
new_val = doubleit()
right way: passing outside variables as arguments
val = '5' # defined outside any function
def doubleit(arg):
    dval = int(arg) * 2 # GOOD: refers to same value as 'val',
    return dval # but accessed through local
    # argument 'arg'
new_val = doubleit(val) # passing variable to function -
# correct way to get a value into the function
print(), input(), exit() all "touch" the outside world and in many cases should be avoided inside functions.
Although it is of course possible (and sometimes practical) to use these built-in functions inside our function, we should avoid them if we are interested in making a function "pure".
Here are some positive reasons to strive for purity.
You may notice that these "impure" practices do not cause errors. So why should we avoid them?
exit() should not be called inside a function.
def doubleit(arg):
    if not arg.isdigit():
        raise ValueError('arg must be all digits') # GOOD: error signaled with raise
    dval = int(arg) * 2
    return dval
val = input('what is your value? ')
new_val = doubleit(val)
'raise' creates an error condition (exception) that usually terminates program execution.
To raise an exception, we simply follow raise with the type of error we would like to raise, and an optional message:
raise IndexError('I am now raising an IndexError exception')
You may raise any existing exception (you may even define your own). Here is a list of common exceptions:
Exception Type | Reason |
---|---|
TypeError | the wrong type used in an expression |
ValueError | the wrong value used in an expression |
FileNotFoundError | a file or directory is requested that doesn't exist |
IndexError | use of an index for a nonexistent list/tuple item |
KeyError | a requested key does not exist in the dictionary |
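A raised exception only terminates the program if nothing traps it. The sketch below reuses the doubleit() function from earlier and traps its ValueError in the calling code; the sample inputs and the results list are made up for illustration.

```python
def doubleit(arg):
    if not arg.isdigit():
        raise ValueError('arg must be all digits')   # error signaled with raise
    return int(arg) * 2

results = []
for text in ['21', 'abc']:
    try:
        results.append(doubleit(text))
    except ValueError as exc:
        # the message passed to raise is available through the exception object
        results.append(f'could not double {text!r}: {exc}')

print(results)   # [42, "could not double 'abc': arg must be all digits"]
```

The function signals the problem; the caller decides whether to recover, report, or stop.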
Globals should be used inside functions only in select circumstances.
STATE_TAX = .05 # ALL CAPS designates a "constant"
def calculate_bill(bill_amount, tip_pct):
    tax = bill_amount * STATE_TAX # float, 5.0
    tip = bill_amount * tip_pct # float, 20.0
    total_amount = bill_amount + tax + tip # float, 125.0
    return total_amount
total = calculate_bill(100, .20) # float, 125.0
Four kinds of variables: (L)ocal, (E)nclosing, (G)lobal and (B)uiltin.
filename = 'pyku.txt' # 'filename': global
# 'get_text': global (function name is a
# variable as well)
def get_text(fname): # 'fname': local
    fh = open(fname) # 'fh': local; 'open': builtin
    text = fh.read() # 'text': local
    return text
txt = get_text(filename) # 'txt': global
print(txt) # 'print': builtin
Core principles.
Here are the main components of a properly formatted program:
See the tip_calculator.py file in your files directory for an example and notes below.
Python comes with hundreds of preinstalled modules.
import sys # find and import the sys module
import json # find and import the json module
print(sys.copyright) # the .copyright attribute points
# to a string with the copyright notice
# Copyright (c) 2001-2023 Python...
obj = json.loads('{"a": 1, "b": 2}') # the .loads attribute points to
# a function that reads str JSON data
print(type(obj)) # <class 'dict'>
These patterns are purely for convenience when needed.
Abbreviating the name of a module:
import json as js # 'json' is now referred to as 'js'
obj = js.loads('{"a": 1, "b": 2}')
Importing a module variable into our program directly:
from json import loads # making the 'loads' function part of the global namespace
obj = loads('{"a": 1, "b": 2}')
Please note that this does not import only a part of the module: the entire module code is still imported.
Each module has a specific focus.
A module of our own design may be saved as a .py file.
messages.py: a simple Python module that prints messages
import sys
def print_warning(msg):
    print(f'Warning! {msg}')
test.py: a Python script that imports messages.py
import messages
# accessing the print_warning() function
messages.print_warning('Look out!') # Warning! Look out!
Python must be told where to find our own custom modules.
To view the currently used module search paths, we can use sys.path
import sys
print(sys.path) # shows a list of strings, each a directory
# where modules can be found
Like the PATH for programs, this variable tells Python where to find modules.
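One way to tell Python about a new module location is to add the directory to sys.path at runtime. The sketch below is an assumption-laden illustration: it writes a throwaway module named messages_demo (a hypothetical name) into a temporary directory, adds that directory to the search path, and imports it.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as moddir:
    # write a tiny module file into a directory Python does not yet search
    with open(os.path.join(moddir, 'messages_demo.py'), 'w') as fh:
        fh.write("def warning(msg):\n    return f'Warning! {msg}'\n")

    sys.path.insert(0, moddir)        # add the directory to the module search path
    importlib.invalidate_caches()     # make sure the newly written file is noticed
    import messages_demo              # found through the added sys.path entry

    text = messages_demo.warning('Look out!')
    sys.path.remove(moddir)           # restore the search path

print(text)   # Warning! Look out!
```

For a lasting change, the PYTHONPATH environment variable serves the same purpose without editing sys.path in code.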
Modules included with Python are installed when Python is installed -- they are always available.
Python provides hundreds of supplementary modules to perform myriad tasks. The modules do not need to be installed because they come bundled in the Python distribution, that is they are installed at the time that Python itself is installed. The documentation for the standard library is part of the official Python docs.
This index contains links to all modules ever added by anyone to the index.
Search for any module's home page at the PyPI website:
https://pypi.python.org/pypi
Take some care when installing modules -- it is possible to install nefarious code.
- [demo: searching for powerpoint module, verifying ]
Third-party modules must be downloaded and installed into your Python distribution.
Commands to use at the command line:
pip search pandas # searches for pandas in the PyPI repository (PyPI no longer supports this command; search on the website instead)
pip install pandas # installs pandas
The math module handles advanced math calculations.
These calculations include functions for calculating factorials, ceiling and floor, and logarithmic, geometric, and trigonometric values (sin, cos, tan, etc.).
A quick look at the module's attributes gives us an idea of what is included:
import math
print(dir(math))
# ['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__',
# 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign',
# 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial',
# 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose',
# 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2',
# 'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan',
# 'tanh', 'tau', 'trunc']
For example, here are some simple geometry calculations using math:
import math
print(math.pi) # 3.141592653589793
radius = 3
circumference = 2 * math.pi * radius # 18.84955592153876
area = math.pi * radius * radius # 28.274333882308138
This module provides basic statistical analysis.
Some of our earliest exercises calculated mean, median, and standard deviation. These operations are more easily performed through this module's functions.
import statistics as stats # set a convenient name for the module
values = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6, 6]
# average value
meanval = stats.mean(values) # 4.083333333333333
# "middle" value in a list of sorted values (list does not need to be sorted)
medianval = stats.median(values) # 4.0
# average distance of each value from the mean
standev = stats.stdev(values) # 1.781640374554423
# square of the standard deviation
varianceval = stats.variance(values) # 3.1742424242424243
# most common value
modeval = stats.mode(values) # 6
This module provides useful lists of characters.
import string
print(string.ascii_letters) # abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
print(string.ascii_lowercase) # abcdefghijklmnopqrstuvwxyz
print(string.ascii_uppercase) # ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(string.digits) # 0123456789
print(string.hexdigits) # 0123456789abcdefABCDEF
print(string.punctuation) # !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
print(string.whitespace) # ' \t\n\r\x0b\x0c' (prints as invisible characters)
The zipfile module builds, unpacks and inspects .zip archives.
import zipfile as zp
myzip = zp.ZipFile('myzip.zip', 'w')
# add names of files (of course these must exist)
myzip.write('file1.txt')
myzip.write('file2.pdf')
myzip.write('file3.doc')
myzip.close() # builds and writes zip file
print('done')
After running the above code and referencing real files, check the session files directory -- you should see a new .zip file added. You can also use zipfile to unpack and check the manifest (contents) of a zip file.
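Checking the manifest and unpacking might look like the following sketch. It creates its own small file and archive in a temporary directory so it can run without any course files; the directory name unpacked is illustrative only.

```python
import os
import tempfile
import zipfile as zp

with tempfile.TemporaryDirectory() as workdir:
    # create a file to archive
    src = os.path.join(workdir, 'file1.txt')
    with open(src, 'w') as fh:
        fh.write('some data')

    # build the archive; arcname sets the name stored inside the zip
    zip_path = os.path.join(workdir, 'myzip.zip')
    with zp.ZipFile(zip_path, 'w') as myzip:
        myzip.write(src, arcname='file1.txt')

    # reopen for reading: list the manifest, then unpack everything
    with zp.ZipFile(zip_path) as myzip:
        manifest = myzip.namelist()
        outdir = os.path.join(workdir, 'unpacked')
        myzip.extractall(outdir)
        unpacked = os.listdir(outdir)

print(manifest)   # ['file1.txt']
print(unpacked)   # ['file1.txt']
```

Using with closes the archive automatically, so no explicit close() call is needed.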
The time module handles time-related functions such as reading the current time, doing simple time arithmetic, and sleeping for a period of time.
time can be used to sleep (or pause execution) for a set number of seconds:
import time
# pause execution for a number of seconds
time.sleep(5)
We can also use time to show the current time:
# current time and date
print(time.ctime()) # Sat May 23 17:10:55 2020
At a very basic level it's possible to manipulate time through arithmetic (though complex calculations of date and time are more easily handled with the datetime module).
# read current time in seconds
secs = time.time() # 1590257729.297496 (includes fractions of a second)
# calculate 24 hours, in seconds (subtract 86,400 seconds)
yestersecs = secs - (60 * 60 * 24)
# show the current time minus 24 hours
print(time.ctime(yestersecs)) # Fri May 22 17:10:55 2020
# a "time struct"
print(time.localtime(yestersecs))
# time.struct_time(tm_year=2020, tm_mon=5, tm_mday=22,
# tm_hour=17, tm_min=10, tm_sec=55, tm_wday=4,
# tm_yday=143, tm_isdst=1)
The "time struct" is a custom object that provides day of week, day of year and whether the time reflects daylight savings.
The datetime module handles the calculation of dates and times, reading dates from string in any format, and writing dates to string in any format.
import datetime as dt
# build a 'date' object from year, month, day
mydate1 = dt.date(2019, 9, 3)
# build a 'date' object representing today
mydate2 = dt.date.today()
# build a datetime object from year, month, day, hour, minute and second
mydatetime1 = dt.datetime(2019, 9, 3, 12, 5, 30)
# build a datetime object representing right now
mydatetime2 = dt.datetime.now()
# build a datetime object from a formatted string
mydatetime3 = dt.datetime.strptime('2019-03-03', '%Y-%m-%d')
# build a "timedelta" (time interval) object: 3 days, 2 hours
myinterval = dt.timedelta(days=3, seconds=7200)
# date objects and intervals can be calculated like math
newdate = mydatetime3 + myinterval
print(newdate) # 2019-03-06 02:00:00
# render a date object in a string format
print(newdate.strftime('%Y-%m-%d (%H:%M)')) # 2019-03-06 (02:00)
The random module generates pseudorandom numbers.
'Pseudorandom' means that computers, being deterministic, are not capable of true randomness. The module uses an algorithm to produce number sequences that appear random and take a very long time to repeat.
import random
# random float from 0 to 1
myfloat = random.random() # 0.22845730036901912
# random integer from 1 to 10
num = random.randint(1, 10)
# random choice from a list
x = ['a', 'b', 'c']
choice = random.choice(x) # 'b'
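The deterministic nature of the module can be seen by seeding it. In this short sketch (the seed value 42 is arbitrary), resetting the generator to the same seed reproduces the same sequence of "random" numbers.

```python
import random

random.seed(42)                                          # set the starting point
first_run = [random.randint(1, 10) for _ in range(5)]

random.seed(42)                                          # reset to the same starting point
second_run = [random.randint(1, 10) for _ in range(5)]

print(first_run == second_run)    # True
```

Seeding is useful in testing, where repeatable results matter more than unpredictability.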
The csv module reads and writes CSV files.
import csv
# reading a CSV file
fh = open('dated_file.csv')
reader = csv.reader(fh)
for row in reader:
    print(row)
fh.close()
# writing to a CSV file
wfh = open('newfile.csv', 'w', newline='')
writer = csv.writer(wfh)
writer.writerow(['a', 'b', 'c'])
writer.writerow(['d', 'e', 'f'])
writer.writerow(['g', 'b', 'i'])
wfh.close() # required - otherwise you may not see the writes
(newline='' is necessary when opening the file to neutralize an issue in Windows regarding the '\r\n' line ending that Windows uses. While not needed on Mac or Linux, this added argument does no harm.) As with all file writing, it's essential to close a write filehandle; otherwise, you may not see the write in the file until after the program exits. (With Jupyter notebooks or the Python interactive interpreter, an unclosed file may not show the changes until the interpreter is closed.)
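One way to make sure the filehandle is always closed is to open it in a with block. The following is a sketch under that assumption, writing to a temporary directory and reading the rows back to confirm the write.

```python
import csv
import os
import tempfile

rows = [['a', 'b', 'c'], ['d', 'e', 'f']]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'newfile.csv')

    # the file is closed (and the data flushed) when the with block ends
    with open(path, 'w', newline='') as wfh:
        writer = csv.writer(wfh)
        writer.writerows(rows)

    # read the rows back; each row arrives as a list of strings
    with open(path, newline='') as fh:
        read_back = list(csv.reader(fh))

print(read_back)   # [['a', 'b', 'c'], ['d', 'e', 'f']]
```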
The sqlite3 module allows file-based writing and reading of relational tables.
# connecting
import sqlite3
conn = sqlite3.connect('mydatabase.db') # open an existing, or create a new file
cur = conn.cursor()
#creating a table
cur.execute("CREATE TABLE mytable (name TEXT, years INT, balance FLOAT)")
# insert rows into a table
rows = [
    [ 'Joe', 23, 23.9],
    [ 'Marie', 19, 7.95 ],
    [ 'Zoe', 29, 17.5 ]
]
for row in rows:
    cur.execute("INSERT INTO mytable VALUES (?, ?, ?)", row)
conn.commit() # essential to see the write
# selecting data from a table
cur = conn.cursor()
cur.execute('SELECT name, years, balance FROM mytable')
for row in cur:
    print(row) # ('Joe', 23, 23.9)
    # ('Marie', 19, 7.95)
    # ('Zoe', 29, 17.5)
requests
requests (which must be installed separately) is generally preferred over urllib, which comes installed with the standard distribution of Python. requests simply provides a more convenient interface, i.e. more convenient commands to accomplish the same tasks.
import requests
# make URL request; download the response
response = requests.get('http://www.nytimes.com')
# the HTTP response code (200 OK, 404 not found, 500 error, etc.)
status_code = response.status_code
# the text of the response
page_text = response.text
# response.text is already decoded to str; encode it only if bytes are needed
page_bytes = page_text.encode('utf-8')
print(f'status code: {status_code}')
print('======================= page text =======================')
print(page_text)
If requests is not available on your system, urllib provides similar functionality.
import urllib.request
# make URL request; download the text of the response
read_object = urllib.request.urlopen('http://www.nytimes.com')
# a file-like object, can also 'for' loop or use .readlines()
text = read_object.read()
# decoding the text of the response (if necessary)
text = text.decode('utf-8')
SSL Certificate Error
Many websites enable SSL security and require a web request to accept and validate an SSL certificate (certifying the identity of the server). urllib by default requires SSL certificate security, but it can be bypassed (keep in mind that this may be a security risk).
import ssl
import urllib.request
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
my_url = 'https://www.nytimes.com' # the context only applies to https requests
read_object = urllib.request.urlopen(my_url, context=ctx)
The bs4 module can parse HTML to extract data from web pages.
This module must be installed separately.
import bs4
fh = open('dormouse.html')
text = fh.read()
fh.close()
soup = bs4.BeautifulSoup(text, 'html.parser')
# show all plain text in a page
print(soup.get_text())
# retrieve first tag with this name (a <title> tag)
tag = soup.title
# same, using .find()
tag = soup.find('title')
# find all <a> tags with specific attributes (<a href="mysite" id="link1">)
link1_a_tags = soup.find_all('a', {'id': 'link1'})
# find all <a> tags (hyperlinks)
tags = soup.find_all('a')
The re module can recognize patterns in text and extract portions of text based on patterns.
import re
line = 'a phone number: 213-298-1990'
matchobj = re.search(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)', line)
print(matchobj.group(0)) # 213-298-1990 (the entire match)
print(matchobj.group(1)) # 213 (the first parenthesized group)
The regular expression spec is a declarative language that is implemented by many programming languages (JavaScript, Java, Ruby, Perl, etc.). To fully understand and use regular expressions, you will need to complete a course or tutorial that covers them in detail.
The textwrap module allows you to wrap text at a certain width.
import textwrap
text = "This is some really long text that we would like to wrap. Wouldn't you know it, there's a module for that! "
# returns a list of lines
# text is limited to 10 characters width
items = textwrap.wrap(text, 10)
# join lines together into multi-line string with new width
print('\n'.join(items))
The pandas module enables table manipulations similar to those done by Excel or relational databases.
The central object offered by pandas is the DataFrame, a 2-dimensional tabular structure similar to an Excel spreadsheet (columns and rows, with column and row labels). This module must be installed separately.
pandas can read and write to and from a multitude of formats
import pandas as pd
import sqlite3
# read from multiple formats to a DataFrame
df = pd.read_csv('dated_file.csv')
# df = pd.read_excel('dated_file.xls')
# df = pd.read_json('dated_file.json')
# write DataFrame to multiple formats
df.to_csv('new_file.csv')
# df.to_excel('new_file.xls')
# df.to_json('new_file.json')
# read from database through query
conn = sqlite3.connect('testdb.db')
df = pd.read_sql('SELECT * FROM test', conn)
pandas can perform 'where clause' style selections, sum or average columns, and perform GROUPBY database-style aggregations:
df = pd.read_csv('dated_file.csv')
# select rows thru a filter
df2 = df[ df[3] > 18 ] # all rows where the field in column '3' (4th column) is > 18
# sum, average, etc. a column
df.tax.mean() # average values in 'tax' column
df.revenue.sum() # sum values in 'revenue' column
# create a new column
df['col99'] = df.col1 + df.revenue # new column sums 'col1' and 'revenue' field from each line
# groupby aggregation
dfgb = df.groupby('state').revenue.sum() # show sum of revenue for each state
pandas is tightly integrated with matplotlib, a full featured plotting library. The resulting images can be displayed in a Jupyter notebook, or saved as an image file.
# groupby bar chart
dfgb.plot.bar()
# weather temp line chart
weather_df.temp.plot.line()
This slide deck contains basic documentation on some of the most useful modules in the Python standard distribution. There are many more!
As you know, a module is Python code stored in a separate file or files that we can import into our code, to help us do specialized work. The Python documentation lists modules that come installed with Python (collectively, these modules are known as the "Standard Library"). Every module demonstrated below has many features and options. You can refer to documentation, or an article or blog post, to learn more about each.
The math module handles advanced math calculations.
These calculations include functions for calculating factorials, ceiling and floor, and logarithmic, geometric, and trigonometric values (sin, cosin, tan, etc.)
A quick look at the module's attributes gives us an idea of what is included:
import math
print(dir(math))
# ['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__',
# 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign',
# 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial',
# 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose',
# 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2',
# 'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan',
# 'tanh', 'tau', 'trunc']
For example, here are some simple geometry calculations using math:
import math
print(math.pi) # 3.141592653589793
radius = 3
circumference = 2 * math.pi * radius # 18.84955592153876
area = math.pi * radius * radius # 28.274333882308138
This module provides basic statistical analysis.
Some of our earliest exercises calculated mean, median, and standard deviation. These operations are more easily performed through this module's functions.
import statistics as stats # set a convenient name for the module
values = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6, 6]
# average value
meanval = stats.mean(values) # 4.083333333333333
# "middle" value in a list of sorted values (list does not need to be sorted)
medianval = stats.median(values) # 4.0
# sample standard deviation: roughly, the typical distance of values from the mean
standev = stats.stdev(values) # 1.781640374554423
# square of the standard deviation
varianceval = stats.variance(values) # 3.1742424242424243
# most common value
modeval = stats.mode(values) # 6
This module provides useful lists of characters.
import string
print(string.ascii_letters) # abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
print(string.ascii_lowercase) # abcdefghijklmnopqrstuvwxyz
print(string.ascii_uppercase) # ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(string.digits) # 0123456789
print(string.hexdigits) # 0123456789abcdefABCDEF
print(string.punctuation) # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
print(string.whitespace) # ' \t\n\r\x0b\x0c' (prints as invisible characters)
The zipfile module builds, unpacks and inspects .zip archives.
import zipfile as zp
myzip = zp.ZipFile('myzip.zip', 'w')
# add names of files (of course these must exist)
myzip.write('file1.txt')
myzip.write('file2.pdf')
myzip.write('file3.doc')
myzip.close() # builds and writes zip file
print('done')
After running the above code and referencing real files, check this unit's files directory -- you should see a new .zip file added. You can also use zipfile to unpack and check the manifest (contents) of a zip file.
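Unpacking and checking the manifest can be sketched as follows. This is only an illustration: the archive name demo.zip, the member notes.txt, and the temporary working directory are placeholders, not files from the course.

```python
import os
import tempfile
import zipfile as zp

# work in a throwaway directory so no real files are touched
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, 'demo.zip')       # hypothetical archive name

# build a small archive to inspect (writestr adds a member from a string)
with zp.ZipFile(zip_path, 'w') as myzip:
    myzip.writestr('notes.txt', 'hello from inside the zip')

# read the manifest, then unpack everything into a subdirectory
with zp.ZipFile(zip_path) as myzip:                # default mode is 'r' (read)
    names = myzip.namelist()                       # list of member names
    myzip.extractall(os.path.join(workdir, 'unpacked'))

print(names)                                       # ['notes.txt']

with open(os.path.join(workdir, 'unpacked', 'notes.txt')) as fh:
    contents = fh.read()
print(contents)                                    # hello from inside the zip
```

Opening the archive in a with block closes it automatically, so no explicit .close() call is needed.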
The time module handles time-related tasks such as reading the current time, doing simple time arithmetic, and sleeping for a period of time.
time can be used to sleep (or pause execution) for a set number of seconds:
import time
# pause execution for a number of seconds
time.sleep(5)
We can also use time to show the current time:
# current time and date
print(time.ctime()) # Sat May 23 17:10:55 2020
At a very basic level it's possible to manipulate time through arithmetic (though complex calculations of date and time are more easily handled with the datetime module).
# read current time in seconds
secs = time.time() # 1590257729.297496 (includes milliseconds)
# calculate 24 hours, in seconds (subtract 86,400 seconds)
yestersecs = secs - (60 * 60 * 24)
# show the current time minus 24 hours
print(time.ctime(yestersecs)) # Fri May 22 17:10:55 2020
# a "time struct"
print(time.localtime(yestersecs))
# time.struct_time(tm_year=2020, tm_mon=5, tm_mday=22,
# tm_hour=17, tm_min=10, tm_sec=55, tm_wday=4,
# tm_yday=143, tm_isdst=1)
The "time struct" is a custom object that provides day of week, day of year and whether the time reflects daylight savings.
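The fields of a time struct can be read by name. The sketch below uses a fixed timestamp chosen only for illustration, and time.gmtime() (UTC) rather than time.localtime() so that the result does not depend on the time zone of the machine running it.

```python
import time

# a fixed number of seconds since the epoch (illustrative value)
secs = 1590257729

# gmtime() builds the struct in UTC; localtime() would use the local time zone
ts = time.gmtime(secs)

print(ts.tm_year)      # 2020
print(ts.tm_wday)      # 5   (day of week: Monday is 0, so 5 is Saturday)
print(ts.tm_yday)      # 144 (day of year: January 1 is 1)

# render the struct as a formatted string
stamp = time.strftime('%Y-%m-%d %H:%M:%S', ts)
print(stamp)           # 2020-05-23 18:15:29
```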
The datetime module handles the calculation of dates and times, reading dates from string in any format, and writing dates to string in any format.
import datetime as dt
# build a 'date' object from year, month, day
mydate1 = dt.date(2019, 9, 3)
# build a 'date' object representing today
mydate2 = dt.date.today()
# build a datetime object from year, month, day, hour, minute and second
mydatetime1 = dt.datetime(2019, 9, 3, 12, 5, 30)
# build a datetime object representing right now
mydatetime2 = dt.datetime.now()
# build a datetime object from a formatted string
mydatetime3 = dt.datetime.strptime('2019-03-03', '%Y-%m-%d')
# build a "timedelta" (time interval) object: 3 days, 2 hours
myinterval = dt.timedelta(days=3, seconds=7200)
# date objects and intervals can be calculated like math
newdate = mydatetime3 + myinterval
print(newdate) # 2019-03-06 02:00:00
# render a date object in a string format
print(newdate.strftime('%Y-%m-%d (%H:%M)')) # 2019-03-06 (02:00)
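Date arithmetic also works in the other direction: subtracting one date from another produces a timedelta. A minimal sketch, with two dates picked arbitrarily for illustration:

```python
import datetime as dt

start = dt.date(2019, 9, 3)
end = dt.date(2019, 12, 25)

# date - date gives a timedelta (an interval)
gap = end - start
print(gap.days)                        # 113

# intervals can be compared with each other, and so can dates
print(gap > dt.timedelta(weeks=12))    # True (12 weeks is 84 days)
print(start < end)                     # True
```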
The random module generates pseudorandom numbers.
'Pseudorandom' means that the numbers come from a deterministic algorithm: computers, being deterministic, are not capable of true randomness on their own. The module produces sequences that appear random and that run for an extremely long time before repeating.
import random
# random float from 0 to 1
myfloat = random.random() # 0.22845730036901912
# random integer from 1 to 10
num = random.randint(1, 10)
# random choice from a list
x = ['a', 'b', 'c']
choice = random.choice(x) # 'b'
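The deterministic nature of the generator can be seen by seeding it. In this sketch the seed value 42 is arbitrary; any value would do, as long as the same one is used both times.

```python
import random

# seeding with the same value restarts the same pseudorandom sequence
random.seed(42)
first_run = [random.randint(1, 10) for _ in range(5)]

random.seed(42)
second_run = [random.randint(1, 10) for _ in range(5)]

print(first_run == second_run)    # True: same seed, same sequence
```

Seeding is handy in testing, when we want "random" results that are repeatable from run to run.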
The csv module reads and writes CSV files.
import csv
# reading a CSV file
fh = open('dated_file.csv')
reader = csv.reader(fh)
for row in reader:
    print(row)
fh.close()
# writing to a CSV file
wfh = open('newfile.csv', 'w', newline='')
writer = csv.writer(wfh)
writer.writerow(['a', 'b', 'c'])
writer.writerow(['d', 'e', 'f'])
writer.writerow(['g', 'h', 'i'])
wfh.close() # essential - otherwise you may not see the writes until the program exits
(newline='' is necessary when opening the file to neutralize an issue in Windows regarding the '\r\n' line ending that Windows uses. While not needed on Mac or Linux, this added argument does no harm.) As with all file writing, it's essential to close a write filehandle; otherwise, you may not see the write in the file until after the program exits.
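One way to make sure a filehandle is always closed is to open it in a with block. The sketch below writes to a temporary directory; the path is a placeholder, and writerows() is used to write all rows in one call.

```python
import csv
import os
import tempfile

# write to a throwaway location (placeholder path)
path = os.path.join(tempfile.mkdtemp(), 'newfile.csv')

rows = [['a', 'b', 'c'], ['d', 'e', 'f']]

# the file is closed automatically when the 'with' block ends
with open(path, 'w', newline='') as wfh:
    writer = csv.writer(wfh)
    writer.writerows(rows)               # write every row in one call

# read the file back; each row is returned as a list of strings
with open(path, newline='') as fh:
    rows_read = list(csv.reader(fh))

print(rows_read)                         # [['a', 'b', 'c'], ['d', 'e', 'f']]
```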
The sqlite3 module allows file-based writing and reading of relational tables.
# connecting
import sqlite3
conn = sqlite3.connect('mydatabase.db') # open an existing, or create a new file
cur = conn.cursor()
#creating a table
cur.execute("CREATE TABLE mytable (name TEXT, years INT, balance FLOAT)")
# insert rows into a table
rows = [
    [ 'Joe', 23, 23.9 ],
    [ 'Marie', 19, 7.95 ],
    [ 'Zoe', 29, 17.5 ]
]
for row in rows:
    cur.execute("INSERT INTO mytable VALUES (?, ?, ?)", row)
conn.commit() # essential to see the write
# selecting data from a table
cur = conn.cursor()
cur.execute('SELECT name, years, balance FROM mytable')
for row in cur:
    print(row) # ('Joe', 23, 23.9)
               # ('Marie', 19, 7.95)
               # ('Zoe', 29, 17.5)
requests
requests (which must be installed separately) is generally preferred over urllib, which comes installed with the standard distribution of Python. requests simply provides a more convenient interface, i.e. more convenient commands to accomplish the same tasks.
import requests
# make URL request; download the response
response = requests.get('http://www.nytimes.com')
# the HTTP response code (200 OK, 404 not found, 500 error, etc.)
status_code = response.status_code
# the text of the response
page_text = response.text
# response.text is already decoded to a str; the undecoded bytes are in response.content
page_bytes = response.content
print(f'status code: {status_code}')
print('======================= page text =======================')
print(page_text)
If requests is not available on your system, urllib provides similar functionality.
import urllib.request
# make URL request; download the text of the response
read_object = urllib.request.urlopen('http://www.nytimes.com')
# a file-like object, can also 'for' loop or use .readlines()
text = read_object.read()
# decoding the text of the response (if necessary)
text = text.decode('utf-8')
SSL Certificate Error
Many websites enable SSL security and require a web request to accept and validate an SSL certificate (certifying the identity of the server). urllib validates certificates by default, but validation can be bypassed (keep in mind that this is a security risk).
import ssl
import urllib.request
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
my_url = 'https://www.nytimes.com'
read_object = urllib.request.urlopen(my_url, context=ctx)
The bs4 module can parse HTML to extract data from web pages.
This module must be installed separately.
import bs4
fh = open('dormouse.html')
text = fh.read()
fh.close()
soup = bs4.BeautifulSoup(text, 'html.parser')
# show all plain text in a page
print(soup.get_text())
# retrieve first tag with this name (a <title> tag)
tag = soup.title
# same, using .find()
tag = soup.find('title')
# find all <a> tags with specific attributes (<a href="mysite" id="link1">)
link1_a_tags = soup.find_all('a', {'id': 'link1'})
# find all <a> tags (hyperlinks)
tags = soup.find_all('a')
The re module can recognize patterns in text and extract portions of text based on patterns.
import re
line = 'a phone number: 213-298-1990'
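matchobj = re.search(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)', line) # raw string keeps the backslashes intact
print(matchobj.group(0)) # 213-298-1990 (the entire match)
print(matchobj.group(1)) # 213 (the first parenthesized group)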
The regular expression spec is a declarative language that is implemented by many programming languages (JavaScript, Java, Ruby, Perl, etc.). To fully understand and use regular expressions, you will need to complete a course or tutorial that covers them in detail.
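As a small taste of what patterns can do, here is a sketch that pulls every phone number out of a line. The sample text is made up for illustration.

```python
import re

line = 'office: 213-298-1990, mobile: 646-555-0123'

# \d{3} means "exactly three digits"; parentheses mark groups to extract
pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

# findall() returns one tuple of groups per match
all_numbers = pattern.findall(line)
print(all_numbers)    # [('213', '298', '1990'), ('646', '555', '0123')]

# search() returns None when nothing matches, so test before using the result
matchobj = pattern.search('no phone number here')
if matchobj is None:
    print('no match')
```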
The subprocess module allows your program to launch other programs / applications.
import subprocess
# execute another program; read from STDIN and write to STDOUT
subprocess.call(['ls', 'path/to/my/dir'])
# execute another Python script
subprocess.call(['python', 'hello.py'])
# execute another program and capture output
out = subprocess.check_output(['python', 'hello.py'])
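Newer code often uses subprocess.run(), which returns a single object holding the exit code and any captured output. A minimal sketch, assuming Python 3.7 or later; the child command here is just a one-line Python program used for illustration.

```python
import subprocess
import sys

# sys.executable is the path of the Python interpreter running this script,
# which avoids guessing whether the command is 'python' or 'python3'
result = subprocess.run(
    [sys.executable, '-c', 'print("hello from a child process")'],
    capture_output=True,    # collect stdout and stderr instead of printing them
    text=True,              # decode the captured bytes to str
    check=True,             # raise CalledProcessError on a nonzero exit code
)

print(result.returncode)        # 0
print(result.stdout.strip())    # hello from a child process
```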
The textwrap module allows you to wrap text at a certain width.
import textwrap
text = "This is some really long text that we would like to wrap. Wouldn't you know it, there's a module for that! "
# returns a list of lines
# text is limited to 10 characters width
items = textwrap.wrap(text, 10)
# join lines together into multi-line string with new width
print('\n'.join(items))
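The wrap-then-join step can also be done in one call with textwrap.fill(). The sketch below uses a width of 20 chosen for illustration, and also shows textwrap.shorten(), which truncates text to fit a width.

```python
import textwrap

text = "This is some really long text that we would like to wrap."

# fill() is shorthand for '\n'.join(textwrap.wrap(...))
wrapped = textwrap.fill(text, width=20)
print(wrapped)

# shorten() truncates at a word boundary and appends the placeholder
short = textwrap.shorten(text, width=30, placeholder='...')
print(short)
```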
The pandas module enables table manipulations similar to those done in Excel or relational databases.
The central object offered by pandas is the DataFrame, a 2-dimensional tabular structure similar to an Excel spreadsheet (columns and rows, with column and row labels). This module must be installed separately.
pandas can read and write to and from a multitude of formats:
import pandas as pd
import sqlite3
# read from multiple formats to a DataFrame
df = pd.read_csv('dated_file.csv')
# df = pd.read_excel('dated_file.xls')
# df = pd.read_json('dated_file.json')
# write DataFrame to multiple formats
df.to_csv('new_file.csv')
# df.to_excel('new_file.xls')
# df.to_json('new_file.json')
# read from database through query
conn = sqlite3.connect('testdb.db')
df = pd.read_sql('SELECT * FROM test', conn)
pandas can perform 'where clause' style selections, sum or average columns, and perform GROUPBY database-style aggregations:
df = pd.read_csv('dated_file.csv')
# select rows through a filter
df2 = df[ df.iloc[:, 3] > 18 ] # all rows where the value in the 4th column (position 3) is > 18
# sum, average, etc. a column
df.tax.mean() # average values in 'tax' column
df.revenue.sum() # sum values in 'revenue' column
# create a new column
df['col99'] = df.col1 + df.revenue # new column sums 'col1' and 'revenue' field from each line
# groupby aggregation
dfgb = df.groupby('state').revenue.sum() # show sum of revenue for each state
pandas is tightly integrated with matplotlib, a full featured plotting library. The resulting images can be displayed in a Jupyter notebook, or saved as an image file.
# groupby bar chart
dfgb.plot.bar()
# weather temp line chart
weather_df.temp.plot.line()
Classes allow us to create a custom type of object -- that is, an object with its own behaviors and its own ways of storing data. Consider that each of the objects we've worked with previously has its own behavior, and stores data in its own way: dicts store pairs, sets store unique values, lists store sequential values, etc. An object's behaviors can be seen in its methods, as well as how it responds to operations like subscript, operators, etc. An object's data is simply the data contained in the object or that the object represents: a string's characters, a list's object sequence, etc.
First let's look at object types that demonstrate the convenience and range of behaviors of objects.
A date object can be set to any date and knows how to calculate dates into the future or past. To change the date, we use a timedelta object, which can be set to an "interval" of days to be added to or subtracted from a date object.
from datetime import date, timedelta
dt = date(1926, 12, 30) # create a new date object set to 12/30/1926
td = timedelta(days=3) # create a new timedelta object: 3 day interval
dt = dt + timedelta(days=3) # add the interval to the date object: produces a new date object
print(dt) # '1927-01-02' (3 days after the original date)
dt2 = date.today() # as of this writing: set to 2016-08-01
dt2 = dt2 + timedelta(days=1) # add 1 day to today's date
print(dt2) # '2016-08-02'
print(type(dt)) # <class 'datetime.date'>
print(type(td)) # <class 'datetime.timedelta'>
Now let's imagine a useful object -- this proposed class will allow you to interact with a server programmatically. Each server object represents a server that you can ping, restart, copy files to and from, etc.
import time
from sysadmin import Server
s1 = Server('blaikieserv')
if s1.ping():
    print('{} is alive '.format(s1.hostname))
s1.restart() # restarts the server
s1.copyfile_up('myfile.txt') # copies a file to the server
s1.copyfile_down('yourfile.txt') # copies a file from the server
print(s1.uptime()) # blaikieserv has been alive for 2 seconds
Method calls on the instance refer to functions defined in the class.
class Greeting:
    """ greets the user """
    def greet(self):
        print('hello, user!')
c = Greeting()
c.greet() # hello, user!
print(type(c)) # <class '__main__.Greeting'>
Each class object or instance is of a type named after the class. In this way, class and type are almost synonymous.
Data is stored in each instance through its attributes, which can be written and read just like dictionary keys and values.
class Something:
    """ just makes 'Something' objects """
obj1 = Something()
obj2 = Something()
obj1.var = 5 # set attribute 'var' to int 5
obj1.var2 = 'hello' # set attribute 'var2' to str 'hello'
obj2.var = 1000 # set attribute 'var' to int 1000
obj2.var2 = [1, 2, 3, 4] # set attribute 'var2' to list [1, 2, 3, 4]
print(obj1.var) # 5
print(obj1.var2) # hello
print(obj2.var) # 1000
print(obj2.var2) # [1, 2, 3, 4]
obj2.var2.append(5) # appending to the list stored to attribute var2
print(obj2.var2) # [1, 2, 3, 4, 5]
In fact the attribute dictionary is a real dict, stored within a "magic" attribute of the instance:
print(obj1.__dict__) # {'var': 5, 'var2': 'hello'}
print(obj2.__dict__) # {'var': 1000, 'var2': [1, 2, 3, 4, 5]}
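Because the attribute dictionary is a real dict, attributes can also be reached by name at runtime. A short sketch using the built-in functions vars() and getattr(); the attribute name 'missing' is made up to show the default value.

```python
class Something:
    """ just makes 'Something' objects """

obj = Something()
obj.var = 5

# vars(obj) returns the very same dict as obj.__dict__
same_dict = vars(obj) is obj.__dict__
print(same_dict)                              # True

# writing a key into the dict creates an attribute
obj.__dict__['var2'] = 'hello'
print(obj.var2)                               # hello

# getattr() reads an attribute by name, with an optional default
print(getattr(obj, 'var'))                    # 5
print(getattr(obj, 'missing', 'not set'))     # not set
```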
Data can also be stored in a class through class attributes or through variables defined in the class.
class MyClass:
    """ The MyClass class holds some data """
    var = 10 # set a variable in the class (a class variable)
MyClass.var2 = 'hello' # set an attribute directly in the class object
print(MyClass.var) # 10 (attribute was set as variable in class block)
print(MyClass.var2) # 'hello' (attribute was set as attribute in class object)
print(MyClass.__dict__) # a read-only mapping whose entries include:
# 'var': 10,
# '__module__': '__main__',
# '__doc__': ' The MyClass class holds some data ',
# 'var2': 'hello'
The additional __module__ and __doc__ attributes are added automatically (Python 3 adds a few others as well, such as __dict__ and __weakref__). __module__ names the module in which the class was defined (here, the script being run); __doc__ is a special string reserved for documentation on the class.
If an attribute can't be found in an object, it is searched for in the class.
class MyClass:
    classval = 10 # class attribute
a = MyClass()
b = MyClass()
b.classval = 99 # instance attribute of same name
print(a.classval) # 10 - still class attribute
print(b.classval) # 99 - instance attribute
del b.classval # delete instance attribute
print(b.classval) # 10 -- now back to class attribute
print(MyClass.classval) # 10 -- class attributes are accessible through Class as well
Object methods or instance methods allow us to work with the instance's data.
class Do:
    def printme(self):
        print(self) # <__main__.Do object at 0x1006de910>
x = Do()
print(x) # <__main__.Do object at 0x1006de910>
x.printme()
Note that x and self have the same hex code. This indicates that they are the very same object.
Since instance methods pass the instance, and we can store values in instance attributes, we can combine these to have a method modify an instance's values.
class Sum:
    def add(self, val):
        if not hasattr(self, 'x'):
            self.x = 0
        self.x = self.x + val
myobj = Sum()
myobj.add(5)
myobj.add(10)
print(myobj.x) # 15
Getter and setter methods are used to read and write instance attributes in a controlled way.
class Counter:
    def setval(self, val): # arguments are: the instance, and the value to be set
        if not isinstance(val, int):
            raise TypeError('arg must be an int')
        self.value = val # set the value in the instance's attribute
    def getval(self): # only one argument: the instance
        return self.value # return the instance attribute value
    def increment(self):
        self.value = self.value + 1
a = Counter()
b = Counter()
a.setval(10) # although we pass one argument, the implied first argument is a itself
a.increment()
a.increment()
print(a.getval()) # 12
b.setval('hello') # TypeError
The initializer of an instance allows us to set the initial attribute values of the instance.
class MyCounter:
    def __init__(self, initval): # self is implied 1st argument (the instance)
        try:
            initval = int(initval) # test initval to be an int,
        except ValueError: # set to 0 if incorrect
            initval = 0
        self.value = initval # initval was passed to the constructor
    def increment_val(self):
        self.value = self.value + 1
    def get_val(self):
        return self.value
a = MyCounter(0)
b = MyCounter(100)
a.increment_val()
a.increment_val()
a.increment_val()
b.increment_val()
b.increment_val()
print(a.get_val()) # 3
print(b.get_val()) # 102
When a class inherits from another class, attribute lookups can pass to the parent class when accessed from the child.
class Animal:
    def __init__(self, name):
        self.name = name
    def eat(self, food):
        print('{} eats {}'.format(self.name, food))
class Dog(Animal):
    def fetch(self, thing):
        print('{} goes after the {}!'.format(self.name, thing))
class Cat(Animal):
    def swatstring(self):
        print('{} shreds the string!'.format(self.name))
    def eat(self, food):
        if food in ['cat food', 'fish', 'chicken']:
            print('{} eats the {}'.format(self.name, food))
        else:
            print('{}: snif - snif - snif - nah...'.format(self.name))
d = Dog('Rover')
c = Cat('Atilla')
d.eat('wood') # Rover eats wood
c.eat('dog food') # Atilla: snif - snif - snif - nah...
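The order in which Python searches classes for an attribute can be inspected directly. In the sketch below, Puppy is a hypothetical subclass added for illustration, and the methods return strings rather than printing so the results are easy to check; super() is one way for a child method to call the parent's version.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def eat(self, food):
        return '{} eats {}'.format(self.name, food)


class Puppy(Animal):                          # hypothetical subclass
    def eat(self, food):
        # super() finds eat() in the parent class, so the child adds to it
        return super().eat(food) + ' (messily)'


p = Puppy('Rex')
print(p.eat('kibble'))                        # Rex eats kibble (messily)
print(isinstance(p, Animal))                  # True: a Puppy is also an Animal

# the lookup order: the class itself, then its parent, then object
lookup_order = [klass.__name__ for klass in Puppy.__mro__]
print(lookup_order)                           # ['Puppy', 'Animal', 'object']
```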
Same-named methods in two different classes can share a conceptual similarity.
class Animal:
    def __init__(self, name):
        self.name = name
    def eat(self, food):
        print('{} eats {}'.format(self.name, food))
class Dog(Animal):
    def fetch(self, thing):
        print('{} goes after the {}!'.format(self.name, thing))
    def speak(self):
        print('{}: Bark! Bark!'.format(self.name))
class Cat(Animal):
    def swatstring(self):
        print('{} shreds the string!'.format(self.name))
    def eat(self, food):
        if food in ['cat food', 'fish', 'chicken']:
            print('{} eats the {}'.format(self.name, food))
        else:
            print('{}: snif - snif - snif - nah...'.format(self.name))
    def speak(self):
        print('{}: Meow!'.format(self.name))
for a in (Dog('Rover'), Dog('Fido'), Cat('Fluffy'), Cat('Precious'), Dog('Rex'), Cat('Kittypie')):
    a.speak()
# Rover: Bark! Bark!
# Fido: Bark! Bark!
# Fluffy: Meow!
# Precious: Meow!
# Rex: Bark! Bark!
# Kittypie: Meow!
A class method can be called through the instance or the class, and passes the class as the first argument. We use these methods to do class-wide work, such as counting instances or maintaining a table of variables available to all instances. A static method can be called through the instance or the class, but knows nothing about either. In this way it is like a regular function -- it takes no implicit argument. We can think of these as 'helper' functions that just do some utility work and don't need to involve either class or instance.
class MyClass:
    def myfunc(self):
        print("myfunc: arg is {}".format(self))
    @classmethod
    def myclassfunc(klass): # we spell it differently because 'class' will confuse the interpreter
        print("myclassfunc: arg is {}".format(klass))
    @staticmethod
    def mystaticfunc():
        print("mystaticfunc: (no arg)")
a = MyClass()
a.myfunc() # myfunc: arg is <__main__.MyClass object at 0x6c210>
MyClass.myclassfunc() # myclassfunc: arg is <class '__main__.MyClass'>
a.myclassfunc() # [ same ]
a.mystaticfunc() # mystaticfunc: (no arg)
Here is an example from Learning Python, which counts instances that are constructed:
class Spam:
    numInstances = 0
    def __init__(self):
        Spam.numInstances += 1
    @staticmethod
    def printNumInstances():
        print("instances created:", Spam.numInstances)
s1 = Spam()
s2 = Spam()
s3 = Spam()
Spam.printNumInstances() # instances created: 3
s3.printNumInstances() # instances created: 3
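The same count can also be kept with a class method, which receives the class as its first argument instead of naming it explicitly. This is a sketch of one alternative, not the book's version; the class name Counted and the method name count are made up for illustration.

```python
class Counted:
    num_instances = 0                 # class attribute shared by all instances

    def __init__(self):
        Counted.num_instances += 1    # update the class-wide count

    @classmethod
    def count(cls):
        # cls is the class the method was called through
        return cls.num_instances


c1 = Counted()
c2 = Counted()

print(Counted.count())    # 2 (called through the class)
print(c1.count())         # 2 (called through an instance)
```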
Class methods are often used as class "Factories", producing customized objects based on preset values. Here's an example from the RealPython blog that uses a class method as a factory method to produce variations on a Pizza object:
class Pizza:
    def __init__(self, ingredients):
        self.ingredients = ingredients
    def __repr__(self):
        return f'Pizza({self.ingredients!r})'
    @classmethod
    def margherita(cls):
        return cls(['mozzarella', 'tomatoes'])
    @classmethod
    def prosciutto(cls):
        return cls(['mozzarella', 'tomatoes', 'ham'])
marg = Pizza.margherita()
print(marg.ingredients) # ['mozzarella', 'tomatoes']
schute = Pizza.prosciutto()
print(schute.ingredients) # ['mozzarella', 'tomatoes', 'ham']