Python 3


Modules

Introduction: Modules

Modules are files that contain reusable Python code: we often refer to them as "libraries" because they contain code that can be used in other scripts. It is possible to import such library code directly into our programs through the import statement -- this simply means that the functions in the module are made available to our program. Modules consist principally of functions that do useful things and are grouped together by subject. Here are some examples:


So when we import a module in our program, we're simply making other Python code (in the form of functions) available to our own programs. In a sense we're creating an assemblage of Python code -- some written by us, some by other people -- and putting it together into a single program. The imported code doesn't literally become part of our script, but it is part of our program in the sense that our script can call it and use it. We can also define our own modules -- collections of Python functions and/or other variables that we would like to make available to our other Python programs. We can even prepare modules designed for others to use, if we feel they might be useful. In this way we can collaborate with other members of our team, or even the world, by using code written by others and by providing code for others to use.





Objectives for the Unit: Modules





Summary Statement: import modulename

Using import, we can import an entire Python module into our own code.


messages.py: a Python module that prints messages

import sys

def print_warning(msg):
    """write a message to STDOUT"""
    sys.stdout.write(f'warning:  {msg}\n')

def log_message(msg):
    """append a message to the log file"""
    try:
        # 'with' closes the file automatically, even if the write fails
        with open('log.txt', 'a') as fh:
            fh.write(str(msg) + '\n')
    except OSError:
        print_warning('log file not writable')

test.py: a Python script that imports messages.py

#!/usr/bin/env python

import messages

print("test program running...")

messages.log_message('this is an important message')
messages.print_warning("I think we're in trouble.")

Names defined at the top level of a module (its functions and global variables) become attributes of the module. After import messages, we reach them through the name of the module, as in messages.log_message.
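As a small illustration of module attributes, the sketch below uses the standard math module (chosen here only as a convenient stand-in; it is not part of messages.py) to show that both a variable and a function defined in a module are reached through the module's name.

```python
import math

# a variable defined at the top level of the module is an attribute
circle_area = math.pi * 2 ** 2

# a function defined in the module is an attribute as well
root = math.sqrt(16)

# hasattr() confirms that a name is available through the module
has_pi = hasattr(math, 'pi')

print(circle_area, root, has_pi)
```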





Summary statement: import modulename as convenientname

A module can be renamed at the point of import.


import pandas as pd
import datetime as dt

users = pd.read_table('myfile.data', sep=',', header=None)

print(f"yesterday's date:  {dt.date.today() - dt.timedelta(days=1)}")




Summary statement: from modulename import variablename

Individual variables can be imported by name from a module.


#!/usr/bin/env python

from messages import print_warning, log_message

print("test program running...")

log_message('this is an important message')
print_warning("I think we're in trouble.")
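The two import styles reach the same code. As a minimal sketch (using the standard math module as a stand-in for our own module), the name brought in by from ... import refers to the very same function object as the attribute reached through the module name; only the way we spell it in our script differs.

```python
import math
from math import sqrt

# both names refer to one and the same function object
same_object = sqrt is math.sqrt

# so calling either one gives the same result
print(sqrt(25), math.sqrt(25), same_object)
```

One design consideration: names imported with from ... import share the script's own namespace, so a later assignment to sqrt in our script would replace the imported function.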




Summary: module search path

Python must be told where to find our own custom modules.


Python's standard module directories

When it encounters an import, Python searches for the module in a selected list of standard module directories. It does not search through the entire filesystem for modules. Modules like os are located in one of these standard directories (a few, like sys, are built into the interpreter itself).

Our own custom module directories

Modules that we create should be placed in one or more directories that we designate for this purpose. In order to let Python know about our own module directories, we have a couple of options: setting the PYTHONPATH environment variable, or modifying sys.path from within a script.

PYTHONPATH environment variable

The standard approach to adding our own module directories to the list of those that Python searches is to create or modify the PYTHONPATH environment variable. This list of paths (separated by colons on Mac / Linux / Unix, or by semicolons on Windows) indicates any paths to search in addition to the ones Python normally searches.
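To see which directories Python will actually search on a given machine, we can print them. The following is a minimal sketch; the variable names are our own, and the output will differ from system to system.

```python
import os
import sys

# directories Python will search, in order, when it meets an import
for directory in sys.path:
    print(directory)

# PYTHONPATH may be unset, in which case we fall back to an empty string
raw_value = os.environ.get('PYTHONPATH', '')

# os.pathsep is ':' on Mac / Linux / Unix and ';' on Windows
extra_dirs = [p for p in raw_value.split(os.pathsep) if p]
print('PYTHONPATH directories:', extra_dirs)
```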





Setting the PYTHONPATH System Environment Variable

This is the standard way to extend the module search path.


The standard approach to allow Python to search for one of our modules in a directory of our choice is to set the PYTHONPATH environment variable. Then, anytime we run a Python script, this variable will be consulted and its directories added to the module search path. Here's how to set the PYTHONPATH environment variable:

Windows

  1. In the search box, type env
  2. choose Edit the system environment variables
  3. in the open dialog window, click Environment Variables
  4. In the top window (User variables) look for PYTHONPATH (you are not likely to see it).
  5. If you do not see PYTHONPATH:
    • Click New... and type PYTHONPATH in the Variable name: blank.
    • For Variable value: enter or browse to the directory where you would like to put your module files. This should be a full path starting with C:\.
    • When correctly entered, click OK.


  6. If you do see PYTHONPATH:
    • Select PYTHONPATH and click Edit....
    • After the existing path in Variable value: type a semicolon (;) and an additional path where you would like to put your module files. This should be a full path starting with C:\.
    • When correctly entered, click OK.

Mac / Linux / Unix

  1. Open a Finder window and make sure it is showing your home directory (this should be marked by a little house at the very top of the window). If it is not, find your home directory on the left nav bar.
  2. Use Cmd-Shift-. (period) to reveal hidden files in the folder window (hitting these keys again will hide them once more).
  3. Look for a file called .bash_profile (if your shell is zsh, the default on recent versions of macOS, use .zprofile instead).
  4. If you see an already-existing .bash_profile:
    • Open the file in a text editor.
    • Search for PYTHONPATH.
    • If you don't see PYTHONPATH, add a new line to the file: export PYTHONPATH=
    • If you see PYTHONPATH, add a colon (:) to the end of that line
    • At the end of the same line that starts with PYTHONPATH=, type or paste the path you wish to add. This should be a full path, starting with the root slash (/)
    • Make sure there are no spaces in any part of the line starting with PYTHONPATH=
  5. If you don't see .bash_profile:
    • Create a new text file in your editor and save it in your home directory with the name .bash_profile
    • Mac will warn you that you are creating a "system file" and that the file will be hidden. This is correct
    • Inside the file, type export PYTHONPATH= followed immediately by the path you wish to add. This should be a full path, starting with the root slash (/)
    • Make sure there are no spaces in any part of the line starting with PYTHONPATH=
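One way to confirm that a PYTHONPATH setting has the intended effect is to start a fresh Python process with the variable set and look at its search path. The sketch below does this from within Python, so it does not depend on any shell configuration; the temporary directory merely stands in for our own module directory, and all variable names are hypothetical.

```python
import json
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as module_dir:
    # copy the current environment and set PYTHONPATH for the child only
    child_env = dict(os.environ, PYTHONPATH=module_dir)

    # ask a fresh Python interpreter to report its module search path
    result = subprocess.run(
        [sys.executable, '-c', 'import json, sys; print(json.dumps(sys.path))'],
        env=child_env, capture_output=True, text=True, check=True,
    )
    child_paths = json.loads(result.stdout)

    # compare resolved paths so that symbolic links do not cause a mismatch
    target = os.path.realpath(module_dir)
    found = any(os.path.realpath(p) == target for p in child_paths)

print('directory found in child sys.path:', found)
```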





Sidebar: Manipulating sys.path

The module search path is a list that can be dynamically manipulated (although this is unusual).


The sys.path variable (in the sys module) holds the list of paths that Python will search when importing modules. In a pinch, you can add to this list to allow Python to search directories in addition to the ones it usually searches.


manipulating sys.path

import sys

print(sys.path)

    # ['', '/Users/david/Dropbox/tech/lib',
    #  '/Users/david/Dropbox/tech/apps',
    #  '/Users/david/Dropbox/tech/apps/ta/app',
    #  '/Users/david/Dropbox/tech/apps/ta/ta/ta',
    #  '/Users/david/lib',
    #  '/Users/david/Dropbox/tech/apps/ta/app/lib',
    #  '/Users/david/robotrade/git/fintech/lib',
    #  '/Library/Frameworks/Python.framework/Versions/3.12/lib/python312.zip',
    #  '/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12',
    #  '/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload',
    #  '/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages']

sys.path.append('/path/to/my/pylib')

import mymod     # if mymod.py is in /path/to/my/pylib, it will be found

Once a Python script is running, Python makes the full module search path (the standard directories plus any PYTHONPATH directories) available in a list called sys.path. Since it is a list, it can be manipulated; you are free to add whatever paths you wish. However, please note that this kind of manipulation is rare because the needed changes are customarily made to the PYTHONPATH environment variable.
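The example above cannot be run as written, because /path/to/my/pylib and mymod are placeholders. Here is a self-contained sketch of the same idea: it creates a throwaway directory with a tiny module inside (demo_pathmod is a hypothetical name), appends the directory to sys.path, and imports the module.

```python
import importlib
import os
import sys
import tempfile

# create a throwaway directory holding a tiny module (hypothetical names)
lib_dir = tempfile.mkdtemp()
with open(os.path.join(lib_dir, 'demo_pathmod.py'), 'w') as fh:
    fh.write('GREETING = "hello from demo_pathmod"\n')

# make the directory searchable for the rest of this run only
sys.path.append(lib_dir)
importlib.invalidate_caches()   # advisable when modules are created at run time

import demo_pathmod

print(demo_pathmod.GREETING)
print(demo_pathmod.__file__)
```

The change lasts only as long as the running program; the next Python process starts again with the usual search path.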





Proper Code Organization

Core principles


Here are the main components of a properly formatted program:


""" tip_calculator.py -- calculate tip for a restaurant bill
    Author:  David Blaikie dbb212@nyu.edu
    Last modified:  9/19/2017
"""

import os              # part of Python distribution (installed with Python)
import sys
import pandas as pd    # installed "3rd party" modules
import myownmod as mm  # "local" module (part of local codebase)


# constant message strings are not required to be placed
# here, but in professional programs they are kept
# separate from the logic, often in separate "config" files
MSG1 = 'A {}% tip (${}) was added to the bill, for a total of ${}.'
MSG2 = 'With {} in your party, each person must pay ${}.'


# sys.argv[0] is the program's pathname (e.g. /this/that/other.py)
# os.path.basename() returns just the program name (e.g. other.py)
USAGE_STRING = f"Usage:  {os.path.basename(sys.argv[0])}   [total amount] [# in party] [tip percentage]"


def usage(msg):
    """ print an error message and the usage string, then exit

    Args:     msg (str):  an error message
    Returns:  None (exits from here)
    Raises:   N/A (does not explicitly raise an exception)

    """
    sys.stderr.write(f'Error:  {msg}\n')
    sys.exit(USAGE_STRING)


def validate_normalize_input(args):
    """ verify command-line input

    Args:     args (list):  the command-line arguments (program name excluded)

    Returns:
        bill_amt (float):  the bill amount
        party_size (int):  the number of people
        tip_pct (float):   the percent tip to be applied, in 100’s

    Raises:  N/A (does not explicitly raise an exception)

    """
    if len(args) != 3:
        usage('please enter all required arguments')

    try:
        bill_amt = float(args[0])
        party_size = int(args[1])
        tip_pct = float(args[2])
    except ValueError:
        usage('arguments must be numbers')

    return bill_amt, party_size, tip_pct


def perform_calculations(bill_amt, party_size, tip_pct):
    """
    calculate tip amount, total bill and person's share

    Args:
        bill_amt (float):  the total bill
        party_size (int):  the number in party
        tip_pct (float):  the tip percentage in 100’s

    Returns:
        tip_amt (float):  the tip in $
        total_bill (float):  the bill including tip
        person_share (float):  equal share of bill per person

    Raises:
        N/A (does not specifically raise an exception)
    """

    tip_amt = bill_amt * tip_pct * .01
    total_bill = bill_amt + tip_amt
    person_share = total_bill / party_size

    return tip_amt, total_bill, person_share


def report_results(pct, tip_amt, total_bill, size, person_share):
    """ print results in formatted strings

    Args:
        pct (float):  the tip percentage in 100’s
        tip_amt (float):  the tip in $
        total_bill (float):  the bill including tip
        size (int):  the party size
        person_share (float):  equal share of bill per person
    Returns:
        None (prints result)

    Raises:
        N/A
    """

    print(MSG1.format(pct, tip_amt, total_bill))
    print(MSG2.format(size, person_share))


def main(args):
    """ execute script

    Args:     args (list):  the command-line arguments
    Returns:  None
    Raises:   N/A

    """

    bill, size, pct = validate_normalize_input(args)
    tip_amt, total_bill, person_share = perform_calculations(bill, size,
                                                             pct)

    report_results(pct, tip_amt, total_bill, size, person_share)


if __name__ == '__main__':            # 'main body' code

    main(sys.argv[1:])

The code inside the if __name__ == '__main__' block is intended to be the call that starts the program. If this Python script is imported, the main() function will not be called, because the if test will only be true if the script is executed, and will not be true if it is imported. We do this in order to allow the script's functions to be imported and used without actually running the script -- we may want to test the script's functions (unit testing) or make use of a function from the script in another program. Whether we intend to import a script or not, it is considered a "best practice" to build all of our programs in this way -- with a "main body" of statements collected under function main(), and the call to main() inside the if __name__ == '__main__' gate. This structure will be required for all assignments submitted for the remainder of the course.
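To watch the gate behave differently in the two situations, the sketch below writes a tiny script to a temporary directory (gate_demo.py is a hypothetical name), then both imports it and executes it. It uses the standard runpy module to execute the file as if it had been run directly; this is only one way to demonstrate the effect.

```python
import importlib
import os
import runpy
import sys
import tempfile

# a tiny script that records whether its main() was called
source = (
    "def main():\n"
    "    return 'main() was called'\n"
    "\n"
    "RESULT = None\n"
    "if __name__ == '__main__':\n"
    "    RESULT = main()\n"
)
script_dir = tempfile.mkdtemp()
script_path = os.path.join(script_dir, 'gate_demo.py')
with open(script_path, 'w') as fh:
    fh.write(source)

# case 1: import the file; __name__ is 'gate_demo', so main() is skipped
sys.path.append(script_dir)
importlib.invalidate_caches()
import gate_demo
imported_result = gate_demo.RESULT

# case 2: execute the file as a script; __name__ is '__main__'
executed_globals = runpy.run_path(script_path, run_name='__main__')
executed_result = executed_globals['RESULT']

print('imported:', imported_result)
print('executed:', executed_result)
```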





Summary: writing informal testing code for library modules

If our code is meant only to have its functions imported, we can include testing code that runs if the module is executed directly.


""" calcutils.py:  calculation utility functions """

def doubleit(val):
    if not isinstance(val, (int, float)):
        raise TypeError('must be int or float')

    dval = val * 2
    return val


def halveit(val):
    if not isinstance(val, (int, float)):
        raise TypeError('must be int or float')

    hval = val / 2
    return hval


if __name__ == '__main__':

    assert doubleit(5) == 10, f'doubleit(5) == {doubleit(5)} (should be 10)'

    assert halveit(5) == 2.5, f'test failed:  halveit(5) == {halveit(5)} (should be 2.5)'

In cases where our module is not meant to be executed, we might not need an if __name__ == '__main__' block, which is used to run code only if the module is being executed directly. A module like the above would usually only be imported, so this if test would usually evaluate to False. However, we can make use of the if test by placing under it code that tests our module's functions to see that they work properly. In the above we have written two tests to make sure that doubleit() and halveit() are working as expected.

The assert statement features a test expression and an error message. If the expression evaluates to True, nothing happens; if it does not, Python raises an AssertionError carrying the message, alerting the user that the test failed. (In fact, if you run this code you'll see that one of the functions is flawed, and running the test alerts the user to the issue.)

Please note that the above is only an informal way to run tests. When formal code testing is needed for a project, the tests are customarily saved in a different file, and modules like unittest or pytest are used to help facilitate the testing process.
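For comparison, here is a rough sketch of what a more formal test might look like with the standard unittest module. To keep the sketch runnable on its own, it defines a local copy of halveit() rather than importing calcutils; in a real project the test file would import the function instead. The class and method names are our own.

```python
import unittest


def halveit(val):
    """local copy of halveit() so that this sketch runs on its own"""
    if not isinstance(val, (int, float)):
        raise TypeError('must be int or float')
    return val / 2


class TestHalveit(unittest.TestCase):

    def test_halves_a_number(self):
        self.assertEqual(halveit(5), 2.5)

    def test_rejects_non_numbers(self):
        with self.assertRaises(TypeError):
            halveit('5')


# build and run the suite explicitly so the example does not exit the program
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestHalveit)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print('all tests passed:', result.wasSuccessful())
```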





Summary: raising exceptions

Causing an exception to be raised is the principal way a module signals an error to the importing script.


A file called mylib.py

def get_yearsum(user_year):

    user_year = int(user_year)
    if user_year < 1929 or user_year > 2013:
        raise ValueError(f'year {user_year} out of range')

    # calculate value for the year

    return 5.9         # returning a sample value (for testing purposes only)

An exception raised by us is indistinguishable from one raised by Python, and we can raise any exception type we wish. This allows the user of our function to handle the error if needed (rather than have the script fail):


import mylib

while True:

    year = input('please enter a year:  ')

    try:
        mysum = mylib.get_yearsum(year)
        break
    except ValueError:
        print('invalid year:  try again')

print('mysum is', mysum)
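Since we can raise any exception type we wish, we can also define our own. The sketch below is one possible variation on get_yearsum(): YearRangeError and check_year() are hypothetical names, not part of mylib.py. Because the new class inherits from ValueError, existing code that catches ValueError keeps working, while newer code can tell the two problems apart.

```python
class YearRangeError(ValueError):
    """hypothetical exception for a year outside the supported range"""


def check_year(user_year, first=1929, last=2013):
    """return the year as an int, or raise if it cannot be used"""
    year = int(user_year)          # raises ValueError for non-numeric text
    if year < first or year > last:
        raise YearRangeError(f'year {year} out of range')
    return year


# a caller can handle the specific error, the general one, or both
outcomes = []
for text in ['1950', '1800', 'abc']:
    try:
        outcomes.append(check_year(text))
    except YearRangeError as exc:
        outcomes.append(f'range error: {exc}')
    except ValueError:
        outcomes.append('not a number')

print(outcomes)
```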




Summary: installing modules

Third-party modules must be downloaded and installed into our Python distribution.


Unix

$ pip install pandas             # installs pandas from the PyPI repository
$ sudo pip install pandas        # installs pandas system-wide (needs root permissions)

Installing into a system-wide Python on Unix requires something called root permissions, which are permissions that the Unix system administrator uses to make changes to the system. The second command above includes sudo, which is a way to temporarily be granted root permissions. Note that the pip search command no longer works against PyPI; to search for packages, use the PyPI website instead.


Windows

C:\Windows> pip install pandas   # installs pandas from the PyPI repo

PyPI: the Python Package Index

The Python Package Index at https://pypi.python.org/pypi is a repository of software for the Python programming language. There are more than 70,000 projects uploaded there, from serious modules used by millions of developers to half-baked ideas that someone decided to share prematurely. Usually, we encounter modules in the field -- shared through blog posts and articles, word of mouth and even other Python code. But PyPI can be used directly to try to find modules that support a particular purpose.
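Before installing, it can be handy to check from within Python whether a module is already available. The following is a minimal sketch using the standard importlib tools; the helper names is_importable() and installed_version() are our own, and pandas is used only as the example package from above.

```python
import importlib.util
from importlib import metadata


def is_importable(module_name):
    """return True if Python can locate the module on its search path"""
    return importlib.util.find_spec(module_name) is not None


def installed_version(dist_name):
    """return the installed version string, or None if not installed"""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print('pandas importable:', is_importable('pandas'))
print('pandas version:', installed_version('pandas'))
```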





Summary: the Python standard distribution of modules

Modules included with Python are installed when Python is installed -- they are always available.


Python provides hundreds of supplementary modules to perform myriad tasks. The modules do not need to be installed because they come bundled in the Python distribution; that is, they are installed at the time that Python itself is installed. The documentation for the standard library is part of the official Python docs.
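If we are unsure whether a module belongs to the standard distribution, recent versions of Python can tell us. The sketch below relies on sys.stdlib_module_names, which is available in Python 3.10 and later; on older versions the sketch falls back to an empty set, so every answer would be False.

```python
import sys

# sys.stdlib_module_names exists in Python 3.10 and later; on older
# versions this falls back to an empty set
stdlib_names = getattr(sys, 'stdlib_module_names', frozenset())

# report which of these names belong to the standard distribution
for name in ['json', 'csv', 'datetime', 'pandas']:
    print(name, name in stdlib_names)
```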


Please see the Useful Modules slide deck for a selection of modules from the Standard Distribution as well as from external sources.




