Python 3


Python Data Model

Python's Data Model: Overview

The Data Model specifies how objects, attributes, methods, etc. function and interact in the processing of data.


The Python Language Reference provides a clear introduction to Python's lexical analyzer, data model, execution model, and various statement types. This session covers the basics of Python's data model. Mastering these concepts lets you create objects that behave like standard Python objects (i.e., that support the same interface: operators, looping, subscripting, etc.). It also helps you follow discussions on StackOverflow and other sites.





Special / "Private" / "Magic" Attributes

All objects contain "private" attributes. Some are methods that Python calls indirectly (for example, when an operator is used). Others hold internal "meta" information about the object.


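The __dict__ attribute shows the attributes stored directly in the object. (The listings in this section were produced with Python 2. In Python 3, list.__dict__.keys() returns a view object rather than a list, and the output includes some additional names. The slice methods __getslice__, __setslice__ and __delslice__ no longer exist.)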

>>> list.__dict__.keys()
['__getslice__', '__getattribute__', 'pop', 'remove', '__rmul__', '__lt__', '__sizeof__',
 '__init__', 'count', 'index', '__delslice__', '__new__', '__contains__', 'append',
 '__doc__', '__len__', '__mul__', 'sort', '__ne__', '__getitem__', 'insert',
 '__setitem__', '__add__', '__gt__', '__eq__', 'reverse', 'extend', '__delitem__',
 '__reversed__', '__imul__', '__setslice__', '__iter__', '__iadd__', '__le__', '__repr__',
 '__hash__', '__ge__']

The dir() function will show the object's available attributes, including those available through inheritance.

>>> dir(list)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delslice__',
 '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__',
 '__getslice__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__',
 '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__',
 '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__',
 '__setslice__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'count',
 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

In this case, dir(list) includes attributes not found in list.__dict__. What class(es) does list inherit from? We can use __bases__ to see:

>>> list.__bases__
(object,)

This is a tuple of the classes from which list inherits. In this case, it contains just the base class object.


>>> object.__dict__.keys()
['__setattr__', '__reduce_ex__', '__new__', '__reduce__', '__str__', '__format__',
 '__getattribute__', '__class__', '__delattr__', '__subclasshook__', '__repr__',
 '__hash__', '__sizeof__', '__doc__', '__init__']

This means that any class that inherits from object has the attributes above. In Python 3, every class, built-in or user-defined, inherits from object, either directly or indirectly. (Note that the term "private" in this context does not mean inaccessible data, as it does in C++ or Java.)
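The exact listings differ between Python versions. In Python 3, membership tests are a more reliable way to explore these attributes. Below is a minimal sketch; the class Point is a hypothetical example.

```python
# Attributes defined directly on list live in list.__dict__ (a read-only mapping)
print('append' in list.__dict__)       # True: append is defined by list itself
print('__class__' in list.__dict__)    # False: __class__ is inherited from object
print('__class__' in dir(list))        # True: dir() includes inherited attributes
print(list.__bases__)                  # (<class 'object'>,)

class Point:                           # hypothetical user-defined class
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
print(p.__dict__)                      # {'x': 1, 'y': 2}: attributes stored on the instance
print(Point.__bases__)                 # (<class 'object'>,): object is implicit in Python 3
```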





Object Inspection And Modification Built-in Functions

Object Inspection

isinstance()    checks whether an object is an instance of a class (or of a parent class)
issubclass()    checks whether a class is a subclass of another class
callable()      checks whether an object is callable
hasattr()       checks whether an object has an attribute of a given name


Object Attribute Modification

setattr()    sets an attribute on an object (using a string name)
getattr()    retrieves an attribute from an object (using a string name)
delattr()    deletes an attribute from an object (using a string name)
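The sketch below exercises each of these built-in functions once. The classes Animal and Dog are hypothetical examples.

```python
class Animal:                      # hypothetical base class
    sound = 'generic noise'

class Dog(Animal):                 # hypothetical subclass
    def speak(self):
        return 'woof'

d = Dog()

# inspection
print(isinstance(d, Animal))       # True: Dog is a subclass of Animal
print(issubclass(Dog, Animal))     # True
print(callable(d.speak))           # True: methods can be called
print(callable(d.sound))           # False: a string is not callable
print(hasattr(d, 'speak'))         # True

# modification, with the attribute name held in a string
name = 'color'
setattr(d, name, 'brown')          # same as d.color = 'brown'
print(getattr(d, name))            # brown
print(getattr(d, 'weight', None))  # None: the default is returned instead of raising AttributeError
delattr(d, name)                   # same as del d.color
print(hasattr(d, name))            # False
```

The string-name versions are most useful when the attribute name is only known at run time, for example when it is read from a file or user input.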





The inspect module

Provides convenient access to object attributes and internals.


inspect.getmembers() is similar to dir(), but instead of just a list of attribute names, it produces a list of 2-item tuples with the name as 1st item and the object as 2nd item:

import inspect

class Do():
    classvar = 5
    othervar = ['a', 'b', 'c']

lot = inspect.getmembers(Do)      # pass the class itself to .getmembers()

for items in lot:
    print(items)

The resulting list of tuples starts with attributes inherited from object, and ends with class variables defined in the class:

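('__class__', <class 'type'>)
('__delattr__', <slot wrapper '__delattr__' of 'object' objects>)
('__dir__', <method '__dir__' of 'object' objects>)

... continues ...

('__subclasshook__', <built-in method __subclasshook__ of type object at 0x...>)
('__weakref__', <attribute '__weakref__' of 'Do' objects>)
('classvar', 5)
('othervar', ['a', 'b', 'c'])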

inspect.signature() produces an object that shows the arguments to a function or method:

import inspect

sig = inspect.signature(round)

print(sig)             # '(number, ndigits=None)'

Printing the Signature object or converting it to string shows the arguments as they might appear in the function or method definition.
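A Signature object can also be examined parameter by parameter. Its bind() method checks a set of arguments against the signature without calling the function. Below is a minimal sketch; greet() is a hypothetical function.

```python
import inspect

def greet(name, greeting='hello', *args, punctuation='!', **kwargs):   # hypothetical function
    return '{}, {}{}'.format(greeting, name, punctuation)

sig = inspect.signature(greet)
print(sig)        # (name, greeting='hello', *args, punctuation='!', **kwargs)

# each Parameter has a name, a kind (positional, keyword-only, ...) and a default
for param in sig.parameters.values():
    print(param.name, param.kind.name, param.default)

bound = sig.bind('Joe', punctuation='?')   # raises TypeError if the arguments don't fit
print(dict(bound.arguments))               # {'name': 'Joe', 'punctuation': '?'}
```

Parameters without a default report inspect.Parameter.empty as their default.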


Many other inspecting functions are available, and the docs are pretty accessible:

https://docs.python.org/3/library/inspect.html




Special Attributes: "operator overloading"

Some special attributes are methods, usually called implicitly as the result of function calls, the use of operators, subscripting or slicing, etc.


Any operator, and many built-in functions, can be replaced by a direct call to the corresponding "magic" method, with the same result:

var = 'hello'
var2 = 'world'

print(var + var2)         # helloworld
print(var.__add__(var2))  # helloworld

print(len(var))           # 5
print(var.__len__())      # 5

if 'll' in var:
    print('yes')

if var.__contains__('ll'):
    print('yes')

Here is an example of a new class, Number, that reproduces some of the behavior of a number: you can add, subtract, multiply and divide it by other numbers.

class Number(object):
    def __init__(self, start):
        self.data = start
    def __sub__(self, other):
        return Number(self.data - other)
    def __add__(self, other):
        return Number(self.data + other)
    def __mul__(self, other):
        return Number(self.data * other)
    def __truediv__(self, other):        # handles / in Python 3 (Python 2 used __div__)
        return Number(self.data / other)
    def __repr__(self):                   # must return a string, not print one
        return 'Number value: ' + str(self.data)

X = Number(5)
X = X - 2
print(X)               # Number value: 3

Built-in objects implement their operators with these same methods, which appear in each object's dir() listing.
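A class like Number only handles the case where its object is on the left of the operator. For 2 + Number(5), int.__add__() returns NotImplemented because it doesn't know about Number. Python then tries the "reflected" method __radd__() on the right-hand operand. Below is a minimal sketch; Num is a hypothetical class kept separate from Number.

```python
class Num:                                # hypothetical class
    def __init__(self, start):
        self.data = start
    def __add__(self, other):             # Num + other
        return Num(self.data + other)
    def __radd__(self, other):            # other + Num, tried after other.__add__ returns NotImplemented
        return Num(other + self.data)
    def __repr__(self):
        return 'Num({})'.format(self.data)

print(Num(5) + 2)     # Num(7)
print(2 + Num(5))     # Num(7): int.__add__ gives up, so Python calls Num.__radd__
```

Reflected versions exist for the other arithmetic operators as well (__rsub__, __rmul__, __rtruediv__, etc.).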





Special Attributes: Reimplementing __repr__ and __str__

__str__ is invoked when we print an object or convert it with str(); __repr__ is used when __str__ is not available, or when we view an object at the Python interpreter prompt.


class Number(object):
    def __init__(self, start):
        self.data = start
    def __str__(self):
        return str(self.data)
    def __repr__(self):
        return 'Number(%s)' % self.data

X = Number(5)
print(X)          # 5 (uses __str__; without __str__ or __repr__, this would print
                  #    something like <__main__.Number object at 0x105d61190>)

__str__ is intended to display a human-readable version of the object; __repr__ is supposed to show a more "machine-faithful" representation.
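The difference shows up most clearly with containers. A list displays its items with __repr__(), even when the list itself is printed. The sketch below repeats the Number class from above so it runs on its own.

```python
class Number(object):
    def __init__(self, start):
        self.data = start
    def __str__(self):
        return str(self.data)
    def __repr__(self):
        return 'Number(%s)' % self.data

X = Number(5)
print(X)                        # 5          print() uses __str__
print(repr(X))                  # Number(5)  repr() uses __repr__
print([X, Number(6)])           # [Number(5), Number(6)]  list items are shown with __repr__
print('{} {!r}'.format(X, X))   # 5 Number(5)  the !r conversion asks for __repr__
```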





Special attributes available in class design

Here is a short listing of attributes available in many of our standard objects.


You can see many of these methods in an object's dir() listing. Rafe Kettler provides a more exhaustive list with explanations.

object construction and destruction:

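__init__    initializes a new object (commonly called the constructor)
__del__     called when the object is about to be destroyed (in CPython, when its reference count reaches 0; del x deletes the name x, which triggers __del__ only if that was the last reference)
__new__     creates and returns the new instance before __init__ is called (used when subclassing immutable types, and in metaclasses)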


object rendering:

__repr__    "under the hood" representation of the object (in the Python interpreter, or with repr())
__str__     string representation (i.e., when printed or converted with str())

object comparisons:

__lt__      <
__le__      <=
__eq__      ==
__ne__      !=
__gt__      >
__ge__      >=
__bool__    truth value (bool(), i.e., when used in a boolean test); called __nonzero__ in Python 2

calling object as a function:

__call__    when the object is "called" (i.e., with ())

container operations:

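__len__         handles the len() function
__getitem__     subscript access (i.e., mylist[0] or mydict['mykey']); in Python 3, also slice access (mylist[1:3])
__missing__     called by dict.__getitem__() in a dict subclass when a key is missing
__setitem__     handles dict[key] = value (and, in Python 3, slice assignment)
__delitem__     handles del dict[key] (and, in Python 3, slice deletion)
__iter__        handles looping
__reversed__    handles the reversed() function
__contains__    handles the 'in' operator
__getslice__    handles slice access (Python 2 only)
__setslice__    handles slice assignment (Python 2 only)
__delslice__    handles slice deletion (Python 2 only)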

attribute access (discussed in upcoming session):

__getattr__         object.attr read: called only when normal lookup fails (the attribute doesn't exist)
__getattribute__    object.attr read: called for every attribute access
__setattr__         object.attr write
__delattr__         object.attr deletion (i.e., del this.that)

'descriptor' class methods (discussed in upcoming session)

__get__       when an attribute with a descriptor is read
__set__       when an attribute with a descriptor is written
__delete__    when an attribute with a descriptor is deleted with del

numeric types:

__add__         addition with +
__sub__         subtraction with -
__mul__         multiplication with *
__truediv__     division with / (called __div__ in Python 2)
__floordiv__    "floor division", i.e. with //
__mod__         modulus with %
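Below is a small sketch of several container methods working together. The class Playlist is hypothetical. In Python 3, slicing arrives in __getitem__() as a slice object. bool() would fall back on __len__() even if __bool__() were not defined.

```python
class Playlist:                              # hypothetical container class
    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):                       # len(p)
        return len(self._songs)

    def __getitem__(self, index):            # p[0], and also p[1:3] in Python 3
        if isinstance(index, slice):
            return Playlist(self._songs[index])
        return self._songs[index]

    def __contains__(self, song):            # 'x' in p
        return song in self._songs

    def __bool__(self):                      # bool(p), if p:  (__nonzero__ in Python 2)
        return len(self._songs) > 0

    def __call__(self, index):               # p(0): the object is called like a function
        return 'now playing: ' + self._songs[index]

    def __repr__(self):
        return 'Playlist({!r})'.format(self._songs)

p = Playlist(['intro', 'verse', 'chorus'])
print(len(p))              # 3
print(p[0])                # intro
print(p[1:])               # Playlist(['verse', 'chorus'])
print('verse' in p)        # True
print(bool(Playlist([])))  # False
print(p(2))                # now playing: chorus
for song in p:             # no __iter__: Python calls __getitem__ with 0, 1, 2, ... until IndexError
    print(song)
```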





Iterator Protocol

The protocol specifies methods to be implemented to make our objects iterable.


"Iterable" simply means able to be looped over or otherwise treated as a sequence or collection. The for loop is the most obvious feature that iterates, but many other functions and features iterate as well. These include list comprehensions, max(), min(), sorted(), map() and filter(), because each of them must consider every item in the collection.


We can make our own objects iterable by implementing __iter__() and __next__(), and by raising the StopIteration exception when there are no more items:

class Counter:
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        return self

    def __next__(self):                   # called by next() and by for loops (Python 2: def next(self))
        if self.current > self.high:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1


for c in Counter(3, 8):
    print(c)
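Counter returns self from __iter__(), so a Counter object can be looped over only once. After the first loop, current is already past high. A common alternative is to write __iter__() as a generator function, so that each loop gets a fresh iterator. Below is a minimal sketch; the class name CounterRange is hypothetical.

```python
class CounterRange:                     # hypothetical class
    def __init__(self, low, high):
        self.low = low
        self.high = high

    def __iter__(self):                 # a generator: Python supplies __next__ and StopIteration
        current = self.low
        while current <= self.high:
            yield current
            current += 1

r = CounterRange(3, 5)
print(list(r))     # [3, 4, 5]
print(list(r))     # [3, 4, 5] again: each loop calls __iter__() and gets a new generator

it = iter(r)       # the explicit form of what a for loop does
print(next(it))    # 3
print(next(it))    # 4
```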




"Introspection" Special Attributes

The name, module, file, arguments, documentation, and other "meta" information for an object can be found in special attributes.


Below is a partial listing of special attributes; available attributes are discussed in more detail on the data model documentation page.

user-defined functions

__doc__         docstring
__name__        this function's name
__module__      module in which this function is defined
__defaults__    default argument values
__code__        the "compiled function body" (bytecode) of this function. Code objects can be inspected with the inspect module and "disassembled" with the dis module.
__globals__     global variables available from this function
__dict__        attributes set on this function object by the user


user-defined methods

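__func__      the function object underlying this method (im_func in Python 2; Python 2's im_class no longer exists)
__self__      instance object the method is bound to
__module__    name of the module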

modules

__dict__    globals in this module
__name__    name of this module
__doc__     docstring
__file__    file this module is defined in

classes

__name__      class name
__module__    module the class is defined in
__bases__     classes this class inherits from
__doc__       docstring

class instances (objects)

__class__    class of this instance
__dict__     attributes stored on this instance
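Below is a short sketch that reads several of these attributes in Python 3. The function area() and the class Circle are hypothetical examples.

```python
import math

def area(radius, scale=1.0):                # hypothetical function
    """Return the area of a circle."""
    return math.pi * radius ** 2 * scale

print(area.__name__)                        # area
print(area.__doc__)                         # Return the area of a circle.
print(area.__defaults__)                    # (1.0,)
print(area.__code__.co_varnames)            # ('radius', 'scale')
print(area.__module__)                      # '__main__' when run as a script

class Circle:                               # hypothetical class
    def describe(self):
        return 'a circle'

c = Circle()
m = c.describe                              # a bound method
print(m.__self__ is c)                      # True: the instance the method is bound to
print(m.__func__ is Circle.describe)        # True: the underlying function
print(c.__class__.__name__)                 # Circle
print(math.__name__)                        # math
```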





Variable Naming Conventions

Underscores are used to designate variables as "private" or "special".


lower-case separated by underscores     my_nice_var        "public", intended to be exposed to users of the module and/or class
underscore before the name              _my_private_var    "non-public", *not* intended for importers to access (additionally, "from modulename import *" doesn't import these names)
double underscore before the name       __dont_inherit     "private"; its name is "mangled", so outside the class it is available only as _classname__dont_inherit
double underscores before and after     __magic_me__       "magic" attribute or method, specific to Python's internal workings
single underscore after the name        class_             used to avoid clashing with Python keywords or built-in names (such as class or type)


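class GetSet(object):

    instance_count = 0                # class attribute shared by all instances

    __mangled_name = 'no privacy!'    # stored as _GetSet__mangled_name

    def __init__(self, value):
        self._attrval = value
        GetSet.instance_count += 1    # a bare 'instance_count += 1' would raise UnboundLocalError

    def getvar(self):
        print('getting the "var" attribute')
        return self._attrval

    def setvar(self, value):
        print('setting the "var" attribute')
        self._attrval = value

    var = property(getvar, setvar)    # cc.var calls getvar(); cc.var = x calls setvar(x)

cc = GetSet(5)
cc.var = 10                        # setting the "var" attribute
print(cc.var)                      # getting the "var" attribute, then 10
print(cc.instance_count)           # 1

print(cc._attrval)                 # "private", but available:  10
try:
    print(cc.__mangled_name)       # "private", apparently not available (AttributeError)...
except AttributeError as err:
    print('AttributeError:', err)
print(cc._GetSet__mangled_name)    # ...and yet, accessible through "mangled" name

cc.__newmagic__ = 10               # MAGICS ARE RESERVED BY PYTHON -- DON'T DO THIS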
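The same managed attribute can also be written with the @property decorator, which is the more common style in current Python code. Below is a sketch; GetSetDecorated is a hypothetical class name.

```python
class GetSetDecorated:                    # hypothetical class
    def __init__(self, value):
        self._attrval = value

    @property
    def var(self):                        # runs on obj.var
        print('getting the "var" attribute')
        return self._attrval

    @var.setter
    def var(self, value):                 # runs on obj.var = ...
        print('setting the "var" attribute')
        self._attrval = value

obj = GetSetDecorated(5)
obj.var = 10       # setting the "var" attribute
print(obj.var)     # getting the "var" attribute, then 10
```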




Subclassing Builtin Objects

Inheriting from a class (the base or parent class) makes all methods and attributes available to the inheriting class (the child class).


class NewList(list):     # an empty class - does nothing but inherit from list
    pass

x = NewList([1, 2, 3, 'a', 'b'])
x.append('HEEYY')

print(x[0])   # 1
print(x[-1])  # 'HEEYY'

Overriding Base Class Methods


This class automatically returns a default value if a key can't be found -- it traps and works around the KeyError that would normally result.

class DefaultDict(dict):

    def __init__(self, default=None):
        dict.__init__(self)
        self.default = default

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return self.default

    def get(self, key, userdefault=None):      # userdefault is optional, like dict.get()
        if userdefault is None:
            userdefault = self.default
        return dict.get(self, key, userdefault)

xx = DefaultDict()

xx['c'] = 5

print(xx['c'])          # 5
print(xx['a'])          # None

The other methods related to dict operations (__setitem__(), update(), keys(), etc.) are present in the dict class, so calls to them also work through inheritance.

WARNING! Avoiding method recursion

In DefaultDict above (as well as in MyList below), some statements call methods in the parent class directly. They do this to avoid infinite recursion. Suppose DefaultDict.__getitem__() looked up the key with self[key]. Python would respond by calling DefaultDict.__getitem__() again, which would do the same, and so on. We call this infinite recursion; Python eventually stops it with a RecursionError.


The same is true for MyList.__getitem__() and MyList.__setitem__() below.

    # from DefaultDict.get()
    return dict.get(self, key, userdefault)   # why not self.get(key, userdefault)?

    # from MyList.__getitem__()
    return list.__getitem__(self, index)      # why not self[index]?

    # from MyList.__setitem__()               (from example below)
    list.__setitem__(self, index, value)      # why not self[index] = value?

Another example: a custom list whose subscripts start at 1:

class MyList(list):                        # inherit from list
    def __getitem__(self, index):          # called when we access a value with a subscript (x[1], etc.)
        if index == 0:
            raise IndexError('MyList indexes start at 1')
        if index > 0:
            index = index - 1              # shift the 1-based index to list's 0-based index
        return list.__getitem__(self, index)

    def __setitem__(self, index, value):   # called when we assign with a subscript (x[1] = ...)
        if index == 0:
            raise IndexError('MyList indexes start at 1')
        if index > 0:
            index = index - 1
        list.__setitem__(self, index, value)

x = MyList(['a', 'b', 'c'])  # __init__() inherited from builtin list

print(x)                     # ['a', 'b', 'c']  (__repr__() inherited from builtin list)

x.append('spam')             # append() inherited from builtin list

print(x[1])                  # 'a' (MyList.__getitem__ customizes the list superclass method;
                             #      in a standard list, 'a' would be at index 0)

print(x[4])                  # 'spam' (index 3 in a standard list)

So MyList acts like a list in most respects, but its index starts at 1 instead of 0, at least where subscripting is concerned. Other list methods, such as index() and insert(), would have to be overridden to complete this 1-indexing behavior, and slicing would also need handling.
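The __missing__ hook from the container table offers another way to build a DefaultDict-style class. dict.__getitem__() calls __missing__() when a key is absent, so there is no need to override __getitem__() or to worry about recursion. Below is a sketch; DefaultDict2 is a hypothetical name.

```python
class DefaultDict2(dict):                 # hypothetical class
    def __init__(self, default=None):
        super().__init__()
        self.default = default

    def __missing__(self, key):           # called by dict.__getitem__ only when key is absent
        return self.default

xx = DefaultDict2(default=0)
xx['c'] = 5
print(xx['c'])         # 5
print(xx['a'])         # 0
print('a' in xx)       # False: __missing__ returned a value but stored nothing
print(xx.get('a'))     # None: dict.get() does not call __missing__
```

The standard library's collections.defaultdict uses this same hook, but it also stores the generated default in the dictionary.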





Reading a file with 'with'

A file is automatically closed upon exiting the 'with' block


A 'best practice' is to open files using a 'with' block. When execution leaves the block, the file is automatically closed.

with open('myfile.txt') as fh:
    for line in fh:
        print(line)

## at this point (outside the with block), filehandle fh has been closed.

The conventional approach:

fh = open('myfile.txt')
for line in fh:
    print(line)

fh.close()        # explicit close() of the file

Open files do not usually block other processes from opening the same file. However, each open file uses an operating-system resource called a file descriptor, and a process can hold only a limited number of them. If many files are left open, especially in long-running processes, the process can run out of descriptors. Data written to a file may also stay buffered until the file is closed. Therefore, files should be closed as soon as possible.
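The with block behaves roughly like the conventional approach wrapped in a try/finally statement. The finally clause closes the file even if the loop raises an exception. The sketch below creates a temporary file to stand in for myfile.txt, so that it runs on its own.

```python
import os
import tempfile

# create a small temporary file to read (stands in for myfile.txt)
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    out.write('line one\nline two\n')

fh = open(path)
try:
    for line in fh:
        print(line, end='')    # each line already ends in '\n'
finally:
    fh.close()                 # runs even if the loop raises an exception

print(fh.closed)               # True
os.remove(path)
```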





Implementing a 'with' context

Any class can support a 'with' context; what the object does when leaving the block is determined by its design.


A 'with' context is implemented using the magic methods __enter__() and __exit__().

class CustomWith:
    def __init__(self):
        """ when object is created """
        print('new object')

    def __enter__(self):
        """ when 'with' block begins (normally same time as __init__()) """
        print('entering "with"')
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        """ when 'with' block is left """
        print('leaving "with"')

        # if an exception occurred inside the with block:
        if exc_type:
            print('oops, an exception')
        return False       # False (or None) lets any exception propagate; True would suppress it

with CustomWith() as fh:
    print('ok')

print('done')

__enter__() is called automatically when Python enters the with block. This is usually also when the object is created with __init__(), although it is possible to create the object earlier and use it in a with statement later.

__exit__() is called automatically when Python exits the with block. If an exception occurs inside the with block, Python passes __exit__() three values: the exception type, the exception object (which carries any value passed to the exception, usually a string error message), and a traceback object (the information behind "Traceback (most recent call last):..."). If no exception occurred, all three are None. In our program above, if an exception occurred (if exc_type has a value), we print a message and return False. Python then re-raises the same exception after leaving the block. Your program can choose any action at that point; returning True suppresses the exception.
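The standard library's contextlib.contextmanager decorator offers a shorter way to write this kind of context manager, as a generator function. Below is a sketch; custom_with is a hypothetical name that mirrors CustomWith.

```python
from contextlib import contextmanager

@contextmanager
def custom_with():                      # hypothetical generator-based equivalent of CustomWith
    print('entering "with"')            # plays the role of __enter__()
    try:
        yield 'resource'                # this value is bound by "as"
    except Exception:
        print('oops, an exception')
        raise                           # re-raise; leaving this out would suppress the exception
    finally:
        print('leaving "with"')         # plays the role of __exit__(); always runs

with custom_with() as res:
    print('ok, got', res)

try:
    with custom_with():
        1 / 0
except ZeroDivisionError:
    print('exception propagated out of the with block')
```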





Internal Types

Some implicit objects can provide information on code execution.


Traceback objects


Traceback objects become available during an exception. Here's an example of inspection of the exception type using sys.exc_info()

import sys, traceback

def some_code_i_wrote():            # stand-in for real code; raises ZeroDivisionError
    return 1 / 0

try:
    some_code_i_wrote()
except BaseException as e:
    error_type, error_string, error_tb = sys.exc_info()
    if error_type is not SystemExit:
        print('error type:    {}'.format(error_type))
        print('error string:  {}'.format(error_string))
        print('traceback:     {}'.format(''.join(traceback.format_exception(error_type, e, error_tb))))
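In Python 3, the exception object carries its own traceback object in __traceback__, so sys.exc_info() is not strictly needed. Catching Exception rather than BaseException also skips SystemExit and KeyboardInterrupt, which replaces the manual SystemExit check. Below is a sketch with a hypothetical failing function. The captured values are copied to ordinary variables because Python 3 deletes e at the end of the except block.

```python
import traceback

def some_code_i_wrote():                 # hypothetical function that fails
    return 1 / 0

try:
    some_code_i_wrote()
except Exception as e:                   # Exception does not include SystemExit or KeyboardInterrupt
    err_name = type(e).__name__          # 'ZeroDivisionError'
    frames = traceback.extract_tb(e.__traceback__)   # traceback object -> list of frame summaries
    raised_in = frames[-1].name          # innermost function: 'some_code_i_wrote'
    print('error type:   ', err_name)
    print('error string: ', e)           # division by zero
    print('raised in:    ', raised_in)
    print(traceback.format_exc())        # the text Python prints for an uncaught exception
```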

Code objects

In CPython (the most common implementation of Python), a code object is a piece of compiled bytecode. You can query this object and examine its attributes to learn about bytecode execution. A detailed exploration of code objects can be found here.

Frame objects

A frame object represents an execution frame (a new frame is entered each time a function is called). Frames can be reached from traceback objects through each traceback's tb_frame attribute; tracebacks record the frames that were active when an exception was raised.

f_back        previous stack frame
f_code        code object executed in this frame
f_locals      local variable dictionary
f_globals     global variable dictionary
f_builtins    built-in variable dictionary


For example, this line placed within a function prints the function's name, which can be useful for debugging. Here we get the current frame, take the code object of that frame, and read its co_name attribute. (sys._getframe() is a CPython implementation detail and is not guaranteed to exist in other Python implementations.)

import sys

def myfunc():
    print('entering {}()'.format(sys._getframe().f_code.co_name ))

Calling this function, the frame object's function name is printed:

myfunc()         # entering myfunc()
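Following f_back from the current frame reaches the caller's frame, which lets a function report who called it. A code object can also be reached without calling the function, through __code__. Below is a sketch under the same CPython assumption; whoami(), who_called_me() and outer() are hypothetical functions.

```python
import sys

def whoami():
    frame = sys._getframe()              # CPython-specific: the current frame
    return frame.f_code.co_name

def who_called_me():
    caller = sys._getframe().f_back      # the frame of whatever called this function
    return caller.f_code.co_name

def outer():
    return who_called_me()

print(whoami())                          # whoami
print(outer())                           # outer

code = outer.__code__                    # code object, no call needed
print(code.co_name, code.co_argcount)    # outer 0
```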



