The Data Model specifies how objects, attributes, methods, etc. function and interact in the processing of data.
The Python Language Reference provides a clear introduction to Python's lexical analyzer, data model, execution model, and various statement types. This session covers the basics of Python's data model. Mastery of these concepts allows you to create objects that behave like standard Python objects (i.e., that use the same interface: operators, looping, subscripting, etc.), and helps you become conversant on StackOverflow and other discussion sites.
All objects contain "private" attributes that may be methods that are indirectly called, or internal "meta" information for the object.
The __dict__ attribute shows any attributes stored in the object.
>>> list.__dict__.keys() ['__getslice__', '__getattribute__', 'pop', 'remove', '__rmul__', '__lt__', '__sizeof__', '__init__', 'count', 'index', '__delslice__', '__new__', '__contains__', 'append', '__doc__', '__len__', '__mul__', 'sort', '__ne__', '__getitem__', 'insert', '__setitem__', '__add__', '__gt__', '__eq__', 'reverse', 'extend', '__delitem__', '__reversed__', '__imul__', '__setslice__', '__iter__', '__iadd__', '__le__', '__repr__', '__hash__', '__ge__']
The dir() function will show the object's available attributes, including those available through inheritance.
>>> dir(list) ['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delslice__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getslice__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__setslice__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
In this case, dir(list) includes attributes not found in list.__dict__. What class(es) does list inherit from? We can use __bases__ to see:
>>> list.__bases__ (object,)
This is a tuple of classes from which list inherits; in this case, just the base class object.
>>> object.__dict__.keys() ['__setattr__', '__reduce_ex__', '__new__', '__reduce__', '__str__', '__format__', '__getattribute__', '__class__', '__delattr__', '__subclasshook__', '__repr__', '__hash__', '__sizeof__', '__doc__', '__init__']
This means that any object that inherits from object will have the above attributes. In Python 3, all classes inherit from object. (Note that the term "private" in this context does not refer to inaccessible data, as it would in C++ or Java.)
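The relationship between dir(), __dict__ and the base classes can be checked directly. The following is a minimal sketch; the variable names inherited and from_object are made up for illustration, and the exact names printed will vary between Python versions.

```python
# Names that dir() reports for list but that are not stored on list itself
inherited = sorted(set(dir(list)) - set(list.__dict__))

# Each of those names should be found on a base class; for list that is object
from_object = [name for name in inherited if name in object.__dict__]

print(inherited)
print(from_object == inherited)   # every "extra" name comes from object
print(list.__mro__)               # the full lookup chain: list, then object
```

The __mro__ attribute (method resolution order) lists every class Python searches, in order, when looking up an attribute; __bases__ lists only the direct parents.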
isinstance() | Checks to see if this object is an instance of a class (or parent class) |
issubclass() | Checks to see if this class is a subclass of another |
callable() | Checks to see if this object is callable |
hasattr() | Checks to see if this object has an attribute of this name |
setattr() | sets an attribute in an object (using a string name) |
getattr() | retrieves an attribute from an object (using a string name) |
delattr() | deletes an attribute from an object (using a string name) |
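A short sketch of these built-in functions in use. The Account class and its attributes are hypothetical, invented only for this illustration.

```python
class Account:                        # hypothetical class for illustration
    def __init__(self, owner):
        self.owner = owner

    def greet(self):
        return 'hello, ' + self.owner

acct = Account('pat')

print(isinstance(acct, Account))      # True
print(isinstance(acct, object))       # True: every class inherits from object
print(issubclass(Account, object))    # True
print(callable(acct.greet))           # True: methods are callable
print(callable(acct.owner))           # False: a string is not callable

setattr(acct, 'balance', 100)         # same effect as acct.balance = 100
print(hasattr(acct, 'balance'))       # True
print(getattr(acct, 'balance'))       # 100
print(getattr(acct, 'missing', 'n/a'))  # n/a (third argument is a fallback)
delattr(acct, 'balance')              # same effect as del acct.balance
print(hasattr(acct, 'balance'))       # False
```

The string-based functions are most useful when the attribute name is not known until runtime, for example when it is read from a file or from user input.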
The inspect module provides convenient access to object attributes and internals.
inspect.getmembers() is similar to dir(), but instead of just a list of attribute names, it produces a list of 2-item tuples with the name as 1st item and the object as 2nd item:
import inspect

class Do():
    classvar = 5
    othervar = ['a', 'b', 'c']

lot = inspect.getmembers(Do)     # pass the class itself to .getmembers()
for items in lot:
    print(items)
The resulting list of tuples is sorted by name, so it starts with the double-underscore attributes (most of them inherited from object) and ends with the class variables defined in the class:
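('__class__', <class 'type'>)
('__delattr__', <slot wrapper '__delattr__' of 'object' objects>)
('__dir__', <method '__dir__' of 'object' objects>)
... continues ...
('__subclasshook__', <built-in method __subclasshook__ of type object at 0x...>)
('__weakref__', <attribute '__weakref__' of 'Do' objects>)
('classvar', 5)
('othervar', ['a', 'b', 'c'])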
inspect.signature() produces an object that shows the arguments to a function or method:
import inspect
sig = inspect.signature(round)
print(sig) # '(number, ndigits=None)'
Printing the Signature object or converting it to string shows the arguments as they might appear in the function or method definition.
Many other inspecting functions are available, and the docs are pretty accessible:
https://docs.python.org/3/library/inspect.html
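inspect.signature() works on user-defined functions as well, and the individual parameters can be examined one at a time. This is a small sketch; the connect() function is hypothetical and does nothing.

```python
import inspect

def connect(host, port=80, *, timeout=None):    # hypothetical function
    """Pretend to open a connection."""

sig = inspect.signature(connect)
print(sig)                                      # (host, port=80, *, timeout=None)

for name, param in sig.parameters.items():
    # param.default is inspect.Parameter.empty when no default was given
    has_default = param.default is not inspect.Parameter.empty
    print(name, param.kind, has_default)
```

sig.parameters is an ordered mapping of parameter names to Parameter objects, so it can also be used to look up a single argument by name, e.g. sig.parameters['port'].default.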
Some special attributes are methods, usually called implicitly as the result of function calls, the use of operators, subscripting or slicing, etc.
We can replace any operator and many functions with the corresponding "magic" methods to achieve the same result:
var = 'hello'
var2 = 'world'

print(var + var2)              # helloworld
print(var.__add__(var2))       # helloworld

print(len(var))                # 5
print(var.__len__())           # 5

if 'll' in var:
    print('yes')
if var.__contains__('ll'):
    print('yes')
Here is an example of a new class, Number, that reproduces the behavior of a number: its instances can be added to, subtracted from, multiplied by, and divided by other numbers.
class Number(object):
    def __init__(self, start):
        self.data = start
    def __sub__(self, other):
        return Number(self.data - other)
    def __add__(self, other):
        return Number(self.data + other)
    def __mul__(self, other):
        return Number(self.data * other)
    def __truediv__(self, other):      # Python 3 name for the / operator (__div__ in Python 2)
        return Number(self.data / float(other))
    def __repr__(self):
        # return the text rather than printing it; print() does the printing
        return 'Number value: {}'.format(self.data)

X = Number(5)
X = X - 2
print(X)                               # Number value: 3
Of course this means that existing built-in objects make use of these methods -- you can find them listed from the object's dir() listing.
__str__ is invoked when we print an object or convert it with str(); __repr__ is used when __str__ is not available, or when we view an object at the Python interpreter prompt.
class Number(object):
    def __init__(self, start):
        self.data = start
    def __str__(self):
        return str(self.data)
    def __repr__(self):
        return 'Number(%s)' % self.data

X = Number(5)
print(X)     # 5 (uses __str__; with neither __str__ nor __repr__ defined, this
             #    would print something like <__main__.Number object at 0x105d61190>)
__str__ is intended to display a human-readable version of the object; __repr__ is supposed to show a more "machine-faithful" representation.
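The difference between the two methods shows up most clearly when an object is placed inside a container. A minimal sketch, using a hypothetical Point class:

```python
class Point:                          # hypothetical class for illustration
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):                # human-readable form
        return '({}, {})'.format(self.x, self.y)

    def __repr__(self):               # "machine-faithful" form
        return 'Point({!r}, {!r})'.format(self.x, self.y)

p = Point(1, 2)
print(str(p))            # (1, 2)
print(repr(p))           # Point(1, 2)
print('{}'.format(p))    # (1, 2)         str.format() uses __str__ by default
print([p])               # [Point(1, 2)]  containers show the repr of their items
```

A common convention is to make __repr__ return text that looks like the expression used to create the object, as Point does here.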
Here is a short listing of attributes available in many of our standard objects.
You can see many of these methods in the attribute listing produced by dir(). There is also a more exhaustive list with explanations provided by Rafe Kettler.
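__init__ | object initializer (called after the new instance has been created) |
__del__ | finalizer, invoked when the object is about to be destroyed (in CPython, when its reference count goes to 0) |
__new__ | object constructor (creates and returns the new instance; rarely overridden) |
__repr__ | "under the hood" representation of object (in Python interpreter) |
__str__ | string representation (i.e., when printed or with str()) |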
__lt__ | < |
__le__ | <= |
__eq__ | == |
__ne__ | != |
__gt__ | > |
__ge__ | >= |
__bool__ | bool(), i.e. when used in a boolean test (named __nonzero__ in Python 2) |
__call__ | when object is "called" (i.e., with ()) |
__len__ | handles len() function |
__getitem__ | subscript access (i.e. mylist[0] or mydict['mykey']) |
__missing__ | handles missing keys |
__setitem__ | handles dict[key] = value |
__delitem__ | handles del dict[key] |
__iter__ | handles looping |
__reversed__ | handles reversed() function |
__contains__ | handles 'in' operator |
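__getslice__ | handles slice access (Python 2 only; in Python 3, __getitem__ receives a slice object) |
__setslice__ | handles slice assignment (Python 2 only; in Python 3, handled by __setitem__) |
__delslice__ | handles slice deletion (Python 2 only; in Python 3, handled by __delitem__) |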
__getattr__ | object.attr read: called only when normal lookup does not find the attribute |
__getattribute__ | object.attr read: called for every attribute access |
__setattr__ | object.attr write |
__delattr__ | object.attr deletion (i.e., del this.that) |
__get__ | when an attribute w/descriptor is read |
__set__ | when an attribute w/descriptor is written |
__delete__ | when an attribute w/descriptor is deleted with del |
__add__ | addition with + |
__sub__ | subtraction with - |
__mul__ | multiplication with * |
__truediv__ | division with / (named __div__ in Python 2) |
__floordiv__ | "floor division", i.e. with // |
__mod__ | modulus with % |
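Several of the methods listed above can be combined to make an object behave like a built-in container. The sketch below uses a hypothetical Deck class; it assumes Python 3, where slicing is routed through __getitem__ with a slice object.

```python
class Deck:                             # hypothetical class for illustration
    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):                  # len(deck)
        return len(self._cards)

    def __getitem__(self, index):       # deck[0] and deck[1:3]
        if isinstance(index, slice):
            return Deck(self._cards[index])
        return self._cards[index]

    def __contains__(self, card):       # 'K' in deck
        return card in self._cards

    def __eq__(self, other):            # deck == other
        return isinstance(other, Deck) and self._cards == other._cards

    def __add__(self, other):           # deck + other
        return Deck(self._cards + other._cards)

deck = Deck(['A', 'K', 'Q', 'J'])
print(len(deck))                        # 4
print(deck[0])                          # A
print(len(deck[1:3]))                   # 2
print('K' in deck)                      # True
print(bool(Deck([])))                   # False: without __bool__, bool() falls back to __len__
print(deck[:2] + deck[2:] == deck)      # True
```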
The iterator protocol specifies the methods to be implemented to make our objects iterable.
"Iterable" simply means able to be looped over or otherwise treated as a sequence or collection. The for loop is the most obvious feature that iterates, but a great number of functions and other features also perform iteration, including list comprehensions, max(), min(), sorted(), map(), filter(), etc., because each of these must consider every item in the collection.
We can make our own objects iterable by implementing __iter__ and __next__, and by raising the StopIteration exception when there are no more items.
class Counter:
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        return self

    def __next__(self):                  # named next() in Python 2
        if self.current > self.high:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1

for c in Counter(3, 8):
    print(c)
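What the for loop does behind the scenes can be reproduced by hand with the built-in iter() and next() functions. A minimal sketch, using a hypothetical Countdown class written in the same style as Counter:

```python
class Countdown:                        # hypothetical class for illustration
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

it = iter(Countdown(3))         # calls Countdown.__iter__()
print(next(it))                 # 3   (calls Countdown.__next__())
print(next(it))                 # 2
print(next(it))                 # 1
try:
    next(it)
except StopIteration:
    print('exhausted')          # a for loop would simply stop here

print(sorted(Countdown(3)))     # [1, 2, 3]
print(max(Countdown(5)))        # 5
```

Because the object returns itself from __iter__, it can be looped over only once; a fresh Countdown is needed for each pass.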
The name, module, file, arguments, documentation, and other "meta" information for an object can be found in special attributes.
Below is a partial listing of special attributes; available attributes are discussed in more detail on the data model documentation page.
__doc__ | doc string |
__name__ | this function's name |
__module__ | module in which this func is defined |
__defaults__ | default arguments |
__code__ | the compiled function body (bytecode) of this function. Code objects can be inspected with the inspect module and "disassembled" with the dis module. |
__globals__ | global variables available from this function |
__dict__ | attributes set in this function object by the user |
__self__.__class__ | class for this method (im_class in Python 2) |
__self__ | instance object |
__module__ | name of the module |
__dict__ | globals in this module |
__name__ | name of this module |
__doc__ | docstring |
__file__ | file this module is defined in |
__name__ | class name |
__module__ | module defined in |
__bases__ | classes this class inherits from |
__doc__ | docstring |
__self__.__class__ | class (im_class in Python 2) |
__self__ | this instance (im_self in Python 2) |
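A few of these attributes can be read directly from a function, a bound method and a class. This is a sketch under Python 3 naming; scale() and Ruler are hypothetical names used only for the example.

```python
def scale(value, factor=2):             # hypothetical function
    """Multiply value by factor."""
    return value * factor

print(scale.__name__)                   # scale
print(scale.__doc__)                    # Multiply value by factor.
print(scale.__defaults__)               # (2,)
print(scale.__code__.co_varnames)       # ('value', 'factor')

scale.units = 'cm'                      # user-set attributes land in __dict__
print(scale.__dict__)                   # {'units': 'cm'}

class Ruler:                            # hypothetical class
    def measure(self):
        return 'measuring'

r = Ruler()
bound = r.measure                       # a bound method object
print(bound.__self__ is r)              # True
print(bound.__func__ is Ruler.measure)  # True
print(Ruler.__bases__)                  # (<class 'object'>,)
```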
Underscores are used to designate variables as "private" or "special".
lower-case separated by underscores | my_nice_var | "public", intended to be exposed to users of the module and/or class |
underscore before the name | _my_private_var | "non-public", *not* intended for importers to access (additionally, "from modulename import *" doesn't import these names) |
double-underscore before the name | __dont_inherit | "private"; its name is "mangled", available only as _classname__dont_inherit |
double-underscores before and after the name | __magic_me__ | "magic" attribute or method, specific to Python's internal workings |
single underscore after the name | list_ | used to avoid overwriting built-in names (such as the list() function) |
class GetSet(object):
    instance_count = 0
    __mangled_name = 'no privacy!'

    def __init__(self, value):
        self._attrval = value
        GetSet.instance_count += 1       # a class variable must be qualified with the class name

    def getvar(self):
        print('getting the "var" attribute')
        return self._attrval

    def setvar(self, value):
        print('setting the "var" attribute')
        self._attrval = value

    var = property(getvar, setvar)       # assumed binding: routes cc.var through the methods above

cc = GetSet(5)
cc.var = 10                              # calls setvar()
print(cc.var)                            # calls getvar(): 10
print(cc.instance_count)                 # 1
print(cc._attrval)                       # "private", but available: 10
# print(cc.__mangled_name)               # "private", apparently not available: raises AttributeError
print(cc._GetSet__mangled_name)          # ...and yet, accessible through "mangled" name
cc.__newmagic__ = 10                     # MAGICS ARE RESERVED BY PYTHON -- DON'T DO THIS
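The practical purpose of name mangling is to keep an attribute in a base class from being accidentally overwritten by a subclass that happens to pick the same name. A minimal sketch with two hypothetical classes:

```python
class Base:                         # hypothetical classes for illustration
    def __init__(self):
        self.__token = 'base'       # stored as _Base__token

    def base_token(self):
        return self.__token         # reads _Base__token

class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = 'child'      # stored as _Child__token, so no clash

    def child_token(self):
        return self.__token         # reads _Child__token

c = Child()
print(c.base_token())               # base
print(c.child_token())              # child
print(sorted(vars(c)))              # ['_Base__token', '_Child__token']
```

Mangling happens at compile time and only for names written inside a class body, which is why both values survive side by side on the same instance.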
Inheriting from a class (the base or parent class) makes all methods and attributes available to the inheriting class (the child class).
class NewList(list):       # an empty class - does nothing but inherit from list
    pass

x = NewList([1, 2, 3, 'a', 'b'])
x.append('HEEYY')
print(x[0])                # 1
print(x[-1])               # 'HEEYY'
Overriding Base Class Methods
This class automatically returns a default value if a key can't be found -- it traps and works around the KeyError that would normally result.
class DefaultDict(dict):
    def __init__(self, default=None):
        dict.__init__(self)
        self.default = default

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return self.default

    def get(self, key, userdefault=None):
        if userdefault is None:          # caller did not supply a default
            userdefault = self.default
        return dict.get(self, key, userdefault)

xx = DefaultDict()
xx['c'] = 5
print(xx['c'])       # 5
print(xx['a'])       # None
Since the other dict methods (__setitem__, update(), keys(), etc.) are present in the dict class, any calls to them also work because of inheritance.
WARNING! Avoiding method recursion
Note that the overriding methods in DefaultDict above (as well as in MyList below) call the parent class's methods directly, e.g. dict.__getitem__(self, key), in order to avoid infinite recursion. If we were to call self.get() from inside DefaultDict.get(), or use self[key] inside DefaultDict.__getitem__(), Python would call the same overriding method again in response, and an endless chain of calls would result. We call this infinite recursion; Python stops it by raising RecursionError.
The same is true for MyList.__getitem__() and MyList.__setitem__() below.
# from DefaultDict.get()
return dict.get(self, key, userdefault)     # why not self.get(key, userdefault)?

# from MyList.__getitem__()
return list.__getitem__(self, index)        # why not self[index]?

# from MyList.__setitem__()  (from example below)
list.__setitem__(self, index, value)        # why not self[index] = value?
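The recursion problem can be demonstrated safely, because Python gives up with a RecursionError once the call stack gets too deep. In this sketch BadDict and GoodDict are hypothetical classes, and super() is used as an alternative spelling of the explicit dict.__getitem__(self, key) call.

```python
class BadDict(dict):                    # hypothetical class: do not copy this pattern
    def __getitem__(self, key):
        return self[key]                # calls BadDict.__getitem__ again, and again...

class GoodDict(dict):                   # hypothetical class
    def __getitem__(self, key):
        try:
            return super().__getitem__(key)    # same as dict.__getitem__(self, key)
        except KeyError:
            return None

recursed = False
try:
    BadDict(a=1)['a']
except RecursionError:
    recursed = True
    print('infinite recursion detected')

gd = GoodDict(a=1)
print(gd['a'])                          # 1
print(gd['zzz'])                        # None
```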
Another example -- a custom list that indexes items starting at 1:
class MyList(list):                            # inherit from list
    def __getitem__(self, index):
        if index > 0: index = index - 1
        return list.__getitem__(self, index)   # this method is called when we access
                                               # a value with subscript (x[1], etc.)
    def __setitem__(self, index, value):
        if index == 0: raise IndexError
        if index > 0: index = index - 1
        list.__setitem__(self, index, value)

x = MyList(['a', 'b', 'c'])     # __init__() inherited from builtin list
print(x)                        # __repr__() inherited from builtin list
x.append('spam')                # append() inherited from builtin list
print(x[1])                     # 'a' (MyList.__getitem__
                                #      customizes list superclass method)
                                #      index should be 0 but it is 1!
print(x[4])                     # 'spam' (index should be 3 but it is 4!)
So MyList acts like a list in most respects, but its index starts at 1 instead of 0 (at least where subscripting is concerned -- other list methods would have to be overridden to complete this 1-indexing behavior).
A file is automatically closed upon exiting the 'with' block
A 'best practice' is to open files using a 'with' block. When execution leaves the block, the file is automatically closed.
with open('myfile.txt') as fh:
    for line in fh:
        print(line)
## at this point (outside the with block), filehandle fh has been closed.
The conventional approach:
fh = open('myfile.txt')
for line in fh:
    print(line)
fh.close()                 # explicit close() of the file
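Although open files do not block other processes from opening the same file, each open file holds an operating-system resource called a file descriptor, and the number of descriptors a process may hold at one time is limited. If many files are left open (especially in long-running processes), these descriptors accumulate and the process can eventually fail to open new files. In addition, data written to a file may not be flushed to disk until the file is closed. Therefore, files should be closed as soon as possible.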
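The closing behavior can be confirmed through the file object's closed attribute. This sketch creates its own temporary file with the tempfile module so that it does not depend on myfile.txt existing; the file contents are made up.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    out.write('line one\nline two\n')

with open(path) as fh:
    lines = [line.rstrip('\n') for line in fh]
    print(fh.closed)        # False: still inside the with block

print(fh.closed)            # True: closed as soon as the block ended
print(lines)                # ['line one', 'line two']

os.remove(path)             # clean up the temporary file
```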
Any class can be written to support a 'with' context; what the object does when leaving the block is determined in its design.
A 'with' context is implemented using the magic methods __enter__() and __exit__().
class CustomWith:
    def __init__(self):
        """ when object is created """
        print('new object')

    def __enter__(self):
        """ when 'with' block begins (normally same time as __init__()) """
        print('entering "with"')
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        """ when 'with' block is left """
        print('leaving "with"')
        # if an exception should occur inside the with block:
        if exc_type:
            print('oops, an exception')
            raise exc_type(exc_value)     # raising same exception (optional)

with CustomWith() as fh:
    print('ok')
print('done')
__enter__() is called automatically when Python enters the with block. This is usually also when the object is created with __init__(), although it is possible to create the object beforehand and enter the with block later.
__exit__() is called automatically when Python exits the with block. If an exception occurs inside the with block, Python passes the exception class, the exception instance (which usually carries a string error message) and a traceback object (the source of the "Traceback (most recent call last):..." report); otherwise all three arguments are None.
In our above program, if an exception occurred (if exc_type has a value) we are choosing to re-raise the same exception. Your program can choose any action at that point. If __exit__() simply returns without raising (returning None or another false value), Python lets the original exception propagate; returning a true value suppresses it.
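The return value of __exit__() decides whether an exception raised inside the block continues to propagate. The sketch below uses a hypothetical Suppress class to show this; the standard library offers a ready-made equivalent in contextlib.suppress.

```python
class Suppress:                           # hypothetical class for illustration
    """Swallow one kind of exception raised inside the with block."""
    def __init__(self, exc_class):
        self.exc_class = exc_class
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        if exc_type is not None and issubclass(exc_type, self.exc_class):
            self.caught = exc_value
            return True       # a true value tells Python to suppress the exception
        return False          # anything else propagates normally

with Suppress(ZeroDivisionError) as guard:
    1 / 0
    print('never reached')    # the block is abandoned at the point of the error

print(type(guard.caught).__name__)        # ZeroDivisionError
print('program continues')
```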
Some implicit objects can provide information on code execution.
Traceback objects
Traceback objects become available during an exception. Here's an example of inspection of the exception type using sys.exc_info()
import sys, traceback

try:
    some_code_i_wrote()       # placeholder for the code being monitored
except BaseException as e:
    error_type, error_string, error_tb = sys.exc_info()
    if error_type is not SystemExit:
        print('error type: {}'.format(error_type))
        print('error string: {}'.format(error_string))
        print('traceback: {}'.format(''.join(traceback.format_exception(error_type, e, error_tb))))
Code objects
In CPython (the most common implementation), a code object is a piece of compiled bytecode. It is possible to query this object and examine its attributes in order to learn about bytecode execution. A detailed exploration of code objects can be found here.
Frame objects
A frame object represents an execution frame (a new frame is entered each time a function is called). Frame objects can be found in traceback objects (which trace frames during execution).
f_back | previous stack frame |
f_code | code object executed in this frame |
f_locals | local variable dictionary |
f_globals | global variable dictionary |
f_builtins | built-in variable dictionary |
For example, this line placed within a function prints the function name, which can be useful for debugging -- here we're pulling the current frame, grabbing the code object of that frame, and reading its co_name attribute to get the name.
import sys

def myfunc():
    print('entering {}()'.format(sys._getframe().f_code.co_name))
Calling this function, the frame object's function name is printed:
myfunc() # entering myfunc()
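The f_back attribute makes it possible to look one step up the call stack and find out who called the current function. A minimal sketch, assuming CPython (sys._getframe() is an implementation detail and may not exist elsewhere); whoami() and outer() are hypothetical names.

```python
import sys

def whoami():
    frame = sys._getframe()              # frame of whoami() itself
    caller = frame.f_back                # frame of whichever function called whoami()
    return frame.f_code.co_name, caller.f_code.co_name

def outer():
    return whoami()

print(outer())                           # ('whoami', 'outer')
```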