Runtime efficiency refers to two things: memory efficiency (how much RAM a process uses) and time efficiency (how long execution takes). The two are often related -- it takes time to allocate memory. As a "scripting" language, Python is more convenient, but less efficient, than "programming" languages like C and Java:

* Parsing, compilation and execution take place during runtime (C and Java are compiled ahead of time).
* Memory is allocated based on anticipation of what your code will do at runtime (C in particular requires the developer to declare what memory will be needed).
* Python handles expanded memory requests seamlessly -- "no visible limits" (C and Java work with "finite" resources that do not expand indefinitely).

Achieving runtime efficiency requires a tradeoff with development time: either we spend more of our own (developer) time making our programs faster and leaner, or we spend less time developing and allow them to run slower (letting Python handle memory allocation for us). Of course, the very choice of a convenient scripting language (like Python) over a more efficient compiled language (like Java or C++) already favors rapid development and ease of use over runtime efficiency. In many applications efficiency is not a consideration, because there is plenty of memory and enough time to get the job done.

Nevertheless, advanced Python developers may be asked to make a program faster or use less memory -- perhaps because the data has grown past anticipated limits, the program's responsibilities and complexity have been extended, or an unknown inefficiency is bogging down execution. In this section we'll discuss the more efficient container structures and ways to analyze the speed of the various units in our programs.
collections: high-performance container datatypes
* array: a type-specific list
* deque: a "double-ended queue"
* Counter: a counting dictionary
* defaultdict: a dict with an automatic default for missing keys

timeit: a unit timer for comparing the time efficiency of various Python algorithms
cProfile: an overall time profile of a Python program
The timeit module provides a simple way to time blocks of Python code.
We use timeit to help decide whether varying ways of accomplishing a task might make our programs more efficient. Here we compare execution time of four approaches to joining a range of integers into a very large string ("1-2-3-4-5...", etc.)
from timeit import timeit
# 'straight concatenation' approach
def joinem():
    x = '1'
    for num in range(2, 101):
        x = x + '-' + str(num)
    return x
print(timeit('joinem()', setup='from __main__ import joinem', number=10000))
# 0.457356929779 # setup= is discussed below
# generator comprehension
print(timeit('"-".join(str(n) for n in range(100))', number=10000))
# 0.338698863983
# list comprehension
print(timeit('"-".join([str(n) for n in range(100)])', number=10000))
# 0.323472976685
# map() function
print(timeit('"-".join(map(str, range(100)))', number=10000))
# 0.160399913788
Here map() appears to be fastest, probably because built-in functions are implemented in compiled C.

Repeating a test

You can conveniently repeat a test multiple times with the repeat() function (or the repeat() method of a Timer object). Repetitions give you a much better idea of the time a function might take, since any single run can be skewed by other activity on the machine.
from timeit import repeat
print(repeat('"-".join(map(str, range(100)))', number=10000, repeat=3))
# [0.15206599235534668, 0.1909959316253662, 0.2175769805908203]
print(repeat('"-".join([str(n) for n in range(100)])', number=10000, repeat=3))
# [0.35890698432922363, 0.327725887298584, 0.3285980224609375]
print(repeat('"-".join(map(str, range(100)))', number=10000, repeat=3))
# [0.14228010177612305, 0.14016509056091309, 0.14458298683166504]
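Since results fluctuate with system load, a common convention (suggested in the timeit documentation) is to report the minimum of the repeats rather than the average -- the fastest run is the least disturbed by other processes. A minimal sketch:

```python
from timeit import repeat

# Take the minimum of several repeats as the representative timing;
# slower runs reflect interference, not the code being measured.
times = repeat('"-".join(map(str, range(100)))', number=10000, repeat=3)
print(min(times))
```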
The setup= parameter performs setup before a test

Some tests make use of a variable that must be initialized before the test runs:
print(timeit('x.append(5)', setup='x = []', number=10000))
# 0.00238704681396
Additionally, timeit() does not share the program's global namespace, so any imports -- and even global variables -- required by the test must be provided in setup=:
print(timeit('x.append(5)', setup='import collections as cs; x = cs.deque()', number=10000))
# 0.00115013122559
Here we're testing a function which, as a global, must be imported from the __main__ namespace:
def testme(maxlim):
    return [x * 2 for x in range(maxlim)]
print(timeit('testme(5000)', setup='from __main__ import testme', number=10000))
# 10.2637062073
Keep in mind that a function tested in isolation may not perform the same as it would with a different dataset, or when run as part of a larger program (which may have allocated memory differently by the point of the function's execution). The cProfile module can profile overall program execution.
The array is a type-specific list.
The array container provides a list of a uniform type. An array's type must be specified at initialization. A uniform type makes an array more efficient than a list, which can contain any type.
from array import array
myarray = array('i', [1, 2])
myarray.append(3)
print(myarray) # array('i', [1, 2, 3])
print(myarray[-1]) # acts like a list
for val in myarray:
    print(val)
myarray.append(1.3) # TypeError: an 'i' array holds integers only
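The efficiency gain is easy to observe: an array stores raw C values, while a list stores pointers to full Python objects. A minimal sketch comparing container sizes with sys.getsizeof (which, note, does not even count the list's int objects themselves, so the real gap is larger):

```python
import sys
from array import array

nums = list(range(1000))
arr = array('i', nums)   # raw 4-byte C ints

# The array's container is far smaller than the list's,
# which holds one pointer per element.
print(sys.getsizeof(arr))   # smaller
print(sys.getsizeof(nums))  # larger
```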
Available array types (Python 3; the Python 2-only 'c' code is gone, and 'u' is deprecated):

Type code | C Type | Python Type | Minimum size in bytes |
---|---|---|---|
'b' | signed char | int | 1 |
'B' | unsigned char | int | 1 |
'u' | wchar_t | Unicode character | 2 |
'h' | signed short | int | 2 |
'H' | unsigned short | int | 2 |
'i' | signed int | int | 2 |
'I' | unsigned int | int | 2 |
'l' | signed long | int | 4 |
'L' | unsigned long | int | 4 |
'q' | signed long long | int | 8 |
'Q' | unsigned long long | int | 8 |
'f' | float | float | 4 |
'd' | double | float | 8 |
A "double-ended queue" provides fast adds/removals.
The collections module provides a variety of specialized container types. These containers behave in a manner similar to the built-in ones with which we are familiar, but with additional functionality aimed at convenience and efficiency.

Lists are optimized for fixed-length operations -- sorting, membership checks, index access, etc. -- and for appends at the end, but inserts or removals at the front require shifting every element. A deque is designed specifically for fast adds and removals at either end of the sequence:
from collections import deque
x = deque([1, 2, 3])
x.append(4) # x now [1, 2, 3, 4]
x.appendleft(0) # x now [0, 1, 2, 3, 4]
popped = x.pop() # removes '4' from the end
popped2 = x.popleft() # removes '1' from the start
A deque can also be sized, in which case appends will push existing elements off of the ends:
x = deque(['a', 'b', 'c'], 3) # maximum size: 3
x.append(99) # now: deque(['b', 'c', 99]) ('a' was pushed off of the start)
x.appendleft(0) # now: deque([0, 'b', 'c']) (99 was pushed off of the end)
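This bounded behavior makes a sized deque a natural "last N items" buffer, a pattern the collections docs note (e.g., keeping a tail of recent log lines). A minimal sketch:

```python
from collections import deque

# A bounded deque keeps only the most recent N items;
# older entries fall off the opposite end automatically.
last_three = deque(maxlen=3)
for item in ['a', 'b', 'c', 'd', 'e']:
    last_three.append(item)
print(list(last_three))  # ['c', 'd', 'e']
```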
Counter provides a counting dictionary.
This structure inherits from dict and is designed to keep an integer count per key, with a default count of 0 for missing keys. So instead of doing this:

c = {}
if 'a' not in c:
    c['a'] = 1
else:
    c['a'] = c['a'] + 1
We can do this:
from collections import Counter
c = Counter()
c['a'] = c['a'] + 1
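A Counter can also be built directly from any iterable, tallying every distinct element in a single pass:

```python
from collections import Counter

# Counting the characters of a string in one step.
letters = Counter('mississippi')
print(letters['s'])  # 4
print(letters['z'])  # 0 -- missing keys default to 0, with no KeyError
```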
Counter also has related methods: elements() yields each key repeated as many times as its count, and most_common() returns a list of (key, count) tuples ordered by frequency:
from collections import Counter
c = Counter({'a': 2, 'b': 1, 'c': 3, 'd': 1})
for key in c.elements():
    print(key, end=' ')        # a a b c c c d (insertion order in Python 3.7+)
print(','.join(c.elements()))  # a,a,b,c,c,c,d
print(c.most_common(2))  # [('c', 3), ('a', 2)]
# the argument 2 says "give me the 2 most common"
c.clear() # removes all keys and counts
And you can use Counter's support for the math operators to combine multiple counters, summing their values:
c = Counter({'a': 1, 'b': 2})
d = Counter({'a': 10, 'b': 20})
print(c + d) # Counter({'b': 22, 'a': 11})
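Subtraction works too, with a twist worth knowing: the arithmetic operators keep only positive counts, so entries that would be zero or negative are dropped from the result entirely:

```python
from collections import Counter

c = Counter({'a': 1, 'b': 2})
d = Counter({'a': 10, 'b': 20})

# Only positive counts survive subtraction.
print(d - c)  # Counter({'b': 18, 'a': 9})
print(c - d)  # Counter() -- every count would be negative
```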
defaultdict is a dict that provides a default object for new keys.
Similar to Counter, defaultdict supplies a default value when a key doesn't exist; but it accepts a factory (any zero-argument callable) that produces the default value.
A defaultdict with a default list value for each key
from collections import defaultdict
ddict = defaultdict(list)
ddict['a'].append(1)
ddict['b'] # merely accessing a key creates it with the default
print(ddict) # defaultdict(<class 'list'>, {'a': [1], 'b': []})
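The list default is the classic grouping idiom: items can be appended under a key without first checking whether the key exists. A minimal sketch:

```python
from collections import defaultdict

# Group words by their first letter; a new list is created
# automatically the first time each letter is seen.
groups = defaultdict(list)
for word in ['apple', 'avocado', 'banana', 'blueberry']:
    groups[word[0]].append(word)
print(dict(groups))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry']}
```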
A defaultdict with a default dict value for each key
ddict = defaultdict(dict)
print(ddict['a']) # {} (key/value is created, assigned to 'a')
print(list(ddict.keys())) # dict_keys(['a'])
ddict['a']['Z'] = 5
ddict['b']['Z'] = 5
ddict['b']['Y'] = 10
# defaultdict(<class 'dict'>, {'a': {'Z': 5}, 'b': {'Z': 5, 'Y': 10}})
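Since the factory can be any zero-argument callable, a lambda lets you supply an arbitrary default value, not just an empty container. A minimal sketch (the names here are illustrative):

```python
from collections import defaultdict

# Every new key starts at 100 rather than at an empty container.
scores = defaultdict(lambda: 100)
scores['alice'] -= 10
print(scores['alice'])  # 90
print(scores['bob'])    # 100 (created on access with the default)
```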
The profiler runs an entire script and times each unit (call to a function).
If a script is running slowly it can be difficult to identify the bottleneck. timeit() may not be adequate as it times functions in isolation, and not usually with "live" data. This test program (ptest.py) deliberately pauses so that some functions run slower than others:
import time

def fast():
    print("I run fast!")

def slow():
    time.sleep(3)
    print("I run slow!")

def medium():
    time.sleep(0.5)
    print("I run a little slowly...")

def main():
    fast()
    slow()
    medium()

if __name__ == '__main__':
    main()
We can profile this code as follows:
>>> import cProfile
>>> import ptest
>>> cProfile.run('ptest.main()')
I run fast!
I run slow!
I run a little slowly...
         8 function calls in 3.500 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.500    3.500 <string>:1(<module>)
        1    0.000    0.000    0.500    0.500 ptest.py:15(medium)
        1    0.000    0.000    3.500    3.500 ptest.py:21(main)
        1    0.000    0.000    0.000    0.000 ptest.py:4(fast)
        1    0.000    0.000    3.000    3.000 ptest.py:9(slow)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    3.499    1.750    3.499    1.750 {time.sleep}
According to these results, the slow() and main() functions are the biggest time users. The overall execution of the module itself is also shown. Comparing our code to the results, we can see that main() is slow only because it calls slow(), so we can focus on the obvious culprit, slow(). It's also possible to insert profiling into our script around particular function calls so we can focus our analysis:
profile = cProfile.Profile()
profile.enable()
main() # or whatever function calls we'd prefer to focus on
profile.disable()
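Once the Profile object has collected data, pstats can read from it directly; sorting by cumulative time puts the biggest overall time users first. A minimal self-contained sketch (the profiled expression is just a stand-in for the real calls you'd focus on):

```python
import cProfile
import io
import pstats

profile = cProfile.Profile()
profile.enable()
total = sum(i * i for i in range(10000))  # stand-in for the real work
profile.disable()

# pstats.Stats accepts a Profile object as well as a dump file;
# 'cumulative' sorts by time spent in a call and everything below it.
out = io.StringIO()
pstats.Stats(profile, stream=out).sort_stats('cumulative').print_stats()
print(out.getvalue())
```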
Command-line interface to cProfile
python -m cProfile -o output.bin ptest.py
The -m flag tells Python to run the named module (here cProfile) as a script; -o directs its output to a file. The result is a binary file that can be analyzed with the pstats module, which produces largely the same output as run():
>>> import pstats
>>> p = pstats.Stats('output.bin')
>>> p.strip_dirs().sort_stats(-1).print_stats()
Thu Mar 20 18:32:16 2014    output.bin

         8 function calls in 3.501 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.501    3.501 ptest.py:1(<module>)
        1    0.001    0.001    0.500    0.500 ptest.py:15(medium)
        1    0.000    0.000    3.501    3.501 ptest.py:21(main)
        1    0.001    0.001    0.001    0.001 ptest.py:4(fast)
        1    0.001    0.001    3.000    3.000 ptest.py:9(slow)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    3.499    1.750    3.499    1.750 {time.sleep}

<pstats.Stats instance at 0x017C9030>
Caveat: don't optimize prematurely

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." -- Donald Knuth

Common wisdom suggests that optimization should happen only once the code has reached a working, close-to-final state. If you think about optimization too soon, you may do work that has to be undone later, or your optimizations may themselves be undone as you complete the functionality of your code.

Note: some of these examples are taken from the "Mouse vs. Python" blog.
These packages provide varying approaches toward writing and running more efficient Python code.
* PyPy: a "Just in Time" compiler for Python -- can speed up almost any Python code.
* Cython: a superset of the Python language that additionally supports calling C functions and declaring C types -- good for building Python modules in C.
* Pyrex: a compiler that lets you combine Python code with C data types, compiling your code into a C extension for Python.
* Weave: allows embedding of C code within Python code.
* Shed Skin: an experimental compiler that can translate Python code into optimized C++.
While PyPy is a no-brainer for speeding up code, the other libraries listed here require a knowledge of C. The deepest analysis of Python will incorporate efficient C code and/or take into account Python's underlying C implementation: the reference interpreter (CPython) is written in C, and the operations we invoke in Python translate to actions taken by the compiled C code. The most advanced Python developers have a working knowledge of C and study the C structures that Python employs.