Runtime efficiency refers to two things: memory efficiency (how much RAM a process uses) and time efficiency (how long execution takes). The two are often related -- it takes time to allocate memory. As a "scripting" language, Python is more convenient, but less efficient, than "programming" languages like C and Java:

* Parsing, compilation and execution take place during runtime (C and Java are compiled ahead of time).
* Memory is allocated based on anticipation of what your code will do at runtime (C in particular requires the developer to declare what memory will be needed).
* Python handles expanded memory requests seamlessly -- "no visible limits" (C and Java work with "finite" resources that do not expand indefinitely).

Achieving runtime efficiency requires a tradeoff with development time: either we spend more of our own (developer) time making our programs faster and leaner, or we spend less time developing and allow them to run slower (letting Python handle memory allocation for us). Of course, the very choice of a convenient scripting language (like Python) over a more efficient compiled language (like Java or C++) already favors rapid development and ease of use over runtime efficiency. In many applications efficiency is not a consideration, because there is plenty of memory and enough time to get the job done.

Nevertheless, advanced Python developers may be asked to make a program faster or use less memory -- perhaps because the data has grown past anticipated limits, the program's responsibilities and complexity have been extended, or an unknown inefficiency is bogging down execution. In this section we'll discuss the more efficient container structures and ways to analyze the speed of the various units in our programs.
collections: high-performance container datatypes
* array: a type-specific list
* deque: a "double-ended queue"
* Counter: a counting dictionary
* defaultdict: a dict with an automatic default for missing keys

timeit: a unit timer for comparing the time efficiency of various Python algorithms
cProfile: an overall time profile of a Python program
The timeit module provides a simple way to time blocks of Python code.
We use timeit to help decide whether varying ways of accomplishing a task might make our programs more efficient. Here we compare execution time of four approaches to joining a range of integers into a very large string ("1-2-3-4-5...", etc.)
from timeit import timeit
# 'straight concatenation' approach
def joinem():
    x = '1'
    for num in range(2, 101):
        x = x + '-' + str(num)
    return x
print(timeit('joinem()', setup='from __main__ import joinem', number=10000))
# 0.457356929779 # setup= is discussed below
# generator comprehension
print(timeit('"-".join(str(n) for n in range(100))', number=10000))
# 0.338698863983
# list comprehension
print(timeit('"-".join([str(n) for n in range(100)])', number=10000))
# 0.323472976685
# map() function
print(timeit('"-".join(map(str, range(100)))', number=10000))
# 0.160399913788
Here map() appears to be fastest, probably because built-in functions are implemented in compiled C.

Repeating a test

You can conveniently repeat a test multiple times with the repeat() function (or the repeat() method of a Timer object). Repetitions give you a much better idea of the time a function might take, since any single run can be skewed by other activity on the machine.
from timeit import repeat
print(repeat('"-".join(map(str, range(100)))', number=10000, repeat=3))
# [0.15206599235534668, 0.1909959316253662, 0.2175769805908203]
print(repeat('"-".join([str(n) for n in range(100)])', number=10000, repeat=3))
# [0.35890698432922363, 0.327725887298584, 0.3285980224609375]
print(repeat('"-".join(map(str, range(100)))', number=10000, repeat=3))
# [0.14228010177612305, 0.14016509056091309, 0.14458298683166504]
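Since results fluctuate with system load, a common convention (suggested in the timeit documentation) is to report the minimum of the repeats rather than the average -- the fastest run is the least disturbed by other processes. A minimal sketch:

```python
from timeit import repeat

# Take the minimum of several repeats as the representative timing;
# slower runs reflect interference, not the code being measured.
times = repeat('"-".join(map(str, range(100)))', number=10000, repeat=3)
print(min(times))
```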
The setup= parameter performs setup before a test

Some tests make use of a variable that must be initialized before the test runs:
print(timeit('x.append(5)', setup='x = []', number=10000))
# 0.00238704681396
Additionally, timeit() does not share the program's global namespace, so any imports -- and even global variables -- required by the test must be provided in setup=:
print(timeit('x.append(5)', setup='import collections as cs; x = cs.deque()', number=10000))
# 0.00115013122559
Here we're testing a function which, as a global, must be imported from the __main__ namespace:
def testme(maxlim):
    return [x * 2 for x in range(maxlim)]
print(timeit('testme(5000)', setup='from __main__ import testme', number=10000))
# 10.2637062073
Keep in mind that a function tested in isolation may not perform the same as it would with a different dataset, or when run as part of a larger program (which may have allocated memory differently by the point of the function's execution). The cProfile module can profile overall program execution.
The array is a type-specific list.
The array container provides a list of a uniform type. An array's type must be specified at initialization. A uniform type makes an array more efficient than a list, which can contain any type.
from array import array
myarray = array('i', [1, 2])
myarray.append(3)
print(myarray) # array('i', [1, 2, 3])
print(myarray[-1]) # acts like a list
for val in myarray:
    print(val)
myarray.append(1.3) # TypeError: an 'i' array holds integers only
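The efficiency gain is easy to observe: an array stores raw C values, while a list stores pointers to full Python objects. A minimal sketch comparing container sizes with sys.getsizeof (which, note, does not even count the list's int objects themselves, so the real gap is larger):

```python
import sys
from array import array

nums = list(range(1000))
arr = array('i', nums)   # raw 4-byte C ints

# The array's container is far smaller than the list's,
# which holds one pointer per element.
print(sys.getsizeof(arr))   # smaller
print(sys.getsizeof(nums))  # larger
```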
Available array types (Python 3; the Python 2-only 'c' code is gone, and 'u' is deprecated):

Type code | C Type | Python Type | Minimum size in bytes |
---|---|---|---|
'b' | signed char | int | 1 |
'B' | unsigned char | int | 1 |
'u' | wchar_t | Unicode character | 2 |
'h' | signed short | int | 2 |
'H' | unsigned short | int | 2 |
'i' | signed int | int | 2 |
'I' | unsigned int | int | 2 |
'l' | signed long | int | 4 |
'L' | unsigned long | int | 4 |
'q' | signed long long | int | 8 |
'Q' | unsigned long long | int | 8 |
'f' | float | float | 4 |
'd' | double | float | 8 |
A "double-ended queue" provides fast adds/removals.
The collections module provides a variety of specialized container types. These containers behave in a manner similar to the built-in ones with which we are familiar, but with additional functionality aimed at convenience and efficiency.

Lists are optimized for fixed-length operations -- sorting, membership checks, index access, etc. -- and for appends at the end, but inserts or removals at the front require shifting every element. A deque is designed specifically for fast adds and removals at either end of the sequence:
from collections import deque
x = deque([1, 2, 3])
x.append(4) # x now [1, 2, 3, 4]
x.appendleft(0) # x now [0, 1, 2, 3, 4]
popped = x.pop() # removes '4' from the end
popped2 = x.popleft() # removes '1' from the start
A deque can also be sized, in which case appends will push existing elements off of the ends:
x = deque(['a', 'b', 'c'], 3) # maximum size: 3
x.append(99) # now: deque(['b', 'c', 99]) ('a' was pushed off of the start)
x.appendleft(0) # now: deque([0, 'b', 'c']) (99 was pushed off of the end)
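This bounded behavior makes a sized deque a natural "last N items" buffer, a pattern the collections docs note (e.g., keeping a tail of recent log lines). A minimal sketch:

```python
from collections import deque

# A bounded deque keeps only the most recent N items;
# older entries fall off the opposite end automatically.
last_three = deque(maxlen=3)
for item in ['a', 'b', 'c', 'd', 'e']:
    last_three.append(item)
print(list(last_three))  # ['c', 'd', 'e']
```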
Counter provides a counting dictionary.
This structure inherits from dict and is designed to keep an integer count per key, with a default count of 0 for missing keys. So instead of doing this:

c = {}
if 'a' not in c:
    c['a'] = 1
else:
    c['a'] = c['a'] + 1
We can do this:
from collections import Counter
c = Counter()
c['a'] = c['a'] + 1
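A Counter can also be built directly from any iterable, tallying every distinct element in a single pass:

```python
from collections import Counter

# Counting the characters of a string in one step.
letters = Counter('mississippi')
print(letters['s'])  # 4
print(letters['z'])  # 0 -- missing keys default to 0, with no KeyError
```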
Counter also has related methods: elements() yields each key repeated as many times as its count, and most_common() returns a list of (key, count) tuples ordered by frequency:
from collections import Counter
c = Counter({'a': 2, 'b': 1, 'c': 3, 'd': 1})
for key in c.elements():
    print(key, end=' ')        # a a b c c c d (insertion order in Python 3.7+)
print(','.join(c.elements()))  # a,a,b,c,c,c,d
print(c.most_common(2))  # [('c', 3), ('a', 2)]
# the argument 2 says "give me the 2 most common"
c.clear() # removes all keys and counts
And you can use Counter's support for the math operators to combine multiple counters, summing their values:
c = Counter({'a': 1, 'b': 2})
d = Counter({'a': 10, 'b': 20})
print(c + d) # Counter({'b': 22, 'a': 11})
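Subtraction works too, with a twist worth knowing: the arithmetic operators keep only positive counts, so entries that would be zero or negative are dropped from the result entirely:

```python
from collections import Counter

c = Counter({'a': 1, 'b': 2})
d = Counter({'a': 10, 'b': 20})

# Only positive counts survive subtraction.
print(d - c)  # Counter({'b': 18, 'a': 9})
print(c - d)  # Counter() -- every count would be negative
```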
defaultdict is a dict that provides a default object for new keys.
Similar to Counter, defaultdict supplies a default value when a key doesn't exist; but it accepts a factory (any zero-argument callable) that produces the default value.
A defaultdict with a default list value for each key
from collections import defaultdict
ddict = defaultdict(list)
ddict['a'].append(1)
ddict['b'] # merely accessing a key creates it with the default
print(ddict) # defaultdict(<class 'list'>, {'a': [1], 'b': []})
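The list default is the classic grouping idiom: items can be appended under a key without first checking whether the key exists. A minimal sketch:

```python
from collections import defaultdict

# Group words by their first letter; a new list is created
# automatically the first time each letter is seen.
groups = defaultdict(list)
for word in ['apple', 'avocado', 'banana', 'blueberry']:
    groups[word[0]].append(word)
print(dict(groups))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry']}
```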
A defaultdict with a default dict value for each key
ddict = defaultdict(dict)
print(ddict['a']) # {} (key/value is created, assigned to 'a')
print(list(ddict.keys())) # dict_keys(['a'])
ddict['a']['Z'] = 5
ddict['b']['Z'] = 5
ddict['b']['Y'] = 10
# defaultdict(<class 'dict'>, {'a': {'Z': 5}, 'b': {'Z': 5, 'Y': 10}})
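Since the factory can be any zero-argument callable, a lambda lets you supply an arbitrary default value, not just an empty container. A minimal sketch (the names here are illustrative):

```python
from collections import defaultdict

# Every new key starts at 100 rather than at an empty container.
scores = defaultdict(lambda: 100)
scores['alice'] -= 10
print(scores['alice'])  # 90
print(scores['bob'])    # 100 (created on access with the default)
```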
The profiler runs an entire script and times each unit (call to a function).
If a script is running slowly it can be difficult to identify the bottleneck. timeit() may not be adequate as it times functions in isolation, and not usually with "live" data. This test program (ptest.py) deliberately pauses so that some functions run slower than others:
import time

def fast():
    print("I run fast!")

def slow():
    time.sleep(3)
    print("I run slow!")

def medium():
    time.sleep(0.5)
    print("I run a little slowly...")

def main():
    fast()
    slow()
    medium()

if __name__ == '__main__':
    main()
We can profile this code as follows:
>>> import cProfile
>>> import ptest
>>> cProfile.run('ptest.main()')
I run fast!
I run slow!
I run a little slowly...
         8 function calls in 3.500 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.500    3.500 <string>:1(<module>)
        1    0.000    0.000    0.500    0.500 ptest.py:15(medium)
        1    0.000    0.000    3.500    3.500 ptest.py:21(main)
        1    0.000    0.000    0.000    0.000 ptest.py:4(fast)
        1    0.000    0.000    3.000    3.000 ptest.py:9(slow)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    3.499    1.750    3.499    1.750 {time.sleep}
According to these results, the slow() and main() functions are the biggest time users. The overall execution of the module itself is also shown. Comparing our code to the results, we can see that main() is slow only because it calls slow(), so we can focus on the obvious culprit, slow(). It's also possible to insert profiling into our script around particular function calls so we can focus our analysis:
profile = cProfile.Profile()
profile.enable()
main() # or whatever function calls we'd prefer to focus on
profile.disable()
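Once the Profile object has collected data, pstats can read from it directly; sorting by cumulative time puts the biggest overall time users first. A minimal self-contained sketch (the profiled expression is just a stand-in for the real calls you'd focus on):

```python
import cProfile
import io
import pstats

profile = cProfile.Profile()
profile.enable()
total = sum(i * i for i in range(10000))  # stand-in for the real work
profile.disable()

# pstats.Stats accepts a Profile object as well as a dump file;
# 'cumulative' sorts by time spent in a call and everything below it.
out = io.StringIO()
pstats.Stats(profile, stream=out).sort_stats('cumulative').print_stats()
print(out.getvalue())
```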
Command-line interface to cProfile
python -m cProfile -o output.bin ptest.py
The -m flag tells Python to run the named module (here cProfile) as a script; -o directs its output to a file. The result is a binary file that can be analyzed with the pstats module, which produces largely the same output as run():
>>> import pstats
>>> p = pstats.Stats('output.bin')
>>> p.strip_dirs().sort_stats(-1).print_stats()
Thu Mar 20 18:32:16 2014    output.bin

         8 function calls in 3.501 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.501    3.501 ptest.py:1(<module>)
        1    0.001    0.001    0.500    0.500 ptest.py:15(medium)
        1    0.000    0.000    3.501    3.501 ptest.py:21(main)
        1    0.001    0.001    0.001    0.001 ptest.py:4(fast)
        1    0.001    0.001    3.000    3.000 ptest.py:9(slow)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    3.499    1.750    3.499    1.750 {time.sleep}

<pstats.Stats instance at 0x017C9030>
Caveat: don't optimize prematurely

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." -- Donald Knuth

Common wisdom suggests that optimization should happen only once the code has reached a working, close-to-final state. If you think about optimization too soon, you may do work that has to be undone later, or your optimizations may themselves be undone as you complete the functionality of your code.

Note: some of these examples are taken from the "Mouse vs. Python" blog.
These packages provide varying approaches toward writing and running more efficient Python code.
* PyPy: a "Just in Time" compiler for Python -- can speed up almost any Python code.
* Cython: a superset of the Python language that additionally supports calling C functions and declaring C types -- good for building Python modules in C.
* Pyrex: a compiler that lets you combine Python code with C data types, compiling your code into a C extension for Python.
* Weave: allows embedding of C code within Python code.
* Shed Skin: an experimental compiler that can translate Python code into optimized C++.
While PyPy is a no-brainer for speeding up code, the other libraries listed here require a knowledge of C. The deepest analysis of Python will incorporate efficient C code and/or take into account Python's underlying C implementation: the reference interpreter (CPython) is written in C, and the operations we invoke in Python translate to actions taken by the compiled C code. The most advanced Python developers have a working knowledge of C and study the C structures that Python employs.