Performance measurement in Python 3

Performance measurement is the process of collecting and analysing information about how a piece of code performs, typically how long it takes to run and how much memory it uses.

In this blog I will cover the basics of the following modules in Python:

  • timeit

  • cProfile

  • pstats

  • memory_profiler

  • line_profiler

I will start with the timeit module. This module enables us to time small pieces of code.

Example:

import timeit

#time iteration over a list comprehension
res1 = timeit.timeit('''
a = [i for i in range(50000)]
for i in a:
    pass
''', number=100)

#time iteration over a generator expression
res2 = timeit.timeit('''
a = (i for i in range(50000))
for i in a:
    pass
''', number=100)

print(res1)

print(res2)

#output
0.26830113399955735
0.3227190840007097

Running timeit.timeit(stmt, number=N) returns a float representing the total time, in seconds, taken to execute the given code N times. In the above example, the first argument is the code to be executed and number is how many times it should be run. Executing the code many times smooths out random fluctuations; divide the result by number if you want an average time per run.

Interestingly, the above example also shows that iterating over a list comprehension is faster here than iterating over the equivalent generator expression, although the list comprehension consumes more memory, which is not measured in this example.
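
As a side note before moving on to profiling, timeit also provides repeat(), which runs the whole measurement several times and returns a list of timings; the minimum of those is usually the most stable estimate, since it is the least affected by other activity on the machine. A minimal sketch (the repeat and number values are arbitrary):

import timeit

#run the list-comprehension snippet 5 separate times, 100 executions each
results = timeit.repeat('''
a = [i for i in range(50000)]
for i in a:
    pass
''', repeat=5, number=100)

#the minimum of the runs is the least noisy estimate
print(min(results))

The same statement can also be timed from the shell with python3 -m timeit '<statement>', which picks a suitable number of loops automatically.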

The cProfile module provides deterministic profiling of a Python program. Calling cProfile.run(command, output_file_name) executes the given statement (passed as a string) under the profiler and writes the results to the specified file.

The following columns are included in the output:

  • ncalls : for the number of calls

  • tottime : the total time spent in the given function (excluding time spent in calls to sub-functions)

  • percall : is the quotient of tottime divided by ncalls

  • cumtime : is the cumulative time spent in this and all subfunctions (from invocation till exit). This figure is accurate even for recursive functions

  • percall : is the quotient of cumtime divided by primitive calls

  • filename:lineno(function) : provides the respective data of each function

The pstats.Stats class is used to read the profiled data and format the results as required.
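
Both modules also have a command-line interface, so an entire script can be profiled without modifying its code and the resulting stats file can be browsed interactively. A sketch (performance.py and cprofile_results are just placeholder names):

#profile the whole script and write the stats to cprofile_results
python3 -m cProfile -o cprofile_results performance.py

#browse the stats file interactively (it accepts commands such as sort and stats)
python3 -m pstats cprofile_results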

The following example profiles to_be_profiled() and writes the profile data to a file called cprofile_results. The pstats.Stats class is then used to sort and display the results in several commonly used ways (see the comments in the program).

import cProfile
import pstats
import time
import sys

def to_be_profiled():
    my_list1 = [i**2 for i in range(50000)]

    my_list2 = (i**2 for i in range(100000, 150000))

    sum = 0

    print("my_list1 = {} bytes".format(sys.getsizeof(my_list1)))
    print("my_list2 = {} bytes".format(sys.getsizeof(my_list2)))

    for i in my_list2:
        sum += i
        #tiny artificial delay so the loop dominates the profile
        time.sleep(0.00001)
        my_list1.append(i)
    print(sum)

cProfile.run('to_be_profiled()', 'cprofile_results')

p = pstats.Stats('cprofile_results')
#sort by standard name
p.strip_dirs().sort_stats(-1).print_stats(10)
#sort by function name
p.sort_stats('name').print_stats(10)
#sort by cumulative time in a function
p.sort_stats('cumulative').print_stats(10)
#sort by time spent in a function
p.sort_stats('time').print_stats(10)

#output
my_list1 = 406496 bytes
my_list2 = 88 bytes
791660416675000
Sat Dec  3 21:23:38 2016    cprofile_results

         150013 function calls in 3.580 seconds

   Ordered by: standard name
   List reduced from 11 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002    3.580    3.580 <string>:1(<module>)
        1    0.117    0.117    3.578    3.578 performance.py:23(to_be_profiled)
        1    0.014    0.014    0.014    0.014 performance.py:24(<listcomp>)
    50001    0.078    0.000    0.078    0.000 performance.py:26(<genexpr>)
        1    0.000    0.000    3.580    3.580 {built-in method builtins.exec}
        3    0.001    0.000    0.001    0.000 {built-in method builtins.print}
        2    0.000    0.000    0.000    0.000 {built-in method sys.getsizeof}
    50000    3.357    0.000    3.357    0.000 {built-in method time.sleep}
    50000    0.011    0.000    0.011    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


Sat Dec  3 21:23:38 2016    cprofile_results

         150013 function calls in 3.580 seconds

   Ordered by: function name
   List reduced from 11 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.580    3.580 {built-in method builtins.exec}
        3    0.001    0.000    0.001    0.000 {built-in method builtins.print}
        2    0.000    0.000    0.000    0.000 {built-in method sys.getsizeof}
    50000    3.357    0.000    3.357    0.000 {built-in method time.sleep}
    50001    0.078    0.000    0.078    0.000 performance.py:26(<genexpr>)
        1    0.014    0.014    0.014    0.014 performance.py:24(<listcomp>)
    50000    0.011    0.000    0.011    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        1    0.002    0.002    3.580    3.580 <string>:1(<module>)

Sat Dec  3 21:23:38 2016    cprofile_results

         150013 function calls in 3.580 seconds

   Ordered by: cumulative time
   List reduced from 11 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.580    3.580 {built-in method builtins.exec}
        1    0.002    0.002    3.580    3.580 <string>:1(<module>)
        1    0.117    0.117    3.578    3.578 performance.py:23(to_be_profiled)
    50000    3.357    0.000    3.357    0.000 {built-in method time.sleep}
    50001    0.078    0.000    0.078    0.000 performance.py:26(<genexpr>)
        1    0.014    0.014    0.014    0.014 performance.py:24(<listcomp>)
    50000    0.011    0.000    0.011    0.000 {method 'append' of 'list' objects}
        3    0.001    0.000    0.001    0.000 {built-in method builtins.print}
        2    0.000    0.000    0.000    0.000 {built-in method sys.getsizeof}
        2    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}

Sat Dec  3 21:23:38 2016    cprofile_results

         150013 function calls in 3.580 seconds

   Ordered by: internal time
   List reduced from 11 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    50000    3.357    0.000    3.357    0.000 {built-in method time.sleep}
        1    0.117    0.117    3.578    3.578 performance.py:23(to_be_profiled)
    50001    0.078    0.000    0.078    0.000 performance.py:26(<genexpr>)
        1    0.014    0.014    0.014    0.014 performance.py:24(<listcomp>)
    50000    0.011    0.000    0.011    0.000 {method 'append' of 'list' objects}
        1    0.002    0.002    3.580    3.580 <string>:1(<module>)
        3    0.001    0.000    0.001    0.000 {built-in method builtins.print}
        1    0.000    0.000    3.580    3.580 {built-in method builtins.exec}
        2    0.000    0.000    0.000    0.000 {built-in method sys.getsizeof}
        2    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
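
As an aside, if you prefer not to go through an intermediate file at all, cProfile.Profile can be driven directly and the statistics printed from within the same program. A minimal sketch reusing the to_be_profiled() function from above:

import cProfile
import io
import pstats

pr = cProfile.Profile()
pr.enable()
to_be_profiled()
pr.disable()

#feed the Profile object straight into pstats and print the 10 most
#expensive entries by cumulative time
stream = io.StringIO()
pstats.Stats(pr, stream=stream).sort_stats('cumulative').print_stats(10)
print(stream.getvalue())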

Moving on to the memory_profiler, the installation procedure is as follows:

#install memory_profiler module
pip install memory_profiler

#install psutil
pip install psutil

#install matplotlib
pip install matplotlib
sudo apt-get install python3-matplotlib

To install matplotlib, I first tried pip install matplotlib, but I was still facing issues plotting graphs. Then I tried sudo apt-get install python-matplotlib, which installed matplotlib for Python 2. Finally I tried sudo apt-get install python3-matplotlib and was able to plot graphs. So my installation procedure ended up installing a lot of unnecessary modules and packages.

To see a line-by-line memory profile of a function, use memory_profiler. Import the decorator with from memory_profiler import profile and mark the function to be profiled with @profile.

Example:

import sys
from memory_profiler import profile

@profile()
def mem_to_be_profiled():
    
    my_list1 = [i**2 for i in range(50000)]

    my_list2 = (i**2 for i in range(100000, 150000))

    sum = 0

    print("my_list1 = {} bytes".format(sys.getsizeof(my_list1)))
    print("my_list2 = {} bytes".format(sys.getsizeof(my_list2)))

    for i in my_list2:
        sum += i
        my_list1.append(i)
    print(sum)

mem_to_be_profiled()

#output
my_list1 = 406496 bytes
my_list2 = 88 bytes
791660416675000
Filename: mem.py

Line #    Mem usage    Increment   Line Contents
================================================
     5     30.4 MiB      0.0 MiB   @profile()
     6                             def mem_to_be_profiled():
     7
     8     32.1 MiB      1.7 MiB       my_list1 = [i**2 for i in range(50000)]
     9
    10     34.8 MiB      2.7 MiB       my_list2 = (i**2 for i in range(100000, 150000))
    11
    12     32.1 MiB     -2.7 MiB       sum = 0
    13
    14     32.1 MiB      0.0 MiB       print("my_list1 = {} bytes".format(sys.getsizeof(my_list1)))
    15     32.1 MiB      0.0 MiB       print("my_list2 = {} bytes".format(sys.getsizeof(my_list2)))
    16
    17     34.8 MiB      2.7 MiB       for i in my_list2:
    18     34.8 MiB      0.0 MiB           sum += i
    19     34.8 MiB      0.0 MiB           my_list1.append(i)
    20     34.8 MiB      0.0 MiB       print(sum)

I was hoping to use memory profiling to demonstrate that list comprehensions use more memory than generator expressions, but for some reason 2.7 MiB of memory is reported as allocated on the line that creates the generator object and then released again on the next line. If you know why this happens every time, let me know in the comments.
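
One way to compare the two constructs more directly is the standard-library tracemalloc module, which tracks memory allocated by Python objects themselves rather than sampling the whole process. A minimal sketch of that idea (the consume() helper is just for illustration, and the exact byte counts will vary by interpreter version):

import tracemalloc

def consume(iterable):
    total = 0
    for i in iterable:
        total += i
    return total

#peak Python-level allocation when building the full list first
tracemalloc.start()
consume([i**2 for i in range(50000)])
list_peak = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

#peak Python-level allocation when using a generator expression
tracemalloc.start()
consume(i**2 for i in range(50000))
gen_peak = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

print("list peak = {} bytes, generator peak = {} bytes".format(list_peak, gen_peak))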

Now comes the interesting part: plotting a graph of memory usage vs time. We will use the mprof executable for this.

Run the script and collect memory profile data:

mprof run mem.py

The above command stores the result in a file in the current directory.

Plot a graph of memory usage vs time:

mprof plot

The above command uses the most recently generated memory profile data.

Output screenshot:

Memory profile graph

Next I will cover the line_profiler module.

Installation:

pip install line_profiler

Use the @profile decorator to specify the function that needs to be profiled.

Running the script directly with python file.py throws an error, since @profile is not defined.

To see the line-by-line profile of the function, use:

kernprof -l -v line.py

There are ways to get around this issue so that the file can be run as a normal script as well as profiled with kernprof. Here is a Stack Overflow answer.
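
A common workaround (a sketch of the idea from such answers, not the only approach) is to fall back to a no-op profile decorator when the script is not running under kernprof, so the same file works with both python line.py and kernprof -l -v line.py:

#kernprof injects `profile` into the builtins, so it is only undefined when
#the script is run directly with python
try:
    profile
except NameError:
    def profile(func):
        return func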

Example:

import sys

@profile
def to_be_profiled():
    
    my_list1 = [i**2 for i in range(50000)]

    my_list2 = (i**2 for i in range(100000, 150000))
    sum = 0
    print("my_list1 = {} bytes".format(sys.getsizeof(my_list1)))
    print("my_list2 = {} bytes".format(sys.getsizeof(my_list2)))

    for i in my_list2:
        sum += i
        my_list1.append(i)
    print(sum)


to_be_profiled()

#output
my_list1 = 406496 bytes
my_list2 = 88 bytes
791660416675000
Wrote profile results to line.py.lprof
Timer unit: 1e-06 s

Total time: 0.17311 s
File: line.py
Function: to_be_profiled at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           @profile
     4                                           def to_be_profiled():
     5
     6         1        17575  17575.0     10.2      my_list1 = [i**2 for i in range(50000)]
     7
     8         1            7      7.0      0.0      my_list2 = (i**2 for i in range(100000, 150000))
     9         1            1      1.0      0.0      sum = 0
    10         1           43     43.0      0.0      print("my_list1 = {} bytes".format(sys.getsizeof(my_list1)))
    11         1           24     24.0      0.0      print("my_list2 = {} bytes".format(sys.getsizeof(my_list2)))
    12
    13     50001        68438      1.4     39.5      for i in my_list2:
    14     50000        42867      0.9     24.8          sum += i
    15     50000        44122      0.9     25.5          my_list1.append(i)
    16         1           33     33.0      0.0      print(sum)

I have only covered the basic usage of these performance measurement modules; to learn more, refer to the links below.

References:

Code for today’s blog: