Profiling is the process of finding out how much time each particular part of your code is taking. This is useful when optimizing code that takes longer than expected.
Profiling can target execution time (time complexity) or memory usage (space complexity). There are profilers that report memory usage at each statement of the code.
The standard Python distribution provides two profilers, cProfile and profile, for measuring execution time.
Both profilers let us measure the execution time of the Python functions present in our code/script/program.
Apart from these two, there are many other Python time profilers (line_profiler, py-spy, pyinstrument, yappi, pprofile, Scalene, Snakeviz, etc.) that need to be installed separately using 'pip' or 'conda'.
Code can be profiled in two ways: programmatically from within the code itself, or from the command line on a whole script.
As a part of this tutorial, we'll explain how to use the Python modules cProfile and profile to profile Python code/scripts/programs. We'll explain the usage of both libraries with a few examples. We'll also cover the pstats module, which can be used to format the output of the profilers.
If you are interested in learning about other profilers available in Python, please feel free to go through the references section to check out tutorials on other profilers.
import cProfile
The cProfile module has a function named run() which accepts any Python statement as a string. It passes that statement to the exec() function, executes it, and generates the profiling results.
Below we are simply running a loop 10000 times, generating a random number between 1 and 10 each time. We keep all the random numbers in a list and then sum them up. We have passed this statement to the run() method of cProfile to see where the time is being spent.
import random
cProfile.run("sum([random.randint(1,10) for i in range(10000)])")
We can understand from the above output that a total of 55957 function calls were made. The line Ordered by: standard name indicates that the table is sorted by the entries in the last column (filename:lineno(function)).
The meaning of each column in the table is explained below:
ncalls - the number of times the function was called. When recursion is involved, it is shown as two numbers (total calls/primitive calls).
tottime - the total time spent in the function itself, excluding time spent in functions it calls.
percall - tottime divided by ncalls.
cumtime - the cumulative time spent in the function and all the functions it calls.
percall - cumtime divided by the number of primitive calls.
filename:lineno(function) - the file name, line number, and name of the function.
The run() function also provides a parameter named filename to which we can provide a file name where the output of profiling will be saved. We'll load this file later using the pstats module.
cProfile.run("sum([random.randint(1,10) for i in range(10000)])", filename="out.prof")
The run() method explained in the previous example has the limitation that we need to pass it a single statement to be profiled.
If we want more control over which part of the code gets profiled, the Profile() constructor is a better option. It has enable() and disable() methods which let us turn the profiler on and off so that only a particular part of the code is profiled. This helps us avoid unnecessarily profiling the whole program when we only need to profile certain parts.
As a part of this example, we explain how to profile a particular part of a Python code/script/program.
Below we have created two functions, slow_add() and fast_add(), which take two numbers and return their sum. The slow_add() function sleeps for half a second so that it takes more time to perform the addition. We have then created main_func(), which loops through the numbers 0-9, calling slow_add() if the number is even and fast_add() otherwise, passing the same number twice as parameters. We append the output of each call to the arr list and return it. We have declared our profiler outside of main_func(), enabling it before the loop and disabling it after the loop. This prevents profiling of the other parts of the function.
import time

def slow_add(a, b):
    time.sleep(0.5)
    return a+b

def fast_add(a, b):
    return a+b

prof = cProfile.Profile()

def main_func():
    arr = []
    prof.enable()
    for i in range(10):
        if i%2==0:
            arr.append(slow_add(i,i))
        else:
            arr.append(fast_add(i,i))
    prof.disable()
    return arr

main_func()
Once profiling is completed, we can call the print_stats() function to print profiling statistics.
prof.print_stats()
The Profile object also has a method named dump_stats() which lets us dump the profiling results to a file.
The profiling results written to a file can be read and displayed using the pstats module. We have explained this in the section on pstats below.
prof.dump_stats("main_func.prof")
From Python 3.8 onward, we can even use the Profile() object as a context manager. Below we have recreated the code of main_func(), but this time we have used Profile() as a context manager with the with statement. We have also returned the profiler object so that it can be used to print the profiling results.
def main_func():
    arr = []
    with cProfile.Profile() as prof:
        for i in range(10):
            if i%2==0:
                arr.append(slow_add(i,i))
            else:
                arr.append(fast_add(i,i))
    return arr, prof

out, prof = main_func()

print(out)

prof.print_stats()
Output:
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
26 function calls in 2.504 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
5 0.000 0.000 2.504 0.501 <ipython-input-178-2cfef41b561a>:3(slow_add)
5 0.000 0.000 0.000 0.000 <ipython-input-178-2cfef41b561a>:7(fast_add)
5 2.503 0.501 2.503 0.501 {built-in method time.sleep}
10 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Python also lets us run the profiler from the command line directly on a Python file. We have put the code of this example into a file named addition.py as shown below.
addition.py
import time

def slow_add(a, b):
    time.sleep(0.5)
    return a+b

def fast_add(a, b):
    return a+b

def main_func():
    arr = []
    for i in range(10):
        if i%2==0:
            arr.append(slow_add(i,i))
        else:
            arr.append(fast_add(i,i))
    return arr

if __name__ == '__main__':
    main_func()
Once we have put the code into a Python file, we can use the below command in a command prompt or Unix shell to profile that file.
Command to Profile File:
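python -m cProfile addition.py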
Output
29 function calls in 2.504 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 2.504 2.504 addition.py:1(<module>)
1 0.000 0.000 2.504 2.504 addition.py:10(main_func)
5 0.000 0.000 2.504 0.501 addition.py:3(slow_add)
5 0.000 0.000 0.000 0.000 addition.py:7(fast_add)
1 0.000 0.000 2.504 2.504 {built-in method builtins.exec}
5 2.504 0.501 2.504 0.501 {built-in method time.sleep}
10 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
We can even save the output of profiling to a file when performing profiling from the command line, as shown below.
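python -m cProfile -o prof_out.prof addition.py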
We can also sort the output of profiling based on a particular column of the table by giving that column name with the -s parameter.
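For example, to sort the output by cumulative time:
python -m cProfile -s cumulative addition.py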
Python also provides the pstats module, which lets us format profiling results. It provides various methods, explained below, to format the results in different ways.
The pstats module lets us load a profiler results file using its Stats() constructor. We can pass a single file or several files containing profiling results to the Stats() constructor.
Below we have loaded the prof_out.prof file created during the previous example. We can print the profiling results from the Stats object by calling its print_stats() method. The print_stats() method accepts a list of arguments that can be used to filter the results; this is explained below with simple examples.
import pstats
p = pstats.Stats("prof_out.prof")
p.print_stats();
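The Stats() constructor can also combine results from more than one file at once. As a minimal sketch, here we aggregate the two result files created in the earlier cProfile examples (out.prof and main_func.prof):
import pstats

# statistics from both files are loaded and combined into one report
p2 = pstats.Stats("out.prof", "main_func.prof")
p2.print_stats();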
The pstats module also provides various methods that let us modify the output of profiling. One important method is sort_stats(), which accepts a column name from the profiling results and sorts the data according to it. The list of available sort keys is stored in pstats.SortKey.
Below we have sorted the profiling results by cumulative time and printed only the first 3 entries.
p.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(3);
Below we have called sort_stats() again, but this time we are sorting the profiling results based on total time and then cumulative time.
p.sort_stats(pstats.SortKey.TIME, pstats.SortKey.CUMULATIVE).print_stats();
The below example demonstrates that we can pass a string to print_stats(), and it'll filter the profiling results to entries that match that string (the string is interpreted as a regular expression).
p = pstats.Stats("prof_out.prof")
p.print_stats("sleep");
The below example demonstrates another usage of print_stats(), where we have passed a float between 0.0 and 1.0 to limit the output to a fraction of the rows. The total number of rows is multiplied by that float and only that many rows are printed.
p = pstats.Stats("prof_out.prof")
p.print_stats(0.5);
Another important method provided by the Stats object is strip_dirs(). It removes all leading path information from the file names in the last column of the profiling results. This helps when long paths make the rows wrap beyond the screen width.
p = pstats.Stats("prof_out.prof")
p.strip_dirs().print_stats();
We can even reverse the order of the profiling results by calling the reverse_order() method on the Stats object.
p = pstats.Stats("prof_out.prof")
p.sort_stats(pstats.SortKey.CUMULATIVE).reverse_order().print_stats();
The Stats object has two other methods which are very useful for understanding which function calls which. print_callers() shows, for each function, which functions called it. print_callees() shows, for each function, which functions it called.
p = pstats.Stats("prof_out.prof")
p.print_callers();
p.print_callees();
Our fifth example explains the profiling of recursive functions. We have written a recursive function below which returns the nth Fibonacci number. We have profiled this function by calling it with a parameter value of 25. We can see from the profiled output that there were 150052 total function calls, of which 4 were primitive calls. We can also notice from the ncalls value on the first line that the Fibonacci function had 150049 total calls, of which 1 was a primitive call.
def Fibonacci(n):
    if n<=0:
        print("Incorrect input")
    elif n==1:
        return 0
    elif n==2:
        return 1
    else:
        return Fibonacci(n-1)+Fibonacci(n-2)

cProfile.run("Fibonacci(25)")
The second profiler that we'll explain as a part of this tutorial is profile. We'll import it to get started.
import profile
The profile module has the same methods as cProfile. Our first example with profile is exactly the same as the one for cProfile: we have used the run() method of profile to profile a Python statement that sums up 10000 random integers between 1 and 10.
import random
profile.run("sum([random.randint(1,10) for i in range(10000)])")
The output of profile is formatted exactly the same way as the output of cProfile.
Our second example explains how to use the Profile() constructor of the profile module to profile code. We can call its print_stats() method once profiling is complete to see the results. We can also dump the output of profiling to a file using the dump_stats() method.
import time

def slow_add(a, b):
    time.sleep(0.5)
    return a+b

def fast_add(a, b):
    return a+b

def main_func():
    arr = []
    for i in range(10):
        if i%2==0:
            arr.append(slow_add(i,i))
        else:
            arr.append(fast_add(i,i))
    return arr

prof = profile.Profile()

prof.run("main_func()")

prof.print_stats()

prof.dump_stats("prof_out2.prof")
Please make a NOTE that we can format profiling results generated using the profile module in exactly the same way as we did with the cProfile results above, using the pstats module as explained earlier.
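As a minimal sketch, assuming the prof_out2.prof file dumped above, the results can be loaded and formatted with pstats just like before:
import pstats

# load results produced by the 'profile' module and print the 5 entries with the highest cumulative time
p = pstats.Stats("prof_out2.prof")
p.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5);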
The profiler available through the profile module can subtract a small constant from the time recorded for each event, removing the extra overhead that the profiler itself adds to the measurements. The default value of this constant is 0. The process of determining and removing this overhead is commonly referred to as calibration.
We can find a suitable constant for a particular system by executing the code below. The calibrate() method runs a test loop of 10000 calls with the profiler and without it, and returns the measured per-event overhead as a float. We perform this process five times, appending the overhead found each time to a list, and then take the average to obtain the final constant that the profiler will use to remove its own overhead.
pr = profile.Profile()

cal = []
for i in range(5):
    cal.append(pr.calibrate(10000, verbose=1))

bias = sum(cal)/5

bias
Once we have found the bias constant that we would like to remove for each event, we can set it in the profiler via the bias parameter when creating the Profile object, as explained below.
We can then run the code again with this calibrated profiler to see how it affects the measurements.
prof = profile.Profile(bias=bias)
prof.run("main_func()")
prof.print_stats()
We can see from the above results that the time measured with the calibrated profiler is a little less than the time measured without calibration.
Jupyter Notebook provides line and cell magic commands which can be used for profiling inside notebooks. We can use %prun to profile a single line of code and %%prun to profile a whole cell.
If you are new to the concept of magic commands in Jupyter Notebook, we recommend that you go through the link below, which covers them in detail. Basically, magic commands provide access to some day-to-day utilities (otherwise available through the shell/command prompt) through commands starting with a single or double percent sign.
Below we are first using %prun to profile function call Fibonacci(25).
%prun Fibonacci(25)
150052 function calls (4 primitive calls) in 0.035 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
150049/1 0.035 0.000 0.035 0.035 <ipython-input-114-b552bd3bf56e>:1(Fibonacci)
1 0.000 0.000 0.035 0.035 {built-in method builtins.exec}
1 0.000 0.000 0.035 0.035 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Below we are profiling the code from our previous main_func() function using the %%prun cell magic command.
%%prun

arr = []
for i in range(10):
    if i%2==0:
        arr.append(slow_add(i,i))
    else:
        arr.append(fast_add(i,i))

print(arr)
63 function calls in 2.504 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
5 2.504 0.501 2.504 0.501 {built-in method time.sleep}
3 0.001 0.000 0.001 0.000 socket.py:337(send)
1 0.000 0.000 2.504 2.504 <string>:1(<module>)
1 0.000 0.000 2.504 2.504 {built-in method builtins.exec}
5 0.000 0.000 2.504 0.501 <ipython-input-28-f6d5027c3479>:3(slow_add)
3 0.000 0.000 0.001 0.000 iostream.py:197(schedule)
2 0.000 0.000 0.001 0.000 iostream.py:386(write)
1 0.000 0.000 0.001 0.001 {built-in method builtins.print}
3 0.000 0.000 0.000 0.000 threading.py:1080(is_alive)
3 0.000 0.000 0.000 0.000 threading.py:1038(_wait_for_tstate_lock)
10 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
3 0.000 0.000 0.000 0.000 {method 'acquire' of '_thread.lock' objects}
2 0.000 0.000 0.000 0.000 {built-in method posix.getpid}
2 0.000 0.000 0.000 0.000 iostream.py:310(_is_master_process)
5 0.000 0.000 0.000 0.000 <ipython-input-28-f6d5027c3479>:7(fast_add)
2 0.000 0.000 0.000 0.000 iostream.py:323(_schedule_flush)
3 0.000 0.000 0.000 0.000 iostream.py:93(_event_pipe)
3 0.000 0.000 0.000 0.000 {method 'append' of 'collections.deque' objects}
2 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
3 0.000 0.000 0.000 0.000 threading.py:507(is_set)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
The profile profiler can be used from the command line/shell in exactly the same way as cProfile. Below we first profile the addition.py file created in the earlier examples.
Command to Profile File:
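python -m profile addition.py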
Output
30 function calls in 0.003 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
10 0.000 0.000 0.000 0.000 :0(append)
1 0.000 0.000 0.001 0.001 :0(exec)
1 0.003 0.003 0.003 0.003 :0(setprofile)
5 0.000 0.000 0.000 0.000 :0(sleep)
1 0.000 0.000 0.000 0.000 addition.py:1(<module>)
1 0.000 0.000 0.000 0.000 addition.py:10(main_func)
5 0.000 0.000 0.000 0.000 addition.py:3(slow_add)
5 0.000 0.000 0.000 0.000 addition.py:7(fast_add)
1 0.000 0.000 0.003 0.003 profile:0(<code object <module> at 0x7f17e69a2270, file "addition.py", line 1>)
0 0.000 0.000 profile:0(profiler)
We can even save the output of profiling to a file when performing profiling from the command line, as shown below.
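For example, using the -o option (the output filename here is arbitrary):
python -m profile -o addition_profile.prof addition.py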
This ends our small tutorial explaining how to profile Python code using cProfile and profile profilers.
If you are more comfortable learning through video tutorials then we would recommend that you subscribe to our YouTube channel.
When going through coding examples, it's quite common to have doubts and errors.
If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.
You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.
If you want to