
Profiling in Python: Top Tools and 5 Tips for Success

Ofer Dekel

Product Manager, Intel Granulate

What Is Python Profiling? 

Profiling tells you how much memory or CPU time a program, or an individual instruction, consumes. Some profilers instrument the program’s source code or executable binary to record events, while others sample the running program from the outside. In both cases, an analyzer then processes the collected data.

A non-optimized program commonly spends most of its CPU cycles in a few specific subroutines. Profiling shows how the code behaves and how it uses the available resources. Developers usually profile Python programs to optimize performance or to track down unusual bugs, such as memory leaks.

The information obtained from profiling allows developers to identify bottlenecks, gain a better understanding of the code, and fix problems.

This is part of a series of articles about optimizing Python.


Why Is Python Code Profiling Important?

Code profiling is the software engineering practice of programmatically analyzing a program to find its bottlenecks. It covers memory consumption as well as how many times each function is called and how long it takes to execute. This analysis makes it possible to reliably detect slow or resource-inefficient parts of a program, which is the starting point for optimizing it.

Profiling is useful for many kinds of software, including machine learning (ML) and data science systems. Developers use it when building extract, transform, and load (ETL) pipelines and ML models. In a Python ETL pipeline built with the Pandas library, for example, you can profile operations such as reading data, merging data frames, GroupBy aggregations, type casting, and imputing missing values.


In ML software, identifying bottlenecks is a crucial task. For example, a Python script might read data and then train a model and run predictions. The machine learning pipeline would include steps to load the data, perform a GroupBy operation, split the data into training and test sets, fit several ML models, generate predictions on the test data, and measure each model’s performance. The first deployed version might take a few minutes to run.

But what if the script’s execution time increases after the data is updated? How can you identify the step in your pipeline that caused the slowdown? Profiling pinpoints the parts of your code responsible for the problem, so you can fix them.

Top Python Profiling Tools 

Here are some of the leading tools used for Python code profiling.

Continuous Profiling by Granulate

Granulate’s profiler is a free, open source production profiler that helps teams identify the bottlenecks that hurt performance and raise costs. The tool specializes in continuous profiling and supports practically all programming languages and runtimes, not just Python.

It’s easy to install and start using, and its user interface is intuitive and flexible, letting you select different time periods, filter processes, and more. One of the main qualities of the profiler’s UI is that it shows a unified view of your entire system, not just isolated areas written in a single programming language. Granulate’s profiler also lets you share graphs and profiling data with other team members by inviting them to the profile or by exporting it as an SVG image.

The profiler also tracks CPU and memory utilization and displays them in charts for easy viewing. From there, you can watch for occasional spikes or examine the moment when a known performance drop occurred. A useful feature of the interface is that you can select a problematic area on a chart and switch to the flame graph, where you can identify the root cause of the issue.

Time and Timeit

The Python standard library offers two modules, time and timeit, that act as stopwatches for measuring how long code snippets take to run. The time module’s perf_counter function returns a timestamp from a high-resolution performance counter. Call time.perf_counter before and after an operation and subtract the two values to get the elapsed time. This is a low-overhead way to measure execution time, but it is only a stopwatch.
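As a minimal sketch of the stopwatch approach applied to the pipeline scenario described earlier, the code below calls time.perf_counter around each step and subtracts the readings. The load_data and transform functions are hypothetical stand-ins for real pipeline steps, not part of any library.

```python
import time


def load_data():
    # Hypothetical stand-in for reading a dataset
    return list(range(200_000))


def transform(rows):
    # Hypothetical stand-in for a GroupBy-style aggregation
    totals = {}
    for value in rows:
        key = value % 10
        totals[key] = totals.get(key, 0) + value
    return totals


# Take a timestamp before and after each step; the difference is elapsed time
start = time.perf_counter()
rows = load_data()
load_seconds = time.perf_counter() - start

start = time.perf_counter()
totals = transform(rows)
transform_seconds = time.perf_counter() - start

print(f"load:      {load_seconds:.4f} s")
print(f"transform: {transform_seconds:.4f} s")
```

If one step’s time jumps after the input data changes, that step is the first place to look more closely.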

You can use the timeit module to benchmark Python code. Calling timeit.timeit runs a code snippet many times (1 million by default) and returns the total time. This is most useful for measuring the performance of a single function in a tight loop, for example to find the fastest way to perform an action that runs many times. In other words, timeit is best suited for micro-benchmarking individual code blocks or lines.
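The sketch below uses timeit.timeit to compare two ways of building the same list. The two statements are illustrative choices. The number argument is lowered from the default of 1 million to 1,000 so the example finishes quickly.

```python
import timeit

# Two equivalent ways to build a list of squares
loop_stmt = """
squares = []
for i in range(1000):
    squares.append(i * i)
"""
comprehension_stmt = "squares = [i * i for i in range(1000)]"

# Each statement runs 1,000 times; timeit returns the total time in seconds
loop_time = timeit.timeit(loop_stmt, number=1_000)
comp_time = timeit.timeit(comprehension_stmt, number=1_000)

print(f"for loop:      {loop_time:.4f} s")
print(f"comprehension: {comp_time:.4f} s")
```

The absolute numbers depend on the machine. Only the relative difference between the two snippets is meaningful.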

Both modules are suitable only for isolated snippets of code, not an entire program.  

cProfile

This program-wide profiler from the Python standard library traces every function call. For each function, it reports how many times the function was called and how much time was spent in it, both on its own and including the functions it calls. It is best suited for profiling applications in development.

The main advantages of cProfile are: 

  • It’s part of the standard library, so it is available in stock Python installations. 
  • It collects several statistics about call behavior, which help you determine whether a function is slow itself or is being slowed down by the functions it calls.
  • It is flexible in scope: it can profile selected functions or an entire program. 

However, cProfile generates many statistics by default, so finding the right ones can be challenging. It also creates overhead by trapping all function calls, making it unsuitable for applications in production. 
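The sketch below shows one way to keep cProfile’s output manageable. It profiles a single block of code and uses the standard pstats module to sort by cumulative time and print only the five most expensive entries. The slow_sum and workload functions are hypothetical examples.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately wasteful: builds an intermediate list on every call
    return sum([i for i in range(n)])


def workload():
    return [slow_sum(10_000) for _ in range(200)]


profiler = cProfile.Profile()
profiler.enable()   # start recording function calls
results = workload()
profiler.disable()  # stop recording

# Sort by cumulative time and keep only the top 5 rows of the report
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

To profile a whole script instead, you can run python -m cProfile -s cumulative your_script.py, where your_script.py is a placeholder for your own file.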

Py-spy

This tool samples the program’s call stack at regular intervals rather than recording every call. Py-spy’s core components are written in Rust. Because it runs out of process, it is safe for profiling code in production. Unlike many other profilers, Py-spy can analyze multithreaded applications and applications that spawn subprocesses. It can also profile C extensions, provided they were compiled with symbols.

You can inspect an application with Py-spy using the record command or the top command. The record command generates a flame graph at the end of the run. The top command shows an interactive, real-time display of the application’s inner workings.

However, Py-spy is best suited to profiling entire programs or components from the outside, not specific functions.

Yappi

Yappi stands for Yet Another Python Profiler and offers many of the top features of other profilers. PyCharm uses Yappi as its default profiler when it is installed.

To use Yappi, you wrap the code you want to profile with calls that start and stop the profiler, and then generate reports from the collected statistics. You can choose between CPU time and wall time (a basic stopwatch). The CPU-time option uses native APIs to measure how long the CPU was actually busy executing the code, which gives an accurate picture of the operation’s runtime.

Yappi can retrieve statistics from threads without requiring any changes to the threaded code. The yappi.get_thread_stats() function returns statistics for all recorded thread activity, which you can then filter and sort at a granular level. Yappi can also profile coroutines and greenlets, which makes it useful for asynchronous Python code.


5 Python Profiling Best Practices 

Here are some best practices to help you get the most out of this iterative part of software development. 

1. Create a Regression Test Suite

Before optimizing code, you must ensure the changes won’t negatively impact its functioning. Building a test suite is the best way to do this, especially for a large code base. The code coverage must be high to ensure the best results. Regression tests allow you to attempt optimizations without worrying about breaking your code.
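As a minimal sketch, the unittest module from the standard library can pin down a function’s current behavior before you optimize it. The moving_average function and its expected values are hypothetical. In practice, the expected values would be recorded from your trusted, pre-optimization code.

```python
import unittest


def moving_average(values, window):
    # Hypothetical function targeted for optimization; this is the reference version
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


class MovingAverageRegressionTest(unittest.TestCase):
    def test_known_output(self):
        # Expected result captured from the current, trusted implementation
        self.assertEqual(moving_average([1, 2, 3, 4, 5], 3), [2.0, 3.0, 4.0])

    def test_window_larger_than_input(self):
        # Edge case that an optimized version must keep handling the same way
        self.assertEqual(moving_average([1, 2], 3), [])
```

Run the tests with python -m unittest after every optimization attempt. Any failure means the faster version changed the function’s behavior.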

2. Ensure the Code is Straightforward 

Code written in a functional style is often easy to refactor because its functions avoid side effects. To be easy to understand and change, functions should avoid shared, mutable state. 
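The hypothetical pair of functions below illustrates the difference. The first depends on mutable module-level state, so the same input can produce different results. The second depends only on its arguments, which makes it safe to profile, swap out, or optimize in isolation.

```python
# Harder to optimize safely: the result depends on every earlier call
_seen_totals = []


def add_total_impure(values):
    _seen_totals.append(sum(values))  # side effect on shared state
    return sum(_seen_totals)


# Easier to refactor: the output depends only on the inputs
def running_total(previous_total, values):
    return previous_total + sum(values)


first = add_total_impure([1, 2, 3])   # 6
second = add_total_impure([1, 2, 3])  # 12: same input, different result

total = running_total(0, [1, 2, 3])   # 6
total = running_total(total, [4, 5])  # 15
```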

3. Be Patient

Profiling can be slow, so developers shouldn’t expect to run a profiler and find problems quickly. The problems you profile are usually issues you can’t fix with simple debugging. You need to browse through data, map it, and narrow down potential sources of the problem before you find it.

4. Collect as Much Data as Possible

Before analyzing your software, collect as much relevant data as you can. How much data is appropriate depends on the software’s size and type. Useful data sources include profilers, server logs from web apps, custom logs, and system resource snapshots.
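Memory snapshots are one such data source, and they are especially useful when you suspect a leak. The sketch below uses the standard tracemalloc module, which traces only memory blocks allocated by Python rather than all system resources. It takes a snapshot before and after a hypothetical workload and reports the source lines whose allocations grew the most.

```python
import tracemalloc

tracemalloc.start()  # begin tracing Python memory allocations
before = tracemalloc.take_snapshot()

# Hypothetical workload: a cache that keeps growing (a simulated leak)
cache = {}
for i in range(50_000):
    cache[i] = str(i) * 10

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Differences grouped by source line, largest change first
top_stats = after.compare_to(before, "lineno")
for stat in top_stats[:3]:
    print(stat)
```

Saving snapshots like these alongside profiler output and logs gives you more evidence to cross-check during analysis.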

5. Pre-process the Data

Once you have data from your logs, profilers, and other sources, you may need to pre-process it before analysis. Unstructured data, for example, can still be useful even though a profiler cannot interpret it. Parsing that data and loading it into a database system such as MySQL or MongoDB gives it structure and makes it easier to query. 

This step is called ETL: you extract data from the sources, transform it into a meaningful format, and load it into a system for querying.
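The minimal ETL sketch below makes these steps concrete. It extracts hypothetical custom log lines, transforms them into typed rows with a regular expression, and loads them into a database for querying. It swaps in SQLite from the standard library for MySQL or MongoDB so that it runs without a database server. The log format and endpoint names are assumptions for illustration.

```python
import re
import sqlite3

# Hypothetical custom log lines: "<timestamp> <endpoint> <duration>ms"
raw_logs = [
    "2024-01-05T10:00:01 /search 120ms",
    "2024-01-05T10:00:02 /checkout 480ms",
    "garbled line that does not match",
    "2024-01-05T10:00:03 /search 95ms",
]

LINE_RE = re.compile(r"^(\S+) (\S+) (\d+)ms$")

# Extract and transform: parse each line into a typed tuple, skipping noise
rows = []
for line in raw_logs:
    match = LINE_RE.match(line)
    if match:
        timestamp, endpoint, duration = match.groups()
        rows.append((timestamp, endpoint, int(duration)))

# Load: store the structured rows in an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (ts TEXT, endpoint TEXT, duration_ms INTEGER)")
conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", rows)

# Query: average duration per endpoint, slowest first
summary = conn.execute(
    "SELECT endpoint, AVG(duration_ms) AS avg_ms FROM requests "
    "GROUP BY endpoint ORDER BY avg_ms DESC"
).fetchall()
print(summary)  # [('/checkout', 480.0), ('/search', 107.5)]
```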
