
Python Performance: Optimization Tips and Faster Python Versions

Ofer Dekel

Product Manager, Intel Granulate

What Is Python? 

Python is a popular, interpreted language used for data science, web applications, and many other use cases. Python is a higher-level language than, for example, C++ and Java. This means it abstracts computer details for the programmer, including memory management, pointers, and threads. 

These abstractions improve developer productivity and make Python easier to learn and use, which is a primary reason for its popularity. However, they also mean that each instruction written by the programmer requires more work at runtime, so Python programs are often slower than equivalent programs written in lower-level languages. 

It is possible to optimize Python and significantly improve its performance. In addition, new Python implementations are adding innovations that are making Python more performant.


Why Is Python Slow? 

When executing a .py file in Python, the interpreter implicitly compiles the source code to bytecode. Python bytecode consists of simpler instructions that are executed by the Python virtual machine rather than directly by the CPU. The virtual machine interprets each bytecode instruction at runtime, and this extra layer adds overhead compared with running native machine code. 
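To make the bytecode step concrete, here is a minimal sketch that uses the standard library `dis` module to show the instructions CPython generates for a small function. The function `add` is a made-up example, and the exact instruction names vary between Python versions.

```python
import dis


def add(a, b):
    # Hypothetical example function, used only to have something to disassemble.
    return a + b


# Print the bytecode instructions that the virtual machine will interpret.
dis.dis(add)

# The same instructions are also available as objects for inspection.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```

Each printed instruction is one step for the virtual machine, which is where the interpretation overhead comes from.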


Here are a few reasons that Python bytecode tends to run slowly: 

  • Python is dynamically typed: because the type of a variable can change at runtime, the Python virtual machine has to check types and allocate memory accordingly on every operation, and cannot heavily optimize bytecode execution. In statically typed languages like Java and C#, variable types remain the same and can be resolved at compile time rather than at runtime.
  • Python threads do not run in parallel: CPython has a single global interpreter lock (GIL) per interpreter process, which allows only one thread to execute Python bytecode at a time, regardless of how many CPU cores are available. The reason for this is memory safety: the GIL protects the interpreter's internal state, such as object reference counts, from race conditions. However, it can cause bottlenecks, because threads running in the same interpreter must wait until the currently running thread releases the GIL. 
  • Python is an interpreted language: a Python program can run without being compiled into machine language. An interpreted language may perform worse than a compiled language because each instruction requires more processing at runtime. With a compiled language, the compiler produces machine code directly readable by the processor, requiring fewer resources to execute operations.
  • CPython does not use just-in-time (JIT) compilation: as of version 3.11, it compiles source code to bytecode ahead of execution and caches the result in .pyc files, but the bytecode is still interpreted on every run. This approach is less effective than that of Java and other languages whose runtimes use JIT compilation to translate frequently executed bytecode into machine code while the program runs. JIT compilation enables better optimization and executes the same bytecode faster. 
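The effect of the GIL can be observed with a small experiment. The following is a minimal sketch, assuming a standard CPython build with the GIL enabled. The function `count_primes` and the workload sizes are made up for illustration. It runs the same CPU-bound work sequentially and then on four threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound work (hypothetical example): count primes below limit."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in [2, sqrt(n)] divides it.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


LIMITS = [10_000] * 4  # four identical chunks of work

start = time.perf_counter()
sequential = [count_primes(limit) for limit in LIMITS]
sequential_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(count_primes, LIMITS))
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.3f}s, threaded: {threaded_time:.3f}s")
```

On a build with the GIL, the threaded run is typically no faster than the sequential one for this kind of pure-Python, CPU-bound work, although exact timings depend on the machine.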

5 Tips for Improving Python Performance 

Use the following practices to enhance the performance of your Python application.

Benchmark Your Current Performance Metrics 

Benchmarking is the best way to verify that an application’s performance is adequate. Here are some popular benchmarking and profiling tools: 


Timeit

This tool is useful for benchmarking small, isolated code snippets. It runs a snippet many times and reports the elapsed time, letting you control the number of runs and the setup code. By default, Timeit disables garbage collection during timing; you can re-enable it in the setup code to see how garbage collection affects the snippet's running time.
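As a minimal sketch of this workflow, the following times the same snippet twice with `timeit.timeit()`: once with the default settings (garbage collection disabled) and once with garbage collection re-enabled through the `setup` argument. The snippet and the run count are arbitrary choices for illustration.

```python
import timeit

# Arbitrary snippet to measure, passed as a string.
STATEMENT = "[str(i) for i in range(1000)]"
RUNS = 500

# Default behaviour: garbage collection is turned off while timing.
time_without_gc = timeit.timeit(STATEMENT, number=RUNS)

# Re-enable garbage collection as the first statement of the setup code.
time_with_gc = timeit.timeit(STATEMENT, setup="import gc; gc.enable()", number=RUNS)

print(f"GC disabled: {time_without_gc:.4f}s for {RUNS} runs")
print(f"GC enabled:  {time_with_gc:.4f}s for {RUNS} runs")
```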

However, Timeit may be unsuitable for benchmarking large pieces of code because snippets are usually passed to it as strings. The timeit.timeit() calls only return the time it takes for snippets to execute, and they don't provide other details about the code's performance. 

Thus, you might benefit from a profiling tool that does the same thing as a benchmarking tool but with greater detail. 

Python Profiler

A profile is a set of statistics describing the frequency and duration of the execution of various program parts. A profiler simultaneously analyzes several metrics to check the code snippet. Python has a built-in profiler that covers most of the profiling requirements for generic applications.

The Python profiler provides more detailed, actionable data, including the number of calls monitored, the order of calls, and more: 

  • ncalls: the total number of calls to the function 
  • tottime: the total time spent in the function itself, excluding calls to sub-functions 
  • cumtime: the cumulative time spent in the function and all the sub-functions it calls

These metrics help break down the code execution and find the functions that slow down an application's performance.
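The following is a minimal sketch of collecting these metrics with the built-in `cProfile` and `pstats` modules. The functions `slow_square_sum` and `workload` are made-up stand-ins for real application code.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    # Hypothetical hot spot: a pure-Python loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    # Hypothetical entry point that calls the hot spot several times.
    return [slow_square_sum(20_000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Write the five most expensive entries, sorted by cumulative time, to a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report contains the ncalls, tottime, and cumtime columns for each function, so `slow_square_sum` should stand out as the place where most of the time is spent.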

You can also use a 3rd party profiler, like the free continuous profiler from Granulate, which can provide full visibility into Python applications alongside other languages. 


Another way to benchmark code in Jupyter notebooks is to add the %time or %%time magic command. The first option records the time it takes to run a single statement, while the second gauges the time it takes to run an entire cell. This technique needs no extra setup and works directly in the notebook. 

Avoid Global Variables

In software development generally, it is a best practice not to use global variables, and this is especially true for Python. Local variables make scope and memory usage easier to track. They are also a bit faster to access: inside a function, CPython stores local variables in a fixed-size array and reads them by index, while reading a global variable requires a dictionary lookup. Therefore, it is best to avoid using global variables whenever possible.
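The difference can be measured with a small experiment. This is a minimal sketch, and `FACTOR`, `scale_with_global`, and `scale_with_local` are made-up names for illustration. The size of the gap varies by Python version and is often small, so it is worth measuring on your own code.

```python
import timeit

FACTOR = 3  # module-level (global) variable, hypothetical example


def scale_with_global(values):
    result = 0
    for value in values:
        result += value * FACTOR  # global lookup on every iteration
    return result


def scale_with_local(values):
    factor = FACTOR  # copy the global into a local variable once
    result = 0
    for value in values:
        result += value * factor  # local lookup on every iteration
    return result


data = list(range(10_000))
global_time = timeit.timeit(lambda: scale_with_global(data), number=200)
local_time = timeit.timeit(lambda: scale_with_local(data), number=200)

print(f"global: {global_time:.4f}s, local: {local_time:.4f}s")
```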

Use List Comprehensions 

Most programming languages provide loops for building a list one element at a time, but Python also offers list comprehensions, a more concise way to build a list from an iterable. Here is a simple example of how to write one.

Use the following code to determine all odd-number cubes for a given series:

cube_numbers = []
for n in range(0,10):
    if n % 2 == 1:
        cube_numbers.append(n**3)


A programmer can condense this loop into a single line:

cube_numbers = [n**3 for n in range(1,10) if n%2 == 1]

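The comprehension is more concise and usually runs faster than the equivalent loop, because it avoids looking up and calling the list's append method on every iteration. The difference can add up when the same pattern runs over large inputs or inside hot code paths.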
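To check the speed claim on your own machine, the loop and the comprehension can be wrapped in functions and timed. This is a minimal sketch. The function names are made up, and the measured gap depends on the Python version and the input size.

```python
import timeit


def cubes_with_loop():
    # Explicit loop with append, as in the first version above.
    cube_numbers = []
    for n in range(0, 10):
        if n % 2 == 1:
            cube_numbers.append(n ** 3)
    return cube_numbers


def cubes_with_comprehension():
    # Same result built with a list comprehension.
    return [n ** 3 for n in range(0, 10) if n % 2 == 1]


loop_time = timeit.timeit(cubes_with_loop, number=100_000)
comprehension_time = timeit.timeit(cubes_with_comprehension, number=100_000)

print(cubes_with_comprehension())
print(f"loop: {loop_time:.4f}s, comprehension: {comprehension_time:.4f}s")
```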

Use Built-In Functions 

Like most high-level languages, Python ships with a set of built-in functions, such as sum(), min(), max(), sorted(), and len(). You can use these functions to build many common capabilities into your Python code. In CPython, most built-ins are implemented in C, so they usually run faster than equivalent logic written as a Python loop. Using them also reduces code volume and duplication, which improves code quality, eases collaboration, and makes it easier to troubleshoot your code.
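As a minimal sketch of this comparison, the following times a hand-written summing loop against the built-in `sum()`. The helper `manual_sum` is a made-up name, and the exact timings depend on the machine and Python version.

```python
import timeit

values = list(range(10_000))


def manual_sum(numbers):
    # Hand-written equivalent of the built-in sum(), for comparison only.
    total = 0
    for number in numbers:
        total += number
    return total


loop_time = timeit.timeit(lambda: manual_sum(values), number=200)
builtin_time = timeit.timeit(lambda: sum(values), number=200)

print(f"manual loop: {loop_time:.4f}s, built-in sum: {builtin_time:.4f}s")
```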


Import Modules Lazily Wherever Possible

One effective way to improve startup time and memory usage is to spread out the loading of modules and load each one only when it is needed. Not all languages support this, but Python does, letting you optimize memory and processor usage.

Beginners are generally advised to place all imports at the top of the file. This is suitable for smaller Python programs. But in more complex programs with many modules, loading everything at startup causes a spike in memory usage and a slower start, even for modules that a given run never uses. Lazy loading can significantly reduce this overhead.
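One simple way to load a module lazily is to move the import statement into the function that needs it. The following is a minimal sketch. The function `summarize` and its parameters are made-up names, and `json` stands in for a heavier dependency.

```python
def summarize(values, as_json=False):
    """Return a plain-text summary, or JSON when explicitly requested.

    summarize and its parameters are hypothetical names for illustration.
    """
    total = sum(values)
    if not as_json:
        return f"count={len(values)} total={total}"
    # Deferred import: the module is loaded the first time this branch runs,
    # and later calls reuse the already loaded module.
    import json
    return json.dumps({"count": len(values), "total": total})


print(summarize([1, 2, 3]))
print(summarize([1, 2, 3], as_json=True))
```

The trade-off is that dependencies become harder to see at a glance, so this pattern is best reserved for modules that are expensive to import and rarely used. Running a program with `python -X importtime` reports how long each import takes, which can help identify candidates.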

Related content: Read our guide to Python performance testing

Python 3.11 Version Performance 

The Python Software Foundation releases new versions of Python every year. They release a feature-locked beta version during the first half of the year and a final release towards the end of the year. 

Python 3.11's features were finalized with the first beta release in May 2022, and the final version was released in October 2022. As with any new version, it is best to try it on non-production code first to verify that it works with existing programs and that existing code benefits from this version's performance enhancements.

A specializing adaptive interpreter

Python 3.11 provides many performance improvements, including a specializing adaptive interpreter. Because an object’s type does not change often, the interpreter can now attempt to analyze running code and replace general bytecodes with type-specific ones. For example, it can replace binary operations, such as add or subtract, with specialized versions for floats, strings, and integers.
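The specialization can be inspected with the `dis` module. The following is a minimal sketch: the function `add` and the warm-up count are arbitrary choices, and the `adaptive` argument is only passed on Python 3.11 or later, where it exists. The instruction names that appear depend on the Python version, so no particular output is assumed here.

```python
import dis
import sys


def add(a, b):
    # Hypothetical example function with a single binary operation.
    return a + b


# Call the function repeatedly with floats so the interpreter has a chance
# to specialize the addition for that type.
for _ in range(1000):
    add(1.0, 2.0)

# On Python 3.11+, adaptive=True shows specialized instructions if present.
options = {"adaptive": True} if sys.version_info >= (3, 11) else {}
opnames = [ins.opname for ins in dis.get_instructions(add, **options)]
print(opnames)
```

On versions with the specializing interpreter, the generic binary-operation instruction may be shown replaced by a float-specific variant after the warm-up loop.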

Function calls

Python 3.11 improves function calls to require less overhead by ensuring stack frames for function calls are more efficiently designed and use less memory. Recursive calls are not tail-optimized, but version 3.11 has made them more efficient. Additionally, core modules required for the Python runtime are stored and loaded more efficiently, and the Python interpreter starts faster.

Speed improvements

According to measurements on the standard Python benchmark suite, Python 3.11 runs on average about 1.25 times faster than version 3.10. However, that speedup is an aggregate measure: some workloads become considerably faster, while others are only slightly faster or remain the same. These improvements are free, since there is no need to modify your code for Python programs to benefit from 3.11's speedups.

Alternative Python Implementations for Better Performance 


PyPy

PyPy is an alternative Python implementation designed to run faster than CPython. It is written in RPython, a restricted subset of Python that was developed alongside it. PyPy's main executable includes a Just-in-Time (JIT) compiler, which lets it run many benchmarks, including large and complicated Python applications, faster than CPython.

While PyPy can speed up your code in many cases, there are some scenarios where it might not help, including:

  • Short-running processes: when a program runs for less than a few seconds, the JIT compiler does not get enough time to warm up.
  • Runtime libraries: when all the time is spent in runtime libraries, such as C functions, without running any Python code, the JIT compiler does not help speed things up.

PyPy works best when executing long-running programs that spend a significant amount of time executing Python code.



IronPython

This open source implementation of Python is tightly integrated with .NET. IronPython lets Python code use the .NET Framework and Python libraries, and lets other .NET languages call Python code efficiently. It performs well in Python programs that employ threads or multiple cores because it includes a JIT and does not use the global interpreter lock.


Stackless Python

Stackless Python is an enhanced version of Python that enables you to leverage thread-based programming without the complexity and performance issues associated with conventional threads. Stackless adds lightweight and cheap microthreads to Python that help improve a program's structure, create more readable code, and increase developer productivity.
