If you’ve worked within the Python ecosystem for a while, chances are you’ve heard complaints about its multithreading model. More than likely, these complaints have centered around the GIL: the Python Global Interpreter Lock.
This article will help you understand the function of the GIL as well as its potential performance impacts and how it can be bypassed when necessary.
Clarifying Concurrency
Before digging into the Python GIL, let’s establish a common framework around concurrency-related concepts.
Concurrency and Parallelism
Concurrency is a computational model in which a task does not lock resources while it’s idle, allowing resources to be used by other tasks. Concurrency can be achieved by using multithreaded programs.
A multithreaded program is one that explicitly schedules threads to execute pieces of code. If these threads run interleaved on a single CPU with a non-blocking approach, the program runs under a concurrent model. On the flip side, if these threads run at the same time across multiple CPUs or cores, the program runs under a parallel model. Parallelism is frequently considered a specialization of the concurrent model.
I/O and CPU Bound
In Python, threads are implemented as pthreads (the IEEE POSIX 1003.1c standard) on Linux and macOS. Pthreads are OS-level threads, meaning the host operating system is responsible for their supervision and scheduling. While this might suggest that Python threads can be scheduled by the OS to run in parallel, in reality a multithreaded Python program will never truly be parallel: the GIL ensures that only one thread executes Python bytecode at a time.
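You can see this OS-level backing from Python by printing each thread's native identifier. A minimal sketch (the thread count of three is arbitrary; threading.get_native_id requires Python 3.8+):

```python
import threading

def report():
    # native_id is the identifier assigned by the host OS, not by Python
    print(f"{threading.current_thread().name}: OS thread id {threading.get_native_id()}")

threads = [threading.Thread(target=report) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```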
Threads can be classified into one of two categories:
- CPU bound: These use the CPU intensively.
- I/O bound: These frequently get blocked because of an I/O operation, leaving the CPU idle.
It’s important to know what kind of threads your program has, as they will determine which concurrency approach should be used. The three possible approaches are illustrated in the table below:
| Approach | Python package | Better when bound to | Parallel? |
| --- | --- | --- | --- |
| Threading | threading | I/O | No |
| Multiprocessing | multiprocessing | CPU | Yes |
| Asynchronous | asyncio | I/O | No |
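To make the table's first row concrete, here is a minimal sketch of the threading approach applied to I/O-bound work; the fetch function and the one-second sleep are placeholders standing in for real network calls:

```python
import threading
import time

def fetch(url):
    # placeholder for an I/O-bound operation (e.g., a network request);
    # blocking calls like this release the GIL while they wait
    time.sleep(1)
    print(f"done: {url}")

urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The three waits overlap, so the whole run takes roughly one second instead of three, even though no two threads ever execute Python bytecode at the same time.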
The Global Interpreter Lock
In a multithreaded program, threads share the same memory space. When more than one thread tries to modify resources, it’s important to guarantee consistency and exclusive access to those resources.
Thread-Safe Code
In a thread-safe program, threads can access the same data structures safely because a synchronization mechanism always keeps those data structures in a consistent state. The mechanism Python internally uses to provide this synchronization for multithreaded programs is the global interpreter lock (GIL). The GIL's protection occurs at the interpreter-state level. With the GIL in place, for instance, integrating non-thread-safe C extensions becomes easier because you can explicitly acquire and release the GIL from the C code, thus making your extension thread-safe at the Python level.
Similarly, the GIL also protects the internal C data structures that are heavily used in Python's memory management. Python's memory management strategy requires protection against race conditions, memory leaks, and incorrectly released objects. This protection is guaranteed through a mutex (a component of the GIL), which prevents threads from modifying shared data structures inconsistently.
The GIL’s job is to keep internal data structures synced and consistent across all threads sharing the same memory space. The presence of the GIL does not make your Python code thread-safe per se. The GIL does not guarantee consistency for high-level Python objects, like shared instances of a database connection. Neither does it guarantee consistency for instructions like a compound assignment:
```python
x = x + 1
```
The line above is not atomic; it could be interrupted midway if the thread running it drops the GIL (or is forced to drop it). If another thread holding the GIL then modifies the x variable, you will likely get a race condition. In general, only atomic instructions are guaranteed to be thread-safe. For non-atomic instructions, you need a lock (or another synchronization mechanism) that gives one thread exclusive access to the shared resource at the Python level, making other threads wait until the lock is released.
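To illustrate, here is a minimal sketch of such a race; the counter, loop sizes, and thread count are arbitrary choices for the demonstration, and whether updates are actually lost on a given run depends on your CPython version and switch interval:

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    global counter
    for _ in range(n):
        counter = counter + 1  # non-atomic read-modify-write

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:  # exclusive access while mutating shared state
            counter = counter + 1

threads = [threading.Thread(target=unsafe_increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # may print less than 400000 if increments interleaved
```

Swapping unsafe_increment for safe_increment guarantees the final count, at the cost of the locking overhead.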
How Does the GIL Work?
The implementation of the GIL can be found in the ceval_gil and pycore_gil C source files. Let’s dive into its internals.
The GIL’s Structure
In the source code, the GIL is defined as “a boolean variable (locked) whose access is protected by a mutex (gil_mutex), and whose changes are signalled by a condition variable (gil_cond).” That mutex lock can be seen in the following line:
```c
MUTEX_LOCK(gil->mutex);
```

Unapplying the mutex looks like this:

```c
MUTEX_UNLOCK(gil->mutex);
```

The GIL's state is defined in the following struct:

```c
struct _gil_runtime_state {
    unsigned long interval;
    _Py_atomic_address last_holder;
    _Py_atomic_int locked;
    unsigned long switch_number;
    PyCOND_T cond;
    PyMUTEX_T mutex;
    ...
};
```
Pay special attention to the members locked, cond, and mutex, which are heavily used in all GIL-related operations within the ceval_gil module, especially the take_gil and drop_gil functions.
The GIL’s Strategy
How the GIL is taken and dropped is also explained in the source code:
“A thread wanting to take the GIL will first let pass a given amount of time (interval microseconds) before setting gil_drop_request.”
When a thread wants to run, it needs to take the GIL. I/O operations cause the GIL to be dropped so that another thread can execute; this is called cooperative multitasking. If the running thread does not release the GIL on its own, it can be signaled to drop it after some interval of microseconds; this is called preemptive multitasking. This mechanism is important because CPU-bound threads could otherwise monopolize the GIL.
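This preemption interval is exposed to Python code through the sys module. A quick sketch of inspecting and tuning it:

```python
import sys

# current switch interval in seconds (the default is 0.005, i.e., 5 ms)
print(sys.getswitchinterval())

# request GIL drops more frequently; shorter intervals make switching
# more responsive at the cost of more thread-management overhead
sys.setswitchinterval(0.001)
```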
Within the take_gil function, the interval wait is implemented with the following call:

```c
unsigned long interval = (gil->interval >= 1 ? gil->interval : 1);
int timed_out = 0;
COND_TIMED_WAIT(gil->cond, gil->mutex, interval, timed_out);
```

The signal to drop the GIL looks like this:

```c
if (timed_out &&
    _Py_atomic_load_relaxed(&gil->locked) &&
    gil->switch_number == saved_switchnum)
{
    if (tstate_must_exit(tstate)) {
        MUTEX_UNLOCK(gil->mutex);
        PyThread_exit_thread();
    }
    assert(is_tstate_valid(tstate));
    SET_GIL_DROP_REQUEST(interp);
}
```
Take a look at the implementation of these functions for further details.
The GIL in Action
To see how the GIL is acquired and released by a thread, imagine you have a piece of code in a thread containing the following:
```python
>>> from time import sleep
>>> sleep(2)
```

Internally, this translates to a C function call (check timemodule.c for more details):

```c
static int
pysleep(_PyTime_t secs)
```

This function contains the following section:

```c
Py_BEGIN_ALLOW_THREADS
Sleep(ul_millis);
Py_END_ALLOW_THREADS
```

As you can see, the C call to Sleep is surrounded by two macros: Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS. The first macro calls the function PyEval_SaveThread(void), which internally drops the GIL:

```c
PyThreadState *
PyEval_SaveThread(void)
{
    ...
#ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS
    assert(gil_created(&ceval2->gil));
#else
    assert(gil_created(&ceval->gil));
#endif
    drop_gil(ceval, ceval2, tstate);
    return tstate;
}
```
When this function ends, the GIL is dropped, and another thread acquires it. Once the C Sleep function finishes, the second macro, Py_END_ALLOW_THREADS, will run. As a result, a call to PyEval_RestoreThread will be made, and the GIL will get retaken. This allows the original thread to continue running:
```c
void
PyEval_RestoreThread(PyThreadState *tstate)
{
    _Py_EnsureTstateNotNULL(tstate);
    take_gil(tstate);
    struct _gilstate_runtime_state *gilstate = &tstate->interp->runtime->gilstate;
    _PyThreadState_Swap(gilstate, tstate);
}
```
The GIL’s Benefits
The GIL offers the following benefits:
- It makes non-thread-safe C extensions and libraries easier to integrate into the Python ecosystem.
- In multithreaded programs, the GIL keeps the garbage collector and the reference counting mechanism consistent with each other.
- Single-threaded programs are very performant.
Performance Issues
As previously mentioned, any running thread needs to acquire the GIL. It also needs to drop and reacquire it, and, while all this happens, other threads need to be signaled, scheduled, run, etc. It takes time for all of these logistics to be performed by the interpreter. As a result, your Python program will suffer from this threading management overhead.
In addition, for applications with CPU-bound threads, the GIL will make the system behave like a single-threaded program. If you’re not aware of the GIL, your multithreaded Python program may be even slower than a single-threaded version of it.
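You can observe this yourself with a small timing sketch; the countdown function and iteration count are arbitrary stand-ins for a CPU-bound workload:

```python
import threading
import time

def count(n):
    # pure Python busy loop: CPU bound, never releases the GIL voluntarily
    while n > 0:
        n -= 1

N = 10_000_000

start = time.perf_counter()
count(N)
count(N)
print(f"sequential:  {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print(f"two threads: {time.perf_counter() - start:.2f}s")
```

On CPython with the GIL, the threaded version is typically no faster than the sequential one, and the switching overhead can even make it slower.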
Let’s look at a common misconception. Imagine you have the following piece of code running in production:
```python
# set_doc(target_coll, doc_id, data)
# use multithreading to run jobs in parallel for faster execution
job = threading.Thread(target=set_doc, args=(target_coll, doc_id, data))
job.start()
...
job.join()
```

The function set_doc is defined as follows:

```python
def set_doc(target_coll, doc_id, data):
    data['timestamp'] = firestore.SERVER_TIMESTAMP
    db.collection(target_coll).document(doc_id).set(data)
```
Notice the comment in the code stating that set_doc will execute in parallel. This is not true! As mentioned before, in a multithreaded Python application, regardless of the threads' nature (I/O or CPU bound), no thread ever truly runs in parallel. This program is not applying parallelism, but concurrency in the form of cooperative multitasking. The misleading comment may not look like a big issue, but what if other modules around this code were built on the assumption that this piece would run in parallel? That incorrect assumption can lead to all sorts of problems, such as provisioning the wrong number of virtual machines for request processing.
Overcoming the GIL’s Limitations
Most Python applications don’t require you to bypass the GIL. Remember, the GIL is your friend; it makes single-threaded Python applications efficient. Before trying to bypass it, get to know the nature of your app. If your app is I/O-intensive, a multithreading approach with the GIL in place will probably work well. Alternatively, if you need more granular control over I/O constructs and want everything on a single thread, you can try asyncio, which offers an API for cooperative multitasking.
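As a sketch of what that looks like, here is a minimal asyncio version of concurrent I/O-bound work; fetch and the sleep are placeholders for real awaitable I/O:

```python
import asyncio

async def fetch(url):
    # placeholder for awaitable I/O; control returns to the event loop here
    await asyncio.sleep(1)
    return f"done: {url}"

async def main():
    # both coroutines run concurrently on a single thread
    results = await asyncio.gather(
        fetch("https://example.com/a"),
        fetch("https://example.com/b"),
    )
    print(results)

asyncio.run(main())
```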
If you still need to get around the GIL—perhaps because your application is CPU bound—there are a few strategies you can follow, described below.
Parallelism
You can use Python’s multiprocessing package to spawn subprocesses instead of threads. Those processes can be scheduled by the operating system to execute on different CPUs at the same time, making your software effectively parallel. Because you manage those subprocesses from a single parent Python process, your program remains, to some extent, in control of work running on different CPUs.
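A minimal sketch of this approach; cpu_heavy is a placeholder workload, and the pool size of four is an arbitrary choice you would tune to your machine:

```python
from multiprocessing import Pool

def cpu_heavy(n):
    # stand-in for a CPU-bound task: each call runs in its own process,
    # with its own interpreter and its own GIL
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(cpu_heavy, [5_000_000] * 4)
    print(results)
```

The guard around the entry point matters: on platforms that spawn rather than fork, each child re-imports the module, and the guard prevents the children from spawning pools of their own.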
This approach has one major drawback: Processes need more memory space, and, therefore, their context switch is more expensive. In other words, creating new processes takes more time and more memory resources. Use this approach wisely; parallelism will not in itself always make your application faster. It usually works well when you have CPU-bound applications.
Alternative Python Implementations
The canonical implementation of Python (CPython) comes with a GIL. However, there are other implementations, written in different languages, that rely on the multithreading models of their host platforms and therefore don’t come with a GIL. Two examples are IronPython (a .NET implementation) and Jython (a Java implementation).
If these alternative implementations cover your application’s scope, you may want to consider this more radical approach. Keep in mind that their releases usually lag behind the canonical version.
It is also worth mentioning that the PyPy project (a Python implementation written in RPython, a restricted subset of Python) has tried in the past to remove the GIL. If one of these initiatives succeeds, PyPy may become the recommended way to bypass the GIL. However, keep in mind that PyPy’s standard library support is not as extensive as CPython’s.
Conclusion
This article has examined how Python’s GIL works and what problem it solves. The GIL has its benefits, but they do come with potential performance impacts.
Multithreading in Python is a very mature feature, and, in the context of an I/O-intensive application, it might be a great approach. Use it with confidence, and, if it does not meet your needs, consider parallelism via multiprocessing. Another option is to take advantage of current cloud and third-party solutions. One of these is Intel Tiber App-Level Optimization, a tool that allows you to scale your app without any code changes.