How CI/CD is Sidetracking Optimization, and What You Can Do About It
High-velocity code changes are making it impossible to optimize infrastructure. But not all is lost in the battle for improved performance.Read more
Tom AmsterdamFeb 22, 2021
According to the 2020 StackOverflow Developer Survey and the TIOBE index, Go (or Golang) has gained more traction in recent years, especially for backend developers and DevOps teams working on infrastructure automation. This article discusses what makes Go an attractive programming language for developers when it comes to performance.
First, let’s cover some of Go’s high-level properties.
For those new to this programming language, some of the key facts you need to know about Golangs include:
What products have been built using Go? Go is used for products that demand global scale, like the ubiquitous container tools Docker and Kubernetes. The number of platforms currently running on top of Kubernetes says a lot about Go’s capabilities.
In the webspace, Hugo is billed as “the world’s fastest framework for building websites.” (Hugo is a static website generator that can serve pages in under 1 ms.)
And it should be no surprise that Google relies heavily on Go. In 2012, in fact, Rob Pike noted, “Go is a programming language designed by Google to help solve Google’s problems, and Google has big problems.”
As you may have surmised by now, when you need a backend component or service that supports global scaling, Go might be an option to consider. Go’s capabilities match with projects that:
Go has implemented different strategies on verticals like concurrency, system calls, task scheduling, and memory modeling, among others. All these strategies add up to a great balance between speed and robustness. But what makes Go one of the best programming languages when it comes to performance?
Go implements a variant of the CSP model, in which channels are the preferred method for two Goroutines (a user space thread-like, with a few kilobytes in its stack) to share data. This approach is actually the opposite of that frequently used with other languages like Ruby or Python—a global shared data structure, with synchronization primitives for exclusive access (semaphores, locks, queues, etc.). Keeping these global data structures consistent across all units involves a lot of overhead.
By following the CSP model, Go makes it possible to have concurrent constructions as primitives of the language. By default, Go knows how to deal with multiple tasks at once, and knows how to pass data between them. This, of course, translates to low latency with intercommunicating Goroutines. In Go, in the context of multithreading, you don’t write data to common storage. You create Goroutines to share data via channels. And because there is no need for exclusive access to global data structures, you gain speed.
It is important to note that you can also use mutex (or lock) mechanisms in Go, but that isn’t the default approach for a concurrent program.
Go operates under an M:N threading model. In an M:N model, there are units of work under the user space (the Goroutines or G in the scheduler lexicon) which are scheduled to be run by the language runtime on OS threads (or M in the scheduler lexicon) on machine processors (or P in the scheduler lexicon). A Goroutine is defined as a lightweight thread managed by the Go runtime. Different Goroutines (G) can be executed on different OS threads (M), but at any given time, only one OS thread can be run on a CPU (P). In the user space, you achieve concurrency as the Goroutines work cooperatively. In the presence of a blocking operation (network, I/O or system call), another Goroutine can be assigned to the OS thread.
Once the blocking call ends, the runtime will try to reassign the previous Goroutine to an available OS thread. It’s possible to achieve parallelism here, because once the Goroutines are assigned to an OS thread, the OS can decide to distribute its threads’ execution through its multiple cores.
By having multiple Goroutines assigned to OS threads—thus being run cooperatively (or in parallel if two OS threads are run simultaneously on different cores)—you get an efficient use of your machine’s CPUs, because all cores will be available for running your program’s functions.
Goroutines live within the user thread space. In comparison to OS threads, their operations cost less: The overhead for assigning them, suspending them, and resuming them is lower than the overhead required by OS threads. Goroutines and channels are two of the most important primitives Go offers for concurrency. One important aspect of Goroutines is that expressing them in terms of code is fairly easy. You simply put the keyword go before the function you want to schedule to be run outside of the main thread.
But how do Goroutines help make Go more performant? The minimal stack required for a Goroutine to exist is 2 KB. Goroutines can increase their stack on runtime if they see the need for more space, but overall, they are memory-friendly. This means their management overhead is minimal. In other words, you can have more working units being processed with a decent quantity of memory, and that translates into efficiency and speed.
Go comes with its own runtime scheduler. The language does not rely on the native OS thread/process scheduler, but it cooperates with it. Because the scheduler is an independent component, it has the flexibility for implementing optimizations. All these optimizations aim for one thing: to avoid too much preemption of the OS Goroutines, which would result in suspending and resuming the functions’ execution, an expensive operation.
Next, we are going to highlight some specific optimizations done by the scheduler in order to avoid preemption.
Generally, there are two ways to distribute workloads across CPUs. The first one is work sharing, in which busy processors send threads to other, less busy processors with the hope they will be taken and executed. The second method is work stealing, in which an idle processor is constantly looking to steal other processor threads. Go uses work stealing.
How does the work stealing approach help make Go faster? The migration of threads between processors is expensive, as it involves context switch operations. Under the stealing paradigm, this phenomenon occurs less frequently, resulting in less overhead.
The scheduler also implements a particular strategy called spinning threads, which tries to fairly distribute as many OS threads across processors as possible. Go runtime not only reduces the frequency of thread migrations between processors, it is also capable of moving an OS thread with no work assigned to another processor. This can balance CPU usage and power.
When you have all CPUs working with fairly distributed workloads, you are avoiding resource underutilization, which, again, translates to resource efficiency and speed.
What strategy does the Go scheduler follow for handling system calls? It turns out that it also helps reduce overhead overall. Let’s see how.
For system calls expected to be slow, the scheduler applies a pessimistic approach. It makes the OS thread release the processor in which it’s been running, just before the system call. Then, after the system call ends, the scheduler tries to reacquire the processor if it’s available. Otherwise, it’s enqueued by the scheduler until it finds a new available processor. The inconvenience of this approach is the overhead required for dropping and reacquiring a processor.
However, the scheduler uses a second approach for system calls that are known to be fast—an optimistic approach. With this approach, the OS thread running the Goroutine with the system call does not release the processor, but it flags it. Then, after a few microseconds (20 to be precise), another independent special Goroutine (the sysmon Goroutine) checks for all flagged processors. If they are still running the heavy Goroutine that involves the system call, the scheduler takes their processors away, so they’re suspended. If the stolen processor is still available once the system call ends, the Goroutine can continue executing. Otherwise, it will need to be scheduled for execution again (until a processor becomes available).
In this article, we have covered the different strategies the Go language takes with concurrency and task scheduling. Go’s scheduler strategies and its compiler optimizations are what make Go so performant. Go’s balance between speed, robustness, and friendly syntax makes it a great option for specialized networking and web applications.