Snap Case Study

Snap Inc Achieves 13% Average Cost Reduction on EKS Workloads With Intel Granulate

Software Engineering at Snap

Infrastructure: Java & Go on EKS for AWS

Snap Engineering teams build fun and technically sophisticated products that reach hundreds of millions of Snapchatters around the world, every day. They’re deeply committed to the well-being of everyone in their global community, which is why their values are at the root of everything they do. They strive to move fast, with precision, and always execute with privacy at the forefront.

The Snap software engineering team is responsible for designing, implementing, and operating their most critical and scalable services, including user identity services, the friend graph, and their core persistence layer. Their work includes understanding product requirements, evaluating trade-offs, and delivering the solutions needed to build innovative products, while applying best practices for availability, scalability, operational excellence, and cost management.

Snapchat’s Kubernetes Optimization Journey

AWS cloud compute powers their core product offerings, like Snapchat, messaging, photography, and backend analytics. With an average of 397 million daily active users (DAUs) on Snapchat, their software engineering team was dealing with large-scale usage and constant, dramatic fluctuations in activity.

Operating on Amazon EKS was a logical choice because it facilitates scalable, fault-tolerant, and automated container orchestration, making it ideal for managing high-availability infrastructure. 

To minimize wasted resources and unnecessary spending on their Kubernetes clusters, Snap’s engineering team performed every form of optimization they were aware of. These activities included profile-guided optimization (PGO), rightsizing, cloud discount programs, manual code optimization, observability, and tag management. While these optimization techniques were effective, Snap continued to look for more cloud cost reduction opportunities in the spirit of persistent innovation and performance improvement.

Snap identified Intel Granulate as a key part of their initiative to automate those existing optimization techniques at scale and thereby save on compute costs. Its autonomous, continuous app-level optimization also aligned with their primary goal of reducing costs while maintaining existing SLAs.

Intel Granulate met Snap’s requirements for strict security adherence, fully supported API capabilities, and a solution that remains effective while running in parallel with their other initiatives. It was essential for Snap that the solution did not negatively impact response time, required minimal engineering effort, and could scale to thousands of services. Intel Granulate ultimately delivered on all of these.

“The ease of use and short time to value have been significant motivating factors for Snap to continue expansion of Intel Granulate’s Kubernetes optimization solution to more internal services at a rapid pace.”
Tom Brown, Software Engineer at Snap Inc

Results of Intel Granulate’s EKS Optimization 

Snap began the process with Intel Granulate’s complimentary continuous profiling assessment. In this stage, Snap ran Intel Granulate’s open-source continuous profiler on several VMs in order to get an upfront analysis of expected performance improvement and cost reduction.

Deployment of Intel Granulate required no service code changes and began on a small number of clusters, in order to prove value and reliability before expanding. After deployment, the agent performed a short, autonomous workload learning phase. Just one week later, Intel Granulate was activated, showing immediate performance improvements leading to capacity reduction and cost savings.

Intel Granulate and Snap engineering teams worked extensively together to adapt the Intel Granulate agent to Snap’s elaborate homegrown cloud control platform and to align the agent’s capabilities with Snap’s CI/CD process.

Functionality included:

Integration into internal cloud control platform
Additional layers of Granulate redundancy baked into Snap’s AWS cloud
Extensive RESTful API to ensure full programmatic control
Extensive observability exposed by Granulate for Snap to have full visibility into Agent health and status

As of September 2023, Intel Granulate has been deployed on 350,000 vCores, providing an average of 13% cost reduction across 32 clusters, with more deployments in the pipeline.

21.7% cluster-level vCore reduction on a select high-usage service

25% instance count reduction on a select high-usage cluster
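To make the reduction percentages quoted above concrete, here is a minimal arithmetic sketch. The cluster and fleet sizes are hypothetical and chosen only for illustration; they are not Snap's data, and the helper names `percent_reduction` and `after_reduction` are introduced here, not taken from any Intel Granulate tooling.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the drop from `before` to `after` as a percentage of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


def after_reduction(before: float, percent: float) -> float:
    """Return what remains after cutting `before` by `percent` percent."""
    return before * (1.0 - percent / 100.0)


# Hypothetical cluster of 1,000 vCores (an assumption, not a figure from the case study).
# A 21.7% vCore reduction would leave roughly 783 vCores.
print(round(after_reduction(1000, 21.7)))  # 783

# Hypothetical fleet of 200 instances shrinking to 150 is a 25% reduction.
print(percent_reduction(200, 150))  # 25.0
```

Note that a vCore or instance reduction translates into an equal cost reduction only if cost scales linearly with capacity, which discount programs and mixed instance types can break.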

350K vCores optimized
13% average cost reduction
21% average CPU utilization improvement
15 days time to value

Technology
HQ: Santa Monica, California

Snap Inc. is a technology company that believes the camera presents the greatest opportunity to improve the way people live and communicate. We contribute to human progress by empowering people to express themselves, live in the moment, learn about the world, and have fun together.

Snapchat, Snap Inc’s flagship product, is used by 750 million people every month to stay in touch with friends, express themselves, and explore the world.
