Infrastructure: Java & Go on EKS for AWS
Snap Engineering teams build fun and technically sophisticated products that reach hundreds of millions of Snapchatters around the world, every day. They’re deeply committed to the well-being of everyone in their global community, which is why their values are at the root of everything they do. They strive to move fast, with precision, and always execute with privacy at the forefront.
The Snap software engineering team is responsible for designing, implementing, and operating their most critical and scalable services, including user identity services, the friend graph, and their core persistence layer. Their work includes understanding product requirements, evaluating trade-offs, and delivering the solutions needed to build innovative products, while applying best practices for availability, scalability, operational excellence, and cost management.
AWS cloud compute powers their core product offerings, like Snapchat, messaging, photography, and backend analytics. With an average of 397 million daily active users (DAUs), their software engineering team was dealing with large-scale usage and constant, dramatic fluctuations in activity.
Operating on Amazon EKS was a logical choice because it facilitates scalable, fault-tolerant, and automated container orchestration, making it ideal for managing high-availability infrastructure.
To minimize wasted resources and unnecessary spending on their Kubernetes clusters, Snap’s engineering team applied every form of optimization they were aware of. These included profile-guided optimization (PGO), rightsizing, cloud discount programs, manual code optimization, observability, and tag management. While these techniques were effective, in the spirit of persistent innovation and performance improvement, Snap continued to look for more cloud cost reduction opportunities.
Snap identified Intel Granulate as a key part of their initiative to automate those existing optimization techniques at scale and thereby save on compute costs. Its autonomous, continuous application-level optimization also aligned with their primary goal of reducing costs while maintaining existing SLAs.
Intel Granulate met Snap’s requirements for strict security adherence, fully supported API capabilities, and a solution that remains effective while running in parallel with their other initiatives. It was essential for Snap that the solution not negatively impact response time, require minimal engineering effort, and scale to thousands of services. Intel Granulate ultimately delivered on all of these.
Snap began the process with Intel Granulate’s complimentary continuous profiling assessment. In this stage, Snap ran Intel Granulate’s open-source continuous profiler on several VMs in order to get an upfront analysis of expected performance improvement and cost reduction.
Deployment of Intel Granulate required no service code changes and began on a small number of clusters, in order to prove value and reliability before expanding. After deployment, the agent performed a short, autonomous workload learning phase. Just one week later, Intel Granulate was activated, showing immediate performance improvements leading to capacity reduction and cost savings.
Intel Granulate and Snap engineering teams worked extensively together to adapt the Intel Granulate agent to Snap’s elaborate homegrown cloud control platform, and to align the agent’s capabilities with Snap’s CI/CD process. This work included:
Integration into Snap’s internal cloud control platform
Additional layers of Granulate redundancy built into Snap’s AWS cloud
An extensive RESTful API to ensure full programmatic control
Extensive observability exposed by Granulate, giving Snap full visibility into agent health and status
As of September 2023, Intel Granulate has been deployed on 350,000 vCores, providing an average of 13% cost reduction across 32 clusters, with more deployments in the pipeline.
HQ: Santa Monica,
Snap Inc. is a technology company that believes the camera presents the greatest opportunity to improve the way people live and communicate. We contribute to human progress by empowering people to express themselves, live in the moment, learn about the world, and have fun together.
Snapchat, Snap Inc.’s flagship product, is used by 750 million people every month to stay in touch with friends, express themselves, and explore the world.