Infrastructure: Apache Spark on Cloudera
Intel IT’s Enterprise Data Platform supports real-time and batch ingestion of structured, semi-structured, and unstructured data. Their data warehouses, together with their data lake, provide data to their visualization and data science platforms. These platforms are used by a variety of dashboards, application frameworks, and custom applications.
As a large IT shop, they continuously look for ways to reduce both operational expenses and infrastructure purchases, while simultaneously improving the user experience. Unfortunately, workload optimization can be disruptive and time-consuming. Most optimization efforts require developers to take time away from business applications and product development to rewrite code, and sometimes the tuning can lead to suboptimal performance.
Given Granulate’s capability to optimize big data platforms with no hands-on development, Intel IT decided to test it in their Cloudera platform. First, an assessor from their Information Security team performed a risk assessment of Granulate. He reviewed several assessments by independent third parties, including Granulate’s SOC 2 Type II Report, the results of Granulate’s penetration testing and remediation actions, and SecurityScorecard’s report. In that report, Granulate earned an “A” rating of 91. The Intel Information Security team assessed that Granulate met or exceeded the pertinent standards.
Intel IT initially deployed the Intel Granulate agent in their Quality Assurance environment to validate that it caused no disruption to business operations. The agent ran for about two to three weeks, learning about the platform’s Spark jobs.
Then they deployed the agent to all the nodes in their production environment. Again, the agent learned for about two to three weeks. The agent identified optimization opportunities at the job level behind the scenes, with no human involvement. Intel IT then activated the Intel Granulate agent and moved to the benchmark phase.
Intel IT gathered metrics for seven days. During this time, Intel Granulate dynamically optimized more than 1,000 jobs – equivalent to approximately 42,000 executed Spark jobs/applications. The results were compelling: an average 25% reduction in memory utilization and 44% reduction in CPU utilization. Now Intel IT can run more jobs concurrently without adding memory or compute infrastructure, which will help them avoid capital and operational expenses in the future.
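To make the link between lower utilization and higher concurrency concrete, the arithmetic can be sketched as follows. This is a rough illustration, not Intel IT's capacity model: it assumes the reported average reductions apply uniformly to every job, that capacity scales linearly, and that the cluster is limited only by memory and CPU. The function name concurrency_headroom and its parameters are hypothetical.

```python
def concurrency_headroom(memory_reduction: float, cpu_reduction: float) -> dict:
    """Estimate how many more jobs fit on fixed hardware after per-job savings.

    Assumption (not from the case study): savings are uniform across jobs and
    capacity scales linearly, so a job using (1 - r) of its former resources
    lets the same hardware hold 1 / (1 - r) times as many jobs.
    """
    for name, value in (("memory_reduction", memory_reduction),
                        ("cpu_reduction", cpu_reduction)):
        if not 0 <= value < 1:
            raise ValueError(f"{name} must be in [0, 1), got {value}")

    memory_factor = 1 / (1 - memory_reduction)
    cpu_factor = 1 / (1 - cpu_reduction)
    return {
        "memory_factor": memory_factor,
        "cpu_factor": cpu_factor,
        # The scarcer resource limits how many extra jobs can actually run.
        "bottleneck_factor": min(memory_factor, cpu_factor),
    }


# Reductions reported in the benchmark: 25% memory, 44% CPU.
headroom = concurrency_headroom(memory_reduction=0.25, cpu_reduction=0.44)
for resource, factor in headroom.items():
    print(f"{resource}: {factor:.2f}x")
```

Under these simplifying assumptions, memory is the tighter constraint, so the same hardware could hold roughly a third more concurrent jobs. Real gains depend on job mix, scheduling, and other resources such as disk and network.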
Intel Granulate shows the average memory and CPU overcommitment from April 21-27, 2023
Given the outstanding results Intel IT saw with Intel Granulate, they plan to continue running it in their Cloudera platform. In addition, they started a new PoC running Intel Granulate in one of their public cloud environments. They are also exploring the use of Intel Granulate in streaming platforms such as Apache Kafka and log analytics platforms such as Elasticsearch.
HQ: Santa Clara, California
Intel IT plays a central role in increasing the value of Intel’s business. The division works at the boundaries of innovation every day, developing data-driven solutions to improve the operations and processes of a global technology leader.
Their deep knowledge and experience as IT professionals enable them to develop and deploy cutting-edge technologies that deliver strong business value to Intel’s own and other enterprise IT organizations.