Infrastructure: Databricks Data Lake
Intel IT’s Enterprise Data Platform supports real-time and batch ingestion of structured, semi-structured, and unstructured data. Their data warehouses, together with their data lake, provide data to their visualization and data science platforms. These platforms are used by a variety of dashboards, application frameworks, and custom applications.
Learning from past experiences, Intel IT’s developers configured jobs to run with the largest memory and compute configurations to handle worst-case scenarios, such as month-end processing with the biggest datasets. However, this led to an over-allocation of resources during daily operations when the datasets were smaller.
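To make the over-allocation concrete, here is a minimal sketch in Python. The daily demand figures and the month-end peak are hypothetical values chosen for illustration, not Intel IT measurements, and the helper name `over_allocation_ratio` is likewise an assumption.

```python
# Hypothetical peak memory demand (GB) for one job across a 30-day month.
# Illustrative values only: 29 ordinary days plus one month-end run.
daily_demand_gb = [40] * 29 + [160]

# Static sizing: every run is given the worst-case configuration.
static_allocation_gb = max(daily_demand_gb)

def over_allocation_ratio(demand, allocation):
    """Return the fraction of allocated capacity left unused across all runs."""
    allocated = allocation * len(demand)
    used = sum(demand)
    return (allocated - used) / allocated

unused = over_allocation_ratio(daily_demand_gb, static_allocation_gb)
print(f"Unused share of allocated memory: {unused:.1%}")
```

Under these assumed numbers, most of the allocated memory sits idle on ordinary days, which is the pattern that per-job, runtime-aware sizing is meant to address.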
After finding success on their Cloudera on-premises platform, where Intel Granulate reduced memory utilization by 25% and CPU utilization by 44% while offering potential cost savings, Intel IT began testing the optimization solution in their Databricks data lake.
First, an assessor from their Information Security team performed a risk assessment of Intel Granulate. He reviewed several assessments by independent third parties, including the SOC 2 Type II Report, penetration testing results and remediation actions, and SecurityScorecard’s report. In that report, the solution earned an “A” rating of 97. The Intel Information Security team assessed that Intel Granulate met or exceeded the pertinent standards.
Intel Granulate was quickly and securely implemented in Intel IT’s Databricks environment. Installation consisted of two main steps: first, deploying the optimization agent via a workspace initialization script in all workspaces targeted for optimization; second, securely sharing a Service Principal token so that Intel Granulate’s enhanced autoscaling capabilities could take effect. A brief learning period followed, after which the solution was activated.
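The case study does not say how the initialization script was registered. As one possible route, the sketch below builds (but does not send) a request to the Databricks REST API's global init scripts endpoint. The workspace URL, token, script name, and script contents are all placeholders, and the real agent installation script would come from the vendor.

```python
import base64
import json
import urllib.request

def build_init_script_request(workspace_url, token, script_text, name):
    """Build, without sending, a request that registers a global init script."""
    payload = {
        "name": name,
        # The API expects the script body as a base64-encoded string.
        "script": base64.b64encode(script_text.encode("utf-8")).decode("ascii"),
        "enabled": True,
    }
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/global-init-scripts",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder values only; nothing here reflects Intel IT's actual setup.
request = build_init_script_request(
    workspace_url="https://example.cloud.databricks.com",
    token="<workspace-admin-token>",
    script_text="#!/bin/bash\necho 'install optimization agent here'\n",
    name="optimization-agent-init",
)
print(request.full_url)
```

Keeping the request construction separate from sending it makes the payload easy to inspect before anything touches a production workspace. The token used here is a hypothetical admin credential for registering the script; it is distinct from the Service Principal token shared in the second step.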
Upon deployment in their production environment, Intel Granulate learned the environment and identified optimization opportunities at the job level behind the scenes, with no human involvement. Intel IT then activated the optimization solution and moved to the benchmark phase, which showed a 23% reduction in vCore utilization and a 17% increase in data throughput.
23% reduction in vCore utilization
17% data throughput increase
Once Intel Granulate is active, the core count consistently remains much lower while processing the same jobs that ran during the passive period. This type of benchmark analysis is delivered to all Intel Granulate clients upon initial activation to measure total improvement.
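A passive-versus-active comparison of this kind comes down to a relative change against the passive baseline. The sketch below shows that arithmetic. The raw measurements are hypothetical values picked so that the results match the reported percentages; the case study does not publish the underlying numbers or the exact method used.

```python
def percent_change(passive, active):
    """Relative change from the passive baseline to the active period, in percent."""
    if passive == 0:
        raise ValueError("passive baseline must be non-zero")
    return (active - passive) / passive * 100.0

# Hypothetical measurements for the same set of jobs in each period.
passive_vcores, active_vcores = 1000.0, 770.0            # average vCores in use
passive_gb_per_hour, active_gb_per_hour = 500.0, 585.0   # data processed per hour

vcore_change = percent_change(passive_vcores, active_vcores)
throughput_change = percent_change(passive_gb_per_hour, active_gb_per_hour)

print(f"vCore utilization change: {vcore_change:+.0f}%")
print(f"Throughput change: {throughput_change:+.0f}%")
```

Comparing the same jobs in both periods matters: a change in job mix between the passive and active windows would otherwise show up as a false improvement or regression.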
Intel Granulate autonomously learns application behavior, continuously optimizing memory and CPU usage for each job without manual intervention by application teams. This means a one-time activation enables ongoing optimization every time jobs execute. Now, by simply activating Intel Granulate, Intel IT’s platform team can optimize resources and reduce excess allocation during runtime. This is achieved without any commitment of additional resources or code changes from the application teams.
Over the four weeks of testing, Intel Granulate’s ability to enhance and optimize Databricks workload efficiency was clearly demonstrated, allowing more jobs to be executed per compute unit. Intel IT has now started a phased deployment across their remaining Databricks environment, with further rollout to follow upon successful measurement of key performance indicators.
HQ: Santa Clara, California
Intel IT plays a central role in increasing the value of Intel’s business. The division works at the boundaries of innovation every day, developing data-driven solutions to improve the operations and processes of a global technology leader.
Their deep knowledge and experience as IT professionals enable them to develop and deploy cutting-edge technologies that deliver strong business value to Intel’s own and other enterprise IT organizations.