Optimizing Spark and PySpark for Improved Big Data Performance

Continuous, autonomous optimization for Spark and PySpark workloads, enabling more efficient data engineering, data science, and machine learning

One optimization solution for all Spark use cases

Achieve faster, more efficient Spark applications with Intel Granulate

Batch/streaming data

Data science at scale

SQL analytics

Machine learning

Integrates with all major data storage and infrastructure

Complete more Spark jobs in less time

Intel Granulate allows data science, data engineering, and data analysis teams to improve Spark and PySpark performance.

Spark dynamic allocation

Optimized dynamic allocation and removal of executors based on job patterns and predictive idle heuristics
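
For context, the sketch below lists the stock Apache Spark properties that govern dynamic allocation, which is the baseline behavior this kind of optimization builds on. It is not Intel Granulate's configuration or API. The values (executor bounds, timeouts) are illustrative assumptions, and `to_spark_submit_args` is a hypothetical helper written for this example.

```python
# Standard Apache Spark dynamic allocation properties.
# All values here are illustrative assumptions, not recommendations.
DYNAMIC_ALLOCATION_CONF = {
    # Let Spark add and remove executors as the workload changes.
    "spark.dynamicAllocation.enabled": "true",
    # Track shuffle files so executors can be released without an
    # external shuffle service (available since Spark 3.0).
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    # Lower and upper bounds on the executor count (assumed values).
    "spark.dynamicAllocation.minExecutors": "2",
    "spark.dynamicAllocation.maxExecutors": "50",
    # Remove an executor after it has been idle this long.
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
    # Request more executors once tasks have been pending this long.
    "spark.dynamicAllocation.schedulerBacklogTimeout": "1s",
}


def to_spark_submit_args(conf):
    """Render a conf dict as a flat list of spark-submit arguments (hypothetical helper)."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    print(" ".join(to_spark_submit_args(DYNAMIC_ALLOCATION_CONF)))
```

The printed flags can be appended to a `spark-submit` command. Fixed idle and backlog timeouts like these are the static thresholds that a pattern-based or predictive approach would aim to improve on.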

JVM execution for Spark

JNI overhead reduction, plus optimization of execution control flow and reflection overhead

Memory arenas optimization

Tuning of memory release and object sizes to reduce allocation overhead

Crypto & compression acceleration

Leveraging crypto architecture, accelerators, and instruction sets for cryptographic and compression operations

Python optimization for PySpark

Automatic profile-guided inlining of hot-path functions, with code optimized for the processor architecture and generation of each node in the cluster

“We saw the effect on the costs right away. After implementing Granulate we saw a 50% memory reduction and a 20% CPU reduction, which eventually translated to an 18% cost reduction… Looking forward, we’re going to install Granulate on all the other apps that we have”

Uri Harduf

DevOps Group Manager

They got to the core of their Spark applications

Python Spark EKS
50% memory reduction
20% cost reduction
15% CPU improvement

Big Data Spark EMR
33% reduction in cores
45% clusters optimized
100% EMR fleet optimization

EKS Big Data Spark
40% cost reduction
15% Spark time reduction
35% CPU reduction