Intel Granulate blogs: Spark

Understanding PySpark: Features, Ecosystem, and Optimization
PySpark is the Python API for Apache Spark, letting users interface with Spark directly from Python.
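For illustration only (this sketch is not taken from the article), the snippet below shows what interfacing with Spark from Python typically looks like, assuming a local PySpark installation; the app name and sample data are made up.

```python
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session running locally on all available cores.
spark = SparkSession.builder.appName("pyspark-hello").master("local[*]").getOrCreate()

# Build a small DataFrame and run a simple aggregation through the Python API.
df = spark.createDataFrame(
    [("alice", 3), ("bob", 5), ("alice", 7)],
    ["user", "events"],
)
df.groupBy("user").sum("events").show()

spark.stop()
```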
Hadoop vs. Spark: 5 Key Differences and Using Them Together
Hadoop is an open source framework for storing and processing large data sets across clusters of commodity hardware. Apache Spark is an open source...
5 PySpark Optimization Techniques You Should Know
PySpark is the Python API for Apache Spark, an open-source, distributed computing system designed for high-speed processing of...
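As a hedged illustration of the kind of techniques such a list typically covers (the paths, column names, and data layout below are assumptions, not taken from the post), two common PySpark optimizations are caching a reused DataFrame and broadcasting a small table in a join:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("pyspark-optimizations").getOrCreate()

# Hypothetical inputs: a large fact table and a small dimension table.
events = spark.read.parquet("s3://example-bucket/events/")
countries = spark.read.parquet("s3://example-bucket/countries/")

# Optimization 1: cache a DataFrame that several downstream actions reuse,
# so Spark does not recompute it from the source each time.
events.cache()

# Optimization 2: broadcast the small table so the join avoids shuffling `events`.
joined = events.join(broadcast(countries), on="country_code", how="left")

joined.groupBy("country_name").count().show()
```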
Apache Spark: Architecture, Best Practices, and Alternatives
Apache Spark is an analytics engine that rapidly performs processing tasks on large datasets. It can distribute data processing tasks on...
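As a rough sketch of that distribution model (local mode and the toy workload below are assumptions for illustration), Spark splits a dataset into partitions and processes each partition as a separate task on the executors:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-distribution").master("local[4]").getOrCreate()
sc = spark.sparkContext

# Spread one million numbers across 8 partitions; each partition is processed
# by its own task, potentially on a different executor core.
rdd = sc.parallelize(range(1_000_000), numSlices=8)

# map() runs per element inside each task; reduce() merges the partial results.
total = rdd.map(lambda x: x * x).reduce(lambda a, b: a + b)
print(total, "computed across", rdd.getNumPartitions(), "partitions")

spark.stop()
```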
Spark on AWS: How It Works and 4 Ways to Improve Performance
Apache Spark is an open source, distributed data processing system for big data applications. It enables fast data analysis using in-memory...
Introduction To Apache Spark Performance
In this article, we first present Spark’s fundamentals, including its architecture, components, and execution mode, as well as its APIs.