Intel Granulate blogs: Spark

Understanding PySpark: Features, Ecosystem, and Optimization
PySpark is the Python API for Apache Spark, letting users interface with Spark directly from Python.
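For illustration only (this sketch is not taken from the article), the snippet below shows what interfacing with Spark from Python typically looks like, assuming a local PySpark installation; the app name and sample data are made up.

```python
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session running locally on all available cores.
spark = SparkSession.builder.appName("pyspark-hello").master("local[*]").getOrCreate()

# Build a small DataFrame and run a simple aggregation through the Python API.
df = spark.createDataFrame(
    [("alice", 3), ("bob", 5), ("alice", 7)],
    ["user", "events"],
)
df.groupBy("user").sum("events").show()

spark.stop()
```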
Hadoop vs. Spark: 5 Key Differences and Using Them Together
Hadoop is an open source framework for storing and processing large data sets across clusters of commodity hardware. Apache Spark is an open source...
5 PySpark Optimization Techniques You Should Know
PySpark is the Python API for Apache Spark, an open-source, distributed computing system designed for high-speed processing of...
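As a hedged illustration of the kind of techniques such a list typically covers (the paths, column names, and data layout below are assumptions, not taken from the post), two common PySpark optimizations are caching a reused DataFrame and broadcasting a small table in a join:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("pyspark-optimizations").getOrCreate()

# Hypothetical inputs: a large fact table and a small dimension table.
events = spark.read.parquet("s3://example-bucket/events/")
countries = spark.read.parquet("s3://example-bucket/countries/")

# Optimization 1: cache a DataFrame that several downstream actions reuse,
# so Spark does not recompute it from the source each time.
events.cache()

# Optimization 2: broadcast the small table so the join avoids shuffling `events`.
joined = events.join(broadcast(countries), on="country_code", how="left")

joined.groupBy("country_name").count().show()
```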
Apache Spark: Architecture, Best Practices, and Alternatives
Apache Spark is an analytics engine that rapidly performs processing tasks on large datasets. It can distribute data processing tasks on...
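As a rough sketch of that distribution model (local mode and the toy workload below are assumptions for illustration), Spark splits a dataset into partitions and processes each partition as a separate task on the executors:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-distribution").master("local[4]").getOrCreate()
sc = spark.sparkContext

# Spread one million numbers across 8 partitions; each partition is processed
# by its own task, potentially on a different executor core.
rdd = sc.parallelize(range(1_000_000), numSlices=8)

# map() runs per element inside each task; reduce() merges the partial results.
total = rdd.map(lambda x: x * x).reduce(lambda a, b: a + b)
print(total, "computed across", rdd.getNumPartitions(), "partitions")

spark.stop()
```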
Spark on AWS: How It Works and 4 Ways to Improve Performance
Apache Spark is an open source, distributed data processing system for big data applications. It enables fast data analysis using in-memory...
Introduction To Apache Spark Performance
In this article, we first present Spark’s fundamentals, including its architecture, components, and execution mode, as well as its APIs.