Intel Granulate blogs: Big Data

AWS EMR Cluster: Viewing, Managing, and Scaling Your Clusters
AWS EMR Clusters and Nodes
Amazon EMR lets you leverage a group of Amazon Elastic Compute Cloud (EC2) instances as a cluster for rapid processing and analysis.
Spark on AWS: How It Works and 4 Ways to Improve Performance
Spark on AWS: Amazon EMR Features & Creating Your First Cluster
Apache Spark is an open source, distributed data processing system for big data applications. It enables fast data analysis using in-memory...
What Is AWS EMR and 5 Critical Best Practices
AWS EMR processes data across a Hadoop cluster of virtual servers on Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2).
Running Hadoop on AWS: The Basics and 5 Tips for Success
You can run Apache Hadoop on AWS using Amazon EMR, a managed service for processing and analyzing large datasets.
Optimizing Kafka Performance
In this blog, we cover everything you need to know about optimizing Kafka performance to make sure latency remains low and throughput high.
Introduction to ETL pipelines
ETL, or extract, transform, and load, is the process of taking data from one source, transforming it, and then loading it into a destination.
Introduction To Apache Spark Performance
In this article, we first present Spark’s fundamentals, including its architecture, components, and execution mode, as well as APIs.