Intel® Tiber™ App-Level Optimization blogs: Big Data

Understanding PySpark: Features, Ecosystem, and Optimization
PySpark is a Python library for Apache Spark that lets users interface with Spark using Python.
AWS EMR Tutorial: Configuring & Managing Your First Cluster
Amazon EMR (formerly Amazon Elastic MapReduce) is a managed platform for cluster-based workloads. Learn how to plan, configure, and manage your...
Hadoop vs. Spark: 5 Key Differences and Using Them Together
The Hadoop platform is an open source system for storing and processing large data sets across a cluster of machines. Apache Spark is an open source...
5 PySpark Optimization Techniques You Should Know
PySpark is the Python API for Apache Spark, an open-source, distributed computing system designed for high-speed processing of...
Hadoop: Ultimate Guide for 2023
Hadoop: Basics, Running in the Cloud, Alternatives & Best Practices
Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications.
Apache Spark: Architecture, Best Practices, and Alternatives
Apache Spark is an analytics engine that rapidly performs processing tasks on large datasets. It can distribute data processing tasks on...
AWS EMR Cluster: Viewing, Managing, and Scaling Your Clusters
AWS EMR Clusters and Nodes
Amazon EMR lets you leverage a group of Amazon Elastic Compute Cloud (EC2) instances as a cluster for rapid processing and analysis.
Spark on AWS: How It Works and 4 Ways to Improve Performance
Spark on AWS: Amazon EMR Features & Creating Your First Cluster
Apache Spark is an open source, distributed data processing system for big data applications. It enables fast data analysis using in-memory...
Running Hadoop on AWS: The Basics and 5 Tips for Success
You can run Apache Hadoop on AWS using Amazon EMR, a managed service for processing and analyzing large datasets.