The American Airlines Journey to Contain Data Lakes Costs

Based on a panel discussion between Intel and American Airlines at Microsoft Ignite 2023.

The travel industry has had many ups and downs over the past 25 years. With fluctuating customer trends, fuel prices, government regulations, and many more factors, the growing cost of data management has only exacerbated an already volatile vertical.

Now that we’re in the midst of an AI revolution, there are great opportunities to apply these technologies in the travel industry, but they come with a cost. With these emerging needs for data streaming, processing and storage, American Airlines has found themselves confronted with increasingly costly data center infrastructure.

Over the past few years, American Airlines has taken the initiative to capitalize on emerging technologies like deep learning, cognitive computing and especially an open approach to AI. This enables much greater flexibility in terms of both development and deployment, not to mention leveraging optimizations from the open-source community.

A Peek Into the Infrastructure Black Box

American Airlines relies on many computing environments, including a mainframe and traditional on-premises client-server architecture. Until just a couple of years ago, their Big Data operations were predominantly housed on physical infrastructure, as their cloud migration had not yet ramped up. This reliance on bare-metal infrastructure resulted in data silos, impairing collaboration and communication.

Over the past couple of years, their Cloud Enablement Program (CEP) established the basic groundwork for a Cloud Center of Excellence (CCOE), a cross-functional team that leads and governs cloud computing strategy, implementation, and management. This entity drives best practices and ensures consistency in cloud usage across the organization, with a focus on cloud platform innovation, community of practice, governance and cost optimization.

Their main goal is to promote engineering excellence, improve operational resiliency, maximize cloud efficiency and promote reusable cloud patterns with stronger collaboration and automation.

Data Costs Take Off at American Airlines

With these goals in place and a team dedicated to pursuing these objectives, American Airlines began their migration to Microsoft Azure. Initially, there was an effort to adopt the cloud, but they recently accelerated this initiative to bring even more innovation to their cloud and engineering platforms.

Beyond the general goals laid out by the CCOE, their mission was to implement the cloud for increased agility. For their Big Data applications, this meant expanding their usage of Data Lake on Databricks. They found Databricks to be an excellent tool for simplifying the management of data workloads, breaking down data silos across the organization. Their operations, commercial, and other teams were able to seamlessly share technology, strategies, resources and data.

However, this also triggered spiraling cloud spend. Despite the advantages offered by Databricks, there were accompanying challenges. The data management platform proved to be one of American Airlines’ most complex and fastest-growing workloads. Even when utilizing Photon, the price tag remained untenable, so they looked for additional solutions to bring the costs down.

Discovering and Deploying Autonomous Optimization

The cloud innovation team at American Airlines learned about Intel Tiber App-Level Optimization through their relationship with Intel, and soon put it to the test on their Data Lake platform, consisting of their most challenging workloads.

Intrigued by the promise of autonomous and continuous optimization with no code changes or development efforts required, American Airlines started by deploying Intel Tiber App-Level Optimization on five clusters and saw performance improvements right away. Intel Tiber App-Level Optimization’s agent was deployed incrementally, activating optimizations and sending benchmarks for each cluster. The results were benchmarked batch by batch, and soon they were optimizing up to 120 clusters per week.

Ultimately, the results were compelling enough to roll Intel Tiber App-Level Optimization out to the entire Databricks environment. Rather than making major modifications to their clusters, each siloed team only needed to deploy the optimization solution. This was a less intimidating prospect, because Intel Tiber App-Level Optimization’s sAgent only requires one init script for deployment and is enterprise-ready to support their scale, security and processes.
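For readers unfamiliar with Databricks init scripts, here is a minimal sketch of what attaching a single init script to a cluster definition can look like. The script path, the cluster values and the with_init_script helper are all hypothetical, chosen for illustration only. The source does not describe the actual script or where it lives.

```python
import copy
import json

# Hypothetical workspace path; the real script location is not given in the source.
AGENT_INIT_SCRIPT = "/Shared/init-scripts/install-optimization-agent.sh"


def with_init_script(cluster_spec: dict, script_path: str) -> dict:
    """Return a copy of a cluster spec with one workspace init script attached.

    The original spec is left untouched, and attaching the same script
    twice does not create a duplicate entry.
    """
    spec = copy.deepcopy(cluster_spec)
    scripts = spec.setdefault("init_scripts", [])
    entry = {"workspace": {"destination": script_path}}
    if entry not in scripts:
        scripts.append(entry)
    return spec


# Illustrative cluster definition; every value here is an assumption.
example_spec = {
    "cluster_name": "example-job-cluster",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "autoscale": {"min_workers": 2, "max_workers": 8},
}

updated_spec = with_init_script(example_spec, AGENT_INIT_SCRIPT)
print(json.dumps(updated_spec, indent=2))
```

Because the change is a single added entry in the cluster definition, it can be rolled out to many clusters in batches without touching the jobs that run on them.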

First Class Databricks Efficiency With Autonomous Application Performance Optimization

The autonomous optimization solution has a suite of tools at its disposal to optimize any type of Linux-based workload, including a code profiler, a Kubernetes rightsizing tool and a runtime optimization solution. Recently, Intel Tiber App-Level Optimization has added a number of Big Data, and more specifically Databricks, optimization methods to its growing roster of features.

Intel Tiber App-Level Optimization provides Spark and MapReduce optimization to increase density at lower costs. It does so by enabling dynamic Spark executor scheduling and YARN resource allocation, with continuous optimization and node-level granularity over containers’ CPU and memory, as well as over the allocation and preemption of those containers. Specifically for Databricks, Dataproc and EMR, the solution optimizes managed scaling with something we call the gAutoscaler. Incorporating runtime optimization into this framework takes the performance even further, leading to industry-leading cost efficiency.
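To make "dynamic Spark executor scheduling" more concrete, the sketch below lists the standard open-source Apache Spark dynamic allocation settings and renders them as spark-submit arguments. This is not the gAutoscaler or the product's internal mechanism, which the source does not describe. The values and the to_submit_args helper are illustrative assumptions.

```python
# Standard Apache Spark dynamic-allocation properties.
# The values are illustrative assumptions, not recommendations from the source.
DYNAMIC_ALLOCATION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "20",
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
}


def to_submit_args(conf: dict) -> list:
    """Render a Spark conf mapping as a flat list of spark-submit arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print the flags as they would appear on a spark-submit command line.
    print(" ".join(to_submit_args(DYNAMIC_ALLOCATION_CONF)))
```

With settings like these, Spark requests executors when tasks are queued and releases them after they sit idle, which is the baseline behavior that an autonomous optimizer would tune continuously rather than leaving at static values.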

This one-two punch of gAutoscaler, the enhanced autoscaler for Data Lake orchestration, and the added layer of runtime optimization proved so impactful that American Airlines decided to deactivate Photon, as it was proving to be extraneous. Intel Tiber App-Level Optimization is agnostic to previous optimization initiatives, providing value on top of any other optimization solutions that might be running. This means that American Airlines can choose to tap into that additional Photon optimization in the future, because the gAutoscaler can also respond to Photon’s adjustments.

Performance Improvements Lead to Engineer Enablement and 23% Cost Reductions

With the implementation of Intel Tiber App-Level Optimization, American Airlines reduced the number of nodes in use. This gave them more headroom, since each Data Lake workspace allows only a limited number of node connections. It also freed up the engineering teams to process and analyze data at the pace and scale that they needed, allowing them to use Data Lake as the platform was meant to be used. These results gave their data engineering team additional confidence to continue adopting the cloud, expand their compute capabilities, and become even more data-driven.

American Airlines Data Lakes job cluster, running with 37% fewer resources after implementing Intel Tiber App-Level Optimization.

Additionally, job completion times and cluster uptimes were reduced. These performance improvements led to a cost reduction of 23% across all-purpose clusters, job clusters and Delta Live Tables. These cost and performance benefits were delivered autonomously, without manual intervention.
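As a quick worked example of what these percentages mean in practice, the sketch below applies the reported 23% cost reduction and the 37% resource reduction from the job cluster example to made-up baselines. The baseline spend and node count are hypothetical. The source gives only the percentages, not absolute figures.

```python
def apply_reduction(baseline: float, reduction_pct: float) -> tuple:
    """Return (amount_saved, amount_remaining) for a percentage reduction."""
    saved = baseline * reduction_pct / 100
    return saved, baseline - saved


# Hypothetical baselines, for illustration only.
HYPOTHETICAL_MONTHLY_SPEND = 100_000  # currency units per month
HYPOTHETICAL_NODE_COUNT = 100         # nodes used by one job cluster

# Percentages reported in the article.
COST_REDUCTION_PCT = 23
RESOURCE_REDUCTION_PCT = 37

saved_spend, remaining_spend = apply_reduction(HYPOTHETICAL_MONTHLY_SPEND, COST_REDUCTION_PCT)
freed_nodes, remaining_nodes = apply_reduction(HYPOTHETICAL_NODE_COUNT, RESOURCE_REDUCTION_PCT)

print(f"Spend: {saved_spend:.0f} saved, {remaining_spend:.0f} remaining")
print(f"Nodes: {freed_nodes:.0f} freed, {remaining_nodes:.0f} remaining")
```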

With Intel Tiber App-Level Optimization implemented across the Databricks fleet, American Airlines is now looking to extend the Intel software solution to more services in their data center and Kubernetes environments on AKS.

Keren Shmuely

Marketing Director, Intel Granulate
