3. Method
3.1. Data and Variables
This analysis incorporates operational, financial, and public health data to evaluate the performance of the U.S. airline industry and its responsiveness to external disruptions, particularly the COVID-19 pandemic. The dataset spans from 2015 to 2024, which allows for the segmentation of the timeline into three distinct phases: the pre-COVID period (2015–2019), the COVID period (2020–2021), and the post-COVID recovery period (2022–2024). Aligning and mapping the data by year and quarter enables consistent time-series comparisons across all datasets and ensures meaningful interpretation of changes over time.
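The phase boundaries and the year-quarter alignment can be sketched as a small lookup. This is a minimal illustration that hard-codes the ranges stated above. The function names and the `YYYYQn` key format are assumptions for illustration, not part of the original pipeline.

```python
# Study phases as stated in the text (inclusive calendar-year ranges).
PHASES = {
    "pre-COVID": (2015, 2019),
    "COVID": (2020, 2021),
    "post-COVID": (2022, 2024),
}


def assign_phase(year: int) -> str:
    """Return the study phase for a calendar year; raise if outside 2015-2024."""
    for label, (start, end) in PHASES.items():
        if start <= year <= end:
            return label
    raise ValueError(f"year {year} is outside the 2015-2024 study window")


def period_key(year: int, quarter: int) -> str:
    """Build a 'YYYYQn' key (hypothetical format) for joining datasets by quarter."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"invalid quarter: {quarter}")
    return f"{year}Q{quarter}"


print(assign_phase(2020), period_key(2020, 2))  # COVID 2020Q2
```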
The selection of variables is grounded in both domain knowledge and prior literature on aviation operations and crisis response. Operational data are sourced from the Bureau of Transportation Statistics (BTS) and include metrics aggregated by year, quarter, and airline carrier. These metrics provide a detailed view of the scale and scope of airline operations across different periods. Specifically, total payload was selected as a key measure of cargo activity, which became particularly important when passenger demand declined and airlines shifted focus to freight services. Total seats and total passengers represent supply and demand, respectively, and are foundational indicators of airline capacity utilization and consumer behavior. Total freight extends the payload variable by quantifying cargo in volume terms, while total distance and total air time offer measures of network reach and operational intensity. Together, these variables capture both the physical extent and functional load of airline operations, enabling nuanced insights into how service strategies evolved during periods of disruption and recovery.
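Aggregating segment-level records to one row per year, quarter, and carrier is a grouped sum. The sketch below uses only the standard library. The lowercase field names are hypothetical stand-ins loosely modeled on BTS T-100 columns and would need to be mapped to the actual extract.

```python
from collections import defaultdict

# Hypothetical field names, loosely modeled on BTS T-100 segment columns.
METRICS = ("payload", "seats", "passengers", "freight", "distance", "air_time")


def aggregate_operations(rows):
    """Sum segment-level metrics into one record per (year, quarter, carrier)."""
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0.0))
    for r in rows:
        key = (r["year"], r["quarter"], r["carrier"])
        for m in METRICS:
            totals[key][m] += r[m]
    return dict(totals)


rows = [
    {"year": 2020, "quarter": 2, "carrier": "AA", **dict.fromkeys(METRICS, 1.0)},
    {"year": 2020, "quarter": 2, "carrier": "AA", **dict.fromkeys(METRICS, 2.0)},
    {"year": 2020, "quarter": 2, "carrier": "DL", **dict.fromkeys(METRICS, 4.0)},
]
agg = aggregate_operations(rows)
print(agg[(2020, 2, "AA")]["freight"])  # 3.0
```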
Financial data, also from BTS, are summarized by year, quarter, and carrier to capture the economic health and resilience of the airline industry. The key financial variables were chosen to reflect core dimensions of liquidity, solvency, and leverage—each critical for evaluating how carriers managed financial risks. Total cash indicates immediate liquidity and operational flexibility in crisis conditions. Total assets provide a snapshot of firm size and capital investments, while total current liabilities reflect short-term financial obligations. Total long-term debt was included to assess the extent of financial restructuring and reliance on external financing as a coping mechanism. These financial indicators collectively allow for a multi-faceted assessment of fiscal stability across different pandemic phases.
To measure the progression and external impact of the COVID-19 pandemic, the analysis incorporates public health data from Worldometers. This source provides daily and cumulative COVID-19 case counts, which are aggregated and aligned by quarter to match the airline datasets. The inclusion of pandemic severity metrics is critical for capturing exogenous shocks and for establishing a temporal linkage between epidemiological trends and shifts in airline performance. Case count data serve as proxies for market uncertainty, regulatory disruption, and traveler behavior changes, all of which directly influence both operational decisions and financial resilience.
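The quarterly alignment of the case series can be sketched as follows. This assumes the daily data arrive as `(date, new_cases, cumulative_cases)` tuples and the airline panel is a dictionary keyed by `(year, quarter, carrier)`. Both layouts are hypothetical. Daily new cases are summed within each quarter, while the cumulative count is taken at the last reported day of the quarter.

```python
from datetime import date


def quarterly_cases(daily):
    """Collapse (date, new_cases, cumulative_cases) rows into calendar quarters."""
    out = {}
    for day, new, cumulative in sorted(daily):  # chronological order
        key = (day.year, (day.month - 1) // 3 + 1)
        q = out.setdefault(key, {"new_cases": 0, "cumulative_cases": 0})
        q["new_cases"] += new               # cases reported within the quarter
        q["cumulative_cases"] = cumulative  # ends at the quarter's last reported day
    return out


def attach_cases(panel, cases):
    """Copy quarterly case metrics onto every (year, quarter, carrier) record."""
    # Many-to-one: all carriers in a quarter share the same case values.
    return {key: {**rec, **cases.get(key[:2], {})} for key, rec in panel.items()}


# Hypothetical layouts: daily tuples and a panel keyed by (year, quarter, carrier).
daily = [(date(2020, 3, 30), 1, 1), (date(2020, 3, 31), 2, 3), (date(2020, 4, 1), 3, 6)]
q = quarterly_cases(daily)
merged = attach_cases({(2020, 2, "AA"): {"freight": 3.0}}, q)
```

A quarter with no case data contributes no fields in this sketch, so such gaps would surface as missing values downstream.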
One notable limitation in the variable selection process was the absence of consistent, publicly available revenue data across the full study period. Revenue would have served as a direct indicator of airline performance and market response, and its exclusion represents a gap that future research should aim to address. Integrating revenue metrics would offer an even more comprehensive view of financial outcomes and strengthen the ability to assess profitability in relation to operational changes.
3.2. Modeling Approach
To investigate how operational and financial variables influenced the U.S. airline industry's performance during the COVID-19 pandemic, this study employed the XGBoost algorithm. XGBoost is a gradient-boosted decision tree method that is well known for handling large datasets, nonlinear relationships, and complex feature interactions. Its efficiency and predictive accuracy make it well suited to this tabular panel of carrier-quarter observations. The algorithm has no built-in notion of time, however, so temporal dependence must be supplied through engineered inputs such as lagged features. In this analysis, separate XGBoost models were developed for each of four financial targets: total cash, total assets, total liabilities, and total debt. By isolating each financial outcome, the study was able to assess how specific operational variables uniquely influenced different aspects of financial health across pandemic phases.
3.3. Data Preprocessing
Before training the models, a comprehensive data preprocessing pipeline was established to enhance data quality, reduce noise, and optimize input consistency across all time periods and features. The initial step involved handling missing values, which were imputed using the median for each variable rather than the mean. This choice preserved the central tendency of the data while reducing the influence of outliers, ensuring robust imputation in the presence of skewed financial and operational distributions.
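Median imputation can be sketched in a few lines. This assumes records are dictionaries with `None` marking a missing value, which is a hypothetical representation. The toy column shows why the median is the more robust choice: one extreme value pulls the mean of the observed values to about 34.7, while the median stays at 3.0.

```python
from statistics import mean, median


def impute_median(rows, columns):
    """Fill None entries in each column with that column's median of observed values."""
    medians = {}
    for c in columns:
        observed = [r[c] for r in rows if r[c] is not None]
        medians[c] = median(observed)  # StatisticsError if a column has no observed values
    filled = [{**r, **{c: medians[c] for c in columns if r[c] is None}} for r in rows]
    return filled, medians


# None marks a missing value (hypothetical representation).
rows = [{"cash": 1.0}, {"cash": None}, {"cash": 3.0}, {"cash": 100.0}]
filled, medians = impute_median(rows, ["cash"])
print(filled[1]["cash"], round(mean([1.0, 3.0, 100.0]), 1))  # 3.0 34.7
```

Because the medians are returned, they can be computed on the training split and reused unchanged for the test split.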
To control for irregularities, outlier treatment was applied through percentile-based capping. Specifically, values above the 99th percentile were clipped to the threshold to prevent rare but extreme values—such as exceptionally high debt loads or anomalous cargo volumes—from distorting model behavior. This approach retained the underlying data variability while minimizing the risk of overfitting to infrequent outliers.
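Upper-tail capping at the 99th percentile can be sketched with `statistics.quantiles`. The text does not say which percentile definition was used. This sketch assumes linear interpolation between order statistics (`method="inclusive"`), and the helper name is hypothetical.

```python
from statistics import quantiles


def cap_upper(values, pct=99):
    """Clip values above the pct-th percentile down to that percentile."""
    # quantiles(n=100) returns the 99 cut points P1..P99; index pct - 1 selects P_pct.
    # method="inclusive" (linear interpolation) is an assumption, not stated in the text.
    threshold = quantiles(values, n=100, method="inclusive")[pct - 1]
    return [min(v, threshold) for v in values], threshold


values = list(range(1, 100)) + [10_000]  # 99 ordinary values and one extreme outlier
capped, threshold = cap_upper(values)
```

Only the outlier changes: it is pulled down to the threshold, and the other 99 values pass through untouched.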
In addition to cleaning the raw data, feature engineering was performed to expand the informational depth of the dataset. Interaction terms, such as passenger volume multiplied by air time, were created to represent operational intensity, while lagged features were added to capture temporal dependencies across financial quarters. These engineered variables allowed the models to learn from both concurrent and trailing patterns in airline performance, which are especially relevant when modeling over a crisis timeline like the COVID-19 pandemic.
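Both kinds of engineered features can be sketched on a panel keyed by `(year, quarter, carrier)`. The field names and the choice of lagged columns are hypothetical. Lags are looked up within the same carrier, so a first-quarter record draws on that carrier's fourth quarter of the previous year. A missing prior quarter yields `None` rather than another carrier's value.

```python
def prior_quarter(year, quarter):
    """Return the (year, quarter) immediately preceding the given one."""
    return (year - 1, 4) if quarter == 1 else (year, quarter - 1)


def add_features(panel, lag_cols=("cash", "freight")):
    """Add a passengers x air_time interaction and one-quarter lags within each carrier.

    panel maps (year, quarter, carrier) to a record; field names are hypothetical.
    """
    out = {}
    for (year, quarter, carrier), rec in panel.items():
        new = dict(rec)
        new["passengers_x_air_time"] = rec["passengers"] * rec["air_time"]
        prev = panel.get((*prior_quarter(year, quarter), carrier))  # same carrier only
        for c in lag_cols:
            new[f"{c}_lag1"] = prev[c] if prev is not None else None  # None if no prior quarter
        out[(year, quarter, carrier)] = new
    return out


panel = {
    (2019, 4, "AA"): {"passengers": 12.0, "air_time": 2.0, "cash": 5.0, "freight": 1.0},
    (2020, 1, "AA"): {"passengers": 10.0, "air_time": 2.0, "cash": 7.0, "freight": 3.0},
}
feats = add_features(panel)
```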
A key transformation applied during preprocessing was the logarithmic transformation of select skewed variables, particularly those with long-tailed distributions such as total cash, freight volume, and long-term debt. This transformation compresses long right tails and reduces heteroscedasticity. For target variables such as total cash and long-term debt, it also keeps the squared-error loss from being dominated by the largest values. For input features, the effect on XGBoost itself is limited, because tree splits depend only on the ordering of values, and a monotonic transformation leaves that ordering unchanged. Log transformation also aids interpretability by expressing financial and operational magnitudes in relative rather than absolute terms, which is often more meaningful when analyzing growth rates or proportional changes.
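The text does not specify the exact form of the logarithm. The sketch below assumes `log1p` (that is, log(1 + x)) so that zero balances, such as a quarter with no freight, remain finite. It pairs this with `expm1` to return predictions to original units. The helper names are hypothetical.

```python
import math


def log_transform(values):
    """Apply log(1 + x); log1p is an assumption and requires non-negative inputs."""
    if any(v < 0 for v in values):
        raise ValueError("log1p transform assumes non-negative values")
    return [math.log1p(v) for v in values]


def inverse_log_transform(values):
    """Undo log_transform, e.g. to report predictions in original units."""
    return [math.expm1(v) for v in values]


raw = [0.0, 9.0, 99.0, 999.0]
logged = log_transform(raw)
# Each tenfold increase in (1 + x) adds the same step, ln(10), on the log scale.
restored = inverse_log_transform(logged)
```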
Lastly, all continuous variables were standardized using z-score normalization, placing them on a common scale with zero mean and unit variance. This step mainly puts variables of very different original magnitudes on a consistent footing. For the input features it has little effect on XGBoost itself, since tree splits are invariant to monotonic rescaling and the regularization terms penalize leaf weights rather than per-feature coefficients.
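z-score standardization can be sketched as below. Returning the mean and standard deviation allows the same transformation, fitted on the training data, to be reapplied unchanged to held-out data. The use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def fit_zscore(values):
    """Return (mu, sigma); population standard deviation is an assumption."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return mu, sigma


def apply_zscore(values, mu, sigma):
    """Standardize values with previously fitted parameters."""
    return [(v - mu) / sigma for v in values]


train = [2, 4, 4, 4, 5, 5, 7, 9]
mu, sigma = fit_zscore(train)  # 5 and 2.0 for this classic example
z_train = apply_zscore(train, mu, sigma)
z_test = apply_zscore([11], mu, sigma)  # held-out value expressed on the training scale
```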
Together, these preprocessing steps produced a consistent, well-structured input dataset intended to help the models generalize across time periods and capture complex relationships between airline operations and financial health.
3.4. Train-Test Splitting and Evaluation
The dataset was divided into an 80 percent training set and a 20 percent testing set to evaluate model performance on unseen data. To ensure reproducibility, a fixed random seed was applied to the data split and throughout model training, including hyperparameter tuning and cross-validation. This kept data partitions and results consistent across runs rather than letting them vary with random sampling.
Two metrics were used to assess accuracy: Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). Because RMSE squares each error before averaging, it penalizes large errors more heavily and summarizes overall prediction deviation. MAE gives the average absolute size of a prediction error in the target's own units. Together, these metrics provide a balanced view of how well operational inputs predict financial outcomes, and the controlled randomness makes the analysis repeatable.
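The seeded 80/20 split and the two error metrics can be sketched with the standard library. The seed value 42 and the helper names are assumptions for illustration. The toy example shows how a single large miss raises RMSE above MAE.

```python
import math
import random


def split_indices(n, test_frac=0.2, seed=42):
    """Shuffle row indices with a fixed (hypothetical) seed; return (train_idx, test_idx)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # same seed -> same partition on every run
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def rmse(y_true, y_pred):
    """Root mean squared error: squares errors, so large misses dominate."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: average miss size in the target's own units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


train_idx, test_idx = split_indices(100)
y_true, y_pred = [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]  # one miss of size 2
print(round(rmse(y_true, y_pred), 3), round(mae(y_true, y_pred), 3))  # 1.155 0.667
```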
3.5. Hyperparameter Tuning
To optimize model performance and support generalization across the different financial prediction tasks, a systematic hyperparameter optimization strategy was implemented. Rather than relying on fixed values or manual trial-and-error, the analysis employed a randomized search using Optuna, an open-source library that automates the exploration of hyperparameter spaces. This approach allowed many parameter combinations to be sampled and evaluated efficiently, and the configuration achieving the lowest validation error was retained for each model.
The search focused on several core hyperparameters that control model complexity, learning dynamics, and regularization. One of the most influential was the learning rate, which scales the contribution of each new tree added at a boosting iteration. Smaller learning rates make learning more gradual and refined but typically require more trees. Larger values speed convergence at a greater risk of overfitting. Tuning this parameter allowed each model to balance speed and precision according to the complexity of its financial target.
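In the standard gradient-boosting formulation that XGBoost follows, the learning rate (the `eta` parameter, also accepted as `learning_rate`) is a shrinkage factor applied to each newly fitted tree. Summed over an ensemble of K trees, the shrinkage carries through to the final prediction. A smaller learning rate therefore calls for more trees to reach a similar fit, which is why the learning rate and the number of estimators are usually tuned together.

```latex
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta \, f_t(\mathbf{x}_i), \qquad 0 < \eta \le 1, \quad t = 1, \dots, K

\hat{y}_i = \hat{y}_i^{(0)} + \eta \sum_{t=1}^{K} f_t(\mathbf{x}_i)
```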
The maximum tree depth was also varied to control how detailed each individual decision tree could become. Deeper trees capture more intricate patterns and interactions, but can lead to overfitting if not appropriately constrained. In contrast, shallower trees tend to generalize better but may underfit complex relationships. By tuning this parameter independently for each financial target, the models adapted to the degree of complexity required to explain the relationships between operational inputs and financial outcomes.
Additional structural parameters included the number of estimators, which sets how many trees make up the ensemble, and two subsampling ratios: the proportion of training samples used to build each tree and the proportion of features considered when splitting nodes. These sampling parameters introduce randomness into training, which reduces variance, guards against overfitting, and improves robustness on unseen data.
The optimization also covered three regularization terms: an L2 penalty and an L1 penalty on leaf weights, and the minimum loss reduction required to make a further partition on a leaf node. These parameters penalize overly complex trees and help control model flexibility, especially in datasets that include noisy or collinear features.
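For reference, the regularized objective that XGBoost minimizes shows where each of these terms enters. Here T is the number of leaves in a tree and w_j its leaf weights. The L2 and L1 penalties are λ and α (XGBoost parameters `lambda`/`reg_lambda` and `alpha`/`reg_alpha`), and γ is the minimum loss reduction (`gamma`, alias `min_split_loss`). G and H denote sums of the loss's first and second derivatives over the instances falling in a child node. The gain expression is shown for the common case α = 0.

```latex
\mathcal{L} = \sum_{i=1}^{n} \ell\left(y_i, \hat{y}_i\right) + \sum_{t=1}^{K} \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 + \alpha \sum_{j=1}^{T} \lvert w_j \rvert

% Split gain (alpha = 0): a candidate split survives only if Gain > 0,
% i.e. only if its loss reduction exceeds gamma; larger gamma prunes more splits.
\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda}
  - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma
```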
The randomized search was guided by cross-validation, in which the data were split into training and validation folds to assess performance across different subsets. The objective minimized during optimization was the root mean squared error (RMSE) on the validation folds, which favored configurations that balanced accuracy and generalizability. Each model, corresponding to one of the four financial targets, was optimized independently. As a result, different hyperparameter sets were chosen for each, reflecting the distinct predictive complexity of total cash, liabilities, assets, and debt.
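A library-agnostic sketch of the randomized search is shown below. The parameter names follow the keyword arguments of XGBoost's scikit-learn wrapper (`XGBRegressor`), but the ranges, seeds, and helper names are hypothetical, since the text does not report them. `cv_rmse` stands for a caller-supplied function that would fit a model on each training fold from `kfold_indices` and return the mean validation RMSE. In Optuna, the same space would typically be written with `trial.suggest_float(..., log=True)` and `trial.suggest_int(...)` inside an objective function, with `optuna.samplers.RandomSampler(seed=...)` for purely random sampling.

```python
import math
import random


def sample_config(rng):
    """Draw one hypothetical XGBoost configuration (XGBRegressor keyword names)."""
    def log_uniform(lo, hi):
        return math.exp(rng.uniform(math.log(lo), math.log(hi)))

    return {
        "learning_rate": log_uniform(0.01, 0.3),
        "max_depth": rng.randint(2, 8),             # inclusive bounds
        "n_estimators": rng.randint(100, 1000),
        "subsample": rng.uniform(0.5, 1.0),          # share of rows per tree
        "colsample_bynode": rng.uniform(0.5, 1.0),   # share of features per split
        "reg_lambda": log_uniform(1e-3, 10.0),       # L2 penalty on leaf weights
        "reg_alpha": log_uniform(1e-3, 10.0),        # L1 penalty on leaf weights
        "gamma": rng.uniform(0.0, 5.0),              # minimum loss reduction per split
    }


def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) for k shuffled, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


def random_search(cv_rmse, n_trials=20, seed=42):
    """Evaluate n_trials random configurations; return the one with the lowest score."""
    rng = random.Random(seed)
    best_params, best_score = None, math.inf
    for _ in range(n_trials):
        params = sample_config(rng)
        score = cv_rmse(params)  # expected: mean validation RMSE across folds
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Stand-in scorer for demonstration only: prefers learning rates near 0.05.
best, score = random_search(lambda p: abs(math.log(p["learning_rate"] / 0.05)))
```

Sampling the learning rate and penalties on a log scale spreads trials evenly across orders of magnitude, which a uniform draw over the same interval would not.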
By combining randomized search, regularization, and cross-validation, the hyperparameter tuning process aimed to produce final XGBoost models that were both expressive and stable, capable of capturing nuanced financial patterns across different operational conditions and temporal phases.
3.6. Interpretability
To enhance model transparency and explain the drivers of financial outcomes, SHAP (SHapley Additive exPlanations) values were computed for each trained model. SHAP values offer a consistent, game-theoretic approach to interpreting the contribution of each input feature to individual predictions. This interpretability framework allowed for a clear understanding of which operational metrics most influenced financial performance across different time periods. For example, the analysis revealed that freight-related features gained significant importance during the COVID period, highlighting the industry’s strategic pivot toward cargo services when passenger travel was restricted. The use of SHAP values not only improved the explainability of the results but also supported data-driven insights into how operational strategies shifted in response to evolving pandemic conditions.
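The game-theoretic definition behind SHAP can be made concrete with a brute-force computation on a toy function. This sketch is not how SHAP values are computed for XGBoost in practice. Tree models typically use the TreeSHAP algorithm, available through the `shap` package's `TreeExplainer` or XGBoost's own `pred_contribs=True` prediction option. The sketch also simplifies the treatment of absent features by fixing them at a single baseline point, whereas SHAP implementations handle them differently, for example by averaging over background data. The toy model and its inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with 'absent' features fixed at baseline values.

    Enumerates every coalition, so cost grows as 2**n; feasible only for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phi.append(total)
    return phi


def model(z):
    """Hypothetical toy model: an additive term plus a two-way interaction."""
    return 2 * z[0] + z[1] * z[2]


x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
# Feature 0 receives its additive effect (2); the interaction's 6 is split evenly (3, 3).
```

The contributions sum to the difference between the prediction at `x` and at the baseline, which is the additivity property that makes SHAP values readable as a decomposition of each prediction.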
3.7. Analysis
The XGBoost models, trained separately on datasets from the COVID period (2020–2022) and the Non-COVID period (2015–2019), reveal a substantial shift in the relative importance of operational features in predicting airline financial performance, particularly for Total_Cash. These changes reflect the significant transformation in the airline industry's operating model brought on by the pandemic.
During the Non-COVID period, Total_Distance—which represents the cumulative mileage flown—was by far the most important predictor of cash holdings, contributing an overwhelming 73.8 percent to the model’s total gain. This suggests that prior to the pandemic, the extent of an airline’s operational reach was closely associated with its financial liquidity, likely due to stable passenger traffic and predictable revenue flows tied to long-haul and high-frequency routes.
In contrast, during the COVID period, the relative importance of Total_Distance dropped precipitously to just 7.5 percent, as route networks were reduced, long-haul travel was minimized, and demand for travel became highly volatile. In place of this previously dominant feature, Total_Freight emerged as the most critical variable, contributing 51.1 percent to model gain. This dramatic shift reflects the industry’s strategic pivot toward cargo operations as a primary source of revenue amidst a collapse in passenger demand. Passenger aircraft were converted into freighters, and freight logistics became a lifeline for maintaining operations and generating cash flow.
Similarly, Total_Payload—a measure that includes both passenger and cargo weight—also saw a significant increase in importance, accounting for 31.6 percent of the predictive power during the COVID period, compared to a much smaller role in the Non-COVID model. This reinforces the centrality of transported goods, rather than people, in driving liquidity during the crisis.
In contrast, features previously tied to passenger operations lost much of their predictive value. Total_Seats, which had moderate influence pre-COVID as a proxy for capacity and revenue potential, became virtually irrelevant during the pandemic, likely due to large-scale grounding of aircraft and plummeting load factors. The decoupling of seating capacity from financial performance underscores how drastically the business model shifted in response to pandemic conditions.
These changes, illustrated in Figure 1 and Figure 2, underscore the extent to which COVID-19 redefined operational priorities. They are consistent with the industry's adaptation toward cargo-centric revenue models and its temporary departure from the traditional passenger-based business structure.
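Percent-of-gain figures like those reported for Total_Distance, Total_Freight, and Total_Payload are typically obtained by normalizing per-feature gain so the shares sum to 100. Assuming total gain was used, XGBoost exposes it through `Booster.get_score(importance_type="total_gain")`, which returns a feature-to-gain dictionary and omits features never used in a split. The normalization itself is shown below with made-up numbers, not the study's results.

```python
def gain_shares(gain_by_feature):
    """Convert per-feature gain into percentages of the model's summed gain."""
    total = sum(gain_by_feature.values())
    if total <= 0:
        raise ValueError("total gain must be positive")
    return {feature: 100.0 * g / total for feature, g in gain_by_feature.items()}


# Illustrative values only; a real dict would come from the trained model.
shares = gain_shares({"Total_Distance": 6.0, "Total_Seats": 3.0, "Total_Freight": 1.0})
print(shares)  # {'Total_Distance': 60.0, 'Total_Seats': 30.0, 'Total_Freight': 10.0}
```

Because the shares are relative within each model, a lower share for one feature in the COVID model can reflect larger gains for other features as well as a weaker association of its own.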
To support these insights, a robust machine learning pipeline was developed using the XGBoost algorithm to model and predict four key financial targets: Total_Cash, Total_Assets, Total_Liabilities, and Total_Debt. Operational features including payload, seats, passengers, freight, air time, and distance served as input variables. Prior to modeling, extensive preprocessing was conducted to address data quality issues—missing values were imputed using the median, and extreme outliers were capped to reduce the influence of anomalies.
A total of 400 models were trained: for each of the four financial targets, 20 hyperparameter sets were combined with 5 random seeds for cross-validation (4 × 20 × 5 = 400). This design was intended to ensure that the results were not artifacts of particular model settings or random data partitions but reflected generalizable patterns in the underlying data.
Model evaluation was performed using both Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). Among the four financial targets, predictions for Total_Cash had the lowest errors, with a median RMSE of 1,931,776 and MAE of 852,543.9. This suggests that cash reserves, as a short-term financial metric, were most directly tied to operational variables, particularly freight-related ones during the pandemic.
Conversely, the models struggled more with long-term balance sheet variables. Predictions for Total_Assets had the highest errors, with a median RMSE of 8,411,070 and MAE of 4,303,871.3, indicating that asset composition and valuation are likely influenced by factors beyond day-to-day operational metrics. Similarly, Total_Liabilities and Total_Debt were difficult to predict with high precision (RMSE values of 3,957,290 and 4,445,529, respectively), perhaps reflecting the role of strategic financing decisions, credit terms, and external economic pressures. Because these errors are expressed in each target's own units, and cash is itself a component of total assets, part of the gap between Total_Cash and Total_Assets reflects differences in scale rather than predictability alone.
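The run count and the per-target medians can be sketched as follows. The seed values and the result-tuple layout are hypothetical, and the error values in the example are toy numbers rather than the study's results. Four targets, 20 hyperparameter sets, and 5 seeds give 4 × 20 × 5 = 400 runs.

```python
from collections import defaultdict
from itertools import product
from statistics import median

TARGETS = ["Total_Cash", "Total_Assets", "Total_Liabilities", "Total_Debt"]

# (target, hyperparameter-set id, seed); seed values 0-4 are hypothetical.
runs = list(product(TARGETS, range(20), range(5)))


def median_errors(results):
    """Map each target to (median RMSE, median MAE) over its runs.

    results: iterable of (target, rmse, mae) tuples, one per trained model.
    """
    by_target = defaultdict(list)
    for target, rmse, mae in results:
        by_target[target].append((rmse, mae))
    return {
        t: (median(r for r, _ in errs), median(m for _, m in errs))
        for t, errs in by_target.items()
    }


summary = median_errors([("Total_Cash", 1.0, 0.5), ("Total_Cash", 3.0, 1.5), ("Total_Cash", 2.0, 1.0)])
print(len(runs), summary)  # 400 {'Total_Cash': (2.0, 1.0)}
```

Reporting the median over runs, rather than the mean, keeps a few unlucky seed and parameter combinations from dominating the summary.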
Feature importance values were extracted from each model to enhance interpretability and allow for further investigation into the operational drivers of financial outcomes. These results, along with the final model outputs, were systematically aggregated and visualized (Figure 3) to support a comprehensive understanding of the relationships uncovered by the models.
Overall, this modeling framework provides a scalable and interpretable approach to financial forecasting within the aviation sector. The findings highlight both the potential of machine learning techniques for economic analysis and the importance of adjusting operational strategies to respond to external shocks such as global pandemics. Future work may benefit from incorporating macroeconomic indicators, airline-specific strategic decisions, or regional policy data to further enhance predictive accuracy and contextual understanding.