Preprint
Article

This version is not peer-reviewed.

AI-Enhanced Marketing Mix Modeling: Integrating ML, XAI, and LLMs for Greater Accuracy, Interpretability, and Actionability

Submitted: 31 January 2026

Posted: 02 February 2026


Abstract
This paper proposes an AI-enhanced Marketing Mix Modeling (MMM) framework that integrates machine learning (ML), explainable AI (XAI), and large language models (LLMs) to evaluate marketing effectiveness with improved predictive accuracy, interpretability, and practical applicability. Moving beyond traditional MMM approaches, the framework employs the XGBoost algorithm to capture nonlinear relationships between multichannel marketing investments and business outcomes. SHAP analysis further enhances model interpretability through feature-importance rankings, beeswarm visualizations, and dependence plots that quantify each channel’s marginal contribution. In addition, a Claude-based LLM module translates complex model outputs into natural-language performance insights and budget allocation recommendations, lowering technical barriers and improving decision-making actionability. Experimental results demonstrate that, compared with an OLS baseline, the XGBoost model significantly improves predictive performance, increasing R² from 0.8572 to 0.9123 while reducing RMSE and MAE by 21.7% and 22.8%, respectively. The integrated SHAP and LLM components further elucidate channel impacts and suggest budget optimization strategies, yielding an estimated 18–25% improvement in marketing ROI. Overall, the proposed AI-enhanced MMM framework democratizes advanced marketing analytics by providing businesses with an accessible, automated solution that guides strategic marketing decision-making, enables efficient resource allocation, and improves marketing returns.

1. Introduction

In the era of digital marketing and multi-channel ecosystems, marketing decisions have become increasingly complex across diverse online channels such as search engines, social media, and e-commerce platforms. Accurately measuring return on investment (ROI) and allocating budgets efficiently among these channels require rigorous, data-driven analytical approaches. Traditional attribution models, including last-click and multi-touch attribution, often fail to capture the true contribution of each channel, especially under growing data-privacy restrictions. Consequently, Marketing Mix Modeling (MMM) has regained importance as a statistical framework for evaluating media effectiveness and guiding budget optimization.
However, conventional MMM approaches based on ordinary least squares (OLS) regression face critical limitations. Their linear assumptions constrain the ability to capture real-world complexities, such as diminishing returns, nonlinear patterns, and cross-channel interactions, leading to biased or oversimplified results. Although machine learning (ML) techniques can accommodate these nonlinear structures, their "black-box" nature and lack of transparency make it difficult for practitioners to interpret results and extract actionable insights.
To address these limitations, this study proposes an AI-enhanced MMM framework that integrates machine learning (ML), explainable AI (XAI), and large language models (LLMs). The framework leverages the XGBoost algorithm to improve predictive accuracy and robustness while employing SHAP (SHapley Additive exPlanations) to generate transparent, channel-level interpretations of marketing impact. In addition, a Claude-based LLM module converts complex model outputs into intuitive natural-language explanations and actionable recommendations, enabling non-technical users to interpret analytical results and make informed decisions. This integrated framework bridges the gap between accuracy, interpretability, and usability, enhancing the transparency and scientific rigor of marketing analytics.

2. Literature Review

Marketing mix modeling has evolved significantly since its foundational work by Hanssens et al., who established econometric principles for measuring advertising effectiveness across media channels [1]. Traditional MMM approaches rely predominantly on classical statistical and regression-based frameworks to estimate the relationship between marketing investments and business outcomes [2]. While these methods provide interpretable coefficients and statistical inference, their restrictive linear assumptions limit their ability to capture the complex, nonlinear dynamics characteristic of modern digital marketing ecosystems, including diminishing returns, saturation effects, and cross-channel interactions [3].
Recent advances in machine learning have introduced new possibilities for MMM. Researchers have demonstrated that gradient boosting methods, particularly XGBoost, can substantially improve predictive accuracy in marketing analytics by modeling nonlinear relationships without restrictive parametric assumptions [4]. However, the widespread adoption of ML-based approaches in business contexts has been hindered by their "black-box" nature, which complicates interpretation and reduces stakeholder trust [5].
To address this interpretability gap, explainable AI (XAI) techniques have emerged as critical enablers for transparency in complex models. SHAP values, grounded in cooperative game theory, provide consistent and theoretically sound feature attribution that decomposes model predictions into individual variable contributions [6]. Recent marketing studies have successfully applied SHAP to interpret consumer behavior models and advertising response patterns [7,8].
Despite these methodological advances, a significant gap remains between technical model outputs and managerial decision-making. Large language models (LLMs) offer promising solutions for bridging this gap by automating the translation of complex analytical results into natural-language insights [9,10]. However, few studies have systematically integrated ML, XAI, and LLMs into a unified marketing analytics framework that simultaneously optimizes accuracy, interpretability, and business usability, a gap this research aims to address.

3. Methodology

3.1. Machine Learning: XGBoost-Based MMM

In contrast to the traditional regression-based MMM framework, this study introduces an XGBoost-based MMM model to improve predictive accuracy and robustness. XGBoost is a gradient-boosted decision-tree algorithm that models nonlinear relationships and interaction effects among marketing variables, providing a more realistic representation of advertising phenomena such as diminishing returns, saturation, and cross-channel synergies. One of its key strengths over OLS is this ability to capture nonlinearities in the data. OLS assumes a linear relationship between input features (e.g., marketing spend) and the target variable (e.g., revenue), whereas real-world marketing data often exhibit diminishing returns, saturation effects, and complex interactions that linear models cannot capture adequately. For instance, beyond a certain spend level the marginal return on each additional dollar may decline, a relationship that a linear regression model cannot represent without explicit transformations. Additionally, cross-channel interactions, where the effectiveness of one channel depends on the spend of another, are modeled by XGBoost through successive splits on different features within its decision trees rather than through explicitly specified interaction terms. In this framework, marketing expenditures across channels (e.g., search, social media, and display) serve as input features to predict business response variables such as daily sales or revenue. The boosted ensemble model can be expressed as:
$$\hat{y} = \sum_{t=1}^{T} f_t(x)$$
where $\hat{y}$ denotes the predicted marketing response, $x$ represents the multidimensional vector of marketing inputs, $T$ is the total number of boosted trees, and $f_t$ is the $t$-th regression tree. Each tree is added sequentially to minimize the residual errors of the previous iteration, allowing the model to approximate highly nonlinear and interactive relationships without restrictive parametric assumptions.
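The additive form above can be illustrated with a toy booster. The following is a minimal sketch, not the XGBoost implementation: it assumes squared-error loss, depth-1 trees (stumps), a single hypothetical spend feature, and an invented saturating response curve. The constant base score and the learning rate are written out explicitly here, whereas the equation folds them into the trees $f_t$.

```python
"""Toy gradient boosting with depth-1 regression trees (stumps), squared loss."""


def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    for thr in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    return best[1:]


def predict(base, trees, lr, x):
    """y_hat = base + sum over t of lr * f_t(x)."""
    return base + sum(lr * (lv if x <= thr else rv) for thr, lv, rv in trees)


def boost(xs, ys, n_trees, lr=0.1):
    """Fit trees one at a time, each on the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - predict(base, trees, lr, x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return base, trees


def mse(xs, ys, base, trees, lr):
    return sum((y - predict(base, trees, lr, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical saturating spend-response curve (illustrative numbers only).
spend = [float(s) for s in range(0, 1001, 50)]
revenue = [5000 * s / (s + 300) for s in spend]

LR = 0.1
base, trees_5 = boost(spend, revenue, n_trees=5, lr=LR)
_, trees_50 = boost(spend, revenue, n_trees=50, lr=LR)

mse_0 = mse(spend, revenue, base, [], LR)        # constant prediction only
mse_5 = mse(spend, revenue, base, trees_5, LR)   # after 5 trees
mse_50 = mse(spend, revenue, base, trees_50, LR) # after 50 trees
```

Comparing `mse_0`, `mse_5`, and `mse_50` shows the training error shrinking as trees are added, which is the sequential residual-fitting behavior the text describes.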
XGBoost's ability to model these nonlinearities and interactions makes it particularly suitable for digital marketing data, where the relationship between spend and outcomes is seldom linear. Furthermore, XGBoost performs well with high-dimensional data containing numerous input features (such as different marketing channels), because its tree construction performs implicit feature selection and its objective includes regularization. In contrast, OLS coefficient estimates become unstable under multicollinearity, and the model can overfit in such high-dimensional settings. For example, in our empirical analysis, we observed that spending is correlated across channels, such as Meta and Google, which could introduce multicollinearity issues in OLS models. XGBoost's predictions are less sensitive to this problem owing to its tree-based structure and regularization techniques, making it more reliable for such complex datasets. To validate the performance improvement, the XGBoost-based MMM model is compared with a baseline regression-based model using R², RMSE, MAE, and MAPE. The comparison highlights that XGBoost outperforms OLS, particularly in capturing these complex, nonlinear patterns and interactions in the marketing data, yielding more accurate predictions and lower prediction errors.
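For reference, the four evaluation metrics can be computed as follows. This is a minimal sketch using their standard definitions; the function name and the small example vectors are illustrative assumptions, not values from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², RMSE, MAE, and MAPE (in percent) for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when an actual value is zero.
        "mape": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


# Hypothetical daily revenue values and predictions.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(actual, predicted)
```

Because MAPE divides by the actual value, days with zero revenue would need to be excluded or handled separately before applying it.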

3.2. Explainable AI: SHAP Method

To enhance the interpretability of the XGBoost-based MMM, this study incorporates SHapley Additive exPlanations (SHAP), a game-theoretic framework that decomposes model predictions into additive contributions from individual marketing spend features. SHAP quantifies how changes in each channel’s spending shift the predicted business outcome relative to a baseline, enabling transparent attribution of media effects within the XGBoost-based nonlinear modeling framework. The model prediction can be expressed as:
$$\hat{f}(x) = \phi_0 + \sum_{j=1}^{M} \phi_j$$
where $\hat{f}(x)$ represents the predicted business outcome, $\phi_0$ is the expected model output across all samples, and $\phi_j$ denotes the Shapley value for marketing spend feature $j$, capturing its marginal contribution. Conceptually, $\phi_j$ reflects how the expected business outcome changes when channel $j$’s spending is observed at its actual level $x_j$, compared with when it is replaced by a reference value $x_j'$:
$$\phi_j = \mathbb{E}_x\left[f(x) \mid x_j\right] - \mathbb{E}_x\left[f(x) \mid x_j'\right]$$
By aggregating local Shapley values across all observations, the model reveals global insights into each channel’s importance, nonlinear spend-response patterns, and cross-channel interactions. These relationships are visualized through feature-importance bar plots, beeswarm summaries, and dependence plots, which illustrate how business outcomes respond to different levels of marketing investment across channels.
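The additive decomposition can be checked on a small example by computing Shapley values by brute force. The sketch below is an illustration under stated assumptions, not the algorithm used by the SHAP library for tree models: the three-feature `model`, the background rows, and the explained point are all hypothetical, and absent features are filled in from the background rows.

```python
from itertools import combinations
from math import factorial


def model(x):
    """Hypothetical response: saturating channel 0, linear channel 1, interaction 0x2."""
    a, b, c = x
    return 100 * a / (a + 1) + 20 * b + 5 * a * c


def coalition_value(f, x, background, subset):
    """Mean prediction when features in `subset` take x's values, others come from background."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += f(mixed)
    return total / len(background)


def shapley_values(f, x, background):
    """Exact Shapley values by enumerating every coalition (feasible only for small M)."""
    m = len(x)
    phi = [0.0] * m
    for j in range(m):
        others = [k for k in range(m) if k != j]
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for s in combinations(others, size):
                with_j = coalition_value(f, x, background, set(s) | {j})
                without_j = coalition_value(f, x, background, set(s))
                phi[j] += weight * (with_j - without_j)
    phi0 = coalition_value(f, x, background, set())  # expected output over background
    return phi0, phi


background = [[0.0, 0.0, 0.0], [1.0, 2.0, 1.0], [2.0, 1.0, 3.0]]
x = [3.0, 1.0, 2.0]
phi0, phi = shapley_values(model, x, background)
```

By construction, `phi0 + sum(phi)` equals `model(x)`, which is the additivity property expressed by the first equation. Full enumeration grows exponentially with the number of features, so it serves only to make the definition concrete.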
To demonstrate the real-world application of SHAP analysis, consider a scenario in which a company is deciding whether to increase its budget on Meta Instagram or Google PMax. By using SHAP values, the model can show not just the overall importance of these channels in driving sales but also how sensitive sales are to changes in spending for each channel. For instance, a SHAP value analysis might reveal that while Meta Instagram has the highest contribution to sales, there is a diminishing return for every dollar spent beyond a certain threshold. On the other hand, Google PMax might exhibit a more linear but consistent contribution, where increased spending continues to drive higher sales without the diminishing returns observed in Instagram. Such insights are crucial for making strategic decisions. If the model shows that Instagram is approaching saturation, the company may choose to allocate additional funds to Google PMax to maintain steady growth, rather than over-investing in Instagram and risking inefficient use of resources. This type of actionable insight, derived directly from SHAP values, informs budget allocation decisions by providing clear evidence of where to invest for optimal returns.
In addition to guiding budget allocation, SHAP values can be used to assess the impact of potential adjustments in marketing strategies. For example, if a company wants to explore the potential impact of increasing investment in TikTok, SHAP analysis could indicate whether a small increase in TikTok spend would yield significant improvements in sales, or if the channel has little effect relative to others. This integration of SHAP values into the decision-making process moves beyond theoretical improvements and directly addresses the practical needs of marketing managers, providing them with clear, data-driven insights that are both interpretable and actionable. The resulting explanatory structure also serves as input to the LLM translation module, enabling SHAP outputs to be converted into coherent natural-language explanations that support business interpretation and strategic recommendations.

3.3. Generative AI: LLM-Based Interpretation and Recommendation

To improve the business usability of the machine learning and explainable AI outputs, this study integrates a generative large language model (LLM) to automate the translation of technical results into structured business reports. Implemented through Anthropic's Claude API, the LLM processes the XGBoost-based MMM model outcomes and SHAP visualization plots to generate coherent natural-language summaries tailored for marketing decision-making. Guided by a structured prompt that emulates the reasoning of a senior data scientist, the LLM synthesizes three components: (1) an executive summary of findings; (2) key insights on channel effectiveness; and (3) data-driven recommendations for budget allocation and ROI optimization.
By integrating the LLM translation module into the analytical pipeline, the framework converts complex model outputs into interpretable narratives and actionable marketing recommendations. This enhancement improves the readability, accessibility, and managerial relevance of the results, enabling marketing stakeholders to obtain data-driven insights without requiring statistical or technical expertise and to optimize their marketing strategies to achieve maximum return on investment.
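A structured prompt of the kind described above might be assembled as in the following sketch. The function name, the prompt wording, and the choice to pass a text ranking are assumptions for illustration; the study's actual prompt is not reproduced here, and the study passes SHAP plots to the model rather than a text ranking. The metric values are the XGBoost test-set figures from Table 1.

```python
def build_report_prompt(metrics, channels_by_importance):
    """Assemble a three-part reporting prompt from model metrics and a SHAP ranking."""
    metric_lines = "\n".join(f"- {name}: {value}" for name, value in metrics.items())
    channel_lines = "\n".join(
        f"{rank}. {name}" for rank, name in enumerate(channels_by_importance, start=1)
    )
    return (
        "You are a senior data scientist reviewing a marketing mix model.\n\n"
        "Test-set metrics of the XGBoost-based MMM:\n"
        f"{metric_lines}\n\n"
        "Channels ranked by mean absolute SHAP value:\n"
        f"{channel_lines}\n\n"
        "Write a report with exactly three sections:\n"
        "1. Executive summary of findings\n"
        "2. Key insights on channel effectiveness\n"
        "3. Data-driven recommendations for budget allocation and ROI optimization\n"
    )


metrics = {"R2": 0.9123, "RMSE": 1741.03, "MAE": 1228.14, "MAPE (%)": 18.33}
top_channels = ["META_INSTAGRAM_SPEND", "GOOGLE_PMAX_SPEND", "META_FACEBOOK_SPEND"]
prompt = build_report_prompt(metrics, top_channels)

# The prompt string would then be sent as the user message of a Claude API
# request (for example via the Anthropic Python SDK's messages.create call).
# That step is omitted here because it requires credentials and a model name,
# neither of which is specified in the text.
```

Keeping prompt construction separate from the API call makes the text sent to the model easy to inspect and version.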

4. Framework Design

The overall system architecture, illustrated in Figure 1, consists of four integrated layers that function as an automated end-to-end pipeline. The first layer is the data input layer, which ingests daily marketing expenditures and business metrics and performs initial data checks, cleaning, and exploratory data analysis. The cleaned data flows directly into the second layer, a machine learning module, where an XGBoost-based MMM is trained and evaluated to model the relationships between marketing inputs and business outcomes. Building on these model outputs, the third layer applies the SHAP explainable AI module to generate feature-level marginal contributions and global impact distributions through feature-importance bar charts, beeswarm plots, and dependence curves. The fourth layer integrates the Claude-based LLM module, which transforms the model predictions and SHAP visualizations into executive summaries, key insights, and channel allocation guidance. Together, these interdependent layers form a unified system that seamlessly transforms raw data into business reports while ensuring predictive accuracy, interpretability, and managerial usability.

5. Empirical Analysis

5.1. Data Selection and Preprocessing

To evaluate the proposed framework, this study utilizes the “Multi-Region Marketing Mix Modeling (MMM) Dataset for Several eCommerce Brands” published on Figshare [11]. The dataset provides daily marketing expenditures across Google, Meta, and TikTok, along with online purchase records for 82 global e-commerce brands. For analysis, we focus on a U.S. apparel e-commerce brand with daily data spanning July 26, 2021 to May 20, 2024. The empirical objective is to quantify the relationship between daily digital marketing investments and online store revenue. The dependent variable, Revenue, is defined as the net value of online purchases after deducting discounts:
$$\text{Revenue} = \text{All\_Purchases\_Original\_Price} - \text{All\_Purchases\_Gross\_Discount}$$
The independent variables consist of daily spending across nine channels, including: GOOGLE_PAID_SEARCH_SPEND, GOOGLE_SHOPPING_SPEND, GOOGLE_PMAX_SPEND, GOOGLE_DISPLAY_SPEND, GOOGLE_VIDEO_SPEND, META_FACEBOOK_SPEND, META_INSTAGRAM_SPEND, META_OTHER_SPEND, and TIKTOK_SPEND.
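The variable construction can be sketched as follows. The two purchase column names follow the revenue definition above and may be spelled differently in the raw dataset; the example row and helper names are hypothetical.

```python
# The nine spend features listed in the text.
SPEND_COLUMNS = [
    "GOOGLE_PAID_SEARCH_SPEND",
    "GOOGLE_SHOPPING_SPEND",
    "GOOGLE_PMAX_SPEND",
    "GOOGLE_DISPLAY_SPEND",
    "GOOGLE_VIDEO_SPEND",
    "META_FACEBOOK_SPEND",
    "META_INSTAGRAM_SPEND",
    "META_OTHER_SPEND",
    "TIKTOK_SPEND",
]


def add_revenue(row):
    """Return a copy of the daily record with Revenue = original price - gross discount."""
    out = dict(row)
    out["Revenue"] = row["All_Purchases_Original_Price"] - row["All_Purchases_Gross_Discount"]
    return out


def to_features(row):
    """Extract the nine spend values in a fixed column order."""
    return [float(row[col]) for col in SPEND_COLUMNS]


# Hypothetical daily record (values invented for illustration).
day = {col: 0.0 for col in SPEND_COLUMNS}
day.update({
    "META_INSTAGRAM_SPEND": 250.0,
    "GOOGLE_PMAX_SPEND": 120.0,
    "All_Purchases_Original_Price": 1000.0,
    "All_Purchases_Gross_Discount": 100.0,
})

record = add_revenue(day)
features = to_features(record)
```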

5.2. Exploratory Data Analysis

Before model estimation, an exploratory data analysis (EDA) was conducted to obtain an overview of the temporal dynamics and inter-variable relationships within the dataset. Figure 2 presents the weekly time series of total digital marketing expenditure and revenue from 2021 to 2024. The two series exhibit strong co-movement, with revenue peaks generally following periods of intensified marketing investment. This pattern suggests a close short-term linkage between aggregate digital advertising activity and online sales performance, providing initial evidence of the relevance of marketing spend variations for revenue generation.
Figure 3 reports the correlation heatmap between revenue and daily spending across the nine marketing channels. Meta Instagram and Google PMax display the strongest positive correlations with revenue, followed by Google Paid Search and Meta Facebook. In contrast, channels such as TikTok, Google Shopping, Google Display, and Google Video show weak or negative correlations, suggesting limited direct contribution to short-term revenue. Several channels also exhibit moderate intercorrelation, particularly across Meta and Google platforms, suggesting overlapping activation patterns and highlighting potential multicollinearity concerns that further motivate the use of nonlinear machine learning models in subsequent analysis.
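The pairwise correlations behind such a heatmap can be computed as in this minimal sketch. It assumes Pearson correlation, which the text does not state explicitly, and uses invented series and shortened series names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(series):
    """Map each ordered pair of series names to their Pearson correlation."""
    return {(a, b): pearson(series[a], series[b]) for a in series for b in series}


# Hypothetical daily values (illustrative only).
series = {
    "Revenue": [100.0, 150.0, 130.0, 200.0, 220.0],
    "META_INSTAGRAM_SPEND": [10.0, 14.0, 13.0, 20.0, 21.0],
    "GOOGLE_DISPLAY_SPEND": [5.0, 5.0, 6.0, 4.0, 5.0],
}
corr = correlation_matrix(series)
```

Off-diagonal entries between two spend series, rather than between spend and revenue, are the ones relevant to the multicollinearity concern.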
Collectively, the EDA results demonstrate that variations in digital marketing investments are strongly associated with short-term revenue dynamics and reveal heterogeneous performance across channels. These observations underscore the need for a more flexible modeling framework to quantify marketing effectiveness, as developed in the following sections.

5.3. Baseline and Machine Learning Models

This study develops two modeling approaches to quantify the relationship between marketing spend and revenue: (1) a traditional linear regression-based MMM baseline and (2) an XGBoost-based MMM model. These models are used to assess the predictive performance of the proposed framework and illustrate the potential gains from adopting nonlinear machine learning methods.
(1) Baseline Model: Linear Regression-based MMM
A standard ordinary least squares (OLS) specification is employed as the benchmark. The model relates daily revenue to the nine digital marketing channels as follows:
$$\text{Revenue} = \beta_0 + \sum_{i=1}^{9} \beta_i \, (\text{ChannelSpending})_i + \varepsilon$$
Although widely used in classical MMM applications, the linear form limits the model’s ability to represent nonlinear advertising responses, diminishing returns, and interaction effects, which are common in digital marketing settings.
(2) XGBoost-based MMM
To address these limitations, the primary machine learning model in this study is XGBoost. As described in Section 3.1, XGBoost is well suited for capturing nonlinear patterns and complex dependencies among marketing channels. In this empirical analysis, the model is trained using the following hyperparameters:
  • n_estimators = 50
  • max_depth = 6
  • learning_rate = 0.1
  • subsample = 0.8
  • colsample_bytree = 0.8
  • objective = 'reg:squarederror'
  • n_jobs = 1, verbosity = 0, random_state = seed
These settings provide a balance between model complexity and generalization, with regularization controlled through subsampling and tree-depth constraints. All models were evaluated using an 80/20 train-test split to ensure consistent out-of-sample performance comparison. As shown in Section 5.4, XGBoost-based MMM achieves substantially higher predictive accuracy than the linear baseline.
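The configuration above can be written as a parameter dictionary together with an 80/20 split, as sketched below. Two points are assumptions: the seed value is hypothetical because the text gives only `random_state = seed`, and the split is implemented as a random shuffle because the text does not say whether the split is random or chronological.

```python
import random

SEED = 42  # hypothetical value; the text only states random_state = seed

# Keyword arguments as listed in the text. They match parameter names of
# xgboost.XGBRegressor, so they could be passed as XGBRegressor(**XGB_PARAMS).
XGB_PARAMS = {
    "n_estimators": 50,
    "max_depth": 6,
    "learning_rate": 0.1,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "objective": "reg:squarederror",
    "n_jobs": 1,
    "verbosity": 0,
    "random_state": SEED,
}


def split_80_20(rows, seed=SEED):
    """Shuffle row indices reproducibly and return (train, test) with an 80/20 ratio."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = len(rows) * 4 // 5  # integer arithmetic avoids float rounding
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


days = list(range(100))  # stand-in for daily observations
train_days, test_days = split_80_20(days)
```

For time-ordered data, holding out the most recent 20% of days is a common alternative that avoids training on observations later than those being predicted.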

5.4. Model Performance Comparison

Table 1 compares the predictive accuracy of the linear regression-based MMM and the XGBoost-based MMM model on the test dataset using four standard evaluation metrics: R², RMSE, MAE, and MAPE. The XGBoost model achieves a substantially higher R² (0.9123 vs. 0.8572) and markedly lower errors, reducing RMSE by 21.7%, MAE by 22.8%, and MAPE by 33.1%. These improvements indicate that nonlinear effects and interaction patterns missed by the linear baseline are effectively captured by the boosting model.
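The relative reductions can be recomputed from the rounded entries in Table 1, as in the short sketch below. With these rounded inputs the MAPE reduction comes out at about 33.0%, marginally below the reported 33.1%; the difference is presumably due to rounding of the tabulated values.

```python
# Test-set errors as printed in Table 1 (rounded values).
TABLE_1 = {
    "linear": {"rmse": 2222.12, "mae": 1590.11, "mape": 27.37},
    "xgboost": {"rmse": 1741.03, "mae": 1228.14, "mape": 18.33},
}


def pct_reduction(baseline, improved):
    """Relative reduction of an error metric, in percent of the baseline value."""
    return 100 * (baseline - improved) / baseline


reductions = {
    metric: round(pct_reduction(TABLE_1["linear"][metric], TABLE_1["xgboost"][metric]), 1)
    for metric in ("rmse", "mae", "mape")
}
```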
To further evaluate the predictive reliability of the XGBoost-based model, Figure 4 presents the Predicted vs. Actual plot and the residual distribution. In Figure 4a, the predicted values align closely with the 45-degree reference line, demonstrating strong predictive fidelity across the full revenue range. Figure 4b plots the residuals against predicted values. The residuals are centered around zero with no visible systematic pattern, indicating that the XGBoost-based model effectively mitigates the heteroskedasticity and nonlinear biases present in the linear baseline. Overall, these results support the robustness and stability of the XGBoost-based MMM model in predicting revenue from multi-channel digital marketing expenditures.

5.5. Explainable AI Analysis: SHAP

Next, SHAP was employed to quantify and visualize the marginal contributions of each marketing channel to revenue, providing a transparent decomposition of the XGBoost-based MMM model results. Figure 5a–c presents the SHAP-based explainability analysis:
The Feature Importance Plot (Figure 5a) ranks variables by their mean absolute SHAP values. Meta Instagram Spend, Google PMax Spend, and Meta Facebook Spend are the most influential predictors and jointly account for the largest share of total SHAP attribution, indicating that social media and integrated Google campaigns play dominant roles in driving sales.
The Beeswarm Plot (Figure 5b) further illustrates the direction and magnitude of each feature’s impact on predicted revenue. Higher spending levels (shown in red) for Meta Instagram, Google PMax, and Meta Facebook are consistently associated with strong positive SHAP values, confirming their revenue-generating roles. In contrast, Google Display and Google Video exhibit predominantly negative SHAP values regardless of spend, suggesting inefficiency or saturation at very low spend levels. TikTok shows minimal yet slightly positive contributions, consistent with its small share in the marketing mix.
Finally, the Dependence Plots (Figure 5c) visualize nonlinear relationships between marketing spending and revenue predictions across all channels. Meta Instagram, Google PMax, and Google Paid Search show strong positive returns. Meta Facebook delivers moderate gains with noticeable saturation at higher spend levels. Google Shopping, Google Display, Google Video, and Meta Other exhibit weak or limited short-term impact. TikTok shows minimal spend and small but positive effects, suggesting underinvestment rather than low effectiveness.
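The ranking used in a feature-importance bar plot is the mean absolute SHAP value per feature. A minimal sketch of that aggregation follows; the SHAP matrix is hypothetical and does not reproduce the study's values.

```python
def global_importance(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across observations (largest first)."""
    n = len(shap_rows)
    mean_abs = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(mean_abs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-day SHAP values, one column per channel (illustrative only).
feature_names = ["META_INSTAGRAM_SPEND", "GOOGLE_PMAX_SPEND", "TIKTOK_SPEND"]
shap_rows = [
    [120.0, -30.0, 5.0],
    [-80.0, 40.0, -2.0],
    [200.0, 10.0, 1.0],
]
ranking = global_importance(shap_rows, feature_names)
ranked_names = [name for name, _ in ranking]
```

Taking absolute values before averaging means a channel with large positive and large negative contributions still ranks as influential, which is why the beeswarm plot is needed to read direction.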

5.6. LLM-Based Summary and Recommendation

This section reports the key outputs generated by the LLM after synthesizing the XGBoost-based MMM predictions and SHAP plots using the prompt described in Section 3.3.
The automated Executive Summary highlights the strong predictive performance of the model (R² = 0.9123) and emphasizes a notable mismatch between current spending allocation and revenue impact efficiency. It identifies Meta Instagram and Google PMax as the dominant contributors, while also noting saturation in Meta Facebook and Google Paid Search, negative returns from Google Display and Video, and substantial underinvestment in TikTok.
The Key Insights further detail channel effectiveness: (i) Instagram yields the highest SHAP impact with sustained positive returns; (ii) PMax maintains stable efficiency across its spend range; (iii) Facebook and Paid Search experience diminishing returns at higher levels; (iv) Display and Video depress revenue; and (v) TikTok offers the strongest growth potential given minimal current investment.
The Business Recommendations consolidate these findings into actionable budget guidance. The LLM suggests increasing investment in Instagram (to $1,500–2,000 daily) and TikTok (to $85–150 daily or a $2,000–3,000 monthly testing budget), modestly raising Paid Search (to $500–650 daily), and fully eliminating Display and Video. It also recommends maintaining PMax at its current level ($779 daily) and capping Meta Facebook at roughly $2,000 daily to avoid inefficiency. According to the LLM's estimate, this model-inferred reallocation would increase overall marketing ROI by 18–25% and potentially generate an additional $50,000–75,000 in monthly revenue without increasing total spend.

6. Results & Discussion

The proposed AI-enhanced marketing mix modeling (MMM) framework demonstrates clear improvements in accuracy, interpretability, and practical business utility. Compared with the traditional OLS regression baseline, the XGBoost-based MMM model achieves substantially higher predictive precision (R² = 0.9123) and reduces RMSE and MAE by 21.7% and 22.8%, respectively. These gains confirm the model’s ability to capture nonlinear response patterns and cross-channel interactions that linear models typically overlook. By integrating SHAP explainable AI, the framework further quantifies each channel’s marginal contribution to revenue, revealing that Meta Instagram, Google PMax, and Meta Facebook generate the strongest positive impacts, whereas channels such as Google Display, Google Video, and TikTok exhibit limited short-term effectiveness under current spending levels. The addition of the Claude LLM further enhances managerial usability by converting technical model outputs into clear natural-language insights, enabling non-technical stakeholders to interpret analytical results and adjust channel allocations accordingly. The recommended reallocation strategy is expected to improve overall marketing ROI by approximately 18-25%.
To better demonstrate the practical value of the proposed framework, consider a hypothetical case study of an e-commerce company that has been using the traditional OLS-based MMM for several years. The company has been allocating its budget primarily to Google Search and Meta Facebook, with less focus on emerging channels like TikTok and Google PMax. After implementing the AI-enhanced MMM framework, which incorporates XGBoost, SHAP, and the Claude LLM, the company identifies several key insights that would have been difficult to uncover using traditional methods. For example, the XGBoost model reveals that while Meta Facebook is still a key driver of sales, its returns diminish at higher spending levels, suggesting that the company is overinvesting in this channel. On the other hand, the SHAP analysis indicates that Google PMax and TikTok, which had been underutilized, are showing significant positive returns with lower levels of investment. The LLM-generated report recommends shifting 20% of the budget from Meta Facebook to Google PMax and TikTok. In this hypothetical scenario, the company observes an 18–25% improvement in its marketing ROI over a three-month period after implementing these recommendations, in line with the improvement estimated in Section 5.6. The scenario illustrates the framework's potential to deliver tangible improvements in marketing performance by helping businesses make data-driven decisions that maximize the impact of their marketing spend.
Although the empirical analysis is demonstrated using a single brand, the proposed framework is designed to be broadly applicable across industries and organizational contexts. It offers substantial practical value for a wide range of firms, particularly small and medium-sized businesses (SMBs) with limited analytical resources. By automating model interpretation and transforming advanced ML/AI outputs into accessible insights, the system significantly lowers the barrier to adopting data-driven decision-making. This accessibility enables non-technical users to make more informed budget decisions, reduce inefficiencies, and enhance marketing performance at lower analytical cost. Moreover, as a scalable and domain-agnostic solution, the framework can support SMBs across diverse sectors, contributing to the broader democratization of advanced marketing analytics, accelerating digital transformation, and enhancing their competitiveness and sustainable growth.
However, while the framework is designed to democratize advanced marketing analytics, it is important to acknowledge potential challenges in deploying this technology, particularly for organizations with limited resources or technical expertise. Smaller businesses, for example, may struggle with data collection and integration due to insufficient infrastructure or the lack of a centralized data repository. Moreover, implementing a machine learning-based MMM framework requires a certain level of computational resources, which may not be readily available to all companies, especially those with tight budgets.
To mitigate these challenges, the framework can be further enhanced by developing user-friendly interfaces, comprehensive tutorials, and support systems that guide users through the process of setting up, interpreting, and acting upon the model’s recommendations. Furthermore, providing modular solutions or offering the framework as a cloud-based service could help reduce the technological burden on small businesses, enabling them to access the full power of the system without requiring significant upfront investment in infrastructure. Moreover, organizations with limited resources could benefit from scaled-down versions of the framework that focus on fewer channels or simpler models, making the technology more accessible without sacrificing its core benefits. These adaptations would allow smaller firms to gradually adopt more sophisticated marketing analytics as their capabilities and resources grow.

7. Conclusion & Future Research

This study presents an AI-enhanced marketing mix modeling (MMM) framework that integrates machine learning-based predictive accuracy with explainable AI interpretability and LLM-enabled managerial usability. By combining XGBoost-based MMM with SHAP and a Claude-based LLM module, the framework enhances modeling precision, transparency, and business applicability, making data-driven marketing insights more accessible to non-technical users. Future research will extend this approach by incorporating causal inference techniques (e.g., geo-experiments and Bayesian structural time-series models) to further strengthen real-world robustness and attribution accuracy. In addition, advancing the framework toward real-time analytics and adaptive budget optimization will improve its ability to respond to dynamic market conditions. Ultimately, the long-term objective is to develop a scalable SaaS platform that enables small and medium-sized enterprises (SMEs) to leverage AI-driven marketing intelligence, improve ROI, and accelerate their digital transformation.

Data Availability Statement

The dataset used in this study is publicly available at Figshare: https://figshare.com/articles/dataset/25314841.

References

  1. Hanssens, D. M., Parsons, L. J., & Schultz, R. L. (2001). Market response models: Econometric and time series analysis (2nd ed.). Springer Science & Business Media, New York, NY, USA.
  2. Runge, J., Skokan, I., Zhou, G., & Pauwels, K. (2024). Packaging Up Media Mix Modeling: An Introduction to Robyn’s Open-Source Approach. arXiv. https://arxiv.org/pdf/2403.14674.
  3. Dew, R., Padilla, N., & Shchetkina, A. (2024). Your MMM is broken: Identification of nonlinear and time-varying effects in marketing mix models. arXiv.
  4. Berlilana, B., Hariguna, T., & El Emary, I. M. M. (2025). Enhancing digital marketing strategies with machine learning for analyzing key drivers of online advertising performance. Journal of Applied Data Sciences, 6(2), 1037–1046.
  5. Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215.
  6. Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS '17) (pp. 4765–4774). Curran Associates Inc., Red Hook, NY, USA. https://dl.acm.org/doi/10.5555/3295222.3295230.
  7. Bastos, J. A., & Bernardes, M. I. (2024). Understanding online purchases with explainable artificial intelligence. Information, 15(10), Article 587.
  8. de Haan, E. (2022). Attribution Modeling. In C. Homburg, M. Klarmann, & A. Vomberg (Eds.), Handbook of Market Research. Springer, Cham.
  9. Pérez, A. S., Boukhary, A., Papotti, P., Castejón Lozano, L., & Elwood, A. (2025). An LLM-based approach for insight generation in data analysis. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) (pp. 562–582).
  10. Zytek, Z. A., Pido, S., Alnegheimish, S., Berti-Équille, L., & Veeramachaneni, K. (2024). Explingo: Explaining AI predictions using large language models. In Proceedings of the 2024 IEEE International Conference on Big Data (BigData) (pp. 1197–1208). IEEE.
  11. Anderson, A. (2024). Multi-region marketing mix modelling (MMM) dataset for several eCommerce brands. figshare. Dataset.
Figure 1. Architecture of AI-Enhanced MMM System.
Figure 2. Weekly Trend of Total Marketing Expenditure and Revenue. 
Figure 3. Correlation Heatmap (Revenue vs Marketing spend). 
Figure 4. XGBoost-based Model Evaluation: (a) Predictions vs. Actual Values; (b) Residual Distribution. 
Figure 5. SHAP-Based Explainability Analysis: (a) Feature Importance Bar Plot; (b) Beeswarm Summary Plot; (c) Dependence Plots Across Marketing Channels. 
Table 1. Model Performance Comparison.
| Model | R² | RMSE | MAE | MAPE |
| --- | --- | --- | --- | --- |
| Linear Regression-based MMM | 0.8572 | 2222.12 | 1590.11 | 27.37% |
| XGBoost-based MMM | 0.9123 | 1741.03 | 1228.14 | 18.33% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.