Submitted: 06 July 2025
Posted: 07 July 2025
Abstract
Keywords:
Chapter 1: Introduction
1.1. Background
1.2. Problem Statement
1.3. Research Objectives
- To develop a meta-learned attention-based DeepFM model that captures both low-order and high-order interactions among features, specifically tailored for product usage forecasting in e-commerce settings.
- To evaluate the effectiveness of the proposed model against traditional DeepFM and other state-of-the-art predictive models, utilizing real-world datasets characterized by sparsity.
- To investigate the interpretability of the model through attention scores, providing insights into the features that significantly influence predictions and enhancing stakeholder understanding.
- To explore the implications of the findings for practitioners in the e-commerce domain, particularly in optimizing marketing strategies and improving user engagement.
1.4. Significance of the Study
1.5. Research Questions
- How does the integration of attention mechanisms within the DeepFM framework enhance predictive accuracy in product usage forecasting compared to traditional models?
- In what ways do meta-learning techniques improve the adaptability and efficiency of the attention-based DeepFM model in handling diverse datasets?
- What insights can be gained from the attention mechanisms regarding feature relevance, and how do these insights contribute to the interpretability of the model’s predictions?
- What are the practical implications of the proposed model for e-commerce practitioners in optimizing marketing strategies and enhancing user engagement?
1.6. Structure of the Thesis
- Chapter 2: Literature Review: This chapter provides a comprehensive overview of existing research related to predictive modeling, attention mechanisms, DeepFM, and meta-learning. It highlights the strengths and limitations of current approaches and identifies gaps in the literature that this study aims to address.
- Chapter 3: Methodology: This chapter outlines the methodological framework employed in the development of the meta-learned attention-based DeepFM model. It details the data collection process, model architecture, training procedures, and evaluation metrics.
- Chapter 4: Results: This chapter presents the results of the experimental evaluation, comparing the performance of the proposed model to traditional DeepFM implementations and other state-of-the-art predictive models. It includes a detailed analysis of predictive accuracy, precision, recall, and interpretability.
- Chapter 5: Discussion: This chapter discusses the implications of the findings, addressing the significance of attention mechanisms and meta-learning in enhancing predictive performance. It explores the limitations of the study and proposes future research directions.
- Chapter 6: Conclusion: This final chapter summarizes the key findings of the research, reiterates its contributions to the field, and outlines recommendations for practitioners and future research avenues.
1.7. Conclusion
Chapter 2: Literature Review
2.1. Introduction
2.2. Product Usage Forecasting
2.2.1. Definition and Importance
2.2.2. Challenges in Product Usage Forecasting
- Sparsity of Data: Many product usage datasets are high-dimensional with a limited number of observations, leading to issues such as overfitting and difficulties in generalization.
- Dynamic Consumer Behavior: Consumer preferences can change rapidly due to various factors, including market trends, seasonality, and promotional campaigns, complicating the forecasting task.
- Complex Feature Interactions: Identifying and modeling the interactions between different features—such as user demographics, product characteristics, and contextual information—remains a significant challenge.
2.3. Deep Learning and Factorization Machines
2.3.1. Overview of Deep Learning
2.3.2. Factorization Machines
2.3.3. Deep Factorization Machines (DeepFM)
- Factorization Machine Component: This component models low-order interactions using matrix factorization techniques, effectively handling sparsity in the data.
- Deep Learning Component: This component consists of multiple fully connected layers that learn high-order interactions, enhancing the model’s ability to capture complex relationships in user behavior (Wang et al., 2017).
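To make the low-order component concrete, the FM pairwise-interaction term can be computed in linear time via the square-of-sum minus sum-of-squares identity. The sketch below is illustrative only; dimensions and variable names are assumptions, not taken from the study.

```python
import numpy as np

def fm_second_order(x, V):
    """Second-order FM interaction term sum_{i<j} <v_i, v_j> x_i x_j,
    computed in O(n*k) via the square-of-sum minus sum-of-squares identity."""
    xv = x @ V                    # shape (k,): sum_i x_i * v_i
    x2v2 = (x ** 2) @ (V ** 2)    # shape (k,): sum_i x_i^2 * v_i^2
    return 0.5 * np.sum(xv ** 2 - x2v2)

# Tiny check against the naive O(n^2) double sum
rng = np.random.default_rng(0)
x = rng.normal(size=5)            # 5 features
V = rng.normal(size=(5, 3))       # k = 3 latent factors per feature
naive = sum(V[i] @ V[j] * x[i] * x[j]
            for i in range(5) for j in range(i + 1, 5))
```

Because only the k-dimensional sums are materialized, this trick is what lets the FM component scale to the high-dimensional sparse inputs described above.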
2.4. Attention Mechanisms
2.4.1. The Concept of Attention
2.4.2. Types of Attention Mechanisms
- Soft Attention: This mechanism assigns continuous weights to all input features, enabling the model to consider multiple elements simultaneously.
- Hard Attention: This approach makes discrete, all-or-nothing selections, attending to a specific subset of features rather than weighting them all.
- Self-Attention: This mechanism computes attention scores within the same input sequence, capturing long-range dependencies and contextual relationships.
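The distinction between these mechanisms can be illustrated with soft attention, the variant most relevant to the later chapters: continuous weights over all elements, obtained by a softmax over learned scores. The shapes and the query vector below are hypothetical; hard attention would instead take an argmax over the scores, and self-attention would derive the queries from the inputs themselves.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax producing continuous weights that sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()

def soft_attention(H, q):
    """Soft attention: score every row of H against a query q,
    normalize the scores, and return the weighted sum (context vector)."""
    scores = H @ q                # one score per input element
    weights = softmax(scores)     # continuous weights over ALL elements
    return weights @ H, weights

rng = np.random.default_rng(1)
H = rng.normal(size=(4, 8))       # 4 input elements, 8-dim embeddings
q = rng.normal(size=8)            # illustrative query vector
context, w = soft_attention(H, q)
```

Replacing `softmax(scores)` with a one-hot vector at `scores.argmax()` turns this into hard attention, which is why hard attention is described as binary decision-making.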
2.4.3. Application of Attention in Deep Learning
2.5. Meta-Learning
2.5.1. Overview of Meta-Learning
2.5.2. Curriculum Learning
2.5.3. Application of Meta-Learning in Forecasting
2.6. Integration of Meta-Learning and Attention in DeepFM
2.6.1. Synergistic Benefits
2.6.2. Empirical Evidence
2.7. Gaps in the Literature
- Limited Exploration of Combined Approaches: While meta-learning and attention mechanisms have been studied independently, their integration within predictive modeling frameworks for product usage forecasting remains underexplored.
- Empirical Validation: There is a lack of empirical studies validating the effectiveness of the integrated approach across diverse real-world applications.
- Interpretability Challenges: Although attention mechanisms enhance interpretability, further research is needed to elucidate the model’s predictions and the influence of curriculum learning on feature importance.
2.8. Conclusion
Chapter 3: Methodology
3.1. Introduction
3.2. Model Architecture
3.2.1. Overview of DeepFM
- Factorization Machine Component: This component captures pairwise interactions between features using matrix factorization, which is particularly effective in dealing with high-dimensional, sparse datasets. The factorization machine can model interactions without requiring a large amount of data, making it ideal for scenarios where user-item interactions are limited.
- Deep Learning Component: The deep learning component comprises several fully connected layers that learn complex, nonlinear relationships among features. By leveraging deep learning, DeepFM enhances its capacity to uncover intricate patterns in product usage, which are often not captured by traditional models.
3.2.2. Integration of Attention Mechanisms
- Dynamic Feature Weighting: The attention mechanism assigns varying importance to different features based on their relevance to the prediction task. This dynamic weighting allows the model to focus on significant predictors, effectively filtering out irrelevant noise that can degrade performance.
- Contextual Adaptation: By incorporating attention layers, the model can adapt its predictions based on user-specific contexts, such as demographic information and historical interactions. This contextual adaptability enhances the model’s ability to make accurate predictions in diverse scenarios.
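A single forward pass of this attention-weighted DeepFM idea can be sketched as follows. This is a minimal toy version under assumed shapes (dense inputs, one attention scoring vector, one hidden layer), not the authors' exact architecture: attention re-weights the per-feature embeddings before both the FM term and the deep component consume them.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attentive_deepfm_forward(x, params):
    """Toy forward pass: linear term + attention-weighted FM term
    + attention-weighted deep term, squashed to a probability."""
    V = params["V"]                       # (n, k) per-feature embeddings
    E = x[:, None] * V                    # value-scaled embeddings
    w_att = softmax(E @ params["a"])      # (n,) dynamic feature weights
    Ew = w_att[:, None] * E               # attention-weighted embeddings

    linear = params["w0"] + x @ params["w"]
    fm = 0.5 * np.sum(Ew.sum(0) ** 2 - (Ew ** 2).sum(0))   # low-order
    h = np.maximum(Ew.reshape(-1) @ params["W1"] + params["b1"], 0.0)
    deep = h @ params["W2"] + params["b2"]                  # high-order
    logit = linear + fm + deep
    return 1.0 / (1.0 + np.exp(-logit)), w_att

n, k, hdim = 6, 4, 8
rng = np.random.default_rng(2)
params = {"V": rng.normal(size=(n, k)), "a": rng.normal(size=k),
          "w0": 0.0, "w": rng.normal(size=n),
          "W1": rng.normal(size=(n * k, hdim)), "b1": np.zeros(hdim),
          "W2": rng.normal(size=hdim), "b2": 0.0}
p, w_att = attentive_deepfm_forward(rng.normal(size=n), params)
```

The returned `w_att` is the per-prediction feature-importance vector that later chapters use for interpretability; contextual adaptation would concatenate user-context features into `x` so that the scores shift per user.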
3.3. Curriculum Meta-Learning Framework
3.3.1. Concept of Curriculum Learning
3.3.2. Implementation of Curriculum Learning
- Task Design: The tasks are designed based on the complexity of the underlying data. Initial tasks consist of simpler patterns with abundant examples, while subsequent tasks introduce more complex interactions that are less frequent in the dataset.
- Progressive Training: The model is trained iteratively, starting with the simplest tasks and gradually advancing to more complex ones. This progression allows the framework to develop a robust understanding of feature interactions before encountering more intricate patterns.
- Performance Monitoring: During training, the model’s performance is continuously monitored to determine when it is ready to transition to more complex tasks. This adaptive learning process enhances the model’s ability to generalize across diverse scenarios.
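The three steps above (task design, progressive training, performance monitoring) can be sketched as a simple control loop. The threshold, the toy model, and the difficulty scores are illustrative assumptions; the point is only the scheduling logic of advancing once the current task is mastered.

```python
def curriculum_train(model_step, eval_score, tasks, threshold=0.8, max_epochs=60):
    """Curriculum loop: sort tasks easy -> hard, and only advance to the
    next task once the monitored score clears a readiness threshold
    (or the per-task epoch budget runs out)."""
    history = []
    for task in sorted(tasks, key=lambda t: t["difficulty"]):
        for epoch in range(max_epochs):
            model_step(task)                   # one training step on this task
            score = eval_score(task)
            if score >= threshold:             # ready: move to a harder task
                break
        history.append((task["name"], epoch + 1, score))
    return history

# Hypothetical stand-in model whose skill grows more slowly on harder tasks
state = {"skill": 0.0}
def model_step(task):
    state["skill"] += 0.1 / task["difficulty"]
def eval_score(task):
    return min(1.0, state["skill"] / task["difficulty"])

tasks = [{"name": "complex", "difficulty": 3.0},
         {"name": "simple", "difficulty": 1.0}]
log = curriculum_train(model_step, eval_score, tasks)
```

Note that the `sorted` call enforces the task design ordering, and the inner `break` is the performance-monitoring transition described above.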
3.4. Data Collection
3.4.1. Dataset Selection
- E-Commerce Transaction Dataset: This dataset contains user transaction records from an online retail platform, capturing user interactions, product views, and purchase behavior.
- User Engagement Metrics: This dataset comprises user engagement logs, including metrics such as clicks, views, and shares, which provide insights into user preferences and behavior.
- Product Review Dataset: This dataset includes customer reviews and ratings for various products, offering valuable information about user sentiments and preferences.
3.4.2. Data Preprocessing
- Data Cleaning: Missing values are addressed using imputation techniques, while outliers are identified and removed to maintain data integrity.
- Feature Engineering: Relevant features are extracted and transformed. Categorical variables are encoded using techniques such as one-hot encoding or embeddings, while continuous variables are normalized to ensure consistent scaling.
- Handling Sparsity: To mitigate the effects of high dimensionality, techniques such as dimensionality reduction (e.g., Principal Component Analysis) may be employed where appropriate.
- Train-Test Split: Each dataset is divided into training and testing subsets, typically with an 80-20 split, to facilitate robust evaluation of model performance.
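The pipeline above can be sketched end to end in a few lines: mean imputation, min-max normalization, one-hot encoding, and a shuffled 80-20 split. The data, column layout, and seed below are illustrative assumptions only.

```python
import numpy as np

def preprocess_and_split(X_num, cats, seed=0, test_frac=0.2):
    """Minimal pipeline: impute NaNs with column means, min-max scale
    numeric columns, one-hot encode a categorical column, then do a
    shuffled 80-20 train-test split."""
    X = X_num.astype(float).copy()
    col_mean = np.nanmean(X, axis=0)
    r, c = np.where(np.isnan(X))
    X[r, c] = col_mean[c]                                # imputation
    span = X.max(0) - X.min(0)
    X = (X - X.min(0)) / np.where(span == 0, 1, span)    # normalization
    levels = sorted(set(cats))
    onehot = np.array([[c == lv for lv in levels] for c in cats], float)
    X = np.hstack([X, onehot])                           # final feature matrix
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = int(len(X) * (1 - test_frac))
    return X[idx[:cut]], X[idx[cut:]]

X_num = np.array([[1.0, np.nan], [2.0, 4.0], [3.0, 6.0],
                  [4.0, 8.0], [5.0, 10.0]])
cats = ["a", "b", "a", "c", "b"]
train, test = preprocess_and_split(X_num, cats)
```

In practice, embeddings would replace one-hot encoding for high-cardinality categoricals, and scaling statistics should be fit on the training split only; the sketch keeps both simple for readability.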
3.5. Experimental Design
3.5.1. Environment Configuration
3.5.2. Baseline Models
- Traditional DeepFM: The standard version of DeepFM without attention mechanisms serves as the primary benchmark.
- Factorization Machines (FM): This simpler model captures only low-order interactions, providing a comparison against the more complex architectures.
- Collaborative Filtering Models: User-based and item-based collaborative filtering models are included to provide a comprehensive performance comparison.
3.5.3. Evaluation Metrics
- Accuracy: The proportion of correct predictions relative to the total predictions made by the model.
- Precision and Recall: These metrics evaluate the model’s ability to correctly identify positive instances, particularly important in imbalanced datasets.
- F1-Score: The harmonic mean of precision and recall, providing a single measure that balances both concerns.
- Mean Absolute Error (MAE) and Root Mean Square Error (RMSE): These metrics are utilized for regression tasks to measure the average prediction error.
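These definitions translate directly into code. The sketch below implements the listed metrics from scratch on hypothetical label vectors, so the formulas are explicit rather than hidden behind a library call.

```python
import math

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

def regression_errors(y_true, y_pred):
    """MAE and RMSE for continuous predictions."""
    diffs = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse

acc, prec, rec, f1 = classification_metrics([1, 0, 1, 1], [1, 0, 0, 1])
mae, rmse = regression_errors([3.0, 5.0], [2.0, 7.0])
```

On imbalanced usage data, accuracy alone is misleading (a model predicting "no usage" everywhere can score high), which is why precision, recall, and F1 accompany it here.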
3.6. Implementation of Attention Mechanisms
3.6.1. Feature Attention Layer
- Attention Layer: An attention layer is added to the DeepFM architecture, allowing the model to compute attention scores for each feature based on learned representations.
- Weight Assignment: The attention scores are normalized to yield weights that reflect the importance of each feature, enabling the model to prioritize significant predictors during training and prediction.
3.6.2. Contextual Attention Mechanism
- Contextual Inputs: User-specific contextual features, such as demographic information and historical behavior, are incorporated into the model.
- Attention Calculation: The model calculates attention scores based on these contextual features, dynamically adjusting the contribution of each user context to the final predictions.
3.7. Validation Process
3.7.1. Cross-Validation
3.7.2. Statistical Significance Testing
3.8. Conclusion
Chapter 4: Methodology
4.1. Introduction
4.2. Research Design
4.2.1. Type of Study
4.2.2. Research Questions
- How does the integration of meta-learning enhance the adaptability and performance of the attention-based DeepFM framework in predicting product usage?
- In what ways do attention mechanisms improve the model’s ability to focus on relevant features in high-dimensional datasets?
- What is the impact of the proposed framework on predictive accuracy compared to conventional forecasting methods?
4.3. Data Collection
4.3.1. Dataset Selection
- E-Commerce Transaction Dataset: This dataset includes user transaction records from an online retail platform, capturing user interactions, product views, and purchase behavior.
- Consumer Electronics Usage Dataset: This dataset contains user logs for various consumer electronics, detailing usage patterns and engagement metrics.
- App Usage Dataset: This dataset provides insights into user engagement with mobile applications, including session lengths, frequency of use, and user demographics.
4.3.2. Data Preprocessing
- Data Cleaning: Missing values were addressed through imputation techniques, and outliers were identified and removed to maintain data integrity.
- Feature Engineering: Relevant features were extracted and transformed from the raw data. Categorical variables were encoded using techniques such as one-hot encoding or embeddings, while continuous variables were normalized to ensure consistent scaling.
- Handling Sparse Data: Given the nature of product usage datasets, techniques such as dimensionality reduction (e.g., Principal Component Analysis) were employed to mitigate the effects of high dimensionality on model performance.
- Train-Test Split: Each dataset was divided into training and testing subsets, typically following an 80-20 split, to facilitate robust evaluation of model performance.
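The dimensionality-reduction step mentioned above can be sketched with PCA via an SVD of the mean-centered data matrix. The synthetic rank-2 data below is an illustrative assumption used only to show that two components suffice when the features are linearly redundant.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components, obtained from the
    SVD of the mean-centered data matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T       # scores in the reduced space

rng = np.random.default_rng(3)
base = rng.normal(size=(100, 2))                        # 2 latent factors
X = np.hstack([base, base @ rng.normal(size=(2, 8))])   # 10-D but rank 2
Z = pca_reduce(X, 2)
```

Because every column of `X` is a linear combination of the two latent factors, the third and later singular values are numerically zero, and the 2-D projection preserves essentially all variance.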
4.4. Model Architecture
4.4.1. Overview of DeepFM
- Factorization Machine Component: This component models pairwise interactions between features using matrix factorization techniques, effectively handling sparsity in high-dimensional datasets.
- Deep Learning Component: The deep learning layer consists of multiple fully connected layers that learn complex, nonlinear interactions among features, enhancing the model’s capability to capture intricate patterns in user behavior.
4.4.2. Integration of Attention Mechanisms
- Attention Layer: An attention layer is added to the DeepFM architecture, allowing the model to compute attention scores for each feature based on its relevance to the prediction task. This feature weighting enables the model to prioritize significant predictors while minimizing the impact of irrelevant noise.
- Contextual Attention: The model incorporates contextual features, such as demographic information and historical usage patterns, which allow for personalized predictions. The attention mechanism adjusts based on these contextual inputs, enhancing prediction accuracy.
4.4.3. Implementation of Meta-Learning
- Curriculum Learning: The model is trained using a curriculum learning approach, where simpler tasks are presented first, gradually progressing to more complex tasks. This structured learning process enables the model to build foundational knowledge that enhances its ability to generalize.
- Meta-Training and Meta-Testing: The meta-training phase involves training the model on a variety of tasks to enable it to learn how to learn. During the meta-testing phase, the model’s adaptability to unseen tasks is evaluated, providing insights into its generalization capabilities.
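The meta-training idea of "learning to learn" a shared initialization can be sketched with a first-order, Reptile-style loop on toy 1-D regression tasks. This is a deliberately simplified stand-in under assumed settings (tasks of the form y = a·x, a scalar weight), not the study's actual meta-learning algorithm: each meta-iteration adapts a copy of the shared initialization to one sampled task, then nudges the initialization toward the adapted weights.

```python
import numpy as np

def reptile_meta_train(task_slopes, inner_steps=5, lr=0.05, meta_lr=0.5,
                       meta_iters=200, seed=0):
    """First-order meta-learning (Reptile-style) on toy linear tasks y = a*x.
    Returns an initialization w0 positioned so that a few gradient steps
    adapt it quickly to any task in the family."""
    rng = np.random.default_rng(seed)
    w0 = 0.0                                    # shared meta-initialization
    for _ in range(meta_iters):
        a = rng.choice(task_slopes)             # sample a meta-training task
        x = rng.normal(size=20)                 # task-specific data
        w = w0
        for _ in range(inner_steps):            # inner-loop adaptation
            grad = np.mean(2 * (w * x - a * x) * x)
            w -= lr * grad
        w0 += meta_lr * (w - w0)                # meta-update toward adapted w
    return w0

w0 = reptile_meta_train([1.0, 3.0])
```

Meta-testing then corresponds to running only the inner loop from `w0` on an unseen task (e.g., an unobserved slope) and measuring how quickly the loss drops, which is the adaptability evaluated in this phase.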
4.5. Experimental Design
4.5.1. Environment Configuration
4.5.2. Baseline Models
- Traditional DeepFM: The standard version of DeepFM without attention mechanisms serves as a primary benchmark.
- Factorization Machines (FM): This simpler model captures only low-order interactions, providing a comparison against the more complex architectures.
- Other State-of-the-Art Models: Models such as Gradient Boosting Machines (GBM) and neural collaborative filtering are included to provide a comprehensive performance comparison.
4.5.3. Evaluation Metrics
- Accuracy: The proportion of correct predictions relative to the total predictions made by the model.
- Precision and Recall: These metrics evaluate the model’s ability to correctly identify positive instances, particularly important in imbalanced datasets.
- F1-Score: The harmonic mean of precision and recall, providing a single measure that balances both concerns.
- Mean Absolute Error (MAE) and Root Mean Square Error (RMSE): Metrics utilized for regression tasks to measure the average prediction error.
4.6. Validation Process
4.6.1. Cross-Validation
4.6.2. Statistical Significance Testing
4.7. Implementation
4.7.1. Software and Tools
4.7.2. Experimental Setup
4.8. Limitations of the Methodology
- Dataset Constraints: The reliance on specific datasets may limit the generalizability of the findings. Future studies should validate the proposed model across a broader range of datasets to ensure robustness.
- Complexity of Implementation: The integration of meta-learning and attention mechanisms, while advantageous, introduces complexity in the model’s implementation. Organizations with limited computational resources may find it challenging to deploy such advanced models effectively.
- Dependence on Data Quality: The success of the meta-learned attention-based DeepFM model is inherently tied to the quality of input data. Poor-quality data can lead to suboptimal performance, regardless of the sophistication of the model. Organizations must prioritize data governance and quality assurance processes to mitigate this limitation.
4.9. Conclusion
Chapter 5: Discussion and Implications
5.1. Introduction
5.2. Summary of Key Findings
5.2.1. Enhanced Predictive Accuracy
5.2.2. Adaptability Through Meta-Learning
5.2.3. Interpretability and Actionable Insights
5.3. Implications for Practice
5.3.1. Strategic Adoption of Advanced Predictive Models
5.3.2. Focus on Data-Driven Decision-Making
5.3.3. Continuous Learning and Adaptation
5.3.4. Interdisciplinary Collaboration
5.4. Limitations of the Study
5.4.1. Dataset Limitations
5.4.2. Complexity of Implementation
5.4.3. Dependence on Data Quality
5.5. Future Research Directions
5.5.1. Examination of Hybrid Models
5.5.2. Enhancing Interpretability Techniques
5.5.3. Real-World Applications
5.5.4. Addressing Ethical Considerations
5.6. Conclusion
Chapter 6: Conclusion and Future Directions
6.1. Summary of Findings
6.1.1. Meta-Learning for Adaptability
6.1.2. Attention Mechanisms for Feature Relevance
6.1.3. Empirical Validation
6.2. Implications for Practice
6.2.1. Implementation of Advanced Predictive Models
6.2.2. Focus on Interpretability
6.2.3. Continuous Learning Framework
6.2.4. Interdisciplinary Collaboration
6.3. Limitations of the Study
6.3.1. Dataset Limitations
6.3.2. Complexity of Implementation
6.3.3. Dependence on Data Quality
6.4. Future Research Directions
6.4.1. Exploration of Hybrid Models
6.4.2. Enhancing Interpretability Techniques
6.4.3. Real-World Applications
6.4.4. Addressing Ethical Considerations
6.5. Conclusion
References
- Huang, S., Xi, K., Bi, X., Fan, Y., & Shi, G. (2024, November). Hybrid DeepFM Model with Attention and Meta-Learning for Enhanced Product Usage Prediction. In 2024 4th International Conference on Digital Society and Intelligent Systems (DSInS) (pp. 267–271). IEEE.
- Wang, S., Yang, Q., Ruan, S., Long, C., Yuan, Y., Li, Q., ... & Zheng, Y. (2024). Spatial Meta Learning With Comprehensive Prior Knowledge Injection for Service Time Prediction. IEEE Transactions on Knowledge and Data Engineering.
- Wang, C., Zhu, Y., Liu, H., Zang, T., Yu, J., & Tang, F. (2022). Deep meta-learning in recommendation systems: A survey. arXiv preprint arXiv:2206.04415.
- Xia, Z., Liu, Y., Zhang, X., Sheng, X., & Liang, K. (2025). Meta Domain Adaptation Approach for Multi-domain Ranking. IEEE Access.
- Yue, W., Hu, H., Wan, X., Chen, X., & Gui, W. (2025). A Domain Knowledge-Supervised Framework Based on Deep Probabilistic Generation Network for Enhancing Industrial Soft-sensing. IEEE Transactions on Instrumentation and Measurement.
- Zhao, X., Wang, M., Zhao, X., Li, J., Zhou, S., Yin, D., ... & Guo, R. (2023). Embedding in recommender systems: A survey. arXiv preprint arXiv:2310.18608.
- Li, C., Ishak, I., Ibrahim, H., Zolkepli, M., Sidi, F., & Li, C. (2023). Deep learning-based recommendation system: systematic review and classification. IEEE Access, 11, 113790–113835.
- Rajabi, F., & He, J. S. (2021). Click-through rate prediction using graph neural networks and online learning. arXiv preprint arXiv:2105.03811.
- Bai, J., Geng, X., Deng, J., Xia, Z., Jiang, H., Yan, G., & Liang, J. (2025). A comprehensive survey on advertising click-through rate prediction algorithm. The Knowledge Engineering Review, 40, e3.
- Gharibshah, Z., & Zhu, X. (2021). User response prediction in online advertising. ACM Computing Surveys (CSUR), 54, 1–43.
- Wen, H., Lin, Y., Wu, L., Mao, X., Cai, T., Hou, Y., ... & Wan, H. (2024). A survey on service route and time prediction in instant delivery: Taxonomy, progress, and prospects. IEEE Transactions on Knowledge and Data Engineering, 36, 7516–7535.
- Wang, Y., Yin, H., Wu, L., Chen, T., & Liu, C. (2021). Secure your ride: Real-time matching success rate prediction for passenger-driver pairs. IEEE Transactions on Knowledge and Data Engineering.
- Pan, X., & Gan, M. (2023). Multi-behavior recommendation based on intent learning. Multimedia Systems, 29, 3655–3668.
- Chen, B., Zhao, X., Wang, Y., Fan, W., Guo, H., & Tang, R. (2024). A comprehensive survey on automated machine learning for recommendations. ACM Transactions on Recommender Systems, 2, 1–38.
- Zhu, G., Cao, J., Chen, L., Wang, Y., Bu, Z., Yang, S., ... & Wang, Z. (2023). A multi-task graph neural network with variational graph auto-encoders for session-based travel packages recommendation. ACM Transactions on the Web, 17, 1–30.
- Li, C. T., Tsai, Y. C., Chen, C. Y., & Liao, J. C. (2024). Graph neural networks for tabular data learning: A survey with taxonomy and directions. ACM Computing Surveys.
- Xu, L., Zhang, J., Li, B., Wang, J., Chen, S., Zhao, W. X., & Wen, J. R. (2025). Tapping the potential of large language models as recommender systems: A comprehensive framework and empirical analysis. ACM Transactions on Knowledge Discovery from Data, 19, 1–51.
- Yao, J., Zhang, S., Yao, Y., Wang, F., Ma, J., Zhang, J., ... & Yang, H. (2022). Edge-cloud polarization and collaboration: A comprehensive survey for AI. IEEE Transactions on Knowledge and Data Engineering, 35, 6866–6886.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).