Submitted: 20 June 2024
Posted: 24 June 2024
Abstract
Keywords:
1. Introduction
1.1. Presentation of the Problem
1.2. Defining Agents and Multi-Agents in Artificial Intelligence
1.3. Multi-Agent Systems (MAS)
1.4. Reinforcement Learning Agents
1.5. Large Language Model Agents
1.6. Integration and Differentiation
1.6.1. Summary
- Agents: Autonomous entities that perceive, decide, and act within an environment.
- Multi-Agent Systems (MAS): Systems composed of multiple interacting agents, enhancing problem-solving capabilities through cooperation.
- Reinforcement Learning Agents: Agents that learn by maximizing cumulative rewards through interaction with their environment.
- Large Language Model Agents: Agents that utilize natural language processing to understand and generate human-like text, suitable for tasks involving language.
1.7. Importance of the Problem
1.8. Research Goals
- Enhance the Efficiency of Pipeline Execution: By automating repetitive tasks such as data preprocessing, feature engineering, and model training, our framework aims to reduce the time and effort required to manage machine learning pipelines. This automation is expected to free up human resources for more strategic and creative tasks, thereby improving overall productivity.
- Improve Scalability: Our framework is designed to handle large and diverse datasets efficiently. By leveraging the parallel processing capabilities of multi-agent systems, we aim to ensure that the pipeline can scale to accommodate increasing data volumes and complexity without compromising performance. This addresses one of the key challenges highlighted by LeCun in the context of scalable AI systems [14].
- Increase Adaptability: The framework is intended to be flexible enough to adapt to various supervised learning applications. This includes the ability to integrate new models and techniques seamlessly, as well as to adjust to different domain-specific requirements and data characteristics. Our approach is designed to be modular, allowing for easy updates and extensions as new methodologies emerge.
- Provide Comprehensive Evaluation: We aim to rigorously evaluate the framework’s performance across different datasets and problem types. This involves assessing not only the accuracy and robustness of the models produced but also the efficiency and scalability of the pipeline as a whole. By doing so, we hope to demonstrate the practical benefits and limitations of our approach, providing valuable insights for further research and development. This comprehensive evaluation will help establish benchmarks and best practices for the deployment of multi-agent systems in machine learning pipelines.
2. Related Work
2.1. Summarize Previous Research Related to the Problem
2.1.1. Foundational Work in Multi-Agent Systems
2.1.2. Recent Developments in Multi-Agent Reinforcement Learning
2.1.3. Time Series Forecasting Models
- TimeGPT by Nixtla: A transformer-based model trained on a vast array of time series data, capable of high-accuracy predictions in diverse domains without retraining [20].
- UniTS by Gao et al.: This model integrates multiple forecasting techniques, providing a robust and unified framework for time series prediction [21].
- Chronos by Amazon: Designed for scalability and robustness, making it suitable for industrial-scale applications [22].
- Lag-Llama by Ashok et al.: Focuses on leveraging lag-based features to enhance prediction accuracy across various contexts [23].
2.2. Highlight Gaps in the Literature
2.3. How Our Study Relates to and Builds Upon Existing Work
3. Methodology
3.1. Framework
3.2. Experimental Design
3.2.1. Workflow Overview
- Problem Definer Agent: Assesses the type of data and determines the machine learning problem (e.g., regression, classification, time series).
- Data Analyst Agent: Provides insights from the data, informing the decisions of the model consultant and feature engineer agents.
- Model Consultant Agent: Selects the most suitable machine learning model based on insights from previous agents, offering alternatives if necessary.
- Feature Engineer Agent: Conducts data transformations, splits the data into training and test sets, and divides it by features and target variables.
- Model Builder Agent: Creates, fits, and evaluates the machine learning model, and makes predictions.
- Report Generator Agent: Compiles summaries from all agents into a comprehensive final report.
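The sequential hand-off between these agents can be sketched as a shared context dictionary passed through each step. The sketch below is an illustrative assumption, with simple heuristics standing in for the LLM calls, not the framework's actual implementation:

```python
def problem_definer(ctx):
    """Decide the machine learning problem type from the target values."""
    unique_values = set(ctx["target"])
    ctx["problem_type"] = "classification" if len(unique_values) <= 10 else "regression"
    return ctx

def model_consultant(ctx):
    """Select a model family suited to the detected problem type."""
    ctx["model"] = {"classification": "XGBClassifier",
                    "regression": "XGBRegressor"}[ctx["problem_type"]]
    return ctx

def report_generator(ctx):
    """Compile the earlier agents' decisions into a short summary."""
    ctx["report"] = f"Problem: {ctx['problem_type']}; model: {ctx['model']}"
    return ctx

# Each agent reads what earlier agents wrote into the shared context.
PIPELINE = [problem_definer, model_consultant, report_generator]

def run_pipeline(ctx):
    """Pass the shared context through each agent in sequence."""
    for agent in PIPELINE:
        ctx = agent(ctx)
    return ctx

result = run_pipeline({"target": [0, 1, 1, 0]})
print(result["report"])  # Problem: classification; model: XGBClassifier
```

The same shape extends to the remaining agents (data analyst, feature engineer, model builder) by appending further steps to the list.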
3.3. Large Language Model Used
3.3.1. Limitations of Llama 3
3.4. Enhancing Prompt Engineering
- Data Analyst: "As a data analyst, your task involves comprehending the dataset, assessing its quality, and performing exploratory data analysis."
- Feature Engineer: "As a data engineer, your responsibility is to review the data analyst’s findings and to refine and adapt the data for machine learning application. Post optimization, you are to partition the data into training and test sets without randomizing it."
- Model Builder: "Your duty as a machine learning engineer is to construct an XGBoost model using the data prepared by the feature engineer. This involves fitting the model to the data, generating predictions, and evaluating the model’s performance."
- Report Generator: "As a report generator, you are tasked with compiling insights from previous agents into a detailed report and saving it as a .txt file."
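The role prompts above can be held in a single lookup keyed by agent name and wrapped as chat-style system messages. This structure is an assumed convenience for illustration, not the authors' code:

```python
# Role prompts quoted from the section above, keyed by agent name.
ROLE_PROMPTS = {
    "data_analyst": (
        "As a data analyst, your task involves comprehending the dataset, "
        "assessing its quality, and performing exploratory data analysis."
    ),
    "feature_engineer": (
        "As a data engineer, your responsibility is to review the data "
        "analyst's findings and to refine and adapt the data for machine "
        "learning application. Post optimization, you are to partition the "
        "data into training and test sets without randomizing it."
    ),
    "model_builder": (
        "Your duty as a machine learning engineer is to construct an XGBoost "
        "model using the data prepared by the feature engineer."
    ),
    "report_generator": (
        "As a report generator, you are tasked with compiling insights from "
        "previous agents into a detailed report and saving it as a .txt file."
    ),
}

def system_message(role):
    """Return a chat-style system message for the given agent role."""
    return {"role": "system", "content": ROLE_PROMPTS[role]}

print(system_message("model_builder")["content"][:9])  # Your duty
```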
3.4.1. Adopting the CO-STAR Methodology
- Context: Supply the agents with background information pertinent to the task.
- Objective: Clearly articulate the specific task the Language Model (LLM) is expected to carry out.
- Style: Designate the desired writing style for the LLM’s output.
- Tone: Determine the intended tone for the response, setting an appropriate attitude for the communication.
- Audience: Ascertain the target audience for whom the response is being tailored.
- Response: Define the format in which the response should be presented.
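The six CO-STAR sections above can be assembled into a single prompt string. The field contents in this sketch are illustrative assumptions, not the prompts used in the paper:

```python
COSTAR_FIELDS = ["Context", "Objective", "Style", "Tone", "Audience", "Response"]

def build_costar_prompt(**fields):
    """Assemble the six CO-STAR sections into one prompt string."""
    missing = [f for f in COSTAR_FIELDS if f.lower() not in fields]
    if missing:
        raise ValueError(f"Missing CO-STAR fields: {missing}")
    return "\n\n".join(f"# {name}\n{fields[name.lower()]}" for name in COSTAR_FIELDS)

prompt = build_costar_prompt(
    context="You receive a tabular dataset prepared by the data analyst.",
    objective="Partition the data into training and test sets without shuffling.",
    style="Concise technical prose.",
    tone="Neutral and precise.",
    audience="The downstream model-builder agent.",
    response="A short summary followed by the code executed.",
)
print(prompt.splitlines()[0])  # -> # Context
```

Raising an error on missing fields keeps each agent prompt complete with respect to the six sections.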
3.5. Evaluation Methods
| Outcome | Description |
|---|---|
| Correct | The pipeline produces the final predictions and report |
| Precise | The pipeline executed without any step failing |
| Imprecise | The predictions and the report are reached correctly, but one agent failed |
| Incorrect | The pipeline fails to produce valid final predictions |
| Hallucination | At some point in the sequence, one of the agents invents data to reach the final predictions |
| Rate Limits | One of the agents enters a loop while solving the task and the API rate limits are exceeded |
| Predictions Outcome | Description |
|---|---|
| Very Good | For classification problems, accuracy above 0.9; for the others, normalized RMSE below 0.1. |
| Good | For classification problems, accuracy between 0.7 and 0.9; for the others, normalized RMSE between 0.1 and 0.3. |
| Poor | For classification problems, accuracy between 0.5 and 0.7; for the others, normalized RMSE between 0.3 and 0.5. |
| Non-existent | The pipeline did not produce predictions. |
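Assuming RMSE is normalized by the range of the observed target values (one common choice; the paper does not state its normalization), the metric and bucket thresholds can be sketched as follows. The boundary handling at exactly 0.9 accuracy or 0.1 normalized RMSE is one reading of the definitions:

```python
import math

def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range of the observed target values."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (max(y_true) - min(y_true))

def quality_bucket(score, problem="regression"):
    """Map a metric value to the prediction-quality buckets defined above."""
    if problem == "classification":  # score is accuracy; higher is better
        if score > 0.9:
            return "Very Good"
        if score >= 0.7:
            return "Good"
        if score >= 0.5:
            return "Poor"
    else:  # score is normalized RMSE; lower is better
        if score < 0.1:
            return "Very Good"
        if score <= 0.3:
            return "Good"
        if score <= 0.5:
            return "Poor"
    return None  # outside the ranges the paper defines

print(quality_bucket(0.9574, "classification"))  # Very Good
```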
3.5.1. Datasets
4. Results and Analysis
5. Conclusion and Future Work
- Pipeline Performance: The pipeline demonstrated an 80% success rate across diverse machine learning tasks, highlighting its robustness and potential for automation.
- Prediction Quality: The framework produced very good results for 11 datasets, good results for 9, and poor results for 4, with no predictions generated for 6 datasets.
- Model Selection: The model consultant agent tended to favor familiar models, indicating a potential area for improvement in model diversity and selection criteria.
- Resource Efficiency: Optimization of agent tasks reduced the average token usage to 37,875 tokens, demonstrating efficient resource utilization.
- Rate-Limit Constraints: API rate limits caused failures in complex datasets, indicating a need for improved workflow optimization.
- Model Selection Bias: The model consultant agent’s preference for certain models suggests a need for more diverse training or refined selection algorithms.
- Hallucinations and Infinite Loops: Issues with agent hallucinations and infinite loops highlight the need for better error handling and context management.
5.1. Future Work
Acknowledgments
Appendix A
| Data set | Input | Report Generated | Normalized RMSE | Tokens Used |
|---|---|---|---|---|
| Second Hand Car Price | I want to predict car price | yes | 0.1048 | 31248 |
| Cholesterol | I want to predict cholesterol | yes | 0.0757 | 37728 |
| Crab Age | I want to predict crab age | yes | 0.1400 | 39753 |
| Bike Rents for the Day | I want to predict bike rents for the day | yes | 0.0125 | 65447 |
| Fuel Consumption | I want to predict Fuel Consumption | yes | 0.1078 | 42462 |
| House Sales in King County | I want to predict house prices | Rate Limits | - | - |
| Medical Insurance Cost | I want to predict the cost of medical insurance | yes | 0.0451 | 33197 |
| Elevator Predictive maintenance | I want to predict the vibration | yes | 0.2390 | 34177 |
| Happiness Index | I want to predict the overall rank of happiness | yes | 0.03222 | 40633 |
| Student Performance | I want to predict students Performance | yes | 0.0225 | 26316 |
| Data set | Input | Report Generated | Accuracy | Tokens Used |
|---|---|---|---|---|
| Airline Customer Satisfaction | I want to predict whether future customers will be satisfied | yes | 0.9574 | 34598 |
| Anemia type | I want to predict the type of anemia | yes | 0.9883 | 26771 |
| Employee Attrition | I want to predict employee attrition | yes | 0.8776 | 30568 |
| Mushroom Classification | I want to predict the mushroom class (edible or poisonous) | yes | 1 | 30746 |
| Obesity Classification | I want to predict the type of obesity | yes | 0.9091 | 29742 |
| Machine Predictive Maintenance | I want to predict the variable target | Rate Limits | - | - |
| Telecom Customer Churn | I want to predict customer Churn Category and Churn Reason | Hallucinates | - | - |
| Thyroid disease | I want to predict recurrence of thyroid cancer, if yes or no | yes | 0.9870 | 22578 |
| White wine quality | I want to predict the classification of the wine quality | yes | 0.6594 | 21877 |
| Red wine quality | I want to predict the classification of the wine quality | yes | 0.6735 | 24864 |
| Data set | Input | Report Generated | Normalized RMSE | Tokens Used |
|---|---|---|---|---|
| Air Passengers | I want to forecast the passengers rate | yes | 0.2253 | 30011 |
| Daily Minimum Temperatures | I want to forecast the daily minimum temperature | Rate Limits | - | - |
| Electric Production | I want to forecast the electrical production | yes | 0.2322 | 27856 |
| Hourly Gasoline Prices | I want to forecast gasoline prices | yes | 0.0184 | 39678 |
| Microsoft Stock | I want to forecast Microsoft stock | yes | 0.3651 | 28895 |
| Monthly Beer Production | I want to forecast the monthly beer production | yes | 0.2439 | 24720 |
| Energy Consumption | I want to forecast energy consumption | yes | 0.1396 | 82912 |
| National Population | I want to forecast population growth | Rate Limits | - | - |
| Water Pump RUL | I want to forecast the RUL of the water pump | Rate Limits | - | - |
| Sales of Shampoo | I want to forecast shampoo sales | yes | 0.3555 | 19925 |
References
- IBM. What Is a Machine Learning Pipeline? 2021. Available online: https://www.ibm.com/cloud/learn/machine-learning-pipelines.
- Stonebranch. MLOps and Automation for Machine Learning Pipelines. 2023. Available online: https://www.stonebranch.com/solutions/mlops.
- LeCun, Y. The Critical Need for Scalable and Efficient Machine Learning Systems. 2021. Available online: https://www.facebook.com/yann.lecun/posts/10159815944837143.
- Zapier. What are AI agents? A comprehensive guide, 2023.
- Wikipedia. Multi-agent system, 2023.
- Institute, T.A.T. Multi-agent systems, 2023.
- TJU-DRL-LAB. Multiagent-RL: The official code releasement of publications in MARL field of TJU RL lab, 2023.
- MARLlib. One repository for Multi-agent Reinforcement Learning (MARL), 2023.
- Zhang, K.; Yang, Z.; Basar, T. Multi-agent reinforcement learning: A selective overview of theories and algorithms. arXiv preprint arXiv:1911.10635 2019, [arXiv:cs.LG/1911.10635].
- Wu, Q.; Bansal, G.; Zhang, J.; Wu, Y.; Zhang, S.; Zhu, E.; Li, B.; Jiang, L.; Zhang, X.; Wang, C. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework. Proceedings of the AAAI Conference on Artificial Intelligence, 2023, [arXiv:cs.AI/2308.08155].
- Vidhya, A. Machine Learning Pipelines: Why Adopted by Data Scientists? 2021. Available online: https://www.analyticsvidhya.com/blog/2021/05/machine-learning-pipelines-why-adopted-by-data-scientists/.
- AWS. Building, automating, managing, and scaling ML workflows using Amazon SageMaker Pipelines. 2023. Available online: https://aws.amazon.com/sagemaker/pipelines/.
- LeCun, Y. Real-Time Processing and Decision-Making in AI Systems. 2021. Available online: https://www.facebook.com/yann.lecun/posts/10159815944837143.
- LeCun, Y. Scalable AI Systems. 2015. Available online: https://www.facebook.com/yann.lecun/posts/10159815944837143.
- Shoham, Y.; Leyton-Brown, K. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations; Cambridge University Press, 2009. [Google Scholar]
- Weiss, G. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence; The MIT Press, 2013. [Google Scholar]
- Stone, P.; Veloso, M. Multiagent systems: A survey from a machine learning perspective. Autonomous Robots 2000, 8, 345–383. [Google Scholar] [CrossRef]
- MARLlib. One repository for Multi-agent Reinforcement Learning (MARL), 2023.
- Zhang, K.; Yang, Z.; Basar, T. Multi-agent reinforcement learning: A selective overview of theories and algorithms. ArXiv preprint 2019, [arXiv:cs.LG/1911.10635].
- Nixtla. TimeGPT: A Model for Time Series Forecasting, 2023.
- Gao, S.; Koker, T.; Queen, O.; Hartvigsen, T.; Tsiligkaridis, T.; Zitnik, M. UniTS: Building a Unified Time Series Model. arXiv preprint arXiv:2403.00131 2024, [arXiv:cs.LG/2403.00131].
- Ansari, A.F.; Stella, L.; Turkmen, C.; Zhang, X.; Mercado, P.; Shen, H.; Shchur, O.; Rangapuram, S.S.; Arango, S.P.; Kapoor, S.; Zschiegner, J.; Maddix, D.C.; Wang, H.; Mahoney, M.W.; Torkkola, K.; Wilson, A.G.; Bohlke-Schneider, M.; Wang, Y. Chronos: Learning the Language of Time Series, 2024, [arXiv:cs.LG/2403.07815].
- Ashok, A.; Williams, A.R.; Ghonia, H.; Bhagwatkar, R.; Khorasani, A.; Bayazi, M.J.D.; Adamopoulos, G.; Riachi, R.; Hassen, N.; Biloš, M.; Garg, S.; Schneider, A.; Chapados, N.; Drouin, A.; Zantedeschi, V.; Nevmyvaka, Y.; Rish, I. Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting. arXiv preprint arXiv:2310.08278 2024, [arXiv:cs.LG/2310.08278].
- Hewamalage, H.; Bergmeir, C.; Bandara, K. Recurrent neural networks for time series forecasting: Current status and future directions. International Journal of Forecasting 2021, 37, 388–427. [Google Scholar] [CrossRef]
- Banik, D.; Das, S.; Ghosh, D.; Bandyopadhyay, N. MLOps: Machine Learning Operations, 2022, [arXiv:cs.LG/2208.04798].
- Garza, A.; Mergenthaler-Canseco, M. TimeGPT-1, 2023, [arXiv:cs.LG/2310.03589].
- Rasul, K.; Ashok, A.; Williams, A.R.; Ghonia, H.; Bhagwatkar, R.; Khorasani, A.; Bayazi, M.J.D.; Adamopoulos, G.; Riachi, R.; Hassen, N.; Biloš, M.; Garg, S.; Schneider, A.; Chapados, N.; Drouin, A.; Zantedeschi, V.; Nevmyvaka, Y.; Rish, I. Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting, 2024, [arXiv:cs.LG/2310.08278].
- Python core team. pickle — Python object serialization. 2023. Available online: https://docs.python.org/3/library/pickle.html.
- Teo, S. How I Won Singapore’s GPT-4 Prompt Engineering Competition. 2023. Available online: https://towardsdatascience.com/how-i-won-singapores-gpt-4-prompt-engineering-competition-34c195a93d41.
- AiFlowSolutions. MADS: Multi-Agent Drift Simulator. n.d. Available online: https://github.com/AiFlowSolutions/MADS/tree/experimental-results-paper (accessed on 20 June 2024).

| Outcome | Classification | Regression | Time-Series | Total |
|---|---|---|---|---|
| Correct | | | | |
| Precise | 8 | 8 | 7 | 23 |
| Imprecise | - | 1 | - | 1 |
| Incorrect | | | | |
| Hallucination | 1 | - | - | 1 |
| Rate Limits | 1 | 1 | 3 | 5 |
| Predictions Outcome | Classification | Regression | Time-Series | Total |
|---|---|---|---|---|
| Very Good | 5 | 5 | 1 | 11 |
| Good | 1 | 4 | 4 | 9 |
| Poor | 2 | - | 2 | 4 |
| Non-existent | 2 | 1 | 3 | 6 |
| | Classification | Regression | Time-Series | Average |
|---|---|---|---|---|
| Tokens used | 36,941 | 38,996 | 37,688 | 37,875 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).