Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Diagnosis-Effective Sampling of Application Traces

Version 1 : Received: 27 April 2024 / Approved: 29 April 2024 / Online: 29 April 2024 (04:59:11 CEST)

How to cite: Poghosyan, A.; Harutyunyan, A.; Davtyan, E.; Petrosyan, K.; Baloian, N. Diagnosis-Effective Sampling of Application Traces. Preprints 2024, 2024041876. https://doi.org/10.20944/preprints202404.1876.v1

Abstract

Distributed tracing is a modern technology for monitoring, managing, and troubleshooting cloud-native applications. It offers more comprehensive and continuous observability than traditional logging and is indispensable for navigating complex modern software architectures. However, distributed applications generate a staggering volume of traces, and storing and using every trace is impractical because of the associated operational costs. This calls for a sampling strategy that selects which traces warrant storage and analysis. Historically, sampling has been rate-based and has often relied heavily on manual configuration. A more intelligent approach is needed, and we propose a hierarchical sampling methodology that addresses multiple requirements concurrently. At the first level, rate-based sampling reduces the overwhelming volume of traces, since no further analysis can be performed at this level. The next stage builds on this foundation with a more nuanced analysis that incorporates information about trace properties and ensures that vital process details are preserved even under extreme conditions. This comprehensive approach not only aids in the visualization and conceptualization of applications but also enables more targeted analysis in later stages. Deeper in the sampling hierarchy, the technique becomes tailored to specific purposes, such as simplifying application troubleshooting. In this context, the sampling strategy prioritizes the retention of erroneous traces from dominant processes, which facilitates the identification and resolution of underlying issues. The focus of this paper is to reveal the impact of sampling on troubleshooting efficiency. Explainable artificial intelligence solutions enable the detection of malfunctioning microservices and provide transparent insights into root causes.
We advocate the use of rule-induction systems, which offer explainability and efficacy in decision-making processes. By integrating advanced sampling techniques with machine learning-driven intelligence, we help organizations navigate the complexities of large-scale distributed cloud environments effectively.
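
The hierarchy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Trace` fields, the per-type cap in the second stage, and the rule of keeping every erroneous trace in the third stage are all assumptions chosen only to show how a rate-based stage, a property-aware stage, and a troubleshooting-oriented stage might be chained.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    trace_id: int
    trace_type: str  # hypothetical grouping key, e.g. the root operation
    is_error: bool


def rate_sample(traces, rate, rng):
    """Stage 1: keep each trace with probability `rate`, ignoring its content."""
    return [t for t in traces if rng.random() < rate]


def type_aware_sample(traces, per_type_cap, rng):
    """Stage 2 (assumed rule): keep at most `per_type_cap` traces per type,
    so that rare trace types are not crowded out by dominant ones."""
    by_type = defaultdict(list)
    for t in traces:
        by_type[t.trace_type].append(t)
    kept = []
    for group in by_type.values():
        if len(group) <= per_type_cap:
            kept.extend(group)
        else:
            kept.extend(rng.sample(group, per_type_cap))
    return kept


def error_priority_sample(traces, normal_rate, rng):
    """Stage 3 (assumed rule): keep every erroneous trace and only a
    fraction `normal_rate` of the normal ones."""
    return [t for t in traces if t.is_error or rng.random() < normal_rate]


def hierarchical_sample(traces, rate, per_type_cap, normal_rate, seed=0):
    """Chain the three stages; `seed` makes the sketch reproducible."""
    rng = random.Random(seed)
    stage1 = rate_sample(traces, rate, rng)
    stage2 = type_aware_sample(stage1, per_type_cap, rng)
    return error_priority_sample(stage2, normal_rate, rng)
```

Each stage only sees what the previous one kept, so the cheap content-blind stage carries the bulk of the volume reduction and the later stages can afford to inspect trace properties.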

Keywords

cloud-native applications; application monitoring; distributed tracing; trace sampling; application troubleshooting; root cause analysis; explainable artificial intelligence; rule-induction systems

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
