Submitted: 24 March 2026
Posted: 25 March 2026
Abstract
Keywords:
1. Introduction
2. Background
2.1. Microservice-based Systems and Their Evolution
2.2. Formal Verification and Rigorous Analysis Techniques
2.3. Continuous Verification in DevOps and Cloud-native Settings
3. Methodology
3.1. Review Protocol and Research Questions
- RQ1: What formal modelling and verification approaches (e.g., TLA+, Petri nets, session types, SMT-based techniques, process algebras) have been applied to microservice-based systems, and what evidence is reported about their effectiveness and scalability?
- RQ2: How does architectural or API evolution in microservice-based systems impact verifiability (e.g., specification drift, protocol violations), and what practices or techniques are proposed to mitigate these impacts?
- RQ3: Which repository- or artefact-derived signals (e.g., commits, API diffs, service dependency graphs) are used to infer or update formal models or specifications automatically (i.e., an MSR angle)?
- RQ4: What tools and pipelines support continuous or incremental formal verification in CI/CD environments for microservice-based systems?
3.2. Literature Search Strategy
- Microservices / cloud-native (e.g., “microservic*”, “micro-service*”, “micro service*”, “MSA”),
- Evolution / change (e.g., “evolut*”, “version*”, “refactor*”, “API change”, “migration”), and
- Verification / formal analysis (e.g., “formal verification”, “model checking”, “runtime verification”, “type system”, “session type*”, “SMT”, “TLA+”, “Alloy”, “Petri net*”, and specific tools such as UPPAAL, CSP, Spin).
| Indexer | Final Query | Sources | Count |
|---|---|---|---|
| IEEE Xplore | ("microservic*" OR "micro-service*" OR "micro service*" OR MSA) AND (evolut* OR version* OR refactor* OR "api version*" OR "api change*" OR compatib* OR migration) AND (verify* OR "formal verification" OR "model checking" OR "runtime verification" OR "formal specification" OR "formal analysis" OR "type system" OR "session type*" OR SMT OR "TLA+" OR Alloy OR "Petri net*" OR UPPAAL OR CSP OR Spin OR Promela) | IEEE Xplore | 53 |
| ACM DL | [[All: "microservice"] OR [All: "microservices"]] AND [[All: evolut*] OR [All: "api evolution"] OR [All: "architecture evolution"] OR [All: co-evolution]] AND [[All: "model checking"] OR [All: "formal verification"] OR [All: tla+] OR [All: uppaal] OR [All: alloy] OR [All: "petri net*"] OR [All: "session type*"] OR [All: csp]] AND [E-Publication Date: (01/01/2010–12/31/2025)] | ACM Digital Library | 592 |
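Both final queries follow the same pattern: the synonyms within each concept group are OR-ed, and the three groups (microservices, evolution, verification) are AND-ed. The minimal Python sketch below composes a query in that shape. The term lists are an abridged, hypothetical subset of Section 3.2, and the quoting rule (quote any term containing characters other than letters, digits, or `*`) is our assumption rather than a rule of either database. As the table shows, IEEE Xplore and the ACM DL each use their own syntax.

```python
# Sketch: compose a boolean search string as an AND of OR-groups.
# The term lists are an abridged, hypothetical subset of Section 3.2,
# and the quoting rule is an assumption, not a database requirement.

def quote(term: str) -> str:
    # Bare tokens (letters, digits, truncation "*") stay unquoted;
    # phrases, hyphenated terms, and terms like "TLA+" are double-quoted.
    needs_quotes = any(not (c.isalnum() or c == "*") for c in term)
    return f'"{term}"' if needs_quotes else term


def or_group(terms: list[str]) -> str:
    # Any one synonym within a concept group may match.
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_query(groups: list[list[str]]) -> str:
    # Every concept group must match.
    return " AND ".join(or_group(g) for g in groups)


MICROSERVICES = ["microservic*", "micro-service*", "micro service*", "MSA"]
EVOLUTION = ["evolut*", "version*", "refactor*", "api change*", "migration"]
VERIFICATION = ["formal verification", "model checking", "TLA+", "Petri net*"]

if __name__ == "__main__":
    print(build_query([MICROSERVICES, EVOLUTION, VERIFICATION]))
```

Keeping each concept group as a separate list makes it easy to widen one dimension, such as adding a verification tool, without touching the others.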
3.3. Inclusion/Exclusion Criteria
| Inclusion Criteria | Exclusion Criteria |
|---|---|
| I1 Study falls within the cloud-native domain. We are only interested in analyzing change impacts on cloud-native systems. | E1 Study is about physical networking or other fields not directly related to cloud-native systems, or is not about distributed systems. |
| I2 Study analyzes changes related to microservice systems. | E2 Study is not clearly related to at least one aspect of the specified research questions. |
| I3 Study comes from an acceptable source, such as a peer-reviewed scientific journal, conference, symposium, or workshop. | E3 Study is a secondary literature review. |
| I4 Study reports on methods, tools, measurements, or other techniques concerning change analysis on microservice systems. | E4 Study is a fault-analysis or trace-analysis paper aimed at system robustness rather than evolution or maintenance. |
| I5 Study provides solid evidence on microservice change analysis, for instance, through rigorous analysis, experiments, case studies, benchmarks, or simulations. | E5 Study did not undergo peer review (e.g., non-reviewed journal, magazine, or conference papers, master's theses, books, and doctoral dissertations); this ensures a minimum level of quality. |
| | E6 Study has been extended in another paper included in the SLR. |
3.4. Study Selection Process and Statistics
- Initial: This phase records the raw number of results returned by the final search string in each database before any screening. We recorded the count reported by each database after applying the 2010–2025 publication window. This yielded 53 records from IEEE Xplore and 592 from the ACM Digital Library, for a total of 645 records at the outset.
- Proceedings: The proceedings stage filters out items that are not individual research articles, such as entire “Proceedings of …” volumes, front matter, prefaces, or editorials. These were identified by checking the record type and book title fields (for example, entries whose title or type clearly indicated a proceedings volume rather than a standalone paper). Once these items were removed, we moved on to the next phase.
- Removal (duplicate elimination): The removal stage eliminates exact and near-duplicate records within and across databases. To do this, we exported all remaining items into Zotero and used its duplicate-detection functionality to find records with identical or highly similar titles, DOIs, and author lists. Duplicates were merged or removed so that each logical paper appeared only once in the dataset.
- Filtering (abstract screen): The filtering stage applies a fast abstract screen against the inclusion and exclusion criteria (Section 3.3). At this point we focused on three main questions: (i) does the paper clearly concern microservices or cloud-native systems; (ii) is there any notion of evolution or change (e.g., architecture, API, configuration); and (iii) is there a link to formal, rigorous, or systematic analysis? This screen provided a coarse, higher-level first pass over our query results.
- Full Read: The full-read stage corresponds to full-text screening. For each study retained after filtering, we read the paper in full to verify clear alignment with our research questions and criteria: an explicit microservice/cloud-native context, analysis of evolution or change, and a formal or otherwise rigorous verification/analysis method described in sufficient methodological detail. Papers that fell outside this scope (e.g., generic SOA without microservice specificity) or that did not describe their methods in enough detail to judge or reuse them were excluded at this stage.
- Quality Assessment: In the final screening stage, we assessed the quality of each remaining paper to determine its significance to our study. We applied the weighted 15-item QA checklist (Section 3.5) and computed each study’s overall quality score. Studies below the predefined threshold or clearly failing key criteria were excluded.
- Extract: With screening complete, we began the extraction stage, which involves performing structured data extraction for all data items in Section 3.6. Concretely, a paper had to provide enough information to identify the formalism/engine used, the types of evolution events considered, the evidence and metrics used in evaluation, and any DevOps/CI/CD aspects. If a paper was conceptually relevant but too vague or incomplete to fill these fields, it did not progress.
- Snowballing: The snowballing stage captures additional candidate papers through backward (reference list) and forward (cited-by) snowballing from the final set of included studies. Each snowballed paper was subjected to the same multi-stage screening pipeline as our initial query results.
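In the stages above, records are only ever removed, never added back, so the bookkeeping reduces to a per-stage count. The minimal Python sketch below records that count, assuming records are plain dictionaries with hypothetical `title`, `doi`, and `type` keys. Its deduplication treats two records as the same paper when they share a DOI or a normalized title. This is a simplified stand-in for Zotero's duplicate detection, which also compares author lists. The screening predicate is a placeholder for the manual judgments described above, not an automation of them.

```python
# Sketch of the selection funnel: proceedings filtering, deduplication,
# then named screening stages, recording the record count after each stage.
# Record keys ("title", "doi", "type") are hypothetical.
import re
from typing import Callable

Record = dict[str, str]


def normalize_title(title: str) -> str:
    # Lower-case and collapse punctuation/whitespace runs to single spaces.
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def record_keys(rec: Record) -> set[str]:
    # A record is identified by its normalized title and, if present, its DOI.
    keys = {"title:" + normalize_title(rec["title"])}
    doi = rec.get("doi", "").strip().lower()
    if doi:
        keys.add("doi:" + doi)
    return keys


def remove_duplicates(records: list[Record]) -> list[Record]:
    # Keep the first record of each group sharing a DOI or normalized title.
    seen: set[str] = set()
    unique = []
    for rec in records:
        keys = record_keys(rec)
        if seen.isdisjoint(keys):
            unique.append(rec)
        seen |= keys
    return unique


def run_funnel(records: list[Record],
               screens: list[tuple[str, Callable[[Record], bool]]]):
    counts = {"initial": len(records)}
    records = [r for r in records if r.get("type") != "proceedings"]
    counts["proceedings"] = len(records)
    records = remove_duplicates(records)
    counts["removal"] = len(records)
    for name, keep in screens:
        records = [r for r in records if keep(r)]
        counts[name] = len(records)
    return records, counts


if __name__ == "__main__":
    sample = [
        {"title": "A Hypothetical Study of Microservice Evolution",
         "doi": "10.0000/hypothetical.1", "type": "article"},
        {"title": "A hypothetical study of microservice-evolution.",
         "doi": "", "type": "article"},
        {"title": "Proceedings of a Hypothetical Workshop", "doi": "", "type": "proceedings"},
    ]
    # Placeholder screen standing in for the manual abstract screen.
    screens = [("filtering", lambda r: "microservice" in r["title"].lower())]
    kept, counts = run_funnel(sample, screens)
    print(counts)  # {'initial': 3, 'proceedings': 2, 'removal': 1, 'filtering': 1}
```

Reporting the count after every stage gives the numbers needed for a PRISMA-style flow summary.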
3.5. Quality Assessment
| QA ID | Question / Criterion | Purpose / Focus Area | Score Options | Notes / Examples |
|---|---|---|---|---|
| QA1 | Study explicitly focuses on cloud-native or microservice-based systems | Relevance | 1 / 0.5 / 0 | Exclude if generic SOA or unrelated systems |
| QA2 | Analyzes architectural or API evolution | Evolution focus | 1 / 0.5 / 0 | Include if discusses versioning, dependencies, or maintenance |
| QA3 | Involves formal methods (e.g., TLA+, Petri nets, SMT, Alloy) | Verification methods | 1 / 0.5 / 0 | Must mention at least one formal verification approach |
| QA4 | Situated within cloud-native or distributed software engineering context | Context fit | 1 / 0.5 / 0 | Not physical networking or hardware systems |
| QA5 | Describes methods/tools for change or evolution analysis | Methodological clarity | 1 / 0.5 / 0 | Must describe reproducible approach |
| QA6 | Reports empirical validation (experiment, benchmark, case study) | Evidence strength | 1 / 0.5 / 0 | Scores higher for reproducible experiments |
| QA7 | Identifies data sources (repo, CI/CD, APIs, etc.) | Data transparency | 1 / 0.5 / 0 | Check if data or tools are publicly available |
| QA8 | Evaluates effectiveness or scalability of verification method | Internal validity | 1 / 0.5 / 0 | For example, scalability to large systems |
| QA9 | Discusses limitations or threats to validity | Bias control | 1 / 0.5 / 0 | Helps assess internal/external validity |
| QA10 | Proposes new model, framework, or tool | Innovation | 1 / 0.5 / 0 | Must contribute something original |
| QA11 | Extends existing work with new evidence | Novelty | 1 / 0.5 / 0 | Avoids duplicate publications |
| QA12 | Provides actionable findings or guidelines for evolving microservices | Applicability | 1 / 0.5 / 0 | Useful for practitioners or future research |
| QA13 | Peer-reviewed publication (journal, conference, symposium, workshop) | Quality control | 1 / 0.5 / 0 | Exclude gray literature |
| QA14 | Written in English and accessible in full text | Accessibility | 1 / 0.5 / 0 | Full text must be available for assessment |
| QA15 | Provides reproducible results or available tools/datasets | Replicability | 1 / 0.5 / 0 | GitHub link, dataset, or method details |
- 1 – clearly satisfied
- 0.5 – partially satisfied
- 0 – not satisfied or unclear
- Topical relevance and context (QA1-QA4)
  - QA1: Explicit focus on cloud-native or microservice-based systems
  - QA2: Analysis of architectural or API evolution
  - QA3: Use of formal methods or explicit verification techniques (e.g., TLA+, Petri nets, SMT, Alloy)
  - QA4: Placement within a cloud-native or distributed software engineering context (as opposed to pure hardware/physical networking)
- Methodological clarity and evidence (QA5-QA9)
  - QA5: Methods/tools for change or evolution analysis are clearly described and reproducible
  - QA6: Presence of empirical validation (experiment, benchmark, case study)
  - QA7: Data sources are identified (e.g., repositories, CI/CD pipelines, APIs)
  - QA8: Evaluation of effectiveness or scalability of the method
  - QA9: Discussion of limitations or threats to validity
- Contribution and applicability (QA10-QA12)
  - QA10: Proposal of a new model, framework, or tool
  - QA11: Extension of existing work with new evidence (not duplicate publication)
  - QA12: Actionable findings or guidelines for evolving microservices (useful to practitioners or future research)
- Venue quality, accessibility, and reproducibility (QA13-QA15)
  - QA13: Peer-reviewed publication (journal, conference, symposium, or workshop)
  - QA14: Written in English and accessible in full text
  - QA15: Provision of reproducible results or artifacts (e.g., GitHub repository, dataset, or detailed method description)
| QA ID | Criterion | Weight |
|---|---|---|
| QA1 | Microservice/Cloud-native relevance | 0.075 |
| QA2 | Evolutionary aspect (API/architecture) | 0.075 |
| QA3 | Formal methods/verification use | 0.075 |
| QA4 | Cloud-native or distributed content | 0.075 |
| QA5 | Methods/tools clearly described | 0.0875 |
| QA6 | Empirical validation (experiment, benchmark, case study) | 0.0875 |
| QA7 | Data sources identified (repos, CI/CD, APIs) | 0.0875 |
| QA8 | Evaluates effectiveness or scalability | 0.0875 |
| QA9 | Limitations or threats to validity discussed | 0.0875 |
| QA10 | Proposes new model/framework/tool | 0.05 |
| QA11 | Extends existing work with new evidence | 0.05 |
| QA12 | Provides actionable findings/guidelines | 0.05 |
| QA13 | Peer-reviewed publication | 0.05 |
| QA14 | Written in English and full text available | 0.05 |
| QA15 | Reproducible results/tools available | 0.05 |
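Using the 1 / 0.5 / 0 scale and the weights above, a study's overall score is the weighted sum of its item scores. As listed, the weights add up to 1.0375 rather than 1, so the minimal Python sketch below divides by the weight total to keep scores in [0, 1]. That normalization and the `THRESHOLD` value are our assumptions, since the value of the predefined threshold is not stated in this section.

```python
# Sketch: weighted overall quality score for one study.
# Weights are copied from the QA weight table; THRESHOLD is hypothetical.

WEIGHTS = {
    **{f"QA{i}": 0.075 for i in range(1, 5)},    # QA1-QA4: topical relevance and context
    **{f"QA{i}": 0.0875 for i in range(5, 10)},  # QA5-QA9: methodological clarity and evidence
    **{f"QA{i}": 0.05 for i in range(10, 16)},   # QA10-QA15: contribution, venue, reproducibility
}
ALLOWED_SCORES = {0.0, 0.5, 1.0}
THRESHOLD = 0.5  # hypothetical cut-off; the section does not state the value


def quality_score(scores: dict[str, float]) -> float:
    """Weighted sum of per-item scores, normalized by the total weight."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing QA items: {sorted(missing)}")
    for qa in WEIGHTS:
        if scores[qa] not in ALLOWED_SCORES:
            raise ValueError(f"{qa}: score must be 0, 0.5, or 1, got {scores[qa]}")
    weighted = sum(WEIGHTS[qa] * scores[qa] for qa in WEIGHTS)
    return weighted / sum(WEIGHTS.values())


if __name__ == "__main__":
    # Hypothetical study: satisfies every item except QA6 (partial) and QA15 (not met).
    example = {qa: 1.0 for qa in WEIGHTS}
    example.update({"QA6": 0.5, "QA15": 0.0})
    score = quality_score(example)
    print(f"score={score:.3f} included={score >= THRESHOLD}")  # score=0.910 included=True
```

Rejecting scores outside {0, 0.5, 1} catches data-entry slips before they silently shift a study across the threshold.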
3.6. Data Extraction and Synthesis
| Authors & Year | Study Type | System Context | Method Used | Verification/Evaluation | Contribution Summary |
|---|---|---|---|---|---|
| Biswas et al. (2023) [3] | Empirical + Tool | Microservice architecture models | Model-to-code consistency checking | Detects architecture drift | Continuous consistency checking reduces divergence |
| Namyar et al. (2025) [13] | Formal verification + system | Cloud control-plane | TLA+ model checking | Safety verification; scalability evaluation | Presents ZENITH: formally verified control plane |
| Copei et al. (2023) [8] | Static analysis | Microservice implementations | Static code analysis | Detects API mismatches, structural violations | Static analysis improves microservice implementation quality |
| Cerny et al. (2018) [7] | Conceptual taxonomy | Microservice architecture | Contextual analysis | Conceptual reasoning | Defines contextual understanding of microservice architectures |
| Boza et al. (2019) [4] | Performance experiment | Container-based deployments | Empirical performance evaluation | Measures resource variability and impact | Argues for performance-aware container deployment |
| Anisetti et al. (2020) [1] | DevOps certification | Cloud service pipelines | Continuous certification pipeline | Certification via automated tests | Introduces continuous certification for DevOps |
| Shahin et al. (2017) [14] | Empirical study | Microservice-based DevOps teams | Interviews and qualitative analysis | Process/coordination evaluation | Shows how CD impacts team structures and responsibilities |
| Burns et al. (2016) [5] | Systems design | Cloud-scale cluster systems | Systems architecture analysis | Observational; production-scale reliability | Summarizes Borg, Omega, and Kubernetes evolution |
| Ferrara et al. (2024) [9] | Survey | Software verification | Verification methods survey | Discusses scalability limits | Identifies major challenges in software verification |
| Soares et al. (2023) [16] | Survey | Software architecture evaluation | Continuous evaluation literature review | Identifies evaluation gaps | Shows trends in continuous architecture evaluation |
| Cerny et al. (2024) [6] | Static analysis + visualization | Microservice codebases | Architecture recovery from code | Structural violation detection | Generates updated architecture models from code |
| Stillwell & Coutinho (2015) [17] | DevOps workflow | Microservice-style cloud platform | DevOps pipeline (Ansible, Docker, Vagrant) | Automated unit + integration testing | Shows DevOps accelerates integration across distributed teams |
| Mampage et al. (2022) [11] | Survey | Serverless / FaaS systems | Resource management taxonomy | QoS / performance discussion | Provides holistic overview of resource management challenges |
| Merlino et al. (2019) [12] | Systems/platform | Edge–Fog–Cloud hierarchical systems | Workload modeling and placement | Empirical performance evaluation | Proposes workload engineering patterns across layers |
| Silva et al. (2024) [15] | Runtime testing method | Service-based applications | Self-adaptive testing framework | In-the-field fault detection | Introduces dynamic adaptive testing in deployed environments |
| Anisetti et al. (2018) [2] | Security testing | Composite SOA/service compositions | Test-based certification | Security violation detection | Connects testing evidence to certification for evolving services |
| Camilli (2020) [19] | Formal verification | Microservice process flows | Continuous formal verification | Model checking | Continuous formal verification of microservice-based process flows |
| Lamport (2009) [18] | Formal specification | Distributed systems | PlusCal/TLA+ specification | Protocol correctness | Introduces PlusCal algorithm language for TLA+ |
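The extraction table above can be backed by a typed record whose required fields must all be filled before a study advances. This mirrors the completeness rule for the extraction stage in Section 3.4. The minimal Python sketch below assumes hypothetical class and field names, since the review's own extraction form is not reproduced here. It treats DevOps/CI/CD aspects as optional because that criterion asks only for "any" such aspects.

```python
# Sketch of a structured extraction record; class and field names are hypothetical.
from dataclasses import dataclass, field, fields

# "any DevOps/CI/CD aspects": this field may legitimately stay empty.
OPTIONAL_FIELDS = {"devops_aspects"}


@dataclass
class ExtractionRecord:
    citation: str                 # e.g. "Camilli (2020) [19]"
    study_type: str
    system_context: str
    method: str                   # formalism / engine used
    evaluation: str               # evidence and metrics reported
    contribution: str
    evolution_events: list[str] = field(default_factory=list)
    devops_aspects: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        # Required fields that are still empty; an incomplete record means
        # the study does not progress past the extraction stage.
        return [f.name for f in fields(self)
                if f.name not in OPTIONAL_FIELDS and not getattr(self, f.name)]

    def is_complete(self) -> bool:
        return not self.missing_fields()


if __name__ == "__main__":
    # Column values copied from the Camilli (2020) row; evolution events are
    # left blank because the table does not record them.
    rec = ExtractionRecord(
        citation="Camilli (2020) [19]",
        study_type="Formal verification",
        system_context="Microservice process flows",
        method="Continuous formal verification",
        evaluation="Model checking",
        contribution="Continuous formal verification of microservice-based process flows",
    )
    print(rec.missing_fields())  # ['evolution_events']
```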

4. Results
4.1. Formal Modeling and Verification Approaches in Microservice-Based Systems
4.1.1. Model Checking Approaches
4.1.2. Control-Plane Verification and Fault Tolerance
4.1.3. Formal Verification of Service Behavior, APIs, and Process Flows
4.1.4. Static Code Analysis as a Verification Aid
4.1.5. General Verification Barriers and Challenges
4.2. Architectural / API Evolution and Impact on Verifiability
4.2.1. Architectural Drift & Inconsistency
4.2.2. Evolution of Microservice Topology – Complexity to Verification Failure
4.2.3. Organizational & Process Evolution Impacts
4.2.4. Self-Adaptive Testing for Evolving Systems
4.3. Repository-Derived Signals for Updating Models
4.3.1. Architecture Extraction from Static Code
4.3.2. Tracking Evolution via Commits & Dependency Changes
4.3.3. Opportunities for Formal Model Generation
4.4. Continuous / Incremental Verification in CI / CD
4.4.1. Continuous Certification in DevOps
4.4.2. Continuous Certification Frameworks for Cloud and Microservices
4.4.3. Runtime / Field-Based Continuous Testing
4.4.4. Integrating Architecture Validation into CI / CD
4.4.5. Formal Verification Pipelines
5. Discussion
6. Threats to Validity
- (i) Search Bias: The search strategy we employed focused primarily on IEEE Xplore, SpringerLink and the ACM Digital Library, which are strong repositories for software engineering research. However, relevant studies may also appear in other venues, such as Elsevier publications or practitioner-oriented outlets. By limiting the scope in this way, we risk omitting contributions that address microservice evolution and verification from different perspectives, particularly industrial or interdisciplinary work. Future reviews should broaden the search base and consider complementary indexing services to reduce this bias.
- (ii) Selection Bias: The multi-stage screening process (abstract filtering, full-text reading, and quality assessment) was designed to be rigorous but inevitably involved subjective judgment. For instance, borderline cases in which microservices were mentioned but were not central to the study required interpretation. Although all authors participated in screening, differences in interpretation could have influenced which studies were retained.
- (iii) Publication Bias: We restricted the dataset to peer-reviewed venues to ensure methodological quality. Although this strengthens reliability, it excludes gray literature such as technical reports, industrial white papers, or blog posts. These sources often contain practical insights into microservice evolution and verification practices, especially in fast-moving DevOps environments, and their absence may skew the review toward academic prototypes rather than industrial realities.
- (iv) Quality Assessment Subjectivity: Our QA checklist provided a structured way to evaluate studies; however, scoring inevitably involves interpretation. For instance, assessing whether a method was “clearly reproducible” or whether evidence was “sufficiently empirical” required judgment calls. Minor reporting issues may have led to lower scores for otherwise relevant studies, potentially affecting the synthesis.
- (v) Generalizability: Many included studies rely on small-scale case studies, illustrative examples, or academic prototypes. Although these provide valuable insights, they may not generalize to large-scale, polyglot, industrial microservice deployments. The lack of industrial-scale validation limits the external validity of our conclusions.

By explicitly acknowledging these threats, we aim to provide transparency and to encourage future work to mitigate them through broader search strategies, the inclusion of industrial evidence, and open data practices.
7. Conclusions
- (i) Formal Methods in Microservices (RQ1): A variety of formal techniques have been applied, including Petri nets for workflow modeling, TLA+ for protocol correctness, SMT solvers for configuration checking, and session types for communication safety. However, most evaluations remain confined to small-scale case studies, with limited evidence of scalability to industrial systems.
- (ii) Impact of Evolution (RQ2): Architectural drift, API versioning, and dependency changes pose significant challenges to verifiability. Few approaches directly address specification drift or the automated mitigation of protocol violations. This gap underscores the need for techniques that can adapt formal models as systems evolve.
- (iii) Signals for Model Updates (RQ3): Repository-derived signals such as commits, API diffs, and dependency graphs are beginning to be leveraged to infer or update formal models. While promising, automation remains immature, and integration with toolchains is limited.
- (iv) Continuous Verification in CI/CD (RQ4): There is growing interest in embedding verification into DevOps pipelines; however, tool support is fragmented and empirical evidence of industrial adoption is scarce. Continuous verification therefore remains more of a vision than a widely realized practice.
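RQ1 and RQ2 cover TLA+-style protocol checking and API evolution that can introduce protocol violations. An explicit-state model checker such as TLC explores every reachable state and reports a counterexample trace when an invariant fails. The Python sketch below illustrates that idea on a hypothetical rolling upgrade of one client and one server from API v1 to v2. It is not TLA+, and the model and its backward-compatibility assumption are ours. Without an ordering constraint, the search finds a trace in which an upgraded client sends a v2 request to a v1 server. Requiring the server to upgrade first removes the violation.

```python
# Toy explicit-state model check of a rolling API upgrade (hypothetical model).
# State: (client_version, server_version, in_flight), where in_flight is the API
# version of an outstanding request, or None.
from collections import deque
from typing import Iterator, Optional

State = tuple[int, int, Optional[int]]
SUPPORTED = {1: {1}, 2: {1, 2}}  # assumption: a v2 server still accepts v1 requests


def successors(state: State, server_first: bool) -> Iterator[State]:
    client, server, in_flight = state
    # Client upgrade; with server_first, only allowed once the server runs v2.
    if client == 1 and (server == 2 or not server_first):
        yield (2, server, in_flight)
    if server == 1:
        yield (client, 2, in_flight)    # server upgrade
    if in_flight is None:
        yield (client, server, client)  # client sends a request
    else:
        yield (client, server, None)    # server consumes the request


def invariant(state: State) -> bool:
    # The outstanding request must use a version the server supports.
    _, server, in_flight = state
    return in_flight is None or in_flight in SUPPORTED[server]


def check(init: State, server_first: bool) -> Optional[list[State]]:
    """Breadth-first search over reachable states; returns a shortest trace
    to an invariant violation, or None if no reachable state violates it."""
    parent: dict[State, Optional[State]] = {init: None}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            cur: Optional[State] = state
            while cur is not None:
                trace.append(cur)
                cur = parent[cur]
            return trace[::-1]
        for nxt in successors(state, server_first):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


if __name__ == "__main__":
    print(check((1, 1, None), server_first=False))
    # [(1, 1, None), (2, 1, None), (2, 1, 2)]
    print(check((1, 1, None), server_first=True))
    # None
```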
- The intrinsic approach integrates formal methods from the outset: models are generated, maintained, and evaluated throughout the development life cycle. This requires strong support from frameworks, tools, and organizational processes, but offers higher fidelity and consistency. Its drawbacks are the high initial investment and the cultural shift required.
- The extrinsic approach recovers models from code repositories, logs, and runtime artifacts after the system has been built. It requires less upfront support but may suffer from model inaccuracies and incomplete coverage of evolving behaviors.
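The extrinsic approach and the specification-drift problem raised under RQ2 can be illustrated by diffing the operations declared in a specification against those recovered from code, for instance via architecture recovery as in Cerny et al. (2024) [6]. The minimal Python sketch below assumes both sides are already available as hypothetical (HTTP method, path template) pairs. In practice they would come from an API description and a static-analysis step, and neither step is shown here.

```python
# Sketch: detect API drift between a specification and operations recovered
# from code. Operations are hypothetical (HTTP method, path template) pairs.
from dataclasses import dataclass

Operation = tuple[str, str]


def normalize(op: Operation) -> Operation:
    # Case-insensitive methods; paths compared with one leading slash
    # and no trailing slash.
    method, path = op
    return method.upper(), "/" + path.strip("/")


@dataclass(frozen=True)
class DriftReport:
    undocumented: frozenset[Operation]   # implemented but missing from the spec
    unimplemented: frozenset[Operation]  # specified but not found in the code

    @property
    def drifted(self) -> bool:
        return bool(self.undocumented or self.unimplemented)


def api_drift(spec_ops: list[Operation], recovered_ops: list[Operation]) -> DriftReport:
    spec = {normalize(op) for op in spec_ops}
    code = {normalize(op) for op in recovered_ops}
    return DriftReport(undocumented=frozenset(code - spec),
                       unimplemented=frozenset(spec - code))


if __name__ == "__main__":
    spec = [("GET", "/orders/{id}"), ("POST", "/orders")]
    recovered = [("get", "orders/{id}/"), ("POST", "/orders"), ("DELETE", "/orders/{id}")]
    report = api_drift(spec, recovered)
    print(report.drifted, sorted(report.undocumented), sorted(report.unimplemented))
    # True [('DELETE', '/orders/{id}')] []
```

A non-empty report is only a signal. Deciding whether the specification or the code is wrong still needs human or model-level review.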
- Empirical grounding via large-scale industrial case studies that validate scalability and practicality.
- Integration with CI/CD pipelines to enable continuous verification as part of everyday DevOps workflows.
- Open datasets and reproducible tooling to foster collaboration, benchmarking, and transparency.
- Automation of model updates using repository and runtime signals to keep specifications aligned with evolving systems.
- Complementary adoption of intrinsic and extrinsic approaches, tailoring verification strategies to organizational maturity, resource availability, and system complexity.
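Integrating verification into CI/CD and automating model updates both depend on deciding what to re-check after a commit. One common incremental strategy is to re-verify the changed services plus every service that transitively depends on them. This set is computed by reverse reachability over a service dependency graph, one of the repository-derived signals listed under RQ3. The Python sketch below implements that strategy under our own assumptions, and the graph and service names are hypothetical.

```python
# Sketch: select services to re-verify after a change, using reverse
# reachability over a hypothetical service dependency graph.
from collections import deque


def affected_services(depends_on: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Changed services plus every service that transitively depends on one."""
    # Invert the edges: map each service to the services that call it.
    dependents: dict[str, set[str]] = {}
    for svc, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(svc)
    affected = set(changed)
    queue = deque(changed)
    while queue:
        svc = queue.popleft()
        for caller in dependents.get(svc, set()):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected


if __name__ == "__main__":
    graph = {
        "gateway": {"orders", "users"},
        "orders": {"payments", "inventory"},
        "payments": set(),
        "inventory": set(),
        "users": set(),
    }
    print(sorted(affected_services(graph, {"payments"})))
    # ['gateway', 'orders', 'payments']
```

The selection is deliberately conservative. It assumes any change may affect callers, so a purely internal change still triggers re-verification of all dependents. A finer policy could use API diffs to skip dependents when the interface is unchanged.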
Funding
Data Availability Statement
Conflicts of Interest
References
1. Anisetti, M.; Ardagna, C.A.; Gaudenzi, F.; Damiani, E. A Continuous Certification Methodology for DevOps. *MEDES ’19: Proceedings of the 11th International Conference on Management of Digital EcoSystems*, 205–212. [CrossRef]
2. Anisetti, M.; Ardagna, C.; Damiani, E.; Polegri, G. Test-Based Security Certification of Composite Services. *ACM Transactions on the Web* **2018**, *13*(1), 1–43. [CrossRef]
3. Biswas, P.; Morgenstern, A.; Antonino, P.O.; Capilla, R.; Nakagawa, E.Y. Continuous Evaluation of Consistency in Software Architecture Models. *ECSA 2023 Proceedings*, 141–149. [CrossRef]
4. Boza, E.F.; Abad, C.L.; Narayanan, S.P.; Balasubramanian, B.; Jang, M. A Case for Performance-Aware Deployment of Containers. *WOC ’19 Proceedings*, 25–30. [CrossRef]
5. Burns, B.; Grant, B.; Oppenheimer, D.; Brewer, E.; Wilkes, J. Borg, Omega, and Kubernetes. *Communications of the ACM* **2016**, *59*(5), 50–57. [CrossRef]
6. Cerny, T.; Abdelfattah, A.S.; Yero, J.; Taibi, D. From Static Code Analysis to Visual Models of Microservice Architecture. *Cluster Computing* **2024**, *27*(4), 4145–4170. [CrossRef]
7. Cerny, T.; Donahoo, M.J.; Trnka, M. Contextual Understanding of Microservice Architecture. *ACM SIGAPP Applied Computing Review* **2018**, *17*(4), 29–45. [CrossRef]
8. Copei, S.; Schreiter, M.; Zündorf, A. Improving the Implementation of Microservice-Based Systems with Static Code Analysis. *Lecture Notes in Business Information Processing* **2023**, *489*, 31–38. [CrossRef]
9. Ferrara, P.; Arceri, V.; Cortesi, A. Challenges of Software Verification. *International Journal on Software Tools for Technology Transfer* **2024**, *26*, 421–430. [CrossRef]
10. Khamespanah, E.; Jaghoori, M.M. 20 Years of Actor Model Checking with Rebeca. *LNCS* **2025**, *15560*, 26–43. [CrossRef]
11. Mampage, A.; Karunasekera, S.; Buyya, R. Resource Management in Serverless Computing. *ACM Computing Surveys* **2022**, *54*(11s), 1–36. [CrossRef]
12. Merlino, G.; Dautov, R.; Distefano, S.; Bruneo, D. Workload Engineering in Edge–Fog–Cloud Computing. *ACM Transactions on Internet Technology* **2019**, *19*(2), 1–22. [CrossRef]
13. Namyar, P.; Ghavidel, A.; Zhang, M.; Madhyastha, H.V.; Ravi, S.; Wang, C.; Govindan, R. ZENITH: Formally Verified Highly Available Control Plane. *SIGCOMM ’25 Proceedings*, 409–433. [CrossRef]
14. Shahin, M.; Zahedi, M.; Babar, M.A.; Zhu, L. Adopting Continuous Delivery and Deployment. *EASE ’17 Proceedings*, 384–393. [CrossRef]
15. Silva, S.; Pelliccione, P.; Bertolino, A. Self-Adaptive Testing in the Field. *ACM Transactions on Autonomous and Adaptive Systems* **2024**, *19*(1), 1–37. [CrossRef]
16. Soares, R.C.; Capilla, R.; dos Santos, V.; Nakagawa, E.Y. Trends in Continuous Evaluation of Software Architecture. *Computing* **2023**, *105*(9), 1957–1980. [CrossRef]
17. Stillwell, M.; Coutinho, J.G. A DevOps Approach to Integration of Software Components. *QUDOS 2015 Proceedings*, 1–6. [CrossRef]
18. Lamport, L. The PlusCal Algorithm Language. *ICTAC 2009 Proceedings*, 36–60. [CrossRef]
19. Camilli, M. Continuous Formal Verification of Microservice-Based Process Flows. *ECSA 2020 Proceedings*, 420–435. [CrossRef]
20. Netflix Conductor Documentation. Available online: https://github.com/Netflix/conductor (accessed on 12 December 2025).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).