Submitted: 23 December 2025
Posted: 23 December 2025
Abstract

Keywords:
1. Introduction
- proposes a taxonomy of failure modes and “trust ruptures” specific to SOC deployments;
- introduces an interaction model that maps uncertainty + provenance + explanation to analyst actions (triage, investigation, response, and post-incident learning);
- provides a metric set and an evaluation blueprint that can serve as a design and documentation scaffold before full experiments are run.
2. Background: Trust, Calibration, and Automation Bias
2.1. Trust as a Calibrated, Task-Specific Relationship
2.2. Why SOC Conditions Amplify Miscalibration
- Alert overload encourages shortcut decisions and deference to scores;
- Asymmetric costs (false negatives can be catastrophic; false positives erode attention);
- Adversarial pressure enables evasion and poisoning attempts;
- Non-stationarity causes drift in user behavior, infrastructure, and attacker tactics [5];
- Tool heterogeneity means analysts fuse evidence from SIEM, EDR, NDR, IAM logs, and threat intel feeds, each with different quality.
2.3. Responsible AI Expectations in Security Tooling
3. Problem Framing: Trust Failures in SOC AI
4. Threat Model: Adversaries Targeting Analyst–AI Trust
4.1. Adversary Goals
- Evasion: perform malicious actions while keeping AI scores low and explanations benign.
- Disruption: create alert storms and false positives to exhaust analysts and reduce trust.
- Trust hijacking: induce analysts to over-rely on AI recommendations (or to ignore them) at critical moments.
4.2. Attack Surfaces
- Telemetry manipulation: log tampering, time skew, or selective suppression of events.
- Feature and pipeline fragility: exploiting brittle proxies the model learned (environment-specific shortcuts).
- Poisoning and feedback loops: influencing labels through staged “benign” outcomes or noisy tickets.
- Explanation gaming: triggering feature patterns that yield comforting explanations even when behavior is malicious.
| Strategy | How it breaks trust | Defensive control |
|---|---|---|
| Alert flooding | Analysts learn to ignore alerts; disuse becomes rational | Rate-limit, deduplicate, and surface “new pattern” alerts separately |
| Low-and-slow evasion | Model sees weak signals; confidence appears high due to proxy cues | Uncertainty triggers, abstention, and cross-source corroboration |
| Telemetry gaps | False negatives; analysts assume “no news is good news” | Integrity checks, missing-data flags, fallback rules |
| Poisoned feedback | Retraining reinforces attacker-shaped labels | Label provenance, gated updates, anomaly checks on training data |
| Explanation mimicry | Explanations appear benign even for malicious activity | Adversarial testing of explanations; multi-view evidence packages |
5. Analyst–AI Trust Interaction Model

5.1. Evidence Package: What the Analyst Needs at the Moment of Decision
- Local explanation: which signals/features drove the score (appropriate abstraction level);
- Uncertainty: confidence intervals, calibration cues, or “insufficient evidence” flags;
- Provenance: data sources, enrichment lineage, model version, and time window;
- Counter-evidence prompts: checks the analyst should perform (“verify process tree”, “confirm sign-in location”);
- Action safety: recommended next steps with blast-radius hints (e.g., isolate host vs. observe).
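To make this concrete, the evidence package can be carried as a small structured record alongside each alert. The sketch below is a minimal, assumed schema; field names such as `counter_evidence_prompts` and `blast_radius_hint` are illustrative, not a prescribed standard.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EvidencePackage:
    """Illustrative record of what an analyst sees at the moment of decision."""
    alert_id: str
    score: float                              # calibrated model score in [0, 1]
    top_drivers: List[str]                    # local explanation: signals that drove the score
    uncertainty_flags: List[str]              # e.g., "data missing", "out-of-distribution"
    provenance: dict                          # sources, enrichment lineage, model version, time window
    counter_evidence_prompts: List[str]       # checks the analyst should perform
    recommended_action: Optional[str] = None  # suggested next step, if any
    blast_radius_hint: Optional[str] = None   # e.g., "isolate host (disruptive)" vs. "observe (safe)"
```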
5.2. Guardrails: Preventing Blind Automation
- require explicit analyst confirmation for high-impact actions;
- implement rate limits on automated containment to avoid cascading outages;
- ensure fallback modes (rule-based or conservative thresholds) during telemetry outages;
- record audit trails for every automated suggestion and analyst override.
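A minimal sketch of how some of these guardrails might be enforced, assuming hypothetical action names and a five-per-hour ceiling on automated containment; a real deployment would wire equivalent checks into the SOAR layer rather than module-level state.

```python
import time
from collections import deque

HIGH_IMPACT = {"isolate_host", "disable_account", "block_subnet"}   # hypothetical action names
CONTAINMENT = HIGH_IMPACT | {"quarantine_file", "kill_process"}
_auto_containments = deque()   # timestamps of recent automated containment actions
AUDIT_LOG = []                 # one entry per suggestion/decision, including overrides

def allow_action(action: str, analyst_confirmed: bool, max_auto_per_hour: int = 5) -> bool:
    """Apply the guardrails above: confirmation, containment rate limits, audit trail."""
    now = time.time()
    allowed = True
    # 1) High-impact actions always require explicit analyst confirmation.
    if action in HIGH_IMPACT and not analyst_confirmed:
        allowed = False
    # 2) Rate-limit automated (unconfirmed) containment to avoid cascading outages.
    while _auto_containments and now - _auto_containments[0] > 3600:
        _auto_containments.popleft()
    if allowed and action in CONTAINMENT and not analyst_confirmed:
        if len(_auto_containments) >= max_auto_per_hour:
            allowed = False
        else:
            _auto_containments.append(now)
    # 3) Record an audit entry for every decision.
    AUDIT_LOG.append({"time": now, "action": action,
                      "confirmed": analyst_confirmed, "allowed": allowed})
    return allowed
```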
6. SOC Workflow Mapping: Where Trust Decisions Actually Happen
6.1. Triage: Fast Ranking Under Uncertainty
6.2. Investigation: Causal Reconstruction and Hypothesis Testing
6.3. Response: High-Impact Actions with Blast Radius
6.4. Post-Incident Learning: Closing the Loop Without Contaminating Labels
| Stage | Minimum evidence package | Automation boundary |
|---|---|---|
| Triage | Top drivers, quick falsification check, uncertainty flag, source list | Assist/recommend; avoid auto-close unless confidence is high |
| Investigation | Provenance, timeline, correlated entities, explanation at multiple depths | Assist; semi-automated enrichment and graphing are safe |
| Response | Impact estimate, blast radius, alternative actions, required confirmations | Automate only low-risk actions; approvals for high-impact steps |
| Learning | Outcome summary, label confidence, drift notes, model/version references | No autonomous retraining; gated updates and audits |
7. Trust Calibration Metrics (Beyond Accuracy)
- Calibration error: do predicted probabilities match empirical outcomes? (e.g., ECE) [7];
- Selective prediction utility: performance when the model is allowed to abstain under uncertainty;
- Evidence adequacy: analyst-rated usefulness of explanations for the specific decision (triage vs. response);
- Override rate and rationale: frequency and reasons analysts reject recommendations (signal of misfit);
- Time-to-decision and error cost: speed improvements without increasing harmful errors.
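Two of these metrics are easy to prototype before any experiment is run. The sketch below computes a binned expected calibration error in the "predicted probability vs. observed outcome" sense used here, plus a raw override rate; inputs are assumed to be aligned arrays of scores, outcomes, recommendations, and analyst actions.

```python
import numpy as np

def expected_calibration_error(probs, outcomes, n_bins: int = 10) -> float:
    """Binned ECE: weighted mean gap between predicted probability and observed positive rate."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)   # 1 if the alert was truly malicious, else 0
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(probs, edges[1:-1])      # bin index per prediction, 0 .. n_bins-1
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        predicted = probs[mask].mean()             # mean predicted probability in the bin
        observed = outcomes[mask].mean()           # empirical malicious rate in the bin
        ece += (mask.sum() / len(probs)) * abs(predicted - observed)
    return ece

def override_rate(ai_recommendations, analyst_actions) -> float:
    """Fraction of AI recommendations the analyst did not follow (a misfit signal, not an error rate)."""
    pairs = list(zip(ai_recommendations, analyst_actions))
    return sum(r != a for r, a in pairs) / max(len(pairs), 1)
```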
8. Design Principles for SOC-Ready Trustworthy AI
8.1. Principle 1: Make Uncertainty Explicit and Actionable
- confidence bands or calibrated probabilities;
- uncertainty categories (“data missing”, “out-of-distribution”, “conflicting signals”);
- abstention pathways (route to manual triage or request more telemetry).
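A minimal routing sketch for this principle, with illustrative thresholds and an assumed out-of-distribution score supplied by a separate detector:

```python
def route_alert(score: float, telemetry_complete: bool, ood_score: float) -> str:
    """Map a calibrated score plus uncertainty categories to a handling path.

    Thresholds (0.8 OOD cut-off, 0.4-0.6 "conflicting signals" band) are illustrative only.
    """
    if not telemetry_complete:
        return "abstain: data missing -> request more telemetry"
    if ood_score > 0.8:
        return "abstain: out-of-distribution -> route to manual triage"
    if 0.4 <= score <= 0.6:
        return "abstain: conflicting or weak signals -> manual triage"
    return "proceed: calibrated probability is usable for triage ranking"
```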
8.2. Principle 2: Match Explanation to the Analyst Task
8.3. Principle 3: Engineer for Drift and Operational ML Risk
- drift monitors on input distributions and alert rates;
- canary deployments and rollback plans;
- model cards / change logs accessible to analysts;
- periodic red-team evaluation for evasion resilience.
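As an example of the first item, drift monitors can start as simple statistical checks. The sketch below applies a two-sample Kolmogorov–Smirnov test to one feature and a relative-change rule to the alert rate; the significance level and tolerance are assumptions that would need tuning per environment.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drift(reference: np.ndarray, current: np.ndarray, alpha: float = 0.01) -> bool:
    """Flag drift on one feature when a two-sample KS test rejects 'same distribution'."""
    _stat, p_value = ks_2samp(reference, current)
    return p_value < alpha

def alert_rate_drift(baseline_rate: float, current_rate: float, tolerance: float = 0.5) -> bool:
    """Flag drift when the alert rate moves more than `tolerance` (here 50%) from baseline."""
    if baseline_rate == 0:
        return current_rate > 0
    return abs(current_rate - baseline_rate) / baseline_rate > tolerance
```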
8.4. Principle 4: Encode Safe Automation Boundaries
- Tier 0 (assist): summarize evidence and propose queries;
- Tier 1 (recommend): suggest actions with impact estimates;
- Tier 2 (automate): only for low-risk actions with tight constraints.
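One possible encoding of these tiers as a policy function; the risk labels and confidence thresholds are assumed values rather than recommendations.

```python
def automation_tier(action_risk: str, confidence: float, uncertainty_flags: list) -> int:
    """Return the maximum allowed tier: 0 = assist, 1 = recommend, 2 = automate."""
    if uncertainty_flags:                    # any open uncertainty -> assist only
        return 0
    if action_risk == "low" and confidence >= 0.95:
        return 2                             # automate only low-risk, high-confidence actions
    if confidence >= 0.70:
        return 1                             # recommend, with an impact estimate attached
    return 0                                 # summarize evidence and propose queries
```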
9. Blueprint for an Evaluation (Without Running Experiments Yet)
- Operational simulation: replay real SOC log sequences with injected drift and telemetry faults to test robustness and guardrails;
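One way to prototype this simulation, sketched below: a replay harness that injects event drops and timestamp skew into a recorded log sequence. It assumes events are dicts with epoch-second timestamps and that `score_event` is the model interface under test; both names are placeholders.

```python
import random

def replay_with_faults(events, score_event, drop_prob=0.05, skew_seconds=300, seed=7):
    """Replay a log sequence while injecting telemetry gaps and clock skew.

    Returns the scores produced under degraded telemetry plus a fault log, so that
    guardrail and abstention behaviour can be inspected offline.
    """
    rng = random.Random(seed)
    scores, faults = [], []
    for event in events:
        if rng.random() < drop_prob:              # simulate selective suppression of events
            faults.append(("dropped", event.get("timestamp")))
            continue
        event = dict(event)
        if rng.random() < drop_prob:              # simulate time skew on a subset of events
            event["timestamp"] = event["timestamp"] + rng.randint(-skew_seconds, skew_seconds)
            faults.append(("skewed", event["timestamp"]))
        scores.append(score_event(event))
    return scores, faults
```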
10. Worked Examples (Conceptual)
10.1. Phishing-to-Account-Takeover
- show provenance (email gateway verdict + identity logs + device posture);
- provide uncertainty flags if mailbox telemetry is incomplete or delayed;
- explain top signals (unusual sender infrastructure, new device, abnormal token scope);
- recommend safe actions (password reset, token revocation, conditional access) with explicit confirmation.
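Rendered as the evidence package from Section 5.1, this scenario might look like the following; every value is hypothetical and only meant to make the fields concrete.

```python
# Illustrative evidence package for the phishing-to-account-takeover scenario.
ato_alert = {
    "alert_id": "ATO-2025-0142",
    "score": 0.87,
    "top_drivers": ["unusual sender infrastructure", "new device enrollment",
                    "abnormal OAuth token scope"],
    "uncertainty_flags": ["mailbox telemetry delayed (15 min)"],
    "provenance": {"sources": ["email gateway", "identity logs", "device posture"],
                   "model_version": "ato-detector-1.3", "window": "last 24h"},
    "counter_evidence_prompts": ["confirm sign-in location with the user",
                                 "check whether the device is corporate-managed"],
    "recommended_action": "revoke tokens and force password reset (requires confirmation)",
    "blast_radius_hint": "low: affects one account, no host isolation",
}
```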
10.2. Endpoint Anomaly During Patch Windows
10.3. Insider-like Behavior with Ambiguous Intent
11. Governance and Documentation
- model purpose, limitations, and known failure modes;
- data lineage and retention constraints;
- incident playbooks describing when to override AI;
- audit policy for automated actions.
12. Related Work and Positioning
13. Trust Repair: What to Do After the Model Is Wrong
- Detect the rupture: spikes in override rate, sudden alert distribution changes, or analyst feedback indicating “nonsense” explanations.
- Triage the root cause: separate data-quality breaks from model drift and from attacker evasion. Provenance metadata accelerates this.
- Mitigate safely: tighten automation boundaries, enable conservative thresholds, or fall back to rules while maintaining visibility.
- Communicate clearly: publish a short internal incident note describing what happened, scope, and temporary guidance for analysts.
- Verify the fix: run a regression suite, replay recent traffic, and validate calibration before restoring automation.
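The first step, rupture detection, can begin with something as simple as a rolling override-rate monitor; the baseline rate and spike factor below are assumptions to be replaced with per-SOC values.

```python
from collections import deque

class OverrideSpikeDetector:
    """Flag a potential trust rupture when the recent override rate jumps well above baseline."""

    def __init__(self, window: int = 200, baseline_rate: float = 0.10, factor: float = 2.0):
        self.window = deque(maxlen=window)   # most recent triage decisions
        self.baseline_rate = baseline_rate   # assumed long-run override rate
        self.factor = factor                 # how large a jump counts as a spike

    def record(self, overridden: bool) -> bool:
        """Record one decision; return True once the rolling rate exceeds factor x baseline."""
        self.window.append(1 if overridden else 0)
        if len(self.window) < self.window.maxlen:
            return False                     # not enough data yet
        rate = sum(self.window) / len(self.window)
        return rate > self.factor * self.baseline_rate
```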

14. Practical Implementation Checklist (SOC-Ready)
14.1. Interface and Analyst Experience
- Provide a one-line “why” plus a one-line “what to check next” for every high-severity alert.
- Expose uncertainty categories (missing telemetry, out-of-distribution, conflicting signals) instead of only a confidence score.
- Show model/version and feature availability so analysts can reason about what the system could have seen.
- Record analyst overrides with short reason codes to create a measurable feedback channel.
14.2. Engineering and Monitoring
- Track input distribution drift and alert-rate drift; alert on sudden shifts.
- Maintain a regression suite of representative incidents and benign activity; replay before every model update.
- Implement canary releases and rollbacks for model updates, the same way you would for high-risk code.
- Maintain secure logging and integrity checks to reduce the chance of telemetry manipulation.
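The regression-suite and canary items can be combined into a simple update gate. The sketch below compares a candidate model against production on replayed incidents and benign activity; the decision threshold and tolerances are illustrative assumptions.

```python
def gate_model_update(candidate_scores, production_scores, incident_labels,
                      max_missed_incidents: int = 0, max_fp_increase: float = 0.05) -> bool:
    """Gate a model update on a regression-suite replay.

    Inputs are aligned lists over the replayed events; labels are 1 for known
    incidents and 0 for benign activity. Reject the update if the candidate
    misses incidents the production model caught, or raises the false-positive
    rate by more than `max_fp_increase`.
    """
    threshold = 0.5
    missed = sum(1 for c, p, y in zip(candidate_scores, production_scores, incident_labels)
                 if y == 1 and p >= threshold and c < threshold)
    fp_prod = sum(1 for p, y in zip(production_scores, incident_labels) if y == 0 and p >= threshold)
    fp_cand = sum(1 for c, y in zip(candidate_scores, incident_labels) if y == 0 and c >= threshold)
    benign = max(sum(1 for y in incident_labels if y == 0), 1)
    fp_increase = (fp_cand - fp_prod) / benign
    return missed <= max_missed_incidents and fp_increase <= max_fp_increase
```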
14.3. Governance and Documentation
- Publish a “model card” for SOC users: purpose, limitations, failure modes, and safe usage guidance.
- Define which actions can be automated at each confidence/uncertainty level and enforce this in tooling.
- Ensure auditability: who saw what, what the model recommended, what the analyst did, and why.
15. Scope for Future Empirical Work
16. Limitations
17. Conclusion
References
1. Parasuraman, R.; Riley, V. Humans and automation: Use, misuse, disuse, abuse. Human Factors 1997, 39, 230–253.
2. Mosier, K.L.; Skitka, L.J. Human decision makers and automated decision aids: Made for each other? In Automation and Human Performance: Theory and Applications; 1996.
3. Floridi, L.; et al. AI4People—An ethical framework for a good AI society. Minds and Machines 2018, 28, 689–707.
4. European Commission High-Level Expert Group on AI. Ethics Guidelines for Trustworthy AI; 2019. Available online: https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai.
5. Sculley, D.; Holt, G.; Golovin, D.; et al. Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems (NeurIPS), 2015. https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.
6. Doshi-Velez, F.; Kim, B. Towards a rigorous science of interpretable machine learning. arXiv 2017, arXiv:1702.08608. https://arxiv.org/abs/1702.08608.
7. Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K.Q. On calibration of modern neural networks. In Proceedings of the International Conference on Machine Learning (ICML), 2017. https://arxiv.org/abs/1706.04599.
8. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), 2009.
9. Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems. In Proceedings of the Military Communications and Information Systems Conference (MilCIS), 2015.
10. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proceedings of the International Conference on Information Systems Security and Privacy (ICISSP), 2018.
| Category | What it looks like | Trust impact / typical consequence |
|---|---|---|
| Data quality break | Missing fields, time skew, duplicated events, inconsistent enrichment | False spikes/drops; analysts lose faith in the system; silent false negatives |
| Distribution shift | New software rollouts, IAM policy changes, remote work patterns | Score meaning changes; explanation becomes misleading; increased false positives |
| Adversarial evasion | Living-off-the-land, benign-appearing behavior, log manipulation | Over-trust leads to missed intrusion; model confidence becomes uninformative |
| Label leakage / proxy learning | Model learns environment-specific shortcuts (e.g., “admin hosts = malicious”) | High offline accuracy but poor generalization; brittle trust |
| Pipeline / deployment mismatch | Training features differ from production features; version skew | Unexpected behavior; hard-to-debug incidents; trust collapse after surprises |
| Automation bias | Analyst defers to AI recommendation despite contrary evidence | Premature containment/escalation; reduced investigative rigor [2] |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
