Preprint Article

This version is not peer-reviewed.

The Context Sensitivity Paradox: How Stakeholder Framing Shapes Moral Judgment in Humans and AI

Submitted: 12 February 2026

Posted: 24 February 2026

Abstract
Do humans and artificial intelligence systems apply consistent ethical frameworks when making organizational decisions, or do morally contested contextual features systematically influence moral judgments? We conducted a mixed-methods experiment with 300 organizational leaders and three frontier AI models (GPT-4, Claude 3 Opus, Gemini Pro 1.5) responding to 240 systematically varied ethical scenarios. We collected 6,000 human responses (300 participants × 20 scenarios each) and generated 7,200 AI responses at temperature 0.7 (240 scenarios × 3 models × 10 repetitions), with additional sensitivity analyses at temperatures 0.3, 0.5, and 1.0.Analysis revealed substantial systematic variation in moral judgments: both humans and AI systems (at temperature=0.7, selected post-hoc to match human total variation levels) made different recommendations for structurally identical dilemmas based on how stakeholders were described (identifiable individuals vs. statistical aggregates: OR=2.08, p<.001), whether actions were framed as active causation versus passive allowance (d=0.63, p<.001), and temporal proximity of consequences (OR=1.52, p<.001).We decomposed this variation into three components: (1) structural consistency (agreement when only irrelevant features vary: M=0.85), (2) contextual responsiveness (variation attributable to debatable features: 22-24% of total variance), and (3) arbitrary residual variation (32-34% of variance). AI temperature parameter directly controls variation magnitude: at T=0.3, AI showed less variation than humans (0.26 vs. 0.42, p<.001); at T=0.7 (typical deployment setting), AI approximated human levels (0.41 vs. 0.42, p=.56); at T=1.0, AI exceeded human variation (0.49 vs. 0.42, p<.001) but with degraded coherence (91.6% vs. 97.2% at T=0.7). This temperature-dependence means we cannot claim AI inherently exhibits human-like variation; rather, temperature embeds implicit assumptions about desired reasoning patterns. 
Critically, contextual feature effects (identifiability, relational, temporal, action-omission) remained significant across all temperatures (all η²p > 0.08, p<.001), indicating robust patterns independent of overall variation levels.Humans exhibited more relational reasoning than AI (d=0.56, p<.001). Mediation analysis on a coded subsample of 800 responses revealed that relational reasoning partially explained contextual responsiveness differences (indirect effect β=0.043, 95% CI [0.028, 0.061]), accounting for 69% of human-AI differences. Critically, relational reasoning was associated with higher contextual responsiveness but lower arbitrary variation, suggesting systematic sensitivity rather than random inconsistency.Whether contextual responsiveness represents cognitive bias or appropriate moral sensitivity remains philosophically contested. Principlism interprets our findings as evidence of widespread moral reasoning failures; particularism interprets identical patterns as appropriate attention to morally relevant contextual details. Our data provide empirical constraints for this normative debate but cannot adjudicate it.
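The abstract's three-way variance decomposition (structural, contextual, residual) can be sketched on simulated data. This is purely illustrative and not the authors' analysis code: the grouping structure (20 dilemma cores × 4 contextual framings × 10 repeated responses) and all effect sizes below are invented assumptions chosen only to show the arithmetic of the decomposition.

```python
# Illustrative sketch (hypothetical data, not the study's code): decompose
# total response variance into a structural (between-dilemma) component,
# a contextual (between-framing, within-dilemma) component, and an
# arbitrary residual, mirroring the decomposition described in the abstract.
import random

random.seed(0)

data = []
for core in range(20):                 # structurally distinct dilemmas
    base = random.random()             # dilemma-level judgment (0-1 scale)
    for frame in range(4):             # contextual framings per dilemma
        shift = random.gauss(0, 0.15)  # contextual responsiveness
        for rep in range(10):          # repeated responses per cell
            noise = random.gauss(0, 0.2)  # arbitrary variation
            data.append((core, frame, base + shift + noise))

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

total = variance([v for _, _, v in data])

# Structural component: variance between dilemma-level means.
core_groups = {}
for c, f, v in data:
    core_groups.setdefault(c, []).append(v)
between_core = variance([sum(vs) / len(vs) for vs in core_groups.values()])

# Contextual component: extra variance between framing cells within dilemmas.
cell_groups = {}
for c, f, v in data:
    cell_groups.setdefault((c, f), []).append(v)
between_cell = variance([sum(vs) / len(vs) for vs in cell_groups.values()])
contextual = between_cell - between_core

# Residual: what neither dilemma identity nor framing explains.
residual = total - between_cell

print(f"structural: {between_core / total:.0%}, "
      f"contextual: {contextual / total:.0%}, "
      f"residual: {residual / total:.0%}")
```

Because all groups are equally sized, the three components sum exactly to the total variance by the law of total variance; the reported 22-24% contextual and 32-34% residual shares correspond to the `contextual / total` and `residual / total` ratios in this sketch.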
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
