Large language models (LLMs) are increasingly explored for clinical documentation support, yet the influence of prompting architecture on documentation quality in complex longitudinal contexts remains poorly characterized. This controlled retrospective methodological study evaluated three prompting strategies—Single Prompt (SP), Section-Based Prompt (SBP), and Section-Based Prompt with Writing Refinement (SBP+W)—for generating inpatient rehabilitation discharge reports using an OpenAI large language model (GPT-5.2).
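For orientation, the sketch below illustrates how the three prompting strategies differ in structure. It is a minimal illustration under assumptions: the prompts, section headings, model identifier, and function names are hypothetical and do not reproduce the study's actual materials.

```python
# Illustrative sketch only: prompts, section headings, and model identifier
# are assumptions, not the study's published prompt materials.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-5.2"  # assumed identifier for the GPT-5.2 model used in the study

SECTIONS = ["Admission status", "Functional course", "Therapies", "Discharge recommendations"]


def single_prompt(case_text: str) -> str:
    """SP: generate the entire discharge report in one request."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": f"Write a rehabilitation discharge report for:\n{case_text}"}],
    )
    return resp.choices[0].message.content


def section_based_prompt(case_text: str, refine: bool = False) -> str:
    """SBP: one request per report section; SBP+W adds a final writing-refinement pass."""
    parts = []
    for section in SECTIONS:
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user",
                       "content": f"Write only the '{section}' section of a rehabilitation "
                                  f"discharge report for:\n{case_text}"}],
        )
        parts.append(resp.choices[0].message.content)
    draft = "\n\n".join(parts)
    if refine:
        # Writing-refinement pass (SBP+W): polish style without altering clinical content
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user",
                       "content": f"Improve readability and flow without changing clinical content:\n{draft}"}],
        )
        draft = resp.choices[0].message.content
    return draft
```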
Twenty anonymized rehabilitation cases involving prolonged hospital stays and multidimensional functional documentation were processed under standardized model conditions. AI-generated reports were compared with human-authored summaries. Two blinded board-certified rehabilitation physicians independently evaluated outputs using a structured 4-point ordinal scale assessing structural integrity, clinical coherence, completeness, and readability. Inter-rater reliability was estimated with quadratic weighted Cohen’s kappa and bootstrap confidence intervals. Group differences were analyzed using non-parametric testing and exploratory multivariable modeling.
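A minimal sketch of the reliability estimate is shown below, assuming the two raters' 4-point ordinal scores are available as parallel arrays; the scores shown are fabricated placeholders, not study data.

```python
# Quadratic weighted Cohen's kappa with a percentile bootstrap CI (sketch).
import numpy as np
from sklearn.metrics import cohen_kappa_score

rater_a = np.array([4, 3, 4, 2, 3, 4, 3, 4, 2, 4])  # hypothetical scores
rater_b = np.array([4, 3, 3, 2, 3, 4, 4, 4, 2, 3])

# Quadratic weights penalize larger disagreements on the ordinal scale more heavily
kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")

# Percentile bootstrap over rated cases for a 95% confidence interval
rng = np.random.default_rng(0)
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(rater_a), len(rater_a))
    boot.append(cohen_kappa_score(rater_a[idx], rater_b[idx], weights="quadratic"))
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"kappa = {kappa:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```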
All LLM prompting strategies achieved significantly higher expert-rated quality scores than human-authored reports (p < 0.01). SBP demonstrated the highest median performance and strongest regression effect, although differences among LLM-based strategies were not statistically significant after correction. Prompting strategy explained more variability in expert ratings than case-level factors.
Structured section-based prompting may represent a practical design lever for improving perceived quality in AI-assisted clinical documentation workflows.
Keywords: artificial intelligence; clinical documentation; discharge reports; large language models; medical writing; prompt architecture; prompt engineering; rehabilitation medicine.