Preprint
Article

This version is not peer-reviewed.

Phenotypic Diversity in Multiple Sclerosis Can Be Represented by Four Additive Symptom Modules

Submitted: 12 June 2026

Posted: 12 June 2026

Abstract
Background: Multiple sclerosis (MS) lacks a single invariant phenotypic core. Patients accumulate heterogeneous combinations of sensory, motor, cognitive, and autonomic impairments over time, reflecting lesions disseminated in time and space.
Methods: We analyzed 4,617 de-identified neurology progress notes from 578 patients with MS at a single academic medical center. A large language model (GPT-5.2) classified each note against 17 non-mutually exclusive neurological phenotype features, and note-level features were aggregated into patient-level binary phenotype vectors. Non-negative matrix factorization (NMF) was applied to generate 2-, 3-, 4-, and 5-module solutions. For each rank, we calculated relative reconstruction error and module-level feature loadings. In the preferred 4-module solution, we derived patient-level module percentages, identified highly dominant (greater than 55%), near-pure (greater than 70%), and pure single-module profiles, and quantified admixture using Shannon entropy and the effective number of modules.
Results: The 4-module solution was the most clinically interpretable. The four latent modules were sensory-visual-pain, ataxic-spastic-falls, cognitive-psychologic-fatigue, and autonomic-bladder-bowel, aligning closely with established functional systems in MS. By module dominance, 244 patients were classified as sensory-visual-pain dominant, 128 as ataxic-spastic-falls dominant, 138 as autonomic-bladder-bowel dominant, and 68 as cognitive-psychologic-fatigue dominant. Most patients exhibited admixed phenotypes, with the effective number of modules ranging from approximately 1 to 4. Using pre-specified thresholds, 154 patients (26.6%) were highly dominant in a single module, 72 (12.5%) were near-pure, and 7 had pure single-module profiles. Purer phenotypes were predominantly sensory-visual-pain dominant.
Conclusions: MS phenotypic diversity in routine clinical practice can be parsimoniously represented as mixtures of four latent symptom modules rather than as positions along a single severity axis. Most patients show substantial admixture of sensory, motor, cognitive, and autonomic involvement, but a minority exhibit relatively pure or strongly dominant module patterns. This modular representation provides an interpretable framework for quantifying MS phenotype and for generating testable hypotheses about biologically meaningful MS subtypes.
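The admixture measures named in the Methods can be illustrated with a short sketch. It is not the authors' code. It assumes that module percentages are a patient's non-negative module weights normalized to sum to 1, that Shannon entropy uses the natural logarithm, and that the effective number of modules is the exponential of that entropy, which gives 1 for a single-module profile and 4 for an even four-module mixture. The function names are hypothetical, the 55% and 70% thresholds come from the abstract, and the definition of "pure" as all weight in one module is an assumption.

```python
import math

def module_percentages(weights):
    """Normalize non-negative module weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one module weight must be positive")
    return [w / total for w in weights]

def shannon_entropy(p):
    """Shannon entropy in nats; zero-weight modules contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

def effective_number_of_modules(p):
    """exp(entropy): 1 for a single module, len(p) for an even mixture (assumed definition)."""
    return math.exp(shannon_entropy(p))

def dominance_label(p, high=0.55, near_pure=0.70):
    """Return the strongest label reached by the largest module share.

    Thresholds are those quoted in the abstract. Treating 'pure' as 100%
    in one module is an assumption made for this sketch.
    """
    top = max(p)
    if top >= 1.0 - 1e-9:
        return "pure"
    if top > near_pure:
        return "near-pure"
    if top > high:
        return "highly dominant"
    return "admixed"

# Hypothetical patient with raw module weights for the four modules.
shares = module_percentages([3.0, 1.0, 0.0, 0.0])
label = dominance_label(shares)
n_eff = effective_number_of_modules(shares)
```

In this sketch each patient receives only the strongest label reached, so the categories are mutually exclusive. The abstract does not say whether its near-pure and pure counts are subsets of the highly dominant group.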
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated