LLM-Driven Multi-Modal Biological Spectrum Analysis via Contribution-Aware Dynamic Fusion and Flow-Based Distillation

Kentaro Yamada; Nicholas Campbell

doi:10.20944/preprints202604.2069.v1

Submitted: 28 April 2026

Posted: 30 April 2026

Abstract

The integration of mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy holds great promise for comprehensive biomolecular profiling, yet existing computational approaches are limited to single-modality analysis, employ static fusion strategies, and incur prohibitive inference costs. We propose SpectraLLM, a unified large language model (LLM)-driven framework for multi-modal biological spectrum analysis. SpectraLLM introduces four components: a modality-agnostic spectral encoder that projects both MS and NMR spectra into a shared token space; a contribution-aware dynamic multi-modal balance mechanism that adaptively weights each modality per sample; a flow-based knowledge distillation strategy that compresses the teacher model into a compact student with 4.3× lower latency; and parameter-efficient transfer learning via lightweight adapters for rapid domain adaptation. Evaluated on three large-scale benchmarks (MetaboSpectrum-10K, ProteinSpectra-5K, and CellNMR-3K), SpectraLLM achieves state-of-the-art performance, including an AUC of 0.947 for biomarker identification and retention of 96.1% of teacher performance after distillation. In a clinical case study on early-stage pancreatic cancer detection, SpectraLLM achieves an AUC of 0.961, substantially outperforming both the clinical standard CA 19-9 and existing computational methods. These results demonstrate the potential of LLM-driven multi-modal spectral analysis for precision medicine.
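To make the idea of per-sample modality weighting concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that each modality has already been encoded into a vector of the same length and that a per-sample contribution score is available for each modality; the function names (`softmax`, `fuse_modalities`) and the score inputs are hypothetical and do not come from the paper. The scores are normalized with a softmax, and the fused representation is the weighted sum of the two embeddings.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(
    ms_emb: Sequence[float],
    nmr_emb: Sequence[float],
    ms_score: float,
    nmr_score: float,
) -> Tuple[List[float], Tuple[float, float]]:
    """Fuse two same-length embeddings with per-sample softmax weights.

    ms_score and nmr_score are hypothetical contribution scores for one
    sample (for example, outputs of a small gating network). They are an
    assumption of this sketch, not a quantity defined in the paper.
    """
    if len(ms_emb) != len(nmr_emb):
        raise ValueError("embeddings must have the same length")
    w_ms, w_nmr = softmax([ms_score, nmr_score])
    fused = [w_ms * a + w_nmr * b for a, b in zip(ms_emb, nmr_emb)]
    return fused, (w_ms, w_nmr)


if __name__ == "__main__":
    # Toy sample in which the MS modality is scored as more informative.
    fused, weights = fuse_modalities([1.0, 0.0], [0.0, 1.0], 2.0, 0.0)
    print("weights:", weights)
    print("fused:", fused)
```

In contrast to a static fusion rule, which would apply the same pair of weights to every sample, the weights here are recomputed from the scores of each sample, so a noisy or uninformative spectrum in one modality can be down-weighted for that sample only.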

Keywords: mass spectrometry; NMR spectroscopy; multi-modal fusion; knowledge distillation; biomarker discovery

Subject: Biology and Life Sciences - Biology and Biotechnology

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
