Recent work on neural scaling demonstrates consistent performance gains with increased data and model capacity, yet these improvements are typically assessed using surface-level metrics that do not capture factual reliability. In multi-document summarization (MDS), this limitation is particularly acute, as scaling has been shown to amplify hallucination and content distortion. In this paper, we investigate the empirical scaling behavior of faithfulness-aware transformers under tightly controlled conditions, using LSHT as a fixed architectural and training baseline. Rather than proposing new scaling laws, we analyze how summarization quality, faithfulness and efficiency evolve as dataset size and model capacity are independently increased, while holding architecture, optimization, decoding and hardware constant. All experiments are conducted exclusively on the Multi-News benchmark to avoid cross-dataset confounds. Across ROUGE, coverage, repetition and faithfulness-oriented metrics, we show that lexical overlap and factual consistency follow distinct scaling dynamics. Faithfulness improves most rapidly during early data scaling (approximately 3–4% relative gain from 3k to 12k samples) but exhibits diminishing marginal returns at larger scales, whereas ROUGE continues to increase more smoothly. We further show that faithfulness is more sensitive to data diversity than to volume alone, and we identify practical scaling regimes that maximize faithfulness gains relative to computational cost. These results establish empirical expectations for scaling faithfulness-aware MDS systems and provide actionable guidance for reliable summarization under realistic resource constraints.