This paper presents a Load-Dependent Multimodal Vibration Signal Enhancement and Fusion Framework (LD-MVSEFF) for load-specific condition monitoring, building on the Customised Load Adaptive Framework (CLAF). The proposed approach improves the classification of the four CLAF load-dependent fault subclasses (Healthy, Mild, Moderate, and Severe) by integrating complementary information from raw vibration signals and signal-encoded representations. Three input channels are employed: time–frequency domain features, Continuous Wavelet Transform (CWT) image encodings, and Gramian Angular Difference Field (GADF) image encodings. Each channel is trained and evaluated independently to identify its most effective classifiers. To address the reduced separability of the Mild and Moderate fault subclasses under varying load conditions, a weighted decision fusion strategy is introduced, which weights each classifier's contribution according to its class-specific strengths. Experimental evaluation over five runs demonstrates high and stable performance, with the best configuration achieving an overall accuracy of 99.04% ± 0.22% and an average training time of 18 min 30 s. The results confirm the effectiveness of LD-MVSEFF as a robust multimodal methodology for load-specific condition monitoring.
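To make the idea of class-specific weighted decision fusion concrete, the following is a minimal sketch and not the paper's implementation. It assumes that each of the three channels yields one classifier that outputs class probabilities, and that the fusion weights are obtained by normalising a per-class validation score (for example, per-class F1) across classifiers. The function names `class_weights` and `fuse`, the weighting rule, and all numbers are hypothetical and serve only as illustration.

```python
import numpy as np

CLASSES = ["Healthy", "Mild", "Moderate", "Severe"]

def class_weights(per_class_scores):
    """Turn per-class validation scores, shape (n_classifiers, n_classes),
    into weights whose columns sum to 1. Assumed rule, not from the paper."""
    scores = np.asarray(per_class_scores, dtype=float)
    return scores / scores.sum(axis=0, keepdims=True)

def fuse(probabilities, weights):
    """Weighted decision fusion.

    probabilities: (n_classifiers, n_samples, n_classes) class probabilities.
    weights:       (n_classifiers, n_classes) class-specific weights.
    Returns the predicted class index per sample and the fused scores.
    """
    probs = np.asarray(probabilities, dtype=float)
    # Broadcast each classifier's per-class weight over all samples,
    # then sum the weighted scores across classifiers.
    fused = (probs * weights[:, None, :]).sum(axis=0)
    return fused.argmax(axis=1), fused

# Made-up per-class validation scores for the three channels
# (rows: feature channel, CWT channel, GADF channel).
val_scores = [
    [0.99, 0.90, 0.80, 0.99],
    [0.99, 0.70, 0.95, 0.99],
    [0.99, 0.95, 0.75, 0.99],
]
weights = class_weights(val_scores)

# Made-up probabilities for one ambiguous Mild/Moderate sample.
probs = [
    [[0.10, 0.50, 0.40, 0.00]],  # feature channel
    [[0.05, 0.30, 0.65, 0.00]],  # CWT channel
    [[0.10, 0.55, 0.35, 0.00]],  # GADF channel
]
labels, fused = fuse(probs, weights)
print(CLASSES[labels[0]], fused.round(3))
```

With these illustrative numbers, two of the three classifiers favour Mild, yet the fused decision is Moderate, because the classifier that is strongest on the Moderate class receives the largest weight for that class. The fused values are scores rather than probabilities, since the class-wise weighting does not preserve a unit sum per sample.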