Motion artifacts corrupt wearable ECG signals and trigger false arrhythmia alarms, limiting the clinical adoption of continuous cardiac monitoring. We present a dual-stream deep learning framework for motion-robust binary arrhythmia classification through multi-modal sensor fusion and multi-SNR training. A ResNet-18 processes ECG spectrograms, while a CNN-BiLSTM encodes accelerometer motion patterns; attention-gated fusion with gate diversity regularization adaptively weights the modalities by signal reliability. Training on MIT-BIH data augmented at three noise levels (24, 12, and 6 dB SNR) enables noise-invariant learning that generalizes to unseen conditions. The framework achieves 99.5% accuracy on clean signals and degrades gracefully to 88.2% at extreme noise (-6 dB SNR), a 46% improvement over single-SNR training. High gate diversity (σ > 0.37) confirms adaptive, context-dependent fusion. With a 0.09% false positive rate and real-time processing (238 beats/second), the system enables practical continuous arrhythmia screening and establishes the foundation for hierarchical monitoring systems in which binary screening activates detailed multi-class diagnosis.
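The attention-gated fusion described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: it supposes a scalar sigmoid gate computed from the concatenated ECG and accelerometer embeddings, with the fused feature as a gate-weighted convex combination of the two streams, and it reports the gate's standard deviation across a batch (the quantity the gate diversity regularizer would reward). All names (`attention_gated_fusion`, `gate_diversity`) and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gated_fusion(h_ecg, h_acc, w, b):
    """Compute a per-sample scalar gate from the concatenated
    embeddings and blend the two modality streams with it."""
    z = np.concatenate([h_ecg, h_acc], axis=1)   # (B, 2D)
    g = sigmoid(z @ w + b)                       # (B, 1), in (0, 1)
    fused = g * h_ecg + (1.0 - g) * h_acc        # (B, D)
    return fused, g

def gate_diversity(g):
    # Std of gate values across the batch; a diversity
    # regularizer would encourage this sigma to be large,
    # i.e. gates that vary with signal context.
    return float(g.std())

B, D = 8, 16                                     # toy batch and embedding size
h_ecg = rng.standard_normal((B, D))              # stand-in ResNet-18 features
h_acc = rng.standard_normal((B, D))              # stand-in CNN-BiLSTM features
w = rng.standard_normal((2 * D, 1))
fused, g = attention_gated_fusion(h_ecg, h_acc, w, b=0.0)
print(fused.shape, gate_diversity(g))
```

In the full system the gate would be produced by a learned attention network rather than a fixed random projection; the point here is only the blending mechanism and the batch-level sigma statistic.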