In second language learning, research on multimodal input often assumes that when learners receive both audio and written text, comprehension becomes easier because cognitive load is reduced. This assumption, however, may not fully explain how multimodal input works in digital reading. The present study reexamines this assumption by investigating whether listening-reading input regulates learner engagement rather than simply lowering cognitive demand during digital EFL reading. Forty Korean university EFL learners were randomly assigned to either a text-only reading condition or a listening-reading condition in which audio accompanied the written text. Vocabulary knowledge was measured using a 20-item test administered before and after the intervention, and delayed retention was examined descriptively. Heart rate was recorded continuously to examine physiological responses during task performance and the recovery period. The listening-reading group showed a statistically significant improvement in vocabulary scores (M = 67.00 → 80.25; t(19) = −11.395, p < .001, η² = .872, d = 2.47). In contrast, the text-only group did not show statistically significant improvement (p = .096). Contrary to the expectation that multimodal input would reduce physiological load, participants in the listening-reading condition exhibited higher task heart rate (d = .59) and greater elevation relative to baseline (d = .63). These results indicate that listening-reading may facilitate lexical acquisition through heightened engagement and attentional activation rather than simple cognitive load reduction. The findings provide additional insight for research on multimodal input in SLA and suggest that an appropriate level of cognitive engagement may play an important role in digital reading environments.
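As a reader-side consistency check, the reported η² can be recomputed from the reported t statistic and degrees of freedom. The sketch below assumes the standard conversion η² = t² / (t² + df) for a t-test. The abstract does not state which formula the authors used, so this is an assumption, and the function name `eta_squared_from_t` is hypothetical. Cohen's d is not recomputed here because it depends on standard deviations that the abstract does not report.

```python
# Values reported in the abstract for the listening-reading group.
t_value = -11.395
df = 19  # df = 19 implies 20 paired pre/post scores

def eta_squared_from_t(t: float, df: int) -> float:
    """Convert a t statistic to eta squared, assuming t^2 / (t^2 + df)."""
    return t**2 / (t**2 + df)

eta_sq = eta_squared_from_t(t_value, df)

# Mean gain from the reported pre-test and post-test means.
gain = 80.25 - 67.00

print(f"eta squared = {eta_sq:.3f}")
print(f"mean gain = {gain:.2f}")
```

Under this assumed conversion, the result rounds to the reported η² = .872, and the mean gain is 13.25 points.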