This paper presents a comparative analysis of Reinforcement Learning (RL)-based strategies for optimizing Frequency-Hopping Spread Spectrum (FHSS) systems against a first-order Markov jammer in Unmanned Aerial Vehicle (UAV) communications, addressing critical vulnerabilities in electronic warfare scenarios. The jammer model simulates adaptive threats in drone networks. Simulations were conducted within a Markov Decision Process (MDP) framework with 16 channels and episodes of 1000 steps. Three approaches were evaluated: a baseline of random channel selection, tabular Q-learning, and a Deep Q-Network (DQN) with a 16-128-128-16 neural architecture. Training spanned 100–500 episodes, and performance was assessed using five metrics: success rate (%), Bit Error Rate (BER), Signal-to-Noise Ratio (SNR), action entropy, and Packet Loss Rate (PLR) under Forward Error Correction (FEC).
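To make the simulation setup concrete, the following is a minimal sketch of a 16-channel hopping environment with a first-order Markov jammer, a random baseline, and tabular Q-learning. Only the channel count and episode length come from the text. The jammer transition rule (`p_next`), the choice of state (the last jammed channel), the reward of +1/-1, the hyperparameters `alpha`, `gamma`, `epsilon`, and all function names are illustrative assumptions, not the configuration used in this study. The DQN, BER, SNR, and PLR are not modeled here.

```python
import math
import random

N_CHANNELS = 16           # from the text
STEPS_PER_EPISODE = 1000  # from the text


def jammer_step(current, rng, p_next=0.8):
    """Hypothetical first-order Markov jammer.

    The next jammed channel depends only on the current one: with
    probability p_next (assumed) it moves to the adjacent channel,
    otherwise it jumps to a uniformly random channel.
    """
    if rng.random() < p_next:
        return (current + 1) % N_CHANNELS
    return rng.randrange(N_CHANNELS)


def action_entropy(counts):
    """Shannon entropy (bits) of the empirical channel-selection distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def run_episode(q, rng, alpha=0.1, gamma=0.9, epsilon=0.1, learn=True):
    """Run one episode and return (success_rate, action_entropy).

    q is a 16x16 table indexed as q[state][action], where the state is the
    last jammed channel (an assumption). Passing q=None selects channels
    uniformly at random, which serves as the baseline.
    """
    jam = rng.randrange(N_CHANNELS)
    successes = 0
    counts = [0] * N_CHANNELS
    for _ in range(STEPS_PER_EPISODE):
        state = jam
        if q is None or rng.random() < epsilon:
            action = rng.randrange(N_CHANNELS)  # explore / random baseline
        else:
            action = max(range(N_CHANNELS), key=lambda a: q[state][a])
        jam = jammer_step(jam, rng)
        # Assumed reward: +1 if the chosen channel is clear, -1 if jammed.
        reward = 1.0 if action != jam else -1.0
        if reward > 0:
            successes += 1
        counts[action] += 1
        if q is not None and learn:
            # Standard Q-learning temporal-difference update.
            td_target = reward + gamma * max(q[jam])
            q[state][action] += alpha * (td_target - q[state][action])
    return successes / STEPS_PER_EPISODE, action_entropy(counts)


def train(n_episodes, rng):
    """Train a fresh Q-table for n_episodes and return it."""
    q = [[0.0] * N_CHANNELS for _ in range(N_CHANNELS)]
    for _ in range(n_episodes):
        run_episode(q, rng)
    return q


if __name__ == "__main__":
    rng = random.Random(0)
    q_table = train(100, rng)
    print("Q-learning (greedy):", run_episode(q_table, rng, epsilon=0.0, learn=False))
    print("Random baseline:    ", run_episode(None, rng))
```

Under this assumed single-channel jammer, uniform random selection collides with the jammer on 1/16 of steps on average, so any learned policy must beat a fairly high baseline. A wider or multi-channel jammer would change that margin.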