The monitoring network for the high-speed railway operating environment contains a wide variety of sensors serving services with differing requirements, and these services place demands on both data transmission delay and network lifetime. When designing or optimizing routing protocols, most previous research focuses on either prolonging network lifetime or reducing data transmission delay, without optimizing the two jointly. In addition, because high-speed railways operate in a harsh environment, the network changes dynamically, and traditional routing algorithms then trigger unnecessary route reconstruction, which leads to high overhead. Based on the practical needs of high-speed railway operating environment monitoring, this paper combines existing reinforcement learning methods with a novel Double Q-values adaptive model that considers both the energy balance of the network and real-time data transmission. The model constructs a two-dimensional reward covering energy saving and delay, which captures network dynamics while avoiding the extra overhead of maintaining a global routing table. In addition, an adaptive weight coefficient ensures that the model adapts to each service of the high-speed railway operating environment monitoring system. Finally, simulations and performance evaluations are carried out and compared with previous studies. The results show that the proposed routing algorithm extends the network lifetime and achieves good real-time data transmission performance. It also consumes less energy and incurs lower delay than the three other routing protocols under different scenarios.
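The abstract does not give the model's reward definitions, update rule, or weight-adaptation formula. As an illustration only, the following minimal Python sketch shows one way a node could keep two per-neighbor Q-values (one for energy, one for delay), update each with standard one-step Q-learning, and pick a next hop from their weighted sum using only local neighbor information. Everything here is an assumption rather than the authors' method: the class `DoubleQRouter`, the function `adaptive_weight`, the learning rate `ALPHA`, the discount `GAMMA`, and the idea that neighbors piggyback their best Q-values on acknowledgements. The sketch also reads "Double Q-values" as two per-objective tables, which differs from the Double Q-learning technique for reducing overestimation bias.

```python
from dataclasses import dataclass, field

# Hypothetical hyperparameters: not specified in the abstract.
ALPHA = 0.5  # learning rate (assumed)
GAMMA = 0.9  # discount factor (assumed)


@dataclass
class DoubleQRouter:
    """Hypothetical per-node router with one Q-table for energy and one for delay."""
    neighbors: list
    q_energy: dict = field(default_factory=dict)
    q_delay: dict = field(default_factory=dict)

    def __post_init__(self):
        # Start every known neighbor at zero in both tables.
        for n in self.neighbors:
            self.q_energy.setdefault(n, 0.0)
            self.q_delay.setdefault(n, 0.0)

    def score(self, n, w):
        # Weighted sum of the two Q-values; w in [0, 1] favors energy as it grows.
        return w * self.q_energy[n] + (1.0 - w) * self.q_delay[n]

    def next_hop(self, w):
        # Greedy choice among local neighbors only (no global routing table).
        return max(self.neighbors, key=lambda n: self.score(n, w))

    def update(self, n, r_energy, r_delay, best_energy_at_n, best_delay_at_n):
        # One-step Q-learning update applied separately to each reward dimension.
        # best_*_at_n: the neighbor's own best Q-values (assumed sent back with ACKs).
        self.q_energy[n] += ALPHA * (r_energy + GAMMA * best_energy_at_n - self.q_energy[n])
        self.q_delay[n] += ALPHA * (r_delay + GAMMA * best_delay_at_n - self.q_delay[n])


def adaptive_weight(base_w, residual_fraction):
    # Hypothetical rule: start from a per-service base weight and shift toward
    # energy saving as the sender's residual energy fraction drops.
    return min(1.0, base_w + (1.0 - residual_fraction) * (1.0 - base_w))


if __name__ == "__main__":
    router = DoubleQRouter(neighbors=[1, 2])
    # Neighbor 1: plenty of residual energy but a slow link; neighbor 2: the opposite.
    router.update(1, r_energy=0.9, r_delay=-0.8, best_energy_at_n=0.0, best_delay_at_n=0.0)
    router.update(2, r_energy=0.2, r_delay=-0.1, best_energy_at_n=0.0, best_delay_at_n=0.0)
    print(router.next_hop(w=0.2))  # delay-sensitive service
    print(router.next_hop(w=0.8))  # energy-sensitive service
```

With these made-up rewards, a delay-sensitive service (small `w`) routes through neighbor 2 and an energy-sensitive one (large `w`) through neighbor 1, which is the kind of per-service trade-off a weighted two-dimensional reward is meant to expose.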