This paper proposes an adaptive photovoltaic (PV) power forecasting approach that combines Double Q-Learning with a Stacking ensemble using an XGBoost meta-learner, addressing the poor adaptability of conventional methods in real-time grid-connected PV systems. Unlike single models, which generalize poorly, or fixed-weight ensembles, which reuse past experience without adjusting to new conditions, the proposed approach adapts dynamically to time-varying meteorological and operational conditions. It pre-trains three complementary base models, namely random forest (RF), support vector regression (SVR), and LightGBM; constructs a Stacking framework with XGBoost as the meta-learner, trained on out-of-fold predictions, to generate high-precision baseline predictions; and embeds a Double Q-Learning agent that outputs adaptive weights for the base models from meteorological-temporal features and real-time prediction errors. The final prediction is obtained by fusing the Stacking output with the base model outputs weighted by the Double Q-Learning agent. Tests on a dataset from a 50 MW PV station show that the approach outperforms four single models and traditional ensembles in terms of MAE, MSE, RMSE, and R², enabling reliable, generalizable, and adaptive real-time prediction.
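The adaptive weighting step can be illustrated with a minimal tabular sketch of Double Q-Learning. It is not the implementation evaluated in this paper: the discretized error state, the candidate weight vectors in `ACTIONS`, the reward (negative absolute error), the hyperparameters, and the toy data are all assumptions made for illustration, and the fusion with the Stacking output is omitted because its exact form is not specified here.

```python
import random

# Hypothetical action set: candidate weight vectors over (RF, SVR, LightGBM).
ACTIONS = [
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
    (1 / 3, 1 / 3, 1 / 3),
]


class DoubleQAgent:
    """Tabular Double Q-Learning with two value tables (qa, qb)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.qa = [[0.0] * n_actions for _ in range(n_states)]
        self.qb = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Epsilon-greedy over the sum of both tables.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.qa[state]))
        totals = [a + b for a, b in zip(self.qa[state], self.qb[state])]
        return max(range(len(totals)), key=totals.__getitem__)

    def update(self, s, a, r, s_next):
        # Pick one table to learn; it selects the greedy next action,
        # while the other table evaluates that action.
        if self.rng.random() < 0.5:
            learn, evaluate = self.qa, self.qb
        else:
            learn, evaluate = self.qb, self.qa
        best = max(range(len(learn[s_next])), key=learn[s_next].__getitem__)
        target = r + self.gamma * evaluate[s_next][best]
        learn[s][a] += self.alpha * (target - learn[s][a])


def fuse(base_preds, weights):
    """Weighted sum of the base model predictions."""
    return sum(w * p for w, p in zip(weights, base_preds))


def error_state(abs_error, edges=(0.5, 2.0)):
    """Map an absolute error to a bin index (assumed thresholds, in MW)."""
    return sum(abs_error > e for e in edges)


# Toy stream of (base model predictions, measured power); values are made up.
stream = [((10.0, 12.0, 11.0), 11.2), ((20.0, 19.0, 21.5), 19.4),
          ((5.0, 5.5, 4.0), 5.1)]
agent = DoubleQAgent(n_states=3, n_actions=len(ACTIONS))
state = 0
for base_preds, actual in stream:
    action = agent.act(state)
    err = abs(fuse(base_preds, ACTIONS[action]) - actual)
    next_state = error_state(err)
    agent.update(state, action, -err, next_state)  # reward = negative error
    state = next_state
```

Decoupling action selection from action evaluation across the two tables is what distinguishes Double Q-Learning from standard Q-Learning and reduces overestimation of action values. A practical agent would use a richer state that also encodes the meteorological-temporal features.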