Air pollution is a serious environmental and public health problem in Bishkek, Kyrgyzstan, especially in winter, when particulate matter concentrations rise sharply. Despite the urgency of the problem, the city has fewer than eight monitoring stations, leaving large urban areas without adequate air quality monitoring. This article presents the first systematic study of image-based AQI estimation for Bishkek and examines whether transfer learning models can extract pollution-related visual cues from on-site urban photographs taken under uncontrolled, real-world conditions. Two hybrid deep learning architectures, built on VGG16 and EfficientNetB0 and each augmented with a scalar PM2.5 input, were trained and evaluated on a locally collected dataset of 1,014 image-AQI pairs. EfficientNetB0 outperformed VGG16 on all three evaluation metrics, including a 15.5% reduction in RMSE (66.49 vs. 78.71) and a 16.6% reduction in MAE (49.00 vs. 58.78). Both models showed a partial predictive signal in the low-to-moderate AQI range, indicating that atmosphere-related visual features can be detected even in small, locally sourced datasets. The performance limitations reflect the scale of the dataset and the sparse sensor infrastructure rather than the absence of learnable structure, which is consistent with pilot studies conducted under similar data constraints. This work establishes a baseline and a methodological framework for future image-based air quality monitoring in Central Asia and identifies three key bottlenecks that future work should address: the size of the dataset, label noise caused by geographic mismatch between sensors and images, and the density of monitoring stations.
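
The relative improvements quoted above can be reproduced from the reported error values. The following is a minimal sketch, assuming the standard definitions of RMSE and MAE and taking VGG16 as the baseline; the function names are illustrative and are not taken from the study's code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Error values reported in the abstract (VGG16 is the assumed baseline).
rmse_reduction = relative_reduction(78.71, 66.49)
mae_reduction = relative_reduction(58.78, 49.00)

print(f"RMSE reduction: {rmse_reduction:.1f}%")
print(f"MAE reduction: {mae_reduction:.1f}%")
```

Both metrics are expressed in AQI units, so the gap between RMSE and MAE for each model suggests that a minority of large errors contributes disproportionately to the RMSE.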