Version 1: Received: 29 September 2023 / Approved: 30 September 2023 / Online: 30 September 2023 (08:04:34 CEST)
How to cite:
Wen, X.; Jiang, Y.; Liu, F.; Mi, J.; Li, C.; Hu, J.; Shi, X.; Dong, X. The Classification for The Sources in SDSS DR18: Searching for QSOs by Machine Learning. Preprints 2023, 2023092160. https://doi.org/10.20944/preprints202309.2160.v1
APA Style
Wen, X., Jiang, Y., Liu, F., Mi, J., Li, C., Hu, J., Shi, X., & Dong, X. (2023). The Classification for The Sources in SDSS DR18: Searching for QSOs by Machine Learning. Preprints. https://doi.org/10.20944/preprints202309.2160.v1
Chicago/Turabian Style
Wen, X., Xiang-Ping Shi and Xiao-Wei Dong. 2023. "The Classification for The Sources in SDSS DR18: Searching for QSOs by Machine Learning." Preprints. https://doi.org/10.20944/preprints202309.2160.v1
Abstract
We tested random versus proportional data selection in a class-imbalanced sample. We recommend assigning data to the training and test sets according to the original ratio of QSOs, galaxies, and stars. We compared training on the original imbalanced data with applying class-balancing techniques: SMOTE, SMOTEENN, SMOTETomek, ADASYN, BorderlineSMOTE1, BorderlineSMOTE2, and RandomUndersampling. SMOTEENN performed best in Sample 1. With SMOTEENN applied, we compared LightGBM, CatBoost, XGBoost, and RF using the petroMag_u, petroMag_g, petroMag_r, petroMag_i, petroMag_z, J, H, Ks, W1, W2, W3, and W4 magnitudes as features. All precisions and recalls exceeded 0.94. RF took slightly more time than the other three algorithms but gave the best evaluation metrics. With SMOTEENN+RF, the precision, recall, and f1-score for QSOs (galaxies, stars) reached 0.98 (0.99, 0.98), 0.99 (0.96, 1.00), and 0.98 (0.97, 0.99), respectively, in Sample 1. Using only the petroMag_u, petroMag_g, petroMag_r, petroMag_i, petroMag_z, W1, W2, W3, and W4 magnitudes as features (that is, without J, H, and Ks), SMOTEENN+RF reached a precision, recall, and f1-score for QSOs (galaxies, stars) of 0.94 (0.96, 0.96), 0.98 (0.90, 0.97), and 0.96 (0.93, 0.97), respectively.
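As an illustration of two ideas in the abstract, the following is a minimal sketch of proportional (stratified) selection into training and test sets and of per-class precision, recall, and f1-score. It is not the authors' code: the function names `stratified_split` and `per_class_metrics`, the 20% test fraction, and the toy class counts are all assumptions made for this example, and it uses only the Python standard library rather than the resampling and classifier packages named above.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps its share of the data.

    Hypothetical helper: shuffles the indices of each class separately and
    sends the same fraction of every class to the test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def per_class_metrics(y_true, y_pred):
    """Return {class: (precision, recall, f1)}, computed one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[cls] = (precision, recall, f1)
    return metrics


# Toy imbalanced labels (made-up counts, not the paper's samples).
labels = ["QSO"] * 100 + ["GALAXY"] * 300 + ["STAR"] * 600
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
test_counts = Counter(labels[i] for i in test_idx)

# Toy predictions to show the metric definitions.
toy_true = ["QSO", "QSO", "STAR", "STAR"]
toy_pred = ["QSO", "STAR", "STAR", "STAR"]
toy_metrics = per_class_metrics(toy_true, toy_pred)
```

In this sketch the test set keeps the 1:3:6 class ratio of the toy labels, whereas a purely random draw would only match it on average. Oversampling methods such as SMOTEENN would normally be applied to the training indices only, so that the test set keeps the original class ratio.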
Keywords
QSOs; LightGBM; CatBoost; XGBoost; random forest
Subject
Physical Sciences, Astronomy and Astrophysics
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.