Preprint Article. Version 1. Preserved in Portico. This version is not peer-reviewed.

The Classification for The Sources in SDSS DR18: Searching for QSOs by Machine Learning

Version 1 : Received: 29 September 2023 / Approved: 30 September 2023 / Online: 30 September 2023 (08:04:34 CEST)

How to cite: Wen, X.; Jiang, Y.; Liu, F.; Mi, J.; Li, C.; Hu, J.; Shi, X.; Dong, X. The Classification for The Sources in SDSS DR18: Searching for QSOs by Machine Learning. Preprints 2023, 2023092160. https://doi.org/10.20944/preprints202309.2160.v1

Abstract

We tested random versus proportional selection of data in a class-imbalanced sample. We recommend assigning sources to the training and test sets according to the initial ratio of QSOs, galaxies, and stars. We experimented with the original imbalanced data and with the following class-balancing techniques: SMOTE, SMOTEENN, SMOTETomek, ADASYN, BorderlineSMOTE1, BorderlineSMOTE2, and RandomUndersampling. SMOTEENN performed best in Sample 1. With SMOTEENN applied, we compared LightGBM, CatBoost, XGBoost, and random forest (RF) using the petroMag_u, petroMag_g, petroMag_r, petroMag_i, petroMag_z, J, H, Ks, W1, W2, W3, and W4 magnitudes as features. All precisions and recalls exceeded 0.94. RF took slightly more time than the other three algorithms but yielded the best evaluation metrics. With SMOTEENN + RF, the precision, recall, and f1-score for QSOs (galaxies, stars) reached 0.98 (0.99, 0.98), 0.99 (0.96, 1.00), and 0.98 (0.97, 0.99), respectively, in Sample 1. Using only the petroMag_u, petroMag_g, petroMag_r, petroMag_i, petroMag_z, W1, W2, W3, and W4 magnitudes as features (that is, without J, H, and Ks), SMOTEENN + RF reached a precision, recall, and f1-score for QSOs (galaxies, stars) of 0.94 (0.96, 0.96), 0.98 (0.90, 0.97), and 0.96 (0.93, 0.97), respectively.
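As an illustration of two ideas in the abstract (splitting data so that the training and test sets keep the initial class ratio, and reporting precision, recall, and f1-score per class), here is a minimal sketch in plain Python. It is not the authors' code: the function names `stratified_split` and `per_class_scores` are hypothetical, the label counts are invented toy values, and the resampling (SMOTEENN) and classifier (RF) steps are deliberately left out.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps its share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within the class only
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)}, each class scored one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy, made-up class counts (assumption, not taken from the paper).
labels = ["QSO"] * 100 + ["GALAXY"] * 600 + ["STAR"] * 300
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
test_counts = Counter(labels[i] for i in test_idx)

# Toy predictions, again invented purely to exercise the metric function.
toy_true = ["QSO", "QSO", "STAR", "GALAXY"]
toy_pred = ["QSO", "STAR", "STAR", "GALAXY"]
toy_scores = per_class_scores(toy_true, toy_pred)
```

In this sketch the test set keeps the toy sample's 1:6:3 ratio of QSOs to galaxies to stars, which is the property the proportional split is meant to guarantee. Any oversampling or undersampling would normally be applied to the training indices only, so that the test set still reflects the original class ratio.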

Keywords

QSOs; LightGBM; CatBoost; XGBoost; random forest

Subject

Physical Sciences, Astronomy and Astrophysics
