Version 1
: Received: 9 February 2021 / Approved: 10 February 2021 / Online: 10 February 2021 (13:15:10 CET)
Version 2
: Received: 6 March 2021 / Approved: 8 March 2021 / Online: 8 March 2021 (13:45:14 CET)
Version 3
: Received: 6 December 2021 / Approved: 7 December 2021 / Online: 7 December 2021 (11:28:35 CET)
How to cite:
Besharati, M. R.; Izadi, M. SimulaD: A Novel Feature Selection Heuristics for Discrete Data. Preprints 2021, 2021020260. https://doi.org/10.20944/preprints202102.0260.v3
APA Style
Besharati, M. R., & Izadi, M. (2021). SimulaD: A Novel Feature Selection Heuristics for Discrete Data. Preprints. https://doi.org/10.20944/preprints202102.0260.v3
Chicago/Turabian Style
Besharati, M. R., and Mohammad Izadi. 2021. "SimulaD: A Novel Feature Selection Heuristics for Discrete Data." Preprints. https://doi.org/10.20944/preprints202102.0260.v3
Abstract
By applying a running average with window size d, we can transform discrete data into continuous values over a broad range. When a dataset has more than two columns and one of them holds the classification labels (the Class Column), we can compare and rank the features (the non-class columns) by the R² coefficient of a regression on the running averages. Tuning the parameters helps select the best features, that is, the non-class columns that correlate most strongly with the Class Column. Two parameters can be tuned toward this goal: the window size and the ordering. This optimization problem is hard, so we need an algorithm (or heuristic) to simplify the tuning. We present a novel heuristic, called Simulated Distillation (SimulaD), which helps obtain reasonably good results for this optimization problem.
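The scoring step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and it does not include the SimulaD search itself: it only smooths each column with a running average of window size d and ranks features by the R² of a simple linear fit against the smoothed Class Column. The function names, the use of NumPy, the fixed row order, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def running_average(values, d):
    """Mean over each window of d consecutive entries (complete windows only)."""
    kernel = np.ones(d) / d
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")

def r_squared(x, y):
    """R^2 of the least-squares line y ~ a*x + b, i.e. the squared Pearson r."""
    if np.std(x) == 0 or np.std(y) == 0:
        return 0.0  # a constant series carries no linear signal
    r = np.corrcoef(x, y)[0, 1]
    return float(r * r)

def rank_features(features, class_column, d):
    """Sort feature names by R^2 between smoothed feature and smoothed class."""
    smoothed_class = running_average(class_column, d)
    scores = {
        name: r_squared(running_average(column, d), smoothed_class)
        for name, column in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy data, made up for illustration: one informative and one noisy feature.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
features = {
    "informative": labels ^ (rng.random(200) < 0.1),  # labels with ~10% flips
    "noise": rng.integers(0, 2, size=200),
}
ranking = rank_features(features, labels, d=5)
print(ranking)
```

In this sketch the rows are used in their given order and d is fixed at 5. The abstract treats both the window size and the ordering as parameters to be tuned, which is the part a heuristic such as SimulaD would address.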
Keywords
Feature Selection; Discrete Data; Heuristics; Running average
Subject
Computer Science and Mathematics, Discrete Mathematics and Combinatorics
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Commenter: Mohammad Reza Besharati
Commenter's Conflict of Interests: Author