SimulaD: A Novel Feature Selection Heuristics for Discrete Data

Mohammad Reza Besharati; Mohammad Izadi

doi:10.20944/preprints202102.0260.v2

Submitted: 06 March 2021

Posted: 08 March 2021


Abstract

By applying a running average with window size d, we can transform discrete data into continuous values over a broad range. When a dataset has more than two columns and one of them holds the classification labels (the class column), we can compare and rank the features (the non-class columns) by the R² coefficient of the regression computed on the running averages. Tuning the parameters helps select the best features, that is, the non-class columns that correlate best with the class column. Two parameters can be tuned toward this goal: the window size and the ordering. This optimization problem is hard, so an algorithm or heuristic is needed to simplify the tuning. We present a novel heuristic, called Simulated Distillation (SimulaD), which can yield reasonably good results for this optimization problem.
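The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the SimulaD heuristic itself and not the authors' implementation. It only shows how features might be ranked by R² after smoothing, for one fixed window size d and one fixed row ordering. The function names, the toy data, and the choice to regress each smoothed feature against the smoothed class column are assumptions made for illustration.

```python
import numpy as np


def running_average(values, d):
    """Running average with window size d, keeping only full windows."""
    kernel = np.ones(d) / d
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")


def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson r)."""
    if np.std(x) == 0 or np.std(y) == 0:
        return 0.0  # assumption: a constant column is scored as uninformative
    return float(np.corrcoef(x, y)[0, 1] ** 2)


def rank_features(features, class_column, d):
    """Sort feature names by R^2 between smoothed feature and smoothed class.

    Assumption: rows are used in the order given; the abstract treats the
    ordering as a second tunable parameter, which this sketch does not search.
    """
    smoothed_class = running_average(class_column, d)
    scores = {
        name: r_squared(running_average(column, d), smoothed_class)
        for name, column in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: feature "a" copies the class column, "b" alternates.
class_column = [0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1]
features = {
    "a": [0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1],
    "b": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}

ranking = rank_features(features, class_column, d=3)
for name, score in ranking:
    print(f"{name}: R^2 = {score:.3f}")
```

In this sketch the window size d is passed in by hand. Searching over window sizes and orderings is the hard optimization problem that the abstract assigns to the heuristic.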

Keywords: Feature Selection; Discrete Data; Heuristics; Running average

Subject: Computer Science and Mathematics - Discrete Mathematics and Combinatorics

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
