Article
Version 2
Preserved in Portico. This version is not peer-reviewed.
Near-Optimal Sparse Neural Trees for Supervised Learning
Version 1: Received: 4 May 2021 / Approved: 6 May 2021 / Online: 6 May 2021 (17:09:47 CEST)
Version 2: Received: 5 November 2021 / Approved: 9 November 2021 / Online: 9 November 2021 (16:54:30 CET)
How to cite: Chakraborty, T. Near-Optimal Sparse Neural Trees for Supervised Learning. Preprints 2021, 2021050117. https://doi.org/10.20944/preprints202105.0117.v2
Abstract
Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. On the other hand, deep learning methods have boosted the capacity of machine learning algorithms and are now being used for non-trivial applications in various applied domains. However, training a fully-connected deep feed-forward network by gradient-descent backpropagation is slow and requires arbitrary choices regarding the number of hidden units and layers. In this paper, we propose near-optimal neural regression trees, which are intended to be much faster to train than deep feed-forward networks and which do not require the number of hidden units in each hidden layer to be specified in advance. The key idea is to construct a decision tree and then simulate the decision tree with a neural network. This work aims to develop a mathematical formulation of neural trees and to gain the complementary benefits of both sparse optimal decision trees and neural trees. We propose the near-optimal sparse neural tree (NSNT), which is shown to be asymptotically consistent and robust. Additionally, the proposed NSNT model attains a fast rate of convergence that is near-optimal up to a logarithmic factor. We comprehensively benchmark the proposed method on a sample of 80 datasets (40 classification datasets and 40 regression datasets) from the UCI machine learning repository. We establish that the proposed method is likely to outperform the current state-of-the-art methods (random forest, XGBoost, optimal classification tree, and near-optimal nonlinear trees) for the majority of the datasets.
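The abstract only sketches the construction, so the following is a minimal, illustrative Python sketch of the general tree-to-network mapping it alludes to, not the paper's exact NSNT algorithm. The helper names (tree_to_network_init, tree_network_predict) and the use of scikit-learn's DecisionTreeRegressor are assumptions for illustration: a fitted tree is translated into a two-hidden-layer network whose first layer encodes the split hyperplanes and whose second layer encodes leaf membership, after which the step activations would typically be smoothed (e.g. tanh) and the weights fine-tuned by gradient descent.

import numpy as np
from sklearn.tree import DecisionTreeRegressor


def tree_to_network_init(tree, n_features):
    """Translate a fitted sklearn tree into initial weights for a two-hidden-layer net.

    Hidden layer 1: one unit per internal node, encoding the split hyperplane x[f] - t.
    Hidden layer 2: one unit per leaf, firing only when every split on the
                    root-to-leaf path is resolved towards that leaf.
    Output layer:   weights each leaf unit by the leaf's predicted value.
    """
    t = tree.tree_
    internal, leaves, paths, node_to_row = [], [], [], {}

    def walk(node, path):
        if t.children_left[node] == -1:            # leaf node
            leaves.append(node)
            paths.append(list(path))
            return
        node_to_row[node] = len(internal)
        internal.append(node)
        walk(t.children_left[node], path + [(node, -1.0)])   # left child: x[f] <= threshold
        walk(t.children_right[node], path + [(node, +1.0)])  # right child: x[f] >  threshold

    walk(0, [])

    # Layer 1: h1 = sign(x[feature] - threshold) for each internal node.
    W1 = np.zeros((len(internal), n_features))
    b1 = np.zeros(len(internal))
    for node in internal:
        r = node_to_row[node]
        W1[r, t.feature[node]] = 1.0
        b1[r] = -t.threshold[node]

    # Layer 2: a leaf unit sums the signed layer-1 outputs along its path.
    # The sum equals len(path) only when every split agrees with the path,
    # so a bias of -(len(path) - 0.5) makes exactly one leaf unit fire.
    W2 = np.zeros((len(leaves), len(internal)))
    b2 = np.zeros(len(leaves))
    out_w = np.zeros(len(leaves))
    for k, (leaf, path) in enumerate(zip(leaves, paths)):
        for node, side in path:
            W2[k, node_to_row[node]] = side
        b2[k] = -(len(path) - 0.5)
        out_w[k] = t.value[leaf].ravel()[0]        # leaf mean (regression)
    return (W1, b1), (W2, b2), out_w


def tree_network_predict(X, layer1, layer2, out_w):
    """Forward pass with hard threshold units; reproduces the tree's predictions.
    In practice the sign/step activations would be replaced by smooth ones so
    the whole network can be fine-tuned by gradient descent."""
    W1, b1 = layer1
    W2, b2 = layer2
    h1 = np.sign(X @ W1.T + b1)                    # which side of each split
    h2 = (h1 @ W2.T + b2 > 0).astype(float)        # one-hot leaf membership
    return h2 @ out_w


# Usage: the hard-threshold network agrees with the tree it was built from.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = np.sin(6 * X[:, 0]) + X[:, 1]
tree = DecisionTreeRegressor(max_depth=4).fit(X, y)
layer1, layer2, out_w = tree_to_network_init(tree, X.shape[1])
print("max |tree - net| on training data:",
      np.abs(tree_network_predict(X, layer1, layer2, out_w) - tree.predict(X)).max())

Smoothing the activations turns the exact tree simulation into a differentiable model of the same size, which is what allows the tree-shaped architecture to be refined without choosing layer widths by hand.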
Keywords
decision trees; deep feed-forward network; neural trees; consistency; optimal rate of convergence
Subject
Computer Science and Mathematics, Probability and Statistics
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.