Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Near-optimal Sparse Neural Trees

Version 1 : Received: 4 May 2021 / Approved: 6 May 2021 / Online: 6 May 2021 (17:09:47 CEST)
Version 2 : Received: 5 November 2021 / Approved: 9 November 2021 / Online: 9 November 2021 (16:54:30 CET)

How to cite: Chakraborty, T.; Chakraborty, T. Near-optimal Sparse Neural Trees. Preprints 2021, 2021050117.


Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. Deep learning methods, on the other hand, have boosted the capacity of machine learning algorithms and are now used for non-trivial applications in various applied domains. However, training a fully connected deep feed-forward network by gradient-descent backpropagation is slow and requires arbitrary choices regarding the number of hidden units and layers. In this paper, we propose near-optimal neural regression trees, which are intended to be much faster than deep feed-forward networks and which do not require the number of hidden units in the hidden layers to be specified in advance. The key idea is to construct a decision tree and then simulate the decision tree with a neural network. This work aims to build a mathematical formulation of neural trees and to gain the complementary benefits of both sparse optimal decision trees and neural trees. We propose near-optimal sparse neural trees (NSNT), which are shown to be asymptotically consistent and robust. Additionally, the proposed NSNT model attains a fast rate of convergence that is near-optimal up to a logarithmic factor. We comprehensively benchmark the proposed method on a sample of 80 datasets (40 classification datasets and 40 regression datasets) from the UCI machine learning repository. We establish that the proposed method is likely to outperform the current state-of-the-art methods (random forest, XGBoost, optimal classification tree, and near-optimal nonlinear trees) for the majority of the datasets.
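The key idea of simulating a decision tree with a neural network can be illustrated with a minimal sketch. This is not the NSNT construction of the paper; it is a generic, well-known mapping shown under stated assumptions: a hand-built toy tree, hard-threshold activations, and hypothetical names (`SPLITS`, `LEAVES`, `tree_predict`, `network_predict`). The first hidden layer has one unit per split, the second has one unit per leaf, and the output layer reads off the value of the active leaf.

```python
# Hypothetical toy tree (an assumption for illustration, not from the paper).
# Each internal node is (feature_index, threshold); a sample goes left
# when x[feature_index] <= threshold and right otherwise.
SPLITS = [(0, 0.5), (1, 0.3)]

# Each leaf is (path, value). A path lists (split_index, direction) pairs,
# with direction -1 for "went left" and +1 for "went right".
LEAVES = [
    ([(0, -1), (1, -1)], 1.0),
    ([(0, -1), (1, +1)], 2.0),
    ([(0, +1)], 3.0),
]


def tree_predict(x):
    """Evaluate the toy tree directly with if/else rules."""
    if x[0] <= 0.5:
        return 1.0 if x[1] <= 0.3 else 2.0
    return 3.0


def sign(z):
    """Hard-threshold activation: +1 if z > 0, else -1."""
    return 1.0 if z > 0 else -1.0


def network_predict(x):
    """Evaluate the same tree as a two-hidden-layer threshold network."""
    # Hidden layer 1: one unit per split, encoding the side of the split.
    u = [sign(x[f] - t) for f, t in SPLITS]

    # Hidden layer 2: one unit per leaf. The unit fires (outputs 1) only
    # when every split on the leaf's path agrees with the required direction,
    # i.e. when the weighted sum reaches the path length.
    v = []
    for path, _ in LEAVES:
        s = sum(direction * u[k] for k, direction in path)
        v.append(1.0 if s > len(path) - 0.5 else 0.0)

    # Output layer: weighted sum of leaf indicators by leaf values.
    return sum(active * value for active, (_, value) in zip(v, LEAVES))


if __name__ == "__main__":
    for point in [(0.2, 0.1), (0.2, 0.9), (0.9, 0.0)]:
        print(point, tree_predict(point), network_predict(point))
```

With hard thresholds the network reproduces the tree exactly. Replacing them with smooth activations would make the network trainable by gradient descent; that step is outside this sketch.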

Keywords: Decision trees; Deep feed-forward network; Neural trees; Consistency; Optimal rate of convergence.


Subject: Computer Science and Mathematics, Algebra and Number Theory
