Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. On the other hand, deep learning methods have boosted the capacity of machine learning algorithms and are now being used for non-trivial applications in various applied domains. But training a fully-connected deep feed-forward network by gradient-descent backpropagation is slow and requires arbitrary choices regarding the number of hidden units and layers. In this paper, we propose near-optimal neural regression trees, intending to make it much faster than deep feed-forward networks and for which it is not essential to specify the number of hidden units in the hidden layers of the neural network in advance. The key idea is to construct a decision tree and then simulate the decision tree with a neural network. This work aims to build a mathematical formulation of neural trees and gain the complementary benefits of both sparse optimal decision trees and neural trees. We propose near-optimal sparse neural trees (NSNT) that is shown to be asymptotically consistent and robust in nature. Additionally, the proposed NSNT model obtain a fast rate of convergence which is near-optimal upto some logarithmic factor. We comprehensively benchmark the proposed method on a sample of 80 datasets (40 classification datasets and 40 regression datasets) from the UCI machine learning repository. We establish that the proposed method is likely to outperform the current state-of-the-art methods (random forest, XGBoost, optimal classification tree, and near-optimal nonlinear trees) for the majority of the datasets.