
Approximating Functions with Multi-Features on Sphere by Deep Convolutional Neural Networks

Submitted: 02 December 2025
Posted: 03 December 2025


Abstract
In recent years, deep convolutional neural networks (DCNNs) have demonstrated remarkable success in approximating functions with multiple features. However, several challenges remain unresolved, including the approximation of target functions in Sobolev spaces defined on the unit sphere and the extension of the classes of intrinsic features that can be extracted. To address these issues, we propose a DCNN architecture with multiple downsampling layers to approximate multi-feature functions in Sobolev spaces on the unit sphere. Our method enables automatic feature extraction without requiring prior knowledge of the underlying composite structure and alleviates the curse of dimensionality in function approximation by extracting general smooth and spherical polynomial features. Compared with previous approaches, the proposed architecture captures a wider variety of features.

1. Introduction

Deep neural networks have demonstrated exceptional computational power and benefited greatly from the availability of large-scale data [1], leading to remarkable performance across a wide range of scientific and engineering applications. Among these architectures, deep convolutional neural networks (DCNNs), which consist of multiple convolutional layers, have proven particularly effective. DCNNs have been successfully applied to speech recognition, image classification, strategic decision-making in board games, and many other practical tasks [2,3,25,26,27].
In the context of approximation theory, DCNNs have shown strong potential [4,5,6,7,12,28], and their approximation capabilities are often regarded as a form of nonlinear approximation [31]. Early foundational work [13,29,30] examined shallow neural networks with sigmoidal-type activation functions for function approximation. More recently, DCNNs with rectified linear unit (ReLU) activation functions have been shown to possess universal approximation properties for continuous functions [8]. Furthermore, [9] demonstrated that DCNNs require only one-eighth of the free parameters needed by fully connected neural networks (FNNs) to approximate smooth functions.
Nevertheless, several challenges persist. For instance, the error rates reported in [10,11,19] suffer from the curse of dimensionality, with the approximation error deteriorating as the input dimension $d$ increases. In an effort to address this, [21] considered the approximation of composite functions of the form $f(Q(x))$ in the Sobolev space $W^\beta(B^d)$ with $0 < \beta \le 1$, where $Q(x)$ is a polynomial. That approach reaches accuracy $\epsilon \in (0,1)$ with a number of free parameters of order $O(\epsilon^{-1/\beta})$. However, it assumes a single feature, implying strong dependence among the input variables.
It has been shown that convolutional layers can effectively extract relevant features [32,33]. Neural networks, owing to their mesh-free architectures, have also demonstrated the ability to overcome the curse of dimensionality when the target functions exhibit intrinsic structures [17,18,34], possess fast-decaying Fourier coefficients [13,14], or belong to general mixed-smoothness function spaces [15,16]. To further reduce variable dependence and alleviate the curse of dimensionality, [20] proposed the approximation of functions with multiple features in the Sobolev space $W^\beta([0,1]^d)$ with $0 < \beta \le 1$. This setting is more realistic for complex tasks, where numerous features are typically required to describe objects. In this case, the required number of free parameters becomes $O(\epsilon^{-d'/\beta})$, where $d'$ denotes the number of features and $d' \le d$. This indicates that the curse of dimensionality can be mitigated by extracting features; however, the rate is still worse than in the single-feature case $d' = 1$, as the input variables exhibit reduced dependence.
Despite these advances, several challenges remain unresolved. This paper focuses on approximating functions with multiple features defined on the unit sphere, i.e., $f \in W^\beta(\mathbb{S}^{d-1})$ with $\beta \in (0,1]$. The target functions are of the form
$$f(x) = f\big(F_1(x), F_2(x), \ldots, F_{d'}(x)\big), \qquad x \in \mathbb{S}^{d-1},$$
where each $F_\tau(x)$, $\tau = 1, \ldots, d'$, is a smooth function in $W^r(\mathbb{S}^{d-1})$ with $r > 0$. In addition, to further alleviate the curse of dimensionality in function approximation, we approximate functions with spherical polynomial features and with symmetric polynomial features in Sections 3.2 and 3.3, respectively. The DCNN architecture constructed in Section 3.1 is capable of extracting three types of features: general smooth features, spherical polynomial features, and symmetric polynomial features. In contrast, the DCNNs constructed in [20] are limited to extracting only polynomial and symmetric polynomial features.

2. Preliminaries

We begin by establishing notation for function classes. Let $\mathbb{S}^{d-1} = \{x \in \mathbb{R}^d : \|x\|_2 = 1\}$ denote the unit sphere in $\mathbb{R}^d$.

2.1. Deep Convolutional Neural Networks with Down Sampling

DCNNs, constructed from convolutional layers, have demonstrated significant effectiveness in tasks such as image classification [3]. These networks consist of a sequence of convolutional filters $w = \{w^{(j)} : \mathbb{Z} \to \mathbb{R}\}_{j=1}^{J}$, where each filter $w^{(j)}$ is supported on the finite index set $\{0, \ldots, s^{(j)}\}$ for some $s^{(j)} \in \mathbb{N}$, referred to as the filter length. In this work, we assume a uniform filter length across layers, that is, $s^{(j)} \equiv s \in \mathbb{N}$, which yields the dimensionality sequence $\{d_j = d + js\}_{j=1}^{J}$. The activation function employed is the ReLU,
$$\sigma(u) = \begin{cases} u, & u \ge 0,\\ 0, & u < 0. \end{cases}$$
Given a filter $w$ supported on the set $\{0, \ldots, s\}$, its convolution with a sequence $v = (v_1, \ldots, v_{d_{j-1}})$ produces a new sequence $w * v$ defined by
$$(w * v)_i = \sum_{k \in \mathbb{Z}} w_{i-k} v_k = \sum_{k=1}^{d_{j-1}} w_{i-k} v_k, \qquad i \in \mathbb{Z},$$
which is supported on the set $\{1, \ldots, d_{j-1} + s\}$. A DCNN comprising $J$ hidden layers, with neuron mappings $\{h^{(j)} : \mathbb{R}^d \to \mathbb{R}^{d_j}\}_{j=1}^{J}$, is defined iteratively, starting from the input vector $h^{(0)}(x) = x$, $x = (x_1, \ldots, x_d)$, by
$$h^{(j)}(x) = \sigma\big( T^{(j)} h^{(j-1)}(x) - b^{(j)} \big),$$
where $T^{(j)}$ is the Toeplitz-type convolutional matrix
$$T^{(j)} := \big( w^{(j)}_{i-k} \big)_{i=1,\ldots,d_j,\; k=1,\ldots,d_{j-1}},$$
associated with a filter $w^{(j)}$ of filter length $s$ and $d_j \in \mathbb{N}$, given explicitly (blank entries being zero) by
$$T^{(j)} = \begin{pmatrix}
w^{(j)}_0 & & & \\
\vdots & \ddots & & \\
w^{(j)}_s & \cdots & w^{(j)}_0 & \\
 & \ddots & \ddots & \\
 & & w^{(j)}_s \;\cdots & w^{(j)}_0\\
 & & & \vdots\\
 & & & w^{(j)}_s
\end{pmatrix} \in \mathbb{R}^{d_j \times d_{j-1}}.$$
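To make the convolutional structure concrete, the following sketch (our own illustration, not part of the paper's construction; the helper name `toeplitz_conv_matrix` is ours) builds the Toeplitz-type matrix $T^{(j)}$ from a filter supported on $\{0, \ldots, s\}$ and checks that $T^{(j)} v$ coincides with the one-dimensional convolution $w * v$.

```python
import numpy as np

def toeplitz_conv_matrix(w, d_in):
    """Toeplitz-type matrix T with T[i, k] = w[i - k] for a filter w
    supported on {0, ..., s}; shape (d_in + s) x d_in."""
    s = len(w) - 1
    T = np.zeros((d_in + s, d_in))
    for i in range(d_in + s):          # rows 1, ..., d_in + s (0-based here)
        for k in range(d_in):          # columns 1, ..., d_in
            if 0 <= i - k <= s:        # entry w_{i-k}, zero outside the support
                T[i, k] = w[i - k]
    return T

rng = np.random.default_rng(0)
s, d_in = 2, 5
w = rng.standard_normal(s + 1)         # filter supported on {0, ..., s}
v = rng.standard_normal(d_in)

T = toeplitz_conv_matrix(w, d_in)
# np.convolve computes (w * v)_i = sum_k w_{i-k} v_k on the full support,
# which matches the rows of T (indices 1, ..., d_in + s in the paper's notation).
assert np.allclose(T @ v, np.convolve(w, v))
print(T.shape)                         # (7, 5) = (d_in + s, d_in)
```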
A constraint is imposed on the bias vectors $\{b^{(j)}\}_{j=1}^{J}$ associated with the convolutional layers:
$$b^{(j)}_{s+1} = \cdots = b^{(j)}_{d_j - s}, \qquad j = 1, \ldots, J.$$
Definition 1 
(Deep Convolutional Neural Networks with Downsampling). Let $x = (x_1, x_2, \ldots, x_d) \in \mathbb{S}^{d-1}$ represent the input data vector, and let $s \in \mathbb{N}$ denote the filter length. A downsampled DCNN with mappings $\{h^{(j)} : \mathbb{R}^d \to \mathbb{R}^{d_j}\}_{j=1}^{J}$, incorporating downsampling operations at selected layers $J_1 < J_2 < \cdots < J_k$, is defined recursively. The layer widths $\{d_j\}_{j=1}^{J}$ are determined as follows: $d_0 = d$, $d_{J_1} = \big\lfloor \frac{d + J_1 s}{D_1} \big\rfloor$, $d_{J_2} = \big\lfloor \frac{d_{J_1} + (J_2 - J_1)s}{D_2} \big\rfloor$, and in general $d_{J_k} = \big\lfloor \frac{d_{J_{k-1}} + (J_k - J_{k-1})s}{D_k} \big\rfloor$, while
$$d_j = d_{j-1} + s, \qquad j \in \{1, 2, \ldots, J\} \setminus \{J_1, J_2, \ldots, J_k\}.$$
The network is the sequence of vector-valued functions defined iteratively from $h^{(0)}(x) = x$. The downsampling operator $\mathcal{D}_D : \mathbb{R}^{\hat D} \to \mathbb{R}^{\lfloor \hat D / D \rfloor}$, parameterized by a scaling factor $D \le \hat D$, is defined by
$$\mathcal{D}_D(v) = \big( v_{iD} \big)_{i=1}^{\lfloor \hat D / D \rfloor}, \qquad v = (v_i)_{i=1}^{\hat D} \in \mathbb{R}^{\hat D},$$
where $\lfloor u \rfloor$ denotes the integer part of $u > 0$. Denote by $\mathcal{C}_{T,b}$ the activated affine mapping induced by the connection matrix $T$ and bias vector $b$,
$$\mathcal{C}_{T,b}(x) = \sigma(Tx - b),$$
and set
$$h^{(j)}(x) = \begin{cases} \mathcal{C}_{T^{(j)}, b^{(j)}}\big(h^{(j-1)}(x)\big), & \text{if } J_{k-1} < j < J_k,\\ \mathcal{D}_{D_k}\Big(\mathcal{C}_{T^{(j)}, b^{(j)}}\big(h^{(j-1)}(x)\big)\Big), & \text{if } j = J_k, \end{cases}$$
where $J_0 = 0$ and $\{D_1, D_2, \ldots, D_k\}$ denote the downsampling scaling factors applied at the layers $\{J_1, J_2, \ldots, J_k\}$. The bias vectors satisfy
$$\big(b^{(j)}\big)_{s+1} = \big(b^{(j)}\big)_{s+2} = \cdots = \big(b^{(j)}\big)_{d_{j-1}}, \qquad j \notin \{J_1, J_2, \ldots, J_k\}.$$
In this study, we refer to the first $J_1$ convolutional layers as the first group, to the next $J_2 - J_1$ layers as the second group, and in general to the $J_k - J_{k-1}$ layers following layer $J_{k-1}$ as the $k$-th group of convolutional layers.
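As an illustration of Definition 1, the following minimal sketch (with arbitrary random filters and biases, not the specific construction of Section 3.1; all helper names are ours, and the downsampling layer and factor are chosen arbitrarily) runs the recursion $h^{(j)}(x) = \sigma\big(T^{(j)} h^{(j-1)}(x) - b^{(j)}\big)$ and applies the operator $\mathcal{D}_{D_k}$ at the designated layers.

```python
import numpy as np

def relu(u):
    return np.maximum(u, 0.0)

def toeplitz_conv_matrix(w, d_in):
    """T[i, k] = w[i - k] for a filter w supported on {0, ..., len(w)-1}."""
    s = len(w) - 1
    T = np.zeros((d_in + s, d_in))
    for i in range(d_in + s):
        for k in range(d_in):
            if 0 <= i - k <= s:
                T[i, k] = w[i - k]
    return T

def downsample(v, D):
    """D_D(v) = (v_{iD})_{i=1}^{floor(len(v)/D)} (1-based indexing in the paper)."""
    return v[D - 1::D]

def forward(x, filters, biases, down_layers):
    """h^(j) = sigma(T^(j) h^(j-1) - b^(j)), downsampled by D_k whenever j = J_k."""
    h = x
    for j, (w, b) in enumerate(zip(filters, biases), start=1):
        h = relu(toeplitz_conv_matrix(w, len(h)) @ h - b)
        if j in down_layers:
            h = downsample(h, down_layers[j])
    return h

rng = np.random.default_rng(1)
d, s, J = 6, 2, 4
down_layers = {2: 2}                   # downsample by a factor of 2 at layer J_1 = 2
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                 # input on the unit sphere S^{d-1}

widths = [d]                           # d_0 = d; d_j = d_{j-1} + s, with floor division at J_k
for j in range(1, J + 1):
    wj = widths[-1] + s
    if j in down_layers:
        wj = wj // down_layers[j]
    widths.append(wj)

filters = [rng.standard_normal(s + 1) for _ in range(J)]
biases = [rng.standard_normal(widths[j - 1] + s) for j in range(1, J + 1)]
print(forward(x, filters, biases, down_layers).shape)   # (widths[J],) = (d_J,)
```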

2.2. Spherical Harmonics and Sobolev Space on the Sphere

Let $\mathcal{H}_k^d$ represent the space of all spherical harmonics of degree $k$ on $\mathbb{S}^{d-1}$. This space can be characterized as the eigenspace of the Laplace-Beltrami operator $\Delta_0$ on $\mathbb{S}^{d-1}$ (see [22], Chapter 1.4):
$$\mathcal{H}_k^d = \big\{ F \in C^2(\mathbb{S}^{d-1}) : \Delta_0 F = -\lambda_k F \big\},$$
where $\lambda_k = k(k + d - 2)$. The dimension of the linear space $\mathcal{H}_k^d$ is
$$N(k,d) = \binom{k+d-1}{k} - \binom{k+d-3}{k-2} = O\big(k^{d-2}\big).$$
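As a quick numerical check of this formula (our own illustration), for $d = 3$ it reproduces the familiar count $N(k,3) = 2k+1$ of spherical harmonics of degree $k$ on $\mathbb{S}^2$, with the convention that the second binomial coefficient vanishes for $k < 2$.

```python
from math import comb

def harmonic_dim(k, d):
    """N(k, d) = C(k+d-1, k) - C(k+d-3, k-2), the dimension of H_k^d."""
    second = comb(k + d - 3, k - 2) if k >= 2 else 0
    return comb(k + d - 1, k) - second

assert [harmonic_dim(k, 3) for k in range(5)] == [1, 3, 5, 7, 9]   # 2k+1 on S^2
print([harmonic_dim(k, 5) for k in range(5)])                      # grows like k^{d-2} = k^3
```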
Let $\mathrm{Proj}_k : L^2(\mathbb{S}^{d-1}) \to \mathcal{H}_k^d$ denote the orthogonal projection onto $\mathcal{H}_k^d$. The spaces $\mathcal{H}_k^d$, $k \in \mathbb{Z}_+$, are mutually orthogonal with respect to the inner product of $L^2(\mathbb{S}^{d-1})$, and every $F \in L^2(\mathbb{S}^{d-1})$ possesses a spherical harmonic expansion
$$F = \sum_{k=0}^{\infty} \mathrm{Proj}_k F = \sum_{k=0}^{\infty} \sum_{l=1}^{N(k,d)} \hat F_{k,l}\, Y_{k,l},$$
where $\{Y_{k,l}\}_{l=1}^{N(k,d)}$ forms an orthonormal basis of $\mathcal{H}_k^d$ and $\hat F_{k,l}$ denotes the Fourier coefficients of $F$, given by
$$\hat F_{k,l} = \langle F, Y_{k,l} \rangle_{L^2(\mathbb{S}^{d-1})} = \int_{\mathbb{S}^{d-1}} F(x)\, Y_{k,l}(x)\, d\mu(x).$$
Here $\mu$ denotes the normalized spherical measure on $\mathbb{S}^{d-1}$, obtained by dividing the surface measure by the surface area $\omega_d = \frac{2\pi^{d/2}}{\Gamma(d/2)}$, so that $\mu(\mathbb{S}^{d-1}) = 1$.
In addition, $\mathrm{Proj}_k F(x)$ has the integral representation
$$\mathrm{Proj}_k F(x) = \int_{\mathbb{S}^{d-1}} F(y)\, Z_k(x,y)\, d\mu(y), \qquad x \in \mathbb{S}^{d-1},$$
where
$$Z_k(x,y) = \sum_{l=1}^{N(k,d)} Y_{k,l}(x)\, Y_{k,l}(y), \qquad x, y \in \mathbb{S}^{d-1}.$$
It can easily be shown that $Z_k(x,y)$ is the reproducing kernel of $\mathcal{H}_k^d$ and is independent of the choice of $\{Y_{k,l}\}_{l=1}^{N(k,d)}$. Furthermore, with $\lambda = \frac{d-2}{2}$,
$$Z_k(x,y) = \frac{k+\lambda}{\lambda}\, C_k^{\lambda}\big(\langle x, y\rangle\big), \qquad x, y \in \mathbb{S}^{d-1},$$
where $C_k^{\lambda}(t)$ is the Gegenbauer polynomial of degree $k$ with parameter $\lambda > -\frac{1}{2}$, as discussed, for instance, in [22]. Consequently, for any $x, y \in \mathbb{S}^{d-1}$, $\frac{k+\lambda}{\lambda} C_k^{\lambda}(\langle x, y\rangle)$ serves as a reproducing kernel of $\mathcal{H}_k^d$ in the sense that
$$\int_{\mathbb{S}^{d-1}} p(y)\, \frac{k+\lambda}{\lambda}\, C_k^{\lambda}\big(\langle x, y\rangle\big)\, d\mu(y) = p(x), \qquad p \in \mathcal{H}_k^d.$$
Definition 2 
(Sobolev Space on the Sphere). Let $d \in \mathbb{N}$ with $d \ge 3$, and let $r > 0$. The Sobolev space $W^r(\mathbb{S}^{d-1})$ is defined as
$$W^r(\mathbb{S}^{d-1}) := \Big\{ F \in L^{\infty}(\mathbb{S}^{d-1}) : \big\| (-\Delta_0 + I)^{r/2} F \big\|_{L^{\infty}(\mathbb{S}^{d-1})} < \infty \Big\},$$
with the norm given by
$$\|F\|_{W^r(\mathbb{S}^{d-1})} := \big\| (-\Delta_0 + I)^{r/2} F \big\|_{L^{\infty}(\mathbb{S}^{d-1})} = \Big\| \sum_{k=0}^{\infty} (1 + \lambda_k)^{r/2} \sum_{l=1}^{N(k,d)} \hat F_{k,l}\, Y_{k,l} \Big\|_{L^{\infty}(\mathbb{S}^{d-1})}.$$

2.3. Near-Optimal Approximation and Cubature Formula

The optimal approximation of a function by polynomial spaces of varying degrees is generally nonlinear. In spherical harmonic analysis, a valuable tool for approximation is a family of linear operators $K_n$ defined as follows.
Let $\eta \in C^{\infty}([0,\infty))$ be a smooth cutoff function such that $\eta(t) = 1$ for $t \in [0,1]$, $0 \le \eta(t) \le 1$ for $t \in [1,2]$, and $\eta(t) = 0$ for $t \ge 2$. Define
$$k_n(t) = \sum_{k=0}^{2n} \eta\Big(\frac{k}{n}\Big)^2\, \frac{\lambda + k}{\lambda}\, C_k^{\lambda}(t), \qquad t \in [-1,1],$$
and introduce a sequence of linear operators $K_n$, $n \in \mathbb{Z}_+$, on the space $L^{\infty}(\mathbb{S}^{d-1})$ as follows:
$$K_n(F)(x) = \int_{\mathbb{S}^{d-1}} F(y)\, k_n\big(\langle x, y\rangle\big)\, d\mu(y), \qquad x \in \mathbb{S}^{d-1}.$$
This integral operator is bounded and yields effective approximations for functions belonging to Sobolev spaces.
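The kernel $k_n$ is straightforward to realize numerically. The sketch below (our own illustration; the particular $C^\infty$ cutoff $\eta$ is one admissible choice among many, and the kernel follows the definition given above) assembles $k_n(t)$ for $d = 3$, i.e. $\lambda = \frac{d-2}{2} = \frac{1}{2}$, using SciPy's Gegenbauer polynomials.

```python
import numpy as np
from scipy.special import eval_gegenbauer

def eta(t):
    """A standard C^infinity cutoff: eta = 1 on [0, 1], 0 on [2, inf), smooth in between."""
    t = np.asarray(t, dtype=float)
    def g(x):
        return np.where(x > 0, np.exp(-1.0 / np.maximum(x, 1e-300)), 0.0)
    return g(2.0 - t) / (g(2.0 - t) + g(t - 1.0))

def kernel_k_n(t, n, d=3):
    """k_n(t) = sum_{k=0}^{2n} eta(k/n)^2 * (k + lambda)/lambda * C_k^lambda(t),
    with lambda = (d - 2) / 2."""
    lam = (d - 2) / 2.0
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    for k in range(2 * n + 1):
        out += eta(k / n) ** 2 * (k + lam) / lam * eval_gegenbauer(k, lam, t)
    return out

t = np.linspace(-1.0, 1.0, 5)
print(kernel_k_n(t, n=8))   # a polynomial of degree at most 2n in t
```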
Proposition 1 
([19]). For $n \in \mathbb{N}$, $r > 0$ and $F \in W^r(\mathbb{S}^{d-1})$, there holds
$$\big\| F - K_n(F) \big\|_{L^{\infty}(\mathbb{S}^{d-1})} \le C_{\eta}\, 2^{d-1}\, n^{-r}\, \|F\|_{W^r(\mathbb{S}^{d-1})},$$
where $C_{\eta}$ is a constant depending only on the function $\eta$. Furthermore, with a constant $C_{\eta,d} > 0$ depending on $d$ and $\eta$,
$$\big\| K_n(F) \big\|_{L^{\infty}(\mathbb{S}^{d-1})} \le C_{\eta,d}\, \|F\|_{W^r(\mathbb{S}^{d-1})}.$$
To obtain a discrete representation of $K_n(F)$, a cubature formula is employed for integrating polynomials of degree up to $4n$ on the unit sphere $\mathbb{S}^{d-1}$, $d \ge 3$ ([23], Theorem 3.1).
(Cubature Formula) There exists a constant $C_d > 0$ depending only on $d$ such that for any $m \ge C_d n^{d-1}$, there exist positive weights $\gamma_l$ and points $y_l \in \mathbb{S}^{d-1}$, $l = 1, \ldots, m$, satisfying
$$\int_{\mathbb{S}^{d-1}} F(x)\, d\mu(x) = \sum_{l=1}^{m} \gamma_l\, F(y_l), \qquad F \in \Pi_{4n}(\mathbb{S}^{d-1}),$$
where $\Pi_{4n}(\mathbb{S}^{d-1})$ denotes the space of spherical polynomials of degree up to $4n$ on $\mathbb{S}^{d-1}$. Furthermore, for $F \in \Pi_{4n}(\mathbb{S}^{d-1})$, the following norm equivalence holds:
$$\|F\|_{L^{\infty}(\mathbb{S}^{d-1})} \approx \max_{l=1,\ldots,m}\, n^{d-1}\, \gamma_l\, |F(y_l)|,$$
where $A \approx B$ means that there exist constants $C_1, C_2 > 0$, independent of $n$ and $m$, such that $C_1 A \le B \le C_2 A$. In particular, we say that the family $\{(\gamma_l, y_l)\}_{l=1}^{m}$ forms a cubature rule of degree $4n$.
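The cubature formula above is an existence statement. For $d = 3$, one concrete classical choice, shown here purely as an illustration and not the construction of [23], is a product rule combining Gauss-Legendre nodes in $\cos\theta$ with equally spaced azimuthal angles; with enough nodes it integrates all spherical polynomials up to a prescribed degree exactly against the normalized measure $\mu$.

```python
import numpy as np

def product_cubature_on_S2(degree):
    """Positive-weight cubature on S^2, exact (up to rounding) for spherical
    polynomials of degree <= `degree`, w.r.t. the normalized measure mu."""
    n_theta = degree // 2 + 1            # Gauss-Legendre exact up to 2*n_theta - 1 >= degree
    n_phi = degree + 1                   # trapezoid in phi removes azimuthal modes 0 < |m| <= degree
    x_gl, w_gl = np.polynomial.legendre.leggauss(n_theta)   # nodes/weights in cos(theta) on [-1, 1]
    phis = 2 * np.pi * np.arange(n_phi) / n_phi
    points, weights = [], []
    for c, w in zip(x_gl, w_gl):
        s = np.sqrt(1.0 - c * c)
        for phi in phis:
            points.append([s * np.cos(phi), s * np.sin(phi), c])
            weights.append(w / (2.0 * n_phi))                # weights sum to mu(S^2) = 1
    return np.array(points), np.array(weights)

# Sanity check: F(x) = x_3^4 has degree 4 and integral 1/5 with respect to mu.
pts, wts = product_cubature_on_S2(degree=4)
approx = np.sum(wts * pts[:, 2] ** 4)
assert abs(approx - 0.2) < 1e-12
print(len(pts), approx)
```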

3. Approximating Functions with Multiple Features on Sphere

In practical deep learning applications, the input dimension d of the target function is often quite large. As d increases, the approximation rates generally worsen exponentially, leading to what is known as the curse of dimensionality. This poses significant challenges when working with high-dimensional functions.
However, when the input variables are interdependent, meaning there are inherent structures within the target functions, DCNNs can help alleviate the curse of dimensionality by making use of these underlying relationships.
Our first target is to approximate composite functions with general smooth features,
$$f \in W^{\beta}\big([-B_F, B_F]^{d'}\big), \qquad \beta \in (0,1],$$
with $B_F = \max\{\|F_\tau\|_{\infty}\}_{\tau=1}^{d'}$:
$$f(x) = f\big(F_1(x), F_2(x), \ldots, F_{d'}(x)\big), \qquad x \in \mathbb{S}^{d-1},$$
where $d' \le d$ and $\{F_\tau(x)\}_{\tau=1}^{d'} \subset W^r(\mathbb{S}^{d-1})$ with $r > 0$. Since the outer function $f$ is Lipschitz-$\beta$ continuous, its norm is given by
$$\|f\|_{W^{\beta}([-B_F,B_F]^{d'})} := \sup_{x \in [-B_F,B_F]^{d'}} |f(x)| + \sup_{x \ne y \in [-B_F,B_F]^{d'}} \frac{|f(x) - f(y)|}{|x - y|^{\beta}}.$$
Theorem 1. 
Let $2 \le s \le d$, $d \ge 3$, $d' \le d$, $N \in \mathbb{N}$, and let $\{F_\tau(x)\}_{\tau=1}^{d'}$ be smooth functions (i.e., $F_\tau \in W^r(\mathbb{S}^{d-1})$ for some $r > 0$). Define $B_F = \max\{\|F_\tau\|_{\infty}\}_{\tau=1}^{d'}$, and let $f \in W^{\beta}([-B_F, B_F]^{d'})$ with $\beta \in (0,1]$. For approximating functions of the form (20), there exists a downsampled DCNN as defined in Definition 1 (explicitly constructed as in Section 3.1), with downsampling scaling parameters $\{d, 1\}$ at layers $\{J_1, J_2\}$, respectively, filters $\{w^{(j)}\}_{j=1}^{J_2}$ of uniform length $s$, bias vectors $\{b^{(j)}\}_{j=1}^{J_2+1}$, a connection matrix $F^{(J_2+1)}$, and a coefficient vector $c^{(J_2+1)} \in \mathbb{R}^{N}$, such that
$$\big\| c^{(J_2+1)} \cdot h^{(J_2+1)} - f \big\|_{L^{\infty}(\mathbb{S}^{d-1})} \le c_1\, \|f\|_{W^{\beta}}\, N^{-\frac{\beta}{d'}},$$
where $c_1 = c_1(d, d', \beta, F_\tau, \eta)$ and $\eta \in C^{\infty}([0,\infty))$ is the cutoff function defined in Section 2.3.

3.1. Proof of Theorem 1

Before the proof, we first introduce two lemmas for approximating the inner and outer functions.
Define the kernel
$$l_n(t) = \sum_{k=0}^{2n} \eta\Big(\frac{k}{n}\Big)\, \frac{\lambda + k}{\lambda}\, C_k^{\lambda}(t), \qquad t \in [-1,1],$$
which is a polynomial of degree $2n$. According to [19], combined with Proposition 1 and the cubature formula, we have
$$\big\| F_\tau - K_n(F_\tau) \big\|_{L^{\infty}(\mathbb{S}^{d-1})} = \Big\| F_\tau - \sum_{l=1}^{m} \gamma_l\, L_n(F_\tau)(y_l)\, l_n\big(\langle y_l, \cdot\rangle\big) \Big\|_{L^{\infty}(\mathbb{S}^{d-1})} \le C_{\eta}\, 2^{d-1}\, n^{-r}\, \|F_\tau\|_{W^r(\mathbb{S}^{d-1})},$$
where
$$L_n(F_\tau)(y_l) = \int_{\mathbb{S}^{d-1}} F_\tau(z)\, l_n\big(\langle y_l, z\rangle\big)\, d\mu(z),$$
and $\{(\gamma_l, y_l)\}_{l=1}^{m}$ forms a cubature rule of degree $4n$.
For $N_1 \in \mathbb{N}$, let $t = \{t_i\}_{i=1}^{2N_1+3}$ be the uniform mesh on $[-1 - \frac{1}{N_1},\, 1 + \frac{1}{N_1}]$ with $t_i = -1 + \frac{i-2}{N_1}$. Construct a linear operator $I_t$ on $C[-1,1]$ by
$$I_t(l_n)(u) = \sum_{i=2}^{2N_1+2} l_n(t_i)\, \delta_i(u), \qquad u \in [-1,1],\ l_n \in C[-1,1],$$
where $\delta_i \in C(\mathbb{R})$, $i = 2, \ldots, 2N_1+2$, is given by
$$\delta_i(u) = N_1 \big( \sigma(u - t_{i-1}) - 2\sigma(u - t_i) + \sigma(u - t_{i+1}) \big).$$
To facilitate the analysis of the number of free parameters, we define a linear operator $L_{N_1} : \mathbb{R}^{2N_1+1} \to \mathbb{R}^{2N_1+3}$, which acts on $\zeta = (\zeta_i)_{i=2}^{2N_1+2} \in \mathbb{R}^{2N_1+1}$ as follows:
$$\big( L_{N_1}(\zeta) \big)_i = \begin{cases} \zeta_2, & i = 1,\\ \zeta_3 - 2\zeta_2, & i = 2,\\ \zeta_{i-1} - 2\zeta_i + \zeta_{i+1}, & 3 \le i \le 2N_1+1,\\ \zeta_{2N_1+1} - 2\zeta_{2N_1+2}, & i = 2N_1+2,\\ \zeta_{2N_1+2}, & i = 2N_1+3. \end{cases}$$
This operator $L_{N_1}$ allows the approximation operator $I_t$ on $C([-1,1])$ to be represented in terms of $\{\sigma(\cdot - t_i)\}_{i=1}^{2N_1+3}$ as
$$I_t(l_n) = N_1 \sum_{i=1}^{2N_1+3} \Big( L_{N_1}\big( \{l_n(t_k)\}_{k=2}^{2N_1+2} \big) \Big)_i\, \sigma(\cdot - t_i).$$
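The operator $I_t$ is a piecewise-linear quasi-interpolant built from ReLU hat functions. The sketch below (our own sanity check, with a generic continuous function $g$ standing in for $l_n$) verifies that $\sum_i g(t_i)\,\delta_i(u)$ and the $L_{N_1}$-based ReLU expansion above produce the same function.

```python
import numpy as np

def relu(u):
    return np.maximum(u, 0.0)

N1 = 5
t = np.array([-1.0 + (i - 2) / N1 for i in range(1, 2 * N1 + 4)])   # t_1, ..., t_{2N1+3}

def delta(i, u):
    """delta_i(u) = N1 * (relu(u - t_{i-1}) - 2 relu(u - t_i) + relu(u - t_{i+1}))."""
    return N1 * (relu(u - t[i - 2]) - 2 * relu(u - t[i - 1]) + relu(u - t[i]))

def I_t(g, u):
    """I_t(g)(u) = sum_{i=2}^{2N1+2} g(t_i) delta_i(u)."""
    return sum(g(t[i - 1]) * delta(i, u) for i in range(2, 2 * N1 + 3))

def L_N1(zeta):
    """zeta = (zeta_2, ..., zeta_{2N1+2}); returns the 2N1+3 ReLU coefficients."""
    z = {i: zeta[i - 2] for i in range(2, 2 * N1 + 3)}               # 1-based lookup
    out = [z[2], z[3] - 2 * z[2]]
    out += [z[i - 1] - 2 * z[i] + z[i + 1] for i in range(3, 2 * N1 + 2)]
    out += [z[2 * N1 + 1] - 2 * z[2 * N1 + 2], z[2 * N1 + 2]]
    return np.array(out)

g = np.cos                                                            # any continuous function on [-1, 1]
coeffs = L_N1(np.array([g(t[i - 1]) for i in range(2, 2 * N1 + 3)]))
u = np.linspace(-1.0, 1.0, 201)
relu_form = N1 * sum(coeffs[i - 1] * relu(u - t[i - 1]) for i in range(1, 2 * N1 + 4))
assert np.allclose(I_t(g, u), relu_form)
print(np.max(np.abs(I_t(g, u) - g(u))))   # small: piecewise-linear interpolation error
```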
By [19,24], we have
$$\Big\| \sum_{l=1}^{m} \gamma_l\, L_n(F_\tau)(y_l)\, l_n\big(\langle y_l, \cdot\rangle\big) - \sum_{l=1}^{m} \gamma_l\, L_n(F_\tau)(y_l)\, I_t(l_n)\big(\langle y_l, \cdot\rangle\big) \Big\|_{\infty} \le C\, 3^{d+2}\, n^{d+3}\, C_1\, C_{\eta,d}\, N_1^{-2}\, \|F_\tau\|_{W^r},$$
where $C > 0$ is a constant. Combining this with (23) and (27), we obtain the following lemma:
Lemma 1. 
Let $d \ge 3$, $r > 0$, $n \in \mathbb{N}$, $N_1 \in \mathbb{N}$. There exist $\gamma_l \in \mathbb{R}$ and $y_l \in \mathbb{S}^{d-1}$, $l = 1, 2, \ldots, m$, such that for any $F_\tau \in W^r(\mathbb{S}^{d-1})$ ($\tau = 1, 2, \ldots, d'$),
$$\Big\| F_\tau - \sum_{l=1}^{m} \gamma_l\, L_n(F_\tau)(y_l)\, I_t(l_n)\big(\langle y_l, \cdot\rangle\big) \Big\|_{\infty} \le \tilde c\, \|F_\tau\|_{W^r(\mathbb{S}^{d-1})}\, \max\big\{ n^{-r},\ n^{d+3} N_1^{-2} \big\},$$
where $\tilde c = C_{\eta}\, 2^{d-1} + C \cdot 3^{d+2}\, C_1\, C_{\eta,d}$.
For approximating the outer function $f$ of the form (20), we use a result on approximation by shallow sigmoidal neural networks with scaling [35].
Lemma 2 
([35]). Let $d', N \in \mathbb{N}$, $\beta > 0$. For any $f \in W^{\beta}([-B_F, B_F]^{d'})$, there exist $\{\hat\alpha_k\}_{k=1}^{N} \subset \mathbb{R}^{d'}$, $\{\hat b_k\}_{k=1}^{N} \subset \mathbb{R}$, $\{\hat c_k\}_{k=1}^{N} \subset \mathbb{R}$, and a constant $C_{d',\beta}$ depending on $d'$ and $\beta$, such that
$$\sup_{z \in [-B_F, B_F]^{d'}} \Big| f(z) - \sum_{k=1}^{N} \hat c_k\, \hat\sigma\big( \hat\alpha_k \cdot z - \hat b_k \big) \Big| \le B_F^{\beta}\, C_{d',\beta}\, \|f\|_{W^{\beta}}\, N^{-\frac{\beta}{d'}},$$
where $\hat\sigma$ is an activation function of sigmoidal type.
Proof 
(Proof of Theorem 1).
For approximating target functions of the form (20), we construct a DCNN consisting of a series of convolutional layers and a final fully connected layer. The convolutional transformations are represented by matrices $\{ T^{(j)} := T^{w^{(j)}} \in \mathbb{R}^{(d_{j-1}+s) \times d_{j-1}} \}_{j=1}^{J_2}$, each induced by a corresponding filter $w^{(j)}$, $j = 1, \ldots, J_2$, supported in $\{0, 1, \ldots, s\}$. Associated with these layers are bias vectors $\{ b^{(j)} \in \mathbb{R}^{d_{j-1}+s} \}_{j=1}^{J_2}$. The network concludes with a fully connected layer $h^{(J_2+1)} : \mathbb{R}^{d_{J_2}} \to \mathbb{R}^{N}$ defined by a connection matrix $F^{(J_2+1)} \in \mathbb{R}^{N \times d_{J_2}}$ and a bias vector $b^{(J_2+1)} \in \mathbb{R}^{N}$, expressed as
$$h^{(J_2+1)}(x) = \hat\sigma\big( F^{(J_2+1)} h^{(J_2)}(x) - b^{(J_2+1)} \big),$$
where $\hat\sigma$ is an activation function of sigmoidal type satisfying the assumptions
$$\hat\sigma^{(i)}(u) \ge 0, \qquad u \in \mathbb{R},\ i \in \mathbb{Z}_+,$$
and, for some integer $q \ge 1$, $\lim_{u \to -\infty} \frac{\hat\sigma(u)}{|u|^{q}} = 0$ and $\lim_{u \to +\infty} \frac{\hat\sigma(u)}{u^{q}} = 1$.
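One standard example satisfying these assumptions (our choice for illustration; the paper does not prescribe a specific $\hat\sigma$) is $\hat\sigma(u) = \max(u,0)^q$: its derivatives are nonnegative wherever they exist, $\hat\sigma(u)/|u|^q \to 0$ as $u \to -\infty$, and $\hat\sigma(u)/u^q \to 1$ as $u \to +\infty$.

```python
import numpy as np

def sigma_hat(u, q=2):
    """A sigmoidal-type activation of order q: max(u, 0)^q."""
    return np.maximum(u, 0.0) ** q

u = np.array([-1e6, -1.0, 0.0, 1.0, 1e6])
print(sigma_hat(u) / np.maximum(np.abs(u), 1.0) ** 2)   # -> 0 on the left tail, -> 1 on the right
```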
The associated hypothesis space $\mathcal{H}$, used for learning and function approximation, consists of all output functions generated by the filter sequence $\mathbf{w} = \{w^{(j)}\}_{j=1}^{J_2}$, the connection matrix $F^{(J_2+1)}$, the bias sequence $\mathbf{b} = \{b^{(j)}\}_{j=1}^{J_2+1}$, and a coefficient vector $c^{(J_2+1)} \in \mathbb{R}^{N}$:
$$\mathcal{H} := \Big\{ c^{(J_2+1)} \cdot h^{(J_2+1)}(x) :\ \mathbf{w},\ \mathbf{b},\ F^{(J_2+1)},\ c^{(J_2+1)} \Big\}.$$
We design the first group of convolutional layers to extract linear features by employing a convolutional factorization approach [8,10], which facilitates the construction of ridge functions for approximating the smooth components $\{F_\tau(x)\}_{\tau=1}^{d'}$. Let $s \ge 2$ and consider a sequence $U = (U_k)_{k=-\infty}^{\infty}$ supported on the index set $\{0, \ldots, M\}$ with $M \ge 0$. Then there exists a finite sequence of filters $\{w^{(j)}\}_{j=1}^{p}$, each supported in $\{0, \ldots, s\}$, with $p \le \lceil \frac{M}{s-1} \rceil$, such that the following convolutional factorization is satisfied:
$$U = w^{(p)} * w^{(p-1)} * \cdots * w^{(2)} * w^{(1)}.$$
For $m \in \mathbb{N}$ and $y = \{y_1, \ldots, y_m\} \subset \mathbb{S}^{d-1}$, we define $U$ as the sequence supported on $\{0, \ldots, md-1\}$ given by $U_{(l-1)d + (d-k)} = (y_l)_k$ for $l \in \{1, \ldots, m\}$ and $k \in \{1, \ldots, d\}$. Let $M = md - 1$ and $p \le \lceil \frac{M}{s-1} \rceil$. Then, for any $J_1 \ge \lceil \frac{M}{s-1} \rceil$, there exists a sequence of filters $w = \{w^{(j)}\}_{j=1}^{J_1}$ supported on $\{0, \ldots, s\}$ that satisfies the convolutional factorization
$$U = w^{(J_1)} * w^{(J_1-1)} * \cdots * w^{(2)} * w^{(1)}.$$
Here, for $j = p+1, \ldots, J_1$, we take $w^{(j)}$ to be the delta sequence $\{1, 0, \ldots, 0\}$. Consequently,
$$T^{(J_1)} \cdots T^{(2)} T^{(1)} = T^{(J_1,1)} = \big( U_{i-k} \big)_{i=1,\ldots,d+J_1 s,\; k=1,\ldots,d} \in \mathbb{R}^{(d + J_1 s) \times d},$$
where $T^{(j)}$ denotes the Toeplitz matrix in (6).
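The identity above is a finite-dimensional restatement of the convolutional factorization: the product of the per-layer Toeplitz matrices equals the Toeplitz matrix of the convolved filter. A quick numerical check (ours, with random filters) follows.

```python
import numpy as np

def toeplitz_conv_matrix(w, d_in):
    """T[i, k] = w[i - k] for a filter w supported on {0, ..., len(w)-1}."""
    s = len(w) - 1
    T = np.zeros((d_in + s, d_in))
    for i in range(d_in + s):
        for k in range(d_in):
            if 0 <= i - k <= s:
                T[i, k] = w[i - k]
    return T

rng = np.random.default_rng(2)
d, s, J1 = 4, 2, 3
filters = [rng.standard_normal(s + 1) for _ in range(J1)]

# Left side: product of the per-layer Toeplitz matrices T^(J1) ... T^(1).
prod = np.eye(d)
width = d
for w in filters:
    prod = toeplitz_conv_matrix(w, width) @ prod
    width += s

# Right side: Toeplitz matrix of U = w^(J1) * ... * w^(1), supported on {0, ..., J1*s}.
U = filters[0]
for w in filters[1:]:
    U = np.convolve(w, U)
assert np.allclose(prod, toeplitz_conv_matrix(U, d))
print(prod.shape)   # (d + J1*s, d)
```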
The next step is to construct the bias vectors of the network. Define
$$\|w\|_1 = \sum_{k=-\infty}^{\infty} |w_k|.$$
For the first layer, the bias is set as
$$b^{(1)} = -\big\| w^{(1)} \big\|_1\, \mathbf{1}_{d_1},$$
and for layers $j = 2, \ldots, J_1$,
$$b^{(j)} = \Big( \prod_{p=1}^{j-1} \big\| w^{(p)} \big\|_1 \Big)\, T^{(j)}\, \mathbf{1}_{d_{j-1}} - \Big( \prod_{p=1}^{j} \big\| w^{(p)} \big\|_1 \Big)\, \mathbf{1}_{d_j},$$
where $\mathbf{1}_{d_j}$ denotes the all-ones vector in $\mathbb{R}^{d_j}$. These bias vectors satisfy the restriction
$$b^{(j)}_{s+1} = \cdots = b^{(j)}_{d_j - s}.$$
Note that $\|x\|_{\infty} \le 1$ for $x \in \mathbb{S}^{d-1}$. Let $\|h\|_{\infty} = \max\{ \|h_j\|_{\infty} : j = 1, \ldots, q \}$ for a vector of functions $h : \mathbb{S}^{d-1} \to \mathbb{R}^{q}$. It is known that for $h : \mathbb{S}^{d-1} \to \mathbb{R}^{d_{j-1}}$,
$$\big\| T^{(j)} h \big\|_{\infty} \le \big\| w^{(j)} \big\|_1\, \|h\|_{\infty}.$$
Therefore, the components of $h^{(J_1)}(x)$ satisfy
$$\big( h^{(J_1)}(x) \big)_{ld} = \langle y_l, x \rangle + B^{(J_1)}, \qquad l = 1, \ldots, m,$$
where $B^{(J_1)} = \prod_{p=1}^{J_1} \|w^{(p)}\|_1$. We then have
$$\mathcal{D}_d\big( h^{(J_1)}(x) \big) = \big( \langle y_1, x\rangle,\ \ldots,\ \langle y_m, x\rangle,\ 0,\ \ldots,\ 0 \big)^{T} + B^{(J_1)}\, \mathbf{1}_{\lfloor (d + J_1 s)/d \rfloor}.$$
Since $J_1 \ge \frac{md-1}{s-1}$, we obtain
$$\frac{d + J_1 s}{d} \ge 1 + \frac{md-1}{d} \cdot \frac{s}{s-1} > 1 + \frac{md-1}{d} \ge m.$$
Therefore, $d_{J_1} = \lfloor (d + J_1 s)/d \rfloor \ge m$. To satisfy the constraint $J_1 \ge \frac{md}{s-1}$ for the DCNN and $m = (C_d + 1)\, n^{d-1}$ for the cubature formula, we select $n = \Big\lfloor \Big( \frac{(s-1) J_1}{(C_d + 1)\, d} \Big)^{\frac{1}{d-1}} \Big\rfloor$.
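The role of the first group of layers can be checked directly: with $U_{(l-1)d + (d-k)} = (y_l)_k$, row $ld$ of the Toeplitz matrix of $U$ applied to $x$ equals $y_l \cdot x$, and downsampling by $d$ collects exactly these entries. The sketch below (our own verification) checks only this linear part, omitting the bias shift $B^{(J_1)}$ and the ReLU.

```python
import numpy as np

def toeplitz_conv_matrix(w, d_in):
    s = len(w) - 1
    T = np.zeros((d_in + s, d_in))
    for i in range(d_in + s):
        for k in range(d_in):
            if 0 <= i - k <= s:
                T[i, k] = w[i - k]
    return T

rng = np.random.default_rng(3)
d, m = 5, 4
Y = rng.standard_normal((m, d))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)   # y_1, ..., y_m on S^{d-1}
x = rng.standard_normal(d)
x /= np.linalg.norm(x)

# U is supported on {0, ..., md-1} with U_{(l-1)d + (d-k)} = (y_l)_k.
U = np.zeros(m * d)
for l in range(1, m + 1):
    for k in range(1, d + 1):
        U[(l - 1) * d + (d - k)] = Y[l - 1, k - 1]

out = toeplitz_conv_matrix(U, d) @ x            # linear part of the first group
picked = out[d - 1::d][:m]                      # downsampling by d picks rows d, 2d, ..., md
assert np.allclose(picked, Y @ x)               # these rows equal <y_l, x>
print(picked)
```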
Considering that $|y_l \cdot x| \le 1$, there exist two groups of convolutional layers with filters $\{w^{(j)}\}_{j=1}^{J_2}$ of uniform length $s$ and biases $\{b^{(j)}\}_{j=1}^{J_2}$ satisfying the restriction (7) for $j \ne J_2$. These layers are downsampled at the $J_1$-th layer by a scaling factor $d$, ensuring that the output of the $J_2$-th layer, denoted $h^{(J_2)}(x) \in \mathbb{R}^{d_{J_2}}$, is given by
$$\big( h^{(J_2)}(x) \big)_{(i-1) d_{J_1} + l} = \begin{cases} \sigma\big( y_l \cdot x - t_i \big), & \text{if } 1 \le l \le m,\ 1 \le i \le 2N_1+3,\\ 0, & \text{otherwise}, \end{cases}$$
where $J_1 \ge \frac{md-1}{s-1}$ and $J_2 - J_1 \ge \frac{(2N_1+3)\, d_{J_1}}{s-1}$. Then $d_{J_1} = \lfloor (d + J_1 s)/d \rfloor \ge m$ and $d_{J_2} = d_{J_1} + (J_2 - J_1) s > (2N_1+3)\, d_{J_1} \ge (2N_1+2)\, d_{J_1} + m$. Therefore, we choose the bias vector $b^{(J_2)}$ as
$$\big( b^{(J_2)} \big)_k = \begin{cases} \Big( B^{(J_1)} \prod_{p=J_1+1}^{J_2-1} \big\| w^{(p)} \big\|_1\, T^{(J_2)} \mathbf{1}_{d_{J_2-1}} \Big)_k + t_i, & \text{if } (i-1) d_{J_1} + 1 \le k \le (i-1) d_{J_1} + m,\ 1 \le i \le 2N_1+3,\\ B^{(J_1)}, & \text{otherwise}. \end{cases}$$
Finally, we can express $h^{(J_2)}(x)$ as
$$h^{(J_2)}(x) = \big( H_1^{T},\ H_2^{T},\ \ldots,\ H_{2N_1+3}^{T},\ 0,\ \ldots,\ 0 \big)^{T} \in \mathbb{R}^{d_{J_2}},$$
where
$$H_i^{T} = \big( \sigma(y_1 \cdot x - t_i),\ \sigma(y_2 \cdot x - t_i),\ \ldots,\ \sigma(y_m \cdot x - t_i),\ 0,\ \ldots,\ 0 \big) \in \mathbb{R}^{d_{J_1}}, \qquad i = 1, 2, \ldots, 2N_1+3.$$
We define $v_i = \Big( L_{N_1}\big( \{ l_n(t_k) \}_{k=2}^{2N_1+2} \big) \Big)_i$ for $i = 1, 2, \ldots, 2N_1+3$. Additionally, denote by $O$ the $m \times (d_{J_1} - m)$ zero matrix and by $\hat O$ the $m \times \big( d_{J_2} - (2N_1+2) d_{J_1} - m \big)$ zero matrix. Then we set
$$F^{(N_1)} = N_1 \big( v_1 I_m\ \ O\ \ v_2 I_m\ \ O\ \ \cdots\ \ v_{2N_1+3} I_m\ \ \hat O \big) \in \mathbb{R}^{m \times d_{J_2}}.$$
Next, to obtain $\sum_{l=1}^{m} \gamma_l\, L_n(F_\tau)(y_l)\, I_t(l_n)\big(\langle y_l, \cdot\rangle\big)$ for $\tau = 1, 2, \ldots, d'$, in view of (27) we define
$$F^{(\gamma)} = \begin{pmatrix} \gamma_1 L_n(F_1)(y_1) & \gamma_2 L_n(F_1)(y_2) & \cdots & \gamma_m L_n(F_1)(y_m)\\ \gamma_1 L_n(F_2)(y_1) & \gamma_2 L_n(F_2)(y_2) & \cdots & \gamma_m L_n(F_2)(y_m)\\ \vdots & \vdots & \ddots & \vdots\\ \gamma_1 L_n(F_{d'})(y_1) & \gamma_2 L_n(F_{d'})(y_2) & \cdots & \gamma_m L_n(F_{d'})(y_m) \end{pmatrix} \in \mathbb{R}^{d' \times m}.$$
Define the matrix $F^{(\hat\alpha)} \in \mathbb{R}^{N_2 \times d'}$, whose rows are $\hat\alpha_1^{T}, \ldots, \hat\alpha_{N_2}^{T}$, and set the connection matrix of the fully connected layer to
$$F^{(J_2+1)} = F^{(\hat\alpha)}\, F^{(\gamma)}\, F^{(N_1)} = \begin{pmatrix} \hat\alpha_1^{T}\\ \vdots\\ \hat\alpha_{N_2}^{T} \end{pmatrix} F^{(\gamma)}\, F^{(N_1)},$$
where $\hat\alpha_k \in \mathbb{R}^{d'}$ ($k = 1, 2, \ldots, N_2$). Next, select the bias vector $b^{(J_2+1)} \in \mathbb{R}^{N_2}$ as $b^{(J_2+1)} = (\hat b_1, \hat b_2, \ldots, \hat b_{N_2})^{T}$, and define the coefficient vector $c^{(J_2+1)} \in \mathbb{R}^{N_2}$ as $c^{(J_2+1)} = (\hat c_1, \hat c_2, \ldots, \hat c_{N_2})^{T}$.
We refer to N and s as structural parameters because the architecture of the DCNNs is entirely defined by these two parameters. The remaining parameters are termed training parameters, as they can be trained or selected once s and N are specified.
The error rate for smooth features was established in Lemma 1. Setting $n = N_1^{\frac{2}{d+3+r}}$, which balances the two terms in the maximum, we obtain
$$\Big\| F_\tau - \sum_{l=1}^{m} \gamma_l\, L_n(F_\tau)(y_l)\, I_t(l_n)\big(\langle y_l, \cdot\rangle\big) \Big\|_{\infty} = \big\| F_\tau - \hat F_\tau \big\|_{\infty} \le \tilde c\, \|F_\tau\|_{W^r(\mathbb{S}^{d-1})}\, N_1^{-\frac{2r}{d+3+r}}.$$
Denote $S(x) = \big( F_1(x), F_2(x), \ldots, F_{d'}(x) \big)$ and $\hat S(x) = \big( \hat F_1(x), \hat F_2(x), \ldots, \hat F_{d'}(x) \big)$. We have
$$\big| f(S(x)) - f(\hat S(x)) \big| \le \|f\|_{W^{\beta}} \Big( \tilde c\, d'\, \max\big\{ \|F_\tau\|_{W^r(\mathbb{S}^{d-1})} \big\}_{\tau=1}^{d'} \Big)^{\beta} N_1^{-\frac{2r\beta}{d+3+r}},$$
where $\tilde c = C_{\eta}\, 2^{d-1} + C \cdot 3^{d+2}\, C_1\, C_{\eta,d}$.
Using (46) and Lemma 2, the resulting error rates can be derived as follows. Taking $N_2 = N$ and $N_1 = N^{\frac{d+3+r}{2rd'}}$ (so that $N_1^{-\frac{2r\beta}{d+3+r}} = N^{-\frac{\beta}{d'}}$), we have
$$\begin{aligned} \big| c^{(J_2+1)} \cdot h^{(J_2+1)}(x) - f(S(x)) \big| &\le \big| c^{(J_2+1)} \cdot h^{(J_2+1)}(x) - f(\hat S(x)) \big| + \big| f(\hat S(x)) - f(S(x)) \big|\\ &\le B_F^{\beta}\, C_{d',\beta}\, \|f\|_{W^{\beta}}\, N^{-\frac{\beta}{d'}} + \|f\|_{W^{\beta}} \Big( \tilde c\, d'\, \max\big\{ \|F_\tau\|_{W^r(\mathbb{S}^{d-1})} \big\}_{\tau=1}^{d'} \Big)^{\beta} N_1^{-\frac{2r\beta}{d+3+r}}\\ &\le c_1\, \|f\|_{W^{\beta}}\, N^{-\frac{\beta}{d'}}, \end{aligned}$$
where $c_1 = C_{d',\beta}\, B_F^{\beta} + \Big( \tilde c\, d'\, \max\big\{ \|F_\tau\|_{W^r(\mathbb{S}^{d-1})} \big\}_{\tau=1}^{d'} \Big)^{\beta}$. □
Remark 1. 
Since $N_1 = N^{\frac{d+3+r}{2rd'}}$, the exponent depends on the input dimension $d$. For approximating functions in $W^{\beta}(\mathbb{S}^{d-1})$, the smoothness index $r$ of the inner functions and the number of features $d'$ should satisfy
$$\frac{d+3+r}{2r}\, d' < d,$$
so that the curse of dimensionality is reduced.
We give a comparison with [10] in Table 1. Assuming the smoothness index $\beta$ of the outer function is the same in both settings, our error rate is much faster when the index $r$ of the inner functions and the number of features $d'$ satisfy $\frac{d+3+r}{2r} d' < d$ ($d \ge 3$).
Theorem 1 concerns the approximation of composite functions in Sobolev spaces on the unit sphere, $f \in W^{\beta}(\mathbb{S}^{d-1})$ with $\beta \in (0,1]$. In contrast, related previous studies, such as the approximation of composite functions in $W^{\beta}([0,1]^d)$ with $\beta \in (0,1]$, were discussed in [20]. When $d' = d$ and $\beta = 1$, meaning that there are $d$ general smooth features on the sphere, corresponding to approximating functions on the $d$-dimensional cube [11], the following corollary is obtained. It illustrates that the DCNNs we construct can also approximate or represent functions on a cube.
Corollary 1. 
Let $2 \le s \le d$, $d \ge 3$, $N \in \mathbb{N}$. For approximating functions $f \in W^{1}([0,1]^d)$, there exists a downsampled DCNN as defined in Definition 1 (explicitly constructed as in Section 3.1), with downsampling scaling parameters $\{d, 1\}$ at layers $\{J_1, J_2\}$, respectively, filters $\{w^{(j)}\}_{j=1}^{J_2}$ of uniform length $s$, bias vectors $\{b^{(j)}\}_{j=1}^{J_2+1}$, a connection matrix $F^{(J_2+1)}$, and a coefficient vector $c^{(J_2+1)} \in \mathbb{R}^{N}$, such that
$$\big\| c^{(J_2+1)} \cdot h^{(J_2+1)} - f \big\|_{L^{\infty}([0,1]^d)} \le c_1\, \|f\|_{W^{1}}\, N^{-\frac{1}{d}},$$
where $c_1 = c_1(d, \eta)$ and $\eta \in C^{\infty}([0,\infty))$ is defined in Section 2.3.

3.2. Approximating Functions with Polynomial Features

The approximation of functions with polynomial features is also discussed in [20], where the domain is taken to be $[0,1]^d$. In Theorem 1, we derived error bounds for functions with smooth features defined on the unit sphere, and in Section 2.2 we introduced the reproducing kernel $\frac{k+\lambda}{\lambda} C_k^{\lambda}(\langle x, y\rangle)$ associated with the space $\mathcal{H}_k^d$.
To reduce the curse of dimensionality, we leverage the reproducing kernel to approximate functions with polynomial features on the unit sphere. Specifically, we consider functions of the form
$$f(x) = f\big( P_{1,q_1}(x), P_{2,q_2}(x), \ldots, P_{d',q_{d'}}(x) \big) = f\big( P(x) \big), \qquad x \in \mathbb{S}^{d-1},$$
where $\{P_{\tau,q_\tau}(x)\}_{\tau=1}^{d'}$ are spherical polynomials of degree $q_\tau$.
Define $B_P = \max\{\|P_{\tau,q_\tau}\|_{\infty}\}_{\tau=1}^{d'}$ and assume further that $f \in W^{\beta}([-B_P, B_P]^{d'})$ with $\beta \in (0,1]$. Recall the polynomial kernel $k_n(t)$ introduced in Section 2.3 and suppose $\max\{q_\tau\}_{\tau=1}^{d'} \le 2n$. We then have the following theorem:
Theorem 2. 
Let $2 \le s \le d$, $d \ge 3$, $d' \le d$, $N \in \mathbb{N}$, $n \in \mathbb{N}$. Let $\{P_{\tau,q_\tau}(x)\}_{\tau=1}^{d'}$ be spherical polynomials of degree $q_\tau$ satisfying $\max\{q_\tau\}_{\tau=1}^{d'} \le 2n$, let $B_P = \max\{\|P_{\tau,q_\tau}\|_{\infty}\}_{\tau=1}^{d'}$, and suppose $f \in W^{\beta}([-B_P, B_P]^{d'})$ with $\beta \in (0,1]$. For approximating functions of the form (47), there exists a downsampled DCNN, as defined in Definition 1 (explicitly constructed in Section 3.1), with downsampling scaling parameters $\{d, 1\}$ applied at layers $\{J_1, J_2\}$, respectively. The network uses filters $\{w^{(j)}\}_{j=1}^{J_2}$ of uniform length $s$, bias vectors $\{b^{(j)}\}_{j=1}^{J_2+1}$, a connection matrix $F^{(J_2+1)}$, and a coefficient vector $c^{(J_2+1)} \in \mathbb{R}^{N}$, such that
$$\big\| c^{(J_2+1)} \cdot h^{(J_2+1)} - f \big\|_{L^{\infty}(\mathbb{S}^{d-1})} \le c_2\, \|f\|_{W^{\beta}}\, n^{(d+3)\beta}\, N^{-\frac{\beta}{d'}},$$
where $c_2 = c_2(d, d', \beta, B_P, n, \eta)$ and $\eta \in C^{\infty}([0,\infty))$ is defined in Section 2.3.
Proof. 
Let $P_{\tau,q_\tau}(x)$ be a spherical polynomial of degree $q_\tau$, where $q_\tau \le 2n$. According to (15), we have
$$P_{\tau,q_\tau}(x) = \int_{\mathbb{S}^{d-1}} P_{\tau,q_\tau}(y)\, k_n\big(\langle x, y\rangle\big)\, d\mu(y).$$
Applying (18) and (19) with $m = (C_d + 1)\, n^{d-1}$, and noting that $P_{\tau,q_\tau}(y)\, k_n(\langle x, y\rangle)$ is a spherical polynomial in $y$ of degree at most $4n$, there exist nodes $y_1, y_2, \ldots, y_m \in \mathbb{S}^{d-1}$ and positive weights $\gamma_1, \gamma_2, \ldots, \gamma_m > 0$ such that
$$P_{\tau,q_\tau}(x) = \sum_{l=1}^{m} \gamma_l\, P_{\tau,q_\tau}(y_l)\, k_n\big(\langle x, y_l\rangle\big), \qquad x \in \mathbb{S}^{d-1}.$$
Similar to the proof of Theorem 1, we derive the following error rates by selecting $N_1 = N_2 = N$:
$$\begin{aligned} \big| c^{(J_2+1)} \cdot h^{(J_2+1)}(x) - f(P(x)) \big| &\le \big| c^{(J_2+1)} \cdot h^{(J_2+1)}(x) - f(\hat P(x)) \big| + \big| f(\hat P(x)) - f(P(x)) \big|\\ &\le B_P^{\beta}\, C_{d',\beta}\, \|f\|_{W^{\beta}}\, N_2^{-\frac{\beta}{d'}} + \|f\|_{W^{\beta}}\, \big( c\, B_P\, d' \big)^{\beta}\, n^{(d+3)\beta}\, N_1^{-2\beta}\\ &\le c_2\, \|f\|_{W^{\beta}}\, n^{(d+3)\beta}\, N^{-\frac{\beta}{d'}}, \end{aligned}$$
where $c_2 = B_P^{\beta}\, C_{d',\beta} + (c\, B_P\, d')^{\beta}$ and $c = C \cdot 3^{d+2}\, C_1\, C_{\eta,d}$. □
Remark 2. 
Extracting spherical polynomial features also yields rates for approximating functions in $W^{1}([0,1]^d)$ when taking $d' = d$ and $\beta = 1$.
Remark 3. 
Compared with [20], our spherical polynomial features are not restricted to have the same degree; we only require the degrees to be at most $2n$.
We provide a comparison with [10] in Table 2. Clearly, for the same index $\beta$, our results provide faster error rates since $d'$ can be much smaller than $d$ ($d \ge 3$). Unlike the case of extracting general smooth features (Theorem 1), the smoothness index $r$ of the inner functions and the number of features $d'$ need not be restricted to a specific range.

3.3. Approximating Functions with Symmetric Polynomial Features

Even though $d'$ is much smaller than $d$, in practical machine learning applications $d'$ may still be large. To further improve the approximation rates, we consider polynomial features $\{P_{\tau,q_\tau}(x)\}_{\tau=1}^{d'}$ in (47) that exhibit symmetric structures, with degree $n$. We denote these features by $\{Q_{\tau,n}(x)\}_{\tau=1}^{d'}$, where $Q_{\tau,n}(x) = Q_{\tau,n}(x_1, x_2, \ldots, x_d)$ and $x \in \mathbb{S}^{d-1}$. For any permutation in the symmetric group, $Q_{\tau,n}$ satisfies
$$Q_{\tau,n}(x_1, \ldots, x_i, \ldots, x_j, \ldots, x_d) = Q_{\tau,n}(x_1, \ldots, x_j, \ldots, x_i, \ldots, x_d).$$
Therefore, the target function is of the form
$$f(x) = f\big( Q_{1,n}(x), Q_{2,n}(x), \ldots, Q_{d',n}(x) \big), \qquad x \in \mathbb{S}^{d-1},$$
and $f \in W^{\beta}([-B_Q, B_Q]^{d'})$ ($\beta \in (0,1]$) with $B_Q = \max\{\|Q_{\tau,n}\|_{\infty}\}_{\tau=1}^{d'}$.
Theorem 3. 
Let $2 \le s \le d$, $d \ge 3$, $n < d' \le d$, $N \in \mathbb{N}$, $n \in \mathbb{N}$, let $\{Q_{\tau,n}(x)\}_{\tau=1}^{d'}$ be symmetric spherical polynomials of degree $n$, and let $f \in W^{\beta}([-B_Q, B_Q]^{d'})$ with $B_Q = \max\{\|Q_{\tau,n}\|_{\infty}\}_{\tau=1}^{d'}$ and $\beta \in (0,1]$. To approximate functions of the form (49), one can construct a downsampled DCNN as described in Definition 1 (explicitly defined in Section 3.1), employing downsampling scaling parameters $\{d, 1\}$ at layers $\{J_1, J_2\}$, respectively. The network uses filters $\{w^{(j)}\}_{j=1}^{J_2}$ of uniform length $s$, bias vectors $\{b^{(j)}\}_{j=1}^{J_2+1}$, a connection matrix $F^{(J_2+1)}$, and a coefficient vector $c^{(J_2+1)} \in \mathbb{R}^{N}$, such that
$$\big\| c^{(J_2+1)} \cdot h^{(J_2+1)} - f \big\|_{L^{\infty}(\mathbb{S}^{d-1})} \le c_3\, n^{(d+3)\beta}\, N^{-\frac{\beta}{n}},$$
where $c_3 = c_3(d, d', \beta, f, Q_{\tau,n}, n, \eta)$ and $\eta \in C^{\infty}([0,\infty))$ is defined in Section 2.3.
Proof. 
Let the elementary symmetric polynomials in $x \in \mathbb{S}^{d-1}$ be denoted by
$$q_k(x) = \sum_{1 \le i_1 < i_2 < \cdots < i_k \le d} x_{i_1} x_{i_2} \cdots x_{i_k}, \qquad k = 1, 2, \ldots, d.$$
According to the fundamental theorem of symmetric polynomials, any symmetric polynomial can be uniquely expressed as a polynomial in the variables $\big( q_1(x), q_2(x), \ldots, q_d(x) \big)$. Given that the symmetric polynomials $\{Q_{\tau,n}(x)\}_{\tau=1}^{d'}$ are of degree $n$ and $n \le d$, they have a unique expression as a polynomial in $\big( q_1(x), q_2(x), \ldots, q_n(x) \big)$. Consequently, for $Q_{\tau,n}$ with $n \le d' \le d$, there exists a unique polynomial $\tilde Q_{\tau,n}$ such that
$$Q_{\tau,n}(x) = \tilde Q_{\tau,n}\big( q_1(x), q_2(x), \ldots, q_n(x) \big), \qquad \tau = 1, 2, \ldots, d'.$$
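As a concrete instance of this classical fact (our own illustration, unrelated to the specific $Q_{\tau,n}$ of the theorem), the power sum $p_2(x) = \sum_i x_i^2$ is a symmetric polynomial of degree 2 and satisfies $p_2 = q_1^2 - 2 q_2$ by Newton's identity; on the unit sphere it is identically 1.

```python
import numpy as np
from itertools import combinations

def elementary_symmetric(x, k):
    """q_k(x) = sum over 1 <= i_1 < ... < i_k <= d of x_{i_1} ... x_{i_k}."""
    return sum(np.prod(c) for c in combinations(x, k))

rng = np.random.default_rng(4)
d = 6
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                    # a point on S^{d-1}

q1 = elementary_symmetric(x, 1)
q2 = elementary_symmetric(x, 2)
p2 = np.sum(x ** 2)                       # symmetric polynomial of degree 2
assert np.isclose(p2, q1 ** 2 - 2 * q2)   # p_2 = q_1^2 - 2 q_2 (Newton's identity)
assert np.isclose(p2, 1.0)                # on the unit sphere, p_2(x) = 1
print(q1, q2)
```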
Therefore, the target function $f(x)$ can be expressed as
$$f(x) = f\big( Q_{1,n}(x), \ldots, Q_{d',n}(x) \big) = f\big( \tilde Q_{1,n}(q_1(x), \ldots, q_n(x)),\ \ldots,\ \tilde Q_{d',n}(q_1(x), \ldots, q_n(x)) \big) = \tilde f\big( q_1(x), q_2(x), \ldots, q_n(x) \big),$$
for the function $\tilde f$ defined on $\mathbb{R}^{n}$ by
$$\tilde f(z) = f\big( \tilde Q_{1,n}(z), \ldots, \tilde Q_{d',n}(z) \big), \qquad z \in \mathbb{R}^{n}.$$
Since each polynomial $\tilde Q_{\tau,n} \in W^{1}([-B_q, B_q]^{n})$ with semi-norm $|\tilde Q_{\tau,n}|_{W^{1}}$, where $B_q = \max\{\|q_k\|_{\infty}\}_{k=1}^{n}$, it follows that for any $z_1 \ne z_2 \in [-B_q, B_q]^{n}$,
$$\big| \tilde f(z_1) - \tilde f(z_2) \big| = \big| f\big( \tilde Q_{1,n}(z_1), \ldots, \tilde Q_{d',n}(z_1) \big) - f\big( \tilde Q_{1,n}(z_2), \ldots, \tilde Q_{d',n}(z_2) \big) \big| \le \|f\|_{W^{\beta}} \Big( d' \max_{\tau}\big\{ |\tilde Q_{\tau,n}(z_1) - \tilde Q_{\tau,n}(z_2)| \big\}_{\tau=1}^{d'} \Big)^{\beta} \le \|f\|_{W^{\beta}} \Big( d' \max_{\tau}\big\{ |\tilde Q_{\tau}|_{W^{1}} \big\}_{\tau=1}^{d'} \Big)^{\beta} |z_1 - z_2|^{\beta}.$$
Since $\{q_k(x)\}_{k=1}^{n}$ are spherical polynomials of degree up to $n$, let us denote
$$q(x) = \big( q_1(x), q_2(x), \ldots, q_n(x) \big) \quad\text{and}\quad \hat q(x) = \big( \hat q_1(x), \hat q_2(x), \ldots, \hat q_n(x) \big).$$
Similar to the proof of Theorem 2, by selecting $N_1 = N_2 = N$, we obtain:
$$\begin{aligned} \big| c^{(J_2+1)} \cdot h^{(J_2+1)}(x) - \tilde f(q(x)) \big| &\le \big| c^{(J_2+1)} \cdot h^{(J_2+1)}(x) - \tilde f(\hat q(x)) \big| + \big| \tilde f(\hat q(x)) - \tilde f(q(x)) \big|\\ &\le B_q^{\beta}\, C_{n,\beta}\, \|\tilde f\|_{W^{\beta}}\, N_2^{-\frac{\beta}{n}} + \Big( B_q \max_{\tau}\big\{ |\tilde Q_{\tau}|_{W^{1}} \big\}_{\tau=1}^{d'}\, d' \Big)^{\beta} \big( c\, n^{d+3} \big)^{\beta} N_1^{-2\beta}\\ &\le B_q^{\beta}\, C_{n,\beta} \Big( \Big( B_q \max_{\tau}\big\{ |\tilde Q_{\tau}|_{W^{1}} \big\}_{\tau=1}^{d'}\, d' \Big)^{\beta} + \|f\|_{W^{\beta}} \Big) N_2^{-\frac{\beta}{n}} + \Big( B_q \max_{\tau}\big\{ |\tilde Q_{\tau}|_{W^{1}} \big\}_{\tau=1}^{d'}\, d' \Big)^{\beta} \big( c\, n^{d+3} \big)^{\beta} N_1^{-2\beta}\\ &\le c_3\, n^{(d+3)\beta}\, N^{-\frac{\beta}{n}}, \end{aligned}$$
where $c_3 = \Big( B_q \max_{\tau}\big\{ |\tilde Q_{\tau}|_{W^{1}} \big\}_{\tau=1}^{d'}\, d' \Big)^{\beta} \big( c^{\beta} + B_q^{\beta} C_{n,\beta} \big) + B_q^{\beta} C_{n,\beta}\, \|f\|_{W^{\beta}}$, and $c = C \cdot 3^{d+2}\, C_1\, C_{\eta,d}$. □

4. Conclusions

This paper proposes the construction of DCNNs with multiple downsampling layers for approximating functions with multiple features, namely general smooth features, polynomial features, and symmetric polynomial features, on the unit sphere. The result of Theorem 1 implies that if the smoothness index $r$ of the inner functions and the number of features $d'$ satisfy $\frac{d+3+r}{2r} d' < d$, the curse of dimensionality can be reduced. To further reduce the curse of dimensionality, we also derive error rates for cases where the features are either polynomials or symmetric polynomials on the unit sphere, as shown in Theorems 2 and 3. This implies that the DCNNs developed in Section 3.1 are capable of extracting three types of features: general smooth features, polynomial features, and symmetric polynomial features. In comparison, the DCNNs in [20] are limited to extracting only polynomial and symmetric polynomial features. In addition, compared with [20], when extracting polynomial features on the sphere, our spherical polynomial features are not restricted to have the same degree. Although our input vector $x$ is spherical, we show that the DCNNs designed in Section 3.1 can also approximate or represent functions in $W^{1}([0,1]^d)$ by extracting general smooth or spherical polynomial features, as shown in Corollary 1. These results underscore the superior learning capacity of the proposed DCNN framework in capturing complex and high-dimensional feature representations on the unit sphere.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Goodfellow, I.; Bengio, Y. and Courville, A. Deep Learning. MIT Press, 2016.
  2. Hinton, G. E.; Osindero, S. and Teh, Y. W. A fast learning algorithm for deep belief nets. Neural Computation. 2006, 18, 1527–1554. [Google Scholar] [CrossRef] [PubMed]
  3. Krizhevsky, A.; Sutskever, I. and Hinton, G. E. ImageNet classification with deep convolutional neural networks. Communications of the ACM. 2017, 60, 84–90. [Google Scholar] [CrossRef]
  4. DeVore, R.; Hanin, B. and Petrova, G. Neural network approximation. ACTA Numerica. 2021, 30, 327–444. [Google Scholar] [CrossRef]
  5. Elbrächter, D.; Perekrestenko, D.; Grohs, P. and Böelcskei, H. Deep Neural Network Approximation Theory. IEEE Transactions on Information Theory. 2021, 67, 2581–2623. [Google Scholar] [CrossRef]
  6. Bartolucci, F.; De Vito, E.; Rosasco, L. and Vigogna, S. Understanding neural networks with reproducing kernel Banach spaces. Applied and Computational Harmonic Analysis. 2023, 62, 194–236. [Google Scholar] [CrossRef]
  7. Song, L. H.; Liu, Y.; Fan, J. and Zhou, D. X. Approximation of smooth functionals using deep ReLU networks. Neural Networks. 2023, 166, 424–436. [Google Scholar] [CrossRef]
  8. Zhou, D. X. Universality of deep convolutional neural networks. Applied and Computational Harmonic Analysis. 2020, 48, 787–794. [Google Scholar] [CrossRef]
  9. Zhou, D. X. Theory of deep convolutional neural networks: Downsampling. Neural Networks. 2020, 124, 319–327. [Google Scholar] [CrossRef]
  10. Fang, Z. Y.; Feng, H.; Huang, S. and Zhou, D. X. Theory of deep convolutional neural networks II: Spherical analysis. Neural Networks. 2020, 131, 154–162. [Google Scholar] [CrossRef] [PubMed]
  11. Yarotsky, D. Error bounds for approximations with deep ReLU networks. Neural Networks. 2017, 94, 103–114. [Google Scholar] [CrossRef] [PubMed]
  12. Zhou, D. X. Deep distributed convolutional neural networks: Universality. Analysis and Applications. 2018, 16, 895–919. [Google Scholar] [CrossRef]
  13. Barron, A. R. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory. 1993, 39, 930–945. [Google Scholar] [CrossRef]
  14. Klusowski, J. and Barron, A. R. Approximation by Combinations of ReLU and Squared ReLU Ridge Functions with ℓ1 and ℓ0 Controls. IEEE Transactions on Information Theory. 2018, 64, 7649–7656. [Google Scholar] [CrossRef]
  15. Suzuki, T. Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality. International Conference on Learning Representations. New Orleans, United States, 2019.
  16. Montanelli, H. and Du, Q. New Error Bounds for Deep ReLU Networks Using Sparse Grids. SIAM Journal on Mathematics of Data Science. 2019, 1, 78–92. [Google Scholar] [CrossRef]
  17. Bach, F. Breaking the Curse of Dimensionality with Convex Neural Networks. Journal of Machine Learning Research. 2017, 18, 19. [Google Scholar]
  18. Bauer, B. and Kohler, M. On deep learning as a remedy for the curse of dimensionality in nonparametric regression. Annals of Statistics. 2019, 47, 2261–2285. [Google Scholar] [CrossRef]
  19. Feng, H.; Huang, S. and Zhou, D. X. Generalization Analysis of CNNs for Classification on Spheres. IEEE Transactions on Neural Networks and Learning Systems. 2023, 34, 6200–6213. [Google Scholar] [CrossRef]
  20. Mao, T.; Shi, Z. J. and Zhou, D. X. Approximating functions with multi-features by deep convolutional neural networks. Analysis and Applications. 2023, 21, 93–125. [Google Scholar] [CrossRef]
  21. Mao, T.; Shi, Z. J. and Zhou, D. X. Theory of deep convolutional neural networks III: Approximating radial functions. Neural Networks. 2021, 144, 778–790. [Google Scholar] [CrossRef]
  22. Dai, F. and Xu, Y. Approximation Theory and Harmonic Analysis on Spheres and Balls. Springer, 2013.
  23. Brown, G. and Dai, F. Approximation of smooth functions on compact two-point homogeneous spaces. Journal of Functional Analysis. 2005, 220, 401–423. [Google Scholar] [CrossRef]
  24. De Boor, C. and Fix, G. J. Spline approximation by quasiinterpolants. Journal of Approximation Theory. 1973, 8, 19–45. [Google Scholar] [CrossRef]
  25. Feng, H.; Hou, S. Z.; Wei, L. Y. and Zhou, D. X. CNN models for readability of chinese texts. Mathematical Foundations of Computing. 2022, 5, 351–362. [Google Scholar] [CrossRef]
  26. Ahmed, R.; Fahim, A. I.; Islam, M.; Islam, S. and Shatabda, S. Dolg-next: Convolutional neural network with deep orthogonal fusion of local and global features for biomedical image segmentation. Neurocomputing. 2023, 546, 126362. [Google Scholar] [CrossRef]
  27. Silver, D.; Schrittwieser, J.; Simonyan, K.; et al. Mastering the game of Go without human knowledge. Nature. 2017, 550, 354–359. [Google Scholar] [CrossRef] [PubMed]
  28. Herrmann, L.; Opschoor, J. and Schwab, C. Constructive Deep ReLU Neural Network Approximation. Journal of Scientific Computing. 2022, 90, 75. [Google Scholar] [CrossRef]
  29. Cybenko, G. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems. 1989, 2, 303–314. [Google Scholar] [CrossRef]
  30. Hornik, K.; Stinchcombe, M. and White, H. Multilayer feedforward networks are universal approximators. Neural Networks. 1989, 2, 359–366. [Google Scholar] [CrossRef]
  31. DeVore, R.; Howard, R. and Micchelli, C. Optimal nonlinear approximation. Manuscripta Mathematica. 1989, 63, 469–478. [Google Scholar] [CrossRef]
  32. Mallat, S. Understanding deep convolutional networks. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 2016, 374, 20150203. [Google Scholar] [CrossRef]
  33. Wiatowski, T. and Bölcskei, H. A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction. IEEE Transactions on Information Theory. 2018, 64, 1845–1866. [Google Scholar] [CrossRef]
  34. Mhaskar, H. and Poggio, T. Deep vs. shallow networks: An approximation theory perspective. Analysis and Applications. 2016, 14, 829–848. [Google Scholar] [CrossRef]
  35. Mhaskar, H. N. and Micchelli, C. A. Approximation by superposition of sigmoidal and radial basis functions. Advances in Applied Mathematics. 1992, 13, 350–373. [Google Scholar] [CrossRef]
Table 1. Approximation rates with and without feature extraction.

Regularity | Range | Error rate | Features
$f \in W^{\beta}(\mathbb{S}^{d-1})$ [10] | $0 < \beta \le 1$ | $O\big(N^{-\beta/(d+1)}\big)$ | None
$f \in W^{\beta}(\mathbb{S}^{d-1})$ (Theorem 1) | $0 < \beta \le 1$ | $O\big(N^{-\beta/d'}\big)$ | General smooth
Table 2. Approximation rates with and without feature extraction.

Regularity | Range | Error rate | Features
$f \in W^{\beta}(\mathbb{S}^{d-1})$ [10] | $0 < \beta \le 1$ | $O\big(N^{-\beta/(d+1)}\big)$ | None
$f \in W^{\beta}(\mathbb{S}^{d-1})$ (Theorem 2) | $0 < \beta \le 1$ | $O\big(N^{-\beta/d'}\big)$ | Spherical polynomial
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.