Asymmetric Kernel Density Estimation for Biased Data

Preprint (not peer-reviewed); submitted 27 June 2023, posted 28 June 2023. A peer-reviewed article of this preprint also exists.
Abstract
Nonparametric density estimation for nonnegative data is considered in a situation where a random sample is not directly available and the data are instead observed under length-biased sampling. Because of the boundary bias problem of the location-scale kernel, the approach of this paper is based on asymmetric kernels. Two nonparametric density estimators are proposed. The mean integrated squared error, strong consistency, and asymptotic normality of the estimators are investigated. Some simulations illustrate the finite sample performance of the estimators.

1. Introduction

Nonparametric density estimation is a major topic in statistics and econometrics. Although the kernel density estimator (KDE) of location-scale type, originally proposed by Rosenblatt (1956) and Parzen (1962), is perhaps the most popular in the literature, its boundary bias problem is an important concern when the support of the density to be estimated is not the whole real line ℝ. Various remedies for the boundary bias problem have been discussed, on the basis of renormalization, reflection, and generalized jackknifing (Jones (1993)), transformation (Marron and Ruppert (1994)), advanced reflection (Zhang et al. (1999)), and so on. Recently, there has been a vast literature on this subject, especially with the renewed interest in the asymmetric kernel method after Chen (1999, 2000).
For the standard statistical inference problem for nonnegative data, it is assumed that a random sample {X_1, …, X_n} of size n is drawn from a population with density f(x), x ≥ 0. In practice, one encounters many situations where the available data are observed only under a certain biased sampling scheme (see, e.g., Patil and Rao (1978)). Then, it may be reasonable to regard a functional of the biased distribution as the inferential target, but some analyses require estimation with respect to the original distribution. Cox (1969) considered estimating the population mean μ = ∫_0^∞ t f(t) dt (> 0) and the cumulative distribution function F(x) = ∫_0^x f(t) dt, x > 0. An overview of nonparametric functional estimation under biased sampling schemes is found in Cristóbal and Alcalá (2001).
Nonparametric density estimation from biased data started in the late 1980s. Two important papers often quoted are Bhattacharyya et al. (1988) and Jones (1991) (see also Richardson et al. (1991) and Guillamón et al. (1998)). Jones (1991) studied a kernel smoothing on the basis of Cox's (1969) distribution estimator, without addressing the boundary bias problem of the classical Rosenblatt–Parzen location-scale KDE. The main contribution of this paper is to revisit nonparametric density estimation under the biased sampling scheme, using an asymmetric kernel method that avoids the boundary bias problem and therefore yields desirable asymptotic properties. Our approach is different from Mnatsakanov and Ruymgaart (2003, 2006), on moment-type density estimation motivated by the so-called moment problem, and from Chaubey et al. (2010), using Hill's lemma (Feller (1971; (1.5) of page 220)).
The rest of this paper is organized as follows. Section 2 describes the length-biased (LB) distribution and illustrates, in detail, the boundary behavior of a convolution integral. After a brief introduction of the asymmetric kernel method, two density estimators from the LB data are proposed, in parallel with those of Bhattacharyya et al. (1988) and Jones (1991). Sections 3 and 4 state the required assumptions and the main results of this paper. All proofs are given in the Appendix. In Section 5, some simulations illustrate the finite sample performance of the density estimators.
As usual, we use the notation ||h||_S = sup_{x∈S} |h(x)| for any bounded function h on S. We write, for j ∈ ℕ, h^{(j)}(x) = (d/dx)^j h(x) (if it exists), and h^{(0)}(x) = h(x). For an estimator f̂(x) of f(x), where x ≥ 0, the mean squared error (MSE) is defined by

$$ \mathrm{MSE}[\hat f(x)] = E[\{\hat f(x) - f(x)\}^2] = \mathrm{Bias}^2[\hat f(x)] + V[\hat f(x)], $$

and the mean integrated squared error (MISE), $\mathrm{MISE}[\hat f] = \int_0^\infty \mathrm{MSE}[\hat f(x)]\,dx$, is a global measure of the discrepancy of f̂ from f.

2. Preliminaries

2.1. LB Density

Nonparametrically, we wish to estimate the density f with nonnegative support, in a situation where a random sample {X_1, …, X_n} is not directly available but a sample 𝒴_n = {Y_1, …, Y_n} (say) is instead observed from the LB distribution having the density

$$ f_{\mathrm{LB}}(x) = \frac{x f(x)}{\mu}, \quad x \ge 0. $$

Throughout this paper, if there is no confusion, the X_i's are iid copies of the random variable X having the density f, whereas the Y_i's are iid copies of the random variable Y having the LB density f_LB. We repeatedly use the fact that, for a measurable function G and a real number r,

$$ E[Y^r G(Y)] = \int_0^\infty G(t)\, t^r f_{\mathrm{LB}}(t)\, dt = \frac{1}{\mu}\int_0^\infty G(t)\, t^{r+1} f(t)\, dt = \frac{1}{\mu} E[X^{r+1} G(X)] \quad (\text{if it exists}). $$
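To fix ideas, the identity can be checked by simulation; the following is a minimal sketch (our own illustration, not part of the paper's analysis), assuming f = Exp(1), for which μ = 1 and the LB density x e^{−x} is the Gamma(2, 1) density.

```python
# Monte Carlo check of E[Y^r G(Y)] = E[X^{r+1} G(X)] / mu for f = Exp(1)
# (assumption: mu = 1 and f_LB(x) = x e^{-x} is the Gamma(2, 1) density).
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
X = rng.exponential(scale=1.0, size=n)        # X ~ f(x) = e^{-x}
Y = rng.gamma(shape=2.0, scale=1.0, size=n)   # Y ~ f_LB(x) = x e^{-x}

G = np.cos          # any bounded measurable G will do
r = -1.0
lhs = np.mean(Y**r * G(Y))                    # E[Y^r G(Y)]
rhs = np.mean(X**(r + 1) * G(X)) / 1.0        # E[X^{r+1} G(X)] / mu, mu = 1
print(lhs, rhs)     # the two averages agree up to Monte Carlo error
```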

2.2. Boundary Bias Problem and Asymmetric Kernel Method

When supp(g) = [0, ∞), it is well known that the usual convolution-integral approximation (near the origin) of a smooth nonnegative function g fails when g(0) > 0. We emphasize that the following fact is a starting point of the present paper: even in the case g(0) = 0, the convergence rate near the origin x = 0 is slower when g′(0) ≠ 0; a typical example is g(x) = x e^{−x}.
More precisely, if k is a symmetric density on [−1, 1] (say) and h = h_n > 0 is a bandwidth which tends to zero as n → ∞ (hereafter, we omit the phrase "as the sample size n tends to infinity" unless otherwise stated, and we denote by x the location where the density estimation is made), then, as shown in, e.g., Jones (1993),

$$ \int_0^\infty \frac{1}{h} k\Big(\frac{x-s}{h}\Big) g(s)\, ds - g(x) = \int_{-1}^{\min(x/h,1)} k(t)\, g(x-ht)\, dt - g(x) \approx \begin{cases} \dfrac{h^2 g''(x)}{2} \displaystyle\int_{-1}^{1} t^2 k(t)\, dt, & x \ge h, \\[2mm] -g(0) \displaystyle\int_{p}^{1} k(t)\, dt + h\, g'(0) \displaystyle\int_{p}^{1} (t-p)\, k(t)\, dt, & x = hp\ (0 \le p \le 1), \end{cases} $$

because the kernel k((x − ·)/h)/h puts mass outside [0, ∞) when the location x is at or near the origin. This motivates us, instead of the location-scale kernel k((x − ·)/h)/h, to focus on an application of an asymmetric kernel k(·; β, x) whose support matches the support of the target function, where β = β_n > 0 is a smoothing parameter, with β → 0. It should be remarked that, after Chen (2000), the notation β (rather than h) is common in the asymmetric kernel method; indeed, the parameter β for the asymmetric kernel under consideration does not have the meaning of the bandwidth h in the classical Rosenblatt–Parzen location-scale kernel with compact support, although β corresponds to h² and controls the bias-variance trade-off. This is why we refer to β as a smoothing parameter (rather than a bandwidth).
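The slower rate near the origin is easy to observe numerically. The following sketch (our own illustration) evaluates the convolution error for g(x) = x e^{−x} with the Epanechnikov kernel: the error at an interior point shrinks like h², while at x = h/2 it shrinks only like h.

```python
# Boundary effect of a symmetric kernel for g(x) = x e^{-x} (g(0) = 0, g'(0) = 1):
# the convolution error decays like h^2 in the interior but only like h near 0.
import numpy as np
from scipy.integrate import quad

g = lambda s: s * np.exp(-s)
k = lambda t: 0.75 * (1 - t**2) * (abs(t) <= 1)   # Epanechnikov kernel on [-1, 1]

def conv_error(x, h):
    # integrand vanishes outside |x - s| <= h, so integrate on that window only
    val, _ = quad(lambda s: k((x - s) / h) * g(s) / h, max(0.0, x - h), x + h)
    return abs(val - g(x))

for h in [0.2, 0.1, 0.05]:
    print(h, conv_error(1.0, h), conv_error(h / 2, h))  # interior vs x = h/2
```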
We formally require that
$$ \int_0^\infty k(s; \beta, x)\, g(s)\, ds = g(x) + O(\beta) \quad \text{for any } x \ge 0, $$
if supp(g) = [0, ∞). To the best of our knowledge, Silverman (1986; page 28) first mentioned a fairly simple idea of using gamma and log-normal (LN) kernels for nonnegative data, where the kernel shape varies according to (β, x). Perhaps Chen's (2000) gamma kernel, defined by¹

$$ k^{(G)}(s; \beta, x) = \frac{1}{\beta}\, \frac{(s/\beta)^{x/\beta}\, e^{-s/\beta}}{\Gamma(x/\beta + 1)}, \quad s, x \ge 0, $$
plays a central role in this area; however, some simulations reveal that, when the target density is zero at the origin, the gamma kernel is at a disadvantage compared to the LN kernel (e.g., Igarashi (2016)). Subsequent authors have discussed various kernels such as the LN, inverse Gaussian (IG), reciprocal IG (RIG), Birnbaum–Saunders (BS), and inverse gamma kernels. Analysts can now choose among the many options available for kernels with support [0, ∞). Following Igarashi and Kakizawa (2020) (see also Kakizawa (2021)), let us choose k(·; β, x) in the following form:
Definition 1
(Igarashi and Kakizawa (2020)). Given a baseline density p(·; ·), we set

$$ k(s; \beta, x) = \frac{1}{\beta}\, p\Big(\frac{s}{\beta}; \frac{x}{\beta}\Big), \quad s, x \ge 0, $$

where the functional form of p, with nonnegative support, is independent of β and x.
To make this formulation clear, let

$$ f_g^{(q\mathrm{BS})}(s; \theta_1, \theta_2, \theta_3) = C_g\, g\Big(\Big\{\frac{a_q(s/\theta_2)}{\theta_1} - \theta_3\Big\}^2\Big)\, \frac{A_q(s/\theta_2)}{\theta_1 \theta_2}, \quad s \ge 0, $$

be a symmetrical-based qBS density, associated with a density generator g, where θ_1, θ_2 > 0, θ_3 ∈ ℝ,

$$ a_q(t) = \begin{cases} \dfrac{1}{2q}\,(t^q - t^{-q}), & q \ne 0, \\ \log t, & q = 0, \end{cases} \qquad A_q(t) = \frac{1}{2}\big(t^{q-1} + t^{-(q+1)}\big) $$

(due to the fact that (t^q − t^{−q})/q = (t^{|q|} − t^{−|q|})/|q|, it is enough to take q ≥ 0). Note that the density C_g g(u²) on ℝ is symmetric about the origin, where 1/C_g = ∫_0^∞ y^{−1/2} g(y) dy. Here, it is common to standardize g so that ∫_ℝ u² C_g g(u²) du = 1, without loss of generality. Indeed, as long as ∫_ℝ u² C_g g(u²) du = J for some constant J = J_g > 0, this standardization can always be imposed by replacing g(y) with J^{1/2} g(Jy); the normalizing constant C_g is then invariant, i.e., ∫_ℝ J^{1/2} g(Ju²) du = ∫_ℝ g(t²) dt = 1/C_g.
Given constants q ≥ 0 and c > 0, a family of symmetrical-based non-central qBS kernels (Kakizawa (2021))² is defined by

$$ k_g^{(q\mathrm{BS})}(s; \beta, x) = f_g^{(q\mathrm{BS})}\Big(s;\ \frac{1}{(x/\beta + c)^{1/2}},\ \beta(x/\beta + c),\ \frac{\theta}{(x/\beta + c)^{1/2}}\Big), \quad s, x \ge 0, $$

where θ ∈ ℝ. Kakizawa (2018) considered the central case θ = 0. Such a family k_g^{(qBS)} is flexible via the (infinite-dimensional) density generator g, as well as the parameter q ≥ 0. In some numerical studies of Section 5, we will put q = 1/2 (symmetrical-based BS kernel) or q = 0 (log-symmetrical (LS) kernel), with (θ, c) = (0, 1), for simplicity, and use the power exponential (PE) generator g_{PE[p]}(y) = exp(−λ_p y^p), p ≥ 1/2, where the particular choice λ_p = {Γ(3/(2p))/Γ(1/(2p))}^p ensures that the PE density has variance 1, like the standard normal density. Other generators (Kotz-type, generalized Pearson-type VII, and generalized logistic-type III) are found in Kakizawa (2018, 2021). A symmetrical-based central qBS kernel belongs to a family of symmetrical-based qMIG kernels (MIG is an abbreviation of a mixture of IG and RIG), which is enlarged, linking to a class of skew-BS type kernels. See Kakizawa (2018, 2021).
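For concreteness, the following sketch implements two instances of this construction in Python: Chen's (2000) gamma kernel (with shape x/β + 1, as in footnote 1) and our reading of the PE-generator BS kernel (q = 1/2, (θ, c) = (0, 1)); the exact parameterization of the qBS family should be checked against Kakizawa (2018, 2021), so treat the second function as an assumption-laden sketch.

```python
# Sketches of asymmetric kernels with support [0, inf), in the rescaled form of
# Definition 1, k(s; beta, x) = p(s/beta; x/beta) / beta.  The PE-based BS kernel
# below (q = 1/2, theta = 0) is our reading of the qBS family; verify against
# Kakizawa (2018, 2021) before relying on it.
import numpy as np
from scipy.special import gamma as Gamma
from scipy.stats import gamma as gamma_dist

def k_gamma(s, beta, x):
    # Chen's gamma kernel: Gamma(shape = x/beta + 1, scale = beta) density in s
    return gamma_dist.pdf(s, a=x / beta + 1.0, scale=beta)

def k_bs_pe(s, beta, x, p=1.0, c=1.0):
    # Symmetrical-based BS kernel (q = 1/2) with PE generator g(y) = exp(-lam*y^p),
    # for s > 0; p = 1 recovers the classical (normal-generator) BS kernel.
    lam = (Gamma(3 / (2 * p)) / Gamma(1 / (2 * p))) ** p
    C_g = p * lam ** (1 / (2 * p)) / Gamma(1 / (2 * p))  # 1/C_g = int g(u^2) du
    th1 = 1.0 / np.sqrt(x / beta + c)                    # shape
    th2 = beta * (x / beta + c)                          # scale, = x + c*beta
    t = s / th2
    a = np.sqrt(t) - 1.0 / np.sqrt(t)                    # a_{1/2}(t)
    A = 0.5 * (t**-0.5 + t**-1.5)                        # A_{1/2}(t)
    return C_g * np.exp(-lam * ((a / th1) ** 2) ** p) * A / (th1 * th2)
```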
Remark 1.
Chaubey and Li (2013) applied Scaillet's (2004) RIG kernel. As pointed out by Igarashi and Kakizawa (2014), however, the RIG KDE also suffers from the boundary bias problem, so that the re-formulated RIG kernel should be applied.

2.3. Two Density Estimators under the LB Sampling Scheme

Using E[Y^{−1}] = ∫_0^∞ t^{−1} f_LB(t) dt = 1/μ, as well as the relation

$$ f(x) = \frac{\mu}{x}\, f_{\mathrm{LB}}(x) = \frac{x^{-1} f_{\mathrm{LB}}(x)}{E[Y^{-1}]}, \quad x > 0, $$

our first estimator is defined in parallel with that of Bhattacharyya et al. (1988), as follows:

$$ \tilde f_{\beta,\epsilon}(x) = \frac{(nx)^{-1} \sum_{i=1}^n k(Y_i; \beta, x)}{n^{-1} \sum_{i=1}^n (Y_i + \epsilon)^{-1} + \epsilon}, \quad x > 0. $$
Except for a technical issue that requires taking a small ε ∝ n^{−1/2}, this estimator may be natural in the sense that f_LB(x), x > 0, is consistently estimable by the asymmetric KDE n^{−1} Σ_{i=1}^n k(Y_i; β, x) (Bhattacharyya et al. (1988) used the classical Rosenblatt–Parzen KDE n^{−1} Σ_{i=1}^n (1/h) k((x − Y_i)/h)). On the other hand, Jones's (1991) idea,

$$ \frac{E[Y^{-1}\, (1/h)\, k((x-Y)/h)]}{E[Y^{-1}]} = \int_0^\infty \frac{1}{h}\, k\Big(\frac{x-s}{h}\Big) f(s)\, ds $$

(see Cox (1969)), is also reasonable. However, in order to solve its boundary bias problem unless f(0) = f′(0) = 0 (see the Introduction), our second estimator,

$$ \hat f_{\beta,\epsilon}(x) = \frac{n^{-1} \sum_{i=1}^n Y_i^{-1} k(Y_i; \beta, x)}{n^{-1} \sum_{i=1}^n (Y_i + \epsilon)^{-1} + \epsilon}, \quad x \ge 0, $$
is proposed, for which some asymptotic properties will be studied in Subsection 4.1.
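A minimal implementation sketch of the two estimators, with the gamma kernel standing in for k(·; β, x) (any kernel satisfying Definition 1 could be substituted), is as follows; the regularization ε = C′n^{−1/2} follows Section 4, and the example uses the Section 5 design f(x) = x e^{−x}, for which f_LB = Gamma(3, 1).

```python
# Sketch of the two proposed estimators from LB data Y (not the paper's code).
import numpy as np
from scipy.stats import gamma as gamma_dist

def k_gamma(s, beta, x):
    return gamma_dist.pdf(s, a=x / beta + 1.0, scale=beta)

def lb_density_estimators(x, Y, beta, C=1.0):
    """Return (f_tilde, f_hat) at points x > 0 from LB data Y."""
    n = len(Y)
    eps = C / np.sqrt(n)
    denom = np.mean(1.0 / (Y + eps)) + eps          # estimates 1/mu
    K = k_gamma(Y[None, :], beta, x[:, None])       # K[j, i] = k(Y_i; beta, x_j)
    f_tilde = K.mean(axis=1) / (x * denom)          # Bhattacharyya-type (x > 0)
    f_hat = (K / Y[None, :]).mean(axis=1) / denom   # Jones-type, boundary-safe
    return f_tilde, f_hat

# Example: Y ~ f_LB(x) = x^2 e^{-x} / 2 = Gamma(3, 1) targets f(x) = x e^{-x}
rng = np.random.default_rng(1)
Y = rng.gamma(3.0, 1.0, size=300)
x = np.linspace(0.05, 8.0, 200)
f_tilde, f_hat = lb_density_estimators(x, Y, beta=0.15)
```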
Before we proceed with the description of our required assumptions (Section 3) and specific asymptotic results (Section 4), we highlight a novelty, compared with Jones (1991). Suppose that f(0) = 0 but f′(0) ≠ 0 (an example is f(x) = x e^{−x}, x ≥ 0). Then,
  • Jones's (1991) estimator f̂_{h,Jones}, based on the location-scale kernel with support [−1, 1] (say), suffers from the boundary bias problem, i.e., Bias[f̂_{h,Jones}(x)] = O(h²) for x ≥ h and Bias[f̂_{h,Jones}(x)] = O(h) for 0 ≤ x ≤ h; as a result, it is shown that

$$ \mathrm{MISE}[\hat f_{h,\mathrm{Jones}}] = O(h^3 + (nh)^{-1}) \quad (= O(n^{-3/4})); $$

  • our estimator f̂_{β,ε} achieves the convergence rate n^{−4/5} for the MISE (see Theorem 4).
We note that f̂_{β,ε} is preferable, since f̃_{β,ε} has the factor x^{−1} (it is numerically unstable near the origin); besides, a rigorous error analysis for the (unweighted) MISE of f̃_{β,ε} seems technically hard (it might even be impossible), although the pointwise MSE (for x > 0) and a weighted MISE of f̃_{β,ε} are tractable (see Subsection 4.2).

3. Assumptions

We use the notation f_j(·) = f(·)/(·)^j. In order to prove asymptotic properties of f̂_{β,ε} (Subsection 4.1), the following set of assumptions, labeled F, is imposed on the density f to be estimated:
F.
(i) 1. f is a twice continuously differentiable function on [0, ∞), where f, f′, and f″ are bounded;
2. f″ is a Hölder-continuous function (with exponent 0 < η ≤ 1) on [0, ∞), i.e., there exists a constant L > 0 such that |f″(u) − f″(v)| ≤ L|u − v|^η for any u, v ≥ 0.
(ii) 1. f_1 is a bounded function on [0, ∞);
2. f_1 is a Hölder-continuous function (with exponent 0 < η′ ≤ 1) on [0, ∞).
(iii) the inverse moment of X, E[X^{−1}] = ∫_0^∞ t^{−1} f(t) dt = ∫_0^∞ f_1(t) dt = μ_{−1} (say), exists (note that E[Y^{−2}] = μ^{−1} E[X^{−1}]).
(iii′) E[X^{−(1+q)}] = ∫_0^∞ f_{1+q}(t) dt = μ_{−(1+q)} (say) exists for some constant q > 0.
(iv) ∫_0^∞ {f′(t)}² dt, ∫_0^∞ {t f″(t)}² dt, and ∫_0^∞ f_{3/2}(t) dt exist.
Remark 2.
Under the boundedness of f_1, given in F(ii.1), the density f to be estimated must satisfy the constraint f(0) = 0 (note that F(iii′) for some q > 1 implies f_1(0) = 0). However, as illustrated earlier, even in the case f(0) = 0, Jones's (1991) estimator suffers from the boundary bias problem when f′(0) ≠ 0 (an example is f(x) = x e^{−x}, x ≥ 0).
On the other hand, for f̃_{β,ε} (Subsection 4.2), we additionally make some assumptions on the corresponding LB density, labeled F†:
F†.
1. In addition to F(i.1), f_LB, f′_LB, and f″_LB are bounded functions³ on [0, ∞);
2. f″_LB is a Hölder-continuous function (with exponent 0 < η† ≤ 1) on [0, ∞).
Lastly, high-level conditions on the kernel k(·; β, x), labeled A, are needed, whose details are given at the top of the Appendix. For notational simplicity, we write

$$ B_g(x) = \zeta_{1,1}\, g'(x) + \frac{\zeta_{2,1}}{2}\, x\, g''(x), \qquad V_g(x) = \zeta\, g(x)\, x^{-1/2}, $$

where the constants ζ_{1,1}, ζ_{2,1}, and ζ (ζ_{2,1}, ζ > 0) appear in Assumption A1.2–3 of the Appendix. It is convenient to define J_{βk,g}(x) = ∫_0^∞ k(s; β, x) g(s) ds, B_{βk,g}(x) = J_{βk,g}(x) − g(x), and J_{βk²,g}(x) = ∫_0^∞ k²(s; β, x) g(s) ds. Obviously, the following inequalities hold:

$$ J_{\beta k, g}(x) \le \|g\|_{[0,\infty)}, \qquad |B_{\beta k, g}(x)| \le 2\|g\|_{[0,\infty)}, \qquad J_{\beta k^2, g}(x) \le \sup_{s \ge 0} k(s; \beta, x)\ J_{\beta k, g}(x). $$

It should be remarked that most of the items in Assumption F (or F†) are needed to approximate J_{β·,·}(x), whose error analysis is found in the Appendix. Roughly speaking, we obtain J_{βk,f}(x) ≈ f(x) + β B_f(x) under F(i) and μ J_{βk²,f_1}(x) ≈ β^{−1/2} μ V_{f_1}(x) under F(ii), for the estimator f̂_{β,ε}. Also, for the estimator f̃_{β,ε}, it is shown that, under F†,

$$ \frac{\mu}{x}\, J_{\beta k, f_{\mathrm{LB}}}(x) \approx \frac{\mu}{x}\,\{f_{\mathrm{LB}}(x) + \beta B_{f_{\mathrm{LB}}}(x)\} = f(x) + \beta B^\dagger(x), \qquad \Big(\frac{\mu}{x}\Big)^2 J_{\beta k^2, f_{\mathrm{LB}}}(x) \approx \Big(\frac{\mu}{x}\Big)^2 \frac{V_{f_{\mathrm{LB}}}(x)}{\beta^{1/2}} = \frac{\mu\, V_{f_1}(x)}{\beta^{1/2}}, $$

where B†(x) = (μ/x) B_{f_LB}(x). Note that

$$ B^\dagger(x) = \zeta_{1,1}\{x^{-1} f(x) + f'(x)\} + \zeta_{2,1}\Big\{f'(x) + \frac{x}{2} f''(x)\Big\} = B_f(x) + \zeta_{1,1} f_1(x) + \zeta_{2,1} f'(x). $$

We can see that, under F(i.1 and iv), ∫_0^∞ B_f²(x) dx = I_{B_f²} (say) and ∫_0^∞ V_{f_1}(x) dx = I_{V_{f_1}} (say) are well-defined; besides⁴, ∫_0^∞ {B†(x)}² dx = I_{B†²} (say) is well-defined (we assume F(ii.1)).

4. Main Results

We assume that
B(ι).
β = n^{−ι} ℓ(n), where ℓ is a (positive) slowly varying function.
Note that all powers of log y and any function L(y) approaching a positive limit vary slowly. For the achievement of the optimal rate of the M(I)SE, β = C n^{−2/5} must be feasible, at least, where C > 0 is a constant, independent of n.
In what follows, let ε = C′ n^{−1/2}, where C′ > 0 is a constant, independent of n. We write, for 0 < η ≤ 1,

$$ \omega_{\beta,\eta}(x) = \beta^{3/2} x^{-1/2} + \beta^2 + (\beta x)^{1+\eta/2}, \qquad \omega'_{\beta,\eta}(x; g) = \beta^{1/2} V_g(x) + (\beta x^{-1})^{1/2} + \chi_{\{0<\eta<1\}}\, (\beta x)^{(\eta-1)/2}, $$

where χ_S is the indicator function of the set S.

4.1. Asymptotic Properties of f̂_{β,ε}

Theorem 1.
Suppose that Assumptions A1, A2.1(ν = 0, 1) (see the Appendix) and F(i–iii) hold. Under B(0 < ι < 1), given constants c_L > 0 and 0 < τ < 1, we have
1. sup_{0≤x≤c_L β^τ} |Bias[f̂_{β,ε}(x)]| = O(β^τ + n^{−1/2}) and sup_{0≤x≤c_L β^τ} V[f̂_{β,ε}(x)] = O(n^{−1}β^{−1} + n^{−1/2}β^{2τ});
2. for x ≥ c_L β^τ,

$$ \mathrm{Bias}[\hat f_{\beta,\epsilon}(x)] = \beta B_f(x) + R^{\mathrm{Bias}}_\beta(x), \qquad V[\hat f_{\beta,\epsilon}(x)] = \frac{\mu V_{f_1}(x)}{n\beta^{1/2}} + R^{V}_\beta(x), $$

with

$$ |R^{\mathrm{Bias}}_\beta(x)| \le M\{\omega_{\beta,\eta}(x) + n^{-1/2} + n^{-1/2}\beta\, |B_f(x)|\}, \qquad |R^{V}_\beta(x)| \le M'\big[n^{-1}\{\omega'_{\beta,\eta'}(x; f_1) + 1 + V_{f_1}(x)\} + n^{-1/2}\{\beta^2 B_f^2(x) + \omega^2_{\beta,\eta}(x)\}\big], $$

where M, M′ > 0 are constants, independent of n, β, and x. Also, we have

$$ \mathrm{Bias}[\hat f_{\beta,\epsilon}(0)] = \beta f'(0)\int_0^\infty u\, p(u; 0)\, du + O(n^{-1/2} + \beta^2), \qquad V[\hat f_{\beta,\epsilon}(0)] = \frac{\mu f_1(0)}{n\beta}\int_0^\infty p^2(u; 0)\, du + O(n^{-1}\beta^{\eta'-1} + n^{-1} + n^{-3/2}\beta^{-1} + n^{-1/2}\beta^{-2}). $$
Remark 3.
(i) The asymptotic bias and variance of f̂_{β,ε}(x) when x is near the origin, i.e., x/β → κ, where κ ≥ 0 is finite, can be obtained as in Kakizawa (2018, 2021). The details are omitted.
(ii) The pointwise MSE of f̂_{β,ε} is a corollary of Theorem 1(2), as follows: for fixed x > 0,

$$ \mathrm{MSE}[\hat f_{\beta,\epsilon}(x)] = \mathrm{AMSE}_x[\beta] + o(\beta^2 + n^{-1}\beta^{-1/2}), $$

where

$$ \mathrm{AMSE}_x[\beta] = \beta^2 B_f^2(x) + \frac{\mu V_{f_1}(x)}{n\beta^{1/2}} \ \ge\ \frac{5}{4^{4/5}}\, \{B_f^2(x)\}^{1/5} \{\mu V_{f_1}(x)\}^{4/5}\, n^{-4/5} \quad \text{if } B_f(x) \ne 0 $$

(the equality holds iff β = [{μ V_{f_1}(x)}/{4 B_f²(x)}]^{2/5} n^{−2/5}).
(iii) We have MSE[f̂_{β,ε}(0)] = O(β² + n^{−1}β^{−1}) (= O(n^{−2/3}) if f′(0) f_1(0) ≠ 0).
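As an illustration of Remark 3(ii), the pointwise AMSE-optimal β can be evaluated explicitly for the example f(x) = x e^{−x} (so μ = 2), if one plugs in the gamma-kernel constants ζ_{1,1} = ζ_{2,1} = 1 and ζ = 1/(2√π); these constants are our assumption here, taken from the standard gamma-kernel expansions, not stated in this paper.

```python
# Pointwise AMSE-optimal beta from Remark 3(ii) for f(x) = x e^{-x}, mu = 2,
# assuming the gamma-kernel constants zeta_{1,1} = zeta_{2,1} = 1 and
# zeta = 1/(2 sqrt(pi)) (our assumption).
import numpy as np

def beta_opt_pointwise(x, n):
    f = x * np.exp(-x)
    f1, f2 = (1 - x) * np.exp(-x), (x - 2) * np.exp(-x)   # f', f''
    B = f1 + 0.5 * x * f2                                  # B_f(x)
    V = (f / x) * x**-0.5 / (2 * np.sqrt(np.pi))           # V_{f_1}(x)
    mu = 2.0
    return ((mu * V) / (4 * B**2)) ** 0.4 * n**-0.4

print(beta_opt_pointwise(1.0, 300))   # ~0.12 for n = 300
```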
Theorem 2.
Suppose that Assumptions A1, A2.1(ν = 0, 1) (see the Appendix) and F(i–iii) hold. Under B(0 < ι < 1/2), we have f̂_{β,ε}(x) →^{a.s.} f(x) for fixed x > 0 (note also that f̂_{β,ε}(0) →^{a.s.} f(0) = 0).
Theorem 3.
Suppose that Assumptions A1, A2.2(ν = 0, 1) (see the Appendix) and F(i–iii′) hold.
(i) Under B(0 < ι < min{2q/(2+q), 1}), we have

$$ (n\beta^{1/2})^{1/2}\, \{\hat f_{\beta,\epsilon}(x) - E[\hat f_{\beta,\epsilon}(x)]\} \to_d N(0,\, \mu V_{f_1}(x)) \quad \text{for fixed } x > 0. $$

(ii) Under B(0 < ι < q/(2+q)), we have

$$ (n\beta)^{1/2}\, \{\hat f_{\beta,\epsilon}(0) - E[\hat f_{\beta,\epsilon}(0)]\} \to_d N\Big(0,\, \mu f_1(0)\int_0^\infty p^2(u; 0)\, du\Big). $$

For fixed x > 0, the replacement of E[f̂_{β,ε}(x)] by f(x) is routine, by combining Theorem 3(i) with the bias in Theorem 1(2): if Assumption F(iii′) holds for some q > 1/2, and if 2/5 ≤ ι < min{2q/(2+q), 1} (for the extreme case ι = 2/5, assume nβ^{5/2} → 0), then

$$ (n\beta^{1/2})^{1/2}\, \{\hat f_{\beta,\epsilon}(x) - f(x)\} \to_d N(0,\, \mu V_{f_1}(x)) \quad \text{for fixed } x > 0. $$
Theorem 4.
Suppose that Assumptions A1, A2.1(ν = 0, 1), A3(H = 6/min(η, η′) + 1 + δ_0) (see the Appendix) and F hold, where ∫_0^∞ t^{2(3/min(η,η′)+1)+δ_0} f(t) dt exists for some constant δ_0 > 0. Under B(0 < ι < 1), we have

$$ \mathrm{MISE}[\hat f_{\beta,\epsilon}] = \mathrm{AMISE}[\beta] + o(\beta^2 + n^{-1}\beta^{-1/2}), $$

where

$$ \mathrm{AMISE}[\beta] = \beta^2 I_{B_f^2} + \frac{\mu I_{V_{f_1}}}{n\beta^{1/2}} \ \ge\ \frac{5}{4^{4/5}}\, (I_{B_f^2})^{1/5} (\mu I_{V_{f_1}})^{4/5}\, n^{-4/5} \quad \text{if } B_f \not\equiv 0 $$

(the equality holds iff β = {(μ I_{V_{f_1}})/(4 I_{B_f²})}^{2/5} n^{−2/5} = β_opt (say)).
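For the example f(x) = x e^{−x} (μ = 2), β_opt and the resulting n^{−4/5} AMISE bound can be computed numerically, again under the assumed gamma-kernel constants ζ_{1,1} = ζ_{2,1} = 1 and ζ = 1/(2√π); a sketch:

```python
# Numerical evaluation of beta_opt and the AMISE bound in Theorem 4 for
# f(x) = x e^{-x}, mu = 2, assuming the gamma-kernel constants
# zeta_{1,1} = zeta_{2,1} = 1 and zeta = 1/(2 sqrt(pi)).
import numpy as np
from scipy.integrate import quad

zeta, mu = 1 / (2 * np.sqrt(np.pi)), 2.0
B2 = lambda x: ((1 - x) * np.exp(-x) + 0.5 * x * (x - 2) * np.exp(-x)) ** 2
I_B2, _ = quad(B2, 0, np.inf)                                        # I_{B_f^2}
I_V, _ = quad(lambda x: zeta * np.exp(-x) / np.sqrt(x), 0, np.inf)   # I_{V_{f_1}} = zeta * sqrt(pi) = 1/2

n = 300
beta_opt = ((mu * I_V) / (4 * I_B2)) ** 0.4 * n ** -0.4
amise = 1.25 * I_B2 ** 0.2 * (mu * I_V) ** 0.8 * n ** -0.8
print(beta_opt, amise)
```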

4.2. Asymptotic Properties of f̃_{β,ε}

Due to the presence of the factor x^{−1} in the estimator f̃_{β,ε}, the case x = 0 is excluded throughout this subsection; besides, in Theorem 8 (see also Remark 5), we consider a truncated MISE of f̃_{β,ε} as the global performance measure.
Theorem 5.
Suppose that Assumptions A1, A2.1(ν = 0) (see the Appendix), F(ii.1 and iii), and F† hold. Under B(0 < ι < 1), given constants c_L > 0 and 0 < τ < 1, we have, for x ≥ c_L β^τ,

$$ \mathrm{Bias}[\tilde f_{\beta,\epsilon}(x)] = \beta B^\dagger(x) + R^{\dagger \mathrm{Bias}}_\beta(x), \qquad V[\tilde f_{\beta,\epsilon}(x)] = \frac{\mu V_{f_1}(x)}{n\beta^{1/2}} + R^{\dagger V}_\beta(x), $$

with

$$ |R^{\dagger \mathrm{Bias}}_\beta(x)| \le M\{x^{-1}\omega_{\beta,\eta^\dagger}(x) + n^{-1/2}(1 + x^{-1}) + n^{-1/2}\beta\, |B^\dagger(x)|\}, \qquad |R^{\dagger V}_\beta(x)| \le M'\big[n^{-1}\{\omega'_{\beta,1}(x; f_1) + 1 + x^{-2} + V_{f_1}(x)\} + n^{-1/2}\{\beta^2 B^{\dagger 2}(x) + x^{-2}\omega^2_{\beta,\eta^\dagger}(x)\}\big], $$

where M, M′ > 0 are constants, independent of n, β, and x.
Remark 4.
As a corollary of Theorem 5, we have, for fixed x > 0,

$$ \mathrm{MSE}[\tilde f_{\beta,\epsilon}(x)] = \mathrm{AMSE}^\dagger_x[\beta] + o(\beta^2 + n^{-1}\beta^{-1/2}), $$

where

$$ \mathrm{AMSE}^\dagger_x[\beta] = \beta^2 B^{\dagger 2}(x) + \frac{\mu V_{f_1}(x)}{n\beta^{1/2}} \ \ge\ \frac{5}{4^{4/5}}\, \{B^{\dagger 2}(x)\}^{1/5} \{\mu V_{f_1}(x)\}^{4/5}\, n^{-4/5} \quad \text{if } B^\dagger(x) \ne 0 $$

(the equality holds iff β = [{μ V_{f_1}(x)}/{4 B^{†2}(x)}]^{2/5} n^{−2/5}).
Theorem 6.
Suppose that Assumptions A1, A2.1(ν = 0) (see the Appendix), F(ii.1 and iii), and F† hold. Under B(0 < ι < 1), we have f̃_{β,ε}(x) →^{a.s.} f(x) for fixed x > 0.
Theorem 7.
Suppose that Assumptions A1, A2.2(ν = 0) (see the Appendix), F(ii.1 and iii), and F† hold. Under B(0 < ι < 1), we have

$$ (n\beta^{1/2})^{1/2}\, \{\tilde f_{\beta,\epsilon}(x) - E[\tilde f_{\beta,\epsilon}(x)]\} \to_d N(0,\, \mu V_{f_1}(x)) \quad \text{for fixed } x > 0. $$

For fixed x > 0, the replacement of E[f̃_{β,ε}(x)] by f(x) is routine, by combining Theorem 7 with the bias in Theorem 5: if 2/5 ≤ ι < 1 (for the extreme case ι = 2/5, assume nβ^{5/2} → 0), then

$$ (n\beta^{1/2})^{1/2}\, \{\tilde f_{\beta,\epsilon}(x) - f(x)\} \to_d N(0,\, \mu V_{f_1}(x)) \quad \text{for fixed } x > 0. $$
Theorem 8.
Suppose that Assumptions A1, A2.1(ν = 0), A3(H = 2/η† + 1 + δ_0) (see the Appendix), F(ii.1, iii, and iv), and F† hold, where ∫_0^∞ t^{2(1/η†+1)+δ_0} f_LB(t) dt exists for some constant δ_0 > 0. Under B(0 < ι < 1), we have, for every 0 < τ < 1/2,

$$ \int_{\beta^\tau}^\infty \mathrm{MSE}[\tilde f_{\beta,\epsilon}(x)]\, dx = \mathrm{AMISE}^\dagger[\beta] + o(\beta^2 + n^{-1}\beta^{-1/2}), $$

where

$$ \mathrm{AMISE}^\dagger[\beta] = \beta^2 I_{B^{\dagger 2}} + \frac{\mu I_{V_{f_1}}}{n\beta^{1/2}} \ \ge\ \frac{5}{4^{4/5}}\, (I_{B^{\dagger 2}})^{1/5} (\mu I_{V_{f_1}})^{4/5}\, n^{-4/5} \quad \text{if } B^\dagger \not\equiv 0 $$

(the equality holds iff β = {(μ I_{V_{f_1}})/(4 I_{B^{†2}})}^{2/5} n^{−2/5}).
Remark 5.
Whether or not there exists a 0 < τ < 1/2 such that ∫_0^{β^τ} MSE[f̃_{β,ε}(x)] dx is of order o(β² + n^{−1}β^{−1/2}) (under some additional, possibly unnecessarily stronger, conditions) is rather technical. We do not pursue this issue further.

5. Simulation Studies

To demonstrate the finite sample performance of the proposed density estimator f̂_{β,ε}, we generated 1000 random samples of sizes n = 200, 300, 500 from the LB density f_LB(x) = x²e^{−x}/2, and computed the PE[p]-based BS/LS KDEs (p = 1, 3/2) and the gamma KDE for the original density f(x) = x e^{−x}. In the simulation, we used the least squares cross-validated (LSCV) smoothing parameter for each sample. The average integrated squared errors (ISEs), (1/1000) Σ_{ℓ=1}^{1000} ∫_0^∞ {f̂_{β,ε,[ℓ]}(x) − f(x)}² dx, are reported in Table 1, where f̂_{β,ε,[ℓ]} is computed from the ℓth sample.
As expected, all average ISEs decreased as the sample size n increased, in agreement with the MISE result. The BS/LN KDEs with p = 1 were, overall, improved upon by the estimators with p = 3/2; such a tendency can be explained via the AMISE relative efficiency index

$$ \frac{\mathrm{AMISE}_{\mathrm{opt}}(p)}{\mathrm{AMISE}_{\mathrm{opt}}(1)} = \left\{ \frac{2^{1-1/(2p)}\, p\, \sqrt{\pi}\, \Gamma^{1/2}(3/(2p))}{\Gamma^{3/2}(1/(2p))} \right\}^{4/5} $$

(see Kakizawa (2018, 2021)), since

$$ \mathrm{AMISE}_{\mathrm{opt}}(p) = \frac{5}{4^{4/5}}\, (I_{B_f^2})^{1/5} \left\{ \frac{C^2_{g_{\mathrm{PE}[p]}}}{C_{g^2_{\mathrm{PE}[p]}}}\, \mu \int_0^\infty f_{3/2}(x)\, dx \right\}^{4/5} n^{-4/5}. $$
Needless to say, the best implementable smoothing parameter β_opt, given in Theorem 4, depends on the unknown f, so that a data-driven procedure is crucial. We conducted the LSCV smoothing parameter selection⁵. Unlike the direct-sample case (Kakizawa (2018, 2021)), the present LB setting, for the small sample size n = 100 (not reported here), produced multiple local minima of the LSCV score (in many cases, it was rather unstable numerically), whereas such undesirable behavior seemed to disappear when n = 300. A further issue of considering a plug-in selection with a pilot estimator is left for the future.
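The LSCV criterion used in the simulation is not written out in the text; the following is one plausible construction for f̂_{β,ε} (our own sketch, which may differ from the implementation behind Table 1): the cross term ∫ f̂ f dx = μ E[Y^{−1} f̂(Y)] is estimated leave-one-out, using the expectation identity of Subsection 2.1.

```python
# One plausible LSCV score for f_hat in the LB setting (a sketch of our own
# construction, not necessarily the paper's exact implementation).
import numpy as np
from scipy.stats import gamma as gamma_dist
from scipy.integrate import trapezoid

def lscv_score(beta, Y, C=1.0, grid=np.linspace(1e-3, 15, 600)):
    n = len(Y)
    eps = C / np.sqrt(n)
    inv_mu = np.mean(1.0 / (Y + eps)) + eps                        # estimates 1/mu
    K = gamma_dist.pdf(Y[None, :], a=grid[:, None] / beta + 1.0, scale=beta)
    f_hat = (K / Y[None, :]).mean(axis=1) / inv_mu
    term1 = trapezoid(f_hat**2, grid)                              # int f_hat^2
    KY = gamma_dist.pdf(Y[None, :], a=Y[:, None] / beta + 1.0, scale=beta)
    loo = (KY / Y[None, :]).sum(axis=1) - np.diag(KY) / Y          # leave-one-out sums
    loo = loo / ((n - 1) * inv_mu)                                 # f_hat_{-i}(Y_i)
    term2 = np.mean(loo / Y) / inv_mu                              # mu * E[Y^{-1} f_hat(Y)]
    return term1 - 2 * term2

betas = np.linspace(0.02, 0.6, 30)
Y = np.random.default_rng(2).gamma(3.0, 1.0, 400)
best = min(betas, key=lambda b: lscv_score(b, Y))
```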

6. Discussion

Our asymptotic results under LB sampling can be extended to more general biased sampling, i.e., a weighted distribution for a known (positive) weight function ω, given by f_ω(x) ∝ ω(x)f(x). The LB density is the special case ω(x) = x, and another example is ω(x) = x² (the area-biased density). Also, the d-variate weighted density is defined by f_ω(x) ∝ ω(x)f(x), x = (x_1, …, x_d). Ahmad (1995) extended Jones's (1991) estimator to the d-variate case. Note that the product kernel method, using the product asymmetric kernel Π_{j=1}^d k(X_{ij}; β_j, x_j), x ∈ [0, ∞)^d, instead of Π_{j=1}^d k((x_j − X_{ij})/h_j)/h_j, or a non-product kernel (Igarashi (2018) and Kakizawa (2022)), can be straightforwardly applied to solve the boundary bias problem of Ahmad's (1995) estimator; a univariate sketch of the weight-generalized estimator is given below.
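The following is a hedged sketch of this extension for a univariate known weight ω (our extrapolation of the construction in Subsection 2.3; the paper does not spell this estimator out): replace Y_i^{−1} by 1/ω(Y_i) and estimate 1/E_f[ω(X)] by the same ε-regularized average.

```python
# Weight-generalized estimator sketch (our extrapolation, not the paper's
# formula): w(x) = x recovers the LB case, w(x) = x^2 the area-biased case.
import numpy as np
from scipy.stats import gamma as gamma_dist

def weighted_density_estimator(x, Y, beta, w, C=1.0):
    n = len(Y)
    eps = C / np.sqrt(n)
    inv_nu = np.mean(1.0 / (w(Y) + eps)) + eps     # estimates 1/E_f[w(X)]
    K = gamma_dist.pdf(Y[None, :], a=x[:, None] / beta + 1.0, scale=beta)
    return (K / w(Y)[None, :]).mean(axis=1) / inv_nu

# Area-biased example: f = Exp(1) and w(x) = x^2 give Y ~ Gamma(3, 1)
rng = np.random.default_rng(3)
Y = rng.gamma(3.0, 1.0, 500)
x = np.linspace(0.0, 8.0, 200)
f_est = weighted_density_estimator(x, Y, beta=0.2, w=lambda y: y**2)
```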

Funding

The author has been supported in part by the Japan Society for the Promotion of Science; Grant-in-Aid for Scientific Research (C), 20K11700 and 23K11002.

Data Availability Statement

Not available.

Acknowledgments

Some preliminary results were first announced, without face-to-face talks (due to the COVID-19 pandemic), at the Japanese Joint Statistical Meeting 2021 (Japanese Federation of Statistical Science Associations), the Autumn Meeting 2021 (Mathematical Society of Japan), and the 5th International Conference on Econometrics and Statistics (EcoStat2022).

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
KDE kernel density estimator
LB length-biased
MSE mean squared error
MISE mean integrated squared error
LN log-normal
IG inverse Gaussian
RIG reciprocal inverse Gaussian
BS Birnbaum–Saunders
LS log-symmetrical
PE power exponential
MIG mixture of IG and RIG
LSCV least squares cross-validated
ISE integrated squared error
ROT rule of thumb

Appendix A

Appendix A.1. Technical Conditions on k(·;β,x)

There are three indispensable requirements on the kernel k(s; β, x), s, x ≥ 0:
(I) approximations of μ_j(k(·; β, x)) = ∫_0^∞ (s − x)^j k(s; β, x) ds and ∫_0^∞ k²(s; β, x) ds;
(II) uniform/nonuniform bounds on sup_{s≥0} k(s; β, x);
(III) behavior of the tail integral of k(s; β, x) with respect to x (the regularity as x → ∞ will be required only for dealing with an asymptotic expansion of the MISE rigorously).
More precisely, we assume (e.g., Igarashi and Kakizawa (2020) and Kakizawa (2021)):
A1.
In addition to Definition 1, there exists a density p(·; ·) such that
1.
∫_0^∞ u² p(u; 0) du exists and, for any y ≥ 0, ∫_0^∞ u p(u; y) du (≤ C̃(1 + y)) exists, where C̃ > 0 is a constant, independent of y;
2.
given constants 0 ≤ η̃ < 1 and c_L > 0, for all sufficiently small β > 0, x ≥ c_L β^{η̃} implies that

$$ \mu_j(k(\cdot;\beta,x)) = \begin{cases} \beta\zeta_{1,1} + r_{1,\beta}(x), & j = 1, \\ \beta\zeta_{2,1}\, x + r_{2,\beta}(x), & j = 2, \\ r_{4,\beta}(x), & j = 4, \end{cases} $$

with |r_{1,β}(x)| ≤ M̃_1 β^{3/2}/x^{1/2}, |r_{2,β}(x)| ≤ M̃_2 β², and 0 < r_{4,β}(x) ≤ M̃_4 β²(x + β)², where ζ_{1,1}, ζ_{2,1} (ζ_{2,1} > 0) and M̃_1, M̃_2, M̃_4 > 0 are constants, independent of β and x;
3.
∫_0^∞ p²(u; 0) du exists, and, given constants 0 < η̃ < 1 and c_L > 0, for all sufficiently small β > 0, x ≥ c_L β^{η̃} implies that

$$ \Big| \int_0^\infty k^2(s; \beta, x)\, ds - \frac{\zeta}{(\beta x)^{1/2}} \Big| \le \frac{\tilde M}{(\beta x)^{1/2}}\Big(\frac{\beta}{x} + \beta^{1/2}\Big), $$

where ζ, M̃ > 0 are constants, independent of β and x.
A2(ν).
u_{β,ν}(x) = sup_{s≥0} {(β/s)^ν k(s; β, x)} satisfies:
1.
sup_{x≥0} u_{β,ν}(x) ≤ L_{K,ν} β^{−1}, where L_{K,ν} > 0 is a constant, independent of β;
2.
for x > 0, u_{β,ν}(x) ≤ L′_{K,ν} (βx)^{−1/2}, where L′_{K,ν} > 0 is a constant, independent of β and x.
A3(H).
Given a constant τ > 0, and for all sufficiently small β > 0,

$$ \int_0^\infty \Big( \int_{\beta^{-\tau}}^\infty k(s; \beta, x)\, g(s)\, dx \Big) ds = O(\beta^{\tau(H+1)}) $$

(assume that ∫_0^∞ s^{H+1} g(s) ds exists).
Most of the existing asymmetric kernels satisfy Assumptions A1, A2(0), and A3(H); see Igarashi and Kakizawa (2020) and Kakizawa (2021). For instance, the constants ζ_{1,1}, ζ_{2,1}, and ζ (given in A1.2–3) associated with k_g^{(qBS)}(·; β, x) are given by ζ_{1,1} = c + θ + J_g/2, ζ_{2,1} = J_g, and ζ = C_g²/C_{g²}, independent of q ≥ 0, where J_g = ∫_ℝ u² C_g g(u²) du. Of course, we need to impose a set of requirements on the density generator g, under which A1, A2(0), and A3(H) hold for the asymmetric kernel k_g^{(qBS)}(·; β, x); see Kakizawa (2018, 2021). For simplicity, we assume that there exist constants M_g, B > 0 such that g(y) ≤ M_g e^{−By} for every y ≥ 0. It remains to discuss A2(ν > 0). Note that A2(ν = 0, 1) is technically required to prove Theorems 1–4 (i.e., (A8) and (A11) under A2(ν = 1)); indeed, (A11) is crucial for Lemma A3. On the other hand, A2(ν = 0) is enough for the proofs of Theorems 5–8.
Property A1.
The kernel k_g^{(qBS)}(·; β, x) satisfies Assumption A2(ν) for any ν ≥ 0, with

$$ u^{(q\mathrm{BS})}_{\beta,\nu}(x) \le \begin{cases} \dfrac{C_g \exp\big(\frac{\nu+1}{c}\max(\theta,0)\big)}{c^\nu \{\beta(x+\beta c)\}^{1/2}}\, \displaystyle\sup_{u\in\mathbb{R}} \Big\{ \exp\Big(\frac{\nu+1}{c^{1/2}}|u|\Big)\, g(u^2) \Big\}, & q = 0, \\[3mm] \dfrac{C_g\, \{8q^2 c\,(1 + \theta^2 c) + 2\}^{(q+\nu+1)/(2q)}}{c^\nu \{\beta(x+\beta c)\}^{1/2}}\, M_{g,\nu}(q), & q > 0, \end{cases} $$

where M_{g,ν}(q) = sup_{y≥0} [(y + 1)^{(q+ν+1)/(2q)} g(y)].
Proof.
Recall that, with α(y) = 1/(y + c)^{1/2} (note that sup_{y≥0} α(y) = 1/c^{1/2}),

$$ k_g^{(q\mathrm{BS})}(s; \beta, x) = \frac{C_g}{\{\beta(x+\beta c)\}^{1/2}}\, g\Big( \Big\{ \frac{a_q(\alpha^2(x/\beta)(s/\beta))}{\alpha(x/\beta)} - \theta\,\alpha(x/\beta) \Big\}^2 \Big)\, A_q(\alpha^2(x/\beta)(s/\beta)). $$

This, together with β/s ≤ (x + βc)/(cs) = β/{cs α²(x/β)}, yields

$$ u^{(q\mathrm{BS})}_{\beta,\nu}(x) \le \frac{C_g}{c^\nu \{\beta(x+\beta c)\}^{1/2}}\, \sup_{t \ge 0} \Big[ g\Big( \Big\{ \frac{a_q(t)}{\alpha(x/\beta)} - \theta\,\alpha(x/\beta) \Big\}^2 \Big)\, \frac{A_q(t)}{t^\nu} \Big], \quad \nu \ge 0. $$

It suffices to bound t^{−ν} A_q(t) = (1/2) t^{−ν} (t^{q−1} + t^{−(q+1)}), q ≥ 0, in the same manner as Kakizawa (2018). □

Appendix A.2. Auxiliary Lemmas

We mention (without proof) the following basic lemma; (ii) is a slight modification of Kakizawa (2021):
Lemma A1.
(i) Let g be a twice continuously differentiable function on [0, ∞), where g, g′, and g″ are bounded; besides, g″ is Hölder-continuous with exponent 0 < η ≤ 1. Under Assumption A1.1–2, given constants c_L > 0 and 0 < τ < 1, we have, for all sufficiently small β > 0,
1. sup_{0≤x≤c_L β^τ} |B_{βk,g}(x)| = O(β^τ);
2. B_{βk,g}(0) = β g′(0) ∫_0^∞ u p(u; 0) du + O(β²);
3. for x ≥ c_L β^τ, B_{βk,g}(x) = β B_g(x) + E_β(x),
with |E_β(x)| ≤ M_g ω_{β,η}(x), where M_g > 0 is a constant, independent of β and x.
(ii) Let g be a bounded and Hölder-continuous function (with exponent 0 < η ≤ 1) on [0, ∞). Under Assumptions A1 and A2(ν = 0), given constants c_L > 0 and 0 < τ < 1, we have, for all sufficiently small β > 0,
1. J_{βk²,g}(0) = β^{−1} g(0) ∫_0^∞ p²(u; 0) du + O(β^{η−1});
2. for x ≥ c_L β^τ, J_{βk²,g}(x) = β^{−1/2} V_g(x) + E′_β(x),
with |E′_β(x)| ≤ M′_g {ω′_{β,η}(x; g) + 1}, where M′_g > 0 is a constant, independent of β and x.
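The two expansions in Lemma A1 are easy to check numerically for a concrete kernel; the sketch below uses the gamma kernel and g(x) = x e^{−x}, with the assumed gamma-kernel constants ζ_{1,1} = ζ_{2,1} = 1 and ζ = 1/(2√π) (our assumption, as before).

```python
# Numeric check of Lemma A1: J_{beta k, g}(x) - g(x) ~ beta * B_g(x) and
# J_{beta k^2, g}(x) ~ beta^{-1/2} V_g(x), for the gamma kernel and g(x) = x e^{-x}.
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import gamma as gamma_dist

g = lambda s: s * np.exp(-s)
x, beta = 1.5, 0.01
s = np.linspace(1e-9, 8.0, 400001)
k = gamma_dist.pdf(s, a=x / beta + 1.0, scale=beta)

J1 = trapezoid(k * g(s), s)
J2 = trapezoid(k**2 * g(s), s)
B_g = (1 - x) * np.exp(-x) + 0.5 * x * (x - 2) * np.exp(-x)  # zeta_{1,1} = zeta_{2,1} = 1
V_g = g(x) / (2 * np.sqrt(np.pi * x))                        # zeta = 1/(2 sqrt(pi))
print((J1 - g(x)) / beta, B_g)    # first-order bias coefficient, should be close
print(J2 * np.sqrt(beta), V_g)    # variance coefficient, should be close
```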
We can verify that, according to, e.g., Igarashi and Kakizawa (2020) and Kakizawa (2021),

$$ \int_0^\infty \{B_{\beta k, g}(x)\}^2\, dx = \beta^2 I_{B_g^2} + o(\beta^2) \quad \text{and} \quad \int_0^\infty J_{\beta k^2, g}(x)\, dx = \beta^{-1/2} I_{V_g} + o(\beta^{-1/2}) $$

if ∫_0^∞ t^{2{3/min(η,η′)+1}+δ_0} g(t) dt exists for some constant δ_0 > 0 (here η and η′ are the Hölder exponents in Lemma A1(i) and (ii), respectively), as follows:
1.
(i) For 2/3 < τ < 1, ∫_0^{β^τ} {B_{βk,g}(x)}² dx = O(β^{3τ}) = o(β²), using Lemma A1(i.1).
(ii) Take a constant 0 < τ′ < η/(3 + η) (≤ 1/4). Using Lemma A1(i.3), we have, for any 0 < τ < 1,

$$ \Big| \int_{\beta^\tau}^{\beta^{-\tau'}} \{B_{\beta k, g}(x)\}^2\, dx - \beta^2 I_{B_g^2} \Big| \le \beta^2 \Big(\int_0^{\beta^\tau} + \int_{\beta^{-\tau'}}^\infty\Big) B_g^2(x)\, dx + 2\beta\, I_{B_g^2}^{1/2} \Big( \int_{\beta^\tau}^{\beta^{-\tau'}} E_\beta^2(x)\, dx \Big)^{1/2} + \int_{\beta^\tau}^{\beta^{-\tau'}} E_\beta^2(x)\, dx = o(\beta^2). $$

(iii) Under A3(H = 6/min(η, η′) + 1 + δ_0), we have, for 2/(H+1) < τ′ < η/(3+η),

$$ \int_{\beta^{-\tau'}}^\infty \{B_{\beta k, g}(x)\}^2\, dx \le 2\|g\|_{[0,\infty)} \Big\{ \int_0^\infty\!\!\int_{\beta^{-\tau'}}^\infty k(s; \beta, x)\, g(s)\, dx\, ds + \int_{\beta^{-\tau'}}^\infty g(x)\, dx \Big\} = O(\beta^{\tau'(H+1)}) = o(\beta^2). $$

2.
(i) For τ > 1/2, ∫_0^{β^τ} J_{βk²,g}(x) dx ≤ L_{K,0} β^{τ−1} ||g||_{[0,∞)} = o(β^{−1/2}), by A2.1(ν = 0).
(ii) Take a constant 0 < τ′ < η′/(1 + η′) (≤ 1/2). Using Lemma A1(ii.2), we have, for any 0 < τ < 1,

$$ \Big| \int_{\beta^\tau}^{\beta^{-\tau'}} J_{\beta k^2, g}(x)\, dx - \beta^{-1/2} I_{V_g} \Big| \le \beta^{-1/2} \Big(\int_0^{\beta^\tau} + \int_{\beta^{-\tau'}}^\infty\Big) V_g(x)\, dx + \int_{\beta^\tau}^{\beta^{-\tau'}} |E'_\beta(x)|\, dx = o(\beta^{-1/2}). $$

(iii) Under A3(H = 6/min(η, η′) + 1 + δ_0), we have, for 1/{2(H+1)} < τ′ < η′/(1+η′),

$$ \int_{\beta^{-\tau'}}^\infty J_{\beta k^2, g}(x)\, dx \le L_{K,0}\, \beta^{-1} \int_0^\infty\!\!\int_{\beta^{-\tau'}}^\infty k(s; \beta, x)\, g(s)\, dx\, ds \quad (\text{by A2.1}(\nu=0)) = O(\beta^{\tau'(H+1)-1}) = o(\beta^{-1/2}). $$

Appendix B

Define

$$ \zeta_\epsilon = \frac{1}{n}\sum_{i=1}^n \frac{\mu}{Y_i+\epsilon} + \mu\epsilon - 1. $$

Then,

$$ \hat f_{\beta,\epsilon}(x) = \frac{f(x) + \zeta_\beta(x)}{1 + \zeta_\epsilon}, \quad \text{with } \zeta_\beta(x) = \frac{1}{n}\sum_{i=1}^n \frac{\mu}{Y_i}\, k(Y_i; \beta, x) - f(x). \tag{A1} $$

Also,

$$ \tilde f_{\beta,\epsilon}(x) = \frac{f(x) + (\mu/x)\,\zeta^\dagger_\beta(x)}{1 + \zeta_\epsilon}, \quad \text{with } \zeta^\dagger_\beta(x) = \frac{1}{n}\sum_{i=1}^n k(Y_i; \beta, x) - f_{\mathrm{LB}}(x). \tag{A2} $$

Appendix B.1. Some Basic Results for ζ_ε

For ease of reference, we first mention the tail probability/moment bounds for ζ_ε. Rewrite

$$ \zeta_\epsilon = \bar\Delta_\epsilon + \delta_\epsilon, $$

where δ_ε = μE[(Y + ε)^{−1}] + με − 1, and Δ̄_ε is the average of the zero-mean independent random variables

$$ \Delta_{i,\epsilon} = \frac{\mu}{Y_i+\epsilon} - \mu E\Big[\frac{1}{Y_i+\epsilon}\Big], \quad i = 1, \dots, n, $$

with |Δ_{i,ε}| ≤ με^{−1} and V[Δ_{i,ε}] ≤ μ²E[(Y + ε)^{−2}] ≤ με^{−1} (we also have V[Δ_{i,ε}] ≤ μ²E[Y^{−2}]). Then, Bernstein's inequality yields the exponential bound of the tail probability

$$ P[|\bar\Delta_\epsilon| \ge t] \le 2\exp\Big( -\frac{n\epsilon\, t^2}{2(1 + t/3)\mu} \Big) \quad \text{for all } t > 0 $$

(hence, Δ̄_ε →^{a.s.} 0 if (nε)/log n → ∞).
Suppose that E[Y^{−2}] exists. Using μ(Y + ε)^{−1} − μY^{−1} = −μεY^{−1}(Y + ε)^{−1}, we have

$$ |\delta_\epsilon| \le \mu\epsilon\,(1 + E[Y^{-2}]). $$

Furthermore, it is easy to see that

$$ V[\bar\Delta_\epsilon] = \frac{\mu^2}{n}\, V\Big[\frac{1}{Y+\epsilon}\Big] \le \frac{\mu^2 E[Y^{-2}]}{n}, \qquad E[\bar\Delta_\epsilon^4] = \frac{E[\Delta_{1,\epsilon}^4]}{n^3} + \frac{3(n-1)}{n^3}\,(V[\Delta_{1,\epsilon}])^2 \le \frac{\mu^4\{(n\epsilon^2)^{-1} + 3E[Y^{-2}]\}E[Y^{-2}]}{n^2}. $$

It follows that

$$ E[\zeta_\epsilon^2] = \delta_\epsilon^2 + V[\bar\Delta_\epsilon] = O(n^{-1}), \qquad E[\zeta_\epsilon^4] \le 8(\delta_\epsilon^4 + E[\bar\Delta_\epsilon^4]) = O(n^{-2}). $$

Appendix B.2. Some Preliminary Results for ζ_β(x)

We next list some facts about ζ_β(x), including tail probability/moment bounds and asymptotic normality. Rewrite

$$ \zeta_\beta(x) = \bar\Delta_\beta(x) + B_{\beta k, f}(x), $$

where Δ̄_β(x) is the average of the zero-mean independent random variables

$$ \Delta_{i,\beta}(x) = \frac{\mu}{Y_i}\, k(Y_i; \beta, x) - \mu E\Big[\frac{k(Y_i; \beta, x)}{Y_i}\Big], \quad i = 1, \dots, n, $$

with |Δ_{i,β}(x)| ≤ β^{−1}u_{β,1}(x)μ and V[Δ_{i,β}(x)] ≤ μ²E[{Y^{−1}k(Y; β, x)}²] ≤ β^{−1}u_{β,1}(x)μ J_{βk,f}(x) (we also have V[Δ_{i,β}(x)] ≤ u_{β,0}(x)μ J_{βk,f_1}(x)). Assumption A2.1(ν = 1) and Bernstein's inequality yield the exponential bound of the tail probability

$$ P[|\bar\Delta_\beta(x)| \ge t] \le 2\exp\Big( -\frac{n\beta^2 t^2}{2(\|f\|_{[0,\infty)} + t/3)\, L_{K,1}\mu} \Big) \quad \text{for all } t > 0 $$

(hence, Δ̄_β(x) →^{a.s.} 0 if (nβ²)/log n → ∞, which is implied by, e.g., β = n^{−ι}ℓ(n) for some constant 0 < ι < 1/2). On the other hand, F(iii′), A1.3, and A2.2(ν = 0) imply that, for fixed x > 0 (assume f(x) > 0), we have, under B(0 < ι < min{2q/(2+q), 1}) (note that ι = 2/5 is feasible when F(iii′) holds for some q > 1/2),

$$ \frac{n^{-(2+q)}\sum_{i=1}^n E[|\Delta_{i,\beta}(x)|^{2+q}]}{(V[\bar\Delta_\beta(x)])^{1+q/2}} \le \frac{\mu^{1+q}\mu_{-(1+q)}\, n\, \{2L'_{K,0}(\beta x)^{-1/2}\}^{2+q}}{(n^2 V[\bar\Delta_\beta(x)])^{1+q/2}} = O(n^{-q/2}\beta^{-(1+q/2)/2}) = o(1), $$

hence,

$$ \frac{\bar\Delta_\beta(x)}{(V[\bar\Delta_\beta(x)])^{1/2}} \to_d N(0, 1), \quad \text{i.e.,} \quad (n\beta^{1/2})^{1/2}\, \bar\Delta_\beta(x) \to_d N(0,\, \mu V_{f_1}(x)), $$

using Lyapunov's theorem (for triangular arrays), together with nβ^{1/2}V[Δ̄_β(x)] → μV_{f_1}(x) for fixed x > 0, which will be shown in (A18) below. Similarly, F(iii′), A1.3, and A2.1(ν = 0) imply that, for the case f_1(0) > 0, we have, under B(0 < ι < q/(2+q)) (note that ι = 1/3 is feasible when F(iii′) holds for some q > 1 (in this case, f_1(0) = 0), i.e., ι = 1/3 is unfortunately infeasible for 0 < q ≤ 1),

$$ \frac{n^{-(2+q)}\sum_{i=1}^n E[|\Delta_{i,\beta}(0)|^{2+q}]}{(V[\bar\Delta_\beta(0)])^{1+q/2}} \le \frac{\mu^{1+q}\mu_{-(1+q)}\, n\, (2L_{K,0}\beta^{-1})^{2+q}}{(n^2 V[\bar\Delta_\beta(0)])^{1+q/2}} = O(n^{-q/2}\beta^{-(1+q/2)}) = o(1), $$

hence,

$$ \frac{\bar\Delta_\beta(0)}{(V[\bar\Delta_\beta(0)])^{1/2}} \to_d N(0, 1), \quad \text{i.e.,} \quad (n\beta)^{1/2}\, \bar\Delta_\beta(0) \to_d N\Big(0,\, \mu f_1(0)\int_0^\infty p^2(u; 0)\, du\Big), $$

using Lyapunov's theorem, together with nβV[Δ̄_β(0)] → μf_1(0)∫_0^∞ p²(u; 0) du, which will be shown in (A14) below.
It is easy to see that

$$ V[\bar\Delta_\beta(x)] = \frac{1}{n}\big[\mu J_{\beta k^2, f_1}(x) - \{J_{\beta k, f}(x)\}^2\big] \ \big(\le n^{-1}\mu J_{\beta k^2, f_1}(x)\big), $$
$$ E[\zeta_\beta^2(x)] = \{B_{\beta k, f}(x)\}^2 + V[\bar\Delta_\beta(x)] \le \{B_{\beta k, f}(x)\}^2 + n^{-1}\mu J_{\beta k^2, f_1}(x) = D_\beta(x) \ (\text{say}), $$
$$ E[\bar\Delta_\beta^4(x)] = \frac{E[\Delta_{1,\beta}^4(x)]}{n^3} + \frac{3(n-1)}{n^3}\,(V[\Delta_{1,\beta}(x)])^2 \le \{(n\beta)^{-1}L_{K,1}\beta^{-1}\mu\}^2\, D_\beta(x) + 3D_\beta^2(x) \quad (\text{by A2.1}(\nu = 1)). $$

In addition to sup_{x≥0}|B_{βk,f}(x)| ≤ 2||f||_{[0,∞)},

$$ \sup_{x\ge 0} V[\bar\Delta_\beta(x)] \le n^{-1}L_{K,0}\beta^{-1}\mu\, \|f_1\|_{[0,\infty)} \quad (\text{by A2.1}(\nu = 0)), $$
$$ B_{\beta k, f}(0) = \beta f'(0)\int_0^\infty u\, p(u; 0)\, du + O(\beta^2) \quad (\text{by Lemma A1(i.2)}), $$
$$ V[\bar\Delta_\beta(0)] = \frac{\mu f_1(0)}{n\beta}\int_0^\infty p^2(u; 0)\, du + O(n^{-1}\beta^{\eta'-1} + n^{-1}) \quad (\text{by Lemma A1(ii.1)}), $$

Lemma A1 implies that, given a constant 0 < τ < 1,

$$ \sup_{0\le x\le \beta^\tau} |B_{\beta k, f}(x)| = O(\beta^\tau) $$

(obviously, ∫_0^{β^τ} {B_{βk,f}(x)}² dx = O(β^{3τ}) and ∫_0^{β^τ} V[Δ̄_β(x)] dx = O(n^{−1}β^{τ−1})), and that, for x ≥ β^τ,

$$ |B_{\beta k, f}(x) - \beta B_f(x)| \le M_f\, \omega_{\beta,\eta}(x), $$
$$ |J_{\beta k^2, f_1}(x) - \beta^{-1/2} V_{f_1}(x)| \le M_{f_1}\{\omega'_{\beta,\eta'}(x; f_1) + 1\}, $$
$$ |V[\bar\Delta_\beta(x)] - n^{-1}\beta^{-1/2}\mu V_{f_1}(x)| \le n^{-1}\big[M_{f_1}\mu\{\omega'_{\beta,\eta'}(x; f_1) + 1\} + \|f\|^2_{[0,\infty)}\big]. $$
Remark A1.
If ∫_0^∞ t^{2{3/min(η,η′)+1}+δ_0} f(t) dt exists for some constant δ_0 > 0 (this ensures that

$$ \int_0^\infty t^{2\{3/\min(\eta,\eta')+1\}+\delta_0} f_1(t)\, dt \le \Big\{ \int_0^\infty t^{2\{3/\min(\eta,\eta')+1\}+\delta_0} f(t)\, dt \Big\}^{\frac{6/\min(\eta,\eta')+1+\delta_0}{2\{3/\min(\eta,\eta')+1\}+\delta_0}} $$

exists), in line with, e.g., Kakizawa (2021), Assumption A3(H = 6/min(η, η′) + 1 + δ_0) about the behavior of the tail integral of k(s; β, x) with respect to x is crucial for proving the negligibility of the integral ∫_{β^{−τ′}}^∞ E[ζ_β²(x)] dx (≤ ∫_{β^{−τ′}}^∞ D_β(x) dx), as follows: we have, for any constant τ′ > 2/(H+1),

$$ \int_{\beta^{-\tau'}}^\infty D_\beta(x)\, dx \le 2\|f\|_{[0,\infty)}\Big\{\int_0^\infty\!\!\int_{\beta^{-\tau'}}^\infty k(s; \beta, x)\, f(s)\, dx\, ds + \int_{\beta^{-\tau'}}^\infty f(x)\, dx\Big\} + \frac{L_{K,0}\mu}{n\beta}\int_0^\infty\!\!\int_{\beta^{-\tau'}}^\infty k(s; \beta, x)\, f_1(s)\, dx\, ds \quad (\text{by A2.1}(\nu=0)) = O(\beta^{\tau'(H+1)}(1 + n^{-1}\beta^{-1})) = o(\beta^2 + n^{-1}\beta^{-1/2}). $$

Then, the approximation

$$ \int_0^\infty E[\zeta_\beta^2(x)]\, dx = \beta^2 I_{B_f^2} + n^{-1}\beta^{-1/2}\mu I_{V_{f_1}} + o(\beta^2 + n^{-1}\beta^{-1/2}) $$

can be verified, since, by taking 2/(H+1) < τ′ < min{η/(3+η), η′/(1+η′)},

$$ \int_0^{\beta^{-\tau'}} E[\zeta_\beta^2(x)]\, dx = O(\beta^{3\tau} + n^{-1}\beta^{\tau-1}) + \int_{\beta^\tau}^{\beta^{-\tau'}} \big[D_\beta(x) - n^{-1}\{J_{\beta k, f}(x)\}^2\big]\, dx = \beta^2 I_{B_f^2} + n^{-1}\beta^{-1/2}\mu I_{V_{f_1}} + o(\beta^2 + n^{-1}\beta^{-1/2}) $$

(we also take 2/3 < τ < 1; see the argument after Lemma A1).

Appendix B.3. Some Preliminary Results for ζ†_β(x)

We here list some facts about ζ†_β(x), including tail probability/moment bounds and asymptotic normality. Rewrite

$$ \zeta^\dagger_\beta(x) = \bar\Delta^\dagger_\beta(x) + B_{\beta k, f_{\mathrm{LB}}}(x), $$

where Δ̄†_β(x) is the average of the zero-mean independent random variables

$$ \Delta^\dagger_{i,\beta}(x) = k(Y_i; \beta, x) - E[k(Y_i; \beta, x)], \quad i = 1, \dots, n, $$

with |Δ†_{i,β}(x)| ≤ u_{β,0}(x) and V[Δ†_{i,β}(x)] ≤ E[k²(Y; β, x)] ≤ u_{β,0}(x)J_{βk,f_LB}(x). As in, e.g., Igarashi and Kakizawa (2020) and Kakizawa (2021), by Assumption A2.1(ν = 0), an application of Bernstein's inequality yields the exponential bound of the tail probability

$$ P[|\bar\Delta^\dagger_\beta(x)| \ge t] \le 2\exp\Big( -\frac{n\beta\, t^2}{2(\|f_{\mathrm{LB}}\|_{[0,\infty)} + t/3)\, L_{K,0}} \Big) \quad \text{for all } t > 0 $$

(hence, Δ̄†_β(x) →^{a.s.} 0 if (nβ)/log n → ∞), whereas, if β → 0 and nβ^{1/2} → ∞, then A2.2(ν = 0) implies that, for fixed x > 0 (assume f_LB(x) > 0),

$$ \frac{n^{-(2+p)}\sum_{i=1}^n E[|\Delta^\dagger_{i,\beta}(x)|^{2+p}]}{(V[\bar\Delta^\dagger_\beta(x)])^{1+p/2}} \le \Big\{ \frac{(2L'_{K,0})^2\, (\beta x)^{-1}}{n^2\, V[\bar\Delta^\dagger_\beta(x)]} \Big\}^{p/2} = O((n\beta^{1/2})^{-p/2}) \quad \text{for any } p > 0, $$

hence,

$$ \frac{\bar\Delta^\dagger_\beta(x)}{(V[\bar\Delta^\dagger_\beta(x)])^{1/2}} \to_d N(0,1), \quad \text{i.e.,} \quad (n\beta^{1/2})^{1/2}\, \frac{\mu}{x}\, \bar\Delta^\dagger_\beta(x) \to_d N(0,\, \mu V_{f_1}(x)), $$

using Lyapunov's theorem, together with nβ^{1/2}(μ/x)²V[Δ̄†_β(x)] → μV_{f_1}(x) for fixed x > 0, which will be shown in (A26) below.
It is easy to see that

$$ V[\bar\Delta^\dagger_\beta(x)] = \frac{1}{n}\big[J_{\beta k^2, f_{\mathrm{LB}}}(x) - \{J_{\beta k, f_{\mathrm{LB}}}(x)\}^2\big] \ \big(\le n^{-1}J_{\beta k^2, f_{\mathrm{LB}}}(x)\big), $$
$$ E[\{\zeta^\dagger_\beta(x)\}^2] = \{B_{\beta k, f_{\mathrm{LB}}}(x)\}^2 + V[\bar\Delta^\dagger_\beta(x)] \le \{B_{\beta k, f_{\mathrm{LB}}}(x)\}^2 + n^{-1}J_{\beta k^2, f_{\mathrm{LB}}}(x) = D^\dagger_\beta(x) \ (\text{say}), $$
$$ E[\{\bar\Delta^\dagger_\beta(x)\}^4] \le \frac{E[\{\Delta^\dagger_{1,\beta}(x)\}^4]}{n^3} + \frac{3(n-1)}{n^3}\,(V[\Delta^\dagger_{1,\beta}(x)])^2 \le (n^{-1}L_{K,0}\beta^{-1})^2\, D^\dagger_\beta(x) + 3\{D^\dagger_\beta(x)\}^2 \quad (\text{by A2.1}(\nu=0)). $$

In addition to sup_{x≥0}|B_{βk,f_LB}(x)| ≤ 2||f_LB||_{[0,∞)} and

$$ V[\bar\Delta^\dagger_\beta(x)] \le n^{-1}L_{K,0}\beta^{-1}\|f_{\mathrm{LB}}\|_{[0,\infty)} \quad (\text{by A2.1}(\nu=0)), $$

Lemma A1 implies that, given a constant 0 < τ < 1, we have, for x ≥ β^τ,

$$ \Big| \frac{\mu}{x}\, B_{\beta k, f_{\mathrm{LB}}}(x) - \beta B^\dagger(x) \Big| \le M_{f_{\mathrm{LB}}}\, \frac{\mu}{x}\, \omega_{\beta,\eta^\dagger}(x), $$
$$ \Big| \Big(\frac{\mu}{x}\Big)^2 J_{\beta k^2, f_{\mathrm{LB}}}(x) - \beta^{-1/2}\mu V_{f_1}(x) \Big| \le M'_{f_{\mathrm{LB}}}\Big\{\mu\, \omega'_{\beta,1}(x; f_1) + \Big(\frac{\mu}{x}\Big)^2\Big\}, $$
$$ \Big| \Big(\frac{\mu}{x}\Big)^2 V[\bar\Delta^\dagger_\beta(x)] - \frac{\mu V_{f_1}(x)}{n\beta^{1/2}} \Big| \le \frac{1}{n}\Big[ M'_{f_{\mathrm{LB}}}\mu\,\omega'_{\beta,1}(x; f_1) + \Big(\frac{\mu}{x}\Big)^2 \{M'_{f_{\mathrm{LB}}} + \|f_{\mathrm{LB}}\|^2_{[0,\infty)}\} \Big] \le \frac{M^\dagger_{f_{\mathrm{LB}}}}{n}\{\omega'_{\beta,1}(x; f_1) + x^{-2}\}, $$

where M†_{f_LB} = M′_{f_LB}μ + (M′_{f_LB} + ||f_LB||²_{[0,∞)})μ².
Remark A2.
As mentioned in Theorem 8, we take an arbitrary constant 0 < τ < 1/2 and consider a weighted MISE, with the weight function w(t) = χ_{[β^τ,∞)}(t) (say). In line with, e.g., Kakizawa (2021), if ∫_0^∞ t^{2(1+η†)/η†+δ_0} f_LB(t) dt exists for some constant δ_0 > 0, Assumption A3(H = 2/η† + 1 + δ_0), together with sup_{x≥β^{−τ′}}(μ/x) ≤ 1 (say) for any constant τ′ > 0, is crucial for proving the negligibility of ∫_{β^{−τ′}}^∞ (μ/x)²E[{ζ†_β(x)}²] dx (≤ ∫_{β^{−τ′}}^∞ D†_β(x) dx), as follows: we have, for any constant 2/(H+1) < τ′ < η†/(1+η†),

$$ \int_{\beta^{-\tau'}}^\infty D^\dagger_\beta(x)\, dx \le 2\|f_{\mathrm{LB}}\|_{[0,\infty)}\Big\{ \int_0^\infty\!\!\int_{\beta^{-\tau'}}^\infty k(s;\beta,x)\, f_{\mathrm{LB}}(s)\, dx\, ds + \int_{\beta^{-\tau'}}^\infty f_{\mathrm{LB}}(x)\, dx \Big\} + \frac{L_{K,0}}{n\beta} \int_0^\infty\!\!\int_{\beta^{-\tau'}}^\infty k(s;\beta,x)\, f_{\mathrm{LB}}(s)\, dx\, ds \quad (\text{by A2.1}(\nu=0)) = O(\beta^{\tau'(H+1)}(1+n^{-1}\beta^{-1})) = o(\beta^2 + n^{-1}\beta^{-1/2}). $$

We can verify that

$$ \int_{\beta^\tau}^\infty \Big(\frac{\mu}{x}\Big)^2 E[\{\zeta^\dagger_\beta(x)\}^2]\, dx = \beta^2 I_{B^{\dagger 2}} + n^{-1}\beta^{-1/2}\mu I_{V_{f_1}} + o(\beta^2 + n^{-1}\beta^{-1/2}) $$

for every 0 < τ < 1/2, since

$$ \Big| \int_{\beta^\tau}^{\beta^{-\tau'}} \Big\{\frac{\mu}{x}\, B_{\beta k, f_{\mathrm{LB}}}(x)\Big\}^2 dx - \beta^2 I_{B^{\dagger 2}} \Big| \le \beta^2 \Big(\int_0^{\beta^\tau} + \int_{\beta^{-\tau'}}^\infty\Big) B^{\dagger 2}(x)\, dx + 2\beta\, M_{f_{\mathrm{LB}}} I_{B^{\dagger 2}}^{1/2} \Big\{ \int_{\beta^\tau}^{\beta^{-\tau'}} \Big(\frac{\mu}{x}\Big)^2 \omega^2_{\beta,\eta^\dagger}(x)\, dx \Big\}^{1/2} + M^2_{f_{\mathrm{LB}}} \int_{\beta^\tau}^{\beta^{-\tau'}} \Big(\frac{\mu}{x}\Big)^2 \omega^2_{\beta,\eta^\dagger}(x)\, dx = o(\beta^2), $$
$$ \Big| \int_{\beta^\tau}^{\beta^{-\tau'}} \Big(\frac{\mu}{x}\Big)^2 V[\bar\Delta^\dagger_\beta(x)]\, dx - \frac{\mu I_{V_{f_1}}}{n\beta^{1/2}} \Big| \le \frac{\mu}{n\beta^{1/2}} \Big(\int_0^{\beta^\tau} + \int_{\beta^{-\tau'}}^\infty\Big) V_{f_1}(x)\, dx + \frac{M^\dagger_{f_{\mathrm{LB}}}}{n} \int_{\beta^\tau}^{\beta^{-\tau'}} \{\omega'_{\beta,1}(x; f_1) + x^{-2}\}\, dx = o(n^{-1}\beta^{-1/2}). $$

Appendix B.4. Proofs of Main Results

Before proving Theorems 1–4, we prepare two lemmas (Lemmas A2 and A3):
Lemma A2.
Suppose that E[Y^{−2}] exists. Then,

$$ E[\zeta_\beta^2(x)\,\zeta_\epsilon^2] \le M_1\, n^{-1}\, D_\beta(x), $$

where M_1 > 0 is a constant, independent of n, β, and x.
Proof.
It is easy to see that

$$ E[\bar\Delta_\beta^2(x)\bar\Delta_\epsilon^2] = \frac{E[\Delta_{1,\beta}^2(x)\Delta_{1,\epsilon}^2]}{n^3} + \frac{n-1}{n^3}\big[V[\Delta_{1,\beta}(x)]\,V[\Delta_{1,\epsilon}] + 2\{\mathrm{Cov}[\Delta_{1,\beta}(x), \Delta_{1,\epsilon}]\}^2\big] \le \frac{E[\Delta_{1,\beta}^2(x)\Delta_{1,\epsilon}^2]}{n^3} + \frac{3\mu J_{\beta k^2, f_1}(x)\,\mu^2 E[Y^{-2}]}{n^2}, $$

where

$$ E[\Delta_{1,\beta}^2(x)\Delta_{1,\epsilon}^2] \le 4\mu^4 E\Big[\frac{k^2(Y;\beta,x)}{(Y+\epsilon)^2 Y^2}\Big] + 3\mu^2 E\Big[\frac{k^2(Y;\beta,x)}{Y^2}\Big]\,\mu^2 E\Big[\frac{1}{(Y+\epsilon)^2}\Big] \le 4\mu^3(\epsilon^{-2} + 3E[Y^{-2}])\, J_{\beta k^2, f_1}(x). $$

The result follows from

$$ E[\zeta_\beta^2(x)\zeta_\epsilon^2] \le 4\{E[\bar\Delta_\beta^2(x)\bar\Delta_\epsilon^2] + V[\bar\Delta_\beta(x)]\,\delta_\epsilon^2\} + 2\{B_{\beta k, f}(x)\}^2 E[\zeta_\epsilon^2]. \quad \Box $$
Now, we rewrite (A1) as

$$ \hat f_{\beta,\epsilon}(x) = f(x)\Big(1 - \zeta_\epsilon + \frac{\zeta_\epsilon^2}{1+\zeta_\epsilon}\Big) + \zeta_\beta(x)\Big(1 - \frac{\zeta_\epsilon}{1+\zeta_\epsilon}\Big) = f(x) + L_{\beta,\epsilon}(x) + R_{\beta,\epsilon}(x), $$

where

$$ L_{\beta,\epsilon}(x) = \zeta_\beta(x) - f(x)\,\zeta_\epsilon, \qquad R_{\beta,\epsilon}(x) = f(x)\,\frac{\zeta_\epsilon^2}{1+\zeta_\epsilon} - \zeta_\beta(x)\,\frac{\zeta_\epsilon}{1+\zeta_\epsilon}. $$

We need to evaluate E[R²_{β,ε}(x)].
Lemma A3.
Suppose that Assumptions A2.1(ν = 1) and B hold, and that E[Y^{−2}] exists. Under the boundedness of f, we have

$$ E[R^2_{\beta,\epsilon}(x)] \le M\,[4n^{-2} + n^{-1} D_\beta(x)], $$

where M > 0 is a constant, independent of n, β, and x.
Proof.
Considering the event S_n = {𝒴_n : |ζ_ε| < 1/2} (say), we have

$$ |R_{\beta,\epsilon}(x)|\,\chi_{S_n} \le 2\{f(x)\,\zeta_\epsilon^2 + |\zeta_\beta(x)\,\zeta_\epsilon|\}\,\chi_{S_n}, $$
$$ |R_{\beta,\epsilon}(x)|\,(1-\chi_{S_n}) \le \big[(\mu\epsilon)^{-1}\{f(x) + |\zeta_\beta(x)|\} + f(x) + f(x)|\zeta_\epsilon| + |\zeta_\beta(x)|\big](1-\chi_{S_n}) \le (\mu\epsilon)^{-1}\{f(x) + |\zeta_\beta(x)|\}(1-\chi_{S_n}) + 2\{3f(x)\,\zeta_\epsilon^2 + |\zeta_\beta(x)\,\zeta_\epsilon|\}(1-\chi_{S_n}). $$

Using {𝒴_n : |ζ_ε| ≥ 1/2} ⊂ {𝒴_n : |Δ̄_ε| ≥ 1/4} (say) for all sufficiently large n, it can be shown that

$$ E[R^2_{\beta,\epsilon}(x)] \le 16\big\{9f^2(x)\,E[\zeta_\epsilon^4] + E[\zeta_\beta^2(x)\zeta_\epsilon^2]\big\} + 4(\mu\epsilon)^{-2}\big[\{f^2(x) + D_\beta(x)\}\,P[|\bar\Delta_\epsilon| \ge 1/4] + \{E[\bar\Delta_\beta^4(x)]\,P[|\bar\Delta_\epsilon| \ge 1/4]\}^{1/2}\big]. $$

Then, the result follows from (A4), (A6), (A12), and Lemma A2, under Assumption B (recall that ε = C′n^{−1/2}; in this case, P[|Δ̄_ε| ≥ 1/4] ≤ 2e^{−ϱn^{1/2}}, where ϱ > 0 is a constant, independent of n). □
We are ready to prove Theorems 1–4.
Proof of Theorem 1.
We start with

$$ E[\hat f_{\beta,\epsilon}(x)] = f(x) + B_{\beta k, f}(x) - f(x)\,\delta_\epsilon + E[R_{\beta,\epsilon}(x)], $$
$$ \hat f_{\beta,\epsilon}(x) - E[\hat f_{\beta,\epsilon}(x)] = \bar\Delta_\beta(x) - f(x)\,\bar\Delta_\epsilon + R_{\beta,\epsilon}(x) - E[R_{\beta,\epsilon}(x)], $$

where

$$ V[\hat f_{\beta,\epsilon}(x)] = V[\bar\Delta_\beta(x)] + 2\,\mathrm{Cov}[\bar\Delta_\beta(x),\, -f(x)\bar\Delta_\epsilon + R_{\beta,\epsilon}(x)] + V[-f(x)\bar\Delta_\epsilon + R_{\beta,\epsilon}(x)]. $$

Also,

$$ \mathrm{Cov}[\bar\Delta_\beta(x), \bar\Delta_\epsilon] = \frac{1}{n}\Big[ \mu^2 E\Big[\frac{k(Y;\beta,x)}{(Y+\epsilon)\,Y}\Big] - \mu E\Big[\frac{k(Y;\beta,x)}{Y}\Big]\,\mu E\Big[\frac{1}{Y+\epsilon}\Big] \Big], $$

hence,

$$ |\mathrm{Cov}[\bar\Delta_\beta(x), \bar\Delta_\epsilon]| \le \frac{\mu J_{\beta k, f_1}(x) + J_{\beta k, f}(x)}{n} \le \frac{\mu \|f_1\|_{[0,\infty)} + \|f\|_{[0,\infty)}}{n}. $$

It is shown from (A28) and (A30) that

$$ \big| \mathrm{Bias}[\hat f_{\beta,\epsilon}(x)] - B_{\beta k, f}(x) \big| \le \|f\|_{[0,\infty)}|\delta_\epsilon| + \{E[R^2_{\beta,\epsilon}(x)]\}^{1/2}, $$
$$ \big| V[\hat f_{\beta,\epsilon}(x)] - V[\bar\Delta_\beta(x)] \big| \le M_2\big[n^{-1} + \{n^{-1}\mu J_{\beta k^2, f_1}(x)\, E[R^2_{\beta,\epsilon}(x)]\}^{1/2} + E[R^2_{\beta,\epsilon}(x)]\big], $$

where M_2 > 0 is a constant, independent of n, β, and x. Using Lemma A3 and

$$ \{n^{-1}\mu J_{\beta k^2, f_1}(x)\, E[R^2_{\beta,\epsilon}(x)]\}^{1/2} \le M^{1/2}\big[2n^{-1}D_\beta^{1/2}(x) + n^{-1/2}D_\beta(x)\big] \le M^{1/2}\big[n^{-1} + (n^{-1} + n^{-1/2})\,D_\beta(x)\big], $$

we have

$$ \big| \mathrm{Bias}[\hat f_{\beta,\epsilon}(x)] - B_{\beta k, f}(x) \big| \le M_3\{n^{-1/2} + n^{-1/2}|B_{\beta k, f}(x)|\} \quad (\text{assume } n^{-1}\beta^{-1} = o(1)), $$
$$ \big| V[\hat f_{\beta,\epsilon}(x)] - V[\bar\Delta_\beta(x)] \big| \le M_4\{n^{-1} + n^{-1/2}D_\beta(x)\}, $$

where M_3, M_4 > 0 are constants, independent of n, β, and x. Then, using (A12)–(A18), the proof is completed. □
Proof of Theorem 2.
Recall (A1) (also (A3) and (A7)). The strong consistency follows from (A4), (A5), and (A8), together with B_{βk,f}(x) = O(β) for fixed x > 0 (we also have B_{βk,f}(0) = O(β)); see Lemma A1(i.2–3). □
Proof of Theorem 3.
By Lemma A3, we notice that R_{β,ε}(x) − E[R_{β,ε}(x)] = o_p((nβ^{1/2})^{−1/2}) for fixed x > 0, and that R_{β,ε}(0) − E[R_{β,ε}(0)] = o_p((nβ)^{−1/2}). Recalling (A29), where Δ̄_ε = O_p(n^{−1/2}), we have

$$ (n\beta^{1/2})^{1/2}\,(\hat f_{\beta,\epsilon}(x) - E[\hat f_{\beta,\epsilon}(x)]) = (n\beta^{1/2})^{1/2}\,\bar\Delta_\beta(x) + o_p(1) \quad \text{for fixed } x > 0, $$
$$ (n\beta)^{1/2}\,(\hat f_{\beta,\epsilon}(0) - E[\hat f_{\beta,\epsilon}(0)]) = (n\beta)^{1/2}\,\bar\Delta_\beta(0) + o_p(1). $$

The results (i) and (ii) follow from (A9) and (A10), respectively. □
Proof of Theorem 4.
In the same way as the argument of Remark A1, a rigorous derivation of the MISE is made by splitting the integral into three parts (we set H = 6/min(η, η′) + 1 + δ_0), as follows:
(i)
Theorem 1 yields ∫_0^{β^τ} MSE[f̂_{β,ε}(x)] dx = O(β^{3τ} + n^{−1}β^{τ−1}) = o(β² + n^{−1}β^{−1/2}) for 2/3 < τ < 1.
(ii)
Take a constant τ′ > 2/(H+1). The inequality

$$ \epsilon^{-2}\int_{\beta^{-\tau'}}^\infty E[\zeta_\beta^2(x)\zeta_\epsilon^2]\, dx \le O(n^{-1})\int_0^\infty\!\!\int_{\beta^{-\tau'}}^\infty k(s; \beta, x)\,\{f_1(s) + f(s)\}\, dx\, ds + O(1)\int_{\beta^{-\tau'}}^\infty D_\beta(x)\, dx \quad (\text{Lemma A2}), $$

together with (A6), enables us to see that, by (A27),

$$ \int_{\beta^{-\tau'}}^\infty E[\{\hat f_{\beta,\epsilon}(x) - f(x)\}^2]\, dx \le 4\Big[ \int_{\beta^{-\tau'}}^\infty D_\beta(x)\, dx + \|f\|_{[0,\infty)} E[\zeta_\epsilon^2] + (\mu\epsilon)^{-2}\|f\|_{[0,\infty)} E[\zeta_\epsilon^4] + \int_{\beta^{-\tau'}}^\infty E[\zeta_\beta^2(x)\zeta_\epsilon^2]\, dx \Big] = o(\beta^2 + n^{-1}\beta^{-1/2}) $$

(see the argument in Remark A1).
(iii)
Taking 2/3 < τ < 1 and 2/(H+1) < τ′ < min{η/(3+η), η′/(1+η′)}, Theorem 1 yields

$$ \Big| \int_{\beta^\tau}^{\beta^{-\tau'}} \{\mathrm{Bias}^2[\hat f_{\beta,\epsilon}(x)] + V[\hat f_{\beta,\epsilon}(x)]\}\, dx - (\beta^2 I_{B_f^2} + n^{-1}\beta^{-1/2}\mu I_{V_{f_1}}) \Big| \le \beta^2\Big(\int_0^{\beta^\tau} + \int_{\beta^{-\tau'}}^\infty\Big) B_f^2(x)\, dx + 2\beta\, I_{B_f^2}^{1/2}\Big\{\int_{\beta^\tau}^{\beta^{-\tau'}} \{R^{\mathrm{Bias}}_\beta(x)\}^2 dx\Big\}^{1/2} + \int_{\beta^\tau}^{\beta^{-\tau'}} \{R^{\mathrm{Bias}}_\beta(x)\}^2 dx + \frac{\mu}{n\beta^{1/2}}\Big(\int_0^{\beta^\tau} + \int_{\beta^{-\tau'}}^\infty\Big) V_{f_1}(x)\, dx + \int_{\beta^\tau}^{\beta^{-\tau'}} |R^{V}_\beta(x)|\, dx = o(\beta^2 + n^{-1}\beta^{-1/2}). \quad \Box $$
Before proving Theorems 5–8, we prepare two lemmas (Lemmas A4 and A5):
Lemma A4.
Suppose that E[Y^{−2}] exists. Then,

$$ E[\{\zeta^\dagger_\beta(x)\}^2\,\zeta_\epsilon^2] \le M^\dagger_1\, n^{-1}\, D^\dagger_\beta(x), $$

where M†_1 > 0 is a constant, independent of n, β, and x.
Proof. It is easy to see that

$$ E[\{\bar\Delta^\dagger_\beta(x)\}^2\bar\Delta_\epsilon^2] = \frac{E[\{\Delta^\dagger_{1,\beta}(x)\}^2\Delta_{1,\epsilon}^2]}{n^3} + \frac{n-1}{n^3}\big[V[\Delta^\dagger_{1,\beta}(x)]\,V[\Delta_{1,\epsilon}] + 2\{\mathrm{Cov}[\Delta^\dagger_{1,\beta}(x), \Delta_{1,\epsilon}]\}^2\big] \le \frac{E[\{\Delta^\dagger_{1,\beta}(x)\}^2\Delta_{1,\epsilon}^2]}{n^3} + \frac{3 J_{\beta k^2, f_{\mathrm{LB}}}(x)\,\mu^2 E[Y^{-2}]}{n^2}, $$

where

$$ E[\{\Delta^\dagger_{1,\beta}(x)\}^2\Delta_{1,\epsilon}^2] \le 4\mu^2 E\Big[\frac{k^2(Y;\beta,x)}{(Y+\epsilon)^2}\Big] + 3E[k^2(Y;\beta,x)]\,\mu^2 E\Big[\frac{1}{(Y+\epsilon)^2}\Big] \le 4\mu^2(\epsilon^{-2} + 3E[Y^{-2}])\, J_{\beta k^2, f_{\mathrm{LB}}}(x). $$

The result follows from

$$ E[\{\zeta^\dagger_\beta(x)\}^2\zeta_\epsilon^2] \le 4\{E[\{\bar\Delta^\dagger_\beta(x)\}^2\bar\Delta_\epsilon^2] + V[\bar\Delta^\dagger_\beta(x)]\,\delta_\epsilon^2\} + 2\{B_{\beta k, f_{\mathrm{LB}}}(x)\}^2 E[\zeta_\epsilon^2]. \quad \Box $$
Now, (A2) can be rewritten as

$$ \tilde f_{\beta,\epsilon}(x) = f(x)\Big(1 - \zeta_\epsilon + \frac{\zeta_\epsilon^2}{1+\zeta_\epsilon}\Big) + \frac{\mu}{x}\,\zeta^\dagger_\beta(x)\Big(1 - \frac{\zeta_\epsilon}{1+\zeta_\epsilon}\Big) = f(x) + \frac{\mu}{x}\{L^\dagger_{\beta,\epsilon}(x) + R^\dagger_{\beta,\epsilon}(x)\}, $$

where

$$ L^\dagger_{\beta,\epsilon}(x) = \zeta^\dagger_\beta(x) - f_{\mathrm{LB}}(x)\,\zeta_\epsilon, \qquad R^\dagger_{\beta,\epsilon}(x) = f_{\mathrm{LB}}(x)\,\frac{\zeta_\epsilon^2}{1+\zeta_\epsilon} - \zeta^\dagger_\beta(x)\,\frac{\zeta_\epsilon}{1+\zeta_\epsilon}. $$

We need to evaluate E[{R†_{β,ε}(x)}²].
Lemma A5.
Suppose that Assumptions A2.1(ν = 0) and B hold, and that E[Y^{−2}] exists. Under the boundedness of f_LB, we have

$$ E[\{R^\dagger_{\beta,\epsilon}(x)\}^2] \le M^\dagger\,[4n^{-2} + n^{-1} D^\dagger_\beta(x)], $$

where M† > 0 is a constant, independent of n, β, and x.
Proof. We have

$$ |R^\dagger_{\beta,\epsilon}(x)|\,\chi_{S_n} \le 2\{f_{\mathrm{LB}}(x)\,\zeta_\epsilon^2 + |\zeta^\dagger_\beta(x)\,\zeta_\epsilon|\}\,\chi_{S_n}, $$
$$ |R^\dagger_{\beta,\epsilon}(x)|\,(1-\chi_{S_n}) \le \big[(\mu\epsilon)^{-1}\{f_{\mathrm{LB}}(x) + |\zeta^\dagger_\beta(x)|\} + f_{\mathrm{LB}}(x) + f_{\mathrm{LB}}(x)|\zeta_\epsilon| + |\zeta^\dagger_\beta(x)|\big](1-\chi_{S_n}) \le (\mu\epsilon)^{-1}\{f_{\mathrm{LB}}(x) + |\zeta^\dagger_\beta(x)|\}(1-\chi_{S_n}) + 2\{3f_{\mathrm{LB}}(x)\,\zeta_\epsilon^2 + |\zeta^\dagger_\beta(x)\,\zeta_\epsilon|\}(1-\chi_{S_n}). $$

Using {𝒴_n : |ζ_ε| ≥ 1/2} ⊂ {𝒴_n : |Δ̄_ε| ≥ 1/4} (say) for all sufficiently large n, it can be shown that

$$ E[\{R^\dagger_{\beta,\epsilon}(x)\}^2] \le 16\big\{9f_{\mathrm{LB}}^2(x)\,E[\zeta_\epsilon^4] + E[\{\zeta^\dagger_\beta(x)\}^2\zeta_\epsilon^2]\big\} + 4(\mu\epsilon)^{-2}\big[\{f_{\mathrm{LB}}^2(x) + D^\dagger_\beta(x)\}\,P[|\bar\Delta_\epsilon| \ge 1/4] + \{E[\{\bar\Delta^\dagger_\beta(x)\}^4]\,P[|\bar\Delta_\epsilon| \ge 1/4]\}^{1/2}\big]. $$

Then, the result follows from (A4), (A6), (A22), and Lemma A4, under Assumption B (recall that ε = C′n^{−1/2}; in this case, P[|Δ̄_ε| ≥ 1/4] ≤ 2e^{−ϱn^{1/2}}, where ϱ > 0 is a constant, independent of n). □
We are ready to prove Theorems 5–8.
Proof of Theorem 5. We start with

$$ E[\tilde f_{\beta,\epsilon}(x)] = f(x) + \frac{\mu}{x}\,B_{\beta k, f_{\mathrm{LB}}}(x) - f(x)\,\delta_\epsilon + \frac{\mu}{x}\,E[R^\dagger_{\beta,\epsilon}(x)], $$
$$ \tilde f_{\beta,\epsilon}(x) - E[\tilde f_{\beta,\epsilon}(x)] = \frac{\mu}{x}\,\bar\Delta^\dagger_\beta(x) - f(x)\,\bar\Delta_\epsilon + \frac{\mu}{x}\{R^\dagger_{\beta,\epsilon}(x) - E[R^\dagger_{\beta,\epsilon}(x)]\}, $$

where

$$ V[\tilde f_{\beta,\epsilon}(x)] = \Big(\frac{\mu}{x}\Big)^2 V[\bar\Delta^\dagger_\beta(x)] + 2\,\mathrm{Cov}\Big[\frac{\mu}{x}\,\bar\Delta^\dagger_\beta(x),\, -f(x)\bar\Delta_\epsilon + \frac{\mu}{x}R^\dagger_{\beta,\epsilon}(x)\Big] + V\Big[-f(x)\bar\Delta_\epsilon + \frac{\mu}{x}R^\dagger_{\beta,\epsilon}(x)\Big]. $$

Also,

$$ \mathrm{Cov}[\bar\Delta^\dagger_\beta(x), \bar\Delta_\epsilon] = \frac{1}{n}\Big[\mu E\Big[\frac{k(Y;\beta,x)}{Y+\epsilon}\Big] - E[k(Y;\beta,x)]\,\mu E\Big[\frac{1}{Y+\epsilon}\Big]\Big], $$

hence,

$$ |\mathrm{Cov}[\bar\Delta^\dagger_\beta(x), \bar\Delta_\epsilon]| \le \frac{J_{\beta k, f}(x) + J_{\beta k, f_{\mathrm{LB}}}(x)}{n} \le \frac{\|f\|_{[0,\infty)} + \|f_{\mathrm{LB}}\|_{[0,\infty)}}{n}. $$

It is shown from (A32) and (A34) that

$$ \Big| \mathrm{Bias}[\tilde f_{\beta,\epsilon}(x)] - \frac{\mu}{x}\,B_{\beta k, f_{\mathrm{LB}}}(x) \Big| \le \|f\|_{[0,\infty)}|\delta_\epsilon| + \frac{\mu}{x}\{E[\{R^\dagger_{\beta,\epsilon}(x)\}^2]\}^{1/2}, $$
$$ \Big| V[\tilde f_{\beta,\epsilon}(x)] - \Big(\frac{\mu}{x}\Big)^2 V[\bar\Delta^\dagger_\beta(x)] \Big| \le M^\dagger_2\Big[n^{-1}(1+x^{-1}) + \Big(\frac{\mu}{x}\Big)^2\big(\{n^{-1}J_{\beta k^2, f_{\mathrm{LB}}}(x)\,E[\{R^\dagger_{\beta,\epsilon}(x)\}^2]\}^{1/2} + E[\{R^\dagger_{\beta,\epsilon}(x)\}^2]\big)\Big], $$

where M†_2 > 0 is a constant, independent of n, β, and x. Using Lemma A5 and

$$ \{n^{-1}J_{\beta k^2, f_{\mathrm{LB}}}(x)\, E[\{R^\dagger_{\beta,\epsilon}(x)\}^2]\}^{1/2} \le (M^\dagger)^{1/2}\big[2n^{-1}\{D^\dagger_\beta(x)\}^{1/2} + n^{-1/2}D^\dagger_\beta(x)\big] \le (M^\dagger)^{1/2}\big[n^{-1} + (n^{-1} + n^{-1/2})\,D^\dagger_\beta(x)\big], $$

we have

$$ \Big| \mathrm{Bias}[\tilde f_{\beta,\epsilon}(x)] - \frac{\mu}{x}\,B_{\beta k, f_{\mathrm{LB}}}(x) \Big| \le M^\dagger_3\Big\{n^{-1/2}(1+x^{-1}) + n^{-1/2}\,\frac{\mu}{x}\,|B_{\beta k, f_{\mathrm{LB}}}(x)|\Big\} \quad (\text{assume } n^{-1}\beta^{-1} = o(1)), $$
$$ \Big| V[\tilde f_{\beta,\epsilon}(x)] - \Big(\frac{\mu}{x}\Big)^2 V[\bar\Delta^\dagger_\beta(x)] \Big| \le M^\dagger_4\Big\{n^{-1}(1+x^{-2}) + n^{-1/2}\Big(\frac{\mu}{x}\Big)^2 D^\dagger_\beta(x)\Big\}, $$

where M†_3, M†_4 > 0 are constants, independent of n, β, and x. Then, using (A24)–(A26), the proof is completed. □
Proof of Theorem 6.
Recall (A2) (also (A3) and (A19)). The strong consistency follows from (A4), (A5), and (A20), together with (μ/x)B_{βk,f_LB}(x) = O(β) for fixed x > 0 (see (A24)). □
Proof of Theorem 7. Recall (A33), where Δ̄_ε = O_p(n^{−1/2}). For fixed x > 0, Lemma A5 implies that R†_{β,ε}(x) − E[R†_{β,ε}(x)] = o_p((nβ^{1/2})^{−1/2}), i.e.,

$$ (n\beta^{1/2})^{1/2}\,(\tilde f_{\beta,\epsilon}(x) - E[\tilde f_{\beta,\epsilon}(x)]) = (n\beta^{1/2})^{1/2}\,\frac{\mu}{x}\,\bar\Delta^\dagger_\beta(x) + o_p(1). $$

The asymptotic normality follows from (A21). □
Proof of Theorem 8.
We assume that ∫_0^∞ t^{2(1/η†+1)+δ_0} f_LB(t) dt = I (say) exists for some constant δ_0 > 0 (then,

$$ \int_0^\infty t^{2(1/\eta^\dagger+1)+\delta_0} f(t)\, dt = \mu\int_0^\infty t^{2/\eta^\dagger+1+\delta_0} f_{\mathrm{LB}}(t)\, dt \le \mu\, I^{\frac{2/\eta^\dagger+1+\delta_0}{2(1/\eta^\dagger+1)+\delta_0}} $$

exists). In the same way as the argument of Remark A2, a rigorous derivation is made by splitting the integral into two parts (we set H = 2/η† + 1 + δ_0), as follows:
(i)
Take a constant τ′ > 2/(H+1). The inequality

$$ \epsilon^{-2}\int_{\beta^{-\tau'}}^\infty E[\{\zeta^\dagger_\beta(x)\}^2\zeta_\epsilon^2]\, dx \le O(n^{-1})\int_0^\infty\!\!\int_{\beta^{-\tau'}}^\infty k(s;\beta,x)\,\{f(s) + f_{\mathrm{LB}}(s)\}\, dx\, ds + O(1)\int_{\beta^{-\tau'}}^\infty D^\dagger_\beta(x)\, dx \quad (\text{Lemma A4}), $$

together with (A6) and sup_{x≥β^{−τ′}}(μ/x) ≤ 1 (say), enables us to see that, by (A31),

$$ \int_{\beta^{-\tau'}}^\infty E[\{\tilde f_{\beta,\epsilon}(x) - f(x)\}^2]\, dx \le 4\Big[\int_{\beta^{-\tau'}}^\infty D^\dagger_\beta(x)\, dx + \|f\|_{[0,\infty)}E[\zeta_\epsilon^2] + (\mu\epsilon)^{-2}\|f\|_{[0,\infty)}E[\zeta_\epsilon^4] + \int_{\beta^{-\tau'}}^\infty E[\{\zeta^\dagger_\beta(x)\}^2\zeta_\epsilon^2]\, dx\Big] = o(\beta^2 + n^{-1}\beta^{-1/2}) $$

(repeat the same argument as in Remark A2).
(ii)
Taking 2/(H+1) < τ′ < η†/(1+η†), Theorem 5 enables us to see that, for every 0 < τ < 1/2,

$$ \Big|\int_{\beta^\tau}^{\beta^{-\tau'}}\{\mathrm{Bias}^2[\tilde f_{\beta,\epsilon}(x)] + V[\tilde f_{\beta,\epsilon}(x)]\}\, dx - (\beta^2 I_{B^{\dagger 2}} + n^{-1}\beta^{-1/2}\mu I_{V_{f_1}})\Big| \le \beta^2\Big(\int_0^{\beta^\tau} + \int_{\beta^{-\tau'}}^\infty\Big)B^{\dagger 2}(x)\, dx + 2\beta\, I_{B^{\dagger 2}}^{1/2}\Big\{\int_{\beta^\tau}^{\beta^{-\tau'}}\{R^{\dagger \mathrm{Bias}}_\beta(x)\}^2 dx\Big\}^{1/2} + \int_{\beta^\tau}^{\beta^{-\tau'}}\{R^{\dagger \mathrm{Bias}}_\beta(x)\}^2 dx + \frac{\mu}{n\beta^{1/2}}\Big(\int_0^{\beta^\tau} + \int_{\beta^{-\tau'}}^\infty\Big)V_{f_1}(x)\, dx + \int_{\beta^\tau}^{\beta^{-\tau'}}|R^{\dagger V}_\beta(x)|\, dx = o(\beta^2 + n^{-1}\beta^{-1/2}). \quad \Box $$

References

  1. Ahmad, I.A. On multivariate kernel estimation for samples from weighted distributions. Statist. Probab. Lett. 1995, 22, 121–129.
  2. Bhattacharyya, B.B.; Franklin, L.A.; Richardson, G.D. A comparison of nonparametric unweighted and length-biased density estimation of fibres. Comm. Statist. Theory Methods 1988, 17, 3629–3644.
  3. Chaubey, Y.P.; Li, J. Asymmetric kernel density estimator for length-biased data. In Contemporary Topics in Mathematics and Statistics with Applications; Adhikari, A., Adhikari, M.R., Chaubey, Y.P., Eds.; Asian Books Private Ltd.: New Delhi, 2013; pp. 1–28.
  4. Chaubey, Y.P.; Sen, P.K.; Li, J. Smooth density estimation for length-biased data. J. Indian Soc. Agricultural Statist. 2010, 64, 145–155.
  5. Chen, S.X. Beta kernel estimators for density functions. Comput. Statist. Data Anal. 1999, 31, 131–145.
  6. Chen, S.X. Probability density function estimation using gamma kernels. Ann. Inst. Statist. Math. 2000, 52, 471–480.
  7. Cox, D.R. Some sampling problems in technology. In New Developments in Survey Sampling; Johnson, N.L., Smith, H., Jr., Eds.; Wiley: New York, 1969; pp. 506–527.
  8. Cristóbal, J.A.; Alcalá, J.T. An overview of nonparametric contributions to the problem of functional estimation from biased data. Test 2001, 10, 309–332.
  9. Feller, W. An Introduction to Probability Theory and Its Applications, 2nd ed.; Wiley: New York, 1971; Volume II.
  10. Guillamón, A.; Navarro, J.; Ruiz, J.M. Kernel density estimation using weighted data. Comm. Statist. Theory Methods 1998, 27, 2123–2135.
  11. Igarashi, G. Weighted log-normal kernel density estimation. Comm. Statist. Theory Methods 2016, 45, 6670–6687.
  12. Igarashi, G. Multivariate density estimation using a multivariate weighted log-normal kernel. Sankhyā 2018, 80, 247–266.
  13. Igarashi, G.; Kakizawa, Y. Re-formulation of inverse Gaussian, reciprocal inverse Gaussian, and Birnbaum–Saunders kernel estimators. Statist. Probab. Lett. 2014, 84, 235–246.
  14. Igarashi, G.; Kakizawa, Y. Multiplicative bias correction for asymmetric kernel density estimators revisited. Comput. Statist. Data Anal. 2020, 141, 40–61.
  15. Jones, M.C. Kernel density estimation for length biased data. Biometrika 1991, 78, 511–519.
  16. Jones, M.C. Simple boundary correction for kernel density estimation. Stat. Comput. 1993, 3, 135–146.
  17. Kakizawa, Y. Nonparametric density estimation for nonnegative data, using symmetrical-based inverse and reciprocal inverse Gaussian kernels through dual transformation. J. Statist. Plann. Inference 2018, 193, 117–135.
  18. Kakizawa, Y. A class of Birnbaum–Saunders type kernel density estimators for nonnegative data. Comput. Statist. Data Anal. 2021, 161, 107249.
  19. Kakizawa, Y. Multivariate elliptical-based Birnbaum–Saunders kernel density estimation for nonnegative data. J. Multivariate Anal. 2022, 187, 104834.
  20. Marron, J.S.; Ruppert, D. Transformations to reduce boundary bias in kernel density estimation. J. R. Stat. Soc. Ser. B 1994, 56, 653–671.
  21. Mnatsakanov, R.M.; Ruymgaart, F.H. Some properties of moment-empirical cdf's with application to some inverse estimation problems. Math. Methods Statist. 2003, 12, 478–495.
  22. Mnatsakanov, R.M.; Ruymgaart, F.H. On moment-density estimation in some biased models. In Optimality: The Second Erich L. Lehmann Symposium; Institute of Mathematical Statistics, 2006; pp. 322–333.
  23. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Statist. 1962, 33, 1065–1076.
  24. Patil, G.P.; Rao, C.R. Weighted distributions and size-biased sampling with applications to wildlife populations and human families. Biometrics 1978, 34, 179–189.
  25. Richardson, G.D.; Kazempour, M.K.; Bhattacharyya, B.B. Length biased density estimation of fibres. J. Nonparametr. Stat. 1991, 1, 127–141.
  26. Rosenblatt, M. Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. 1956, 27, 832–837.
  27. Scaillet, O. Density estimation using inverse and reciprocal inverse Gaussian kernels. J. Nonparametr. Stat. 2004, 16, 217–226.
  28. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Chapman and Hall: London, UK, 1986.
  29. Zhang, S.; Karunamuni, R.J.; Jones, M.C. An improved estimator of the density function at the boundary. J. Amer. Statist. Assoc. 1999, 94, 1231–1241.
¹ Clearly, the kernel is a rescaled version of the (standard) gamma density s^{θ−1}e^{−s}/Γ(θ), with the substitution of x/β + 1 for the shape parameter θ. Here, the shape θ is limited to be greater than or equal to 1, so as to ensure that the resulting kernel is bounded.
² It is easy to see that

$$ \{k_g^{(q\mathrm{BS})}(s;\beta,x)\}^2 = \frac{C_g^2/C_{g^2}}{\{\beta(x+c\beta)\}^{1/2}}\, f_{g^2}^{(q\mathrm{BS})}\Big(s;\ \frac{1}{(x/\beta+c)^{1/2}},\ \beta(x/\beta+c),\ \frac{\theta}{(x/\beta+c)^{1/2}}\Big)\, A_q\Big(\frac{s}{\beta(x/\beta+c)}\Big), \quad s, x \ge 0, $$

provided that g² is also a density generator.
³ It automatically means that f_LB is a Lipschitz-continuous function (i.e., a Hölder-continuous function with exponent η = 1) on [0, ∞), with f_LB(0) = 0, and that f′_LB and f″_LB are continuous functions on [0, ∞); besides, under F(ii.1), f(0) = 0 and f′_LB(0) = 0. Note that f^{(j)}_LB(t) = μ^{−1}{j f^{(j−1)}(t) + t f^{(j)}(t)}, j = 1, 2.
⁴ Note that

$$ \int_0^\infty f_1(t)\, dt = \Big(\int_0^1 + \int_1^\infty\Big) f_1(t)\, dt \le \int_0^1 f_{3/2}(t)\, dt + \int_1^\infty f(t)\, dt < \int_0^\infty f_{3/2}(t)\, dt + 1. $$

This, together with F(ii.1), implies that ∫_0^∞ f_1²(t) dt exists.
⁵ The rule of thumb (ROT) procedure, with the gamma or LN reference, was also considered. However, these results are not reported here since, not surprisingly, the ROT was not robust to misspecification, although its computational speed was very high.
Table 1. Average ISEs × 1000 of the estimators using the LSCV-selected smoothing parameter. The value in parentheses is the standard deviation.

          PE[p]-based BS KDE      PE[p]-based log KDE     Gamma KDE
  n       p = 1      p = 3/2      p = 1      p = 3/2
  200     12.882     12.440       12.678     12.128      12.698
          (14.863)   (14.960)     (15.083)   (14.326)    (13.601)
  300     9.767      9.699        9.809      9.833       9.939
          (11.067)   (11.587)     (11.414)   (12.160)    (10.339)
  500     6.896      6.703        7.123      6.886       7.567
          (7.873)    (7.376)      (8.257)    (7.973)     (8.373)