Generalized RIF Regressions

Javier Alejo; Antonio Galvao; Julián Martínez-Iriarte; Gabriel Montes-Rojas

doi:10.20944/preprints202501.2077.v1

Submitted: 27 January 2025

Posted: 28 January 2025

Abstract

This paper proposes a generalization of covariate shifts to study their impact on inequality and other distributional measures. It builds on the recentered influence function (RIF) regression method, which was designed for location shifts in covariates, and extends it to general policy interventions such as location-scale or asymmetric shifts. Numerical simulations for the Gini, Theil and Atkinson indexes show that the method performs well across a range of designs and distributional measures. An empirical application to Mincerian wage equations illustrates the method.

Keywords: Influence functions; RIF regression; Gini; Atkinson; counterfactual distributions

Subject: Business, Economics and Management - Econometrics and Statistics

1. Introduction

The recentered influence function (RIF) regression has become a powerful tool for analyzing how changes in covariates affect distributional statistics of an outcome variable. It is particularly valuable in the study of income distribution, inequality, and poverty. Introduced in the seminal paper by [1], the method consists of a regression on a suitable transformation of the outcome. The transformation is based on the influence function of the distributional feature of interest. The influence function, a fundamental statistical concept that captures how individual observations affect unconditional statistics, is readily available from textbooks (see footnote 1), making the approach convenient and easy to implement. In a nutshell, by regressing the RIF on covariates, researchers can estimate the marginal effects of these covariates on various distributional measures, which offers a flexible approach to understanding economic and social disparities.

RIF regression has been widely adopted in empirical research, particularly for studying unconditional quantiles. In this case, the parameter of interest is usually referred to as the unconditional quantile effect (UQE), and the RIF regression is called unconditional quantile regression (UQR). Other distributional measures, such as the Gini coefficient, the Theil index, and polarization indices, can be readily implemented as long as the influence function is known. The method has gained traction across disciplines, including labor economics, health economics, and public policy, where understanding the distributional effects of observed characteristics is essential. Subsequent research has further developed RIF-based methodologies. For instance, [3] provide an extensive derivation of influence functions for various distributional statistics, which serve as foundational elements in RIF models. Additionally, [4] extend RIF regression to Oaxaca-Blinder decomposition models, while further refinements have been made by [5] and [6].

The original RIF regression is, in fact, just one possible use of these effects. The model of [1] is concerned with the effect of a location shift in one covariate on the unconditional distribution of the variable of interest. In other words, it shifts only the location of the distribution of that particular covariate (or, for a binary regressor, marginally shifts its probability). In practice, however, a policy intervention need not be a uniform increase over the entire range of the covariate; it may represent a more general change. A broader approach may consider changes in both location and scale, or asymmetric changes in the covariate. This paper proposes a simple and intuitive representation of these types of effects using a parametric model. Our contribution is to study general parametric covariate shifts in the general RIF environment, which yields a simple procedure that can be applied to any distributional functional with a known influence function. As a result, we can evaluate policy interventions described by a general parametric shift function.

Different approaches have recently been developed to extend the location-only effects, mostly for UQR. [7] consider this case as a way to transform the status quo population, although they do not consider the limiting (marginal) case as in [1] (see footnote 2). They do not, however, develop specific methodologies for general shifts in covariates or for different functionals. [9] consider location and scale shift effects to estimate UQR models and provide a framework for general policy interventions. Their estimation procedure, however, is based on maximum likelihood rather than on RIF regressions (see footnote 3). Other work extending UQR methods includes, for instance, [10] for a two-sample problem, [11] for the high-dimensional case, and [12] for the relationship between conditional and unconditional quantiles.

Our Monte Carlo simulations use the Gini coefficient as the distributional functional to illustrate how the method works. The simulations show that the proposed method performs well for several parametric shift functions. Finally, we present an empirical application that studies the effect on inequality of policy interventions with different shifts. In particular, we study changes in education and experience in a Mincerian equation framework.

This paper is organized as follows. Section 2 reviews the RIF regression method and presents our proposed estimands for different covariate-shift effects. This section also discusses some empirical examples from the literature where the proposed estimators can be used. Section 3 presents the corresponding estimators and discusses their asymptotic properties. Section 4 provides finite sample evidence on the performance of the estimator using Monte Carlo experiments. Section 5 applies the proposed estimators to shifts in education and experience in a Mincer equation. Finally, Section 6 concludes.

2. RIF Regression

2.1. RIF Regression Framework for Pure Location-Shifts

Consider a functional $v (\cdot)$ defined on the distribution function $F_{Y}$ of a random variable $Y$. Changes in $v$ that measure the influence or impact of a given observation through $F_{Y}$ are studied using the influence function (IF). More formally, the IF is the directional derivative of $v (F_{Y})$ at $F_{Y}$ and measures the effect of a small perturbation in $F_{Y}$. The IF can be calculated at each point in the domain of $Y$. Let $y$ be such a point and let $ϵ_{y}$ denote the Dirac probability mass at $y$. We then define

$$\mathrm{IF} (y; v; F_{Y}) = \lim_{t \to 0} \frac{v [t ϵ_{y} + (1 - t) F_{Y}] - v (F_{Y})}{t} .$$

When $\mathrm{IF} (\cdot; v; F_{Y})$ is evaluated at $Y$, it becomes a random variable. Importantly, it has zero mean by construction, that is, $E_{Y} [\mathrm{IF} (Y; v; F_{Y})] = 0$. For example, if $v (F_{Y}) = F_{Y}^{- 1} (τ)$, the $τ$-quantile of $F_{Y}$, then

$$\mathrm{IF} (y; v; F_{Y}) = \frac{τ - 1 \{y \leq F_{Y}^{- 1} (τ)\}}{f_{Y} (F_{Y}^{- 1} (τ))} .$$

Note that

$$E [\mathrm{IF} (Y; v; F_{Y})] = \frac{τ - E [1 \{Y \leq F_{Y}^{- 1} (τ)\}]}{f_{Y} (F_{Y}^{- 1} (τ))} = 0,$$

because

$$E [1 \{Y \leq F_{Y}^{- 1} (τ)\}] = F_{Y} (F_{Y}^{- 1} (τ)) = τ .$$

The appendix contains the influence functions for the Gini, Atkinson, and Theil indices.
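As a minimal numerical sketch of the quantile example, the IF can be evaluated on a sample by plugging in a sample quantile and a density estimate. The function name `quantile_if`, the Gaussian kernel, and the rule-of-thumb bandwidth are illustrative assumptions made here, not choices taken from the paper.

```python
import numpy as np

def quantile_if(y, tau, bandwidth=None):
    """Plug-in influence function of the tau-quantile at every observation.

    Returns the sample quantile q and the vector (tau - 1{y_i <= q}) / f_hat(q).
    The density f_hat(q) is a Gaussian kernel estimate (an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    q = np.quantile(y, tau)

    if bandwidth is None:
        # Rule-of-thumb bandwidth; any consistent choice could be used instead.
        bandwidth = 1.06 * y.std(ddof=1) * n ** (-1.0 / 5.0)

    # Gaussian kernel density estimate evaluated at the sample quantile.
    u = (q - y) / bandwidth
    f_q = np.mean(np.exp(-0.5 * u**2)) / (bandwidth * np.sqrt(2.0 * np.pi))

    influence = (tau - (y <= q)) / f_q
    return q, influence
```

Because the share of observations at or below the sample quantile is approximately $τ$, the sample mean of the returned values is close to zero, in line with the zero-mean property above.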

The recentered influence function (RIF) is defined as

$$\mathrm{RIF} (Y, v, F_{Y}) = v (F_{Y}) + \mathrm{IF} (Y, v, F_{Y}),$$

and, by the previous result, it satisfies

$$v (F_{Y}) = E_{Y} [\mathrm{RIF} (Y, v, F_{Y})] .$$

This is a key property, as it allows us to apply the Law of Iterated Expectations and to work with conditional models. In particular, if we consider a set of covariates of interest $X$, then

$$v (F_{Y}) = E \{E [\mathrm{RIF} (Y, v, F_{Y}) \mid X]\} .$$

(1)

The key result of [1] is based on modeling the inner conditional expectation, $m_{v} (x) \equiv E [\mathrm{RIF} (Y, v, F_{Y}) \mid X = x]$, using standard regression tools; this is referred to as RIF regression. Then, by the properties above, integrating over $X$ recovers the unconditional statistic of interest, i.e., $v (F_{Y}) = E_{X} [m_{v} (X)]$.

[1] focus on marginal effects of covariates $X$, modeled as a location shift $X + δ$, where $δ$ is a small real-valued perturbation. Changes in $X$ have a corresponding effect on $Y$, the outcome of interest. To study this effect, we use a statistical tool built upon the IF. Define $F_{Y_{δ}}$ as the distribution of $Y$ after the perturbation in $X$. We then consider the functional derivative

$$\lim_{δ \to 0} \frac{v (F_{Y_{δ}}) - v (F_{Y})}{δ} = \lim_{δ \to 0} \frac{E [m_{v} (X + δ)] - E [m_{v} (X)]}{δ} = E [\lim_{δ \to 0} \frac{m_{v} (X + δ) - m_{v} (X)}{δ}] .$$

The last equality holds if limit and expectation are interchangeable, which we assume throughout the paper. In fact, if $m_{v} (x)$ is differentiable at every point $x$, then we obtain the main result of RIF regression models, that is,

$$Π_{v} \equiv \lim_{δ \to 0} \frac{v (F_{Y_{δ}}) - v (F_{Y})}{δ} = E [\frac{\partial m_{v} (x)}{\partial x} |_{x = X}] = E [\frac{\partial E [\mathrm{RIF} (Y, v, F_{Y}) \mid X = x]}{\partial x} |_{x = X}] .$$

$Π_{v}$ measures the impact of a location shift in covariates on the unconditional functional $v$ of $Y$. Statistically, the expression for $Π_{v}$ is called an average derivative, an object whose estimation has been thoroughly studied. An excellent textbook reference is chapter 4 of [13].

2.2. General Unconditional Effects

While [1] focus mainly on a location shift, one can consider a general shift in a given covariate,

$$X_{δ} = h (X, δ),$$

(2)

where $δ \geq 0$, $h (x, 0) = x$, and $h (X, δ)$ is continuously differentiable as a function of $δ$. We then define general unconditional effects as

$$Π_{v, h} = \lim_{δ \to 0} \frac{v (F_{Y_{δ}}) - v (F_{Y})}{δ}$$

(3)

for general functions $h$.

Here we index the unconditional effect by $h$, which denotes the type of counterfactual policy analyzed. Naturally, different forms of $h (X, δ)$ yield different effects. Here are a few examples:

Location shift. This is the case developed above, taken from the analysis in [1], which considers a location shift in one covariate of the form

$X_{δ} = X + δ .$

(4)
Location-scale shift

$X_{δ} = \frac{X - μ_{X}}{s (δ)} + μ_{X} + ℓ (δ)$

(5)

where $δ \geq 0$, and $s (δ) > 0$ and $ℓ (δ)$ are continuously differentiable functions with $s (0) = 1$ and $ℓ (0) = 0$, respectively. Here $s (δ)$ acts as a shrinking parameter: an increase in $s (δ)$ compresses $X$ towards $μ_{X}$ and thus reduces the overall impact of the variable $X$. The pure location shift in [1] is obtained by setting $s (δ) = 1$ and $ℓ (δ) = δ$. [9] consider a similar shift written in a different manner. In fact, different alternatives can be developed depending on how the location and scale shifts are combined.
Asymmetric shift

$X_{δ} = X + a (δ) {(X_{\max} - X)}^{λ}$

(6)

where $δ \geq 0$, and the map $δ \mapsto a (δ)$ is continuously differentiable and satisfies $a (δ) > 0$ and $a (0) = 0$. The term $X_{\max}$ is the maximum value in the support of $X$, or an upper bound on it. The parameter $λ$ determines the asymmetry of the shift: if $λ < 0$ the shift is biased towards upper values, if $λ > 0$ the shift is biased towards lower values, and if $λ = 0$ it is a pure location shift. Although this type of shift was applied as a numerical simulation exercise in [14], the novelty of our proposal is to include it analytically within the RIF regression strategy.

From the definition of the RIF, one can compute the parameter of interest in equation (3) for the three cases described above. Let $m_{v} (x) = E [\mathrm{RIF} (Y, v, F_{Y}) \mid X = x]$ for a given functional $v = v (F_{Y})$. Using the same derivation strategy as in Section 2.1, it is straightforward to show that the estimand of interest is

$$Π_{v, h} = E [\frac{\partial m_{v} (x)}{\partial x} |_{x = X} \frac{\partial h (X, δ)}{\partial δ} |_{δ = 0}] .$$

(7)

The estimand is the expected product of two derivatives: first, the derivative of $m_{v} (x)$, which depends on the functional of interest, and second, the derivative of $h (x, δ)$ with respect to $δ$, which depends on the type of counterfactual policy. The formula in (7) can then be applied to each particular effect that the user wishes to analyze.
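For completeness, the steps behind (7) can be sketched as follows. This is a heuristic reconstruction that mirrors the derivation in Section 2.1; it assumes, as there, that $v (F_{Y_{δ}}) = E [m_{v} (h (X, δ))]$, that limit and expectation can be interchanged, and that $m_{v}$ is differentiable.

```latex
\begin{aligned}
Π_{v, h}
  &= \lim_{δ \to 0} \frac{E [m_{v} (h (X, δ))] - E [m_{v} (h (X, 0))]}{δ} \\
  &= E \left[ \lim_{δ \to 0} \frac{m_{v} (h (X, δ)) - m_{v} (h (X, 0))}{δ} \right] \\
  &= E \left[ \left. \frac{\partial m_{v} (x)}{\partial x} \right|_{x = X}
              \left. \frac{\partial h (X, δ)}{\partial δ} \right|_{δ = 0} \right] .
\end{aligned}
```

The last line applies the chain rule to $δ \mapsto m_{v} (h (X, δ))$ at $δ = 0$ and uses $h (X, 0) = X$.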

Location shift

$Π_{v, h} = E [\frac{\partial m_{v} (x)}{\partial x} |_{x = X}] .$

This is indeed the estimand of [1] and the most popular in empirical applications of RIF regression.
Location-scale shift

$Π_{v, h} = E [\frac{\partial m_{v} (x)}{\partial x} |_{x = X} (- (X - μ_{X}) \frac{s^{'} (0)}{s^{2} (0)} + ℓ^{'} (0))] .$

As in [9], we can define the location shift effect as

$Π_{v}^{L} := E [\frac{\partial m_{v} (x)}{\partial x} |_{x = X} ℓ^{'} (0)],$

which corresponds to the RIF regression coefficient representation of [1], and the scale effect as

$Π_{v}^{S} := E [- \frac{\partial m_{v} (x)}{\partial x} |_{x = X} (X - μ_{X}) \frac{s^{'} (0)}{s^{2} (0)}],$

such that $Π_{v, h} = Π_{v}^{L} + Π_{v}^{S}$.
Asymmetric shift

$Π_{v, h} = E [\frac{\partial m_{v} (x)}{\partial x} |_{x = X} a^{'} (0) {(X_{\max} - X)}^{λ}] .$

(8)

2.3. Empirical Examples to Motivate the Estimands

Here we discuss some empirical examples where these models could be used.

Effect of increasing education on wage inequality. In a Mincer equation, log wages are modeled as a function of observable covariates such as years of education. Changes in education levels can have different effects on income inequality, and a positive shift may increase it, the so-called 'paradox of progress' (see [15]). A study of the effect of a shift in education on wage inequality could be implemented using our proposed framework. We can accommodate a counterfactual policy experiment in which there is not only a general increase in the education level but also a change in its dispersion, or an asymmetric shift towards the highest possible level of education.

Smoking and birth weight. Consider a cigarette consumption tax. Under some assumptions, consumption $X$ will be reduced to $X / (1 + δ)$ for some $δ$ that depends on the elasticities of supply and demand. One might wonder about the final impact of this tax policy on the distribution of birth weights. This is analyzed in [9].

Wage controls and earnings distribution. [16] study the effect of a more uniform (less dispersed) distribution of wage control brackets on the distribution of earnings, in the context of a policy implemented during World War II.
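As a small worked instance of formula (7) from Section 2.2, consider again the cigarette tax example. Taking the shift literally as $h (X, δ) = X / (1 + δ)$ (an assumption made here for illustration; [9] analyze this policy within their own framework), the derivative with respect to $δ$ at zero is $- X$, so the estimand would read:

```latex
\left. \frac{\partial h (X, δ)}{\partial δ} \right|_{δ = 0}
  = \left. - \frac{X}{(1 + δ)^{2}} \right|_{δ = 0} = - X ,
\qquad
Π_{v, h} = - E \left[ \left. \frac{\partial m_{v} (x)}{\partial x} \right|_{x = X} X \right] .
```

Under this reading, a proportional reduction weights each observation's marginal effect by its own consumption level, so heavier consumers contribute more to the overall effect.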

3. Generalized RIF Estimator

The proposed estimands can be estimated using the RIF regression method. This is implemented in three steps.

First, we need to specify a model for the function $m_{v} (x)$, that is, for how the RIF of the functional $v$ relates to the covariates, since this model determines the derivative $\frac{\partial m_{v} (x)}{\partial x}$. Let $\widehat{\frac{\partial m_{v} (x)}{\partial x}}$ denote an estimator of this derivative. Assume a parametric model $\frac{\partial m_{v} (x)}{\partial x} \equiv g_{1} (x, η_{0})$ for some parameter $η$ with true value $η_{0}$. The estimator is then $g_{1} (x_{i}, \hat{η})$, where $\hat{η}$ is a consistent estimator of $η_{0}$. [1] offer several alternatives, of which RIF-OLS and RIF-Logit are the most commonly used. Similar procedures can be applied to any functional, as is usually done in the applied literature. This, in turn, requires that the proposed model be an appropriate representation of the true model. The parameter $η$ can then be interpreted as the vector of parameters of the RIF regression model, and $\hat{η}$ as the corresponding estimator.

For example, a third degree polynomial is a popular choice:

$$m_{v} (x) = η_{0} + η_{1} x + η_{2} x^{2} + η_{3} x^{3} .$$

(9)

Second, different choices of the shift function $h (X, δ)$ give rise to different effects, as shown in Section 2.2. The choice depends on the policy intervention of interest and the associated shift in covariates. We can then compute $h_{δ}^{'} (x, 0)$, the derivative of $h (x, δ)$ with respect to $δ$ evaluated at $δ = 0$, for each case. This derivative may also include estimated parameters. Let $\widehat{h_{δ}^{'} (x, 0)}$ denote its estimator. Assume a parametric model $h_{δ}^{'} (x, 0) \equiv g_{2} (x, γ_{0})$ for some parameter $γ$ with true value $γ_{0}$, and that $\hat{γ}$ is a consistent estimator of $γ_{0}$. Thus, the estimator is $g_{2} (x_{i}, \hat{γ})$.

Third, the unconditional effects can then be obtained as the sample average of the product of the two elements described in the previous paragraphs:

$${\hat{Π}}_{v, h} = \frac{1}{n} \sum_{i = 1}^{n} \widehat{\frac{\partial m_{v} (x_{i})}{\partial x}} \, \widehat{h_{δ}^{'} (x_{i}, 0)} = \frac{1}{n} \sum_{i = 1}^{n} g_{1} (x_{i}, \hat{η}) g_{2} (x_{i}, \hat{γ}) .$$

(10)
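The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a single continuous covariate, takes the RIF values as given (computed beforehand for the functional of interest), uses the cubic specification in (9), and treats the shift derivative as a known function with no estimated parameters other than the sample mean. The names `generalized_rif_effect`, `location`, `scale` and `asymmetric` are hypothetical.

```python
import numpy as np

def generalized_rif_effect(x, rif, dh_ddelta):
    """Sample analogue of E[m_v'(X) * h'_delta(X, 0)] with a cubic RIF-OLS fit.

    x         : 1-D array with the covariate being shifted
    rif       : 1-D array with RIF values of the outcome (computed beforehand)
    dh_ddelta : callable returning h'_delta(x_i, 0) for every observation
    """
    x = np.asarray(x, dtype=float)
    rif = np.asarray(rif, dtype=float)

    # Step 1: RIF-OLS on a cubic polynomial, then differentiate the fitted curve.
    design = np.column_stack([np.ones_like(x), x, x**2, x**3])
    eta, *_ = np.linalg.lstsq(design, rif, rcond=None)
    dm_dx = eta[1] + 2.0 * eta[2] * x + 3.0 * eta[3] * x**2

    # Step 2: derivative of the shift function with respect to delta at delta = 0.
    dh = np.asarray(dh_ddelta(x), dtype=float)

    # Step 3: average the product over the sample.
    return float(np.mean(dm_dx * dh))


def location(x):
    """h(x, delta) = x + delta, so h'_delta(x, 0) = 1."""
    return np.ones_like(x)


def scale(x):
    """h(x, delta) = (x - mean) / (1 + delta) + mean, so h'_delta(x, 0) = -(x - mean)."""
    return -(x - x.mean())


def asymmetric(x_max, lam):
    """h(x, delta) = x + delta * (x_max - x)**lam, so h'_delta(x, 0) = (x_max - x)**lam."""
    return lambda x: (x_max - x) ** lam
```

For instance, `generalized_rif_effect(x, rif, scale)` would estimate the scale effect for $s (δ) = 1 + δ$, while `asymmetric(x_max, 0.0)` reproduces the pure location shift.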

Now define $g (x, α) \equiv g_{1} (x, η) g_{2} (x, γ)$ for $α = (η, γ)$. Consistency of ${\hat{Π}}_{v, h}$ will follow from a uniform law of large numbers over $α$ and the correct specification of the parametric model for $m_{v} (x)$. Sufficient conditions for a uniform law of large numbers can be found in Lemma 4.3 in Newey and McFadden [17].

Assumption A1. Assume that $\frac{\partial m_{v} (x)}{\partial x} |_{x = X} h_{δ}^{'} (X, 0) = g (X, α_{0})$, that $g (X, α)$ is continuous at $α_{0}$ with probability one, and that there is a neighborhood ${\tilde{A}}_{0}$ of $α_{0}$ such that $E [\sup_{α \in {\tilde{A}}_{0}} \| g (X, α) \|] < \infty$.

The correct specification of the model for $m_{v} (x)$ is quite challenging. Indeed, it is unlikely that a parametric specification is the correct one. However, one can view a specification like the one in (9) as a series approximation to the true model. A rigorous treatment is provided in [1] for the case $h_{δ}^{'} (X, 0) \equiv 1$.

Deriving the asymptotic normality of our estimators is more complex and requires a more intricate analysis. If the influence function were observed, the estimator would converge at the standard parametric rate, $\sqrt{n}$, as follows from a standard OLS (or Logit) analysis. However, in most cases the IF must be estimated, and it often involves nonparametric components, such as densities. This necessitates a case-by-case asymptotic analysis for different estimators. As noted by Firpo et al. [1], p. 962, “because the density is nonparametrically estimated by kernel methods, the rate of convergence of the three estimators will be dominated by this slower term.” [5] derive the asymptotic distribution for the functionals used here (Gini, Theil, Atkinson), and the asymptotic distribution of our proposed estimators can, in principle, be obtained via the delta method. For the UQR case, the Supplemental Appendix in [1] provides a detailed example for the location-shift effect, which can be extended to any shift function of $X$. A specific extension is developed by [9], albeit without relying on RIF methods.

Thus, while the asymptotic normality of our proposed estimators can be derived, it involves additional technical challenges, and we leave this as an avenue for further research. In our implementation, we conduct inference using the wild bootstrap, a common approach in RIF regression methods.

4. Monte Carlo Experiments

Consider the Gini coefficient as our distributional target of interest $v$ and the following data generating process (DGP):

$$Y_{i} = 3 + 2 X_{i} + (1 + θ X_{i}) u_{i}$$

(11)

with $i = 1, \ldots, n$, where $X_{i} \sim N (6, 1)$ and $u_{i}$ is a random variable that is independent of $X_{i}$. The parameter values are chosen to rule out $y < 0$. See the Appendix for details on the computation of the Gini index and its RIF.

Then, for each DGP, we compute the target effect $Π_{v}^{δ}$ by simulation with a sample of size $n = 10$ million. In all cases, we replace $X$ with $X_{δ}$ in eq. (11), thus obtaining $Y_{δ}$, and compute numerically $Π_{v}^{δ} = [v (F_{Y_{δ}}) - v (F_{Y})] / δ$ for $δ = 0.0001$.
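The numerical benchmark described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions: normal errors, a much smaller default sample than the 10 million draws used in the paper, the same random draws for the baseline and shifted outcomes, and a Gini index scaled by 100. The function names and the shift callables are hypothetical.

```python
import numpy as np

MU_X = 6.0  # population mean of X in the DGP of eq. (11)

def gini(y):
    """Sample Gini index computed from the ordered observations."""
    ys = np.sort(np.asarray(y, dtype=float))
    n = ys.size
    rank = np.arange(1, n + 1)
    return 1.0 + 1.0 / n - 2.0 * np.sum(ys * (n + 1 - rank)) / (ys.mean() * n**2)

def simulated_effect(shift, theta, n=200_000, delta=1e-4, seed=0):
    """Finite-difference effect of a covariate shift on 100 * Gini."""
    rng = np.random.default_rng(seed)
    x = rng.normal(MU_X, 1.0, n)
    u = rng.normal(0.0, 1.0, n)          # assumption: N(0, 1) errors
    y = 3.0 + 2.0 * x + (1.0 + theta * x) * u

    x_delta = shift(x, delta)            # shifted covariate X_delta = h(X, delta)
    y_delta = 3.0 + 2.0 * x_delta + (1.0 + theta * x_delta) * u
    return 100.0 * (gini(y_delta) - gini(y)) / delta

def location_shift(x, d):
    return x + d

def scale_shift(x, d):
    return (x - MU_X) / (1.0 + d) + MU_X

def location_scale_shift(x, d):
    return (x - MU_X) / (1.0 + d) + MU_X + d
```

Reusing the same draws of $X$ and $u$ for both outcomes is a design choice of this sketch: it keeps simulation noise from swamping a difference that is divided by a very small $δ$.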

We consider the following cases:

Pure location shift: $s (δ) = 1$ and $ℓ (δ) = δ$ .
Pure scale shift: $s (δ) = 1 + δ$ and $ℓ (δ) = 0$ .
Location-Scale shift: $s (δ) = 1 + δ$ and $ℓ (δ) = δ$ .
Asymmetric shift: $a (δ) = δ$ and $λ \in \{- 0.5, 0, 0.5\}$ . Here we set $X_{\max} = 11.31$, chosen so that no value of $X$ in the simulations exceeds it.

For the RIF regression we use a third degree polynomial such that

$$m_{v} (x) = η_{0} + η_{1} x + η_{2} x^{2} + η_{3} x^{3},$$

thus,

$$\frac{\partial m_{v}}{\partial x} (x_{i}) = η_{1} + 2 η_{2} x_{i} + 3 η_{3} x_{i}^{2}$$

for $i = 1, \ldots, n$.

Then, for the location and scale shifts we have

$${\hat{Π}}_{v}^{L} = n^{- 1} \sum_{i = 1}^{n} \widehat{\frac{\partial m_{v}}{\partial x}} (x_{i}) = {\hat{η}}_{1} + 2 {\hat{η}}_{2} \bar{x} + 3 {\hat{η}}_{3} \overline{x^{2}},$$

and

$${\hat{Π}}_{v}^{S} = - n^{- 1} \sum_{i = 1}^{n} (x_{i} - \bar{x}) \widehat{\frac{\partial m_{v}}{\partial x}} (x_{i}),$$

with $\bar{x} = n^{- 1} \sum_{i = 1}^{n} x_{i}$ and $\overline{x^{2}} = n^{- 1} \sum_{i = 1}^{n} x_{i}^{2}$, and where the $η$ parameters are estimated by OLS as in the RIF regression method.

Then, for the asymmetric shift we use

$${\hat{Π}}_{v} = n^{- 1} \sum_{i = 1}^{n} {(X_{\max} - x_{i})}^{λ} \widehat{\frac{\partial m_{v}}{\partial x}} (x_{i}) .$$

For the simulations we consider the cases $u_{i} \sim N (0, 1)$ (Table 1) and $u_{i} \sim (χ_{1}^{2} - 1) / \sqrt{2}$ (Table 2), using $θ \in \{0, 0.3\}$. Across all DGPs, the proposed implementation performs well: the bias is small relative to the sampling variability, and the variance and MSE fall steadily as the sample size increases. Note that, as expected, the asymmetric shift with $λ = 0$ coincides with the location-shift effect. Moreover, the method works for all the shift models considered.

5. Empirical Application

This section presents an empirical application. We use an extract from the Merged Outgoing Rotation Group of the Current Population Survey of 1983, 1984 and 1985 for males only. More details about the data can be found in [18]. The variable of interest is Y, the hourly wage, and the covariates X are an indicator of whether the individual is unionized, years of education, whether he is married, non-white and his experience. We use a cubic specification in all continuous covariates to give some flexibility to the RIF model.

The effects studied correspond to education (Table 3) and experience (Table 4), in order to simulate different interventions and their potential effects on measures of inequality. We consider the estimators of the location, scale and asymmetric effects discussed above for the Gini, Theil and Atkinson indexes (see the Appendix for more details on each index). All indexes are multiplied by 100, and all effects correspond to a change in education or experience under the corresponding shift. Figure 1 and Figure 2 show the modeled shift for each year of education or experience, depending on the value chosen for $λ$.

Consider first the effect of education in Table 3. The location effect is positive while the scale effect is negative in all cases. This implies that, while a positive shift in education levels increases inequality (the 'paradox of progress' noted by several authors), a reduction in the overall impact of education decreases inequality. In fact, the joint location-scale effect is negative, thus reducing inequality. In the case of the asymmetric shift, the distribution of hourly wages becomes more equal when the shift is biased towards lower educational levels ($λ > 0$), as long as inequality is measured using the Gini or Theil index. However, if the index used is more inequality-averse (Atkinson(1) and Atkinson(2)), then all effects are unequalizing.

Consider now the effect of experience in Table 4. In this case, both the location and the scale effects are clearly negative. In fact, in all cases there is a reduction in inequality as measured by all indices. In the case of the asymmetric shift, the equalizing effect is greater when the increase in experience is biased towards lower levels ($λ > 0$).

6. Concluding Remarks

This paper proposes a simple generalization of covariate shifts to study their impact on inequality and other distributional measures. It builds on RIF regression methods designed for location shifts in covariates as in [1], and extends them using a framework similar to that of [9]. The simulations show that the method performs well across a range of designs.

This line of research can be extended in several directions. First, more accurate asymptotic expansions can be obtained if the asymptotic distributions of the estimators of the influence functions are known. Moreover, different approximation and numerical methods can be used to estimate the influence functions when no analytical solution is available. Second, calibrated covariate shifts can be considered to match empirical policy interventions. For instance, any targeted policy intervention can be mapped into a counterfactual distribution of covariates, to which our proposed method can then be applied.

Funding

This research received no external funding.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Functionals and Their RIF

Appendix A.1. Gini Index

The Gini coefficient is defined as

$$v^{Gini} = 1 - 2 \frac{R (F_{Y})}{μ},$$

where $μ = E (Y)$ and $R (F_{Y}) = \int_{- \infty}^{\infty} \int_{- \infty}^{y} z \, d F_{Y} (z) \, d F_{Y} (y)$. Note that if we define $p = F_{Y} (y)$, then $R (F_{Y}) = \int_{0}^{1} GL (p, F_{Y}) \, d p$, where $GL (p, F_{Y}) = \int_{- \infty}^{F_{Y}^{- 1} (p)} z \, d F_{Y} (z)$. That is, the term $R (F_{Y}) / μ$ acquires the classical interpretation of the area under the Lorenz curve.

Following [6], and after some algebra, the RIF for the Gini index $v^{Gini}$ can be written as:

$$\mathrm{RIF} (y, v^{Gini}, F_{Y}) = 1 + (\frac{1 - v^{Gini}}{μ}) y + \frac{2}{μ} [y (F_{Y} (y) - 1) - GL (F_{Y} (y), F_{Y})]$$

Appendix Sample Estimator

Given a random sample $y_{1}, \ldots, y_{n}$, we use a plug-in estimator of the previous equation. First, following [19], order the observations from smallest to largest, $y_{1} \leq y_{2} \leq \ldots \leq y_{n}$; then

$${\hat{v}}^{Gini} = 1 + \frac{1}{n} - \frac{2}{\hat{μ} n^{2}} \sum_{i = 1}^{n} y_{i} (n + 1 - i)$$

Finally, the rest of the components are estimated as follows:

$$\hat{μ} = \frac{1}{n} \sum_{i = 1}^{n} y_{i}$$

$${\hat{F}}_{Y} (y) = \frac{1}{n} \sum_{i = 1}^{n} 1 (y_{i} \leq y)$$

$$\widehat{GL}_{Y} (y) = \frac{1}{n} \sum_{i = 1}^{n} y_{i} 1 (y_{i} \leq y)$$
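A minimal NumPy sketch of these plug-in formulas is given below. It is illustrative rather than the authors' code; the function name `gini_rif` is hypothetical, and the empirical CDF and the term $GL$ are evaluated at each observation using the sample expressions above.

```python
import numpy as np

def gini_rif(y):
    """Return the plug-in Gini estimate and the RIF value of every observation."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ys = np.sort(y)
    mu = ys.mean()
    rank = np.arange(1, n + 1)

    # Gini from ordered data: 1 + 1/n - 2 / (mu n^2) * sum_i y_(i) (n + 1 - i)
    gini = 1.0 + 1.0 / n - 2.0 * np.sum(ys * (n + 1 - rank)) / (mu * n**2)

    # count[i] = number of observations less than or equal to y[i]
    count = np.searchsorted(ys, y, side="right")
    F = count / n                        # empirical CDF at each observation
    GL = np.cumsum(ys)[count - 1] / n    # (1/n) * sum of y_j with y_j <= y_i

    rif = 1.0 + (1.0 - gini) / mu * y + 2.0 / mu * (y * (F - 1.0) - GL)
    return gini, rif
```

For data without ties, the sample mean of the returned RIF values equals the estimated Gini index, which mirrors the property $v (F_{Y}) = E [\mathrm{RIF}]$ used in Section 2.1.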

Appendix Theil Index

The Theil index is defined as

$$v^{Theil} = \frac{η}{μ} - \ln (μ)$$

where $η = \int_{- \infty}^{\infty} y \ln (y) \, d F_{Y} (y)$ and $μ = E (Y)$.

Using this notation, the RIF of the Theil index is:

$$\mathrm{RIF} (y, v^{Theil}, F_{Y}) = v^{Theil} + \frac{y \ln (y) - η}{μ} - (\frac{η + μ}{μ^{2}}) (y - μ)$$

Appendix Sample Estimator

Given a random sample $y_{1}, \ldots, y_{n}$, the statistics needed to implement the RIF as a plug-in estimator are:

$$\hat{η} = \frac{1}{n} \sum_{i = 1}^{n} y_{i} \ln (y_{i})$$

$$\hat{μ} = \frac{1}{n} \sum_{i = 1}^{n} y_{i}$$
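The plug-in computation can be sketched as follows. The function name `theil_rif` is hypothetical, and the sketch assumes strictly positive outcomes so that the logarithm is defined.

```python
import numpy as np

def theil_rif(y):
    """Return the plug-in Theil estimate and the RIF value of every observation.

    Assumes all outcomes are strictly positive.
    """
    y = np.asarray(y, dtype=float)
    mu = y.mean()                     # estimate of E(Y)
    eta = np.mean(y * np.log(y))      # estimate of E[Y ln Y]
    theil = eta / mu - np.log(mu)

    rif = theil + (y * np.log(y) - eta) / mu - (eta + mu) / mu**2 * (y - mu)
    return theil, rif
```

Both correction terms average to zero in the sample, so the mean of the RIF values reproduces the estimated index.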

Appendix Atkinson Index

The Atkinson index is defined as

$$v^{Atk} (ϵ) = \begin{cases} 1 - \frac{κ {(ϵ)}^{\frac{1}{1 - ϵ}}}{μ} & \text{if } ϵ \neq 1; \\ 1 - \frac{e^{κ (ϵ)}}{μ} & \text{if } ϵ = 1, \end{cases}$$

where $μ = E (Y)$, $κ (ϵ) = \int_{- \infty}^{\infty} y^{(1 - ϵ) ω} \ln {(y)}^{(1 - ω)} \, d F_{Y} (y)$, $ω = 1 (ϵ \neq 1)$, and $ϵ$ is an inequality aversion parameter. The parameter $ϵ$ defines the type of social preferences $W = W (y_{1}, \ldots, y_{n})$ and is a value set by the researcher. The two extreme cases are $ϵ = 0$, when the inequality index reflects a utilitarian welfare function ($W = μ$), and $ϵ \to \infty$, when it reflects a Rawlsian function ($W$ is a Leontief-type function).

The RIF for the Atkinson index is:

$$\mathrm{RIF} (y, v^{Atk} (ϵ), F_{Y}) = v^{Atk} (ϵ) + A (ϵ, y) + B (ϵ) (y - μ)$$

where

$$A (ϵ, y) = \begin{cases} \frac{κ^{\frac{ϵ}{1 - ϵ}}}{(ϵ - 1) μ} (y^{1 - ϵ} - κ) & \text{if } ϵ \neq 1; \\ - \frac{e^{κ}}{μ} (\ln (y) - κ) & \text{if } ϵ = 1, \end{cases}$$

and

$$B (ϵ) = \begin{cases} \frac{κ^{\frac{1}{1 - ϵ}}}{μ^{2}} & \text{if } ϵ \neq 1; \\ \frac{e^{κ}}{μ^{2}} & \text{if } ϵ = 1. \end{cases}$$

Appendix Sample Estimator

Given a random sample $y_{1}, \ldots, y_{n}$, the statistics needed to implement the RIF as a plug-in estimator are:

$$\hat{μ} = \frac{1}{n} \sum_{i = 1}^{n} y_{i}$$

$$\hat{κ} (ϵ) = \frac{1}{n} \sum_{i = 1}^{n} y_{i}^{(1 - ϵ) ω} \ln {(y_{i})}^{(1 - ω)}$$

where $ω = 1 (ϵ \neq 1)$.
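A short sketch of the plug-in Atkinson RIF follows. The function name `atkinson_rif` is hypothetical; the sketch assumes strictly positive outcomes and handles the two branches $ϵ = 1$ and $ϵ \neq 1$ separately rather than through the indicator $ω$.

```python
import numpy as np

def atkinson_rif(y, eps):
    """Return the plug-in Atkinson(eps) estimate and the RIF of every observation.

    Assumes all outcomes are strictly positive.
    """
    y = np.asarray(y, dtype=float)
    mu = y.mean()

    if eps == 1:
        kappa = np.mean(np.log(y))            # kappa = E[ln Y]
        ede = np.exp(kappa)                   # exp(kappa)
        A = -(ede / mu) * (np.log(y) - kappa)
    else:
        kappa = np.mean(y ** (1.0 - eps))     # kappa = E[Y^(1 - eps)]
        ede = kappa ** (1.0 / (1.0 - eps))    # kappa^(1 / (1 - eps))
        A = kappa ** (eps / (1.0 - eps)) / ((eps - 1.0) * mu) * (y ** (1.0 - eps) - kappa)

    B = ede / mu**2
    atk = 1.0 - ede / mu
    rif = atk + A + B * (y - mu)
    return atk, rif
```

With `eps = 0` the index is zero for any sample, in line with the utilitarian case described above, and in every case the RIF values average to the estimated index.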

References

1. Firpo, S.; Fortin, N.; Lemieux, T. Unconditional quantile regression. Econometrica 2009, 77(3), 953–973.
2. van der Vaart, A. Asymptotic Statistics; Cambridge University Press: Cambridge, 1998.
3. Essama-Nssah, B.; Lambert, P. Chapter 6. Influence Functions for Policy Impact Analysis. In Inequality, Mobility and Segregation: Essays in Honor of Jacques Silber; Research on Economic Inequality; 2015; Vol. 20, pp. 135–159.
4. Fortin, N.; Lemieux, T.; Firpo, S. Decomposition methods in economics. In Handbook of Labor Economics; Ashenfelter, O., Card, D., Eds.; Elsevier: Amsterdam, 2011.
5. Firpo, S.; Pinto, C. Identification and Estimation of Distributional Impacts of Interventions Using Changes in Inequality Measures. Journal of Applied Econometrics 2016, 31(3), 457–486.
6. Firpo, S.; Fortin, N.; Lemieux, T. Decomposing Wage Distributions Using Recentered Influence Function Regressions. Econometrics 2018, 6(3), 41.
7. Chernozhukov, V.; Fernández-Val, I.; Melly, B. Inference on counterfactual distributions. Econometrica 2013, 81(6), 2205–2268.
8. Martinez-Iriarte, J. Sensitivity Analysis in Unconditional Quantile Effects. Working Paper.
9. Martínez-Iriarte, J.; Montes-Rojas, G.; Sun, Y. Unconditional Effects of General Policy Interventions. Journal of Econometrics 2024, 238(2), 105570.
10. Inoue, A.; Li, T.; Xu, Q. Two Sample Unconditional Quantile Effect. arXiv. Available online: https://arxiv.org/pdf/2105.09445.pdf.
11. Sasaki, Y.; Ura, T.; Zhang, Y. Unconditional quantile regression with high-dimensional data. Quantitative Economics 2022, 13, 955–978.
12. Alejo, J.; Galvao, A.F.; Martinez-Iriarte, J.; Montes-Rojas, G. Unconditional Quantile Partial Effects via Conditional Quantile Regression. Journal of Econometrics 2024, 105678.
13. Pagan, A.; Ullah, A. Nonparametric Econometrics 1999.
14. Battiston, D.; Garcia-Domench, C.; Gasparini, L. Could an Increase in Education raise Income Inequality? Evidence for Latin America. Latin American Journal of Economics 2014, 51, 1–39.
15. Bourguignon, F.; Lustig, N.; Ferreira, F. The Microeconomics of Income Distribution Dynamics; Oxford University Press: Washington, 2024.
16. Vickers, C.; Ziebarth, N.L. The Effects of the National War Labor Board on Labor Income Inequality. Working Paper.
17. Newey, W.K.; McFadden, D. Large Sample Estimation and Hypothesis Testing. In Handbook of Econometrics; Engle, R.F., McFadden, D.L., Eds.; chapter 36; Elsevier, 1994; Vol. 4, pp. 2111–2245.
18. Lemieux, T. Increasing residual wage inequality: Composition effects, noisy data, or rising demand for skill? American Economic Review 2006, 96(3), 461–498.
19. Lambert, P. The distribution and redistribution of income; Manchester University Press: Manchester, 2001.

1	For some theory, see [2].
2	[8] develops a sensitivity analysis procedure that considers both the marginal and non-marginal (global) effects on unconditional quantiles when covariates are discrete.
3	Another interesting aspect of UQEs is that there is a variety of methods to estimate them. Indeed, [1] rigorously derive three methods.

Figure 1. Asymmetric shift, education.

Figure 2. Asymmetric shift, experience.

Table 1. $v$ = Gini index; $u_{i} \sim N (0, 1)$.

Effect	n	Bias ($θ = 0$)	Var ($θ = 0$)	MSE ($θ = 0$)	Bias ($θ = 0.3$)	Var ($θ = 0.3$)	MSE ($θ = 0.3$)
Location	50	-0.0044	0.7563	0.7564	-0.1031	2.9639	2.9745
Location	100	0.0081	0.3482	0.3483	-0.0416	1.3842	1.3860
Location	500	-0.0086	0.0547	0.0548	-0.0260	0.2176	0.2183
Location	1000	-0.0066	0.0299	0.0299	-0.0184	0.1153	0.1157
Location	5000	-0.0005	0.0062	0.0062	0.0020	0.0252	0.0252
Scale	50	-0.1141	2.2196	2.2326	0.0156	6.9623	6.9625
Scale	100	-0.1114	1.0436	1.0560	0.0144	3.3553	3.3555
Scale	500	-0.0601	0.2052	0.2088	-0.0307	0.6320	0.6329
Scale	1000	-0.0418	0.1078	0.1096	-0.0195	0.3407	0.3411
Scale	5000	-0.0088	0.0205	0.0206	-0.0039	0.0606	0.0606
Both	50	-0.1185	3.5751	3.5891	-0.0876	10.4689	10.4766
Both	100	-0.1033	1.7181	1.7288	-0.0273	5.2379	5.2387
Both	500	-0.0688	0.2923	0.2970	-0.0568	0.8887	0.8920
Both	1000	-0.0485	0.1612	0.1635	-0.0379	0.4704	0.4718
Both	5000	-0.0094	0.0324	0.0325	-0.0019	0.0882	0.0883
Asymmetric ($λ = - 0.5$)	50	0.0048	0.1453	0.1453	-0.0455	0.5998	0.6019
Asymmetric ($λ = - 0.5$)	100	0.0112	0.0658	0.0659	-0.0179	0.2760	0.2763
Asymmetric ($λ = - 0.5$)	500	0.0016	0.0107	0.0107	-0.0098	0.0439	0.0440
Asymmetric ($λ = - 0.5$)	1000	0.0019	0.0057	0.0057	-0.0066	0.0235	0.0236
Asymmetric ($λ = - 0.5$)	5000	0.0033	0.0012	0.0012	0.0018	0.0052	0.0052
Asymmetric ($λ = 0.0$)	50	-0.0044	0.7563	0.7564	-0.1031	2.9639	2.9745
Asymmetric ($λ = 0.0$)	100	0.0081	0.3482	0.3483	-0.0416	1.3842	1.3860
Asymmetric ($λ = 0.0$)	500	-0.0086	0.0547	0.0548	-0.0260	0.2176	0.2183
Asymmetric ($λ = 0.0$)	1000	-0.0066	0.0299	0.0299	-0.0184	0.1153	0.1157
Asymmetric ($λ = 0.0$)	5000	-0.0005	0.0062	0.0062	0.0020	0.0252	0.0252
Asymmetric ($λ = 0.5$)	50	-0.0365	4.3397	4.3410	-0.2329	16.0205	16.0748
Asymmetric ($λ = 0.5$)	100	-0.0086	2.0279	2.0280	-0.0939	7.6135	7.6223
Asymmetric ($λ = 0.5$)	500	-0.0350	0.3114	0.3127	-0.0665	1.1873	1.1917
Asymmetric ($λ = 0.5$)	1000	-0.0270	0.1729	0.1737	-0.0473	0.6259	0.6281
Asymmetric ($λ = 0.5$)	5000	-0.0059	0.0362	0.0363	0.0031	0.1357	0.1357

Note: calculations are based on 1000 Monte Carlo replications; the Gini index is multiplied by 100.
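The Bias, Var and MSE columns are linked by the standard decomposition MSE = Bias² + Var. The following is a minimal sketch of how such summaries can be computed from a vector of Monte Carlo estimates, and of how a reported row can be checked against the identity. The function name `mc_summary`, the simulated draws, and the use of the population variance (`ddof=0`) are assumptions made for illustration; the paper does not state which variance convention it uses.

```python
import numpy as np

def mc_summary(estimates, true_value):
    """Bias, variance and MSE of an estimator across Monte Carlo replications."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    var = estimates.var()  # ddof=0 (assumption), so MSE = bias**2 + var holds exactly
    mse = np.mean((estimates - true_value) ** 2)
    return bias, var, mse

# Hypothetical replications: 1000 draws of a noisy estimator whose target is 0.
rng = np.random.default_rng(0)
draws = rng.normal(loc=-0.1, scale=1.5, size=1000)
bias, var, mse = mc_summary(draws, true_value=0.0)

# Consistency check on a reported row of Table 1 (Scale, n = 50, θ = 0).
reported_bias, reported_var, reported_mse = -0.1141, 2.2196, 2.2326
implied_mse = reported_bias ** 2 + reported_var
```

Because the table entries are rounded to four decimals, the implied and reported MSE can differ in the last digit for some rows.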

Table 2. $v =$ Gini index; $u_{i} \sim (χ_{1}^{2} - 1) / \sqrt{2}$.

Effect	n	Bias ($θ = 0$)	Var ($θ = 0$)	MSE ($θ = 0$)	Bias ($θ = 0.3$)	Var ($θ = 0.3$)	MSE ($θ = 0.3$)
Location	50	0.0126	0.7075	0.7077	0.0170	3.8324	3.8327
Location	100	0.0100	0.2943	0.2944	0.0197	1.6137	1.6140
Location	500	0.0077	0.0566	0.0567	0.0074	0.3119	0.3119
Location	1000	0.0036	0.0266	0.0266	0.0085	0.1438	0.1439
Location	5000	0.0002	0.0057	0.0057	-0.0035	0.0298	0.0299
Scale	50	-0.1601	1.8447	1.8704	-0.0454	7.8484	7.8505
Scale	100	-0.1201	0.9508	0.9652	-0.0340	3.9788	3.9800
Scale	500	-0.0499	0.2079	0.2104	-0.0449	0.8343	0.8364
Scale	1000	-0.0381	0.0933	0.0947	0.0000	0.3934	0.3934
Scale	5000	-0.0040	0.0209	0.0209	0.0034	0.0845	0.0845
Both	50	-0.1477	2.2522	2.2741	-0.0286	5.7489	5.7497
Both	100	-0.1104	1.2147	1.2268	-0.0144	3.1012	3.1014
Both	500	-0.0424	0.2608	0.2626	-0.0375	0.6949	0.6963
Both	1000	-0.0347	0.1159	0.1171	0.0084	0.3021	0.3022
Both	5000	-0.0040	0.0265	0.0265	-0.0002	0.0645	0.0645
Asymmetric ( $λ = -0.5$ )	50	0.0137	0.1493	0.1495	0.0089	0.8879	0.8880
Asymmetric ( $λ = -0.5$ )	100	0.0118	0.0617	0.0618	0.0109	0.3816	0.3817
Asymmetric ( $λ = -0.5$ )	500	0.0084	0.0118	0.0119	0.0065	0.0728	0.0729
Asymmetric ( $λ = -0.5$ )	1000	0.0063	0.0056	0.0056	0.0048	0.0342	0.0343
Asymmetric ( $λ = -0.5$ )	5000	0.0036	0.0012	0.0012	0.0000	0.0071	0.0071
Asymmetric ( $λ = 0.0$ )	50	0.0126	0.7075	0.7077	0.0170	3.8324	3.8327
Asymmetric ( $λ = 0.0$ )	100	0.0100	0.2943	0.2944	0.0197	1.6137	1.6140
Asymmetric ( $λ = 0.0$ )	500	0.0077	0.0566	0.0567	0.0074	0.3119	0.3119
Asymmetric ( $λ = 0.0$ )	1000	0.0036	0.0266	0.0266	0.0085	0.1438	0.1439
Asymmetric ( $λ = 0.0$ )	5000	0.0002	0.0057	0.0057	-0.0035	0.0298	0.0299
Asymmetric ( $λ = 0.5$ )	50	-0.0075	3.6517	3.6517	0.0299	17.5340	17.5348
Asymmetric ( $λ = 0.5$ )	100	-0.0053	1.5628	1.5629	0.0385	7.3357	7.3372
Asymmetric ( $λ = 0.5$ )	500	0.0035	0.3033	0.3033	0.0060	1.4437	1.4437
Asymmetric ( $λ = 0.5$ )	1000	-0.0031	0.1410	0.1410	0.0189	0.6523	0.6526
Asymmetric ( $λ = 0.5$ )	5000	-0.0038	0.0307	0.0307	-0.0088	0.1349	0.1349

Note: calculations are based on 1000 Monte Carlo replications; the Gini index is multiplied by 100.

Table 3. Education.

Effect	Gini	Theil	Atkinson(1)	Atkinson(2)
Location	0.6086***	0.5699***	0.6282***	1.2625***
Location	(0.0173)	(0.0218)	(0.0151)	(0.0233)
Scale	-4.8003***	-4.7931***	-4.1004***	-6.4685***
Scale	(0.0824)	(0.1074)	(0.0714)	(0.1047)
Both	-4.1918***	-4.2232***	-3.4722***	-5.2060***
Both	(0.0777)	(0.1019)	(0.0681)	(0.1031)
Asymmetric ( $λ = -0.5$ )	0.3303***	0.3111***	0.3347***	0.6600***
Asymmetric ( $λ = -0.5$ )	(0.0086)	(0.0109)	(0.0075)	(0.0115)
Asymmetric ( $λ = 0$ )	0.6086***	0.5699***	0.6282***	1.2625***
Asymmetric ( $λ = 0$ )	(0.0173)	(0.0218)	(0.0151)	(0.0233)
Asymmetric ( $λ = 0.5$ )	-0.1681***	-0.2530***	0.0941***	0.7458***
Asymmetric ( $λ = 0.5$ )	(0.0383)	(0.0494)	(0.0343)	(0.0563)

Notes: the sample size is 266,956 observations; bootstrap standard errors (500 replications) are in parentheses; *** denotes p<0.01, ** p<0.05 and * p<0.10.

Table 4. Experience.

Effect	Gini	Theil	Atkinson(1)	Atkinson(2)
Location	-0.4025***	-0.3963***	-0.3291***	-0.4686***
Location	(0.0059)	(0.0068)	(0.0053)	(0.0092)
Scale	-4.5253***	-4.5701***	-3.7068***	-5.2172***
Scale	(0.0929)	(0.1259)	(0.0826)	(0.1267)
Both	-4.9278***	-4.9664***	-4.0358***	-5.6858***
Both	(0.0959)	(0.1292)	(0.0853)	(0.1314)
Asymmetric ( $λ = -0.5$ )	-0.0620***	-0.0607***	-0.0512***	-0.0744***
Asymmetric ( $λ = -0.5$ )	(0.0010)	(0.0011)	(0.0009)	(0.0015)
Asymmetric ( $λ = 0$ )	-0.4025***	-0.3963***	-0.3291***	-0.4686***
Asymmetric ( $λ = 0$ )	(0.0059)	(0.0068)	(0.0053)	(0.0092)
Asymmetric ( $λ = 0.5$ )	-2.8433***	-2.8099***	-2.3207***	-3.2877***
Asymmetric ( $λ = 0.5$ )	(0.0410)	(0.0482)	(0.0370)	(0.0626)

Notes: the sample size is 266,956 observations; bootstrap standard errors (500 replications) are in parentheses; *** denotes p<0.01, ** p<0.05 and * p<0.10.
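For readers who want to reproduce the four inequality measures that head the columns of Tables 3 and 4, the sketch below computes sample versions of the Gini, Theil and Atkinson indices using their standard textbook formulas, together with a nonparametric bootstrap standard error. It is illustrative only: the function names, the simulated log-normal "wages", and the choice to bootstrap a plain index (rather than the paper's effect estimator) are assumptions, and the exact sample estimators used in the paper are those of its Appendix A.

```python
import numpy as np

def gini(y):
    """Sample Gini index from the rank-weighted sum of sorted incomes."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * y) / (n * y.sum()) - (n + 1.0) / n

def theil(y):
    """Theil T index: mean of (y/mu) * log(y/mu); requires y > 0."""
    y = np.asarray(y, dtype=float)
    r = y / y.mean()
    return np.mean(r * np.log(r))

def atkinson(y, eps):
    """Atkinson index with inequality-aversion parameter eps; requires y > 0."""
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    if eps == 1:
        # Limit case: one minus the ratio of geometric to arithmetic mean.
        return 1.0 - np.exp(np.mean(np.log(y))) / mu
    return 1.0 - np.mean(y ** (1.0 - eps)) ** (1.0 / (1.0 - eps)) / mu

def bootstrap_se(y, stat, reps=500, seed=0):
    """Nonparametric bootstrap standard error of stat(y) (resampling rows)."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = y.size
    draws = [stat(y[rng.integers(0, n, size=n)]) for _ in range(reps)]
    return np.std(draws, ddof=1)

# Hypothetical data, used only to exercise the functions.
rng = np.random.default_rng(1)
wages = np.exp(rng.normal(size=2000))
indices = {
    "Gini": gini(wages),
    "Theil": theil(wages),
    "Atkinson(1)": atkinson(wages, 1),
    "Atkinson(2)": atkinson(wages, 2),
}
gini_se = bootstrap_se(wages, gini)
```

The default of 500 bootstrap replications mirrors the number reported in the table notes; everything else about the resampling scheme here is a simplification.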

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
