Taking Stock of Some Recent and Notable Contributions to Research in Portfolio Analysis

In this paper we provide a highly selective review and synthesis of some recent and notable contributions to research in portfolio analysis. We offer a distinctive perspective on these developments by identifying a few patterns of sample eigenvalue adjustment that improve a portfolio's out-of-sample Sharpe ratio when the population covariance matrix admits a high-dimensional factor model. These patterns reveal a key source of portfolio performance improvement and shed light on the effectiveness of several recently introduced "robust to estimation errors" covariance matrix estimators, which were not originally designed to improve out-of-sample portfolio performance.


Introduction
Portfolio theory has remained a fascinating research topic since the publication of Markowitz (1952). However, estimation error in the estimators of expected returns and of the covariance matrix has impeded an effective implementation of this theory, as pointed out by numerous authors, such as Merton (1980); Jagannathan and Ma (2003); Kempf and Memmel (2006); Fan et al. (2008); DeMiguel et al. (2009b), among others. Despite this difficulty, steady progress has been made in improving the estimators of the expected return vector and of the covariance matrix. Rather than simply ignoring the expected return vector and focusing on minimum-variance portfolios¹, one can use a few recently introduced methods that appear to work better than the sample estimator of the expected return vector, e.g., the Bayesian approach (Jorion (1986); MacKinlay and Pástor (2000)), the equilibrium expected returns approach (Black and Litterman (1990)), and the robust portfolio approach, which incorporates estimation errors (Ceria and Stubbs (2006)). In addition, researchers seeking to explain cross-sectional asset returns have uncovered a large number of pricing anomalies (see Harvey et al. (2016) for a review), and these anomalies, in turn, have enabled investors to find reasonably good proxies for the expected returns of the assets in the portfolio.
Unlike expected returns, covariance matrices are usually estimated from the price history, but the quality of the sample covariance matrix can be seriously compromised as the number of assets in the portfolio becomes large relative to the sample size. A few well-known approaches to this issue adjust the sample covariance matrix or, more specifically, its sample eigenvalues.² Well-known examples include the "shrinkage towards identity" estimator (Ledoit and Wolf (2004)), a more general linear shrinkage estimator (Bodnar et al. (2014)), a "nonlinear shrinkage" estimator, and the spectral cut-off method (Carrasco and Noumon (2011)). All of these covariance estimators are reconstructed from adjusted sample eigenvalues. When they are used to build an optimized portfolio, they have been shown empirically to boost the portfolio's out-of-sample Sharpe ratio. However, it is important to emphasize that not all of these improved estimators were originally designed to achieve this goal. Notably, Ledoit and Wolf (2004) and Bodnar et al. (2014) minimize the expected Frobenius loss, while Carrasco and Noumon (2011) use the expected utility of a mean-variance investor as the objective function.
¹ The minimum-variance portfolios implicitly exploit risk-based pricing anomalies (Scherer (2010)) and have been noted for surprisingly high returns and low realized volatilities in both the US market and the global market (Jagannathan and Ma (2003); Clarke et al. (2006)).
² Other methods have been proposed to mitigate the adverse effect of estimation errors in the covariance matrix estimator, for instance, constraining portfolio weights (Jagannathan and Ma (2003); DeMiguel et al. (2009a); Fan et al. (2012)), imposing a factor structure (Fan et al. (2008, 2011, 2013)), and shrinking towards a target portfolio (Bodnar et al. (2018)).
Since improving these objective functions does not necessarily lead to a higher out-of-sample Sharpe ratio, it is not clear at first glance why these methods enhance portfolio performance.
To take stock of relatively recent and notable contributions to research on this issue, we investigate how to adjust sample eigenvalues judiciously in order to improve the out-of-sample portfolio Sharpe ratio when the underlying theoretical model for asset returns is a high-dimensional factor model, a model that has become increasingly popular in the literature (Fama and French (1993); Bai and Ng (2002); Fan et al. (2013)). We show later how our findings provide a novel synthesis of, and perspective on, some of these existing methods. Specifically, our analysis centers on mitigating estimation errors and improving portfolio performance in at least three important respects.
First, while most improved covariance matrix or portfolio weight estimators in the literature optimize a particular objective function within a certain class of candidate covariance or weight estimators³, we focus on adjustment patterns which, when applied to the sample eigenvalues, lead to a concrete marginal improvement⁴ in the out-of-sample portfolio Sharpe ratio when both the sample size and the portfolio size are sufficiently large. Such patterns, once identified, shed light on the key factors for improving a portfolio's out-of-sample performance. More importantly, every realization of the sample covariance matrix benefits from such an adjustment in terms of the out-of-sample Sharpe ratio of the two-step portfolio. Our focus on the marginal effects of eigenvalue adjustments stems largely from the observation that most improved estimators deviate only mildly from the sample covariance matrix.
Second, our simultaneous choice of the out-of-sample Sharpe ratio as the target and of covariance estimators reconstructed from adjusted sample eigenvalues as the candidates differentiates this paper from its peers, which adopt other objectives (Ledoit and Wolf (2004); Carrasco and Noumon (2011); Bodnar et al. (2014)) or consider other candidate estimators (Ledoit and Wolf (2004); Kan and Zhou (2007); Bodnar et al. (2018)). To the best of our knowledge, Ledoit and Wolf (2017) are the only authors who optimize (the convergence limit of) the out-of-sample Sharpe ratio over the same class of estimators that we consider in this paper.
The main difference between our paper and theirs will be elaborated in the next paragraph.
Third, our theoretical analysis assumes that the asset return generating process follows a high-dimensional factor model, in which the returns of an increasing number of assets are governed by a fixed number of common factors. This assumption implies a spiked structure in the population covariance matrix: as the number of assets increases to infinity, the first few eigenvalues grow at the same rate while the remaining ones stay bounded. Such a spiked structure with rapidly growing spiked eigenvalues renders many results developed within the framework of Random Matrix Theory (RMT) practically inapplicable, because the spectral convergence results for this type of population covariance matrix have not yet been fully worked out. This technical hurdle prevents us from deriving the optimal eigenvalue adjustment under a factor model. The few papers that borrow results from RMT to improve portfolios (e.g., Engle et al. (2017); Bodnar et al. (2018)⁵) assume that the population covariance matrix has bounded eigenvalues, which contradicts conventional factor model assumptions. Exploiting the important results of Shen et al. (2016) on the consistency of Principal Component Analysis (PCA) under a variety of assumptions on the population covariance matrix and different asymptotics, we are able to find suitable adjustment patterns that ensure a positive marginal effect under certain high-dimensional asymptotics.
According to the main theoretical results of this paper, if the population covariance matrix admits a high-dimensional K-factor model, adjusting one of the first K sample eigenvalues has a diminishing effect under the high-dimensional asymptotics in which the number of assets p and the sample size n both go to infinity with $n = O(p^{1+c})$ for any $c > 0$. In addition, either shrinking a large but non-spiked sample eigenvalue (i.e., one not among the first K) or lifting a small one has a positive effect on the out-of-sample Sharpe ratio asymptotically. We also analyze a simultaneous adjustment of multiple eigenvalues. Let $\{\hat\lambda_i\}_{i=1}^{p}$ denote the sample eigenvalues sorted in decreasing order. Then, for any value of k and any $a < 1$, simultaneously amplifying the smallest sample eigenvalues $\{\hat\lambda_i : i \ge k\}$ to $\{\hat\lambda_i + \lambda\hat\lambda_i^{a} : i \ge k\}$ for some small $\lambda > 0$ improves the out-of-sample Sharpe ratio under the high-dimensional asymptotics. Moreover, if the underlying model is a single-factor model and the factor pricing relation holds, i.e., if both the expected return vector and the covariance matrix are driven by the single-factor model (see the discussion in MacKinlay and Pástor (2000)), this way of adjusting eigenvalues always improves the out-of-sample portfolio Sharpe ratio, regardless of the values of p and n.
⁵ Although these authors did not explicitly assume the population covariance matrix to have bounded eigenvalues, their main technical reference (Rubio and Mestre (2011)) does assume a population covariance matrix with a bounded spectral norm.
Our results shed light on why the shrinkage-towards-identity method ($k = 1$, $a = 0$) and the spectral cut-off method ($k \ge \min\{j : \hat\lambda_j < 1\}$, $a = -\infty$) are effective in constructing real-world portfolios. A useful implication is that the key to improving out-of-sample portfolio performance is to make the eigenvalues overall less dispersed (in a sense formally defined later) after the adjustment. Ledoit and Wolf (2004) made a similar statement, but their reasoning rests on the fact that sample eigenvalues are more dispersed than their population counterparts and should therefore be corrected to reduce the expected Frobenius loss. By showing that the shrinkage-towards-identity type of estimator has a concrete marginal effect of improving the out-of-sample Sharpe ratio under factor model assumptions, we offer a fresh insight into this well-known method.
The remainder of the paper is organized as follows. Section 2 derives an expression for the marginal effect of a sample eigenvalue adjustment on the out-of-sample Sharpe ratio and discusses which adjustment patterns ensure a positive effect. Section 3 uses a simulation experiment to further illustrate the theoretical results established in Section 2. Section 4 provides a synthesis of some recent research in portfolio analysis by highlighting the connection between the main results of this paper and a few existing approaches. Section 5 concludes.
Throughout the paper we use bold capital letters to denote matrices, bold lowercase letters to denote vectors, and plain lowercase letters to denote scalars. The term "return" on a given asset denotes the asset's return in excess of the riskless rate. Let $X = (x_1, x_2, \dots, x_n)$ denote a $p \times n$ matrix containing n independently and identically distributed observations on a system of $p\,(< n)$ asset returns with some mean vector, a positive definite covariance matrix $\Sigma$, and a finite fourth moment. We denote the eigen-decomposition of the population covariance matrix by $\Sigma = U\Lambda U^T$, with $\Lambda = \mathrm{diag}\{\lambda_1 \ge \dots \ge \lambda_p\}$ and $U = (u_1, u_2, \dots, u_p)$, and let S denote the sample covariance matrix computed from $x_1, \dots, x_n$, whose eigen-decomposition is $S = \hat U\hat\Lambda\hat U^T$. Similarly, we denote $\hat\Lambda = \mathrm{diag}\{\hat\lambda_1 \ge \dots \ge \hat\lambda_p\}$ and $\hat U = (\hat u_1, \hat u_2, \dots, \hat u_p)$. Since we focus on the case $n > p$ in the theoretical analysis, we assume throughout the paper that S is invertible. In addition, $\|a\|$ and $\|a\|_\infty$ denote the Euclidean norm and the maximum norm of a vector a, respectively. Lastly, $E_k$ denotes a conforming diagonal matrix whose kth diagonal entry is 1 and whose other entries are 0; $E_{k+}$ denotes a conforming diagonal matrix whose first $k - 1$ diagonal entries are 0 and whose remaining diagonal entries (from the kth onwards) are 1.
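To make this notation concrete, here is a minimal NumPy sketch (an illustration, not the authors' code). It assumes the mean-centred sample covariance with $1/(n-1)$ normalisation that `np.cov` returns, which is an assumption rather than a definition taken from the text; the helper names `sample_cov`, `sorted_eigh`, `E` and `E_plus` are hypothetical.

```python
import numpy as np

def sample_cov(X):
    """Sample covariance of a p x n matrix X whose columns are observations.
    Assumption: mean-centred with 1/(n-1) normalisation (what np.cov computes)."""
    return np.cov(X)  # np.cov treats rows as variables by default

def sorted_eigh(S):
    """Return (lam_hat, U_hat) with eigenvalues in decreasing order."""
    vals, vecs = np.linalg.eigh(S)  # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

def E(k, p):
    """E_k: p x p diagonal matrix with 1 in the kth diagonal entry (1-indexed)."""
    M = np.zeros((p, p))
    M[k - 1, k - 1] = 1.0
    return M

def E_plus(k, p):
    """E_{k+}: 0 in the first k-1 diagonal entries, 1 from the kth entry onwards."""
    d = np.zeros(p)
    d[k - 1:] = 1.0
    return np.diag(d)

rng = np.random.default_rng(0)
p, n = 5, 50
X = rng.standard_normal((p, n))                  # hypothetical p x n return matrix
S = sample_cov(X)
lam_hat, U_hat = sorted_eigh(S)
S_rebuilt = U_hat @ np.diag(lam_hat) @ U_hat.T   # S = U_hat Lambda_hat U_hat^T
```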

Problem Setup
Suppose that an investor adopts a two-step approach to construct a single-period maximum Sharpe ratio (MSR) portfolio of p risky assets. Suppose further that the investor takes the vector of expected returns to be an exogenously given vector $\mu$ and estimates the covariance matrix from the assets' price history. We consider this case because estimating expected returns from the price history is widely recognized as a major source of estimation error (Jorion (1985); Jagannathan and Ma (2003); DeMiguel et al. (2009b)), and investors, as a result, resort to alternative approaches to obtain expected returns (recall the first paragraph of Section 1). In particular, many researchers have exploited asset pricing anomalies to find better proxies for expected returns; Harvey et al. (2016) review hundreds of cross-sectional return patterns. Given the abundance of choices for $\mu$, we leave the choice to the investor and simply take $\mu$ as exogenously given for the purpose of our analysis.
When the sample covariance matrix S is adopted as the covariance estimator, the two-step MSR portfolio solves the following optimization program:
$$\max_{w}\ \frac{w^T\mu}{\sqrt{w^T S w}} \quad \text{subject to} \quad w^T\mathbf{1} = 1,$$
where $\mathbf{1}$ is a column vector of ones. Note that in this Sharpe ratio maximization problem the objective function is an ex-ante Sharpe ratio, since $\mu$ represents the investor's subjective belief about expected asset returns over the investment horizon. The solution to the above program, if it exists, is
$$w_{MSR} = \frac{S^{-1}\mu}{\mathbf{1}^T S^{-1}\mu},$$
and its out-of-sample Sharpe ratio is
$$SR = \frac{w_{MSR}^T\mu}{\sqrt{w_{MSR}^T\Sigma\, w_{MSR}}}.$$
It is widely accepted in this literature that the out-of-sample Sharpe ratio of the two-step MSR portfolio can deviate substantially from its maximum attainable value, especially when p is large relative to n, because of significant estimation errors in the sample covariance matrix. For this reason, we explore the possibility of improving the out-of-sample Sharpe ratio by substituting S with a reconstructed covariance matrix estimator $\tilde S$ and assigning the portfolio weights
$$\tilde w_{MSR} = \frac{\tilde S^{-1}\mu}{\mathbf{1}^T\tilde S^{-1}\mu}.$$
In this paper we focus on adjusting the eigenvalues of the sample covariance matrix, which means that we consider covariance matrix estimators of the form $\tilde S = \hat U\tilde\Lambda\hat U^T$, with $\tilde\Lambda$ a diagonal matrix. This is the class of "rotation-equivariant" estimators originally introduced by Stein (1986), which Ledoit and Wolf (2017) also consider as candidate covariance estimators. We begin by slightly pulling a number of sample eigenvalues away from their original levels and explore the marginal effect of such an adjustment on the out-of-sample Sharpe ratio. For this purpose we use a diagonal matrix $V = \mathrm{diag}\{v_1, v_2, \dots, v_p\}$ and a scalar $\lambda$ to parameterize the sample eigenvalue adjustment as follows:
$$S_{V,\lambda} = \hat U(\hat\Lambda - \lambda V)\hat U^T.$$
The parameterization above is quite general in that, depending on the form of V, it reduces to a number of different ways of adjusting sample eigenvalues.
For instance, when V is a scalar multiple of the identity matrix, the adjustment is similar to (but not the same as⁶) the construction of the "shrinkage towards identity" method proposed by Ledoit and Wolf (2004). The shrinkage towards identity estimator is usually referred to as a "linear" shrinkage in the sense that, if we plot the adjusted eigenvalues against their original counterparts, all points lie on a straight line. This contrasts with the more recent "nonlinear" shrinkage estimator, in which each eigenvalue is adjusted differently. Our parameterization also allows for nonlinear shrinkage, since it does not restrict the $v_k$'s to be equal. In addition, this setup accommodates cases in which only a subset of the sample eigenvalues is adjusted.
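The two-step MSR portfolio and the adjusted estimator $S_{V,\lambda}$ can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the population $\Sigma$, the vector $\mu$ and the sample sizes are simulated, hypothetical inputs, and the function names are ours. The sketch also checks two facts: the absolute out-of-sample Sharpe ratio of any weight vector is bounded by $\sqrt{\mu^T\Sigma^{-1}\mu}$ (Cauchy–Schwarz), and taking $V = \hat\Lambda$ merely rescales S, leaving the weights unchanged.

```python
import numpy as np

def msr_weights(S_est, mu):
    """Two-step MSR weights S_est^{-1} mu / (1^T S_est^{-1} mu)."""
    x = np.linalg.solve(S_est, mu)
    return x / x.sum()

def oos_sharpe(w, mu, Sigma):
    """Out-of-sample Sharpe ratio w^T mu / sqrt(w^T Sigma w) under the population Sigma."""
    return w @ mu / np.sqrt(w @ Sigma @ w)

def adjusted_cov(S, v, lam):
    """S_{V,lam} = U_hat (Lambda_hat - lam V) U_hat^T, where V = diag(v) is ordered
    like the decreasingly sorted sample eigenvalues."""
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    lam_hat, U_hat = vals[order], vecs[:, order]
    return U_hat @ np.diag(lam_hat - lam * v) @ U_hat.T

rng = np.random.default_rng(0)
p, n = 20, 60
A = rng.standard_normal((p, 2))
Sigma = A @ A.T + np.eye(p)                          # hypothetical population covariance
mu = rng.normal(0.05, 0.05, size=p)                  # hypothetical subjective mu
X = rng.multivariate_normal(mu, Sigma, size=n).T     # p x n simulated sample
S = np.cov(X)

sr_sample = oos_sharpe(msr_weights(S, mu), mu, Sigma)
sr_bound = np.sqrt(mu @ np.linalg.solve(Sigma, mu))  # largest attainable |Sharpe ratio|

# V = I with lam = -0.5 amplifies every eigenvalue by 0.5, i.e. gives S + 0.5 I.
S_lifted = adjusted_cov(S, np.ones(p), -0.5)
# V = Lambda_hat with lam = 0.3 gives 0.7 S, so the MSR weights do not change.
lam_hat = np.sort(np.linalg.eigvalsh(S))[::-1]
w_rescaled = msr_weights(adjusted_cov(S, lam_hat, 0.3), mu)
```

Because the weights are normalised by $\mathbf{1}^T\tilde S^{-1}\mu$, any positive rescaling of the estimator cancels out; this is the scale invariance that matters when comparing shrinkage estimators.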
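The MSR portfolio constructed from $S_{V,\lambda}$ has the weight vector
$$w_{MSR}(V,\lambda) = \frac{S_{V,\lambda}^{-1}\mu}{\mathbf{1}^T S_{V,\lambda}^{-1}\mu}. \qquad (4)$$
It is worth pointing out that the products $v_k\lambda$ in the expression for $S_{V,\lambda}$ are not necessarily positive.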
A positive $v_k\lambda$ implies a shrinkage of the kth eigenvalue, and a negative $v_k\lambda$ an amplification. With the weight vector $w_{MSR}(V,\lambda)$ given in equation (4), the out-of-sample Sharpe ratio can also be expressed as a function of V and $\lambda$:
$$SR_V(\lambda) = \frac{w_{MSR}(V,\lambda)^T\mu}{\sqrt{w_{MSR}(V,\lambda)^T\Sigma\, w_{MSR}(V,\lambda)}}.$$
Our goal of studying the marginal effect of adjusting sample eigenvalues on the out-of-sample Sharpe ratio therefore reduces to studying the properties of the derivative
$$SR_V'(0) := \left.\frac{d\,SR_V(\lambda)}{d\lambda}\right|_{\lambda=0}.$$
As a first step, we derive a simplified expression for $SR_V'(0)$, given in the following theorem.
Theorem 2.1. The marginal effect of an eigenvalues adjustment specified by a diagonal matrix V on the out-of-sample Sharpe ratio admits the following expression: Proof. See the Appendix.
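As a numerical cross-check, the sketch below evaluates $SR_V'(0)$ by differentiating $SR_V(\lambda)$ directly with the chain rule, using $d(A^{-1}) = -A^{-1}(dA)A^{-1}$. Writing $g = S^{-1}\mu$, $\dot g = \hat U\hat\Lambda^{-2}V\hat U^T\mu$ and $q = g^T\Sigma g$, this gives $SR_V'(0) = \mathrm{sign}(\mathbf{1}^T g)\,[\mu^T\dot g/\sqrt{q} - (\mu^T g)(g^T\Sigma\dot g)/q^{3/2}]$. This is our own derivation, not the simplified form of Theorem 2.1, and the simulated inputs are hypothetical. The sketch compares it with a central finite difference and confirms that $V = \hat\Lambda$ (a pure rescaling of S) has zero marginal effect.

```python
import numpy as np

def sorted_eigh(S):
    """Eigen-decomposition with eigenvalues in decreasing order."""
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

def sharpe_of_lambda(S, v, lam, mu, Sigma):
    """SR_V(lam): out-of-sample Sharpe ratio of the MSR portfolio built from S_{V,lam}."""
    lam_hat, U_hat = sorted_eigh(S)
    S_adj = U_hat @ np.diag(lam_hat - lam * v) @ U_hat.T
    x = np.linalg.solve(S_adj, mu)
    w = x / x.sum()
    return w @ mu / np.sqrt(w @ Sigma @ w)

def marginal_effect(S, v, mu, Sigma):
    """SR'_V(0) from a direct chain-rule derivation (not the form of Theorem 2.1)."""
    lam_hat, U_hat = sorted_eigh(S)
    b = U_hat.T @ mu                      # mu expressed in the sample eigenbasis
    g = U_hat @ (b / lam_hat)             # g = S^{-1} mu
    dg = U_hat @ (v * b / lam_hat**2)     # dg/dlam at 0 = U_hat Lambda_hat^{-2} V U_hat^T mu
    q = g @ Sigma @ g
    sign = np.sign(g.sum())               # sign of 1^T S^{-1} mu, constant near lam = 0
    return sign * (mu @ dg / np.sqrt(q) - (mu @ g) * (g @ Sigma @ dg) / q**1.5)

rng = np.random.default_rng(1)
p, n = 10, 40
A = rng.standard_normal((p, 2))
Sigma = A @ A.T + np.eye(p)                        # hypothetical population covariance
mu = rng.normal(0.05, 0.05, size=p)                # hypothetical mu
S = np.cov(rng.multivariate_normal(mu, Sigma, size=n).T)

v_lift = -np.ones(p)                               # V = -I: lift every eigenvalue equally
analytic = marginal_effect(S, v_lift, mu, Sigma)
h = 1e-6
finite_diff = (sharpe_of_lambda(S, v_lift, h, mu, Sigma)
               - sharpe_of_lambda(S, v_lift, -h, mu, Sigma)) / (2 * h)

lam_hat, _ = sorted_eigh(S)
scale_effect = marginal_effect(S, lam_hat, mu, Sigma)  # V = Lambda_hat only rescales S
```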
Clearly, $SR_V'(0)$ is a random variable, since it depends on the sample covariance matrix. Its realization cannot be observed from a sample alone, because it also depends on the unobservable population covariance matrix $\Sigma$. Ideally, we would like to find a set of matrices, denoted by $\mathcal{M}$, such that for any $V \in \mathcal{M}$ the inequality $SR_V'(0) > 0$ holds for all values of p and n or, less ambitiously, holds asymptotically in certain limiting scenarios. If such a set can be identified, we can improve an MSR portfolio's out-of-sample Sharpe ratio by replacing S with $S_{V,\lambda} = \hat U(\hat\Lambda - \lambda V)\hat U^T$ for a small positive $\lambda$ and some $V \in \mathcal{M}$.
Given the complexity of the expression for $SR_V'(0)$, it is in general quite challenging to determine its sign without imposing an explicit structural assumption on $\Sigma$. In the next two sections we therefore look for the set $\mathcal{M}$ under two distinct sets of assumptions on $\Sigma$.

High-Dimensional Factor Model
In this section we motivate our theoretical analysis with the assumption that asset returns are generated by a high-dimensional K-factor model, a well-known and extensively used model for cross-sectional financial returns (Fama and French (1992); Bai and Ng (2002); Fan et al. (2008)). Specifically, we assume that the systematic portion of asset price movements is driven by K common factors and that the idiosyncratic returns are mutually uncorrelated⁷, so that the population covariance matrix has a "low rank + diagonal" structure. However, we do not assume that the factor pricing relation holds⁸, nor do we assume that the factor returns, and thus the asset returns, have a time-invariant mean vector. Rather, we continue to view $\mu$ as exogenously given.
⁷ Uncorrelatedness of residual returns is an assumption of the strict factor model. Recently researchers have considered a more practical approximate factor model (Fan et al. (2011)), which allows for a sparse residual covariance matrix. In this paper, however, we focus on the strict factor model for tractability.
⁸ If the factor pricing relation holds, $\mu$ should be estimated according to the factor model. We refer readers to MacKinlay and Pástor (2000) for a discussion.
Our goal is to find a collection of V matrices that make $SR_V'(0)$ asymptotically positive as both p and n go to infinity.
We are not the first to consider manipulating sample eigenvalues to improve the out-of-sample Sharpe ratio under high-dimensional asymptotics. For instance, Ledoit and Wolf (2017), under a set of technical assumptions, derived the convergence limit of the out-of-sample Sharpe ratio as the number of assets p and the sample size n go to infinity at the same rate. They then looked for the shrinkage of each sample eigenvalue that maximizes this deterministic limit. However, one of their technical assumptions (Assumption 2) states that, as p increases, the eigenvalues of the population covariance matrix remain in a compact set; put simply, the largest population eigenvalue stays bounded and, in particular, does not grow at the rate O(p). This places their analysis at odds with the standard high-dimensional K-factor model framework, in which the largest K population eigenvalues grow at the rate O(p). This discrepancy in assumptions, together with the popularity of the factor model structure, makes it necessary to examine the limiting behavior of the marginal effect under both high-dimensional asymptotics and factor model assumptions.
Under a K-factor model the cross-sectional returns of the p assets are assumed to be driven by K common factors:
$$y_t = B f_t + \varepsilon_t,$$
where $y_t$ is a $p \times 1$ vector of asset returns at time t, B is a $p \times K$ deterministic matrix of factor loadings, $f_t$ is a $K \times 1$ vector of factor returns at time t, and $\varepsilon_t$ is a $p \times 1$ noise vector independent of $f_t$ with zero mean, covariance matrix $\Sigma_\varepsilon$, and a finite fourth moment. Our subsequent analysis in this section is based on the following high-dimensional K-factor model assumptions.
Assumption 2.1. The autocovariance functions of both $\{f_t\}_{t=1}^{n}$ and $\{\varepsilon_t\}_{t=1}^{n}$ are independent of t.
Assumption 2.2. The $K \times K$ covariance matrix $\mathrm{cov}(f_t)$ is of full rank.
This assumption implies that none of the common factors can be written as a linear combination of the others. As a result, the matrix $B\,\mathrm{cov}(f_t)B^T$ has rank K as long as B has rank K.
Assumption 2.3. The eigenvalues of $p^{-1}B^TB$ are bounded away from zero for all sufficiently large p.
Fan et al. (2013) provide an insightful explanation for this assumption: it holds easily when the factors are pervasive, in the sense that a non-negligible fraction of the factor loadings are non-vanishing. This assumption, together with Assumption …
Assumption 2.4. The residual covariance matrix $\Sigma_\varepsilon$ is a constant multiple of the identity matrix, i.e., $\Sigma_\varepsilon = \sigma^2 I$.
Admittedly, this assumption serves a purely technical purpose: without it we would be unable to prove Theorem 2.2 below. Notwithstanding its restrictiveness, it captures a well-known stylized fact in finance, namely that assets' idiosyncratic variances are of a similar scale. MacKinlay and Pástor (2000) use the same assumption in a factor model.
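The following sketch, with hypothetical loadings and factor variances, builds $\Sigma = B\,\mathrm{cov}(f_t)B^T + \sigma^2 I$ under Assumptions 2.2–2.4 and illustrates the spiked structure: the K largest eigenvalues grow roughly in proportion to p, while the remaining $p - K$ eigenvalues equal $\sigma^2$. It also checks Assumption 2.3 for Gaussian loadings and draws returns from $y_t = B f_t + \varepsilon_t$; the Gaussian, zero-mean draws are a simplifying assumption that the model in the text does not require.

```python
import numpy as np

def factor_model_cov(B, cov_f, sigma2):
    """Population covariance B cov(f) B^T + sigma^2 I implied by y_t = B f_t + eps_t
    when Sigma_eps = sigma^2 I (Assumption 2.4)."""
    p = B.shape[0]
    return B @ cov_f @ B.T + sigma2 * np.eye(p)

def simulate_returns(B, cov_f, sigma2, n, rng):
    """Draw n i.i.d. observations y_t = B f_t + eps_t (Gaussian, zero mean for simplicity)."""
    p, K = B.shape
    F = rng.multivariate_normal(np.zeros(K), cov_f, size=n).T  # K x n factor returns
    eps = np.sqrt(sigma2) * rng.standard_normal((p, n))        # p x n idiosyncratic noise
    return B @ F + eps

rng = np.random.default_rng(2)
K, sigma2 = 3, 1.0
cov_f = np.diag([1.0, 0.5, 0.25])        # hypothetical factor variances
spike_ratios = {}
for p in (100, 400):
    B = rng.standard_normal((p, K))      # hypothetical pervasive loadings
    eigs = np.sort(np.linalg.eigvalsh(factor_model_cov(B, cov_f, sigma2)))[::-1]
    spike_ratios[p] = eigs[:K] / p       # spiked eigenvalues scaled by p
    print(p, np.round(spike_ratios[p], 3), np.allclose(eigs[K:], sigma2))

B = rng.standard_normal((400, K))
min_eig_BtB = np.linalg.eigvalsh(B.T @ B / 400).min()   # Assumption 2.3 check
Y = simulate_returns(B, cov_f, sigma2, n=50, rng=rng)   # 400 x 50 simulated returns
```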
The first half of Assumption 2.5 stipulates that a non-vanishing proportion of the assets have non-zero expected returns. This requirement is straightforward and not unduly restrictive.
The second half of the assumption prevents $\mu$ from having an excessive loading on the first K eigenvectors. There is typically no a priori reason to believe that $\mu$ loads excessively on any specific eigenvector, since $\mu$ can be obtained from multiple sources. An example of a violation is when $\mu$ is a linear combination of the first K eigenvectors; Guo et al. (2019) point out that in this scenario even the sample covariance matrix leads to a consistent portfolio weight estimator. In this paper we focus on the more common cases in which manipulating sample eigenvalues can lead to an improvement, so we exclude such extreme scenarios by imposing Assumption 2.5.
Theorem 2.2. Under Assumptions 2.1–2.5, as p and n go to infinity with $n = O(p^{1+c})$ for some $c > 0$, the following results hold:
In addition, there also exists an integer $K^* \in \{K+2, \dots, p-1\}$ such that for each $k \in \{K+1, \dots, p\}$ there exists a sequence of almost surely positive random variables denoted by
(b) For any $a < 1$ and any $k \in \{1, \dots, p\}$ there exists a sequence of almost surely positive random variables denoted by {Y
where $\hat\Lambda^a$ denotes the diagonal matrix $\mathrm{diag}\{\hat\lambda_1^a, \hat\lambda_2^a, \dots, \hat\lambda_p^a\}$.
Proof. See the Appendix.
Remark 2.1. Theorem 2.2 identifies a few forms of the V matrix that lead to a marginal improvement in the out-of-sample Sharpe ratio under the high-dimensional asymptotics in which p and n both go to infinity with $n = O(p^{1+c})$ for some $c > 0$. In the literature, the term "high-dimensional asymptotics" typically refers to the case in which p and n go to infinity at the same rate. The asymptotics underlying our analysis are only slightly lower-dimensional, since c can be arbitrarily small. For this reason, although the ratio p/n eventually converges to 0, we still regard these asymptotics, with a slight abuse of terminology, as high-dimensional.
Remark 2.2. Many results in this paper hold "almost surely". Unless otherwise stated, the statement that a random variable is positive almost surely should be understood to mean that it equals 0 only when $S = \Sigma$, an event of probability 0.
In addition, according to part (a) of Theorem 2.2 (see eq. (9)), both a mild shrinkage of a large but non-spiked sample eigenvalue (i.e., one not among the first K) and a mild amplification of a small one improve the out-of-sample Sharpe ratio. This result agrees with the intuition that sample eigenvalues should be pushed back towards their grand mean to improve the MSR portfolio, because sample eigenvalues are more dispersed than their population counterparts (Marčenko and Pastur (1967)). However, one should be cautious about adopting this intuition too readily: although pushing sample eigenvalues back towards their grand mean can alleviate the estimation error measured by the expected Frobenius loss $E(\|\tilde S - \Sigma\|_F)$ (Ledoit and Wolf (2004)), the out-of-sample Sharpe ratio is not necessarily affected in the same direction. Section 4 discusses this point further.
Part (b) of Theorem 2.2 concerns a joint manipulation of multiple sample eigenvalues. The parameter a specifies the relative intensity of the adjustment across sample eigenvalues. The result indicates that a joint amplification of a collection of tail sample eigenvalues improves the out-of-sample Sharpe ratio at the margin, as long as $a < 1$. Note that the adjusted eigenvalues must form a complete "tail": wherever the adjustment starts, all eigenvalues beyond (smaller than) the starting point must be adjusted. The condition $a < 1$ implies that the eigenvalues become overall less dispersed after the adjustment, in the sense that the inequality $\tilde\lambda_i/\tilde\lambda_j \le \hat\lambda_i/\hat\lambda_j$ holds for any $i < j$, where $\tilde\lambda_i$ is the ith largest eigenvalue after the adjustment. When $k = 1$ and $a = 1$, the adjustment is equivalent to replacing S with a scalar multiple of itself, which leaves the out-of-sample Sharpe ratio unchanged. This is why a must be strictly less than 1 to ensure that $SR_{-E_{k+}\hat\Lambda^a}'(0)$ is positive. When $k = 1$ and $a = 0$, the adjustment is consistent with the notion of a "linear" shrinkage, since each eigenvalue is lifted by the same amount; other values of a correspond to a "nonlinear" shrinkage. Instead of using p parameters to parameterize a nonlinear shrinkage, as in Ledoit and Wolf (2017), we use only two (k and a), by requiring the shrinkage intensity matrix V to be a power of the sample eigenvalue matrix times an "indicator matrix" specifying the starting point of the manipulation.
Despite its parsimony, this parameterization has rich implications.
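The two-parameter tail adjustment $\hat\lambda_i \mapsto \hat\lambda_i + \lambda\hat\lambda_i^{a}$ for $i \ge k$ and the dispersion criterion $\tilde\lambda_i/\tilde\lambda_j \le \hat\lambda_i/\hat\lambda_j$ for $i < j$ can be checked numerically with a small sketch. The eigenvalues below are hypothetical, and the function names are ours. The sketch shows that $a < 1$ makes the spectrum less dispersed, $a = 1$ with $k = 1$ is a pure rescaling, and $a > 1$ (here $a = 2$) makes it more dispersed.

```python
import numpy as np

def amplify_tail(lam_hat, k, a, lam):
    """Lift the tail eigenvalues: lam_hat_i + lam * lam_hat_i**a for i >= k (1-indexed)."""
    out = np.array(lam_hat, dtype=float)  # copy, so the input is left untouched
    out[k - 1:] += lam * out[k - 1:] ** a
    return out

def less_dispersed(new, old, tol=1e-12):
    """True if new_i / new_j <= old_i / old_j for all i < j (eigenvalues sorted decreasingly)."""
    r_new = new[:, None] / new[None, :]
    r_old = old[:, None] / old[None, :]
    upper = np.triu(np.ones_like(r_new, dtype=bool), k=1)  # index pairs with i < j
    return bool(np.all(r_new[upper] <= r_old[upper] + tol))

lam_hat = np.array([12.0, 6.0, 3.0, 1.5, 0.8, 0.3])  # hypothetical sorted sample eigenvalues
for a in (-1.0, 0.0, 0.5):
    print(a, less_dispersed(amplify_tail(lam_hat, k=3, a=a, lam=0.1), lam_hat))

scaled = amplify_tail(lam_hat, k=1, a=1.0, lam=0.1)      # equals 1.1 * lam_hat
```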
The out-of-sample Sharpe ratio is invariant to rescaling the covariance matrix estimator. Although a shrinkage towards identity estimator shrinks the large eigenvalues and lifts the small ones, it has an equivalent rotation-equivariant counterpart (equivalent in the sense of yielding the same out-of-sample Sharpe ratio) in which every eigenvalue is amplified. Our results therefore lend support to shrinkage estimators regardless of the level at which their trace is set. The connection between our results and shrinkage estimators is discussed in detail in Section 4.
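A small numerical check of this equivalence, under hypothetical inputs: a linear shrinkage of the form $\rho m I + (1-\rho)S$, with m the average sample eigenvalue and a fixed, illustrative $\rho$ (not Ledoit and Wolf's optimal intensity), equals $(1-\rho)$ times $S + \frac{\rho m}{1-\rho}I$, which lifts every sample eigenvalue by the same amount ($k = 1$, $a = 0$). Both estimators therefore yield identical MSR weights, even though their traces differ.

```python
import numpy as np

def msr_weights(S_est, mu):
    """Two-step MSR weights S_est^{-1} mu / (1^T S_est^{-1} mu)."""
    x = np.linalg.solve(S_est, mu)
    return x / x.sum()

rng = np.random.default_rng(3)
p, n = 15, 45
S = np.cov(rng.standard_normal((p, n)))   # hypothetical sample covariance
mu = rng.normal(0.05, 0.05, size=p)       # hypothetical mu

rho = 0.3                                 # hypothetical, fixed shrinkage intensity
m = np.trace(S) / p                       # grand mean of the sample eigenvalues
S_shrunk = rho * m * np.eye(p) + (1 - rho) * S    # large eigenvalues shrunk, small ones lifted
S_lifted = S + (rho * m / (1 - rho)) * np.eye(p)  # every eigenvalue lifted by the same amount

w_shrunk = msr_weights(S_shrunk, mu)
w_lifted = msr_weights(S_lifted, mu)
eig_S = np.linalg.eigvalsh(S)             # ascending order
eig_shrunk = np.linalg.eigvalsh(S_shrunk)
```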

Single-Factor Model
In this section we assess the marginal effect of adjusting sample eigenvalue(s) under a single-factor model, which assumes that a single unobservable factor drives the price movement of all assets.
MacKinlay and Pástor (2000) showed, using an "optimal orthogonal portfolio" argument (MacKinlay (1995)), that if the exact single-factor pricing relation holds and the uncorrelated residual returns have equal variance $\sigma^2$, then the true expected return vector (still denoted by $\mu$) and the covariance matrix satisfy the following relationship:
where $s_h = \mu_h/\sigma_h$ denotes the Sharpe ratio of the factor portfolio h. In Theorem 2.3 below we show that if the exact single-factor pricing relation holds and the subjective view on expected returns coincides with the true expected return vector, we can find a set of V matrices that make $SR_V'(0)$ positive even when both p and n are small. It is important to clarify that Theorem 2.3 cannot serve a practical purpose because of the following paradox: for an arbitrary $\mu$ supplied by some "alpha model", if the structural assumption in eq. (11) is not satisfied, the results of the theorem do not necessarily hold; if it is satisfied, we would immediately obtain the population covariance matrix, and estimation error would no longer be an issue. Nevertheless, Theorem 2.3 has a strong theoretical implication: it identifies a set of V matrices that work for any p and n under a reasonable economic model.
Theorem 2.3. If $\Sigma$ can be expressed in terms of the exogenously given $\mu$ as in eq. (11), then the following results hold for any value of p, n, $s_h$, and $\sigma^2$:
(a) There exists an integer $K \in \{2, 3, \dots, p-1\}$ such that, with probability 1, $SR_{E_k}'(0) > 0$ for all $k < K$ and $SR_{-E_k}'(0) > 0$ for all $k > K$.
Remark 2.3. Since the set of V matrices given in Theorems 2.2 and 2.3 is the same, the latter theorem may seem to be a finite-sample analogue of the former. There is, however, a major difference between the two sets of results: the structural assumption on $\Sigma$ in eq. (11) implies that $\mu$ has a non-zero loading only on the dominant eigenvector⁹ of $\Sigma$, a scenario excluded from the previous analysis by Assumption 2.5. Theorem 2.3 can therefore be viewed as a complement to Theorem 2.2 that provides insightful finite-sample results.
Part (a) of Theorem 2.3 concerns adjusting an individual sample eigenvalue. Both an incremental shrinkage of a large sample eigenvalue and an incremental amplification of a small sample eigenvalue increase the out-of-sample Sharpe ratio, and there exists a cut-off point separating such large and small eigenvalues. This result also partly justifies the use of the shrinkage towards identity method. Part (b) of Theorem 2.3 concerns simultaneously adjusting several eigenvalues. The results indicate that a mild amplification of a collection of tail eigenvalues, which makes the eigenvalues overall less dispersed, improves the out-of-sample Sharpe ratio of the portfolio.

Simulation Study
In this section we perform a simulation study to further illustrate the theoretical findings reported in Section 2.3. In particular, we demonstrate via simulation the properties of the sample-dependent random variable $\mathrm{SR}_V(0)$ for some of the $V$ matrices discussed in the previous section.

Simulation Setup
In each experiment we pre-specify a $p \times 1$ (subjective) expected return vector $\boldsymbol{\mu}$ and a $p \times p$ population covariance matrix $\boldsymbol{\Sigma}$, and repeat the following procedure 100 times: (a) Generate $n(p) = \lfloor p^{1.5} \rfloor$ random vectors from a multivariate normal distribution with expected return vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, and calculate the sample covariance matrix $S$ from the $n$ simulated vectors.
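Step (a) can be sketched in NumPy as follows. Computing $\lfloor p^{1.5} \rfloor$ as the integer square root of $p^3$ avoids floating-point rounding. Using `np.cov` (which demeans the data and divides by $n - 1$) for $S$ is an assumption, since the text does not state which normalization is used.

```python
import math
import numpy as np

def sample_size(p):
    """n(p) = floor(p^1.5), computed exactly as the integer square root of p^3."""
    return math.isqrt(p ** 3)

def simulate_sample_cov(mu, Sigma, rng):
    """Step (a): draw n(p) return vectors from N(mu, Sigma) and return (S, n)."""
    n = sample_size(mu.size)
    X = rng.multivariate_normal(mu, Sigma, size=n)     # n x p matrix of simulated returns
    S = np.cov(X, rowvar=False)                         # assumed convention: demeaned, divisor n - 1
    return S, n

rng = np.random.default_rng(0)
p = 10
S, n = simulate_sample_cov(np.full(p, 0.05), np.eye(p), rng)
print(sample_size(100), sample_size(500), n, S.shape)  # 1000 11180 31 (10, 10)
```

For $p = 100$ and $p = 500$ this gives $n = 1000$ and $n = 11180$, the sample sizes quoted for the panels of Figure 1.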
Once we have collected the 100 realizations of the $2p$ marginal effect random variables, we calculate the average of each, denoted by $\mathrm{SR}_{E_k}(0)$ and $\mathrm{SR}_{-E_k^+}(0)$ respectively, $k = 1, 2, \ldots, p$. We then plot $\mathrm{SR}_{E_k}(0)$ as a function of $k$ to demonstrate the marginal effect of shrinking the $k$th sample eigenvalue. Likewise, we plot $\mathrm{SR}_{-E_k^+}(0)$ as a function of $k$ to illustrate the marginal effect of simultaneously amplifying the smallest $p - k$ sample eigenvalues.
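A compact version of the whole experiment for a single parameter combination might look like the sketch below. It assumes $S_{V,\lambda} = \hat{U}(\hat{\boldsymbol{\Lambda}} - \lambda V)\hat{U}^{T}$ with MSR weights proportional to $S^{-1}\boldsymbol{\mu}$, uses a closed-form derivative derived for that parametrization, and takes $E_k^{+}$ to select the smallest $p - k$ sample eigenvalues as described above. The covariance matrix and $p = 20$ are hypothetical and much smaller than the paper's $p = 100$ or 500.

```python
import math
import numpy as np

def marginal_effect(U, ell, v, mu, Sigma):
    """dSR/dlam at lam = 0, assuming S_{V,lam} = U diag(ell - lam * v) U^T."""
    z = (U.T @ mu) / ell                  # U^T S^{-1} mu
    w = U @ z                             # S^{-1} mu
    a, b = mu @ w, w @ Sigma @ w
    y = (U.T @ (Sigma @ w)) / ell         # U^T S^{-1} Sigma w
    return (z @ (v * z) - (a / b) * (z @ (v * y))) / math.sqrt(b)

def simulate_marginal_effects(mu, Sigma, reps=100, seed=0):
    """Return two (reps, p) arrays holding SR_{E_k}(0) and SR_{-E_k^+}(0) for k = 1..p."""
    rng = np.random.default_rng(seed)
    p = mu.size
    n = math.isqrt(p ** 3)                # n(p) = floor(p^1.5)
    shrink = np.empty((reps, p))
    amplify = np.empty((reps, p))
    for r in range(reps):
        X = rng.multivariate_normal(mu, Sigma, size=n)
        ell, U = np.linalg.eigh(np.cov(X, rowvar=False))
        ell, U = ell[::-1], U[:, ::-1]    # descending order
        for k in range(1, p + 1):
            e_k = np.zeros(p)
            e_k[k - 1] = 1.0              # V = E_k: shrink the k-th eigenvalue
            tail = np.zeros(p)
            tail[k:] = 1.0                # the smallest p - k eigenvalues
            shrink[r, k - 1] = marginal_effect(U, ell, e_k, mu, Sigma)
            amplify[r, k - 1] = marginal_effect(U, ell, -tail, mu, Sigma)
    return shrink, amplify

# Hypothetical small case: 3 spikes, flat tail equal to 0.05
p = 20
Sigma = np.diag(np.concatenate([[30.0, 20.0, 10.0], np.full(p - 3, 0.05)]))
mu = np.random.default_rng(42).normal(0.05, 0.05, size=p)
shrink, amplify = simulate_marginal_effects(mu, Sigma, reps=100)
avg_shrink = shrink.mean(axis=0)          # average SR_{E_k}(0), to plot against k
avg_amplify = amplify.mean(axis=0)        # average SR_{-E_k^+}(0), to plot against k
```

Since the marginal effect is linear in $V$, each column of `amplify` equals minus the sum of the corresponding tail columns of `shrink`, which gives a cheap consistency check on the simulation code.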
In this simulation study we consider 8 different combinations of $p$, $\boldsymbol{\Sigma}$, and $\boldsymbol{\mu}$ when specifying the true parameters, with two choices for each parameter:

$p$: The value of $p$ is set to either 100 or 500, in consideration of the required computational time. The idea is that when $p = 500$, the results should better reflect the asymptotic results stated in Theorem 2.2. We do not treat the sample size $n$ as a separate parameter but simply let it be a function of $p$, to be consistent with the asymptotics we are working with.

$\boldsymbol{\Sigma}$: The population covariance matrix $\boldsymbol{\Sigma}$ either strictly conforms to Assumptions 2.1 to 2.4 (see footnote 10) or violates Assumption 2.4 but satisfies the remaining ones. Recall that Assumption 2.4 requires all of the non-spiked population eigenvalues to be equal. This assumption is indispensable for the proofs of our theoretical results but is unlikely to hold in practice. We therefore use the simulation study to explore to what extent the numerical results are affected when the small population eigenvalues are allowed to be distinct. Note that either choice of $\boldsymbol{\Sigma}$ admits the $K$-factor model; in this study we fix $K = 3$.

$\boldsymbol{\mu}$: The expected return vector $\boldsymbol{\mu}$ is either an "arbitrary" one, in the sense that it has a non-zero loading on each of the eigenvectors of $\boldsymbol{\Sigma}$, or a "low-rank" one, which can be expressed as a linear combination of the first $K$ eigenvectors of $\boldsymbol{\Sigma}$. To obtain an "arbitrary" $\boldsymbol{\mu}$, we simulate a $p \times 1$ vector of independent components drawn from $N(0.05, 0.05^2)$. When a "low-rank" $\boldsymbol{\mu}$ is desired, we project the simulated $\boldsymbol{\mu}$ onto the subspace spanned by the first $K$ eigenvectors of $\boldsymbol{\Sigma}$ and rescale the projection so that it has the same length as the "arbitrary" $\boldsymbol{\mu}$. The purpose of this design is to highlight that the marginal effect of manipulating one or more sample eigenvalues depends not only on $\boldsymbol{\Sigma}$ but also on $\boldsymbol{\mu}$.

Footnote 10: Note that Assumption 2.5 is not about the population covariance matrix.

Figure 1: In each panel, the x-axis measures $k$ and the y-axis measures $\mathrm{SR}_{E_k}(0)$. In panels (a) and (c), $p = 100$ and $n = 1000$; in panels (b) and (d), $p = 500$ and $n = 11180$. In panels (a) and (b), $\boldsymbol{\Sigma}$ has 3 spiked eigenvalues and the remaining ones equal 0.05; in panels (c) and (d), $\boldsymbol{\Sigma}$ has 3 spiked eigenvalues and the remaining ones are a sorted sample generated from Uniform(0.025, 0.075). In each panel, the red dots correspond to the case where $\boldsymbol{\mu}$ can have loadings on all eigenvectors; the blue dots correspond to the case where $\boldsymbol{\mu}$ is a linear combination of $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$.
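The two choices of $\boldsymbol{\Sigma}$ and $\boldsymbol{\mu}$ described above might be generated as follows. The text does not specify the spiked eigenvalues or the eigenvectors of $\boldsymbol{\Sigma}$, so the spike sizes (proportional to $p$, in line with a $K$-factor model) and the random orthogonal eigenvector matrix are hypothetical choices; the tail eigenvalues and the distribution of $\boldsymbol{\mu}$ follow the description.

```python
import numpy as np

def make_sigma(p, K=3, tail="equal", rng=None):
    """Spiked covariance: K spikes plus a flat tail (0.05) or a sorted Uniform(0.025, 0.075) tail."""
    rng = np.random.default_rng() if rng is None else rng
    # HYPOTHETICAL spike sizes: K*p, (K-1)*p, ..., p
    spikes = p * np.arange(K, 0, -1, dtype=float)
    if tail == "equal":
        rest = np.full(p - K, 0.05)
    else:
        rest = np.sort(rng.uniform(0.025, 0.075, size=p - K))[::-1]
    eigvals = np.concatenate([spikes, rest])          # descending order
    # HYPOTHETICAL eigenvectors: QR of a Gaussian matrix, sign-fixed so Q is Haar-distributed
    Q, R = np.linalg.qr(rng.standard_normal((p, p)))
    Q = Q * np.sign(np.diag(R))
    return (Q * eigvals) @ Q.T, Q, eigvals

def make_mu(Q, K=3, rng=None):
    """Return an "arbitrary" mu with i.i.d. N(0.05, 0.05^2) entries and its "low-rank"
    version: the projection onto span{u_1, ..., u_K}, rescaled to the same length."""
    rng = np.random.default_rng() if rng is None else rng
    mu = rng.normal(0.05, 0.05, size=Q.shape[0])
    U_K = Q[:, :K]                                    # first K eigenvectors of Sigma
    proj = U_K @ (U_K.T @ mu)
    mu_low = proj * (np.linalg.norm(mu) / np.linalg.norm(proj))
    return mu, mu_low

rng = np.random.default_rng(0)
Sigma, Q, eigvals = make_sigma(100, tail="uniform", rng=rng)
mu, mu_low = make_mu(Q, rng=rng)
```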

Shrinkage of an Individual Eigenvalue
In this section we show the simulation results on the marginal effect of shrinking a single sample eigenvalue. Specifically, Figure 1 illustrates how $\mathrm{SR}_{E_k}(0)$ varies with $k$.
According to Figure 1, when Assumptions 2.1 to 2.5 are satisfied (see the red dots in panels (a) and (b)), the marginal effect of shrinking one of the first $K$ sample eigenvalues is almost 0, shrinking a large eigenvalue beyond the $K$th leads to a marginal improvement in the out-of-sample Sharpe ratio, and shrinking a small eigenvalue leads to a deterioration of it. The last observation is equivalent to the statement that amplifying a small eigenvalue has a positive marginal effect on the Sharpe ratio. Moreover, the magnitude (in absolute value) of the marginal effect of adjusting one of the smallest eigenvalues is quite large. Although the population covariance matrix $\boldsymbol{\Sigma}$ used in panels (c) and (d) has distinct tail eigenvalues, we observe a pattern in the red dots very similar to that in the two top panels.
The blue dots come from the setup where $\boldsymbol{\mu}$ lies in the 3-dimensional space $\mathrm{span}\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$. As mentioned in Guo et al. (2019), in such a scenario the sample-based MSR portfolio is a consistent estimator of the true MSR portfolio, so it becomes less necessary to seek improvement via adjustment of eigenvalues. The blue dots in Figure 1 support this argument: the magnitude of the marginal effect of eigenvalue adjustment is negligible compared with the case where $\boldsymbol{\mu}$ is an "arbitrary" vector.

Table 1 reports the proportion of the 100 replications in which this marginal effect is positive. According to its first two rows, when the last $p - K$ population eigenvalues are equal, shrinking a large but non-spiked eigenvalue or amplifying a small one almost always leads to a marginal improvement in the out-of-sample Sharpe ratio. When the "flat tail" assumption is removed, there is still a high chance (around 85% according to Table 1) that such manipulation leads to further improvement.
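Proportions such as those in Tables 1 and 2 can be computed from the replicated marginal effects with a few lines of NumPy. The sketch below assumes the effects are stored as a (replications × $p$) array; the toy numbers are made up and are not results from the paper.

```python
import numpy as np

def proportion_positive(effects, ks):
    """Share of replications in which the marginal effect is positive.

    effects: array of shape (replications, p); column k-1 holds the k-th marginal effect
    ks: 1-based values of k to report (e.g. K, K+1, K+5, p-6, p-2, p-1)
    """
    effects = np.asarray(effects, dtype=float)
    cols = np.asarray(ks) - 1
    return (effects[:, cols] > 0).mean(axis=0)

# Toy input (made-up numbers): 4 replications of 3 marginal effects
toy = np.array([[1.0, -1.0, 2.0],
                [0.5, 3.0, -2.0],
                [2.0, 1.0, -0.1],
                [1.0, 1.0, 1.0]])
shares = proportion_positive(toy, [1, 2, 3])   # shares 1.0, 0.75 and 0.5
```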

Amplification on Tail Eigenvalues
In this section we present the simulation results on the marginal effect of amplifying a collection of tail eigenvalues. In particular, Figure 2 illustrates how $\mathrm{SR}_{-E_k^+}(0)$ varies with $k$. Comparing the red dots with the blue ones, we conclude that when $\boldsymbol{\mu}$ lies in the subspace spanned by the first $K$ eigenvectors, the marginal effect of the adjustment on the out-of-sample Sharpe ratio is negligible in magnitude. The reason is the same as stated previously: the sample-based MSR portfolio is good enough to yield a Sharpe ratio close to the true maximum. As we move to panels (c) and (d), where the "flat tail" assumption is violated, we observe no marked difference from the two top panels.
In addition, even a quick glance at Figure 2 shows that there is some "optimal" $k$ corresponding to the strongest marginal effect. This is expected, because amplifying the largest few eigenvalues might counteract the improvement brought by amplifying the small ones, so it could be better to amplify only the small ones. However, finding an optimal $k$ is not a goal of this paper. One reason is the technical difficulty involved; another is that it is not meaningful to look for the optimal $k$ without first making sure that a higher $\mathrm{SR}_{-E_k^+}(0)$ leads to a higher $\mathrm{SR}(-E_k^+, \lambda) - \mathrm{SR}(-E_k^+, 0)$, where $\lambda$ is a small positive number (see footnote 11). Although we do not discuss how to find the optimal $k$, it is important to point out that the answer depends on the relationship between $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$. A simple illustration of this point is that the optimal $k$ based on the red dots and that based on the blue ones are quite different.

Table 2: Proportion of positive $\mathrm{SR}_{-E_k^+}(0)$ among the 100 replications

                              k = K   K+1    K+5    p-6    p-2    p-1
(a) p = 100, equal tail        1.00   1.00   1.00   1.00   1.00   1.00
(b) p = 500, equal tail        1.00   1.00   1.00   1.00   1.00   1.00
(c) p = 100, distinct tail     1.00   1.00   1.00   1.00   0.94   0.83
(d) p = 500, distinct tail     1.00   1.00   1.00   1.00   0.96   0.85

In this table we only report results for the cases where $\boldsymbol{\mu}$ is an "arbitrary" vector and can have loadings on all eigenvectors. In (a) and (b), $\boldsymbol{\Sigma}$ has 3 spiked eigenvalues and the remaining ones equal 0.05; in (c) and (d), $\boldsymbol{\Sigma}$ has 3 spiked eigenvalues and the remaining ones are a sorted sample generated from Uniform(0.025, 0.075).

Table 2 reports the proportion of replications in which $\mathrm{SR}_{-E_k^+}(0)$ is positive for selected values of $k$. According to its first two rows, when all assumptions about the population covariance matrix are met, the six reported marginal effect random variables are positive in all replications.
When Assumption 2.4 is violated, as can be seen in the last two rows of Table 2, there are a few occasions where amplifying the last sample eigenvalue does not lead to improvement; when it comes to amplifying the smallest two eigenvalues, however, there are fewer such occasions. In all replications, amplifying the smallest six eigenvalues has a positive marginal effect on the out-of-sample Sharpe ratio.
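The distinction drawn above between the marginal effect $\mathrm{SR}_{-E_k^+}(0)$ and the actual change $\mathrm{SR}(-E_k^+, \lambda) - \mathrm{SR}(-E_k^+, 0)$ can be seen numerically: for small $\lambda$ the change is close to $\lambda\,\mathrm{SR}_{-E_k^+}(0)$, and the gap between the two grows with $\lambda$, which is where higher-order derivatives matter. The sketch assumes $S_{V,\lambda} = \hat{U}(\hat{\boldsymbol{\Lambda}} - \lambda V)\hat{U}^{T}$ and MSR weights proportional to $S^{-1}\boldsymbol{\mu}$; the data are hypothetical.

```python
import numpy as np

def oos_sharpe(U, ell, mu, Sigma):
    """Out-of-sample Sharpe ratio of w = S^{-1} mu, where S = U diag(ell) U^T."""
    w = U @ ((U.T @ mu) / ell)
    return float(mu @ w / np.sqrt(w @ Sigma @ w))

def sr_change(U, ell, v, lam, mu, Sigma):
    """SR(V, lam) - SR(V, 0), assuming S_{V,lam} = U diag(ell - lam * v) U^T."""
    return oos_sharpe(U, ell - lam * v, mu, Sigma) - oos_sharpe(U, ell, mu, Sigma)

# Hypothetical data
rng = np.random.default_rng(3)
p, n = 10, 31
Sigma = np.diag(np.linspace(0.5, 2.0, p))
mu = rng.normal(0.05, 0.05, size=p)
X = rng.multivariate_normal(mu, Sigma, size=n)
ell, U = np.linalg.eigh(np.cov(X, rowvar=False))
ell, U = ell[::-1], U[:, ::-1]            # descending order

k = 6
v = np.zeros(p)
v[k:] = -1.0                              # V = -E_k^+: amplify the smallest p - k eigenvalues

# Marginal effect SR_V(0) from a tiny central difference
h = 1e-6
slope = (sr_change(U, ell, v, h, mu, Sigma) - sr_change(U, ell, v, -h, mu, Sigma)) / (2 * h)

for lam in (1e-1, 1e-2, 1e-3):
    change = sr_change(U, ell, v, lam, mu, Sigma)
    # first-order approximation lam * slope; its error shrinks roughly like lam^2
    print(lam, change, lam * slope, abs(change - lam * slope))
```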
The simulation results support the theoretical conclusions reached in Section 2. To recap, under a large-dimensional $K$-factor model (as envisaged in the theoretical analysis), shrinking a large non-spiked sample eigenvalue and amplifying a small one by a small amount both lead to an improvement in the out-of-sample Sharpe ratio. In addition, a mild linear amplification of any number of tail sample eigenvalues also leads to an improvement.

Footnote 11: Information about the higher-order derivatives is needed here.
In this section we provide a synthesis of some of the recent research in portfolio analysis by highlighting an important connection between our theoretical results and a few recently introduced methods in the literature that aim to improve optimized portfolios.

Shrinkage towards Identity Estimator
The shrinkage towards identity estimator (Ledoit and Wolf (2004)) is a weighted average of the sample covariance matrix and an identity matrix, i.e., $S_{STI} = s_1 S + s_2 I$ for some $s_1, s_2 > 0$. The effectiveness of this estimator can be partially explained by our theoretical results. Since the out-of-sample Sharpe ratio is scale-invariant in the covariance matrix estimator, an equivalent covariance estimator is $S^{*}_{STI} = S + \frac{s_2}{s_1} I$. Since both $s_1$ and $s_2$ are positive, $S^{*}_{STI}$ conforms to our proposed way of adjusting the sample eigenvalues if we let the parameters in $V = -\boldsymbol{\Lambda}^{a} E_k^{+}$ be $a = 0$ and $k = 1$.
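The two facts used above can be checked numerically: the out-of-sample Sharpe ratio is unaffected by rescaling the estimator, and $S + cI$ keeps the sample eigenvectors while adding the same constant $c$ to every sample eigenvalue. In the sketch, $s_1$ and $s_2$ are hypothetical fixed weights (Ledoit and Wolf (2004) estimate them from the data), and MSR weights are taken to be proportional to $S^{-1}\boldsymbol{\mu}$.

```python
import numpy as np

def oos_sharpe(S_hat, mu, Sigma):
    """Out-of-sample Sharpe ratio of the MSR portfolio w ∝ S_hat^{-1} mu under the true Sigma."""
    w = np.linalg.solve(S_hat, mu)
    return float(mu @ w / np.sqrt(w @ Sigma @ w))

rng = np.random.default_rng(7)
p, n = 8, 40
Sigma = np.diag(np.linspace(0.2, 3.0, p))          # hypothetical true covariance
mu = rng.normal(0.05, 0.05, size=p)
S = np.cov(rng.multivariate_normal(mu, Sigma, size=n), rowvar=False)

s1, s2 = 0.6, 0.4            # hypothetical weights; Ledoit and Wolf (2004) estimate them from data
S_sti = s1 * S + s2 * np.eye(p)
S_star = S + (s2 / s1) * np.eye(p)

# 1) Rescaling the estimator leaves the out-of-sample Sharpe ratio unchanged
sr_sti = oos_sharpe(S_sti, mu, Sigma)
sr_star = oos_sharpe(S_star, mu, Sigma)

# 2) S + cI has the eigenvectors of S, with c added to every sample eigenvalue
c = s2 / s1
ell, U = np.linalg.eigh(S)
same_matrix = np.allclose(S_star, (U * (ell + c)) @ U.T)

print(np.isclose(sr_sti, sr_star), same_matrix)    # True True
```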
Readers may be quick to attribute the improved out-of-sample performance of the minimum-variance portfolio to the reduced expected Frobenius loss brought by the shrinkage estimator.
In fact, the link between a reduced expected Frobenius loss and an increased out-of-sample Sharpe ratio is more tenuous. Although both objective functions attain their optimum at $\hat{\boldsymbol{\Sigma}} = \boldsymbol{\Sigma}$, an improvement in one of them does not necessarily improve the other. In this paper we adopt a more involved objective function so that the improved out-of-sample performance of the portfolio can be explained more clearly. This comes at a cost, however: we can only derive analytically exact results for the marginal effect of adjusting eigenvalues, rather than determine the optimal amount of adjustment. This is why we emphasized at the beginning of this section that the effectiveness of the shrinkage towards identity method can only be partially explained in this paper.
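A deterministic two-asset toy example shows that a smaller Frobenius loss need not mean a higher out-of-sample Sharpe ratio. It compares two fixed estimators rather than expected losses, so it only illustrates the non-equivalence of the two criteria; the numbers are chosen purely for illustration.

```python
import numpy as np

def oos_sharpe(S_hat, mu, Sigma):
    """Out-of-sample Sharpe ratio of the MSR portfolio w ∝ S_hat^{-1} mu under the true Sigma."""
    w = np.linalg.solve(S_hat, mu)
    return float(mu @ w / np.sqrt(w @ Sigma @ w))

Sigma = np.eye(2)                       # true covariance (toy)
mu = np.array([1.0, 1.0])               # expected returns (toy)
est_a = 2.0 * Sigma                     # right "shape", wrong scale
est_b = np.diag([1.0, 1.5])             # close in Frobenius norm, wrong "shape"

loss_a = np.linalg.norm(est_a - Sigma, "fro")   # sqrt(2), about 1.414
loss_b = np.linalg.norm(est_b - Sigma, "fro")   # 0.5
sr_a = oos_sharpe(est_a, mu, Sigma)             # sqrt(2), the maximum attainable
sr_b = oos_sharpe(est_b, mu, Sigma)             # 5 / sqrt(13), about 1.387

print(loss_a > loss_b, sr_a > sr_b)             # True True
```

Estimator A is far from $\boldsymbol{\Sigma}$ in Frobenius norm yet attains the maximum Sharpe ratio $\sqrt{\boldsymbol{\mu}^{T}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}} = \sqrt{2}$, because the Sharpe ratio ignores the scale of the estimator; estimator B is closer to $\boldsymbol{\Sigma}$ but distorts its shape.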

Nonlinear Shrinkage Estimator
The nonlinear shrinkage estimator (Ledoit and Wolf (2017)) extends the shrinkage towards identity estimator by allowing different eigenvalues to be adjusted independently. As mentioned in Section 2.3, under a few assumptions about the population covariance matrix, the authors derive the limit of the out-of-sample Sharpe ratio as $p$ and $n$ go to infinity at the same rate. They then search for the optimal shrinkage that maximizes this limiting out-of-sample Sharpe ratio.
Since there is no explicit formula for the optimal shrinkage, we resort to the numerical results provided in Ledoit and Wolf (2017) to ascertain how the sample eigenvalues are adjusted. , where $\tilde{\lambda}_i$ is the $i$th eigenvalue after the nonlinear shrinkage, and $i, j$ ($i < j$) both index some small eigenvalues. Among the large eigenvalues there is at least one pair $i, j$ As a result, there is no guarantee that after a nonlinear shrinkage the eigenvalues become less dispersed overall. From this perspective, our results cannot provide a clear assurance of the improvement brought by the nonlinear estimator. Readers should note, however, that our theoretical results present only a few sufficient conditions for achieving a marginal improvement in the out-of-sample Sharpe ratio. So even if the nonlinear shrinkage estimator does not make every pair of eigenvalues less dispersed, it can still lead to an improvement.
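The notion of eigenvalues becoming "less dispersed" can be made concrete in several ways. The sketch below adopts one pairwise formalization as an assumption: for eigenvalues in descending order, every ratio $\tilde{\lambda}_i/\tilde{\lambda}_j$ with $i < j$ is no larger than the corresponding original ratio $\lambda_i/\lambda_j$. The function name and the tolerance are hypothetical.

```python
import numpy as np

def pairwise_less_dispersed(ell, ell_adj, tol=1e-12):
    """For eigenvalues sorted in descending order, return True when every ratio
    ell_adj[i] / ell_adj[j] (i < j) is no larger than the original ratio ell[i] / ell[j]."""
    ell = np.asarray(ell, dtype=float)
    ell_adj = np.asarray(ell_adj, dtype=float)
    before = ell[:, None] / ell[None, :]
    after = ell_adj[:, None] / ell_adj[None, :]
    i, j = np.triu_indices(ell.size, k=1)          # all pairs with i < j
    return bool(np.all(after[i, j] <= before[i, j] * (1 + tol)))

ell = np.array([4.0, 2.0, 1.0])
print(pairwise_less_dispersed(ell, ell + 1.0))                  # True: a constant shift, as in S + cI
print(pairwise_less_dispersed(ell, np.array([4.0, 1.5, 1.0])))  # False: 4 / 1.5 exceeds 4 / 2
```

Adding a positive constant to every eigenvalue, as the shrinkage towards identity estimator does, always passes this check, since $(\lambda_i + c)/(\lambda_j + c) < \lambda_i/\lambda_j$ whenever $\lambda_i > \lambda_j > 0$ and $c > 0$.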
As mentioned earlier, there is some similarity between this paper and Ledoit and Wolf (2017), in particular in terms of the objective function used and the family of estimators considered. The key difference, however, lies in the assumption about the population covariance matrix: our main theoretical results are based on a high-dimensional $K$-factor model under which the largest $K$ population eigenvalues grow with $p$ at rate $O(p)$, whereas a technical assumption in Ledoit and Wolf (2017) implies that they work in a framework where the largest population eigenvalue is bounded. Our assumption is arguably more appropriate for the covariance matrix of financial asset returns, but the cost of adopting it is the divergence of the out-of-sample Sharpe ratio; conversely, the assumption of bounded eigenvalues is less realistic, but it ensures convergence of the out-of-sample Sharpe ratio and thus enables the authors to find an optimal shrinkage.

Spectral Cut-off Method
The spectral cut-off method is a stabilization technique applied when inverting an ill-conditioned covariance matrix. The method reconstructs the inverse covariance estimator after discarding the eigenvectors associated with the smallest eigenvalues. Carrasco and Noumon (2011) propose a data-driven method for determining the number of eigenvectors to discard. The spectral cut-off method can be viewed as amplifying the smallest eigenvalues to infinity while keeping the large ones unchanged: an eigenvalue sent to infinity contributes nothing to the inverse, which is equivalent to discarding its eigenvector. From this perspective, the method falls into our framework as a polar extreme case if we let the parameters in $V = -\boldsymbol{\Lambda}^{a} E_k^{+}$ be $k \geq \min\{k : \lambda_k < 1\}$ and $a = -\infty$. Our theoretical results in Section 2.2 also imply that if we simply target a marginal improvement in the out-of-sample Sharpe ratio, the number of eigenvectors to discard does not play a significant role.
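The limiting argument above can be checked numerically: inflating the smallest eigenvalues by an ever larger factor drives the reconstructed inverse towards the spectral cut-off inverse, because each inflated eigenvalue contributes a term proportional to its reciprocal. Multiplying by a large factor is used here as a simple stand-in for $\lambda_j^{a}$ with $a \to -\infty$, and the number of discarded eigenvectors is fixed by hand rather than by the data-driven rule of Carrasco and Noumon (2011).

```python
import numpy as np

def spectral_cutoff_inverse(S, n_discard):
    """Inverse of S rebuilt without the eigenvectors of its n_discard smallest eigenvalues."""
    ell, U = np.linalg.eigh(S)                  # eigenvalues in ascending order
    U_keep, ell_keep = U[:, n_discard:], ell[n_discard:]
    return (U_keep / ell_keep) @ U_keep.T

def amplified_inverse(S, n_discard, factor):
    """Inverse of S after multiplying its n_discard smallest eigenvalues by factor."""
    ell, U = np.linalg.eigh(S)
    ell = ell.copy()
    ell[:n_discard] *= factor
    return (U / ell) @ U.T

rng = np.random.default_rng(5)
S = np.cov(rng.standard_normal((40, 10)), rowvar=False)   # hypothetical sample covariance
target = spectral_cutoff_inverse(S, 3)

gaps = []
for factor in (1e2, 1e4, 1e8):
    gap = np.abs(amplified_inverse(S, 3, factor) - target).max()
    gaps.append(gap)
    print(factor, gap)        # the gap falls in proportion to 1 / factor
```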

Conclusion
We have provided a highly selected review and synthesis of some recent and notable contributions to research in portfolio analysis. A unique perspective on this development was offered, with emphasis on ways to improve the out-of-sample Sharpe ratio of an MSR portfolio constructed using the two-step approach. To accomplish this goal, we assume a high-dimensional $K$-factor model in our theoretical analysis and investigate how improvement can be achieved by adjusting the sample eigenvalues according to certain patterns. Our main theoretical results show that adjusting one of the first $K$ eigenvalues has a diminishing marginal effect, whereas mildly shrinking a large but non-spiked eigenvalue and amplifying a small one both lead to an improvement in the out-of-sample portfolio Sharpe ratio under high-dimensional asymptotics.
The effect of adjusting multiple eigenvalues is also studied. Our results show that simultaneously amplifying a collection of tail eigenvalues according to a certain nonlinear pattern improves the out-of-sample Sharpe ratio. These theoretical results are subsequently illustrated by a set of simulation studies.
Lastly, by judiciously identifying a few eigenvalue adjustment patterns that ensure a marginal improvement in the out-of-sample portfolio Sharpe ratio, we are able to provide a much-needed and critical synthesis of two approaches recently introduced in the literature with the goal of improving portfolio performance: the shrinkage towards identity method and the spectral cut-off method.

A Proofs of Technical Results
Proof of Theorem 2.1. Let $g(V, \lambda) :=$ . Taking a partial derivative of $g(V, \lambda)$ with respect to $\lambda$, we obtain The derivative in eq. (12) can be simplified as follows: According to the definition of $S_{V,\lambda}$, it is easy to check that Now we can readily evaluate $g_V(0)$ with the assistance of eq. (13) and eq. (14). After a few straightforward steps of calculation we obtain: Since . It thus follows that:

Proof of Theorem 2.2. Before proceeding to the proof, we introduce a few necessary notations.
Let $U_F = (\mathbf{u}_1, \ldots, \mathbf{u}_K)$ and $U_I = (\mathbf{u}_{K+1}, \ldots, \mathbf{u}_p)$ denote the eigenvectors that correspond to the factors and the idiosyncratic components, respectively. Further, let $\boldsymbol{\Lambda}_F = \mathrm{diag}\{\lambda_1, \ldots, \lambda_K\}$ and $\boldsymbol{\Lambda}_I = \mathrm{diag}\{\lambda_{K+1}, \ldots, \lambda_p\}$. According to Theorem 1, case (a), in Shen et al. (2016), as both $p$ and $n$ go to infinity with $n = O(p^{1+c})$ for some $c > 0$, we obtain the following results: (1) $\hat{\lambda}_j / \lambda_j$ a.s.
The angle between a vector and a space is defined as the angle between the vector and its projection onto the space. We first derive the convergence rate for an inner product between population eigenvectors and their sample counterparts based on the results above. According to the definition of an angle, for $j = 1, 2, \ldots, K$, we have: Thus, by applying a Taylor expansion on both sides of the above equation, we obtain:

$$\hat{\mathbf{u}}_j^{T}\mathbf{u}_j = \cos\!\left(\arccos(\hat{\mathbf{u}}_j^{T}\mathbf{u}_j)\right) = 1 + o_{a.s.}(1), \qquad j = 1, 2, \ldots, K.$$
For $j = K+1, \ldots, p$, we have: Applying the Taylor expansion on both sides of the above equation and squaring the resulting expression, we get: Since $\hat{\mathbf{u}}_j$ has unit length, it follows that: In addition, we can also show that for $i, j \in \{K+1, \ldots, p\}$ and $i \neq j$, According to eq. (17) and eq. (18), $\hat{U}_I^{T} U_I U_I^{T} \hat{U}_I = I_{(p-K)\times(p-K)} + E$, where $E$ is a noise matrix whose elements are of order $o_{a.s.}(p^{-1})$.
Using the matrix notation introduced above, the matrix $\hat{U}^{T}\boldsymbol{\Sigma}\hat{U}$ can be expanded as: Let $\mathbf{b} = (\mathbf{b}_F^{T}, \mathbf{b}_I^{T})^{T}$ be a $p \times 1$ random vector. According to the convergence results derived above, one of the two terms in the summation in each element of eq. (19) is negligible compared with the other. Under the large-dimensional asymptotics and Assumption 2.4, the quadratic form $\mathbf{b}^{T}\hat{U}^{T}\boldsymbol{\Sigma}\hat{U}\mathbf{b}$ can be rewritten as:
The last equality holds because the diagonal elements of $\boldsymbol{\Lambda}_F$ increase at rate $O(p)$ and because the elements of the matrix $\hat{U}_F^{T}U_I$ are of order $o_{a.s.}(p^{-1/2})$. This result will be used in the subsequent proof. We now prove the results one by one.
(a) When $V$ takes the form $E_k$ and $p, n$ go to infinity with $n = O(p^{1+c})$ ($c > 0$), for any $k \in \{1, \ldots, p\}$, the first term inside the bracket in eq. (6) can be expanded as follows: , $1/\lambda_p$ with probability 1, there exists a $K^{*} \in \{K+2, \ldots, p-1\}$ such that, with probability 1, $X^{(p)*}_k > 0$ for any $K+1 \leq k < K^{*}$ and $X^{(p)*}_k < 0$ for any $K^{*} < k \leq p$.
We complete the proof by letting $X$

(b) When $V$ takes the form $E_k^{+}$ and $p, n$ go to infinity with $n = O(p^{1+c})$ ($c > 0$), for any $k \in \{1, 2, \ldots, p\}$, the first term inside the bracket in eq. (6) can be expanded as follows:

Proof of Theorem 2.3. With the additional assumption that $\boldsymbol{\Sigma} = \frac{1}{s_h^2}\boldsymbol{\mu}\boldsymbol{\mu}^{T} + \sigma^2 I$, the terms inside the bracket in eq. (6) become: $V U^{T}\boldsymbol{\mu}\boldsymbol{\mu}^{T}S^{-2}\boldsymbol{\mu}\boldsymbol{\mu}^{T}S^{-1}\boldsymbol{\mu}$.
Next we prove the two conclusions one-by-one.