Divergent selection on height and cognitive ability: evidence from both genetic distance (Fst) and polygenic scores

Tests of selection based on population differentiation were performed on two highly polygenic traits important for success and quality of life: body height and educational attainment (EA). Polygenic scores (PGS) of EA and height, computed across three public genomic databases revealed differences between populations (1000 Genomes, HGDP, gnomAD) that matched the average IQ and height of ethnic groups (r ~0.9). A moderately strong correlation between latitude and EA PGS (r= 0.67) implies the effect of climate (seasonality or winter temperature) on selection for cognitive ability. The effect of latitude was reduced ( β = 0.28) but remained significant after adding the sub-continental group variable to the regression model. The global Fst index revealed population differentiation at height and EA loci, significantly deviating from random SNPs. This is indicative of directional selection pressures with different strengths across groups. Substantial Linkage Equilibrium (LD) Decay between Africans and Europeans was found (r= 0.6) but there was no correlation between LD decay and population differences in polygenic scores for EA (r= 0.015, p= 0.45), and slight inflation of height PGS difference due to LD decay (r= -0.04, p= 0.0315). Selecting the SNPs most robust to LD decay (r>0.8) resulted in larger PGS gaps for EA, but smaller for height. Finally, it is shown that PGS differences are more sensitive to the significance of GWAS loci than Fst, reflecting the major limitations of Fst as an index of selection.


Introduction
1 that in Iceland, the EA PGS has been decreasing over time because people with higher EA polygenic scores tend to have fewer children.
Education is highly genetically correlated to intelligence (r>0.69), and can be used as a proxy for cognitive abilities (Davies et al, 2018;Hill et al., 2019;Savage et al., 2018).
However, other non-cognitive skills contribute to educational attainment, hence the selection signal might reflect selective pressure on other traits. Recently, a study found genetic associations with EA independent of cognitive abilities. Some traits such as extraversion, agreeableness and conscientiousness were positively correlated to the non-cognitive factor but negatively to intelligence (Demange et al., 2021).
Unfortunately the resulting non-cognitive factor retained some variance in common with intelligence (although to a much smaller extent than the cognitive PGS), preventing us from treating the two factors separately in the population-level analysis.
The existence of non-cognitive traits whose genetic correlation with intelligence and education go in the opposite direction makes it problematic to use EA as a proxy for IQ. For this reason, this study will use the summary statistics from the multi-trait analysis (MTAG) of education with cognitive performance, highest math class completed and self-reported math ability (Lee et al., 2018), which likely reflect the contribution of cognitive genetics more than the pure EA phenotype ("Years of Education"). Indeed, the MTAG PGS explained 11-13% of the variance in EA and 7-10% in cognitive performance (Lee et al., 2018).
These two traits were chosen because they are 1) socially relevant since they both affect success in life (academic, occupational and mating success); 2) highly polygenic and 3) GWAS are available for very large samples. Finally, two novel measures of selection are introduced: 1) the correlation of GWAS allele frequencies across pairs of populations and 2) the ratio between allele frequency difference and mean absolute allele frequency difference.

Methods
Polygenic scores were computed using GWAS SNPs meeting the standard significance threshold (p< 5 x 10 -8 ) and weighed by effect size. The inclusion criteria for GWAS were sample size and predictive power: the GWAS with the largest sample size and predictive power were used to compute polygenic scores. For education, the Lee et al.
(2018) GWAS met these criteria: it had the highest predictive power (~9%) and largest sample size (1.1 million) among GWAS of education and cognition. The EA MTAG (henceforth, "EA3")polygenic score was chosen because it encompassed several cognitive abilities and had the largest predictive power; for height, the GWAS meta-analysis 3 comprising 700,000 individuals by Yengo et al. (2018) had the largest sample size, but the highest predictive power (42.8%) was achieved by Chung et al. (2019), albeit with a smaller sample size (N= 456,837). The latter was used for computing polygenic scores, and the former to compute Fst distances and LD decay, because (unlike Chung et al., 2019) the publicly available statistics provided the necessary p-values to perform LD clumping and p-value thresholding.
The absolute allele frequency difference was calculated as the mean absolute difference of the allele frequencies Fst and allele frequencies were calculated using Plink 1.9 and polygenic scores for each population were computed using R (version 3.6). Code and data are available on OSF: https://osf.io/6dvfc/

Height PGS
The polygenic scores for height were calculated using Chung et al.'s (2019) GWAS, using the effect size in the Lasso+CTPR meta-analysis, which provided the best predictive power.
These PGS showed substantial inter-population variation. The highest values were obtained by northern Europeans (Finns, Orcadians, Northern Europeans from Utah) and the lowest by Native Americans (Karitiana, Surui) and Southeast Asians (Dai, Vietnam).

Correlation between height and education (PGS and average height)
The population-level correlation between EA3 and height PGS was close to 0 (r= 0.02, -0.304, 0.03 in gnomAD, 1000 Genomes and HGDP, respectively). In 1KG, the height PGS had higher average Fst than random SNPs,significantly deviating from Fst values of random PGS (Fig. 8). A similar deviation from random SNPs was observed in the HGDP dataset ( Fig. 9). The global Fst enrichment test yielded positive results for education too ( Fig. 10 and 11). The Fst values for random and GWAS SNPs and the Z scores + empirical p values are reported in table 2. There was no correlation between LD decay and the CEU-YRI PGS difference (r= 0.015, p= 0.451) (Fig. 12), meaning that SNPs that are more robust to LD decay have similar allele frequency differences between Africans and Europeans.

Global Fst
However, after selecting the SNPs with low LD decay (r> 0.8), the difference increased from 2.32% to 2.58%. With even lower LD decay (r> 0.9), the difference became larger (3.46%).

score.
Similar results were obtained for Chinese (CHB) and CEU, although there was a non-significant trend towards SNPs with low LD decay to show larger PGS differences (r= 0.022, p= 0.283. Fig. 13). The overall amount of LD decay was smaller than for CEU-YRI (r= 0.728). After selecting the SNPs with low LD decay (r> 0.8), the difference increased from 1.36% to 1.56% but it increased to 3.8% with even lower LD decay (r>0.9) Figure 13. Association between LD decay and CHB-CEU difference in EA polygenic score. This can be seen from the scatterplot in Fig. 14. After selecting the SNPs with low amount of LD decay (r>0.8), the PGS difference was reduced to 0.53%. With even lower LD decay (r>0.9), the difference became negative (-2.55%), with YRI showing higher scores.
A similar pattern was found for the CHB-CEU pair. There was a marginally significant negative correlation between (lack of) LD decay and PGS difference (r= -0.038, p= 0.055).
The degree of LD decay was lower than for CEU-YRI (r=0.742) and the PGS difference was 1.17 %. This difference decreased to 0.84% with LD > 0.8, and to 0.3% with LD >0.9.

Cross-population LD decay
An interesting question is whether LD decays pattern change across populations: do the SNPs that are less subject to LD decay between two populations, tend to show lower LD decay between other pairs of populations?
The correlation between the LD decay across SNPs for CEU-YRI and CEU-CHB (using the EA SNPs) was 0.3, implying a weak homogeneity in LD decay patterns across different population pairs. This finding makes it difficult to select SNPs that are robust to LD decay However, it is possible to compute for each pair, the PGS deviation from the reference population after taking into account LD decay. To do this, the reference population (CEU) score is constrained to 0, and each population's score is represented by the deviation from 0. The deviation score can be calculated either using the SNPs in low LD (i.e. r>0.8) for each population, or by computing the polygenic score for each SNP weighted by its LD decay coefficient. Table 3 reports the LD-decay weighted PGS for 2 populations. Since computing LD decay requires long computation/server time, this analysis was limited to a few populations.

Correlation between allele LD decay, Fst and allele frequency differences
Fst is a measure of allele frequency differences. However, unlike polygenic scores, it is non-directional, that is, it is "agnostic" about the allele's direction of effect. Hence, Fst suffers from loss of information compared to polygenic scores. Indeed, it was almost perfectly correlated (r= 0.97) to the absolute allele frequency difference (AFD, absolute value of the difference between population frequencies ) (Fig. 15), but had zero correlation with the polygenic score difference. Fst and AFD were also negatively correlated to (lack of) LD decay, that is SNPs with higher LD decay had higher Fst values.
The results were almost identical when using LD decay across European and East Asians (CEU -CHB). However, the extent of LD decay between CEU and CHB was lower than between CEU and YRI (r= 0.727 and 0.608, respectively; t= 22.043, p < 2.2e -16 ).

PGS difference/Fst as signal of selection
If the polygenic score difference between populations is more sensitive to selection than Fst or AFD, the average ratio of PGS difference to Fst across SNPs should be higher for GWAS SNPs under selection than for less selected traits. Since Fst is nearly equivalent to the absolute allele frequency difference, it is a mixture of allele frequency shifts due to drift and to directional selection but it is not possible to disentangle them. Conversely, the PGS difference is more representative of selection pressure. Hence, the AFD/PGS difference ratio could be a useful indicator of selection pressure on a trait. The PGS difference, AFD and their ratios for European-Chinese and European-African pairs are reported in table4.

Simulation with quasi-random GWAS SNPs
The EA GWAS SNPs with p value >0.95 (N= 2331) and MAF> 0.01 were used to compute the PGS delta, the AFD and the correlation with LD Decay. A prediction of the polygenic selection model is that the PGS delta and the PGS delta/absolute delta will be lower than in the GWAS significant SNPs.
The polygenic scores were close to the theoretically expected value of 50% (CEU= 0.4965 and YRI= 0.4997) and the difference was 0.3%. The AFD (0.12) was almost 2 times lower than in the GWAS significant SNPs, and the raw PGS delta/AFD ratio was~4.5 times smaller (0.0259). The lower AFD (Fst) difference is due to the MAF being lower in the less GWAS significant SNPs (0.184 vs 0.311). Indeed, Fst is mathematically dependent on MAF, and has lower values with lower MAF (Jakobsson, Edge & Rosenberg, 2013).

Difference in correlation coefficient between GWAS significant and quasi-random SNPs
Another metric to represent similarity in allele frequencies between a population pair is the correlation coefficient. Unlike the Fst, the correlation coefficient is sensitive to the sign of the difference between each pair of observations. In other words, it represents not only the strength of the relationship between two variables (or lack thereof) such as Fst or the absolute allele frequency difference; in fact, it is also an expression of the direction of such a relationship because it can acquire positive and negative values.
Therefore, directional (divergent) selection on a population pair should depress the correlation coefficient between the (same) alleles in the two populations. A prediction of this model is that the correlation between the allele frequencies of a pair of populations will be lower for GWAS significant SNPs than for less significant (or random) SNPs.
The partial correlation of the population frequencies controlling for LD decay was nearly identical, indicating that LD decay does not mediate the correlation.
Overall, the correlations for the GWAS significant SNPs across populations were lower than for quasi-random SNPs, suggesting the presence of divergent selection at these loci.

Random SNPs
1000 sets of random SNPs, matched for LD score and MAF, were generated using SNPSNAP. To simulate a GWAS, the status of the effect allele was assigned at random for each SNP. The average correlation between the CEU and YRI allele frequencies over 1000 sets was 0.814.

Individual PGS
Polygenic scores were computed for each individual in 1000 Genomes for Europeans and Africans. The scores were standardized (Z-transformed), and 0 represents the mean of the entire sample.
The individual polygenic scores differed by 1.7 SDs across the two groups (t = -20.902, df = 152.74, p-value < 2.2e -16 ). The PGS were calculated for all the populations in the European and African 1000 Genomes groups. Most of the variation was between continents, with little overlap, but considerable overlap among populations from the same continental group (Fig. 15). It is possible to quantify the amount of variance within and between groups using ANOVA. One-way ANOVA was run with the PGS as the dependent variable, and population (CEU and YRI) as the independent variable. The F statistic represents the ratio of the variance between groups and within groups. A value higher than 1 implies that the variance between groups is larger than the variance within groups. Welch's one-way ANOVA was performed on CEU and YRI to account for different variance between groups. The F-statistic was 141.47 (p< 2.2 *10^-16 ).

Discussion
Height and education-associated SNPs were both highly differentiated across populations, as shown by the global Fst enrichment test and polygenic scores. These differences matched differences in average trait (i.e. height and education), reaching correlations~0.9 with average population IQ and height (Figures 3, 4, 6, 7), implying that selection pressure after the out of Africa dispersal acted with different strength on different populations. The results were robust across different datasets (gnomAD, 1000 Genomes and HGDP), yielding more credibility to the findings. The lack of a correlation between LD 23 decay and population differences in polygenic scores (Fig. 12) suggests that low p value EA GWAS SNPs have a causal effect on the phenotype or that they are closely tagging causal variants (i.e. high LD). Another interpretation is that the tag variants represent random noise which does not bias the population means in one direction. However, if this was the case for all tag variants, the average allele frequency for both populations would be around 50% (i.e. see the quasi-random SNPs), and there would not be a significant difference. Indeed, the SNPs with low LD decay (r> 0.9) had larger CEU-YRI frequency difference than the others (3.5 % vs 2.3%), implying that the SNPs with most signal (and least noise) have larger frequency differences between populations.
For height, LD decay had a small but significant effect on the PGS difference, so that the PGS difference computed using SNPs in low LD decay (r> 0.8) was slightly reduced (from 0.67% to 0.53%). The SNPs with even lower LD decay (r> 0.9) produced a reversal of the difference, as Yorubans had higher PGS than CEU. This result must be interpreted with caution because it involves a small subset of SNPs. In this case, LD decay produces a small bias, inflating the European (or deflating the African) polygenic score.
Another way to interpret these findings is that the PGS differences between populations are driven by a subset of variants with very low LD decay (r> 0.8), whereas the majority of the variants add mostly noise which may or may not inflate the frequency differences. A detailed description of this phenomenon, however, is beyond the scope of this paper.
There was a lot of heterogeneity in LD decay patterns across populations (r= 0.3 for CEU-YRI and CEU-CHB with the EA GWAS), Hence, to increase polygenic score accuracy it is advisable to compute PGS differences selecting SNPs that have low LD decay for each specific pair of populations because the number of SNPs that are robust to LD decay across all populations is very small.
Climate is an environmental factor that might have influenced polygenic adaptation for education, via selection for enhanced cognitive ability or life-history traits. Indeed, a positive correlation (r= 0.68) between education PGS and latitude was found (Fig. 5). A simulation using random SNPs matched for minor allele frequency, showed that this result would be highly unlikely under a model of random SNPs with similar minor allele frequency (p= 0.0009) (Fig.6). Remarkably, the height PGS had a much weaker correlation (r= 0.3) with latitude ( Fig. 1), despite Bergmann's rule predicting cold-climate selection for larger sized animals. However, this mirrors findings of a weak positive relationship between height and latitude in human populations (Gustafsson & Lindenfors, 2009).
Ethnic groups of East Asian origin had large positive residuals, that is to say their polygenic scores for EA were higher than predicted by the line of best fit when using latitude in a simple linear regression model. After adding the "sub-continent" variable to the regression model, there was a significant improvement in model fit (p= 2.64 * 10 -7 ), with the variance explained (Adj. R 2 ) increasing from 44.4% to 75.8%. On the other hand, the effect of latitude on EA PGS was more than halved (Beta= 0.67 to 0.28). This suggests the presence of selective pressures (or introgression events) independent of latitude and shared within sub-continents. An alternative explanation is that the current geographical location of some groups doesn't reflect their historical origin. For example, a positive residual for East Asian populations could be due to an origin at higher latitudes followed by a relatively recent move southwards.
Recently, Stern et al. (2021) found evidence for directional selection on EA3 PGS.
The effect was partly mediated by EA3's correlation to a variable measuring skin pigmentation ("sunburning ability"). The authors found that after accounting for selection pressure on skin pigmentation, the selection signal on EA3 was attenuated. Hence, a large share of the selection pressure on EA3 was due to correlated response with another trait. In light of the strong correlation with latitude,an alternative explanation to directional selection on EA3 being a by-product of selection on pigmentation, is that climatic factors could account for the genetic correlation between EA3 and skin pigmentation. There are different mechanisms that can potentially drive a correlated response to selection between two or more traits: a) passive, meaning they are genetically linked (i.e. pleiotropy) but only one trait is beneficial for passing on an organism's genes and the other traits "hitch-hike" because they are influenced by the same genes or genetic variants in strong linkage with the selected trait, (b) the same environment selects for multiple traits at the same time, or (c) different environmental selection pressures happen to act simultaneously on multiple traits The present study also shows the empirical equivalence of Fst and absolute allele frequency difference (r= 0.97). An advantage of using polygenic scores compared to the Fst or absolute allele frequency difference is that it is directional, that is, that each allele's effect is taken into account, whether it is a risk allele or not, or whether it increases or decreases a phenotype. On the contrary, Fst and AFD are non-directional, so the mean Fst or AFD across many genetic variants is independent of alleles with positive effects being overrepresented in a population compared to another one. Hence, Fst or AFD are more representative of drift than of selection. Conversely, the polygenic score difference is more indicative of directional selection because it is dependent on the average direction of selection across many genetic variants. Hence, the ratio between the latter and the mean Fst or AFD is a measure of directional selection net of the effects of drift. Accordingly, there was a much stronger reduction in PGS difference than Fst in a set of low significance GWAS SNPs (p>0.95) , which putatively contain mostly noise and very little selection signal. They had lower (2x) Fst, but much lower (7x) PGS difference, resulting in a 4.5-fold reduction in PGS difference/AFD ratio (table 3).
Another way to represent the coefficient of directional selection is to compute the correlation between the frequencies of (GWAS significant) alleles with positive effect among a pair of populations; one can compare them to non-significant (almost random) SNPs, or a set of random SNPs matched for MAF. The advantage of using correlation of allele frequency across population pairs is that it is intuitive and, unlike Fst, sensitive to the direction of selection on each genetic variant as well as its strength. Divergent selection will make allele frequencies shift in different directions among populations. Indeed, the GWAS significant SNPs for height and EA had much weaker correlation of allele frequencies (r0 .6) compared to non-significant and random SNPs (r>0.8).
Allele frequencies computed by group are traditionally used for tests of selection.
This allows researchers to identify the genetic variants that have the strongest selection signals. However, another way to represent allele frequency differences between groups is with individual PGS: this gives a better idea of how much variation is partitioned between individuals and between groups. An in-depth analysis of this topic is not within the scope of this paper, but it can be seen in Fig. 15 that for polygenic scores of traits under strong polygenic selection, there is much more dispersion between continental groups than between individuals. That is, there is little overlap in the individual polygenic scores across continental groups (Africans and Europeans), as they deviate by more than 1.5 SDs. On the other hand, populations within the same continent have relatively small differences. This suggests that genetic differences in cognitive abilities and life history are due to factors (cultural, climatic, or archaic introgression) that are fairly homogeneous within (sub-)continents. However, selection pressures within super populations do exist, as implied by the existence of specific groups with significantly higher scores compared to their reference super populations (e.g. Ashkenazi Jews, Finns) and the persistence of the effect of latitude after taking into account the super population variable.
Finally, the next generation of GWAS studies are increasingly looking at trans-ethnic samples to improve polygenic predictions in different populations. As GWAS on different populations are available, future studies should investigate whether the SNPs with lower LD decay found using European-based GWAS have higher predictive accuracy in non-European populations.