Comparing Measures of Community Lineage Diversity across North American Forests

Lineage diversity can refer to the number of genetic lineages within species or to the number of deeper evolutionary lineages, such as genera or families, within a community. Community lineage diversity (CLD) is of interest to ecologists, evolutionary biologists, biogeographers, and those setting conservation priorities. Despite its relevance, it is not clear how to best quantify CLD. With North American tree communities as an example, we test which taxonomic and phylogenetic metrics best measure CLD. We find that phylogenetic metrics outperform taxonomic metrics. Faith’s phylogenetic diversity performs well, but is skewed towards the number of lineages in recent time. The best metric is newly derived here, and termed time integrated lineage diversity (TILD). Mapping the lineage diversity of tree communities across the contiguous United States, we find a spatial pattern differing from that of species richness in key areas. The Pacific Northwest, Great Lakes Region, state of Maine, and south-eastern piedmont and coastal plain forests all emerge as areas high in lineage diversity, but relatively lower in species richness. We urge the consideration of lineage diversity, as well as species richness, when setting conservation priorities.


Introduction
The evolutionary lineage is a fundamental concept in biology, denoting a group of organisms connected by ancestor-descendant relationships [1]. Evolutionary lineages are hierarchically structured; multiple younger evolutionary lineages can be nested within an overarching older lineage, or clade. Thus, multiple genetically diverged lineages can exist within a single taxonomic species, and multiple species can belong to older evolutionary lineages, such as genera, families or orders. Knowing the number of lineages in different ecological communities and biogeographic regions gives insights into evolutionary process, biogeographic history, and conservation priorities. For example, a community or region that houses many lineages, and therefore more evolutionary history, may be a greater priority for conservation than one that houses few. However, the conservation value of lineage diversity has yet to be fully and persuasively communicated [2][3][4]. Providing clear and accurate quantification of lineage diversity may assist its integration into conservation practice.
In its most basic form, counting the number of lineages in communities could consist of counting the number of species. However, the term lineage diversity is generally applied when the units are not species, but a shallower or deeper evolutionary level, i.e. within or above the species taxonomic the contiguous United States to test various metrics by which community lineage diversity (hereafter, CLD) might be quantified using taxonomic and phylogenetic data.
Taxonomy is a hierarchical system for organising biological diversity. As such, it provides an apparently straightforward means of quantifying CLD at different evolutionary depths, for example by tallying the number of genera, families or orders in communities. However, Linnean taxonomic ranks are not 'natural' in the sense that they do not directly correlate to any precise evolutionary age. Some clades of a given taxonomic rank may actually be younger than clades of a putatively lower taxonomic rank. For example, the genus Pinus (Pinaceae) may be as old as 100 million years [15], which is older than most angiosperm families [16]. If one were to compare a community of four Pinus species with a community of four angiosperm species belonging to different genera in the same family, and CLD were estimated as the number of genera in each community, the angiosperm community would appear to have 4x higher CLD. However, all four species in the community of Pinus may have diverged from each other prior to the origin of the most recent common ancestor of the four species in the angiosperm community (similar to mock communities B and C in Fig. 1), which could mean that the community of Pinus has greater conservation value in terms of encompassing greater total evolutionary history.
directly analogous to constructing a lineage through time plot for a given evolutionary clade [17], and indeed, studies have proposed constructing lineage through time plots for individual communities [18]. However, it is not clear at which evolutionary age, or phylogenetic depth, one should be counting lineages. A community that has more lineages at one, deeper time slice might have fewer lineages at another, more recent time slice (compare communities B vs. D in Fig. 1), which could be driven by variation in diversification histories, community assembly, or numerous other processes. It would be ideal to have a single, synthetic metric for CLD that integrates over the evolutionary history of the clade being studied.
Community D has 4x as many lineages at 5 Ma. For this reason, researchers have suggested that the amount of PD a community contains above or below that expected given its SR is a better measure of CLD [12,13]. However, if we were to follow that approach, then Community C might be considered to have more CLD than Community D (its ratio of PD:SR is twice that of Community D), even though at all phylogenetic depths Community D has the same or more lineages than Community C. Clearly, more work is needed to determine which metrics derived from phylogenies may provide the best measures of CLD that integrate over evolutionary timescales.
The overarching goal of the present manuscript is to empirically compare different community lineage diversity metrics, in order to determine the best synthetic measure of CLD over evolutionary timescales. As our empirical example, we focus on tree communities in the contiguous United States. These communities provide an ideal system for such an empirical study, as over 150,000 forest inventory plots have been sampled in a standardised way by the U.S. National Forest Service and existing time-calibrated phylogenies encompass nearly all species present in the plots. We use this unparalleled community phylogenetic dataset to 1) test the ability of taxonomic and phylogenetic metrics to measure CLD over multiple evolutionary ages, or phylogenetic depths, and 2) map spatial patterns of tree alpha CLD in the forests of the contiguous United States, which can serve as a means to highlight areas of high lineage diversity for conservation attention.

Data Sources
We accessed compositional data from 177,549 plots sampled across the contiguous United States by the Forest Inventory and Analysis (FIA) Program of the U.S. Forest Service [20], via the BIEN package [21] for the R Statistical Environment (R Development Core  S1 for species added and associated literature reference), with the added taxon being placed halfway along the branch leading to its sister species or clade in the phylogeny. The branch length leading to the added taxon was set to a value such that the tree remained ultrametric.

Taxonomic Measures
In the absence of phylogenetic data, the number of supraspecific lineages in communities can be calculated as the number of taxa of a higher taxonomic rank. Classification systems are consistent across angiosperms and gymnosperms up to the order level, and we therefore tabulated the following taxonomic measures of lineage diversity for communities: number of genera, number of families and number of orders.
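As an illustration of the tallies described above, the following Python sketch counts genera, families and orders for one community. The lookup table and the function name are hypothetical and stand in for whatever taxonomic backbone is actually used; only the four species shown are classified.

```python
# Hypothetical taxonomy lookup: species -> (genus, family, order).
TAXONOMY = {
    "Pinus strobus": ("Pinus", "Pinaceae", "Pinales"),
    "Pinus resinosa": ("Pinus", "Pinaceae", "Pinales"),
    "Quercus rubra": ("Quercus", "Fagaceae", "Fagales"),
    "Acer rubrum": ("Acer", "Sapindaceae", "Sapindales"),
}


def taxonomic_cld(species):
    """Count distinct species, genera, families and orders in a community."""
    present = set(species)  # presence/absence only, duplicates ignored
    ranks = [TAXONOMY[sp] for sp in present]
    return {
        "species": len(present),
        "genera": len({r[0] for r in ranks}),
        "families": len({r[1] for r in ranks}),
        "orders": len({r[2] for r in ranks}),
    }
```

A community of two pines and one oak would thus score three species but only two genera, two families and two orders.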

Phylogenetic Measures
Since the advent of molecular phylogenetics, diverse metrics have been developed and implemented to quantify the lineage, or evolutionary, diversity of communities from phylogenies (e.g., [23][24][25]). We focus here on metrics that aim to quantify the 'richness dimension of phylogenetic diversity' [26], as our interest is in 'how much' lineage diversity is in communities, not how diverged lineages are from each other in communities (e.g., as quantified by mean pairwise phylogenetic distance) or how evenly lineages are represented (e.g., as quantified by phylogenetic species evenness; [27]). In addition, conservation prioritisation is generally based on which species are present, not their relative abundance (which could reflect disturbance histories or other idiosyncratic processes), and we therefore focus on presence/absence metrics. This also increases the general utility of our results, as abundance information is not available for many datasets.
We started by calculating the most basic metric of CLD, phylogenetic diversity, or PD [19], which is the sum of all branch lengths in each community, including the branch that goes to the root of all seed plants. We also include its estimate standardised for variation in species richness. This is accomplished by calculating the first two moments of the null expectation for PD, given the number of species in the community, and using them to calculate a standardised effect size. The moments of the null distribution can be calculated by randomly shuffling the tips of the phylogeny many times, but there is an analytical expectation for these moments, which is the approach we used [28]. We refer to this metric as the standardised phylogenetic diversity, or sPD.
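The two quantities above can be sketched on a toy tree as follows. This is a minimal Python illustration, not the analysis code: the tree, its tip names and both function names are hypothetical, and the null moments are obtained by enumerating every equally rich species set, which is only feasible for tiny trees and is a stand-in for the analytical expectation of [28].

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical ultrametric toy tree: node -> (parent, branch length to parent).
# The root sits 10 time units before the present.
TREE = {
    "n1": ("root", 4.0),
    "n2": ("n1", 4.0),
    "A": ("n2", 2.0),
    "B": ("n2", 2.0),
    "C": ("n1", 6.0),
    "D": ("root", 10.0),
}
TIPS = ["A", "B", "C", "D"]


def pd(species):
    """Sum of branch lengths linking the species to the root of the full tree."""
    branches = set()
    for node in species:
        while node in TREE:  # stop once the root is reached
            branches.add(node)
            node = TREE[node][0]
    return sum(TREE[b][1] for b in branches)


def spd(species):
    """Standardised effect size of PD against all equally rich species sets."""
    null = [pd(c) for c in combinations(TIPS, len(set(species)))]
    # Undefined (division by zero) when every null set has the same PD.
    return (pd(species) - mean(null)) / pstdev(null)
```

Because each path is followed to the root of the full tree, the root branch of the community subtree is included, as in the definition above. On this toy tree the sister pair A and B has PD = 12 and a negative sPD, whereas the distantly related pair A and D has PD = 20 and a positive sPD.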
We also calculated two additional proposed measures of the richness dimension of phylogenetic diversity, the phylogenetic species richness, or PSR [27] and the sum of evolutionary distinctiveness, or sumED [29]. PSR can essentially be considered a measure of species richness that takes into account the phylogenetic relatedness of taxa in a community. If the community is composed entirely of closely related species, this will produce a lower value of PSR than if the community were composed of distantly related taxa. In practice, this measure is obtained by multiplying the mean pairwise phylogenetic distance between species in a community by its species richness (and dividing by two, so that it represents distance to tips from the most recent common ancestor for each pair of species). For sumED, we first calculated the evolutionary distinctiveness of each species in our dataset, based on the entire phylogeny representing all species, following the fair proportions approach of Isaac et al. (2007). This is essentially a measure of how phylogenetically isolated each species is, relative to the given phylogeny. We then summed these evolutionary distinctiveness values for the species in each community, following [29].
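The verbal recipe for PSR given above (mean pairwise distance times species richness, divided by two) can be written out as a short Python sketch. It follows that description literally and is not claimed to reproduce any particular package implementation; the distance table and function names are hypothetical, and returning zero for a single-species community is an assumption made here.

```python
from itertools import combinations

# Hypothetical pairwise phylogenetic distances (time units) among four species.
DIST = {
    frozenset("AB"): 4.0, frozenset("AC"): 12.0, frozenset("AD"): 20.0,
    frozenset("BC"): 12.0, frozenset("BD"): 20.0, frozenset("CD"): 20.0,
}


def mpd(species):
    """Mean pairwise phylogenetic distance among the species present."""
    pairs = list(combinations(sorted(set(species)), 2))
    return sum(DIST[frozenset(p)] for p in pairs) / len(pairs)


def psr(species):
    """PSR as described in the text: MPD x species richness / 2."""
    richness = len(set(species))
    if richness < 2:
        return 0.0  # assumption: no pairwise distance exists for one species
    return mpd(species) * richness / 2
```

With these distances, the closely related pair A and B gives a PSR of 4, while the distantly related pair A and D gives 20, although both communities have two species.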
As our overarching goal in this study was to quantify CLD over the full evolutionary time of the clade of study (here, seed plants), we developed an additional metric that may better capture this, which we term time integrated lineage diversity, or TILD. If one constructs a lineage through time (LTT) plot for each community (sensu [18]), one can simply integrate the area under this curve as a measure of the total lineage diversity of the community over time. In fact, this integral is mathematically identical to the phylogenetic diversity of the community, when including the root branch. However, an LTT plot built from extant species, as LTT plots for extant communities are, necessarily increases monotonically towards the present and, under a constant diversification rate, this increase is exponential. The integral therefore is necessarily weighted towards the number of lineages in recent evolutionary time compared to the number of lineages in deeper evolutionary time. Therefore, in order to downweight the number of recent lineages when calculating TILD, we log-transformed the y-axis (i.e. the number of lineages at each point in time) prior to taking the integral.
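The relationship between the LTT plot, PD and TILD can be made concrete with a small Python sketch. It assumes a bifurcating, ultrametric community tree summarised by the ages of its internal nodes, a root branch reaching back to `root_age`, and the natural logarithm; the base of the logarithm and all function names are assumptions made for illustration.

```python
import math


def ltt_segments(node_ages, root_age):
    """Step function of the LTT plot as (start, end, n_lineages) going back in time."""
    segments = []
    n = len(node_ages) + 1  # lineages at the present = number of tips
    prev = 0.0
    for age in sorted(node_ages):
        if age > prev:
            segments.append((prev, age, n))
        prev = age
        n -= 1  # two lineages merge at each internal node
    if root_age > prev:
        segments.append((prev, root_age, n))  # single root lineage
    return segments


def pd_from_ltt(node_ages, root_age):
    """Area under the raw LTT curve, equal to PD including the root branch."""
    return sum((end - start) * n
               for start, end, n in ltt_segments(node_ages, root_age))


def tild(node_ages, root_age):
    """Area under the LTT curve after log-transforming the lineage counts."""
    return sum((end - start) * math.log(n)
               for start, end, n in ltt_segments(node_ages, root_age))
```

For example, a four-species community with splits at 10, 50 and 120 Ma and a root at 350 Ma has an LTT area of 530 Myr, which equals its PD with the root branch, whereas the log-transformed area weights the 120 Myr of deep history relatively more heavily. The long single-lineage root segment contributes nothing to TILD because log(1) = 0.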

Statistical Analyses
There is wide variation in the number of individual trees sampled per plot (66 ± 65 individuals, mean ± s.d.; range 1-706 individuals), and all of the lineage diversity metrics that we calculated, except sPD, were positively correlated with the number of individuals sampled (Pearson's r = 0.58-0.67). In order to obtain comparable estimates of CLD, we rarefied communities to the same number of individuals. While rarefaction can be problematic because it excludes communities from analysis below the abundance threshold used and introduces heteroscedasticity in the diversity estimate that is related to the number of individuals sampled [31], we do not know of any estimates of CLD or phylogenetic diversity that are robust to variation in sample size (in terms of number of individuals that is robust to variation in sample size, it measures the divergence dimension of phylogenetic diversity, not the richness dimension [26], and is therefore not of interest to us here.
In order to determine the number of individuals to select in rarefactions, we first selected the subset of communities that have at least 100 individuals (39,708 communities). We estimated the species richness of these communities when rarefied to 100 individuals (i.e. expected number of species per 100 stems). We then rarefied these communities to smaller numbers of individuals, and observed how the richness estimate for a smaller number of stems correlated with the richness estimate per 100 stems. Once communities were rarefied to fewer than 25 stems, the correlation (Pearson's r) between the two richness estimates dropped below 0.95. We therefore chose 25 individuals as the size for rarefied communities. We repeated rarefactions 100 times, and calculated the average of each CLD metric over these 100 rarefactions.
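The rarefaction procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' script: the function names and the fixed random seed are hypothetical, and any CLD metric that accepts a set of species could be passed in place of the simple richness count.

```python
import random


def rarefied_mean(individuals, metric, n=25, reps=100, seed=0):
    """Average `metric` over `reps` random subsamples of `n` individuals.

    `individuals` is a list with one species name per sampled tree.
    Returns None when the plot has fewer than `n` individuals.
    """
    if len(individuals) < n:
        return None  # plot falls below the abundance threshold
    rng = random.Random(seed)  # assumption: fixed seed for reproducibility
    values = []
    for _ in range(reps):
        subsample = rng.sample(individuals, n)  # draw without replacement
        values.append(metric(set(subsample)))  # presence/absence metric
    return sum(values) / reps


def richness(species):
    """Number of species present in a (sub)sample."""
    return len(species)
```

Called as `rarefied_mean(plot, richness)`, this returns the expected number of species per 25 stems for a plot, or `None` for plots that would be excluded from the analysis.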
In order to assess the performance of each CLD metric over the evolutionary history of seed plants, we calculated the Spearman's rank correlation (rho) between a given CLD metric and the number of lineages at different phylogenetic depths (in intervals of 5 Myrs between the present and the root of the seed plant phylogeny at 350 Ma). We used Spearman's rank correlation because these relationships are not necessarily linear, and because our goal is to evaluate if communities would be ranked similarly, e.g. for conservation prioritisation, if counting the number of lineages at a particular time slice vs. using a given CLD metric. In order to obtain an overall measure of the performance of a lineage diversity metric, we then obtained the mean of the Spearman's rho values at all phylogenetic depths. All analyses were carried out in the R Statistical Environment (R Development Core Team) using functions in the ape [33], picante [34], vegan [35] and PhyloMeasures [28] packages. The analysis script is available in Appendix A.
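The performance measure described above can be sketched in plain Python. Communities are represented here by the ages of their internal nodes (a hypothetical simplification), Spearman's rho is computed as Pearson's r on tie-averaged ranks, and time slices at which every community has the same number of lineages are skipped, which is an assumption of this sketch.

```python
from statistics import mean


def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho as Pearson's r on ranks; None if either input is constant."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return sxy / (sxx * syy) ** 0.5


def lineages_at(node_ages, depth):
    """Lineages crossing a time slice: one plus the splits older than `depth`."""
    return 1 + sum(age > depth for age in node_ages)


def mean_rho(metric_values, community_node_ages, depths):
    """Mean Spearman's rho between a CLD metric and lineage counts over depths."""
    rhos = []
    for d in depths:
        counts = [lineages_at(ages, d) for ages in community_node_ages]
        rho = spearman(metric_values, counts)
        if rho is not None:  # assumption: skip depths with no variation
            rhos.append(rho)
    return mean(rhos)
```

A call such as `mean_rho(values, node_ages, range(5, 350, 5))` would then summarise how consistently a metric ranks communities across time slices from 5 to 345 Ma.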

Results
The majority of communities sampled at least 25 individuals (124,566 of 177,549 communities or 70%), and were therefore included in analyses. We estimated CLD metrics for each community, including constructing lineage through time (LTT) plots for each community to calculate the time integrated lineage diversity (TILD). We show a sample of those LTT plots in Figure 3. The most species-poor communities show little accumulation of lineages over time (Fig. 3a), as might be expected given that they only have 1-2 species (12% of communities with at least 25 individuals have only 1 or 2 species). The most species-rich communities have anywhere from 9 to 17 species per 25 stems (although only 8 of those communities have more than 15 and are not represented in the random sample in Fig. 3c). Considering these species-rich communities, we see a wide range of variation in when these communities accumulated lineage diversity, although few included more than four lineages before ∼120 Ma. At 150 Ma, there were twice as many gymnosperm as angiosperm lineages (six vs. three; Fig. 2). By 100 Ma, the picture had changed substantially, due almost entirely to the radiation of the Eudicots (Fig. 2). This radiation is evident in the LTT plots for the most species-rich communities (the big jump in number of lineages at ∼120 Ma; Fig. 3c), and to a lesser degree in the LTT plots for communities of average species richness (Fig. 3b). The second-largest step-change in the number of lineages present in communities occurs between 5 Ma and the present, and again the step-change is more pronounced in the most species-rich communities.
Except for standardised phylogenetic diversity (sPD), the different CLD metrics we calculated are correlated with each other (Pearson's r = 0.61-0.96) and with the number of species in communities (Pearson's r = 0.59-0.76; Fig. S1). The correlation of species richness with the number of lineages declines with increasing phylogenetic depth, dropping to very low values prior to the radiation of the Eudicots (Fig. 4). The taxonomic measures of CLD all show a similar pattern; i.e. none show a strong correlation with number of lineages prior to ∼120 Ma. Because of this, none of the taxonomic measures of lineage diversity show a mean correlation over evolutionary time greater than 0.5.
The measures of lineage diversity derived from the molecular phylogeny are variable in their ability to consistently measure CLD over the full breadth of the evolutionary history of seed plants (Fig. 4). Neither sPD nor the sum of evolutionary distinctiveness (sumED) shows a high mean correlation, and these two metrics show contrasting patterns over phylogenetic depth. sPD is correlated well with the number of lineages deep in evolutionary time, while sumED shows a pattern more similar to taxonomic measures of lineage diversity. The best performing metric of lineage diversity is the newly-derived measure, time integrated lineage diversity (TILD), but phylogenetic diversity (PD) and phylogenetic species richness (PSR) also performed well. PD and PSR show stronger correlations with the number of lineages in recent evolutionary time, while TILD shows stronger correlations with the number of lineages in deeper evolutionary time.
pattern of sPD contrasts strongly with that for species richness, which is not unexpected given that the two measures are negatively correlated (Fig. S1). The remaining phylogenetically-derived CLD metrics show spatial patterns generally similar to that for species richness, but with key areas of divergence.
There is more variation in CLD metrics in the western United States than there is in species richness.
Communities in the Great Lakes region show higher CLD than communities immediately to their Meanwhile, most phylogenetic metrics of CLD show the region around the southern Appalachians to have higher CLD than regions to their north and west, whereas Fig. 5a shows them to have comparable species richness.

Discussion
Community lineage diversity (CLD) metrics derived from a phylogeny did a better job of capturing lineage diversity of tree communities over the full evolutionary depth of seed plants than did taxonomic metrics, such as the number of genera or families. The best performing metric was a new one, derived here, which we term time integrated lineage diversity (TILD). This metric, and all other phylogenetically-derived CLD metrics, give a different spatial pattern across the contiguous United States than that found for species richness. If prioritisation schemes were to be based solely on the number of tree species per community, the entire western half of the US would receive less conservation attention. This is because the most species-rich tree communities in the contiguous US are dominated by angiosperms, particularly Eudicots, and these are more diverse in the eastern US.
represent substantial reservoirs of evolutionary history, as reflected in CLD values comparable to the most lineage-diverse tree communities in the eastern US.

Taxonomic measures of lineage diversity
In many studies [12,36], species richness has been shown to be strongly correlated with phylogenetic diversity (PD), and thus it has been argued that species richness may be a suitable proxy for CLD [37]. Our study suggests that, at least for tree communities in the contiguous US, this is not the case. Higher-level taxonomic measures that we explored, specifically the numbers of genera, families and orders in communities, do not perform much better. As expected, as higher taxonomic ranks are used, strong correlations with number of lineages persist deeper into evolutionary time (compare the x-intercept of highest correlation for different taxonomic ranks in Fig. 4), but none of the taxonomic measures provide a high correlation with number of lineages prior to ∼120 Ma. This is perhaps unsurprising as the majority of lineages deeper in evolutionary time are gymnosperms, and all the gymnosperms in our dataset come from a single order, three families and 15 genera, while angiosperms dominate the variation in these taxonomic measures of CLD with 18 orders, 35 families and 68 genera. Thus, for clades with highly imbalanced phylogenies, as for seed plants, taxonomic measures of lineage diversity are not likely to provide an adequate, synthetic measure of CLD [38].

Phylogenetic measures of lineage diversity
The best performing metric in this study, in terms of maintaining a high correlation with number of lineages at all phylogenetic depths, was time integrated lineage diversity (TILD), a metric newly developed here. This metric represents the area beneath a lineage through time plot where the number of lineages per time slice has been log-transformed. TILD is mathematically related to PD, which is identical to the area beneath a raw (i.e., non-log-transformed) lineage through time plot. PD was originally conceived as a metric to aid conservation prioritisation [19], and it has always been properly interpreted as a measure of the total evolutionary diversity in communities, which is certainly worth quantifying. However, it is strongly skewed towards the number of lineages present in recent evolutionary time, downweighting older evolutionary divergences. We suggest that researchers may use PD to quantify CLD more recently in evolutionary time, and complementarily, TILD may be more suitable to obtain a measure of lineage diversity for older clades. The phylogenetic species richness (PSR; [27]) is strongly correlated with both TILD and PD (Fig. S1) and shows a similar spatial pattern to these two metrics (Fig. 5). However, as TILD and PD are both directly interpretable in terms of numbers of lineages, we advocate for their use as measures of lineage diversity.
For this dataset, the standardised phylogenetic diversity (sPD) correlates well with the number of lineages deep in evolutionary time, but not with numbers of lineages in recent evolutionary time. In fact, sPD is negatively correlated with numbers of lineages less than 70 million years old. We suggest that sPD may better serve as a metric of phylogenetic community structure, which is interesting in its own right [39], but that it should not be used as a measure of lineage diversity in communities. Our dataset appears to show a negative correlation between species richness and sPD, meaning that more species-rich communities are more phylogenetically clustered. This would argue against a role for competitive exclusion of closely related species in structuring tree communities in the contiguous US (cf. results of Dexter et al. 2017 for the Amazon). However, caution is needed in interpreting this result, as sPD may artefactually decline with increasing species richness [41]. Further, more focused studies of younger clades in single ecological regions have given the opposite result (e.g., [42]).
Like sPD, the sum of evolutionary distinctiveness in communities, sumED, showed a weaker mean correlation with number of lineages over the full evolutionary history of seed plants than TILD, PD or PSR. In contrast to sPD, it showed weaker correlations deeper in evolutionary time and stronger relationships in shallower evolutionary time (Fig. 4). In fact, sumED showed a very similar pattern to taxonomic metrics of lineage diversity (Fig. 4), and of the phylogenetically-derived metrics in this study, it shows the strongest correlation with taxonomic metrics (Fig. S1). As a conservation prioritisation metric, sumED has a clear intuitive value, since it represents the totality of phylogenetic diversity in a given community that is rare in the entire dataset, but it will be sensitive to phylogenetic taxon sampling in the overall dataset, even if all taxa in a given community are present in the phylogeny (see Isaac et al. 2007 for full explanation of how ED is derived for each species). Conversely, PD and TILD will be insensitive to the sampling level of species not present in a given community and may therefore have more general utility.

Tree diversity patterns across the contiguous United States
In this study, we have quantified various measures of CLD at the alpha level for tree communities across the contiguous United States. Consistent with previous assessments [43,44], the most evident spatial contrast in the species richness pattern is between the eastern and western United States. To a very coarse approximation, this reflects the dominance of gymnosperms in the western United States, the dominance of angiosperms in the eastern United States, and the fact that angiosperms are a much more diverse clade than gymnosperms (even when focusing only on trees). It is notable that the differences in species richness between communities of average species richness versus the most species-rich communities arise almost entirely from diversification that happened after 120 Ma, i.e. since the radiation of the Eudicots (Fig. 3). Eudicots represent the majority of species in our dataset (Fig. 2) and the large majority of species in our most species-rich communities.
Previous studies have identified the high plateau south of the Appalachian Mountains [43], and the Florida panhandle, Alabama/Georgia border region [44], as areas of maximal tree species richness. In contrast, we found the highest tree species richness in a region centred on Kentucky and Tennessee, which corresponds to the mixed mesophytic forest region [45]. The first part of the name reflects that there are no particularly dominant tree species in the region and most forest stands have a mix of dominant species. Braun (1950) recognised the exceptional tree species richness of this region and characterised it as "the association of the Deciduous Forest which occupies the area of optimum moisture and temperature conditions of North America" (p. 42). Indeed, moisture stress for plants is lower in this region of the US compared to regions southeast of the Appalachians or the entire western US, while temperatures do not reach as extreme lows as the northern parts of the contiguous US. Meanwhile, as the second part of the name, mesophytic, implies, the forests here are also found on more fertile soils compared to other forests in the US. Thus, the high alpha diversity of tree communities in this region may reflect an environment that is the most benign for the majority of tree species occurring in the contiguous United States. This is similar to the pattern found in another large biogeographic region, the Amazon, where the most species-rich tree communities are found in the western Amazon, which has more fertile soils and is subject to less moisture stress than the southern or eastern Amazon [12].
The spatial pattern of variation in tree species richness does not show a standard latitudinal gradient, which would anticipate highest species richness in the areas closest to the equator. In the western US, this may be attributable to severe moisture stress, as much of the south-western US is desert and lacks trees. Meanwhile, the south-eastern US is bordered by the Gulf of Mexico, and many species that could likely now occur in this region went extinct during Pleistocene climatic cycles [46]. Thus, biogeographic history is also playing a role in determining the composition of the species pool that is available to colonise local communities.
Nevada mountain range. Presumably the arid conditions on the eastern side of these mountains limit tree CLD. Though species richness minima to the east of the Sierra Nevada (as well as the Rockies) have previously been noted [43], this contrast across mountain ranges is most evident when considering CLD. Other regions of notable CLD in the western US include the Chiricahua mountains of southern Arizona and northern Idaho.
In the eastern United States as well, the spatial pattern of CLD contrasts with that of species richness, although there are regions that are high for both diversity measures (e.g. the southern Appalachians). The most north-eastern state in the contiguous US (Maine) as well as the area around the Great Lakes emerge as regions of high CLD, which is presumably due to the increasing prevalence of conifers in the far north and their deep evolutionary heritage. Meanwhile, the mixed mesophytic forest region that has the highest tree species richness values does not show the highest CLD values. Higher CLD is found to the south and east of the mixed mesophytic forest region, and in the southern Appalachians and parts of the south-eastern Piedmont and coastal plain regions. The south-eastern United States was highlighted as a region of high angiosperm tree PD in a previous study based on range maps [48], and our results show this is consistent when using inventory data, and when incorporating gymnosperms into the quantification of overall seed plant CLD. The average age of angiosperm families in the south-eastern US tree community is higher than anywhere else in the contiguous US [49] as this region, particularly the coastal plain, houses many tree species (e.g., Sabal species [Arecaceae], Persea borbonia [Lauraceae], Annona glabra [Annonaceae]) that belong to old, largely tropical families. This incursion of tropical lineages into the south-eastern US elevates CLD values in the south-eastern coastal plain.

Conclusions
Our study has shown that phylogenetically-derived metrics of community lineage diversity outperform taxonomically-derived metrics. Therefore, molecular phylogenies should play a key role in informing conservation prioritisation schemes. Thus, the continued effort to sequence all species in the tree of life receives clear support from our study. As for recommended metrics, the phylogenetic diversity of communities is a valid measure of CLD, and we do not suggest that the many previous studies to use the metric are invalid. Rather, we stress that it is biased towards the diversity of recently derived lineages, and we urge consideration of the newly developed metric TILD (time integrated lineage diversity), which may be more representative of CLD over the full evolutionary history of the clade of study.
We used an empirical dataset on tree communities of the contiguous United States to explore these different metrics of CLD. We found that the spatial patterns of CLD differ in important ways from the spatial patterns of species richness, for example by highlighting the high conservation value of temperate rainforests in the Pacific Northwest. However, it would be naïve to suggest that conservationists in the United States were unaware of the high conservation value of those forests. Indeed, the tree flora of the United States is likely well enough known, such that there is already good awareness of which areas have particularly high or low conservation value with respect to tree species composition and lineage diversity. Where these metrics may be particularly useful is in less well known floras, such as in many tropical biogeographic regions. There has been one study of variation in lineage diversity across ∼300 sites in the rain forests of the Amazon basin [12], but we know of no such similar study across the Congo Basin or southeast Asian rain forests, much less the more extensive tropical dry biomes.

Figure 1. Example phylogenies for four communities (A, B, C, D) with contrasting species richness (SR), phylogenetic diversity (PD) and phylogenetic community structure (LD70 = number of lineages 70 Ma; LD5 = number of lineages 5 Ma). Time units are arbitrary.
The advent of molecular phylogenetics has allowed researchers to move past taxonomic approaches to quantifying CLD. Using a temporally calibrated phylogeny, one can choose a certain lineage age, X millions of years (Myrs), and then readily estimate the number of lineages at X million years ago (Ma) in a community. Further, one could examine how the number of lineages varies at different time slices across a set of communities or geographic space (sensu Jønsson et al. 2011). This is

Figure 2. Phylogeny of all tree species present in the contiguous United States in the US FIA dataset, based on the phylogenies for gymnosperms and angiosperms in Ma et al. (2016).

Figure 3. Lineage through time (LTT) plots for a sample of 1000 communities from a) the most species-poor communities (below the 10% quantile in species richness per 25 stems), b) communities of average species richness (between the 45% and 55% quantiles) and c) the most species-rich communities (above the 90% quantile). The dark black lines give the mean LTT for these subsets, while the horizontal dashed grey lines show the minimum and maximum species richness per 25 stems for these subsets.
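The lineage counts underlying an LTT plot can be sketched as follows: for a chosen time slice, count the branches of the community's subtree that cross that time. This is a minimal sketch rather than the code used for our analyses. The toy tree, its names and its ages are hypothetical, and the function names `lineages_at` and `ltt_curve` are introduced only for illustration. Paths are traced to the root of the full tree, so a single stem lineage is counted at depths older than the community's most recent common ancestor.

```python
# Toy time-calibrated tree: node -> (parent, age), with age measured as time
# before present (arbitrary units) and tips at age 0. Hypothetical values.
TREE = {
    "root": (None, 100.0),
    "n1": ("root", 60.0),
    "n2": ("root", 10.0),
    "sp_a": ("n1", 0.0),
    "sp_b": ("n1", 0.0),
    "sp_c": ("n2", 0.0),
    "sp_d": ("n2", 0.0),
}

COMMUNITY = {"sp_a", "sp_b", "sp_c", "sp_d"}


def lineages_at(tree, community, t):
    """Count ancestral branches of the community that exist at time t ago.

    A branch exists at time t if its lower node is no older than t and its
    parent node is strictly older than t.
    """
    crossing = set()
    for tip in community:
        node = tip
        while tree[node][0] is not None:
            parent = tree[node][0]
            if tree[node][1] <= t < tree[parent][1]:
                crossing.add(node)  # a set, so shared branches count once
            node = parent
    return len(crossing)


def ltt_curve(tree, community, times):
    """Return the number of lineages at each time slice in `times`."""
    return [lineages_at(tree, community, t) for t in times]


# Lineages at the present (species richness), at 5 and at 70 time units ago.
print(ltt_curve(TREE, COMMUNITY, [0, 5, 70]))  # [4, 4, 2]
```

At t = 0 the count equals species richness, which is why each LTT curve starts at the community's richness and declines with increasing phylogenetic depth.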

Figure 4. Spearman's rank correlations between different global measures of community lineage diversity and the number of lineages at different phylogenetic depths. The mean value of Spearman's rho across all depths (excluding t = 0 and 350) is given above each plot. The phylogenetic depth at which the maximum correlation is found is marked with dashed lines going to the x- and y-axes.

The number of species per 25 trees in plots shows clear spatial patterns across the contiguous United States. Low values are generally observed west of the Mississippi River, while east of the Mississippi River, low values are observed in Florida and around the Great Lakes. The highest values are in the central part of the eastern United States, centred around Kentucky and Tennessee. The spatial

Figure 5. Spatial pattern of species richness and phylogenetically derived community lineage diversity metrics per 25 trees for individual plots across the contiguous United States (n = 124,566 plots), with plots coloured according to their value. The legends give the maximum and minimum values.

Author Contributions: K.G.D. and R.A.S. conceived the manuscript and led analyses. A.G. contributed to analyses. All authors contributed to writing.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 2 February 2019 | doi:10.20944/preprints201902.0018.v1

Faith (1992) developed a simple metric, phylogenetic diversity (PD), to quantify the evolutionary history present in communities, which is calculated by summing the length of all branches in a phylogeny that includes all taxa present in a community, and only those taxa. While this metric is related to the number and age of evolutionary lineages present in a community, and thus may serve as a proxy for CLD, Figure 1 demonstrates that inferences of CLD based on calculating PD may not always be straightforward. In this contrived scenario, it seems clear that Community A has less CLD than Community B and that Community C has less CLD than Community D. The calculations of PD, and even species richness, would support this visual observation. Further, it seems plausible that Community A has more lineage diversity than Community C, even though Community C has more species. However, do Communities B and D really have identical CLD even though they have such a discrepancy in species richness? Comparing Communities B and D is challenging because they have such different phylogenetic structures. Community B has 4x as many lineages at 70 Ma, while

The spatial pattern of alpha CLD in tree communities shows several evident contrasts with the spatial pattern of species richness. With respect to tree species richness, the western US shows almost uniformly low values, at least in comparison to the eastern US, but CLD metrics show much greater variation. One evident hotspot of lineage diversity is in the temperate rain forests of the Pacific Northwest, with high CLD values then extending down the western flank of the Sierra Nevada mountains. This temperate rain forest region includes the 'Miracle Mile' in the Klamath mountains of northern California, which holds 18 species of conifers [47], although not that many occur in any single FIA plot. Lineage diversity then plummets as one crosses either the Cascade mountain range or the Sierra