Submitted: 06 March 2026
Posted: 09 March 2026
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Dataset Preparation
2.2. A Mathematical Approach to Compute FNR for Low Coverage Samples
2.2.1. Hausdorff Distance
2.2.2. Discrete Fréchet Distance
2.2.3. A Combination of Both Methods
2.3. FNR and Heterozygosity as a Function of Coverage
2.4. Sum-of-Least-Squares Assessment of Goodness-of-Fit
3. Results
3.1. PSMC-FAC Enables Accurate FNR-Based Correction Across Species and Coverages
3.2. Appropriate FNR-Based Correction Depends on Recent Demographic History
3.3. FNR Corrections Are Robust Across Diverse Demographic Histories
4. Discussion
4.1. FNR Correction in Low- and Mid-Depth Genomes: Reference Genome Effect
4.2. Effects of Biases Introduced by PSMC Assumptions on Optimal FNR Calculation
4.3. Polynomial Relationship Between Coverage and Optimal FNR
4.4. Empirical Applications and Future Implications
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| PSMC | Pairwise Sequentially Markovian Coalescent |
| FNR | False-Negative Rate |
| PSMC-FAC | PSMC False-Negative Rate Automatized Correction |
| LD | Linkage Disequilibrium |
| WGS | Whole-Genome Sequencing |
| Ne | Effective Population Size |
| TMRCA | Time to the Most Recent Common Ancestor |
| ARG | Ancestral Recombination Graph |
| HMM | Hidden Markov Model |
| SMC | Sequentially Markovian Coalescent |
| MSMC | Multiple Sequentially Markovian Coalescent |
| RAD | Restriction Site-Associated DNA |
| BAM | Binary Alignment/Map format |
| CRAM | Compressed Reference-oriented Alignment Map format |
| VCF | Variant Call Format |
| PSMCFA | PSMC FASTA-like input format |
| PCR | Polymerase Chain Reaction |
| SSE | Sum of Squared Errors |
| CHB | Han Chinese in Beijing, China (1000 Genomes Project population) |
| YRI | Yoruba in Ibadan, Nigeria (1000 Genomes Project population) |
| TSI | Toscani in Italia (1000 Genomes Project population) |
| 1000GP | 1000 Genomes Project |
| ARS-USDA | Agricultural Research Service, United States Department of Agriculture |
| GRCh38 | Genome Reference Consortium Human Build 38 |
| Hg38 | Human Genome version 38 |
| CanFam3.1 | Dog Reference Genome Assembly Version 3.1 |
| BosTau9 | Cattle Reference Genome Assembly Version 9 |
| kya | Thousand Years Ago |
| Mya | Million Years Ago |
| DNA | Deoxyribonucleic Acid |
| R² | Coefficient of Determination |
Appendix A: Computational Workflow for PSMC Processing and FNR Estimation Using PSMC-FAC
A.1: Preparation of PSMC Input Files
A.2: Usage of PSMC-FAC
A.3: Polynomial Regression and Visualization of Coverage-FNR Relationships
A.4: Plotting Other Low-Coverage Genomes According to PSMC-FAC-Assisted FNR Correction
Appendix B





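| Common name | Scientific name | Population | Sample number | Coverage | Heterozygosity | Source |
|---|---|---|---|---|---|---|
| Cow | Bos taurus | Angus breed | 19879801 | 19.18X | 2.38 × 10⁻³ | (A) |
| Cow | Bos taurus | Brangus breed | 19999911 | 39.47X | 3.98 × 10⁻³ | (A) |
| Cow | Bos taurus | Beefmaster breed | 19999927 | 31.11X | 3.92 × 10⁻³ | (A) |
| Grey wolf | Canis lupus | C. l. italicus (Italian wolf) | SAMEA116045429 | 23.67X | 1.68 × 10⁻³ | (B) |
| Grey wolf | Canis lupus | C. l. italicus (Italian wolf) | SAMEA116045431 | 27.19X | 1.48 × 10⁻³ | (B) |
| Grey wolf | Canis lupus | C. l. italicus (Italian wolf) | SAMEA116045435 | 26.08X | 1.44 × 10⁻³ | (B) |
| Grey wolf | Canis lupus | C. l. signatus (Iberian wolf) | SAMN43221691 | 20.42X | 1.87 × 10⁻³ | (C) |
| Grey wolf | Canis lupus | C. l. signatus (Iberian wolf) | SAMN43221682 | 19.03X | 1.81 × 10⁻³ | (C) |
| Grey wolf | Canis lupus | C. l. signatus (Iberian wolf) | SAMN04851099 | 18.08X | 1.88 × 10⁻³ | (C) |
| Human | H. sapiens | Han from China (CHB) | NA18543 | 29.59X | 1 × 10⁻³ | (D) |
| Human | H. sapiens | Han from China (CHB) | NA18544 | 29.53X | 9.82 × 10⁻⁴ | (D) |
| Human | H. sapiens | Han from China (CHB) | NA18559 | 33.34X | 9.89 × 10⁻⁴ | (D) |
| Human | H. sapiens | Yoruba from Nigeria (YRI) | NA18867 | 30.18X | 1.32 × 10⁻³ | (D) |
| Human | H. sapiens | Yoruba from Nigeria (YRI) | NA18924 | 31.27X | 1.32 × 10⁻³ | (D) |
| Human | H. sapiens | Yoruba from Nigeria (YRI) | NA19096 | 31.35X | 1.32 × 10⁻³ | (D) |
| Human | H. sapiens | Toscani from Italy (TSI) | NA20754 | 31.63X | 1.04 × 10⁻³ | (D) |
| Human | H. sapiens | Toscani from Italy (TSI) | NA20759 | 32.65X | 1.05 × 10⁻³ | (D) |
| Human | H. sapiens | Toscani from Italy (TSI) | NA20766 | 29.95X | 1.03 × 10⁻³ | (D) |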



Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).