Submitted: 24 April 2025
Posted: 24 April 2025
Abstract
Keywords:
1. Introduction
2. Methods
2.1. Data Source and Sample Selection
2.2. VCF Processing and Genotype Extraction
2.3. Principal Component Analysis (PCA)
2.4. Site Frequency Spectrum
2.5. ADMIXTURE Analysis
2.6. FST Calculation
2.7. Multidimensional Scaling (MDS)
2.8. Phylogenetic Tree Construction
2.9. Software and Environment
3. Results
3.1. Principal Component Analysis (PCA)
3.2. Site Frequency Spectrum (SFS)
3.3. Population Structure via ADMIXTURE
3.4. Pairwise Genetic Differentiation (FST Matrix)
3.5. Multidimensional Scaling (MDS)
3.6. Phylogenetic Tree
4. Discussion
4.1. Continental and Regional Structure
4.2. Site Frequency Spectrum and Rare Variants
4.3. Ancestry Proportions and Admixture
4.4. Sex-Specific Analyses
4.5. Implications and Limitations
5. Conclusions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
| | IBS | TSI | CEU | YRI |
|---|---|---|---|---|
| IBS | 0 | 0.0016 | 0.003 | 0.1394 |
| TSI | 0 | 0 | 0.0029 | 0.1372 |
| CEU | 0 | 0 | 0 | 0.1412 |
| YRI | 0 | 0 | 0 | 0 |
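
The matrix above is filled only above the diagonal. The following is a minimal sketch of how such a table could be mirrored into a full symmetric matrix and ranked by pair. It assumes the values are the pairwise FST estimates named in Section 3.4 and that the zeros below the diagonal are placeholders rather than measured values. The helper names `symmetrize` and `pairs_sorted` are hypothetical and are not taken from the authors' pipeline.

```python
# Population labels and upper-triangle values copied from the table above.
pops = ["IBS", "TSI", "CEU", "YRI"]
upper = [
    [0, 0.0016, 0.003, 0.1394],
    [0, 0, 0.0029, 0.1372],
    [0, 0, 0, 0.1412],
    [0, 0, 0, 0],
]


def symmetrize(matrix):
    """Copy each value above the diagonal into its mirror position below it."""
    n = len(matrix)
    full = [row[:] for row in matrix]  # leave the input untouched
    for i in range(n):
        for j in range(i + 1, n):
            full[j][i] = matrix[i][j]
    return full


def pairs_sorted(matrix, labels):
    """Return (pop_a, pop_b, value) tuples, least differentiated pair first."""
    pairs = [
        (labels[i], labels[j], matrix[i][j])
        for i in range(len(labels))
        for j in range(i + 1, len(labels))
    ]
    return sorted(pairs, key=lambda t: t[2])


fst = symmetrize(upper)
ranked = pairs_sorted(fst, pops)

for pop_a, pop_b, value in ranked:
    print(f"{pop_a}-{pop_b}: {value:.4f}")
```

Read this way, the three European pairs (IBS, TSI, CEU) all sit at or below 0.003, while every pair involving YRI is above 0.13.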
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).