Version 1
: Received: 8 December 2020 / Approved: 9 December 2020 / Online: 9 December 2020 (09:53:18 CET)
Version 2
: Received: 28 January 2021 / Approved: 28 January 2021 / Online: 28 January 2021 (11:31:23 CET)
{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,8,29]],"date-time":"2022-08-29T11:50:33Z","timestamp":1661773833381},"reference-count":48,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2021,4,5]],"date-time":"2021-04-05T00:00:00Z","timestamp":1617580800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003173","name":"Crafoordska Stiftelsen","doi-asserted-by":"publisher","award":["2020"]},{"DOI":"10.13039\/501100001862","name":"Svenska Forskningsr\u00e5det Formas","doi-asserted-by":"publisher","award":["2020-03485"]},{"name":"Erik Philip-S\u00f6rensen Foundation","award":["G2020-011"]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1949939"]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Genes"],"abstract":"In the last 15 years or so, soft selective sweep mechanisms have been catapulted from a curiosity of little evolutionary importance to a ubiquitous mechanism claimed to explain most adaptive evolution and, in some cases, most evolution. This transformation was aided by a series of articles by Daniel Schrider and Andrew Kern. Within this series, a paper entitled \u201cSoft sweeps are the dominant mode of adaptation in the human genome\u201d (Schrider and Kern, Mol. Biol. Evolut. 2017, 34(8), 1863\u20131877) attracted a great deal of attention, in particular in conjunction with another paper (Kern and Hahn, Mol. Biol. Evolut. 2018, 35(6), 1366\u20131371), for purporting to discredit the Neutral Theory of Molecular Evolution (Kimura 1968). Here, we address an alleged novelty in Schrider and Kern\u2019s paper, i.e., the claim that their study involved an artificial intelligence technique called supervised machine learning (SML). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known empirically to be true. Curiously, Schrider and Kern did not possess a training dataset of genomic segments known a priori to have evolved either neutrally or through soft or hard selective sweeps. Thus, their claim of using SML is thoroughly and utterly misleading. In the absence of legitimate training datasets, Schrider and Kern used: (1) simulations that employ many manipulatable variables and (2) a system of data cherry-picking rivaling the worst excesses in the literature. These two factors, in addition to the lack of negative controls and the irreproducibility of their results due to incomplete methodological detail, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., S\/HIC) should be taken with a huge shovel of salt.","DOI":"10.3390\/genes12040527","type":"journal-article","created":{"date-parts":[[2021,4,5]],"date-time":"2021-04-05T15:48:29Z","timestamp":1617637709000},"page":"527","source":"Crossref","is-referenced-by-count":2,"title":["On the Unfounded Enthusiasm for Soft Selective Sweeps III: The Supervised Machine Learning Algorithm That Isn\u2019t"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"http:\/\/orcid.org\/0000-0003-4795-1084","authenticated-orcid":false,"given":"Eran","family":"Elhaik","sequence":"first","affiliation":[]},{"given":"Dan","family":"Graur","sequence":"additional","affiliation":[]}],"member":"1968","published-online":{"date-parts":[[2021,4,5]]},"reference":[{"key":"ref1","doi-asserted-by":"publisher","DOI":"10.1038\/ncomms6281"},{"key":"ref2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pgen.1007859"},{"key":"ref3","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pgen.1005928"},{"key":"ref4","doi-asserted-by":"publisher","DOI":"10.1093\/molbev\/msx154"},{"key":"ref5","doi-asserted-by":"publisher","DOI":"10.1534\/g3.118.200262"},{"key":"ref6","doi-asserted-by":"publisher","DOI":"10.1016\/j.tig.2017.12.005"},{"key":"ref7","doi-asserted-by":"publisher","DOI":"10.1093\/molbev\/msy092"},{"key":"ref8","doi-asserted-by":"publisher","DOI":"10.1038\/217624a0"},{"key":"ref9","doi-asserted-by":"publisher","DOI":"10.1126\/science.aaa8415"},{"key":"ref10","doi-asserted-by":"publisher","DOI":"10.1038\/nrg3920"},{"key":"ref11","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bth343"},{"key":"ref12","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2012.02.005"},{"key":"ref13","doi-asserted-by":"publisher","DOI":"10.3389\/fgene.2018.00297"},{"key":"ref14","doi-asserted-by":"publisher","DOI":"10.3390\/genes11090985"},{"key":"ref15","doi-asserted-by":"publisher","DOI":"10.2174\/0929867324666170623092503"},{"key":"ref16","doi-asserted-by":"publisher","DOI":"10.1186\/s13059-017-1280-5"},{"key":"ref17","doi-asserted-by":"publisher","DOI":"10.1101\/2020.04.21.053629"},{"key":"ref18","doi-asserted-by":"publisher","DOI":"10.1093\/molbev\/msy107"},{"key":"ref19","doi-asserted-by":"publisher","DOI":"10.1093\/molbev\/msaa259"},{"key":"ref20","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1093\/genetics\/157.2.875","article-title":"Haldane\u2019s sieve and adaptation from the standing genetic variation","volume":"157","author":"Orr","year":"2001","journal-title":"Genetics"},{"key":"ref21","doi-asserted-by":"publisher","DOI":"10.1534\/genetics.104.036947"},{"key":"ref22","doi-asserted-by":"publisher","DOI":"10.1101\/gr.094052.109"},{"key":"ref23","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","year":"2015","journal-title":"Nature"},{"key":"ref24","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-018-05257-7"},{"key":"ref25","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkt1052"},{"key":"ref26","unstructured":"Why Scientific Studies are so Often Wrong: The Streetlight Effect https:\/\/www.discovermagazine.com\/the-sciences\/why-scientific-studies-are-so-often-wrong-the-streetlight-effect"},{"key":"ref27","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btw556"},{"key":"ref28","doi-asserted-by":"publisher","DOI":"10.1038\/nature10231"},{"key":"ref29","series-title":"Population Biology of Plant Pathogens: Genetics, Ecology, and Evolution","first-page":"59","article-title":"Chapter 4: Mutation and Random Genetic Drift","author":"Milgroom","year":"2015"},{"key":"ref30","doi-asserted-by":"publisher","DOI":"10.1038\/nature11396"},{"key":"ref31","doi-asserted-by":"publisher","DOI":"10.1038\/nrg2526"},{"key":"ref32","doi-asserted-by":"publisher","DOI":"10.1534\/genetics.166.3.1375"},{"key":"ref33","doi-asserted-by":"publisher","DOI":"10.1101\/gr.6023607"},{"key":"ref34","doi-asserted-by":"publisher","DOI":"10.1101\/gr.119636.110"},{"key":"ref35","doi-asserted-by":"publisher","DOI":"10.1086\/505436"},{"key":"ref36","doi-asserted-by":"publisher","DOI":"10.1146\/annurev.genom.9.081307.164420"},{"key":"ref37","doi-asserted-by":"publisher","DOI":"10.1086\/377138"},{"key":"ref38","doi-asserted-by":"publisher","DOI":"10.1038\/533452a"},{"key":"ref39","doi-asserted-by":"publisher","DOI":"10.1534\/genetics.118.301502"},{"key":"ref40","doi-asserted-by":"publisher","DOI":"10.1016\/j.cub.2009.11.055"},{"key":"ref41","doi-asserted-by":"publisher","DOI":"10.1038\/nature11247"},{"key":"ref42","doi-asserted-by":"publisher","DOI":"10.1093\/gbe\/evt028"},{"key":"ref43","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gku1075"},{"key":"ref44","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gku1179"},{"key":"ref45","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gky930"},{"key":"ref46","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gky311"},{"key":"ref47","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1720798115"},{"key":"ref48","doi-asserted-by":"publisher","DOI":"10.1002\/1521-1878(200101)23:1<104::AID-BIES1013>3.0.CO;2-2"}],"container-title":["Genes"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-4425\/12\/4\/527\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,5,1]],"date-time":"2021-05-01T04:20:02Z","timestamp":1619842802000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-4425\/12\/4\/527"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,5]]},"references-count":48,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2021,4]]}},"alternative-id":["genes12040527"],"URL":"http:\/\/dx.doi.org\/10.3390\/genes12040527","relation":{"has-preprint":[{"id-type":"doi","id":"10.20944\/preprints202012.0214.v2","asserted-by":"object"},{"id-type":"doi","id":"10.20944\/preprints202012.0214.v1","asserted-by":"object"}]},"ISSN":["2073-4425"],"issn-type":[{"value":"2073-4425","type":"electronic"}],"subject":["Genetics (clinical)","Genetics"],"published":{"date-parts":[[2021,4,5]]}}}
4,5]]},"references-count":48,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2021,4]]}},"alternative-id":["genes12040527"],"URL":"http:\/\/dx.doi.org\/10.3390\/genes12040527","relation":{"has-preprint":[{"id-type":"doi","id":"10.20944\/preprints202012.0214.v2","asserted-by":"object"},{"id-type":"doi","id":"10.20944\/preprints202012.0214.v1","asserted-by":"object"}]},"ISSN":["2073-4425"],"issn-type":[{"value":"2073-4425","type":"electronic"}],"subject":["Genetics (clinical)","Genetics"],"published":{"date-parts":[[2021,4,5]]}}}
Abstract
Supervised machine learning (SML) is a powerful method for predicting a small number of well-defined output groups (e.g., potential buyers of a certain product) by taking as input a large number of known, well-defined measurements (e.g., past purchases, income, ethnicity, gender, credit record, age, favorite color, favorite chewing gum). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known to be true. SML has had enormous success in the world of commerce, and this success has prompted a few scientists to employ it in the study of molecular and genome evolution. Here, we list the properties of SML that make it an unsuitable tool for evolutionary studies. In particular, we argue that SML cannot be used in an exploratory evolutionary context for the simple reason that training datasets known a priori to be true do not exist. As a case study, we use an SML study that concluded that most of the human genome evolves by positive selection through soft selective sweeps (Schrider and Kern 2017). We show that, in the absence of legitimate training datasets, Schrider and Kern (2017) used (1) simulations that employ many manipulatable variables and (2) a system of cherry-picking data that would put to shame most modern evangelical exegeses of the Bible. These two factors, in addition to the lack of methodological detail and the absence of both negative controls and corrections for multiple comparisons, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., S/HIC) should be taken with a huge shovel of salt.
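To make the training-dataset requirement concrete, the following is a minimal sketch of a supervised classifier. It uses a nearest-centroid rule, chosen here only for brevity; it is not the method used by Schrider and Kern (2017). The toy customer data and the names `fit_centroids` and `predict` are hypothetical and exist only for illustration.

```python
from statistics import mean


def fit_centroids(features, labels):
    """Train a nearest-centroid classifier: one mean feature vector per label.

    The labels are taken as given; nothing here checks whether they are true.
    """
    groups = {}
    for x, y in zip(features, labels):
        groups.setdefault(y, []).append(x)
    return {
        label: tuple(mean(col) for col in zip(*rows))
        for label, rows in groups.items()
    }


def predict(centroids, x):
    """Assign x to the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical toy training set: (past purchases, income in thousands),
# where each customer's outcome is assumed to be already known.
X_train = [(12, 85), (9, 70), (15, 95), (1, 30), (2, 25), (0, 40)]
y_train = ["buyer", "buyer", "buyer", "non-buyer", "non-buyer", "non-buyer"]

model = fit_centroids(X_train, y_train)
print(predict(model, (10, 80)))  # → buyer
print(predict(model, (1, 35)))   # → non-buyer
```

Relabeling the training data (for example, swapping "buyer" and "non-buyer" in `y_train`) flips every prediction. The classifier has no independent way to tell which labels are correct, so its output can be no more reliable than the training labels it is given.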
Keywords
machine learning; evolution; discoal; SML
Subject
LIFE SCIENCES, Biochemistry
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.