Improving the regional Y-STR haplotype resolution utilizing haplogroup-determining Y-SNPs and the application of machine learning in Y-SNP haplogroup prediction in a forensic Y-STR database: A pilot study on male Chinese Yunnan Zhaoyang Han population.
Forensic Sci Int Genet. 2022 03; 57:102659.

Abstract

Improving the resolution of the current widely used Y-chromosomal short tandem repeat (Y-STR) dataset is of great importance for forensic investigators, and the current approach is limited, except for the addition of more Y-STR loci. In this research, a regional Y-DNA database was investigated to improve the Y-STR haplotype resolution utilizing a Y-SNP Pedigree Tagging System that includes 24 Y-chromosomal single nucleotide polymorphism (Y-SNP) loci. This pilot study was conducted in the Chinese Yunnan Zhaoyang Han population, and 3473 unrelated male individuals were enrolled. Based on data on the male haplogroups under different panels, the matched or near-matching (NM) Y-STR haplotype pairs from different haplogroups indicated the critical roles of haplogroups in improving the regional Y-STR haplotype resolution. A classic median-joining network analysis was performed using Y-STR or Y-STR/Y-SNP data to reconstruct population substructures, which revealed the ability of Y-SNPs to correct misclassifications from Y-STRs. Additionally, population substructures were reconstructed using multiple unsupervised or supervised dimensionality reduction methods, which indicated the potential of Y-STR haplotypes in predicting Y-SNP haplogroups. Haplogroup prediction models were built based on nine publicly accessible machine-learning (ML) approaches. The results showed that the best prediction accuracy score could reach 99.71% for major haplogroups and 98.54% for detailed haplogroups. Potential influences on prediction accuracy were assessed by adjusting the Y-STR locus numbers, selecting Y-STR loci with various mutabilities, and performing data processing. ML-based predictors generally presented a better prediction accuracy than two available predictors (Nevgen and EA-YPredictor). 
Three tree models were developed based on the Yfiler Plus panel with unprocessed input data, which showed their strong generalization ability in classifying various Chinese Han subgroups (validation dataset). In conclusion, this study revealed the significance and application prospects of Y-SNP haplogroups in improving regional Y-STR databases. Y-SNP haplogroups can be used to discriminate NM Y-STR haplotype pairs, and it is important for forensic Y-STR databases to develop haplogroup prediction tools to improve the accuracy of biogeographic ancestry inferences.
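
The abstract's point that matched or near-matching (NM) Y-STR haplotype pairs can belong to different Y-SNP haplogroups can be illustrated with a minimal sketch. This is not the authors' pipeline. The sample IDs, allele values, and haplogroup assignments are invented toy data, the six loci are only a small illustrative subset, and the threshold `max_diff=1` (haplotypes differing at no more than one locus) is an assumed definition of "near-matching", since the abstract does not state one.

```python
from itertools import combinations

# Hypothetical toy records: (sample_id, repeat counts per locus, Y-SNP haplogroup).
# Allele values and haplogroup assignments are invented for illustration only.
LOCI = ("DYS19", "DYS389I", "DYS390", "DYS391", "DYS392", "DYS393")

SAMPLES = [
    ("S1", (15, 12, 23, 10, 13, 12), "O2"),
    ("S2", (15, 12, 23, 10, 13, 12), "O2"),
    ("S3", (15, 12, 24, 10, 13, 12), "C2"),
    ("S4", (14, 13, 25, 11, 14, 13), "N1"),
]


def locus_differences(h1, h2):
    """Count the loci at which two haplotypes carry different alleles."""
    return sum(a != b for a, b in zip(h1, h2))


def discordant_pairs(samples, max_diff=1):
    """Return matched or near-matching pairs assigned to different haplogroups.

    max_diff is an assumed threshold: 0 keeps only identical haplotypes,
    1 also keeps pairs that differ at a single locus.
    """
    hits = []
    for (id1, h1, g1), (id2, h2, g2) in combinations(samples, 2):
        diff = locus_differences(h1, h2)
        if diff <= max_diff and g1 != g2:
            hits.append((id1, id2, diff, g1, g2))
    return hits


if __name__ == "__main__":
    # Each printed tuple is a pair that Y-STRs alone could not separate
    # but that the haplogroup label distinguishes.
    for pair in discordant_pairs(SAMPLES):
        print(pair)
```

In this toy data, S3 differs from S1 and S2 at a single locus yet carries a different haplogroup label, which is the kind of pair a Y-SNP layer would help separate in a Y-STR database.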

Authors

Yin, Caiyong: Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai 200438, China; Human Phenome Institute, Fudan University, Shanghai 200438, China.
He, Ziwei: Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai 200438, China; Human Phenome Institute, Fudan University, Shanghai 200438, China.
Wang, Yi: Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai 200438, China; Human Phenome Institute, Fudan University, Shanghai 200438, China.
He, Xi: Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai 200438, China; Human Phenome Institute, Fudan University, Shanghai 200438, China.
Zhang, Xiao: Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai 200438, China; Human Phenome Institute, Fudan University, Shanghai 200438, China.
Xia, Mingying: Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai 200438, China; Human Phenome Institute, Fudan University, Shanghai 200438, China.
Zhai, Dian: Criminal Investigation Department of Yunnan Province, Kunming 650021, Yunnan, China.
Chang, Kaichuang: Public Security Bureau of Zhaotong City, Zhaotong 657000, Yunnan, China.
Chen, Xueyun: Criminal Investigation Department of Yunnan Province, Kunming 650021, Yunnan, China.
Chen, Xingneng: Public Security Bureau of Zhaotong City, Zhaotong 657000, Yunnan, China.
Chen, Feng: Department of Forensic Medicine, Nanjing Medical University, Nanjing 211166, Jiangsu, China.
Jin, Li: Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai 200438, China; Human Phenome Institute, Fudan University, Shanghai 200438, China. Electronic address: lijin@fudan.edu.cn.
Li, Shilin: State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai 200438, China; Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai 200032, China. Electronic address: lishilin@fudan.edu.cn.

Pub Type(s)

Journal Article
Research Support, Non-U.S. Gov't

Language

eng

PubMed ID

35007855

Citation

Yin, Caiyong, et al. "Improving the Regional Y-STR Haplotype Resolution Utilizing Haplogroup-determining Y-SNPs and the Application of Machine Learning in Y-SNP Haplogroup Prediction in a Forensic Y-STR Database: a Pilot Study On Male Chinese Yunnan Zhaoyang Han Population." Forensic Science International. Genetics, vol. 57, 2022, p. 102659.
Yin C, He Z, Wang Y, et al. Improving the regional Y-STR haplotype resolution utilizing haplogroup-determining Y-SNPs and the application of machine learning in Y-SNP haplogroup prediction in a forensic Y-STR database: A pilot study on male Chinese Yunnan Zhaoyang Han population. Forensic Sci Int Genet. 2022;57:102659.
Yin, C., He, Z., Wang, Y., He, X., Zhang, X., Xia, M., Zhai, D., Chang, K., Chen, X., Chen, X., Chen, F., Jin, L., & Li, S. (2022). Improving the regional Y-STR haplotype resolution utilizing haplogroup-determining Y-SNPs and the application of machine learning in Y-SNP haplogroup prediction in a forensic Y-STR database: A pilot study on male Chinese Yunnan Zhaoyang Han population. Forensic Science International. Genetics, 57, 102659. https://doi.org/10.1016/j.fsigen.2021.102659
Yin C, et al. Improving the Regional Y-STR Haplotype Resolution Utilizing Haplogroup-determining Y-SNPs and the Application of Machine Learning in Y-SNP Haplogroup Prediction in a Forensic Y-STR Database: a Pilot Study On Male Chinese Yunnan Zhaoyang Han Population. Forensic Sci Int Genet. 2022;57:102659. PubMed PMID: 35007855.
TY - JOUR
T1 - Improving the regional Y-STR haplotype resolution utilizing haplogroup-determining Y-SNPs and the application of machine learning in Y-SNP haplogroup prediction in a forensic Y-STR database: A pilot study on male Chinese Yunnan Zhaoyang Han population.
AU - Yin,Caiyong,
AU - He,Ziwei,
AU - Wang,Yi,
AU - He,Xi,
AU - Zhang,Xiao,
AU - Xia,Mingying,
AU - Zhai,Dian,
AU - Chang,Kaichuang,
AU - Chen,Xueyun,
AU - Chen,Xingneng,
AU - Chen,Feng,
AU - Jin,Li,
AU - Li,Shilin,
Y1 - 2021/12/29/
PY - 2021/04/14/received
PY - 2021/12/14/revised
PY - 2021/12/27/accepted
PY - 2022/1/11/pubmed
PY - 2022/4/5/medline
PY - 2022/1/10/entrez
KW - Database development
KW - Machine learning
KW - Y-SNP haplogroup
KW - Y-STR haplotype resolution
SP - 102659
EP - 102659
JF - Forensic science international. Genetics
JO - Forensic Sci Int Genet
VL - 57
N2 - Improving the resolution of the current widely used Y-chromosomal short tandem repeat (Y-STR) dataset is of great importance for forensic investigators, and the current approach is limited, except for the addition of more Y-STR loci. In this research, a regional Y-DNA database was investigated to improve the Y-STR haplotype resolution utilizing a Y-SNP Pedigree Tagging System that includes 24 Y-chromosomal single nucleotide polymorphism (Y-SNP) loci. This pilot study was conducted in the Chinese Yunnan Zhaoyang Han population, and 3473 unrelated male individuals were enrolled. Based on data on the male haplogroups under different panels, the matched or near-matching (NM) Y-STR haplotype pairs from different haplogroups indicated the critical roles of haplogroups in improving the regional Y-STR haplotype resolution. A classic median-joining network analysis was performed using Y-STR or Y-STR/Y-SNP data to reconstruct population substructures, which revealed the ability of Y-SNPs to correct misclassifications from Y-STRs. Additionally, population substructures were reconstructed using multiple unsupervised or supervised dimensionality reduction methods, which indicated the potential of Y-STR haplotypes in predicting Y-SNP haplogroups. Haplogroup prediction models were built based on nine publicly accessible machine-learning (ML) approaches. The results showed that the best prediction accuracy score could reach 99.71% for major haplogroups and 98.54% for detailed haplogroups. Potential influences on prediction accuracy were assessed by adjusting the Y-STR locus numbers, selecting Y-STR loci with various mutabilities, and performing data processing. ML-based predictors generally presented a better prediction accuracy than two available predictors (Nevgen and EA-YPredictor). Three tree models were developed based on the Yfiler Plus panel with unprocessed input data, which showed their strong generalization ability in classifying various Chinese Han subgroups (validation dataset). In conclusion, this study revealed the significance and application prospects of Y-SNP haplogroups in improving regional Y-STR databases. Y-SNP haplogroups can be used to discriminate NM Y-STR haplotype pairs, and it is important for forensic Y-STR databases to develop haplogroup prediction tools to improve the accuracy of biogeographic ancestry inferences.
SN - 1878-0326
UR - https://www.unboundmedicine.com/medline/citation/35007855/Improving_the_regional_Y_STR_haplotype_resolution_utilizing_haplogroup_determining_Y_SNPs_and_the_application_of_machine_learning_in_Y_SNP_haplogroup_prediction_in_a_forensic_Y_STR_database:_A_pilot_study_on_male_Chinese_Yunnan_Zhaoyang_Han_population_
DB - PRIME
DP - Unbound Medicine
ER -