Sequence-based U.S. population data for 27 autosomal STR loci. Forensic Sci Int Genet. 2018 Nov; 37:106-115.
This manuscript reports short tandem repeat (STR) sequence-based allele frequencies for 1036 samples across 27 autosomal STR loci: D1S1656, TPOX, D2S441, D2S1338, D3S1358, D4S2408, FGA, D5S818, CSF1PO, D6S1043, D7S820, D8S1179, D9S1122, D10S1248, TH01, vWA, D12S391, D13S317, Penta E, D16S539, D17S1301, D18S51, D19S433, D20S482, D21S11, Penta D, and D22S1045. Sequence data were analyzed by two bioinformatic pipelines, and all samples were evaluated at all loci for concordance with alleles derived from capillary electrophoresis (CE)-based analysis. Each reported sequence includes high-quality flanking sequence and is formatted according to the most recent guidance of the International Society for Forensic Genetics. In addition, GenBank accession numbers are reported for each sequence, and associated records are available in the STRSeq BioProject (https://www.ncbi.nlm.nih.gov/bioproject/380127). The D3S1358 locus demonstrates the greatest average increase in heterozygosity across populations (approximately 10 percentage points). Loci demonstrating an average increase in heterozygosity of between 5 and 10 percentage points include (in descending order) D9S1122, D13S317, D8S1179, D21S11, D5S818, D12S391, and D2S441. The remaining 19 loci each demonstrate an increase in average heterozygosity of less than 5 percentage points. Discussion includes the utility of these data in understanding traditional CE results, such as informing stutter models and understanding migration challenges, and considerations for population sampling strategies in light of the marked increase in rare alleles for several of the sequence-based STR loci. This NIST 1036 data set is expected to support the implementation of STR sequencing in forensic casework by providing high-confidence sequence-based allele frequencies for the same sample set that already forms the basis for population statistics in many U.S. forensic laboratories.
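The heterozygosity gains reported above arise because sequencing can split a single length-based (CE) allele into several distinct sequence variants. The following minimal Python sketch illustrates that effect under stated assumptions: it uses expected heterozygosity (1 minus the sum of squared allele frequencies), which may differ from the statistic used in the study, and the allele observations are hypothetical toy values, not data from the NIST 1036 set. The function and variable names are illustrative only.

```python
from collections import Counter


def expected_heterozygosity(allele_counts):
    """Return expected heterozygosity, H = 1 - sum(p_i ** 2), from allele counts."""
    total = sum(allele_counts.values())
    return 1.0 - sum((n / total) ** 2 for n in allele_counts.values())


# Hypothetical toy observations at one locus (NOT study data).
# Each entry is (length-based allele, sequence variant label).
observed = (
    [("15", "a")] * 30
    + [("15", "b")] * 20  # same length as ("15", "a"), different sequence
    + [("16", "a")] * 30
    + [("17", "a")] * 20
)

# Length-based view: variants of equal length collapse into one allele.
length_counts = Counter(length for length, _ in observed)
# Sequence-based view: each (length, variant) pair is a distinct allele.
sequence_counts = Counter(observed)

h_length = expected_heterozygosity(length_counts)
h_sequence = expected_heterozygosity(sequence_counts)
gain_pp = (h_sequence - h_length) * 100  # gain in percentage points

print(f"length-based H   = {h_length:.2f}")
print(f"sequence-based H = {h_sequence:.2f}")
print(f"gain             = {gain_pp:.1f} percentage points")
```

In this toy case the length-based allele 15 separates into two sequence variants, so the sequence-based heterozygosity is higher than the length-based value. Loci whose length-based alleles rarely conceal sequence variation would show little or no gain under the same calculation.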