Introduction of the Python script STRinNGS for analysis of STR regions in FASTQ or BAM files and expansion of the Danish STR sequence database to 11 STRs.
Forensic Sci Int Genet. 2016 Mar; 21:68-75.

Abstract

This work introduces the in-house developed Python application STRinNGS for analysis of STR sequence elements in BAM or FASTQ files. STRinNGS identifies sequence reads with STR loci by their flanking sequences, analyses the STR sequence and the flanking regions, and generates a report with the assigned SNP-STR alleles. The main output file from STRinNGS contains all sequences with read counts above 1% of the total number of reads per locus. STR sequences are automatically named according to the nomenclature used previously and according to the repeat unit definitions in STRBase (http://www.cstl.nist.gov/strbase/). The sequences are named with (1) the locus name, (2) the length of the repeat region divided by the length of the repeat unit, (3) the sequence(s) of the repeat unit(s) followed by the number of repeats and (4) variations in the flanking regions. Lower case letters in the main output file are used to flag sequences with previously unknown variations in the STRs. SNPs in the flanking regions are named by their "rs" numbers and the nucleotides in the SNP position. Data from 207 Danes sequenced with the Ion Torrent™ HID STR 10-plex, which amplified nine STRs (CSF1PO, D3S1358, D5S818, D7S820, D8S1179, D16S539, TH01, TPOX, vWA) and Amelogenin, were analysed with STRinNGS. Sequencing uncovered five common SNPs near four STRs and revealed 20 new alleles in the 207 Danes. Three short homopolymers in the D8S1179 flanking regions caused frequent sequencing errors. In 29 of 3726 allele calls (0.8%), sequences with homopolymer errors were falsely assigned as true alleles. An in-house developed script in R compensated for these errors by compiling sequence reads that had identical STR sequences and identical nucleotides in the five common SNPs. In the output file from the R script, all SNP-STR haplotype calls were correct.
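The flank-based read identification, the naming scheme and the 1% read-count filter described above can be illustrated with a minimal Python sketch. This is not the STRinNGS code: the locus name `DEMO1`, both flanking sequences, the exact name layout and all function names are hypothetical, and writing a partial repeat as `4.3` follows common STR nomenclature rather than anything stated in the abstract.

```python
from collections import Counter

# Hypothetical locus definition: the name, flanks and repeat unit are made up
# for illustration and are not taken from STRinNGS or STRBase.
LOCUS = {
    "name": "DEMO1",
    "left_flank": "CCGGTACC",
    "right_flank": "TTGGCCAA",
    "unit": "AATG",
}


def extract_region(read, locus):
    """Return the bases between the two flanks, or None if a flank is missing."""
    start = read.find(locus["left_flank"])
    if start == -1:
        return None
    start += len(locus["left_flank"])
    end = read.find(locus["right_flank"], start)
    return None if end == -1 else read[start:end]


def repeat_structure(region, unit):
    """Collapse runs of `unit` into UNIT[n]; bases outside a run stay in lower case."""
    parts, i, k = [], 0, len(unit)
    while i < len(region):
        n = 0
        while region.startswith(unit, i + n * k):
            n += 1
        if n:
            parts.append(f"{unit}[{n}]")
            i += n * k
        elif parts and parts[-1].islower():
            parts[-1] += region[i].lower()  # extend the current non-repeat stretch
            i += 1
        else:
            parts.append(region[i].lower())  # start a new non-repeat stretch
            i += 1
    return "".join(parts)


def allele_name(region, locus):
    """Build 'LOCUS[size]structure'; the layout is an assumption, not the paper's format."""
    full, rest = divmod(len(region), len(locus["unit"]))
    size = f"{full}.{rest}" if rest else str(full)
    return f'{locus["name"]}[{size}]{repeat_structure(region, locus["unit"])}'


def call_sequences(reads, locus, min_fraction=0.01):
    """Count identical repeat regions and keep those above the read-count fraction."""
    regions = (extract_region(read, locus) for read in reads)
    counts = Counter(region for region in regions if region is not None)
    total = sum(counts.values())
    return {
        allele_name(seq, locus): n
        for seq, n in counts.items()
        if n > min_fraction * total
    }


def make_read(n_repeats):
    """Synthetic read: left flank, n copies of the repeat unit, right flank."""
    return LOCUS["left_flank"] + LOCUS["unit"] * n_repeats + LOCUS["right_flank"]


# 60 + 39 reads for two alleles, one erroneous read, one read without flanks.
reads = [make_read(7)] * 60 + [make_read(9)] * 39 + [make_read(8)] + ["ACGTACGT"]
calls = call_sequences(reads, LOCUS)
print(calls)
```

In this toy data the single read with eight repeats makes up exactly 1% of the 100 reads at the locus, so it is not above the threshold and is dropped, while the read lacking the flanks is never counted.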
The 207 samples and six additional samples were sequenced for D3S1358, D12S391, and D21S11 using the 454 GS Junior platform in this and a previous work. Overall, next generation sequencing (NGS) of the 11 STRs lowered the mean match probability 386 times and increased the typical paternity indexes (i.e. the geometric mean) for trios and duos 47 and 23 times, respectively, compared to the traditional PCR-CE typing of the same population.
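The two summary figures in the abstract can be checked with a few lines of Python. The miscall count and the number of allele calls come from the abstract; reading 3726 as 207 individuals × 9 STRs × 2 alleles is an assumption that happens to match the total, and the paternity index values below are invented purely to show how a geometric mean is computed.

```python
import math


def geometric_mean(values):
    """Geometric mean via logarithms, which avoids overflow on large products."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Figures quoted in the abstract.
miscalled, allele_calls = 29, 3726
error_rate = miscalled / allele_calls
print(f"{error_rate:.1%}")  # 0.8%

# Assumed breakdown of the allele calls: individuals x STRs x alleles per STR.
print(207 * 9 * 2 == allele_calls)  # True

# Hypothetical paternity indexes, for illustration only.
example_indexes = [10.0, 1000.0]
print(geometric_mean(example_indexes))
```

The geometric mean is the natural "typical" value for paternity indexes because they are likelihood ratios that combine by multiplication across loci and vary over orders of magnitude.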

Authors

Friis SL, Buchard A, Rockenbauer E, Børsting C, Morling N

Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark (all authors). Electronic address: claus.boersting@sund.ku.dk.

Pub Type(s)

Journal Article

Language

eng

PubMed ID

26722765

Citation

Friis, Susanne L., et al. "Introduction of the Python Script STRinNGS for Analysis of STR Regions in FASTQ or BAM Files and Expansion of the Danish STR Sequence Database to 11 STRs." Forensic Science International. Genetics, vol. 21, 2016, pp. 68-75.
Friis SL, Buchard A, Rockenbauer E, et al. Introduction of the Python script STRinNGS for analysis of STR regions in FASTQ or BAM files and expansion of the Danish STR sequence database to 11 STRs. Forensic Sci Int Genet. 2016;21:68-75.
Friis, S. L., Buchard, A., Rockenbauer, E., Børsting, C., & Morling, N. (2016). Introduction of the Python script STRinNGS for analysis of STR regions in FASTQ or BAM files and expansion of the Danish STR sequence database to 11 STRs. Forensic Science International. Genetics, 21, 68-75. https://doi.org/10.1016/j.fsigen.2015.12.006
Friis SL, et al. Introduction of the Python Script STRinNGS for Analysis of STR Regions in FASTQ or BAM Files and Expansion of the Danish STR Sequence Database to 11 STRs. Forensic Sci Int Genet. 2016;21:68-75. PubMed PMID: 26722765.
TY - JOUR
T1 - Introduction of the Python script STRinNGS for analysis of STR regions in FASTQ or BAM files and expansion of the Danish STR sequence database to 11 STRs.
AU - Friis,Susanne L,
AU - Buchard,Anders,
AU - Rockenbauer,Eszter,
AU - Børsting,Claus,
AU - Morling,Niels,
Y1 - 2015/12/12/
PY - 2015/07/08/received
PY - 2015/11/05/revised
PY - 2015/12/10/accepted
PY - 2016/1/2/entrez
PY - 2016/1/2/pubmed
PY - 2016/12/15/medline
KW - Forensic genetics
KW - Massive parallel sequencing
KW - Next generation sequencing
KW - SNP–STR sequence analysis
KW - STRinNGS Python application
SP - 68
EP - 75
JF - Forensic science international. Genetics
JO - Forensic Sci Int Genet
VL - 21
N2 - This work introduces the in-house developed Python application STRinNGS for analysis of STR sequence elements in BAM or FASTQ files. STRinNGS identifies sequence reads with STR loci by their flanking sequences, it analyses the STR sequence and the flanking regions, and generates a report with the assigned SNP-STR alleles. The main output file from STRinNGS contains all sequences with read counts above 1% of the total number of reads per locus. STR sequences are automatically named according to the nomenclature used previously and according to the repeat unit definitions in STRBase (http://www.cstl.nist.gov/strbase/). The sequences are named with (1) the locus name, (2) the length of the repeat region divided by the length of the repeat unit, (3) the sequence(s) of the repeat unit(s) followed by the number of repeats and (4) variations in the flanking regions. Lower case letters in the main output file are used to flag sequences with previously unknown variations in the STRs. SNPs in the flanking regions are named by their "rs" numbers and the nucleotides in the SNP position. Data from 207 Danes sequenced with the Ion Torrent™ HID STR 10-plex that amplified nine STRs (CSF1PO, D3S1358, D5S818, D7S820, D8S1179, D16S539, TH01, TPOX, vWA), and Amelogenin was analysed with STRinNGS. Sequencing uncovered five common SNPs near four STRs and revealed 20 new alleles in the 207 Danes. Three short homopolymers in the D8S1179 flanking regions caused frequent sequencing errors. In 29 of 3726 allele calls (0.8%), sequences with homopolymer errors were falsely assigned as true alleles. An in-house developed script in R compensated for these errors by compiling sequence reads that had identical STR sequences and identical nucleotides in the five common SNPs. In the output file from the R script, all SNP-STR haplotype calls were correct. The 207 samples and six additional samples were sequenced for D3S1358, D12S391, and D21S11 using the 454 GS Junior platform in this and a previous work. Overall, next generation sequencing (NGS) of the 11 STRs lowered the mean match probability 386 times and increased the typical paternity indexes (i.e. the geometric mean) for trios and duos 47 and 23 times, respectively, compared to the traditional PCR-CE typing of the same population.
SN - 1878-0326
UR - https://www.unboundmedicine.com/medline/citation/26722765/Introduction_of_the_Python_script_STRinNGS_for_analysis_of_STR_regions_in_FASTQ_or_BAM_files_and_expansion_of_the_Danish_STR_sequence_database_to_11_STRs_
L2 - https://linkinghub.elsevier.com/retrieve/pii/S1872-4973(15)30102-2
DB - PRIME
DP - Unbound Medicine
ER -