A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms.
BMC Genomics. 2013; 14 Suppl 1:S1.

Abstract

BACKGROUND

Genome-wide association studies (GWAS) have identified many common polymorphisms associated with complex traits. However, these common variants explain only a small fraction of the phenotypic variance, leaving a substantial portion of the heritability unexplained. As a result, the search for this "missing" heritability is drawing increasing attention, particularly through rare variant studies, which often require large sample sizes and thus extensive sequencing effort. Although next generation sequencing (NGS) technologies have made it possible to sequence a large number of reads economically and efficiently, sequencing the thousands of individuals generally required for association studies is still often cost prohibitive. A more efficient and cost-effective design pools the genetic material of multiple individuals and sequences the pools instead of the individuals. This pooled sequencing approach has improved the feasibility of association studies for rare variants but also poses a major challenge for data analysis, essentially because individual sample identity is lost and NGS sequencing errors can be hard to distinguish from low-frequency alleles.

RESULTS

This paper develops a unified approach for estimating minor allele frequency, calling SNPs, and conducting association studies from pooled sequencing data using an expectation maximization (EM) algorithm. The approach makes it possible to study how minor allele frequency, sequencing error rate, number of pools, number of individuals in each pool, and sequencing depth affect the accuracy of minor allele frequency estimates. We show that the naive method of estimating minor allele frequency as the fraction of observed minor alleles can be significantly biased, especially for rare variants. In contrast, our EM approach can give an unbiased estimate of the minor allele frequency under all scenarios studied in this paper. A SNP calling approach for pooled sequencing data based on the EM algorithm, EM-SNP, is then developed and compared with another recent SNP calling method, SNVer. We show that EM-SNP outperforms SNVer in terms of the fraction of called SNPs that are present in dbSNP, as well as the transition/transversion (Ti/Tv) ratio. Finally, the EM approach is used to study the association between variants and type 1 diabetes.
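
To make the contrast between the naive estimate and an EM estimate concrete, here is a minimal Python sketch. It rests on a simplified model that is an assumption of this note and not necessarily the authors' exact formulation: each pool contains n_chrom chromosomes, the number of them carrying the minor allele is binomial with the unknown frequency p, and each read reports the wrong allele with a known symmetric error rate err (0 < err < 0.5). The function names naive_maf and em_maf and all parameter names are hypothetical.

```python
import math


def naive_maf(minor_reads, depths):
    """Naive estimate: fraction of all reads that show the minor allele."""
    return sum(minor_reads) / sum(depths)


def em_maf(minor_reads, depths, n_chrom, err, iters=200, tol=1e-10):
    """EM estimate of minor allele frequency from pooled read counts.

    Assumed model (illustrative only):
      k_i ~ Binomial(n_chrom, p)   minor-allele chromosomes in pool i (latent)
      m_i ~ Binomial(d_i, q(k_i))  minor-allele reads among d_i reads
      q(k) = (k/n_chrom)*(1-err) + (1-k/n_chrom)*err
    """
    ks = range(n_chrom + 1)
    # Probability that a single read shows the minor allele, for each k.
    q = [(k / n_chrom) * (1 - err) + (1 - k / n_chrom) * err for k in ks]
    log_comb = [math.log(math.comb(n_chrom, k)) for k in ks]
    # Read log-likelihood per pool and per k; it does not depend on p.
    read_ll = [
        [m * math.log(q[k]) + (d - m) * math.log(1 - q[k]) for k in ks]
        for m, d in zip(minor_reads, depths)
    ]
    # Start from the naive estimate, kept strictly inside (0, 1).
    p = min(max(naive_maf(minor_reads, depths), 1e-6), 1 - 1e-6)
    for _ in range(iters):
        expected_minor = 0.0
        for ll in read_ll:
            # E-step: posterior over k for this pool, computed in log space.
            logw = [
                log_comb[k] + k * math.log(p) + (n_chrom - k) * math.log(1 - p) + ll[k]
                for k in ks
            ]
            top = max(logw)
            w = [math.exp(x - top) for x in logw]
            expected_minor += sum(k * wk for k, wk in enumerate(w)) / sum(w)
        # M-step: expected minor chromosomes divided by total chromosomes.
        new_p = expected_minor / (len(read_ll) * n_chrom)
        new_p = min(max(new_p, 1e-12), 1 - 1e-12)
        converged = abs(new_p - p) < tol
        p = new_p
        if converged:
            break
    return p
```

Under this assumed model, the expected value of the naive estimate is p + err(1 - 2p), so for a rare variant the error rate can dominate the estimate. For example, with ten pools of 40 chromosomes, depth 1000, and 10 minor-allele reads per pool at err = 0.01, naive_maf returns 0.01, while em_maf attributes those reads to sequencing error and drives its estimate toward zero.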

CONCLUSIONS

The EM-based approach for the analysis of pooled sequencing data can accurately estimate minor allele frequencies, call SNPs, and find associations between variants and complex traits. This approach is especially useful for studies involving rare variants.

Authors

Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA 90089-2910, USA.

Pub Type(s)

Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Language

eng

PubMed ID

23369070

Citation

Chen, Quan, and Fengzhu Sun. "A Unified Approach for Allele Frequency Estimation, SNP Detection and Association Studies Based On Pooled Sequencing Data Using EM Algorithms." BMC Genomics, vol. 14 Suppl 1, 2013, pp. S1.
Chen Q, Sun F. A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms. BMC Genomics. 2013;14 Suppl 1:S1.
Chen, Q., & Sun, F. (2013). A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms. BMC Genomics, 14 Suppl 1, S1. https://doi.org/10.1186/1471-2164-14-S1-S1
Chen Q, Sun F. A Unified Approach for Allele Frequency Estimation, SNP Detection and Association Studies Based On Pooled Sequencing Data Using EM Algorithms. BMC Genomics. 2013;14 Suppl 1:S1. PubMed PMID: 23369070.
TY  - JOUR
T1  - A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms.
AU  - Chen, Quan
AU  - Sun, Fengzhu
Y1  - 2013/01/21/
PY  - 2013/2/2/entrez
PY  - 2013/2/13/pubmed
PY  - 2013/8/13/medline
SP  - S1
EP  - S1
JF  - BMC genomics
JO  - BMC Genomics
VL  - 14 Suppl 1
SN  - 1471-2164
UR  - https://www.unboundmedicine.com/medline/citation/23369070/A_unified_approach_for_allele_frequency_estimation_SNP_detection_and_association_studies_based_on_pooled_sequencing_data_using_EM_algorithms_
L2  - https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-14-S1-S1
DB  - PRIME
DP  - Unbound Medicine
ER  -