pcadapt: an R package to perform genome scans for selection based on principal component analysis.

Abstract

The R package pcadapt performs genome scans to detect genes under selection based on population genomic data. It assumes that candidate markers are outliers with respect to how they are related to population structure. Because population structure is ascertained with principal component analysis, the package is fast and works with large-scale data. It can handle missing data and pooled sequencing data. In contrast to population-based approaches, the package handles admixed individuals and does not require grouping individuals into populations. Since its first release, pcadapt has evolved in terms of both statistical approach and software implementation. We present results obtained with robust Mahalanobis distance, which is a new statistic for genome scans available in the 2.0 and later versions of the package. When hierarchical population structure occurs, Mahalanobis distance is more powerful than the communality statistic that was implemented in the first version of the package. Using simulated data, we compare pcadapt to other computer programs for genome scans (BayeScan, hapflk, OutFLANK, sNMF). We find that the proportion of false discoveries is around a nominal false discovery rate set at 10%, with the exception of BayeScan, which generates 40% false discoveries. We also find that the power of BayeScan is severely impacted by the presence of admixed individuals, whereas pcadapt is not impacted. Last, we find that pcadapt and hapflk are the most powerful in scenarios of population divergence and range expansion. Because pcadapt handles next-generation sequencing data, it is a valuable tool for data analysis in molecular ecology.
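
The outlier statistic described in the abstract can be illustrated with a minimal Python sketch. It is not the pcadapt implementation: the package is written in R and uses a robust Mahalanobis distance, whereas this sketch uses the classical (non-robust) sample mean and covariance. It also fixes K = 2 principal components, so that the chi-square tail probability has the closed form exp(-x/2), and it starts from simulated per-marker z-scores rather than from real genotypes. The function name, the simulated data, and the 1e-4 threshold are all assumptions made for illustration.

```python
import math
import random


def mahalanobis_pvalues(z):
    """Classical Mahalanobis distance for 2-D z-scores (hypothetical helper).

    z is a list of (z1, z2) pairs, one per marker, assumed to measure how
    strongly the marker is related to each of K = 2 principal components.
    Returns (squared distances, p-values under a chi-square with 2 df).
    """
    n = len(z)
    m1 = sum(a for a, _ in z) / n
    m2 = sum(b for _, b in z) / n
    # Entries of the 2x2 sample covariance matrix (non-robust estimate).
    s11 = sum((a - m1) ** 2 for a, _ in z) / (n - 1)
    s22 = sum((b - m2) ** 2 for _, b in z) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in z) / (n - 1)
    det = s11 * s22 - s12 ** 2
    d2 = []
    for a, b in z:
        u, v = a - m1, b - m2
        # (u, v) S^{-1} (u, v)^T, using the closed-form 2x2 inverse.
        d2.append((s22 * u * u - 2 * s12 * u * v + s11 * v * v) / det)
    # Survival function of a chi-square with 2 df is exp(-x / 2).
    pvalues = [math.exp(-x / 2) for x in d2]
    return d2, pvalues


# Simulated data (assumption): 200 neutral markers plus one strong outlier.
rng = random.Random(0)
z = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
z.append((6.0, -5.0))  # hypothetical marker under selection

d2, p = mahalanobis_pvalues(z)

# Illustrative threshold (assumption), not a calibrated FDR procedure.
candidates = [i for i, pv in enumerate(p) if pv < 1e-4]
print("candidate marker indices:", candidates)
```

Because the outlier itself inflates the classical covariance estimate, a robust estimator such as the one the package relies on would be expected to separate selected markers from the neutral background more clearly.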

Links

  • Publisher Full Text: http://dx.doi.org/10.1111/1755-0998.12592
  • Authors and Affiliations

    Luu, Keurcien

    Laboratoire TIMC-IMAG, UMR 5525, CNRS, Université Grenoble Alpes, Grenoble, France.

    Bazin, Eric

    Laboratoire d'Ecologie Alpine UMR 5553, CNRS, Université Grenoble Alpes, Grenoble, France.

    Blum, Michael G B

    Laboratoire TIMC-IMAG, UMR 5525, CNRS, Université Grenoble Alpes, Grenoble, France.

    Source

    Molecular ecology resources 17:1 2017 Jan pg 67-77

    MeSH

    Adaptation, Biological
    Biostatistics
    Computational Biology
    Genetics, Population
    Principal Component Analysis
    Selection, Genetic
    Software

    Pub Type(s)

    Comparative Study
    Evaluation Studies
    Journal Article

    Language

    eng

    PubMed ID

    27601374

    Citation

    TY  - JOUR
    T1  - pcadapt: an R package to perform genome scans for selection based on principal component analysis.
    AU  - Luu, Keurcien
    AU  - Bazin, Eric
    AU  - Blum, Michael G B
    Y1  - 2016/09/07/
    PY  - 2016/05/31/received
    PY  - 2016/07/29/revised
    PY  - 2016/08/01/accepted
    PY  - 2016/9/8/pubmed
    PY  - 2017/4/4/medline
    PY  - 2016/9/8/entrez
    KW  - Mahalanobis distance
    KW  - R package
    KW  - outlier detection
    KW  - population genetics
    KW  - principal component analysis
    SP  - 67
    EP  - 77
    JF  - Molecular ecology resources
    JO  - Mol Ecol Resour
    VL  - 17
    IS  - 1
    SN  - 1755-0998
    UR  - https://www.unboundmedicine.com/medline/citation/27601374/pcadapt:_an_R_package_to_perform_genome_scans_for_selection_based_on_principal_component_analysis_
    L2  - http://dx.doi.org/10.1111/1755-0998.12592
    ER  -