Classification of breast cancer subtypes by combining gene expression and DNA methylation data.
J Integr Bioinform. 2014 Jun 13;11(2):236.

Abstract

Selecting the most promising treatment strategy for breast cancer crucially depends on determining the correct subtype. In recent years, gene expression profiling has been investigated as an alternative to histochemical methods. Since databases like TCGA provide easy and unrestricted access to gene expression data for hundreds of patients, the challenge is to extract a minimal optimal set of genes with good prognostic properties from a large bulk of genes making a moderate contribution to classification. Several studies have successfully applied machine learning algorithms to solve this so-called gene selection problem. However, more diverse data from other OMICS technologies are available, including methylation. We hypothesize that combining methylation and gene expression data could already lead to a largely improved classification model, since the resulting model will reflect differences not only on the transcriptomic, but also on an epigenetic level. We compared so-called random forest derived classification models based on gene expression and methylation data alone, to a model based on the combined features and to a model based on the gold standard PAM50. We obtained bootstrap errors of 10-20% and classification error of 1-50%, depending on breast cancer subtype and model. The gene expression model was clearly superior to the methylation model, which was also reflected in the combined model, which mainly selected features from gene expression data. However, the methylation model was able to identify unique features not considered as relevant by the gene expression model, which might provide deeper insights into breast cancer subtype differentiation on an epigenetic level.
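The comparison described in the abstract (one model per data type, one on the combined features, each scored by a bootstrap error) can be illustrated with a small sketch. This is not the authors' pipeline: the study used random forests on TCGA data, whereas the sketch below uses a nearest-centroid classifier as a simple stand-in and purely synthetic data. The class labels "A" and "B", the feature counts, the effect sizes, and all function names are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def fit_centroids(X, y):
    """Return {label: per-feature mean} for the training rows."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the label whose centroid is closest (squared Euclidean)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def bootstrap_oob_error(X, y, n_rounds=50, seed=0):
    """Average out-of-bag error over bootstrap resamples of the samples."""
    rng = random.Random(seed)
    n = len(X)
    errors = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:
            continue  # nothing left out this round, so nothing to score
        model = fit_centroids([X[i] for i in idx], [y[i] for i in idx])
        wrong = sum(predict(model, X[i]) != y[i] for i in oob)
        errors.append(wrong / len(oob))
    return mean(errors)


def make_synthetic(n_per_class=30, seed=1):
    """Hypothetical data: two classes, two feature blocks per sample."""
    rng = random.Random(seed)
    expr, meth, labels = [], [], []
    for label, shift in (("A", 0.0), ("B", 1.0)):
        for _ in range(n_per_class):
            # "Expression" block: assumed strong class separation.
            expr.append([rng.gauss(2.0 * shift, 1.0) for _ in range(5)])
            # "Methylation" block: assumed weaker class separation.
            meth.append([rng.gauss(0.5 * shift, 1.0) for _ in range(5)])
            labels.append(label)
    return expr, meth, labels


expr, meth, labels = make_synthetic()
combined = [e + m for e, m in zip(expr, meth)]  # concatenate feature blocks

results = {
    name: bootstrap_oob_error(X, labels)
    for name, X in (("expression", expr),
                    ("methylation", meth),
                    ("combined", combined))
}
for name, err in results.items():
    print(f"{name:12s} out-of-bag error: {err:.3f}")
```

Because the synthetic expression block is generated with a larger class shift than the methylation block, the expression-only model should show the lower error here; that ordering is built into the toy data and says nothing about the real results, which are those reported in the abstract.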

Author Affiliations

University of Southern Denmark, Molecular Oncology, J.B. Winsløws Vej 25, NanoCAN, 5000 Odense, Denmark.
Computational Systems Biology Group, Max Planck Institute for Informatics, 66123 Saarbrücken, Germany.
Clinical Institute, University of Southern Denmark, 5000 Odense, Denmark.
Lundbeckfonden Center of Excellence in Nanomedicine (NanoCAN), University of Southern Denmark, 5000 Odense, Denmark.
Lundbeckfonden Center of Excellence in Nanomedicine (NanoCAN), University of Southern Denmark, 5000 Odense, Denmark.
Department of Mathematics and Computer Science (IMADA), University of Southern Denmark, 5000 Odense, Denmark.
Department of Mathematics and Computer Science (IMADA), University of Southern Denmark, 5000 Odense, Denmark.

Pub Type(s)

Journal Article
Research Support, Non-U.S. Gov't

Language

eng

PubMed ID

24953305

Citation

List, Markus, et al. "Classification of Breast Cancer Subtypes By Combining Gene Expression and DNA Methylation Data." Journal of Integrative Bioinformatics, vol. 11, no. 2, 2014, p. 236.
List M, Hauschild AC, Tan Q, et al. Classification of breast cancer subtypes by combining gene expression and DNA methylation data. J Integr Bioinform. 2014;11(2):236.
List, M., Hauschild, A. C., Tan, Q., Kruse, T. A., Mollenhauer, J., Baumbach, J., & Batra, R. (2014). Classification of breast cancer subtypes by combining gene expression and DNA methylation data. Journal of Integrative Bioinformatics, 11(2), 236. https://doi.org/10.2390/biecoll-jib-2014-236
List M, et al. Classification of Breast Cancer Subtypes By Combining Gene Expression and DNA Methylation Data. J Integr Bioinform. 2014 Jun 13;11(2):236. PubMed PMID: 24953305.
TY  - JOUR
T1  - Classification of breast cancer subtypes by combining gene expression and DNA methylation data.
AU  - List,Markus,
AU  - Hauschild,Anne-Christin,
AU  - Tan,Qihua,
AU  - Kruse,Torben A,
AU  - Mollenhauer,Jan,
AU  - Baumbach,Jan,
AU  - Batra,Richa,
Y1  - 2014/06/13/
PY  - 2014/05/28/received
PY  - 2014/05/28/revised
PY  - 2014/06/13/accepted
PY  - 2014/6/24/entrez
PY  - 2014/6/24/pubmed
PY  - 2015/3/13/medline
SP  - 236
EP  - 236
JF  - Journal of integrative bioinformatics
JO  - J Integr Bioinform
VL  - 11
IS  - 2
N2  - Selecting the most promising treatment strategy for breast cancer crucially depends on determining the correct subtype. In recent years, gene expression profiling has been investigated as an alternative to histochemical methods. Since databases like TCGA provide easy and unrestricted access to gene expression data for hundreds of patients, the challenge is to extract a minimal optimal set of genes with good prognostic properties from a large bulk of genes making a moderate contribution to classification. Several studies have successfully applied machine learning algorithms to solve this so-called gene selection problem. However, more diverse data from other OMICS technologies are available, including methylation. We hypothesize that combining methylation and gene expression data could already lead to a largely improved classification model, since the resulting model will reflect differences not only on the transcriptomic, but also on an epigenetic level. We compared so-called random forest derived classification models based on gene expression and methylation data alone, to a model based on the combined features and to a model based on the gold standard PAM50. We obtained bootstrap errors of 10-20% and classification error of 1-50%, depending on breast cancer subtype and model. The gene expression model was clearly superior to the methylation model, which was also reflected in the combined model, which mainly selected features from gene expression data. However, the methylation model was able to identify unique features not considered as relevant by the gene expression model, which might provide deeper insights into breast cancer subtype differentiation on an epigenetic level.
SN  - 1613-4516
UR  - https://www.unboundmedicine.com/medline/citation/24953305/Classification_of_breast_cancer_subtypes_by_combining_gene_expression_and_DNA_methylation_data_
DB  - PRIME
DP  - Unbound Medicine
ER  -