Fusing dual-event data sets for Mycobacterium tuberculosis machine learning models and their evaluation.
J Chem Inf Model. 2013 Nov 25; 53(11):3054-63.

Abstract

The search for new tuberculosis treatments continues, as we need molecules that act more quickly, can be accommodated in multidrug regimens, and overcome ever-increasing levels of drug resistance. Multiple large-scale phenotypic high-throughput screens against Mycobacterium tuberculosis (Mtb) have generated dose-response data, enabling the generation of machine learning models. These models also incorporated cytotoxicity data and were recently validated with a large external data set. A cheminformatics data-fusion approach followed by Bayesian machine learning, Support Vector Machine, or Recursive Partitioning model development (based on publicly available Mtb screening data) was used to compare individual data sets and subsequent combined models. A set of 1924 commercially available molecules with promising antitubercular activity (and lack of relative cytotoxicity to Vero cells) was used to evaluate the predictive nature of the models. We demonstrate that combining three data sets incorporating antitubercular and cytotoxicity data in Vero cells from our previous screens results in an external validation receiver operator characteristic (ROC) value of 0.83 (Bayesian or RP Forest). Models that do not have the highest 5-fold cross-validation ROC scores can outperform other models in a test-set-dependent manner. We demonstrate with predictions for a recently published set of Mtb leads from GlaxoSmithKline that no single machine learning model may be enough to identify compounds of interest. Data set fusion represents a further useful strategy for machine learning construction, as illustrated with Mtb. Coverage of chemistry and Mtb target spaces may also be limiting factors for the whole-cell screening data generated to date.
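The ideas in the abstract (a dual-event label, data set fusion, and a ROC score on held-out molecules) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Compound` record, the rule that a molecule counts as a hit only when it is both active against Mtb and non-cytotoxic to Vero cells, the keep-first-record deduplication in `fuse`, and the example SMILES strings and scores are all assumptions made for illustration. The ROC value is computed here as the area under the curve using the pairwise ranking definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    smiles: str          # structure identifier (hypothetical example values below)
    mtb_active: bool     # assumed: whole-cell Mtb activity passed some cutoff
    vero_nontoxic: bool  # assumed: acceptable cytotoxicity in Vero cells


def dual_event_label(c: Compound) -> int:
    """Assumed dual-event rule: 1 only if active AND non-cytotoxic."""
    return int(c.mtb_active and c.vero_nontoxic)


def fuse(*datasets):
    """Merge data sets; for duplicate structures keep the first record seen.

    The deduplication policy is an assumption of this sketch.
    """
    merged = {}
    for dataset in datasets:
        for c in dataset:
            merged.setdefault(c.smiles, c)
    return list(merged.values())


def roc_auc(labels, scores):
    """ROC AUC as the chance a random positive outscores a random negative.

    Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Two toy screening sets that share one molecule ("CCN").
set_a = [Compound("CCO", True, True), Compound("CCN", True, False)]
set_b = [Compound("CCN", True, False), Compound("c1ccccc1", False, True)]

fused = fuse(set_a, set_b)
labels = [dual_event_label(c) for c in fused]

# Hypothetical model scores for four external test molecules.
external_auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(len(fused), labels, external_auc)
```

In practice the scores would come from a trained classifier (for example a Bayesian, Support Vector Machine, or Recursive Partitioning model), and the same `roc_auc` calculation could be applied to each cross-validation fold as well as to the external test set.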

Authors

Ekins S: Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States.
Freundlich JS: No affiliation info available
Reynolds RC: No affiliation info available

Pub Type(s)

Journal Article
Research Support, American Recovery and Reinvestment Act
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Language

eng

PubMed ID

24144044

Citation

Ekins, Sean, et al. "Fusing Dual-event Data Sets for Mycobacterium Tuberculosis Machine Learning Models and Their Evaluation." Journal of Chemical Information and Modeling, vol. 53, no. 11, 2013, pp. 3054-63.
Ekins S, Freundlich JS, Reynolds RC. Fusing dual-event data sets for Mycobacterium tuberculosis machine learning models and their evaluation. J Chem Inf Model. 2013;53(11):3054-63.
Ekins, S., Freundlich, J. S., & Reynolds, R. C. (2013). Fusing dual-event data sets for Mycobacterium tuberculosis machine learning models and their evaluation. Journal of Chemical Information and Modeling, 53(11), 3054-63. https://doi.org/10.1021/ci400480s
Ekins S, Freundlich JS, Reynolds RC. Fusing Dual-event Data Sets for Mycobacterium Tuberculosis Machine Learning Models and Their Evaluation. J Chem Inf Model. 2013 Nov 25;53(11):3054-63. PubMed PMID: 24144044.
* Article titles in AMA citation format should be in sentence-case
TY - JOUR
T1 - Fusing dual-event data sets for Mycobacterium tuberculosis machine learning models and their evaluation.
AU - Ekins,Sean,
AU - Freundlich,Joel S,
AU - Reynolds,Robert C,
Y1 - 2013/10/30/
PY - 2013/10/23/entrez
PY - 2013/10/23/pubmed
PY - 2014/7/9/medline
SP - 3054
EP - 63
JF - Journal of chemical information and modeling
JO - J Chem Inf Model
VL - 53
IS - 11
N2 - The search for new tuberculosis treatments continues as we need to find molecules that can act more quickly, be accommodated in multidrug regimens, and overcome ever increasing levels of drug resistance. Multiple large scale phenotypic high-throughput screens against Mycobacterium tuberculosis (Mtb) have generated dose response data, enabling the generation of machine learning models. These models also incorporated cytotoxicity data and were recently validated with a large external data set. A cheminformatics data-fusion approach followed by Bayesian machine learning, Support Vector Machine, or Recursive Partitioning model development (based on publicly available Mtb screening data) was used to compare individual data sets and subsequent combined models. A set of 1924 commercially available molecules with promising antitubercular activity (and lack of relative cytotoxicity to Vero cells) were used to evaluate the predictive nature of the models. We demonstrate that combining three data sets incorporating antitubercular and cytotoxicity data in Vero cells from our previous screens results in external validation receiver operator curve (ROC) of 0.83 (Bayesian or RP Forest). Models that do not have the highest 5-fold cross-validation ROC scores can outperform other models in a test set dependent manner. We demonstrate with predictions for a recently published set of Mtb leads from GlaxoSmithKline that no single machine learning model may be enough to identify compounds of interest. Data set fusion represents a further useful strategy for machine learning construction as illustrated with Mtb. Coverage of chemistry and Mtb target spaces may also be limiting factors for the whole-cell screening data generated to date.
SN - 1549-960X
UR - https://www.unboundmedicine.com/medline/citation/24144044/Fusing_dual_event_data_sets_for_Mycobacterium_tuberculosis_machine_learning_models_and_their_evaluation_
L2 - https://dx.doi.org/10.1021/ci400480s
DB - PRIME
DP - Unbound Medicine
ER -