Comparing and Validating Machine Learning Models for Mycobacterium tuberculosis Drug Discovery.
Mol Pharm. 2018 Oct 1; 15(10):4346-4360.

Abstract

Tuberculosis is a global health dilemma. In 2016, the WHO reported 10.4 million new cases and 1.7 million deaths. The need to develop new treatments for those infected with Mycobacterium tuberculosis (Mtb) has led to many large-scale phenotypic screens and many thousands of new active compounds identified in vitro. However, with limited funding, efforts to discover new active molecules against Mtb need to be more efficient. Several computational machine learning approaches have been shown to have good enrichment and hit rates. We have curated small molecule Mtb data and developed new models with a total of 18,886 molecules with activity cutoffs of 10 μM, 1 μM, and 100 nM. These data sets were used to evaluate different machine learning methods (including deep learning) and metrics and to generate predictions for additional molecules published in 2017. One Mtb model, a combined in vitro and in vivo data Bayesian model at a 100 nM activity cutoff, yielded the following metrics for 5-fold cross validation: accuracy = 0.88, precision = 0.22, recall = 0.91, specificity = 0.88, kappa = 0.31, and MCC = 0.41. We have also curated an evaluation set (n = 153 compounds) published in 2017, and when used to test our model, it showed comparable statistics (accuracy = 0.83, precision = 0.27, recall = 1.00, specificity = 0.81, kappa = 0.36, and MCC = 0.47). We have also compared these models with additional machine learning algorithms, showing that Bayesian machine learning models constructed with literature Mtb data generated by different laboratories were generally equivalent to or outperformed deep neural networks on external test sets. Finally, we have also compared our training and test sets to show they were suitably diverse and different in order to represent useful evaluation sets. Such Mtb machine learning models could help prioritize compounds for testing in vitro and in vivo.
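
For readers unfamiliar with the statistics quoted in the abstract, the following is a minimal Python sketch of how accuracy, precision, recall, specificity, Cohen's kappa, and the Matthews correlation coefficient (MCC) are conventionally computed from a binary confusion matrix. The function name `classification_metrics` and the example counts are hypothetical illustrations; they are not taken from the paper and do not reproduce its results.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives. Hypothetical helper for illustration.
    """
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0

    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "kappa": kappa,
        "mcc": mcc,
    }


# Made-up counts for a 100-compound test set (NOT data from the paper).
metrics = classification_metrics(tp=8, fp=2, tn=85, fn=5)
for name, value in metrics.items():
    print(f"{name} = {value:.2f}")
```

With a small fraction of active compounds, as is typical at a stringent activity cutoff, accuracy and specificity can be high while precision stays low, which is why chance-corrected measures such as kappa and MCC are usually reported alongside them.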

Author Affiliations

Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States. Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina 27599, United States.
Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States. The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States.
Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
Molecular Materials Informatics, Inc., 1900 St. Jacques #302, Montreal H3J 2S1, Quebec, Canada.
Science Data Software, LLC, 14914 Bradwill Court, Rockville, Maryland 20850, United States.
Science Data Software, LLC, 14914 Bradwill Court, Rockville, Maryland 20850, United States.
Department of Medicine, Division of Hematology and Oncology, University of Alabama at Birmingham, NP 2540 J, 1720 Second Avenue South, Birmingham, Alabama 35294-3300, United States.
Department of Pharmacology, Physiology and Neuroscience, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, United States.
Department of Pharmacology, Physiology and Neuroscience, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, United States. Division of Infectious Diseases, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, United States.
Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.

Pub Type(s)

Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Language

eng

PubMed ID

29672063

Citation

Lane, Thomas, et al. "Comparing and Validating Machine Learning Models for Mycobacterium Tuberculosis Drug Discovery." Molecular Pharmaceutics, vol. 15, no. 10, 2018, pp. 4346-4360.
Lane T, Russo DP, Zorn KM, et al. Comparing and Validating Machine Learning Models for Mycobacterium tuberculosis Drug Discovery. Mol Pharm. 2018;15(10):4346-4360.
Lane, T., Russo, D. P., Zorn, K. M., Clark, A. M., Korotcov, A., Tkachenko, V., Reynolds, R. C., Perryman, A. L., Freundlich, J. S., & Ekins, S. (2018). Comparing and Validating Machine Learning Models for Mycobacterium tuberculosis Drug Discovery. Molecular Pharmaceutics, 15(10), 4346-4360. https://doi.org/10.1021/acs.molpharmaceut.8b00083
Lane T, et al. Comparing and Validating Machine Learning Models for Mycobacterium Tuberculosis Drug Discovery. Mol Pharm. 2018 Oct 1;15(10):4346-4360. PubMed PMID: 29672063.
* Article titles in AMA citation format should be in sentence-case
TY  - JOUR
T1  - Comparing and Validating Machine Learning Models for Mycobacterium tuberculosis Drug Discovery.
AU  - Lane,Thomas,
AU  - Russo,Daniel P,
AU  - Zorn,Kimberley M,
AU  - Clark,Alex M,
AU  - Korotcov,Alexandru,
AU  - Tkachenko,Valery,
AU  - Reynolds,Robert C,
AU  - Perryman,Alexander L,
AU  - Freundlich,Joel S,
AU  - Ekins,Sean,
Y1  - 2018/04/26/
PY  - 2018/4/20/pubmed
PY  - 2019/8/20/medline
PY  - 2018/4/20/entrez
KW  - deep learning
KW  - drug discovery
KW  - machine learning
KW  - support vector machine
KW  - tuberculosis
SP  - 4346
EP  - 4360
JF  - Molecular pharmaceutics
JO  - Mol. Pharm.
VL  - 15
IS  - 10
N2  - Tuberculosis is a global health dilemma. In 2016, the WHO reported 10.4 million incidences and 1.7 million deaths. The need to develop new treatments for those infected with Mycobacterium tuberculosis (Mtb) has led to many large-scale phenotypic screens and many thousands of new active compounds identified in vitro. However, with limited funding, efforts to discover new active molecules against Mtb needs to be more efficient. Several computational machine learning approaches have been shown to have good enrichment and hit rates. We have curated small molecule Mtb data and developed new models with a total of 18,886 molecules with activity cutoffs of 10 μM, 1 μM, and 100 nM. These data sets were used to evaluate different machine learning methods (including deep learning) and metrics and to generate predictions for additional molecules published in 2017. One Mtb model, a combined in vitro and in vivo data Bayesian model at a 100 nM activity yielded the following metrics for 5-fold cross validation: accuracy = 0.88, precision = 0.22, recall = 0.91, specificity = 0.88, kappa = 0.31, and MCC = 0.41. We have also curated an evaluation set (n = 153 compounds) published in 2017, and when used to test our model, it showed the comparable statistics (accuracy = 0.83, precision = 0.27, recall = 1.00, specificity = 0.81, kappa = 0.36, and MCC = 0.47). We have also compared these models with additional machine learning algorithms showing Bayesian machine learning models constructed with literature Mtb data generated by different laboratories generally were equivalent to or outperformed deep neural networks with external test sets. Finally, we have also compared our training and test sets to show they were suitably diverse and different in order to represent useful evaluation sets. Such Mtb machine learning models could help prioritize compounds for testing in vitro and in vivo.
SN  - 1543-8392
UR  - https://www.unboundmedicine.com/medline/citation/29672063/Comparing_and_Validating_Machine_Learning_Models_for_Mycobacterium_tuberculosis_Drug_Discovery_
L2  - https://dx.doi.org/10.1021/acs.molpharmaceut.8b00083
DB  - PRIME
DP  - Unbound Medicine
ER  -