In Methods in molecular biology (Clifton, N.J.)
Elucidating the mechanisms of metabolic pathways helps us understand the cascade of enzyme-catalyzed reactions that lead to the conversion of substances into final products. This has implications for predicting how newly synthesized compounds will affect a person's metabolism and, hence, for the development of novel treatments to improve one's health. The study of metabolic pathways, together with protein engineering, may also aid in the extraction, at scale, of natural products to be used as drugs and drug precursors. Several approaches have been used to correlate protein annotations with metabolic pathways in order to derive pathways directly related to specific organisms. These range from association rule-mining techniques to machine learning methods such as decision trees, naïve Bayes, logistic regression, and ensemble methods. In this chapter, we review the use of machine learning for metabolic pathway analyses, with a step-by-step focus on the use of deep learning to predict the association of compounds (metabolites) with their respective metabolomic pathway classes. This prediction could help explain interactions of small molecules in organisms. Inspired by the work of Baranwal et al. (2019), we demonstrate how to build and train a deep learning neural network model to perform multi-label prediction. We consider two different types of fingerprints as features (inputs to the model). The output of the model is the set of metabolic pathway classes (from the KEGG dataset) in which the input molecule participates. We walk through the various steps of this process, including data collection, feature engineering, model selection, training, and evaluation. This model-building and evaluation process may be easily transferred to other domains of interest. All the source code used in this chapter is publicly available at https://github.com/jp-um/machine_learning_for_metabolomic_pathway_analyses.
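The multi-label setup described in the abstract (binary fingerprint in, set of pathway classes out) can be illustrated with a minimal sketch. This is not the chapter's code: the data are synthetic random bit vectors standing in for molecular fingerprints, the labelling rule is invented purely so the toy problem is learnable, and the function names (`make_toy_data`, `train`, `predict`), the layer size, and the learning rate are all assumptions. The point is only to show independent sigmoid outputs trained with binary cross-entropy, so that a molecule may be assigned to several classes at once.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def make_toy_data(n_samples=200, n_bits=64, n_classes=3, seed=0):
    """Synthetic stand-in for fingerprints and pathway labels (assumption)."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n_samples, n_bits)).astype(float)
    # Invented rule: class k is active exactly when fingerprint bit k is set.
    Y = X[:, :n_classes].copy()
    return X, Y


def forward(params, X):
    """One hidden ReLU layer followed by one sigmoid output per class."""
    W1, b1, W2, b2 = params
    H = np.maximum(0.0, X @ W1 + b1)
    P = sigmoid(H @ W2 + b2)
    return H, P


def train(X, Y, hidden=16, lr=0.2, epochs=500, seed=0):
    """Full-batch gradient descent on the mean binary cross-entropy."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = Y.shape[1]
    params = [rng.normal(0, 0.1, (d, hidden)), np.zeros(hidden),
              rng.normal(0, 0.1, (hidden, k)), np.zeros(k)]
    losses = []
    for _ in range(epochs):
        H, P = forward(params, X)
        loss = -np.mean(Y * np.log(P + 1e-9) + (1 - Y) * np.log(1 - P + 1e-9))
        losses.append(float(loss))
        dZ2 = (P - Y) / (n * k)              # gradient w.r.t. output logits
        dZ1 = (dZ2 @ params[2].T) * (H > 0)  # backpropagate through ReLU
        grads = [X.T @ dZ1, dZ1.sum(axis=0), H.T @ dZ2, dZ2.sum(axis=0)]
        for p, g in zip(params, grads):
            p -= lr * g                      # in-place parameter update
    return params, losses


def predict(params, X, threshold=0.5):
    """Return a 0/1 matrix; each row may contain several active classes."""
    _, P = forward(params, X)
    return (P >= threshold).astype(int)


X, Y = make_toy_data()
params, losses = train(X, Y)
preds = predict(params, X)
```

In a real pipeline the random bit vectors would be replaced by fingerprints computed from molecular structures and the invented labels by KEGG pathway class annotations; the thresholded sigmoid outputs are what make the prediction multi-label rather than multi-class.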
Bonetta Valentino Rosalin, Ebejer Jean-Paul, Valentino Gianluca
Feature engineering, KEGG classes, Machine learning, Metabolomics, Neural networks, Performance metrics