In Molecular Informatics

Predicting the likely biological activity (or property) of compounds is a fundamental and challenging task in the drug discovery process. Current computational methodologies aim to improve their predictive accuracy by using deep learning (DL) approaches. However, for small- and medium-sized chemical datasets, shallow learning-based methodologies have proven to be the most suitable. These methodologies start with a universe of molecular descriptors (MDs), then apply different feature selection algorithms, and finally construct a predictive model for the intended learning task. We demonstrate here that this approach may miss relevant information, because it assumes that the initial universe of MDs codifies all aspects relevant to the learning task when it does not. We argue that the limitation arises mainly from the constrained intervals of the parameters used in the algorithms that compute MDs; these parameters define the Descriptor Configuration Space (DCS). We propose to relax these constraints in an open DCS approach, so that a larger universe of MDs can initially be considered. We model the generation of MDs as a multicriteria optimization problem and tackle it with a variant of the standard genetic algorithm. As a novel component, the individual fitness function is computed by aggregating four criteria via the Choquet integral with respect to a fuzzy (non-additive) measure. Experimental results on benchmark chemical datasets show that the proposed approach generates a meaningful DCS, improving on state-of-the-art approaches in most of the datasets.
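
To illustrate the aggregation step mentioned in the abstract, the following is a minimal sketch of a discrete Choquet integral over four criterion scores. It is not the authors' implementation: the abstract does not name the four criteria or specify the fuzzy measure, so the criterion names (`c1` to `c4`), the helper `cardinality_measure`, and its `power` parameter are hypothetical placeholders chosen only to keep the example small. The sketch assumes non-negative scores and a monotone measure with value 1 on the full set.

```python
from itertools import combinations


def cardinality_measure(names, power=2.0):
    """Hypothetical fuzzy measure: mu(A) = (|A| / n) ** power.

    It is monotone, with mu(empty set) = 0 and mu(all criteria) = 1,
    and it is non-additive unless power == 1.
    """
    n = len(names)
    measure = {frozenset(): 0.0}
    for size in range(1, n + 1):
        for subset in combinations(names, size):
            measure[frozenset(subset)] = (size / n) ** power
    return measure


def choquet_integral(scores, measure):
    """Discrete Choquet integral of non-negative scores w.r.t. a fuzzy measure.

    scores:  dict mapping criterion name -> score
    measure: dict mapping frozenset of criterion names -> weight
    """
    # Sort criteria by ascending score: x_(1) <= x_(2) <= ... <= x_(n).
    order = sorted(scores, key=scores.get)
    total = 0.0
    previous = 0.0
    for i, name in enumerate(order):
        # Coalition of all criteria whose score is at least x_(i).
        coalition = frozenset(order[i:])
        total += (scores[name] - previous) * measure[coalition]
        previous = scores[name]
    return total


# Assumed example scores for four placeholder criteria.
criteria = ["c1", "c2", "c3", "c4"]
scores = {"c1": 0.2, "c2": 0.4, "c3": 0.6, "c4": 0.8}

fitness = choquet_integral(scores, cardinality_measure(criteria, power=2.0))
additive = choquet_integral(scores, cardinality_measure(criteria, power=1.0))
```

With `power=1.0` the measure is additive and the integral reduces to the arithmetic mean of the scores. With `power=2.0` small coalitions receive less than their additive share, so an individual is rewarded only when several criteria are high together, which is the kind of interaction a non-additive measure is meant to capture.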

García-González Luis A, Marrero-Ponce Yovani, Brizuela Carlos A, Garcia-Jacas Cesar

2023-Mar-09

ecotoxicological endpoints, genetic algorithm, molecular descriptors