Dermatology

GeFeS: A generalized wrapper feature selection approach for optimizing classification performance.

In Computers in biology and medicine

In this paper, we propose a generalized wrapper-based feature selection approach, called GeFeS, which is based on a new parallel intelligent genetic algorithm (GA). GeFeS works across numerical datasets of different dimensions and sizes, is designed to avoid overfitting, and significantly enhances classification accuracy. To make the GA more accurate, robust and intelligent, we propose a new operator for feature weighting, improve the mutation and crossover operators, and integrate nested cross-validation into the GA process to properly validate the learning model. The k-nearest neighbor (kNN) classifier is used to evaluate the goodness of the selected features. We evaluated the efficiency of GeFeS on various datasets selected from the UCI machine learning repository and compared its performance with state-of-the-art classification and feature selection methods. The results demonstrate that the proposed multi-population intelligent genetic algorithm generalizes well across two-class and multi-class datasets of different sizes. We achieved average classification accuracies of 95.83%, 97.62%, 99.02%, 98.51%, and 94.28% while reducing the number of features from 56 to 28, 34 to 18, 279 to 135, 30 to 16, and 19 to 9 on the lung cancer, dermatology, arrhythmia, WDBC, and hepatitis datasets, respectively.
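
The wrapper idea described above (a GA proposes feature subsets, a kNN classifier scores them) can be sketched roughly as follows. This is a minimal single-population illustration, not the GeFeS algorithm: it uses plain one-point crossover and bit-flip mutation, leave-one-out accuracy instead of nested cross-validation, and synthetic data. The names `knn_accuracy`, `ga_select`, and `make_data` are hypothetical.

```python
import random

def knn_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of kNN using only the features where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance to every other sample, paired with its label.
        dists = sorted(
            (sum((X[i][j] - X[n][j]) ** 2 for j in cols), y[n])
            for n in range(len(X)) if n != i
        )
        votes = [label for _, label in dists[:k]]
        if max(set(votes), key=votes.count) == y[i]:
            correct += 1
    return correct / len(X)

def ga_select(X, y, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Search for a binary feature mask that maximizes kNN accuracy."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        return knn_accuracy(X, y, mask)

    # Seed the population with the all-features mask plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:2]  # elitism: the two best masks survive unchanged
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(pop[:pop_size // 2], 2)  # parents from the top half
            cut = rng.randrange(1, n)                  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per feature.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)

def make_data(n_samples=40, seed=1):
    """Synthetic two-class data: 2 informative features followed by 4 noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        informative = [label * 2.0 + rng.gauss(0, 0.3) for _ in range(2)]
        noise = [rng.gauss(0, 3.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y

X, y = make_data()
mask, acc = ga_select(X, y)
print("selected mask:", mask, "accuracy:", acc)
```

Because the all-features mask is in the initial population and the best masks are carried over each generation, the returned subset never scores worse than using every feature under this fitness function. Scoring on the same samples used for the search is exactly the overfitting risk that the abstract addresses with nested cross-validation.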

Sahebi Golnaz, Movahedi Parisa, Ebrahimi Masoumeh, Pahikkala Tapio, Plosila Juha, Tenhunen Hannu


Data mining, Evolutionary computing, Feature selection, Machine learning, Medical datasets, Overfitting, Parallel computing

General

ncRDeep: Non-coding RNA classification with convolutional neural network.

In Computational biology and chemistry

A non-coding RNA (ncRNA) is an RNA that is not translated into protein; however, it is involved in many biological processes, diseases, and cancers. Numerous ncRNAs have been identified and classified with high-throughput sequencing technology. Hence, accurate ncRNA class prediction is important and necessary for further study of their functions. Several computational techniques have been employed to predict the class of ncRNAs. Recent classification methods use the secondary structure as their primary input. However, computational tools for RNA secondary structure prediction are not accurate enough, which affects the final performance of ncRNA predictors. In this paper, we propose a simple yet efficient method, called ncRDeep, for ncRNA prediction. It uses a simple convolutional neural network and RNA sequence information only. ncRDeep was evaluated on benchmark datasets, and the comparison results showed that it significantly outperforms the state-of-the-art methods. More specifically, the average accuracy was improved by 8.32%. Finally, we built a freely accessible web server for the developed tool ncRDeep.
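
To illustrate what "sequence information only" can mean for a convolutional model, the sketch below one-hot encodes an RNA string and applies a single 1D convolution filter followed by ReLU and global max pooling. The abstract does not describe the ncRDeep architecture, so the encoding, the filter, and the function names (`one_hot`, `conv1d_relu`, `global_max_pool`) are assumptions made for illustration; a real model would learn many filters from data.

```python
NUCLEOTIDES = "ACGU"

def one_hot(seq):
    """Encode an RNA string as rows of 4 values; unknown bases become all zeros."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq.upper()]

def conv1d_relu(x, kernel, bias=0.0):
    """Valid 1D convolution of x (L x 4) with one kernel (W x 4), followed by ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        s = bias
        for i in range(width):
            for c in range(4):
                s += x[start + i][c] * kernel[i][c]
        out.append(max(0.0, s))
    return out

def global_max_pool(features):
    """Collapse a feature map to its strongest activation."""
    return max(features) if features else 0.0

# A hand-written filter that responds to the motif "GGA" (hypothetical, not learned).
motif_filter = one_hot("GGA")
feature_map = conv1d_relu(one_hot("AUGGAC"), motif_filter)
print(feature_map)                   # [0.0, 1.0, 3.0, 1.0]
print(global_max_pool(feature_map))  # 3.0
```

The filter scores each 3-base window by how many positions match the motif, so the peak of 3.0 marks the exact occurrence of "GGA" in the input.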

Chantsalnyam Tuvshinbayar, Lim Dae Yeong, Tayara Hilal, Chong Kil To


Classification, Convolution neural network, Deep learning, Non-coding RNA

General

Adverse drug event detection using reason assignments in FDA drug labels.

In Journal of biomedical informatics ; h5-index 55.0

Adverse drug events (ADEs) are unintended incidents that involve the taking of a medication. ADEs pose significant health and financial problems worldwide. Information about ADEs can inform health care and improve patient safety. However, much of this information is buried in narrative texts and needs to be extracted with Natural Language Processing techniques, in order to be useful to computerized methods. ADEs can be found on drug labels, contained in the different sections such as descriptions of the drug's active components or more prominently in descriptions of studied side-effects. Extracting these automatically could be useful in triaging and processing drug reports. In this paper, we present three base methods consisting of a Conditional Random Field (CRF), a bi-directional Long Short Term Memory unit with a CRF layer (biLSTM+CRF), and a pre-trained Bi-directional Encoder Representations from Transformers (BERT) model. We also present several ensembles of the CRF and biLSTM+CRF methods for extracting ADEs and their Reason from FDA drug labels. We show that all three methods perform well on our task, and that combining the models through different ensemble methods can improve results, providing increases in recall for the majority class and improving precision for all other classes. We also show the potential of framing ADE extraction from drug labels as a multi-class classification task on the Reason, or type, of ADE.
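
One simple way to combine sequence taggers is a per-token majority vote, sketched below. The abstract says only that "different ensemble methods" were used, so majority voting is an assumed stand-in rather than the authors' procedure, and the label names (`REASON_A`, `REASON_B`, `O` for "outside any mention") are hypothetical.

```python
from collections import Counter

def majority_vote(predictions, fallback="O"):
    """Combine per-token label sequences from several taggers by majority vote.

    predictions: list of label sequences, one per model, all of equal length.
    A label wins only with a strict majority; otherwise the fallback is used.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same tokens")
    combined = []
    for labels in zip(*predictions):
        label, count = Counter(labels).most_common(1)[0]
        combined.append(label if count * 2 > len(labels) else fallback)
    return combined

# Three hypothetical taggers labelling the same four tokens.
model_1 = ["O", "REASON_A", "O", "REASON_B"]
model_2 = ["O", "REASON_A", "REASON_A", "O"]
model_3 = ["O", "REASON_A", "O", "REASON_A"]
print(majority_vote([model_1, model_2, model_3]))
# ['O', 'REASON_A', 'O', 'O']
```

Requiring a strict majority makes the ensemble conservative on tokens where the models disagree completely, which tends to trade some recall for precision on the rarer labels.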

Sutphin Corey, Lee Kahyun, Yepes Antonio Jimeno, Uzuner Özlem, McInnes Bridget T


Machine learning, Named entity recognition, Natural Language Processing

General

Multi fragment melting analysis system (MFMAS) for one-step identification of lactobacilli.

In Journal of microbiological methods

The accurate identification of lactobacilli is essential for the effective management of industrial practices associated with lactobacilli strains, such as the production of fermented foods or probiotic supplements. For this reason, in this study, we proposed the Multi Fragment Melting Analysis System (MFMAS)-lactobacilli based on high resolution melting (HRM) analysis of multiple DNA regions that have high interspecies heterogeneity for fast and reliable identification and characterization of lactobacilli. The MFMAS-lactobacilli is a new and customized version of the MFMAS, which was developed by our research group. MFMAS-lactobacilli is a combined system that consists of i) a ready-to-use plate, which is designed for multiple HRM analysis, and ii) a data analysis software, which is used to characterize lactobacilli species via incorporating machine learning techniques. Simultaneous HRM analysis of multiple DNA fragments yields a fingerprint for each tested strain and the identification is performed by comparing the fingerprints of unknown strains with those of known lactobacilli species registered in the MFMAS. In this study, a total of 254 isolates, which were recovered from fermented foods and probiotic supplements, were subjected to MFMAS analysis, and the results were confirmed by a combination of different molecular techniques. All of the analyzed isolates were exactly differentiated and accurately identified by applying the single-step procedure of MFMAS, and it was determined that all of the tested isolates belonged to 18 different lactobacilli species. The individual analysis of each target DNA region provided identification with an accuracy range from 59% to 90% for all tested isolates. However, when each target DNA region was analyzed simultaneously, perfect discrimination and 100% accurate identification were obtained even in closely related species. 
As a result, it was concluded that MFMAS-lactobacilli is a multi-purpose method that can be used to differentiate, classify, and identify lactobacilli species. Hence, our proposed system could be a potential alternative to overcome the inconsistencies and difficulties of the current methods.
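
The fingerprint comparison described above can be pictured as a nearest-reference lookup. In the sketch below each fingerprint is a vector with one melting temperature per DNA fragment, and an unknown strain is assigned to the closest registered reference. The MFMAS software uses machine learning techniques that the abstract does not detail, so the Euclidean nearest-match rule, the function name `identify`, the species names, and all temperature values are made up for illustration.

```python
import math

def identify(fingerprint, references):
    """Return the reference name whose fingerprint is closest in Euclidean distance."""
    return min(references, key=lambda name: math.dist(fingerprint, references[name]))

# Made-up melting temperatures (degrees C) for three DNA fragments per species.
references = {
    "species_A": [78.2, 81.5, 84.0],
    "species_B": [78.3, 83.9, 84.1],
    "species_C": [80.6, 81.4, 86.7],
}

unknown = [78.25, 83.7, 84.0]
print(identify(unknown, references))  # species_B
```

In this toy registry the first fragment barely separates species_A from species_B, while the second separates them clearly; this mirrors the abstract's observation that single regions identified isolates with 59% to 90% accuracy, whereas the combined fingerprint discriminated all of them.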

Kesmen Zülal, Kılıç Özge, Gormez Yasin, Çelik Mete, Bakir-Gungor Burcu


High resolution melting (HRM), Lactobacilli, Logistic regression (LR), Machine learning, Multi-fragment melting analysis system (MFMAS), One-step identification

General

Managing gestational diabetes mellitus using a smartphone application with artificial intelligence (SineDie) during the COVID-19 pandemic: Much more than just telemedicine.

In Diabetes research and clinical practice ; h5-index 50.0

We describe our experience in the remote management of women with gestational diabetes mellitus during the COVID-19 pandemic. We used a mobile phone application with artificial intelligence that automatically classifies and analyses the data (ketonuria, diet transgressions, and blood glucose values), making adjustment recommendations regarding the diet or insulin treatment.

Albert Lara, Capel Ismael, García-Sáez Gema, Martín-Redondo Pablo, Hernando M Elena, Rigla Mercedes


Artificial intelligence, Gestational diabetes mellitus, Mobile phone application, Telemedicine, eHealth

Ophthalmology

Predicting progression to advanced age-related macular degeneration from clinical, genetic and lifestyle factors using machine learning.

In Ophthalmology ; h5-index 90.0

OBJECTIVE : Current prediction models for advanced age-related macular degeneration (AMD) are based on a restrictive set of risk factors. The objective of this study was to develop a comprehensive prediction model, applying a machine learning algorithm allowing selection of the most predictive risk factors automatically.

DESIGN : Two population-based cohort studies.

PARTICIPANTS : The Rotterdam Study I (RS-I, training set) included 3838 participants aged 55 years or more, with a median follow-up period of 10.8 years and 108 incident cases of advanced AMD. The ALIENOR study (test set) included 362 participants aged 73 years or more, with a median follow-up period of 6.5 years and 33 incident cases of advanced AMD.

METHODS : The prediction model used the bootstrap lasso for survival analysis to select the best predictors of incident advanced AMD in the training set. Predictive performance of the model was assessed using the area under the receiver operating characteristic curve (AUC).
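
As a reminder of what the AUC measures, the sketch below computes it as the fraction of (case, non-case) pairs in which the case received the higher predicted risk, with ties counted as half. This is the plain binary AUC; the study reports AUC at fixed time horizons for survival data, which additionally requires handling censored follow-up and is not shown here. The function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case outranks a random non-case."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    # Count correctly ordered pairs; a tie contributes half a pair.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up predicted risks; label 1 marks an incident case.
scores = [0.9, 0.4, 0.35, 0.6, 0.5, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))  # 8 of the 9 case/non-case pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so discrimination says nothing about whether the predicted risks are well calibrated.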

MAIN OUTCOME MEASURES : Incident advanced AMD (atrophic and/or neovascular), based on standardized interpretation of retinal photographs.

RESULTS : The prediction model retained i) age, ii) a combination of phenotypic predictors (based on the presence of intermediate drusen, hyper-pigmentation in one or both eyes and age-related eye disease study (AREDS) simplified score), iii) a summary genetic risk score based on 49 single nucleotide polymorphisms, iv) smoking, v) diet quality, vi) education, and vii) pulse pressure. The cross-validated AUC estimation in RS-I was 0.92 [0.88-0.97] at 5 years, 0.92 [0.90-0.95] at 10 years and 0.91 [0.88-0.94] at 15 years. In ALIENOR, the AUC reached 0.92 at 5 years [0.87-0.98]. In terms of calibration, the model tended to underestimate the cumulative incidence of advanced AMD for the high-risk groups, especially in ALIENOR.

CONCLUSIONS : This prediction model reached high discrimination abilities, paving the way towards making precision medicine for AMD patients a reality in the near future.

Ajana Soufiane, Cougnard-Grégoire Audrey, Colijn Johanna M, Merle Bénédicte Mj, Verzijden Timo, de Jong Paulus Tvm, Hofman Albert, Vingerling Johannes R, Hejblum Boris P, Korobelnik Jean-François, Meester-Smoor Magda A, Ueffing Marius, Jacqmin-Gadda Hélène, Klaver Caroline Cw, Delcourt Cécile