General General

Machine Learning-Assisted Evaluation of Circulating DNA Quantitative Analysis for Cancer Screening.

In Advanced science (Weinheim, Baden-Wurttemberg, Germany)

While the utility of circulating cell-free DNA (cfDNA) in cancer screening and early detection has recently been investigated by testing genetic and epigenetic alterations, an original approach is developed here that examines cfDNA quantitative and structural features. First, the potential of cfDNA quantitative and structural parameters is independently demonstrated in cell culture, murine, and human plasma models. Subsequently, these variables are evaluated in a large retrospective cohort of 289 healthy individuals and 983 patients with various cancer types; after age resampling, the variables are evaluated independently and then combined using a machine learning approach. Implementation of a decision tree prediction model for the detection and classification of healthy individuals and cancer patients shows unprecedented performance for stage 0, I, and II colorectal cancer (specificity, 0.89; sensitivity, 0.72). Consequently, the methodological proof of concept of using both quantitative and structural biomarkers, with classification by a machine learning method, is highlighted as an efficient strategy for cancer screening. It is foreseen that the classification rate may be further improved by combining these biomarkers with fragmentomics, methylation, or the detection of genetic alterations. The optimization of such a multianalyte strategy with this machine learning method is therefore warranted.

Tanos Rita, Tosato Guillaume, Otandault Amaelle, Al Amir Dache Zahra, Pique Lasorsa Laurence, Tousch Geoffroy, El Messaoudi Safia, Meddeb Romain, Diab Assaf Mona, Ychou Marc, Du Manoir Stanislas, Pezet Denis, Gagnière Johan, Colombo Pierre-Emmanuel, Jacot William, Assénat Eric, Dupuy Marie, Adenis Antoine, Mazard Thibault, Mollevi Caroline, Sayagués José María, Colinge Jacques, Thierry Alain R


cancer, circulating DNA, early diagnosis, machine learning, screening
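
The specificity and sensitivity quoted in the abstract above are standard confusion-matrix ratios. The following minimal sketch shows how they are computed from binary labels. The function name and the toy labels are hypothetical illustrations, not data or code from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = cancer, 0 = healthy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancer, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancer, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels, made up for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sensitivity is the fraction of true cancer cases the classifier detects; specificity is the fraction of healthy individuals it correctly clears.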

General General

Federated Learning: A Survey on Enabling Technologies, Protocols, and Applications.

In IEEE access : practical innovations, open solutions

This paper provides a comprehensive study of Federated Learning (FL) with an emphasis on enabling software and hardware platforms, protocols, real-life applications and use-cases. FL is applicable to multiple domains, but applying it to different industries has its own set of obstacles. FL, also known as collaborative learning, trains algorithm(s) across multiple devices or servers holding decentralized data samples without exchanging the actual data. This approach is radically different from more established techniques such as uploading the data samples to servers or keeping data in some form of distributed infrastructure. FL, on the other hand, generates more robust models without sharing data, leading to privacy-preserving solutions with higher security and access privileges to data. This paper starts by providing an overview of FL. Then, it gives an overview of technical details that pertain to FL enabling technologies, protocols, and applications. Compared to other survey papers in the field, our objective is to provide a more thorough summary of the most relevant protocols, platforms, and real-life use-cases of FL to enable data scientists to build better privacy-preserving solutions for industries in critical need of FL. We also provide an overview of key challenges presented in the recent literature and a summary of related research work. Moreover, we explore both the challenges and advantages of FL and present detailed service use-cases to illustrate how different architectures and protocols that use FL can fit together to deliver desired results.

Aledhari Mohammed, Razzak Rehma, Parizi Reza M, Saeed Fahad


Collaborative AI, Decentralized Data, Federated Learning, Machine Learning, On-Device AI, Peer-to-peer network, Privacy, Security
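
The core idea in the abstract above, training across devices without exchanging raw data, can be illustrated with federated averaging (FedAvg), a widely used FL scheme. This is a minimal sketch under stated assumptions: the abstract does not name a specific algorithm, and the one-parameter model `y = w * x`, the function names, and the toy client data are all hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair held privately by one client


def local_update(w: float, data: List[Sample], lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few gradient-descent steps for y = w * x on one client's private data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(client_weights: List[float], client_sizes: List[int]) -> float:
    """Server step: average client weights, weighted by local sample count."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total


def train(clients: List[List[Sample]], rounds: int = 50) -> float:
    w = 0.0  # global model, broadcast to every client each round
    for _ in range(rounds):
        local = [local_update(w, data) for data in clients]  # raw data never leaves a client
        w = federated_average(local, [len(d) for d in clients])
    return w


# Hypothetical toy clients; every sample follows y = 2 * x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]
w_global = train(clients)
print(f"global weight after training: {w_global:.4f}")
```

Only the model weight travels between clients and server; each client's samples stay local, which is the privacy property the survey emphasizes.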

General General

Influence of number of membership functions on prediction of membrane systems using adaptive network based fuzzy inference system (ANFIS).

In Scientific reports ; h5-index 158.0

In membrane separation technologies, membrane modules are used to separate chemical components. Understanding the behavior of fluids inside a membrane module is challenging, and it can be studied numerically using computational fluid dynamics (CFD). However, optimizing membrane technology via CFD is costly in time and computation. Artificial Intelligence (AI) and CFD together can model a chemical process, including membrane technology and phase separation. The AI model learns the process by training on the CFD mesh elements (computing nodes) point by point, after which the fuzzy logic system can predict the process. In the current study, the adaptive neuro-fuzzy inference system (ANFIS) model, with different ANFIS parameters, was used to learn a process based on membrane technology. The purpose is to see how different tuning parameters of the ANFIS model can be used to increase the accuracy of the AI model in predicting the membrane process. These parameters were varied in this study, and the accuracy of the prediction was investigated. The results indicated that with a low number of inputs, poor regression was obtained (R-value below 0.32), but increasing the number of inputs improved the prediction capability of the model. When the number of inputs was increased to 4, the R-value rose to 0.99, showing the high accuracy of the model and its capability in predicting the membrane process. The AI results were in good agreement with the CFD results and were achieved in a limited time and with low computational costs. In terms of the categorization of the CFD data set, the AI framework plays a critical role in storing data in short memory, and the recovery mechanism can be very easy for users. Furthermore, the results were compared with Particle Swarm Optimization (PSOFIS) and Genetic Algorithm (GAFIS). The prediction and learning times were compared to study the capability of the methods in prediction and their accuracy.

Babanezhad Meisam, Masoumian Armin, Nakhjiri Ali Taghvaie, Marjani Azam, Shirazian Saeed
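
To make the role of membership functions in the ANFIS abstract above concrete, here is a minimal sketch of the forward pass of a first-order Sugeno fuzzy system, the structure ANFIS trains. The abstract does not state the membership function type or partitioning, so Gaussian membership functions and grid partitioning are assumptions, and all parameter values are hypothetical. Under grid partitioning the number of rules is the product of the membership function counts per input, so it grows quickly as either count increases.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

MF = Tuple[float, float]  # (center, sigma) of one Gaussian membership function


def gaussian_mf(x: float, center: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def rule_count(mfs: Sequence[Sequence[MF]]) -> int:
    """Grid partitioning: one rule per combination of membership functions."""
    return math.prod(len(m) for m in mfs)


def anfis_forward(
    inputs: Sequence[float],
    mfs: Sequence[Sequence[MF]],
    consequents: Sequence[Sequence[float]],
) -> float:
    """Forward pass; consequents[r] = (a_1, ..., a_n, bias) for rule r,
    with rules ordered as itertools.product enumerates the MF combinations."""
    # Layer 1: membership degree of each input in each of its fuzzy sets.
    degrees = [[gaussian_mf(x, c, s) for c, s in mf] for x, mf in zip(inputs, mfs)]
    # Layer 2: rule firing strength is the product of the chosen degrees.
    firing = [math.prod(combo) for combo in product(*degrees)]
    total = sum(firing)
    output = 0.0
    for w, params in zip(firing, consequents):
        *coeffs, bias = params
        rule_out = sum(a * x for a, x in zip(coeffs, inputs)) + bias
        # Layers 3 to 5: normalize, weight the linear rule output, and sum.
        output += (w / total) * rule_out
    return output


# Hypothetical setup: 2 inputs, 2 membership functions each, so 4 rules.
mfs = [[(0.0, 0.5), (1.0, 0.5)], [(0.0, 0.5), (1.0, 0.5)]]
consequents = [(1.0, 1.0, 0.0)] * rule_count(mfs)  # every rule outputs x1 + x2
y = anfis_forward([0.3, 0.7], mfs, consequents)
print(rule_count(mfs), round(y, 6))
```

Training, which ANFIS performs by tuning the membership function parameters and the rule consequents, is omitted here.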


Internal Medicine Internal Medicine

Correction: COVID-19 risk and outcomes in patients with substance use disorders: analyses from electronic health records in the United States.

In Molecular psychiatry ; h5-index 103.0

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

Wang Quan Qiu, Kaelber David C, Xu Rong, Volkow Nora D


General General

Hypercluster: a flexible tool for parallelized unsupervised clustering optimization.

In BMC bioinformatics

BACKGROUND : Unsupervised clustering is a common and exceptionally useful tool for large biological datasets. However, clustering requires upfront algorithm and hyperparameter selection, which can introduce bias into the final clustering labels. It is therefore advisable to obtain a range of clustering results from multiple models and hyperparameters, which can be cumbersome and slow.

RESULTS : We present hypercluster, a Python package and SnakeMake pipeline for flexible and parallelized clustering evaluation and selection. Users can efficiently evaluate a wide range of clustering results from multiple models and hyperparameters to identify an optimal model.

CONCLUSIONS : Hypercluster improves ease of use, robustness and reproducibility for unsupervised clustering applications in high-throughput biology. Hypercluster is available on pip and bioconda; installation, documentation and example workflows can be found at: .

Blumenberg Lili, Ruggles Kelly V


Hyperparameter optimization, Machine learning, Python, Scikit-learn, SnakeMake, Unsupervised clustering
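
The workflow described in the hypercluster abstract above, running a range of clustering configurations and scoring each to pick one, can be sketched generically. This is not hypercluster's API: the tiny 1-D k-means, the silhouette scorer, and the toy data below are hypothetical stand-ins written with the standard library only.

```python
from statistics import mean
from typing import Dict, List


def kmeans_1d(points: List[float], k: int, iters: int = 20) -> List[int]:
    """Tiny deterministic 1-D k-means; initial centers are evenly spaced quantiles."""
    ordered = sorted(points)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(p - centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    return labels


def silhouette(points: List[float], labels: List[int]) -> float:
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [abs(p - q) for j, (q, m) in enumerate(zip(points, labels)) if m == lab and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = mean(same)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            mean(abs(p - q) for q, m in zip(points, labels) if m == other)
            for other in clusters
            if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


# Hypothetical toy data with three well-separated groups.
points = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
results: Dict[int, float] = {k: silhouette(points, kmeans_1d(points, k)) for k in range(2, 6)}
best_k = max(results, key=results.get)
print(best_k, {k: round(s, 3) for k, s in results.items()})
```

Each configuration is independent of the others, which is why this kind of search parallelizes naturally, as the abstract notes.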

General General

A multimodal deep learning-based drug repurposing approach for treatment of COVID-19.

In Molecular diversity

Recently, various computational methods have been proposed to find new therapeutic applications of existing drugs. The Multimodal Restricted Boltzmann Machine approach (MM-RBM), which can connect information from multiple modalities, can be applied to the problem of drug repurposing. The present study used MM-RBM to combine two types of data: the chemical structure data of small molecules, and differentially expressed genes together with small-molecule perturbations. In the proposed method, two separate RBMs were applied to learn the features and the specific probability distribution of each modality. A further RBM was then used to integrate the discovered features, yielding the probability distribution of the combined data. The results demonstrated the significance of the clusters acquired by our model. These clusters were used to discover medicines that were remarkably similar to the medications proposed to treat COVID-19. Moreover, the chemical structures of some small molecules, as well as the effects of dysregulated genes, led us to suggest using these molecules to treat COVID-19. The results also showed that the proposed method might prove useful in detecting highly promising remedies for COVID-19 with minimal side effects. All the source codes are accessible using

Hooshmand Seyed Aghil, Zarei Ghobadi Mohadeseh, Hooshmand Seyyed Emad, Azimzadeh Jamalkandi Sadegh, Alavi Seyed Mehdi, Masoudi-Nejad Ali


COVID-19, Deep learning, Drug repurposing, Multimodal data fusion, Restricted Boltzmann machine
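
The architecture described in the MM-RBM abstract above, one RBM per modality followed by an RBM that joins their features, can be sketched as a forward pass. This is an illustration under stated assumptions and not the authors' code: layer sizes, the random untrained weights, and the toy binary inputs are hypothetical, and training (for example by contrastive divergence) is omitted.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]  # weights[i][j] connects visible unit i to hidden unit j


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def hidden_probs(visible: List[float], weights: Matrix, hidden_bias: List[float]) -> List[float]:
    """RBM conditional: p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * W[i][j])."""
    return [
        sigmoid(b + sum(v * row[j] for v, row in zip(visible, weights)))
        for j, b in enumerate(hidden_bias)
    ]


def random_rbm(n_visible: int, n_hidden: int, rng: random.Random) -> Tuple[Matrix, List[float]]:
    """Untrained placeholder parameters (small Gaussian weights, zero biases)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
    return weights, [0.0] * n_hidden


rng = random.Random(0)
# Hypothetical binary inputs: a structure fingerprint and a gene-expression signature.
structure = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
expression = [0.0, 1.0, 1.0, 0.0]

w_s, b_s = random_rbm(len(structure), 3, rng)   # modality-specific RBM 1
w_e, b_e = random_rbm(len(expression), 3, rng)  # modality-specific RBM 2
w_j, b_j = random_rbm(3 + 3, 4, rng)            # joint RBM over both feature sets

features = hidden_probs(structure, w_s, b_s) + hidden_probs(expression, w_e, b_e)
joint = hidden_probs(features, w_j, b_j)  # shared representation of both modalities
print([round(p, 3) for p in joint])
```

In a trained model, the joint representation is what would be clustered to compare small molecules across both modalities.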