
General General

Pilot Study Using Machine Learning to Identify Immune Profiles for the Prediction of Early Virological Relapse After Stopping Nucleos(t)ide Analogues in HBeAg-Negative CHB.

In Hepatology communications

Treatment with nucleos(t)ide analogues (NAs) may be stopped after 1-3 years of hepatitis B virus DNA suppression in hepatitis B e antigen (HBeAg)-negative patients according to the Asian Pacific Association for the Study of the Liver and European Association for the Study of the Liver guidelines. However, virological relapse (VR) occurs in most patients. We aimed to analyze soluble immune markers (SIMs) and to use machine learning to identify SIM combinations as predictors of early VR after NA discontinuation. A validation cohort was used to verify the predictive power of the SIM combination. In a post hoc analysis of a prospective, multicenter therapeutic vaccination trial (ABX-203, NCT02249988), hepatitis B surface antigen, hepatitis B core antigen, and 47 SIMs were repeatedly determined before NA was stopped. Forty-three HBeAg-negative patients were included. To detect the constellation of host and viral markers with the highest predictive value, a supervised machine learning approach was used. Data were validated in a different cohort of 49 patients treated with entecavir. VR (hepatitis B virus DNA ≥ 2,000 IU/mL) occurred in 27 patients. The predictive value for VR of single SIMs at the time of NA stop was best for interleukin (IL)-2, IL-17, and regulated on activation, normal T cell expressed and secreted (RANTES/CCL5), with a maximum area under the curve (AUC) of 0.65. Hepatitis B core antigen had a higher predictive power than hepatitis B surface antigen but lower than the SIMs. A supervised machine-learning algorithm allowed a remarkable improvement of early relapse prediction in patients treated with entecavir. The combination of IL-2, monokine induced by interferon γ (MIG)/chemokine (C-X-C motif) ligand 9 (CXCL9), RANTES/CCL5, stem cell factor (SCF), and TNF-related apoptosis-inducing ligand (TRAIL) was reliable in predicting VR (AUC 0.89; 95% confidence interval: 0.5-1.0) and showed viable results in the validation cohort (AUC 0.63; 95% confidence interval: 0.1-0.99). 
Host immune markers such as SIMs appear to be underestimated in guiding treatment cessation in HBeAg-negative patients. Machine learning can help find predictive SIM patterns that allow a precise identification of patients particularly suitable for NA cessation.
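The abstract reports single-marker and combined-marker performance as area under the ROC curve. As a minimal sketch, assuming a binary relapse label and standardized marker values, the code below computes AUC with the Mann-Whitney formulation (the probability that a relapsing patient scores higher than a non-relapsing one) and combines two markers with a plain logistic regression. The logistic regression is a stand-in chosen for illustration; the abstract does not say which supervised algorithm the authors used, and the marker values are invented, not study data.

```python
import math

def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive (label 1) scores
    higher than a randomly chosen negative (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear_score(w, b, x):
    return b + sum(wj * xj for wj, xj in zip(w, x))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean logistic loss; returns (weights, bias)."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, b, xi)) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical, made-up standardized marker values for eight patients;
# label 1 = virological relapse. These are NOT data from the study.
marker_a = [0.9, 0.0, 1.2, -0.3, 0.1, -0.8, 0.7, -0.5]
marker_b = [-0.2, 0.8, 0.5, -0.9, 0.6, -0.1, 0.3, -0.7]
relapse = [1, 1, 1, 0, 0, 0, 1, 0]

print("AUC marker_a:", roc_auc(marker_a, relapse))
print("AUC marker_b:", roc_auc(marker_b, relapse))

X = [[a, bb] for a, bb in zip(marker_a, marker_b)]
weights, bias = fit_logistic(X, relapse)
combined = [linear_score(weights, bias, x) for x in X]
# In-sample AUC is optimistic; an independent cohort is needed to check it.
print("AUC combination (in-sample):", roc_auc(combined, relapse))
```

Evaluating the combined score only on the patients it was fitted to tends to overstate performance, which is why a separate validation cohort matters.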

Wübbolding Maximilian, Lopez Alfonso Juan Carlos, Lin Chun-Yen, Binder Sebastian, Falk Christine, Debarry Jennifer, Gineste Paul, Kraft Anke R M, Chien Rong-Nan, Maasoumy Benjamin, Wedemeyer Heiner, Jeng Wen-Juei, Meyer Hermann Michael, Cornberg Markus, Höner Zu Siederdissen Christoph


General General

Datasets for recognition of aggressive interactions of children toward robotic toys.

In Data in brief

The data relate to unwanted interactions between a person and a small robotic toy, recorded by an acceleration sensor embedded within the toy. Three toys were considered, namely a stuffed panda, a stuffed robot, and an excavator. Each toy was embedded with an accelerometer to record the interactions. Five different unwanted interactions were performed by adult participants and children: hit, shake, throw, pickup, and drop; an additional idle class represents the no-interaction case. The collected data contain the magnitude of the resultant acceleration from the interactions. The data were processed by extracting the instances of interactions. A secondary dataset was created from the original one by generating artificial sequences. This data article contains the processed data, which can be used to explore different machine learning models and techniques for classifying such interactions. The online repository contains the files:
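The magnitude of the resultant acceleration and the extraction of interaction instances can be sketched as follows. This is a minimal illustration under assumptions: the threshold-and-gap segmentation rule, the parameter names threshold and min_gap, and the sample readings are hypothetical and are not taken from the data article, which does not describe its extraction procedure in this summary.

```python
import math

def magnitude(ax, ay, az):
    """Magnitude of the resultant acceleration, |a| = sqrt(ax^2 + ay^2 + az^2)."""
    return math.sqrt(ax * ax + ay * ay + az * az)

def extract_instances(signal, threshold, min_gap):
    """Return (start, end) index pairs, end exclusive, for runs where the signal
    exceeds `threshold`. Active samples separated by fewer than `min_gap`
    quiet samples are treated as one interaction instance.
    Both parameters are hypothetical tuning knobs, not values from the dataset."""
    segments = []
    start = last_active = None
    for i, value in enumerate(signal):
        if value <= threshold:
            continue
        if start is None:
            start = i
        elif i - last_active - 1 >= min_gap:
            # Enough quiet samples since the last activity: close the instance.
            segments.append((start, last_active + 1))
            start = i
        last_active = i
    if start is not None:
        segments.append((start, last_active + 1))
    return segments

# Hypothetical raw 3-axis readings (in g); a toy at rest reads about 1 g.
raw = [(0, 0, 1), (0, 0, 1), (2, 1, 2), (3, 2, 1), (0, 0, 1),
       (0, 0, 1), (0, 0, 1), (1, 3, 3), (0, 0, 1)]
mags = [magnitude(*r) for r in raw]
print(extract_instances(mags, threshold=1.5, min_gap=2))  # [(2, 4), (7, 8)]
```

Raising min_gap merges nearby bursts into a single instance, which is one way to keep, for example, a shake from being split into several events.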

Alhaddad Ahmad Yaser, Cabibihan John-John, Bonarini Andrea


Acceleration, Human-robot interaction, Safety, Social robots

General General

An early aortic dissection screening model and applied research based on ensemble learning.

In Annals of translational medicine

Background : As a particularly dangerous and rare cardiovascular disease, aortic dissection (AD) presents with complex and diverse symptoms and signs. In the early stage, the rates of misdiagnosis and missed diagnosis are relatively high. This study aimed to use machine learning technology to establish a fast and accurate screening model that requires only patients' routine examination data as input to produce predictions.

Methods : A retrospective analysis of the examination data and diagnosis results of 53,213 patients with cardiovascular disease was conducted. Among these samples, 802 had AD. Forty-two features were extracted from the patients' routine examination data to establish a prediction model. Five ensemble learning models were applied to explore the possibility of using machine learning methods to build screening models for AD: AdaBoost, XGBoost, SmoteBagging, EasyEnsemble, and XGBF. Of these, XGBF is an ensemble learning model that we propose to deal with the imbalance between positive and negative samples. Seven-fold cross-validation was used to analyze and verify the performance of each model. Because of the class imbalance, sensitivity and specificity were used as the evaluation indicators.
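With 802 AD cases among 53,213 patients (about 1.5% positives), a screener that always predicts "no AD" would be about 98.5% accurate while detecting no cases, which is why sensitivity and specificity are the natural indicators here. The sketch below, a minimal illustration rather than the authors' pipeline, shows that arithmetic and a stratified seven-fold split that spreads the rare positives evenly across folds. The abstract does not state whether the folds were stratified; stratification and the helper names are assumptions.

```python
import random

def stratified_folds(labels, k=7, seed=0):
    """Assign each sample a fold index in 0..k-1 so that every fold keeps
    roughly the overall ratio of positives to negatives.
    Stratification is an assumption; the abstract only says seven-fold CV."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[i] = position % k  # deal class members round-robin
    return folds

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Class sizes reported in the abstract: 802 AD cases among 53,213 patients.
labels = [1] * 802 + [0] * (53213 - 802)

# A degenerate "always negative" screener looks accurate but misses every case.
always_negative = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_negative)) / len(labels)
sens, spec = sensitivity_specificity(labels, always_negative)
print(f"accuracy={accuracy:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")

folds = stratified_folds(labels, k=7)
for f in range(7):
    n_pos = sum(1 for y, g in zip(labels, folds) if g == f and y == 1)
    print(f"fold {f}: {folds.count(f)} samples, {n_pos} AD cases")
```

Each fold ends up with 114 or 115 AD cases, so every held-out fold contains enough positives for sensitivity to be estimated.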

Results : Comparative experiments showed that the sensitivity of XGBF was 80.5%, higher than that of AdaBoost (16.1%), XGBoost (15.7%), SmoteBagging (78.0%), and EasyEnsemble (77.8%). Additionally, XGBF had relatively high specificity and a short training time. Based on these three indicators, XGBF performed best and met the application requirements, indicating that, with careful design, machine learning technology can be used to achieve early AD screening.

Conclusions : Through reasonable design, the ensemble learning method can be used to build an effective screening model. XGBF has high practical application value for AD screening.

Liu Lijue, Tan Shiyang, Li Yi, Luo Jingmin, Zhang Wei, Li Shihao


Aortic dissection (AD), early screening, ensemble learning, machine learning

General General

Classification of Cancer Types Using Graph Convolutional Neural Networks.

In Frontiers in physics

Background : Cancer has been a leading cause of death in the United States with significant health care costs. Accurate prediction of cancers at an early stage and understanding the genomic mechanisms that drive cancer development are vital to the improvement of treatment outcomes and survival rates, thus resulting in significant social and economic impacts. Attempts have been made to classify cancer types with machine learning techniques during the past two decades and deep learning approaches more recently.

Results : In this paper, we established four graph convolutional neural network (GCNN) models that use unstructured gene expression data as input to classify tumor and non-tumor samples into one of 33 cancer types or as normal. The four GCNN models, based on a co-expression graph, a co-expression+singleton graph, a protein-protein interaction (PPI) graph, and a PPI+singleton graph, were designed and implemented. They were trained and tested on a combined set of 10,340 cancer samples and 731 normal tissue samples from The Cancer Genome Atlas (TCGA) dataset. The established GCNN models achieved excellent prediction accuracies (89.9-94.7%) among 34 classes (33 cancer types and a normal group). In silico gene-perturbation experiments were performed on all four models. The co-expression GCNN model was further interpreted to identify a total of 428 marker genes that drive the classification of the 33 cancer types and normal. The concordance of differential expression of these markers between the represented cancer type and the others was confirmed. Successful classification of cancer types and a normal group regardless of the normal tissues' origin suggested that the identified markers are cancer-specific rather than tissue-specific.
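To make the graph-based input concrete, here is a minimal sketch, assuming a co-expression graph built by thresholding the absolute Pearson correlation between genes and the widely used Kipf-Welling propagation rule relu(D^-1/2 (A + I) D^-1/2 H W). The threshold of 0.6, the toy expression values, and the layer form are illustrative assumptions; the paper's graph construction and convolution layer may differ. Each sample is treated as a signal on the gene graph, with one value per node.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def coexpression_adjacency(expr, threshold=0.6):
    """expr[g] holds gene g's expression across samples. Genes are linked when
    |Pearson r| exceeds `threshold` (an illustrative value); no self-loops."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(expr[i], expr[j])) > threshold:
                adj[i][j] = adj[j][i] = 1.0
    return adj

def gcn_layer(adj, features, weights):
    """One graph-convolution step, relu(D^-1/2 (A + I) D^-1/2 H W).
    features: n_nodes x f_in (H); weights: f_in x f_out (W)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 because of the self-loop
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    f_in, f_out = len(weights), len(weights[0])
    # Mix each gene's features with those of its co-expressed neighbours.
    agg = [[sum(norm[i][k] * features[k][c] for k in range(n)) for c in range(f_in)]
           for i in range(n)]
    # Project to f_out channels and apply ReLU.
    return [[max(0.0, sum(agg[i][c] * weights[c][o] for c in range(f_in)))
             for o in range(f_out)] for i in range(n)]

# Hypothetical expression of 4 genes across 5 training samples.
expr = [[1, 2, 3, 4, 5],
        [2, 4, 6, 8, 10],
        [5, 4, 3, 2, 1],
        [1, 3, 2, 5, 1]]
adj = coexpression_adjacency(expr)
# One new sample = one value per gene (graph node), i.e. a 4 x 1 feature matrix.
sample = [[0.5], [1.0], [0.2], [0.8]]
hidden = gcn_layer(adj, sample, weights=[[1.0, -1.0]])
print(adj)
print(hidden)
```

In this toy, the first three genes are strongly (positively or negatively) correlated and share information, while the fourth gene is isolated and keeps its own value after normalization.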

Conclusion : Novel GCNN models have been established to predict cancer types or normal tissue based on gene expression profiles. Using the TCGA dataset, we demonstrated that these models can produce accurate classification (above 94%) using cancer-specific marker genes. The models and the source code are publicly available and can be readily adapted by the data-driven modeling research community to the diagnosis of cancer and other diseases.

Ramirez Ricardo, Chiu Yu-Chiao, Hererra Allen, Mostavi Milad, Ramirez Joshua, Chen Yidong, Huang Yufei, Jin Yu-Fang


Cancer classification, Data-driven model, Deep learning, Graph convolutional neural network, The Cancer Genome Atlas (TCGA)

General General

Determining Multi-Component Phase Diagrams with Desired Characteristics Using Active Learning.

In Advanced science (Weinheim, Baden-Wurttemberg, Germany)

Herein, we demonstrate how to predict and experimentally validate phase diagrams for multi-component systems from a high-dimensional virtual space of all possible phase diagrams involving several elements, based on a small amount of existing experimental data. The experimental data on bulk phases of known systems represent a sample from this space, and screening the space allows multi-component phase diagrams that meet given design criteria to be built. This approach uses machine learning methods to predict phase diagrams and Bayesian experimental design to minimize the experiments needed for refinement and validation, all within an active learning loop. The approach is demonstrated by predicting and synthesizing the ferroelectric ceramic system (1-ω)(Ba0.61Ca0.28Sr0.11TiO3)-ω(BaTi0.888Zr0.0616Sn0.0028Hf0.0476O3) with a relatively high transition temperature and triple point, as well as the NiTi-based pseudo-binary phase diagram (1-ω)(Ti0.309Ni0.485Hf0.20Zr0.006)-ω(Ti0.309Ni0.485Hf0.07Zr0.068Nb0.068) designed for high transition temperature (ω ⩽ 1). Each phase diagram is validated and optimized through only three new experiments. The complexity of these compounds is beyond the reach of today's computational methods.
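The active learning loop pairs a surrogate model with a rule for choosing the next experiment. A minimal sketch, assuming a one-dimensional composition variable ω on a grid and a zero-mean Gaussian-process surrogate with an upper-confidence-bound acquisition (one common Bayesian-optimization choice, not necessarily the authors'), looks like this. The kernel length scale, noise level, kappa, and the toy_measurement function that stands in for synthesis and characterization are all hypothetical.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit signal variance (length is assumed)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(K):
    """Lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L

def forward(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(L)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def backward(L, b):
    """Solve L^T x = b by back substitution."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, queries, noise=1e-4):
    """Zero-mean GP regression: posterior mean and standard deviation at queries."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward(L, forward(L, ys))  # K^-1 y
    means, stds = [], []
    for q in queries:
        k_star = [rbf(q, a) for a in xs]
        v = forward(L, k_star)
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        stds.append(math.sqrt(max(rbf(q, q) - sum(t * t for t in v), 0.0)))
    return means, stds

def next_experiment(xs, ys, candidates, kappa=2.0):
    """Upper-confidence-bound choice among compositions not yet measured."""
    untested = [c for c in candidates if c not in xs]
    means, stds = gp_posterior(xs, ys, untested)
    scores = [m + kappa * s for m, s in zip(means, stds)]
    return untested[max(range(len(untested)), key=scores.__getitem__)]

def toy_measurement(w):
    """Hypothetical stand-in for synthesizing a sample and measuring a
    normalized property (e.g. a transition temperature) at composition w."""
    return math.exp(-((w - 0.7) / 0.15) ** 2)

candidates = [i / 20 for i in range(21)]  # composition grid for omega in [0, 1]
xs = [0.0, 0.5, 1.0]                      # compositions already measured
ys = [toy_measurement(w) for w in xs]
for round_ in range(3):
    w = next_experiment(xs, ys, candidates)
    xs.append(w)
    ys.append(toy_measurement(w))
    print(f"round {round_ + 1}: measure omega={w:.2f}, property={ys[-1]:.3f}")
```

Each round picks the composition with the largest predicted value plus kappa times the predicted uncertainty, so the loop alternates between exploiting promising regions and exploring poorly sampled ones; a larger kappa favors exploration.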

Tian Yuan, Yuan Ruihao, Xue Dezhen, Zhou Yumei, Wang Yunfan, Ding Xiangdong, Sun Jun, Lookman Turab


Bayesian optimization, ferroelectrics, machine learning, materials informatics, multi‐component phase diagrams, shape memory alloys

General General

Unlabeled Far-Field Deeply Subwavelength Topological Microscopy (DSTM).

In Advanced science (Weinheim, Baden-Wurttemberg, Germany)

A nonintrusive far-field optical microscopy technique that resolves structures at the nanometer scale would revolutionize biomedicine and nanotechnology, but it is not yet available. Here, a new type of microscopy is introduced that reveals the fine structure of an object through its far-field scattering pattern under illumination with light containing deeply subwavelength singularity features. The object is reconstructed by a neural network trained on a large number of scattering events. In numerical experiments on imaging of a dimer, resolving powers better than λ/200, i.e., two orders of magnitude beyond the conventional "diffraction limit" of λ/2, are demonstrated. It is shown that imaging is tolerant to noise and is achievable with low-dynamic-range light intensity detectors. Proof-of-principle experimental confirmation of DSTM is provided with a small training set that is nonetheless sufficient to achieve resolution five-fold better than the diffraction limit. In principle, deep learning reconstruction can be extended to objects of random shape and should be particularly efficient in microscopy of a priori known shapes, such as those found in routine tasks of machine vision, smart manufacturing, and particle counting for life sciences applications.
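As a toy illustration of reconstructing an object parameter from its far-field scattering by comparison with simulated examples, the sketch below recovers the separation d of a point dimer. It deliberately replaces the paper's singularity-containing illumination with a plane wave and the neural network with a nearest-neighbour lookup over a simulated library, and it ignores noise entirely, so it should not be read as a model of DSTM. The scattering formula assumes two identical scalar point scatterers under normal incidence; all names are hypothetical.

```python
import math

def dimer_pattern(d, wavelength=1.0, n_angles=181):
    """Far-field intensity of two identical point scatterers a distance d apart,
    under normally incident plane-wave light: I(theta) = 2 + 2 cos(k d sin theta)."""
    k = 2 * math.pi / wavelength
    thetas = [math.radians(-90 + 180 * i / (n_angles - 1)) for i in range(n_angles)]
    return [2 + 2 * math.cos(k * d * math.sin(t)) for t in thetas]

def build_library(separations):
    """Simulated 'training set' of (separation, pattern) pairs."""
    return [(d, dimer_pattern(d)) for d in separations]

def reconstruct(pattern, library):
    """Return the separation whose simulated pattern is closest in squared error
    (a nearest-neighbour stand-in for a trained network)."""
    def error(entry):
        return sum((a - b) ** 2 for a, b in zip(pattern, entry[1]))
    return min(library, key=error)[0]

# Separations from 0 to lambda/2 in steps of lambda/400 (wavelength = 1).
library = build_library([i / 400 for i in range(201)])
measured = dimer_pattern(0.1)
print(reconstruct(measured, library))
```

Because the toy is noiseless, it recovers even a λ/200 separation; with realistic detector noise, the tiny differences between plane-wave patterns at such separations would likely be swamped, which is the regime the paper addresses with structured illumination and a trained network.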

Pu Tanchao, Ou Jun-Yu, Savinov Vassili, Yuan Guanghui, Papasimakis Nikitas, Zheludev Nikolay I


machine learning, microscopy, superoscillations, superresolution, unlabeled