
Radiology

Predicting the invasiveness of lung adenocarcinomas appearing as ground-glass nodule on CT scan using multi-task learning and deep radiomics.

In Translational lung cancer research ; h5-index 38.0

Background : Because the subtypes of lung adenocarcinoma that appear as ground-glass nodules (GGNs) on computed tomography (CT) differ in treatment and prognosis, it is important to distinguish invasive from non-invasive adenocarcinomas. The purpose of this paper is to build deep learning networks and evaluate their performance in differentiating the invasiveness of lung adenocarcinomas appearing as GGNs.

Methods : This retrospective study included 886 GGNs from 794 patients with pathologically confirmed lung adenocarcinoma for training and testing the proposed networks. Three deep learning networks were built: XimaNet (a deep learning-based classification model), XimaSharp (a classification and nodule segmentation model), and Deep-RadNet (a classification model combining deep learning and radiomics, i.e., deep radiomics). Model performance was evaluated on three classification tasks: task 1, AAH/AIS versus MIA; task 2, MIA versus IAC; and task 3, non-invasive versus invasive adenocarcinomas (AAH/AIS&MIA versus IAC). The Z-test was used to compare model performance.

Results : The AUCs for classifying AAH/AIS versus MIA were 0.891, 0.841 and 0.779 for Deep-RadNet, XimaNet and XimaSharp, respectively. The AUCs for classifying MIA versus IAC were 0.889, 0.785 and 0.778 for the three networks, and the AUCs for classifying AAH/AIS&MIA versus IAC were 0.941, 0.892 and 0.827, respectively. Deep-RadNet performed significantly better than the other two models according to the Z-test (P<0.05).

Conclusions : Deep-RadNet with the visual heat map could evaluate the invasiveness of GGNs accurately and intuitively, providing a theoretical basis for individualized and accurate medical treatment of patients with GGNs.

Wang Xiang, Li Qingchu, Cai Jiali, Wang Wei, Xu Peng, Zhang Yiqian, Fang Qu, Fu Chicheng, Fan Li, Xiao Yi, Liu Shiyuan


Deep learning, computed tomography (CT), ground glass opacity, pulmonary adenocarcinomas, radiomics, tumor invasiveness
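The AUC values reported in the Results above summarize how well each network ranks invasive nodules above non-invasive ones. As a minimal sketch of what that number measures (not the authors' evaluation code), the AUC can be computed as the fraction of positive/negative pairs in which the positive case receives the higher score. The function name `auc_from_scores` and the toy labels and scores are hypothetical.

```python
def auc_from_scores(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: iterable of 0/1 (1 = positive class, e.g. the more invasive subtype)
    scores: iterable of model outputs, higher meaning "more likely positive"
    Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two negatives, two positives.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_from_scores(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. Comparing two AUCs measured on the same test cases requires a test that accounts for their correlation; the abstract states only that a Z-test was used.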

Oncology

Mass spectrometry-based serum proteomic signature as a potential biomarker for survival in patients with non-small cell lung cancer receiving immunotherapy.

In Translational lung cancer research ; h5-index 38.0

Background : The VeriStrat test is a serum assay that uses a mass spectrometry (MS)-based proteomic signature derived from machine learning. It is currently used as a prognostic marker for patients with non-small cell lung cancer (NSCLC) receiving chemotherapy. However, little is known about its role in NSCLC patients receiving immune checkpoint inhibitors (ICIs).

Methods : This is a retrospective study that includes 47 patients with advanced stage NSCLC without an activating EGFR mutation, who underwent the VeriStrat test from 2016 to 2018. Spectra from blood samples were evaluated to assign patients into the VeriStrat 'Good' (VS-G) or VeriStrat 'Poor' (VS-P) risk group. The clinical outcomes of 32 patients who received programmed cell death 1 (PD-1) inhibitors nivolumab or pembrolizumab were analyzed by VeriStrat status.

Results : The VS-G group demonstrated significantly longer progression-free survival (PFS) and overall survival (OS) than the VS-P group among all NSCLC patients regardless of treatment (median PFS of 7.1 vs. 4.2 months, P=0.013, and median OS, not reached vs. 17.2 months, P=0.012). Among NSCLC patients treated with ICIs, VS-G classification was associated with significantly longer PFS than VS-P classification (median PFS of 6.2 vs. 3.0 months, P=0.012), while the difference in OS trended towards significance (median OS, not reached vs. 16.5 months, P=0.076). Multivariate analysis showed that VeriStrat status was significantly correlated with PFS and OS in NSCLC patients treated with ICIs (P=0.017 and P=0.034, respectively).

Conclusions : The MS-based serum proteomic signature has potential as a biomarker for survival outcome in NSCLC patients receiving immunotherapy.

Chae Young Kwang, Kim Won Bin, Davis Andrew A, Park Lee Chun, Anker Jonathan F, Simon Nicholas I, Rhee Kyunghoon, Song Junho, Cho Anderson, Chang Sangmin, Ko Taeyeong, Oh Michael, Bhave Manali, Viveiros Pedro


Non-small cell lung cancer (NSCLC), VeriStrat test, immunotherapy, programmed death-1 (PD-1), serum proteomic test

General

Examining the representativeness of a virtual reality environment for simulation of tennis performance.

In Journal of sports sciences ; h5-index 52.0

There has been a growing interest in using virtual reality (VR) for training perceptual-cognitive skill in sport. For VR training to effectively simulate real-world tennis performance, it must recreate the contextual information and movement behaviours present in the real-world environment. It is therefore critical to assess the representativeness of VR prior to implementing skill training interventions. We constructed a VR tennis environment designed for training perceptual-cognitive skill, with the aim of assessing its representativeness and validating its use. Participants' movement behaviours were compared when playing tennis in VR and real-world environments. When performing groundstrokes, participants frequently used the same stance in VR as they did in the real-world condition. Participants experienced a high sense of presence in VR, evident through the factors of spatial presence, engagement and ecological validity being high, with minimal negative effects found. We conclude that Tennis VR is sufficiently representative of real-world tennis. Our discussion focuses on the opportunity for training perceptual-cognitive skill and the potential for skill transfer.

Le Noury Peter, Buszard Tim, Reid Machar, Farrow Damian


Skill, artificial intelligence, interactive training, perception action coupling

General

Short-term forecasting of the coronavirus pandemic.

In International journal of forecasting

We have been publishing real-time forecasts of confirmed cases and deaths for COVID-19 from mid-March 2020 onwards. These forecasts are short-term statistical extrapolations of past and current data. They assume that the underlying trend is informative of short-term developments, without requiring other assumptions about how the SARS-CoV-2 virus is spreading, or whether preventative policies are effective. As such, they are complementary to forecasts from epidemiological models. The forecasts are based on extracting trends from windows of the data, applying machine learning, and then computing forecasts by applying some constraints to this flexible extracted trend. The methods have previously been applied to various other time series data and have performed well. They are also effective in this setting, providing better forecasts in the earlier stages than some epidemiological models.

Doornik Jurgen A, Castle Jennifer L, Hendry David F


Automatic forecasting, COVID-19, Epidemiology, Forecast averaging, Forecasting, Machine learning, Smoothing, Time series, Trend indicator saturation
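The abstract above describes forecasts as short-term extrapolations of a trend extracted from windows of the data. The sketch below illustrates only that general idea with a much simpler stand-in: an ordinary least-squares line fitted to the logarithm of the most recent observations and extended forward. It is not the authors' method, which uses machine learning and constraints on the extracted trend; the function name `window_trend_forecast`, the window length and the toy series are all hypothetical.

```python
import math


def window_trend_forecast(series, window, horizon):
    """Extrapolate a log-linear trend fitted to the last `window` observations.

    series:  positive cumulative counts, oldest first
    window:  number of most recent points used to estimate the trend
    horizon: number of future steps to forecast
    """
    if window < 2 or len(series) < window:
        raise ValueError("window must be >= 2 and no longer than the series")
    recent = series[-window:]
    if any(v <= 0 for v in recent):
        raise ValueError("log trend requires strictly positive values")

    # Least-squares fit of log(value) against time index 0..window-1.
    xs = list(range(window))
    ys = [math.log(v) for v in recent]
    x_mean = sum(xs) / window
    y_mean = sum(ys) / window
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Extend the fitted line beyond the last observed index (window - 1).
    return [
        math.exp(intercept + slope * (window - 1 + h))
        for h in range(1, horizon + 1)
    ]


# Toy series (hypothetical): exact 10% growth per step.
toy_series = [100 * 1.1 ** t for t in range(10)]
toy_forecast = window_trend_forecast(toy_series, window=5, horizon=2)
```

Because the fit uses only the latest window, the forecast adapts quickly when growth changes, at the cost of being sensitive to noise in those few points.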

General

Public Health Informatics: Proposing Causal Sequence of Death Using Neural Machine Translation

ArXiv Preprint

Each year there are nearly 57 million deaths around the world, with over 2.7 million in the United States. Timely, accurate and complete death reporting is critical in public health, as institutions and government agencies rely on death reports to analyze vital statistics and to formulate responses to communicable diseases. Inaccurate death reporting may misdirect public health policies. Determining the causes of death is, nevertheless, challenging even for experienced physicians. To help physicians report causes of death accurately, we present an advanced AI approach to determine a chronologically ordered sequence of clinical conditions that lead to death, based on the decedent's last hospital admission discharge record. The sequence of clinical codes on the death report is called the causal chain of death and is coded in the tenth revision of the International Statistical Classification of Diseases (ICD-10); the priority-ordered clinical conditions on the discharge record are coded in ICD-9. We identify three challenges in proposing the causal chain of death: two versions of the clinical coding system, medical domain knowledge conflict, and data interoperability. To overcome the first challenge in this sequence-to-sequence problem, we apply neural machine translation models to generate the target sequence. We evaluate the quality of the generated sequences with the BLEU (BiLingual Evaluation Understudy) score and achieve 16.44 out of 100. To address the second challenge, we incorporate expert-verified medical domain knowledge as a constraint when generating the output sequence, to exclude infeasible causal chains. Lastly, we demonstrate the usability of our work in a Fast Healthcare Interoperability Resources (FHIR) interface to address the third challenge.

Yuanda Zhu, Ying Sha, Hang Wu, Mai Li, Ryan A. Hoffman, May D. Wang
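The BLEU score mentioned in the abstract above compares a generated sequence with a reference by counting shared n-grams. The following is a minimal sketch of unsmoothed BLEU for one candidate and one reference, treating each clinical code as a token. The abstract does not say which BLEU variant or smoothing was used, so this is an illustration rather than the authors' evaluation; the function names and the example code sequences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU (0-100) for a single candidate/reference pair.

    Geometric mean of clipped n-gram precisions for n = 1..max_n,
    multiplied by a brevity penalty for candidates shorter than the reference.
    Returns 0.0 if any n-gram order has no match.
    """
    if not candidate:
        return 0.0

    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precision += math.log(clipped / total) / max_n

    c_len, r_len = len(candidate), len(reference)
    brevity = 1.0 if c_len > r_len else math.exp(1 - r_len / c_len)
    return 100.0 * brevity * math.exp(log_precision)


# Illustrative ICD-10 code sequences (hypothetical example chains).
reference_chain = ["A41.9", "J18.9", "I50.9", "I21.9", "N17.9"]
generated_chain = ["A41.9", "J18.9", "I50.9", "I21.9"]
score = bleu(generated_chain, reference_chain)
```

Here every n-gram in the generated chain also appears in the reference, so the score is reduced only by the brevity penalty for the missing final code.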


General

Simulating highly disturbed vegetation distribution: the case of China's Jing-Jin-Ji region.

In PeerJ

Background : Simulating vegetation distribution is an effective method for identifying vegetation distribution patterns and trends. The primary goal of this study was to determine the best simulation method for vegetation in an area that is heavily affected by human disturbance.

Methods : We used climate, topographic, and spectral data as the input variables for four machine learning models (random forest (RF), decision tree (DT), support vector machine (SVM), and maximum likelihood classification (MLC)) on three vegetation classification units (vegetation group (I), vegetation type (II), and formation and subformation (III)) in Jing-Jin-Ji, one of China's most developed regions. We used a total of 2,789 vegetation points for model training and 974 vegetation points for model assessment.

Results : Our results showed that the RF method was the best of the four models, as it could effectively simulate vegetation distribution in all three classification units. The DT method could only simulate vegetation distribution in units I and II, while the other two models could not simulate vegetation distribution in any of the units. Kappa coefficients indicated that the DT and RF methods had more accurate predictions for units I and II than for unit III. The three vegetation classification units were most affected by six variables: three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (mid-infrared ratio of winter vegetation index and brightness index of summer vegetation index). Variable combination 7, which included annual mean temperature, annual precipitation, mean diurnal range and precipitation of the driest month, produced the highest simulation accuracy.

Conclusions : We determined that the RF model was the most effective for simulating vegetation distribution in all classification units present in the Jing-Jin-Ji region. The RF model produced high-accuracy vegetation distributions in classification units I and II, but relatively low accuracy in classification unit III. Four climate variables were sufficient for vegetation distribution simulation in such a region.

Yi Sangui, Zhou Jihua, Lai Liming, Du Hui, Sun Qinglin, Yang Liu, Liu Xin, Liu Benben, Zheng Yuanrun


Important predictor variable, Jing-Jin-Ji region, Vegetation classification unit, Vegetation distribution model
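The kappa coefficients cited in the Results above measure how far a classifier's agreement with the reference labels exceeds what chance alone would give. As a minimal sketch, assuming the standard Cohen's kappa (the abstract does not specify the exact variant), it can be computed from paired reference and predicted labels. The function name `cohens_kappa` and the toy vegetation labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)

    # Fraction of points where prediction and reference agree.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n

    # Agreement expected if labels were assigned independently
    # with the same marginal class frequencies.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Toy assessment points (hypothetical labels).
toy_reference = ["forest", "forest", "grass", "grass"]
toy_predicted = ["forest", "grass", "grass", "grass"]
toy_kappa = cohens_kappa(toy_reference, toy_predicted)
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why it is often reported alongside raw accuracy when class frequencies are unbalanced.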