
General

Prediction of Protein Solubility Based on Sequence Feature Fusion and DDcCNN.

In Interdisciplinary sciences, computational life sciences

BACKGROUND : Predicting protein solubility is an indispensable prerequisite for pharmaceutical research and production. This work aims to design a new model for predicting protein solubility, using protein sequence feature fusion and a deep dual-channel convolutional neural network (DDcCNN), to improve on the performance of existing prediction models.

METHODS : Redundancy in the raw protein sequences is reduced with CD-HIT. Four subsequences are built from each protein sequence: one global and three local. The global subsequence is the entire protein sequence, and the local subsequences are obtained by moving a sliding window according to a set of rules. G-gap features are extracted from these four subsequences and assembled into a mixed matrix, which is the input to one channel consisting of three convolutional layers. Additional features are extracted with the SCRATCH tool as the input to the other channel, which consists of a single convolutional layer, in order to find hidden relationships and improve the accuracy of the predictor. The outputs of the two parallel channels are concatenated as the input to the hidden layer, and the solubility prediction is produced by the output layer. The best protein solubility prediction model is selected through comparative experiments on different frameworks.
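To make the feature-extraction step concrete, here is a minimal sketch of g-gap dipeptide features and of cutting local subsequences with a sliding window. It is an illustration under assumptions, not the authors' implementation: the abstract does not give the window rules, so the fixed `width` and `step` parameters, the function names, and the example sequence are all hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def g_gap_features(seq, g):
    """Frequencies of residue pairs separated by g intervening positions.

    g = 0 gives ordinary adjacent-dipeptide composition.
    """
    counts = dict.fromkeys(PAIRS, 0)
    span = g + 1
    total = 0
    for i in range(len(seq) - span):
        pair = seq[i] + seq[i + span]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def sliding_windows(seq, width, step):
    """Hypothetical window rule: fixed-width windows taken every `step` residues."""
    last_start = max(len(seq) - width, 0)
    return [seq[i:i + width] for i in range(0, last_start + 1, step)]


# Toy sequence (made up for illustration only).
example = "MKVLAAGIVK"
global_features = g_gap_features(example, g=0)
local_features = [g_gap_features(w, g=1) for w in sliding_windows(example, 6, 2)]
```

Each call returns a 400-dimensional vector, so stacking the vectors from the global and local subsequences gives one possible form of the mixed matrix described above.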

RESULTS : The DDcCNN model designed in this work achieves an accuracy of 77.82%, a Matthews correlation coefficient of 0.57, a sensitivity of 76.13% and a specificity of 79.32%. Comparative experiments show that the overall performance of the DDcCNN model is better than that of existing models (GCNN, LCNN and PCNN). The related models and data are publicly deposited at .

CONCLUSION : The satisfactory performance of the DDcCNN model shows that these features and flexible computational methodologies can strengthen existing models for predicting protein solubility. The model could be applied in several ways, such as preselecting initial targets that are soluble or altering the solubility of target proteins, and can thus help to reduce production cost.

Wang Xianfang, Liu Yifeng, Du Zhiyong, Zhu Mingdong, Kaushik Aman Chandra, Jiang Xue, Wei Dongqing


Deep dual-channel convolutional neural network, Deep learning, Feature fusion, Protein solubility

Oncology

DeepHBV: a deep learning model to predict hepatitis B virus (HBV) integration sites.

In BMC ecology and evolution

BACKGROUND : The hepatitis B virus (HBV) is one of the main causes of viral hepatitis and liver cancer. HBV integration is one of the key steps in the virus-promoted malignant transformation.

RESULTS : An attention-based deep learning model, DeepHBV, was developed to predict HBV integration sites. DeepHBV learns local genomic features automatically and was trained and tested using HBV integration site data from the dsVIS database. Initially, DeepHBV showed an AUROC of 0.6363 and an AUPR of 0.5471 for the dataset. Adding genomic features from repeat peaks and from TCGA Pan-Cancer peaks significantly improved model performance, with AUROCs of 0.8378 and 0.9430 and AUPRs of 0.7535 and 0.9310, respectively. The transcription factor binding sites (TFBS) were significantly enriched near the genomic positions that were considered. The binding sites of the AR-halfsite, Arnt, Atf1, bHLHE40, bHLHE41, BMAL1, CLOCK, c-Myc, COUP-TFII, E2A, EBF1, Erra, and Foxo3 were highlighted by DeepHBV in both the dsVIS and VISDB datasets, revealing a novel integration preference for HBV.
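As a reminder of what the two reported scores measure, the sketch below computes AUROC as the probability that a random positive example outscores a random negative one, and estimates AUPR with average precision, which is one common estimator. The labels and scores are toy values, the function names are hypothetical, and nothing here reflects how DeepHBV itself was evaluated.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair. Requires at least one example per class.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy data (hypothetical): 1 = integration site, 0 = background.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)
toy_aupr = average_precision(toy_labels, toy_scores)
```

The pairwise AUROC loop is quadratic in the number of examples, which is fine for a demonstration but too slow for genome-scale evaluation.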

CONCLUSIONS : DeepHBV is a useful tool for predicting HBV integration sites, revealing novel insights into HBV integration-related carcinogenesis.

Wu Canbiao, Guo Xiaofang, Li Mengyuan, Shen Jingxian, Fu Xiayu, Xie Qingyu, Hou Zeliang, Zhai Manman, Qiu Xiaofan, Cui Zifeng, Xie Hongxian, Qin Pengmin, Weng Xuchu, Hu Zheng, Liang Jiuxing


Bioinformatics, Deep learning, Genomic features, HBV integration sites

Public Health

Identification of Variable Importance for Predictions of Mortality From COVID-19 Using AI Models for Ontario, Canada.

In Frontiers in public health

The Severe Acute Respiratory Syndrome Coronavirus 2 pandemic has pushed medical systems around the globe to the brink of collapse. In this paper, logistic regression and three other artificial intelligence models (XGBoost, Artificial Neural Network and Random Forest) are described and used to predict the mortality risk of individual patients. The database is based on census data for the designated area and on co-morbidities obtained from the Ontario Health Data Platform. The dataset consists of more than 280,000 COVID-19 cases in Ontario across a wide range of age groups: 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, and 90+. Logistic regression, XGBoost, Artificial Neural Network and Random Forest all demonstrate excellent discrimination (the area under the curve exceeded 0.948 for all models, with the best performance being 0.956 for an XGBoost model). Based on SHapley Additive exPlanations values, the importance of 24 variables is identified; the most important variables are, in order of importance, age, date of test, sex, and presence/absence of chronic dementia. The findings of this study allow the identification of out-patients who are likely to deteriorate into severe cases, allowing medical professionals to make timely treatment decisions. Furthermore, the methodology and results may be extended to other public health regions.
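SHapley Additive exPlanations values are built on Shapley values from cooperative game theory. The sketch below computes exact Shapley values for a tiny, made-up risk function by averaging each variable's marginal contribution over all orderings. The function `toy_risk` and its numbers are hypothetical and are not derived from the Ontario data; practical SHAP tooling approximates this calculation for real models.

```python
from itertools import permutations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values: average marginal contribution over all orderings."""
    phi = dict.fromkeys(features, 0.0)
    for order in permutations(features):
        present = set()
        prev = value_fn(frozenset(present))
        for f in order:
            present.add(f)
            cur = value_fn(frozenset(present))
            phi[f] += cur - prev
            prev = cur
    n_orders = factorial(len(features))
    return {f: total / n_orders for f, total in phi.items()}


def toy_risk(present):
    """Hypothetical risk score given which variables are 'known' to the model."""
    score = 0.0
    if "age" in present:
        score += 0.30
    if "sex" in present:
        score += 0.05
    if "dementia" in present:
        score += 0.10
    if "age" in present and "dementia" in present:
        score += 0.04  # made-up interaction term
    return score


importance = shapley_values(toy_risk, ["age", "sex", "dementia"])
```

The interaction term is split evenly between the two variables involved, and the values sum to the difference between the full score and the empty score, which is the additivity property that makes such rankings interpretable.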

Snider Brett, McBean Edward A, Yawney John, Gadsden S Andrew, Patel Bhumi


COVID-19, SHapley, XGBoost, artificial intelligence, mortality

Radiology

Application of machine learning analysis based on diffusion tensor imaging to identify REM sleep behavior disorder.

In Sleep & breathing = Schlaf & Atmung

PURPOSE : We evaluated the feasibility of machine learning analysis using diffusion tensor imaging (DTI) parameters to identify patients with idiopathic rapid eye movement (REM) sleep behavior disorder (RBD). We hypothesized that patients with idiopathic RBD could be identified via machine learning analysis based on DTI.

METHODS : We enrolled 20 patients with newly diagnosed idiopathic RBD at a tertiary hospital. We also included 20 healthy subjects as a control group. All of the subjects underwent DTI. We obtained the conventional DTI parameters and structural connectomic profiles from the DTI. We investigated the differences in conventional DTI measures and structural connectomic profiles between patients with idiopathic RBD and healthy controls. We then applied a support vector machine (SVM) algorithm to identify patients with idiopathic RBD from the conventional DTI measures and the structural connectomic profiles.

RESULTS : Several regions showed significant differences in conventional DTI measures and structural connectomic profiles between patients with idiopathic RBD and healthy controls. The SVM classifier based on conventional DTI measures achieved an accuracy of 87.5% and an area under the curve of 0.900 in identifying patients with idiopathic RBD. Another SVM classifier based on structural connectomic profiles yielded an accuracy of 75.0% and an area under the curve of 0.833.

CONCLUSION : Our findings demonstrate the feasibility of machine learning analysis based on DTI to identify patients with idiopathic RBD. The conventional DTI parameters might be more important than the structural connectomic profiles in identifying patients with idiopathic RBD.

Lee Dong Ah, Lee Ho-Joon, Kim Hyung Chan, Park Kang Min


Diffusion tensor imaging, Machine learning, REM sleep

Radiology

Automated Vertebral Segmentation and Measurement of Vertebral Compression Ratio Based on Deep Learning in X-Ray Images.

In Journal of digital imaging

Vertebral compression fracture is a deformity of the vertebral bodies found on lateral spine images. Diagnosing it requires accurate measurement of the vertebral compression ratio (VCR), so rapid and accurate segmentation of the vertebrae is important. In this study, we used 339 lateral thoracic and lumbar vertebra images to train and test a deep learning model for segmentation. The model's segmentation was compared with manual measurement performed by a specialist. On this dataset, the average sensitivity was 0.937, specificity 0.995, accuracy 0.992, Dice similarity coefficient 0.929, area under the receiver operating characteristic curve 0.987, and area under the precision-recall curve 0.916. Correlation analysis showed no statistical difference between the manually measured vertebral compression ratio and the ratio computed from the model's segmentation, with a correlation coefficient of 0.929. In addition, the Bland-Altman plot shows good agreement, with VCR values falling within the mean ± 1.96 standard deviations. In conclusion, vertebra segmentation based on deep learning is expected to be helpful for the measurement of vertebral compression ratio.
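Two of the agreement measures mentioned here are easy to state in code. The sketch below computes a Dice similarity coefficient between two binary masks and the Bland-Altman bias with its 95% limits of agreement (mean difference ± 1.96 standard deviations). The masks and ratio values are toy data, and the function names are hypothetical; this is not the study's pipeline.

```python
from statistics import mean, stdev


def dice_coefficient(mask_a, mask_b):
    """Dice = 2 * |A and B| / (|A| + |B|) for two same-shaped binary masks."""
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    overlap = sum(x and y for x, y in zip(a, b))
    size = sum(a) + sum(b)
    return 2 * overlap / size if size else 1.0


def bland_altman_limits(method_1, method_2):
    """Return (lower limit, bias, upper limit) of agreement for paired readings."""
    diffs = [x - y for x, y in zip(method_1, method_2)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias - 1.96 * sd, bias, bias + 1.96 * sd


# Toy masks: 1 = vertebra pixel, 0 = background (hypothetical values).
manual_mask = [[1, 1, 0], [0, 1, 0]]
model_mask = [[1, 0, 0], [0, 1, 1]]
toy_dice = dice_coefficient(manual_mask, model_mask)

# Toy compression ratios from manual and automated measurement (hypothetical).
manual_vcr = [0.80, 0.75, 0.90, 0.60]
model_vcr = [0.78, 0.77, 0.88, 0.61]
lower, bias, upper = bland_altman_limits(manual_vcr, model_vcr)
```

A bias near zero with narrow limits indicates that the two measurement methods can be used interchangeably for practical purposes.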

Kim Dong Hyun, Jeong Jin Gyo, Kim Young Jae, Kim Kwang Gi, Jeon Ji Young


Deep learning, Segmentation, Vertebral compression fracture, Vertebral compression ratio

Ophthalmology

Validation and Clinical Applicability of Whole-Volume Automated Segmentation of Optical Coherence Tomography in Retinal Disease Using Deep Learning.

In JAMA ophthalmology ; h5-index 58.0

Importance : Quantitative volumetric measures of retinal disease in optical coherence tomography (OCT) scans are infeasible to perform owing to the time required for manual grading. Expert-level deep learning systems for automatic OCT segmentation have recently been developed. However, the potential clinical applicability of these systems is largely unknown.

Objective : To evaluate a deep learning model for whole-volume segmentation of 4 clinically important pathological features and assess clinical applicability.

Design, Setting, Participants : This diagnostic study used OCT data from 173 patients with a total of 15 558 B-scans, treated at Moorfields Eye Hospital. The data set included 2 common OCT devices and 2 macular conditions: wet age-related macular degeneration (107 scans) and diabetic macular edema (66 scans), covering the full range of severity, and from 3 points during treatment. Two expert graders performed pixel-level segmentations of intraretinal fluid, subretinal fluid, subretinal hyperreflective material, and pigment epithelial detachment, including all B-scans in each OCT volume, taking as long as 50 hours per scan. Quantitative evaluation of whole-volume model segmentations was performed. Qualitative evaluation of clinical applicability by 3 retinal experts was also conducted. Data were collected from June 1, 2012, to January 31, 2017, for set 1 and from January 1 to December 31, 2017, for set 2; graded between November 2018 and January 2020; and analyzed from February 2020 to November 2020.

Main Outcomes and Measures : Rating and stack ranking for clinical applicability by retinal specialists, model-grader agreement for voxelwise segmentations, and total volume evaluated using Dice similarity coefficients, Bland-Altman plots, and intraclass correlation coefficients.

Results : Among the 173 patients included in the analysis (92 [53%] women), qualitative assessment found that automated whole-volume segmentation ranked better than or comparable to at least 1 expert grader in 127 scans (73%; 95% CI, 66%-79%). A neutral or positive rating was given to 135 model segmentations (78%; 95% CI, 71%-84%) and 309 expert gradings (2 per scan) (89%; 95% CI, 86%-92%). The model was rated neutrally or positively in 86% to 92% of diabetic macular edema scans and 53% to 87% of age-related macular degeneration scans. Intraclass correlations ranged from 0.33 (95% CI, 0.08-0.96) to 0.96 (95% CI, 0.90-0.99). Dice similarity coefficients ranged from 0.43 (95% CI, 0.29-0.66) to 0.78 (95% CI, 0.57-0.85).
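The abstract does not say how its 95% confidence intervals for proportions were computed. As an assumption, the sketch below uses the Wilson score interval, one common method, with z = 1.96. Applied to 127/173, 135/173 and 309/346 (the last denominator is inferred from 2 gradings for each of 173 scans), it gives intervals that round to the reported 66%-79%, 71%-84% and 86%-92%.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts taken from the Results paragraph; 346 = 2 gradings x 173 scans (inferred).
for successes, n in [(127, 173), (135, 173), (309, 346)]:
    low, high = wilson_interval(successes, n)
    print(f"{successes}/{n}: {successes / n:.0%} ({low:.0%}-{high:.0%})")
```

Agreement after rounding is suggestive but does not prove which method the authors used, since other interval methods can give similar values at these sample sizes.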

Conclusions and Relevance : This deep learning-based segmentation tool provided clinically useful measures of retinal disease that would otherwise be infeasible to obtain. Qualitative evaluation was additionally important to reveal clinical applicability for both care management and research.

Wilson Marc, Chopra Reena, Wilson Megan Z, Cooper Charlotte, MacWilliams Patricia, Liu Yun, Wulczyn Ellery, Florea Daniela, Hughes Cían O, Karthikesalingam Alan, Khalid Hagar, Vermeirsch Sandra, Nicholson Luke, Keane Pearse A, Balaskas Konstantinos, Kelly Christopher J