Doctor Penguin

Receive a weekly summary and discussion of the top papers of the week by leading researchers in the field.

General

General

Machine learning-based automated sponge cytology for screening of oesophageal squamous cell carcinoma and adenocarcinoma of the oesophagogastric junction: a nationwide, multicohort, prospective study.

In The lancet. Gastroenterology & hepatology

BACKGROUND : Oesophageal squamous cell carcinoma and adenocarcinoma of the oesophagogastric junction have a dismal prognosis, and early detection is key to reduce mortality. However, early detection depends on upper gastrointestinal endoscopy, which is not feasible to implement at a population level. We aimed to develop and validate a fully automated machine learning-based prediction tool integrating a minimally invasive sponge cytology test and epidemiological risk factors for screening of oesophageal squamous cell carcinoma and adenocarcinoma of the oesophagogastric junction before endoscopy.

METHODS : For this multicohort prospective study, we enrolled participants aged 40-75 years undergoing upper gastrointestinal endoscopy screening at 39 tertiary or secondary hospitals in China for model training and testing, and included community-based screening participants for further validation. All participants underwent questionnaire surveys, sponge cytology testing, and endoscopy in a sequential manner. We trained machine learning models to predict a composite outcome of high-grade lesions, defined as histology-confirmed high-grade intraepithelial neoplasia and carcinoma of the oesophagus and oesophagogastric junction. The predictive features included 105 cytological and 15 epidemiological features. Model performance was primarily measured with the area under the receiver operating characteristic curve (AUROC) and average precision. The performance measures for cytologists with AI assistance was also assessed.

FINDINGS : Between Jan 1, 2021, and June 30, 2022, 17 498 eligible participants were involved in model training and validation. In the testing set, the AUROC of the final model was 0·960 (95% CI 0·937 to 0·977) and the average precision was 0·482 (0·470 to 0·494). The model achieved similar performance to consensus of cytologists with AI assistance (AUROC 0·955 [95% CI 0·933 to 0·975]; p=0·749; difference 0·005, 95% CI, -0·011 to 0·020). If the model-defined moderate-risk and high-risk groups were referred for endoscopy, the sensitivity was 94·5% (95% CI 88·8 to 97·5), specificity was 91·9% (91·2 to 92·5), and the predictive positive value was 18·4% (15·6 to 21·6), and 90·3% of endoscopies could be avoided. Further validation in community-based screening showed that the AUROC of the model was 0·964 (95% CI 0·920 to 0·990), and 92·8% of endoscopies could be avoided after risk stratification.

INTERPRETATION : We developed a prediction tool with favourable performance for screening of oesophageal squamous cell carcinoma and adenocarcinoma of the oesophagogastric junction. This approach could prevent the need for endoscopy screening in many low-risk individuals and ensure resource optimisation by prioritising high-risk individuals.

FUNDING : Science and Technology Commission of Shanghai Municipality.

Gao Ye, Xin Lei, Lin Han, Yao Bin, Zhang Tao, Zhou Ai-Jun, Huang Shu, Wang Jian-Hua, Feng Ya-Dong, Yao Sheng-Hua, Guo Yan, Dang Tong, Meng Xian-Mei, Yang Zeng-Zhou, Jia Wan-Qi, Pang Hui-Fang, Tian Xiao-Juan, Deng Bin, Wang Jun-Ping, Fan Wen-Chuan, Wang Jun, Shi Li-Hong, Yang Guan-Yu, Sun Chang, Wang Wei, Zang Jun-Cai, Li Song-Yang, Shi Rui-Hua, Li Zhao-Shen, Wang Luo-Wei

2023-Mar-14

General

General

A gated temporal-separable attention network for EEG-based depression recognition.

In Computers in biology and medicine
Depression, a common mental illness worldwide, needs to be diagnosed and cured at an early stage. To assist clinical diagnosis, an EEG-based deep learning frame, which is named the gated temporal-separable attention network (GTSAN), is proposed in this paper for depression recognition. GTSAN model extracts discriminative information from EEG recordings in two ways. On the one hand, the gated recurrent unit (GRU) is used in the GTSAN model to capture the EEG historical information to form the features. On the other hand, the model digs the multilevel information by using an improved version of temporal convolutional network (TCN), called temporal-separable convolution network (TSCN), which applies causal convolution and dilated convolution to extract features from fine to coarse scales. The TSCN and GRU features can be produced in parallel. Finally, the new model introduces the attention mechanism to give different weights to these features, allowing them to be used to identify depression more effectively. Experiments on two depression datasets have demonstrated that the proposed model can mine potential depression patterns in data and obtain high recognition accuracies. The proposed model provides the possibility of using an EEG-based system to assist for diagnosing depression.
Yang Lijun, Wang Yixin, Zhu Xiangru, Yang Xiaohui, Zheng Chen

2023-Mar-11

Attention mechanism, Depression recognition, Electroencephalography (EEG), Gated recurrent unit, Temporal convolution network

Ophthalmology

Ophthalmology

A new ultra-wide-field fundus dataset to diabetic retinopathy grading using hybrid preprocessing methods.

In Computers in biology and medicine
Diabetic retinopathy(DR) is a common early diabetic complication and one of the main causes of blindness. In clinical diagnosis and treatment, regular screening with fundus imaging is an effective way to prevent the development of DR. However, the regular fundus images used in most DR screening work have a small imaging range, narrow field of vision, and can not contain more complete lesion information, which leads to less ideal automatic DR grading results. In order to improve the accuracy of DR grading, we establish a dataset containing 101 ultra-wide-field(UWF) DR fundus images and propose a deep learning(DL) automatic classification method based on a new preprocessing method. The emerging UWF fundus images have the advantages of a large imaging range and wide field of vision and contain more information about the lesions. In data preprocessing, we design a data denoising method for UWF images and use data enhancement methods to improve their contrast and brightness to improve the classification effect. In order to verify the efficiency of our dataset and the effectiveness of our preprocessing method, we design a series of experiments including a variety of DL classification models. The experimental results show that we can achieve high classification accuracy by using only the backbone model. The most basic ResNet50 model reaches an average of classification accuracy(ACA) 0.66, Macro F1 0.6559, and Kappa 0.58. The best-performing Swin-S model reaches ACA 0.72, Macro F1 0.7018, and Kappa 0.65. DR grading using UWF images can achieve higher accuracy and efficiency, which has practical significance and value in clinical applications.
Liu Haomiao, Teng Lu, Fan Linhua, Sun Yabin, Li Huiying

2023-Mar-08

Convolutional neural network, Deep learning, Diabetic retinopathy grading, Ultra-wide-field, Vision transformer

General

General

Identification of medicinal plant-based phytochemicals as a potential inhibitor for SARS-CoV-2 main protease (Mpro) using molecular docking and deep learning methods.

In Computers in biology and medicine
Highly transmissive and rapidly evolving Coronavirus disease-2019 (COVID-19), a viral disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), triggered a global pandemic, which is one of the most researched viruses in the academia. Effective drugs to treat people with COVID-19 have yet to be developed to reduce mortality and transmission. Studies on the SARS-CoV-2 virus identified that its main protease (M^pro) might be a potential therapeutic target for drug development, as this enzyme plays a key role in viral replication. In search of potential inhibitors of M^pro, we developed a phytochemical library consisting of 2431 phytochemicals from 104 Korean medicinal plants that exhibited medicinal and antioxidant properties. The library was screened by molecular docking, followed by revalidation by re-screening with a deep learning method. Recurrent Neural Networks (RNN) computing system was used to develop an inhibitory predictive model using SARS coronavirus M^pro dataset. It was deployed to screen the top 12 compounds based on their docked binding affinity that ranged from -8.0 to -8.9 kcal/mol. The top two lead compounds, Catechin gallate and Quercetin 3-O-malonylglucoside, were selected depending on inhibitory potency against M^pro. Interactions with the target protein active sites, including His41, Met49, Cys145, Met165, and Thr190 were also examined. Molecular dynamics simulation was performed to analyze root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (RG), solvent accessible surface area (SASA), and number of hydrogen bonds. Results confirmed the inflexible nature of the docked complexes. Absorption, distribution, metabolism, excretion, and toxicity (ADMET), as well as bioactivity prediction confirmed the pharmaceutical activities of the lead compound. Findings of this research might help scientists to optimize compatible drugs for the treatment of COVID-19 patients.
Hossain Alomgir, Rahman Md Ekhtiar, Rahman Md Siddiqur, Nasirujjaman Khondokar, Matin Mohammad Nurul, Faruqe Md Omar, Rabbee Muhammad Fazle

2023-Mar-11

Catechin gallate, Deep learning, Main protease, Molecular docking, SARS-CoV-2

General

General

Upscaling dispersivity for conservative solute transport in naturally fractured media.

In Water research
Physical heterogeneities are prevalent features of fracture systems and significantly impact transport processes in aquifers across different spatiotemporal scales. Upscaling solute transport parameter is an effective way of quantifying parameter variability in heterogeneous aquifers including fractured media. This paper develops conceptual models for upscaling conservative transport parameters in fracture media. The focus is on upscaling dispersivity. Lagrangian-based transport model (LBTM) for dispersivity upscaling are derived for the solute transport in two-dimensional fractures surrounded by an impermeable matrix. The LBTM is validated against the random walk particle tracking (RWPT) model, which enables highly efficient and accurate predictions of conservative solute transport. The results show that the derived scale-dependent analytical expressions are in excellent agreement with RWPT model results. In addition, LBTM results are also compared to experimental results from the observed breakthrough curve of a conservative solute transport through a single natural fracture within a granite core. Comparing results from the LBTM and transport experiment shows that LBTM based estimated dispersivity is 10.55% higher than the measured value. Errors introduced by the experiments, the conceptual assumptions in deriving models, and the heterogeneities of fracture apertures not fully sampled by measuring instruments are main factor for such discrepancy. The sensitivity analysis indicates that the longitudinal and transverse dispersivities are positively related to the integral scale and the variance of the log-fracture aperture. The longitudinal dispersivity is strongly contolled by the variance of the log-fracture aperture. The LBTM may be useful for directly predicting solute transports, requiring only the acquisition of fractured geostatistical data. This work provides a better understanding of transport processes in fractured media which ultimately control water quality across scales.
Jia Sida, Dai Zhenxue, Zhou Zhichao, Ling Hui, Yang Zhijie, Qi Linlin, Wang Zihao, Zhang Xiaoying, Thanh Hung Vo, Soltanian Mohamad Reza

2023-Mar-15

Dispersivity, Fracture, Lagrangian-based model, Random walk particle tracking, Sensitivity analysis, Transport experiment

General

General

Feature selection for driving style and skill clustering using naturalistic driving data and driving behavior questionnaire.

In Accident; analysis and prevention
Driver's driving style and driving skill have an essential influence on traffic safety, capacity, and efficiency. Through clustering algorithms, extensive studies explore the risk assessment, classification, and recognition of driving style and driving skill. This paper proposes a feature selection method for driving style and skill clustering. We create a supervised machine learning model of driver identification for driving behavior data with no ground truth labels on driving style and driving skill. The key features are selected based on permutation importance with the underlying assumption that the key features for clustering should also play an important role in characterizing individual drivers. The proposed method is tested on naturalistic driving data. We introduce 18 feature extraction methods and generate 72 feature candidates. We find five key features: longitudinal acceleration, frequency centroid of longitudinal acceleration, shape factor of lateral acceleration, root mean square of lateral acceleration, and standard deviation of speed. With the key features, drivers are clustered into three groups: novice, experienced cautious, and experienced reckless drivers. The ability of each feature to describe individuals' driving style and skill is evaluated using the Driving Behavior Questionnaire (DBQ). For each group, the driver's response to DBQ key questions and their distribution of key features are analyzed to prove the validity of the feature selection result. The feature selection method has the potential to understand driver's characteristics better and improve the accuracy of driving behavior modeling.
Chen Yao, Wang Ke, Lu Jian John

2023-Mar-15

Cluster analysis, Driving behavior, Driving behavior questionnaire, Driving skill, Driving style, Feature selection