Doctor Penguin

General

In Situ Spatiotemporal SERS Measurements and Multivariate Analysis of Virally Infected Bacterial Biofilms Using Nanolaminated Plasmonic Crystals.

In ACS sensors
In situ spatiotemporal biochemical characterization of the activity of living multicellular biofilms under external stimuli remains a significant challenge. Surface-enhanced Raman spectroscopy (SERS), combining the molecular fingerprint specificity of vibrational spectroscopy with the hotspot sensitivity of plasmonic nanostructures, has emerged as a promising noninvasive bioanalysis technique for living systems. However, most SERS devices do not allow reliable long-term spatiotemporal SERS measurements of multicellular systems because of challenges in producing spatially uniform and mechanically stable SERS hotspot arrays to interface with large cellular networks. Furthermore, very few studies have been conducted for multivariable analysis of spatiotemporal SERS datasets to extract spatially and temporally correlated biological information from multicellular systems. Here, we demonstrate in situ label-free spatiotemporal SERS measurements and multivariate analysis of Pseudomonas syringae biofilms during development and upon infection by bacteriophage virus Phi6 by employing nanolaminate plasmonic crystal SERS devices to interface mechanically stable, uniform, and spatially dense hotspot arrays with the P. syringae biofilms. We exploited unsupervised multivariate machine learning methods, including principal component analysis (PCA) and hierarchical cluster analysis (HCA), to resolve the spatiotemporal evolution and Phi6 dose-dependent changes of major Raman peaks originating from biochemical components in P. syringae biofilms, including cellular components, extracellular polymeric substances (EPS), metabolite molecules, and cell lysate-enriched extracellular media. We then employed supervised multivariate analysis using linear discriminant analysis (LDA) for the multiclass classification of Phi6 dose-dependent biofilm responses, demonstrating the potential for viral infection diagnosis. 
We envision extending the in situ spatiotemporal SERS method to monitor dynamic, heterogeneous interactions between viruses and bacterial networks for applications such as phage-based anti-biofilm therapy development and continuous pathogenic virus detection.
Garg Aditya, Nam Wonil, Wang Wei, Vikesland Peter, Zhou Wei

2023-Mar-09

bacterial biofilms, multivariate analysis, spatiotemporal SERS, surface-enhanced Raman spectroscopy, virus detection

General

Are Deep Learning Structural Models Sufficiently Accurate for Virtual Screening? Application of Docking Algorithms to AlphaFold2 Predicted Structures.

In Journal of chemical information and modeling
Machine learning-based protein structure prediction algorithms, such as RoseTTAFold and AlphaFold2, have greatly impacted the structural biology field, arousing a fair amount of discussion around their potential role in drug discovery. While there are a few preliminary studies addressing the usage of these models in virtual screening, none of them focus on the prospect of hit-finding in a real-world virtual screen with a model based on low prior structural information. In order to address this, we have developed an AlphaFold2 version where we exclude all structural templates with more than 30% sequence identity from the model-building process. In a previous study, we used those models in conjunction with state-of-the-art free energy perturbation methods and demonstrated that it is possible to obtain quantitatively accurate results. In this work, we focus on using these structures in rigid receptor-ligand docking studies. Our results indicate that using out-of-the-box AlphaFold2 models is not an ideal scenario for virtual screening campaigns; in fact, we strongly recommend including some post-processing modeling to drive the binding site into a more realistic holo model.
Díaz-Rovira Anna M, Martín Helena, Beuming Thijs, Díaz Lucía, Guallar Victor, Ray Soumya S

2023-Mar-09
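
The 30% sequence-identity cutoff mentioned in the abstract can be illustrated with a small filter. This is a minimal sketch, not the authors' pipeline: it assumes each template is already aligned to the target (equal-length strings with '-' for gaps), it defines identity as matches over the columns where both sequences have a residue (other definitions exist), and the function names, template names, and toy sequences are all hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def filter_templates(target: str, templates: dict, max_identity: float = 30.0) -> dict:
    """Drop templates whose identity to the target exceeds max_identity."""
    return {
        name: seq
        for name, seq in templates.items()
        if percent_identity(target, seq) <= max_identity
    }


# Toy data (hypothetical): one close homolog, one distant template.
target = "ACDEFGHIKL"
templates = {
    "tmpl_close": "ACDEYGHIKL",  # 9 of 10 positions identical
    "tmpl_far": "AQWRTYPSNL",    # 2 of 10 positions identical
}
kept = filter_templates(target, templates)
print(sorted(kept))
```

With the 30% threshold, only the distant template survives, which mimics a target with little prior structural information.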

General

Biochemical identification of prepubertal boys with Klinefelter syndrome by combined reproductive hormone profiling using machine learning.

In Endocrine connections

OBJECTIVE : Klinefelter syndrome (KS) is the most common sex chromosome disorder and genetic cause of infertility in males. A highly variable phenotype contributes to the fact that a large proportion of cases are never diagnosed. Typical hallmarks in adults include small testes and azoospermia, which may prompt biochemical evaluation that typically shows extremely high FSH and low/undetectable inhibin B serum concentrations. However, in prepubertal KS, individual biochemical parameters largely overlap with those of prepubertal controls. We aimed to characterize clinical profiles of prepubertal boys with KS in relation to controls, and to develop a novel biochemical classification model to identify KS before puberty.

METHODS : Retrospective, longitudinal data from 15 prepubertal boys with KS and data from 1475 controls were used to calculate age- and sex-adjusted standard deviation scores (SDS) for height and serum concentrations of reproductive hormones and used to infer a decision tree classification model for KS.

RESULTS : Individual reproductive hormones were low but within reference ranges and did not discriminate KS from controls. Clinical and biochemical profiles including age- and sex-adjusted SDS from multiple reference curves provided input data to train a 'random forest' machine learning (ML) model for detection of KS. Applied to unseen data, the ML model achieved a classification accuracy of 78% (95% CI, 61% - 94%).

CONCLUSIONS : Supervised ML applied to clinically relevant variables enabled computational classification of control and KS profiles. The application of age- and sex-adjusted SDS provided robust predictions irrespective of age. Specialized ML models applied to combined reproductive hormone concentrations may be useful diagnostic tools to improve the identification of prepubertal boys with KS.

Madsen Andre, Juul Anders, Aksglaede Lise

2023-Mar-01
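
The age-adjusted standard deviation scores (SDS) used as model inputs above can be sketched as a simple z-score against an age-matched reference. This is an illustrative assumption only: the abstract does not describe how its reference curves were built, real reference curves are often skewed and handled with more elaborate methods, and the reference table, values, and function names below are hypothetical placeholders with no clinical meaning.

```python
def sds(value: float, ref_mean: float, ref_sd: float) -> float:
    """Standard deviation score: distance from the reference mean in SD units."""
    if ref_sd <= 0:
        raise ValueError("reference SD must be positive")
    return (value - ref_mean) / ref_sd


# Hypothetical reference table keyed by age in whole years: (mean, SD).
# Placeholder numbers, not real hormone reference data.
REFERENCE = {
    8: (100.0, 20.0),
    9: (110.0, 25.0),
}


def sds_for_age(value: float, age_years: int, reference: dict = REFERENCE) -> float:
    """Look up the age-matched reference and return the SDS for one measurement."""
    ref_mean, ref_sd = reference[age_years]
    return sds(value, ref_mean, ref_sd)


print(sds_for_age(70.0, 8))  # (70 - 100) / 20 = -1.5
```

Expressing every hormone on this common scale is what lets measurements taken at different ages be combined as features in a single classifier.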

Oncology

Development and validation of an ensemble machine-learning model for predicting early mortality among patients with bone metastases of hepatocellular carcinoma.

In Frontiers in oncology

PURPOSE : This study aimed to build a reliable model for predicting early mortality among hepatocellular carcinoma (HCC) patients with bone metastases, using an ensemble machine learning technique that incorporates the results of multiple machine learning algorithms.

METHODS : We extracted a cohort of 124,770 patients with a diagnosis of hepatocellular carcinoma from the Surveillance, Epidemiology, and End Results (SEER) program and enrolled a cohort of 1897 patients who were diagnosed as having bone metastases. Patients with a survival time of 3 months or less were considered to have had early death. To compare patients with and without early mortality, subgroup analysis was used. Patients were randomly divided into two groups: a training cohort (n = 1509, 80%) and an internal testing cohort (n = 388, 20%). In the training cohort, five machine learning techniques were employed to train and optimize models for predicting early mortality, and an ensemble machine learning technique was used to generate risk probabilities by soft voting, combining the results from the multiple machine learning algorithms. The study employed both internal and external validations, and the key performance indicators included the area under the receiver operating characteristic curve (AUROC), Brier score, and calibration curve. Patients from two tertiary hospitals were chosen as the external testing cohorts (n = 98). Feature importance analysis and reclassification were both performed in the study.

RESULTS : The early mortality was 55.5% (1052/1897). Eleven clinical characteristics were included as input features of machine learning models: sex (p = 0.019), marital status (p = 0.004), tumor stage (p = 0.025), node stage (p = 0.001), fibrosis score (p = 0.040), AFP level (p = 0.032), tumor size (p = 0.001), lung metastases (p < 0.001), cancer-directed surgery (p < 0.001), radiation (p < 0.001), and chemotherapy (p < 0.001). Application of the ensemble model in the internal testing population yielded an AUROC of 0.779 (95% confidence interval [CI]: 0.727-0.820), which was the largest AUROC among all models. Additionally, the ensemble model (0.191) outperformed the other five machine learning models in terms of Brier score. In terms of decision curves, the ensemble model also showed favorable clinical usefulness. External validation showed similar results; with an AUROC of 0.764 and Brier score of 0.195, the prediction performance was further improved after revision of the model. Feature importance demonstrated that the top three most crucial features were chemotherapy, radiation, and lung metastases based on the ensemble model. Reclassification of patients revealed a substantial difference in the two risk groups' actual probabilities of early mortality (74.38% vs. 31.35%, p < 0.001). Patients in the high-risk group had significantly shorter survival time than patients in the low-risk group (p < 0.001), according to the Kaplan-Meier survival curve.

CONCLUSIONS : The ensemble machine learning model exhibits promising prediction performance for early mortality among HCC patients with bone metastases. With the aid of routinely accessible clinical characteristics, this model can be a trustworthy prognostic tool to predict the early death of those patients and facilitate clinical decision-making.

Long Ze, Yi Min, Qin Yong, Ye Qianwen, Che Xiaotong, Wang Shengjie, Lei Mingxing

2023

bone metastases, early mortality, ensemble model, hepatocellular carcinoma, machine learning
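
The soft-voting ensemble and the Brier score reported above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes equal weights across models (the abstract does not give the weighting), and the model outputs, outcomes, and function names are hypothetical toy values.

```python
def soft_vote(prob_lists):
    """Average predicted probabilities across models, patient by patient."""
    if not prob_lists:
        raise ValueError("need at least one model's predictions")
    n_patients = len(prob_lists[0])
    if any(len(p) != n_patients for p in prob_lists):
        raise ValueError("all models must score the same patients")
    n_models = len(prob_lists)
    return [sum(p[i] for p in prob_lists) / n_models for i in range(n_patients)]


def brier_score(probs, outcomes):
    """Mean squared difference between predicted risk and observed 0/1 outcome."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Toy predicted probabilities of early death from two hypothetical models.
model_a = [0.75, 0.25, 0.5]
model_b = [0.25, 0.25, 1.0]
outcomes = [1, 0, 1]  # 1 = early death observed (toy labels)

ensemble = soft_vote([model_a, model_b])
print(ensemble)                         # [0.5, 0.25, 0.75]
print(brier_score(ensemble, outcomes))  # 0.125
```

Lower Brier scores indicate better-calibrated, more accurate probabilities, which is why the abstract treats the ensemble's 0.191 as the best result among the compared models.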

General

A hybrid CNN and ensemble model for COVID-19 lung infection detection on chest CT scans.

In PLoS ONE; h5-index 176.0
COVID-19 is highly infectious and causes acute respiratory disease. Machine learning (ML) and deep learning (DL) models are vital in detecting disease from chest computed tomography (CT) scans. The DL models outperformed the ML models. For COVID-19 detection from CT scan images, DL models are used as end-to-end models. Thus, the performance of the model is evaluated for the quality of the extracted features and classification accuracy. There are four contributions included in this work. First, this research is motivated by studying the quality of the features extracted by the DL model by feeding these extracted features to an ML model. In other words, we proposed comparing the end-to-end DL model performance against the approach of using DL for feature extraction and ML for the classification of COVID-19 CT scan images. Second, we proposed studying the effect of fusing extracted features from image descriptors, e.g., Scale-Invariant Feature Transform (SIFT), with extracted features from DL models. Third, we proposed a new Convolutional Neural Network (CNN) to be trained from scratch and then compared to deep transfer learning on the same classification problem. Finally, we studied the performance gap between classic ML models and ensemble learning models. The proposed framework is evaluated using a CT dataset, where the obtained results are evaluated using five different metrics. The obtained results revealed that using the proposed CNN model is better than using the well-known DL model for the purpose of feature extraction. Moreover, using a DL model for feature extraction and an ML model for the classification task achieved better results in comparison to using an end-to-end DL model for detecting COVID-19 CT scan images. Of note, the accuracy rate of the former method improved by using ensemble learning models instead of the classic ML models. The proposed method achieved the best accuracy rate of 99.39%.
Akl Ahmed A, Hosny Khalid M, Fouda Mostafa M, Salah Ahmad

2023

Pathology

Deep Learning for Predicting Metastasis on Melanoma WSIs

ArXiv Preprint
Northern Europe has the second highest mortality rate of melanoma globally. In 2020, the mortality rate of melanoma rose to 1.9 per 100,000 inhabitants. Melanoma prognosis is based on a pathologist's subjective visual analysis of the patient's tumor. This methodology is highly time-consuming, and the prognosis variability among experts is notable, drastically jeopardizing its reproducibility. Thus, the need for faster and more reproducible methods arises. Machine learning has paved its way into digital pathology, but so far, most contributions are on localization, segmentation, and diagnostics, with little emphasis on prognostics. This paper presents a convolutional neural network (CNN) method based on VGG16 to predict melanoma prognosis as the presence of metastasis within five years. Patches are extracted from regions of interest in Whole Slide Images (WSIs) at different magnification levels and used in model training and validation. Results indicate that utilizing WSI patches at 20x magnification level has the best performance, with an F1 score of 0.7667 and an AUC of 0.81.
Christopher Andreassen, Saul Fuster, Helga Hardardottir, Emiel A. M. Janssen, Kjersti Engan

2023-03-10
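
For readers less familiar with the F1 score quoted above, the following sketch shows how it is computed from confusion-matrix counts. The counts are hypothetical and are not taken from the paper; the function name is likewise an illustrative choice.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0  # no positive predictions and no positive cases
    return 2 * tp / denominator


# Hypothetical counts: 8 true positives, 2 false positives, 2 false negatives.
print(f1_score(tp=8, fp=2, fn=2))  # 16 / 20 = 0.8
```

Because F1 ignores true negatives, it is usually reported alongside a threshold-independent measure such as AUC, as the abstract does.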