Doctor Penguin

Receive a weekly summary and discussion of the top papers of the week by leading researchers in the field.

General

Personalized online ensemble machine learning with applications for dynamic data streams.

In Statistics in medicine

In this work, we introduce the personalized online super learner (POSL), an online, personalizable ensemble machine learning algorithm for streaming data. POSL optimizes predictions with respect to baseline covariates, so personalization can range from completely individualized (optimization with respect to subject ID) to shared across many individuals (optimization with respect to common baseline covariates). As an online algorithm, POSL learns in real time. As a super learner, POSL is grounded in statistical optimality theory and can leverage a diversity of candidate algorithms, including online algorithms with different training and update times, fixed/offline algorithms that are not updated during POSL's fitting procedure, pooled algorithms that learn from many individuals' time series, and individualized algorithms that learn from within a single time series. How POSL ensembles the candidates can depend on the amount of data collected, the stationarity of the time series, and the mutual characteristics of a group of time series. Depending on the underlying data-generating process and the information available in the data, POSL can adapt to learning across samples, through time, or both. In a range of simulations that reflect realistic forecasting scenarios and in a medical application, we examine the performance of POSL relative to other current ensembling and online learning methods. We show that POSL provides reliable predictions for both short and long time series and adjusts to changing data-generating environments. We further extend POSL's practicality to settings where time series dynamically enter and exit.

Malenica Ivana, Phillips Rachael V, Chambaz Antoine, Hubbard Alan E, Pirracchio Romain, van der Laan Mark J

2023-Mar-30

machine learning, online learning, personalized medicine, streaming data, time series
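The core super-learner idea in the POSL abstract, choosing among candidate forecasters by how well each has predicted data it had not yet seen, can be sketched in a few lines. The Python below is a minimal, hypothetical illustration, not the authors' POSL implementation: it is a discrete (select-the-best) online ensemble that scores each candidate by cumulative squared error on one-step-ahead forecasts. The class name and the two toy candidates (`last_value`, `running_mean`) are assumptions made for the example, and the covariate-based personalization that POSL performs is omitted.

```python
from typing import Callable, Dict, List

Forecaster = Callable[[List[float]], float]


class OnlineDiscreteSuperLearner:
    """Hypothetical online ensemble (not POSL itself): predict with the candidate
    that has the lowest cumulative squared error on past one-step-ahead forecasts."""

    def __init__(self, candidates: Dict[str, Forecaster]) -> None:
        self.candidates = candidates
        self.loss_sum = {name: 0.0 for name in candidates}

    def best(self) -> str:
        # Every candidate is scored on the same steps, so comparing sums is the
        # same as comparing mean losses; ties go to the first candidate listed.
        return min(self.loss_sum, key=self.loss_sum.get)

    def predict(self, history: List[float]) -> float:
        return self.candidates[self.best()](history)

    def update(self, history: List[float], y_next: float) -> None:
        # Score each candidate on a value it has not yet seen (online validation).
        for name, forecast in self.candidates.items():
            self.loss_sum[name] += (forecast(history) - y_next) ** 2


def last_value(history: List[float]) -> float:
    return history[-1]


def running_mean(history: List[float]) -> float:
    return sum(history) / len(history)


# Toy trending series: the last-value forecaster should win.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
learner = OnlineDiscreteSuperLearner({"last": last_value, "mean": running_mean})
for t in range(1, len(series)):
    learner.update(series[:t], series[t])

print(learner.best(), learner.loss_sum)  # prints: last {'last': 4.0, 'mean': 13.5}
```

Keeping one such learner per subject ID would loosely resemble the fully individualized end of the spectrum described in the abstract, and a single learner shared across subjects the pooled end; the sketch does not attempt to reproduce how POSL moves between them.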

General

ExamPle: Explainable deep learning framework for the prediction of plant small secreted peptides.

In Bioinformatics (Oxford, England)

MOTIVATION : Plant small secreted peptides (SSPs) play an important role in plant growth, development, and plant-microbe interactions. Identifying SSPs is therefore essential for revealing their functional mechanisms. Over the last few decades, machine learning-based methods have been developed that have accelerated the discovery of SSPs to some extent. However, existing methods depend heavily on handcrafted feature engineering, which can overlook latent feature representations and limit predictive performance.

RESULTS : Here, we propose ExamPle, a novel deep learning model that uses a Siamese network and multi-view representation for the explainable prediction of plant SSPs. In benchmarking comparisons, ExamPle performs significantly better than existing methods in predicting plant SSPs. Visualization with dimension reduction tools also shows that the model has excellent feature extraction ability. Importantly, through in silico mutagenesis (ISM) experiments, ExamPle can discover sequence characteristics and identify the contribution of each amino acid. The key novel principle learned by the model is that the head region of the peptide and some specific sequential patterns are strongly associated with SSP function. ExamPle is thus a competitive model and tool for predicting plant SSPs and designing effective plant SSPs.

AVAILABILITY : Our codes and datasets are available at https://github.com/Johnsunnn/ExamPle.

SUPPLEMENTARY INFORMATION : Supplementary data are available at Bioinformatics online.

Li Zhongshen, Jin Junru, Wang Yu, Long Wentao, Ding Yuanhao, Hu Haiyan, Wei Leyi

2023-Mar-10
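In silico mutagenesis, which the ExamPle abstract uses to attribute predictions to individual amino acids, amounts to scoring every single-residue substitution of a sequence and comparing each with the wild-type score. The sketch below assumes nothing about ExamPle's network: `toy_score` is a hypothetical stand-in that rewards hydrophobic residues in the first three positions (loosely mirroring the head-region finding), and averaging absolute score changes per position is one simple summary choice, not necessarily the one the authors use.

```python
from typing import Callable, Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def in_silico_mutagenesis(seq: str, score: Callable[[str], float]) -> List[Dict[str, float]]:
    """For each position, substitute every other amino acid and record
    score(mutant) - score(wild type)."""
    wild_type = score(seq)
    effects: List[Dict[str, float]] = []
    for i, original in enumerate(seq):
        row: Dict[str, float] = {}
        for aa in AMINO_ACIDS:
            if aa == original:
                continue
            mutant = seq[:i] + aa + seq[i + 1:]
            row[aa] = score(mutant) - wild_type
        effects.append(row)
    return effects


def position_importance(effects: List[Dict[str, float]]) -> List[float]:
    # Mean absolute score change per position as a simple per-residue contribution.
    return [sum(abs(d) for d in row.values()) / len(row) for row in effects]


def toy_score(seq: str) -> float:
    # Hypothetical stand-in for a trained model: counts hydrophobic residues
    # among the first three positions (a crude "head region" signal).
    return float(sum(1 for c in seq[:3] if c in "LIVFM"))


effects = in_silico_mutagenesis("MLKAG", toy_score)
importance = position_importance(effects)
print([round(v, 3) for v in importance])  # prints: [0.789, 0.789, 0.263, 0.0, 0.0]
```

With a real model, `toy_score` would be replaced by the model's predicted SSP probability; only positions whose substitutions move that prediction receive non-zero importance.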

General

A machine learning-derived neuroanatomical pattern predicts delayed reward discounting in the Human Connectome Project Young Adult sample.

In Journal of neuroscience research

Delayed reward discounting (DRD) is the extent to which a person favors smaller rewards that are immediately available over larger rewards available in the future. Higher levels of DRD have been identified in individuals with a wide range of clinical disorders. Although previous studies have used larger samples and relied on gray matter volume alone to characterize the neuroanatomical correlates of DRD, it remains unclear whether previously identified relationships generalize out of sample and how cortical thickness and cortical surface area contribute to DRD. In this study, using the Human Connectome Project Young Adult dataset (N = 1038), a machine learning approach based on cross-validated elastic net regression was used to characterize the neuroanatomical pattern of structural magnetic resonance imaging variables associated with DRD. The results revealed that a multi-region neuroanatomical pattern predicted DRD and that this prediction was robust in a held-out test set (morphometry-only R² = 3.34%; morphometry + demographics R² = 6.96%). The pattern included regions implicated in the default mode network, executive control network, and salience network. The relationship of these regions with DRD was further supported by univariate linear mixed effects modeling, in which many of the regions in the pattern showed significant univariate associations with DRD. Taken together, these findings provide evidence that a machine learning-derived neuroanatomical pattern encompassing several theoretically relevant brain networks robustly predicts DRD in a large sample of healthy young adults.

Xu Hui, MacKillop James, Owens Max M

2023-Mar-10

delayed reward discounting, elastic net regression, machine learning, morphometry, neuroanatomy
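The elastic net used in the DRD study combines L1 and L2 penalties, so uninformative predictors can be driven exactly to zero while correlated ones are shrunk together. As a minimal sketch, the pure-Python coordinate-descent solver below minimizes the same objective that scikit-learn documents for `ElasticNet`, (1/2n)·‖y − Xw‖² + α·ρ·‖w‖₁ + (α(1 − ρ)/2)·‖w‖², where ρ is `l1_ratio`, and then reports out-of-sample R² on a held-out split. The synthetic data, the fixed `alpha` and `l1_ratio`, and the omission of an intercept (features and outcome are drawn with mean zero) are assumptions for illustration; the study tuned its model by cross-validation, which this sketch skips for brevity.

```python
import random
from typing import List


def soft_threshold(z: float, gamma: float) -> float:
    # Proximal operator of the L1 penalty.
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X: List[List[float]], y: List[float], alpha: float,
                l1_ratio: float, n_iter: int = 200) -> List[float]:
    """Cyclic coordinate descent for
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||^2.
    No intercept: assumes X columns and y have (approximately) zero mean."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # y - Xw, with w starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def r_squared(y_true: List[float], y_pred: List[float]) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic data: only the first of three features carries signal.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
y = [2.0 * row[0] + rng.gauss(0.0, 0.5) for row in X]
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

w = elastic_net(X_train, y_train, alpha=0.1, l1_ratio=0.5)
y_pred = [sum(wj * xj for wj, xj in zip(w, row)) for row in X_test]
print([round(v, 3) for v in w], round(r_squared(y_test, y_pred), 3))
```

The held-out R² is computed only on rows the solver never saw, which is the same out-of-sample logic the study applies to its test set.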

Radiology

Deep Learning Radiomics for the Assessment of Telomerase Reverse Transcriptase Promoter Mutation Status in Patients With Glioblastoma Using Multiparametric MRI.

In Journal of magnetic resonance imaging : JMRI

BACKGROUND : Studies have shown that magnetic resonance imaging (MRI)-based deep learning radiomics (DLR) has the potential to assess glioma grade; however, its role in predicting telomerase reverse transcriptase (TERT) promoter mutation status in patients with glioblastoma (GBM) remains unclear.

PURPOSE : To evaluate the value of deep learning (DL) in multiparametric MRI-based radiomics for identifying TERT promoter mutations preoperatively in patients with GBM.

STUDY TYPE : Retrospective.

POPULATION : A total of 274 patients with isocitrate dehydrogenase-wildtype GBM were included in the study. The training and external validation cohorts included 156 (54.3 ± 12.7 years; 96 males) and 118 (54.2 ± 13.4 years; 73 males) patients, respectively.

FIELD STRENGTH/SEQUENCE : Axial contrast-enhanced T1-weighted spin-echo inversion recovery sequence (T1CE), T1-weighted spin-echo inversion recovery sequence (T1WI), and T2-weighted spin-echo inversion recovery sequence (T2WI) on 1.5-T and 3.0-T scanners were used in this study.

ASSESSMENT : The overall tumor area (the tumor core and edema) was segmented, and radiomics and DL features were extracted from preprocessed multiparametric preoperative brain MRI images (T1WI, T1CE, and T2WI). Models based on the DLR signature, a clinical signature, and a clinical DLR (CDLR) nomogram were developed and validated to identify TERT promoter mutation status.

STATISTICAL TESTS : The Mann-Whitney U test, Pearson correlation test, least absolute shrinkage and selection operator (LASSO), and logistic regression analysis were applied for feature selection and construction of the radiomics and DL signatures. P < 0.05 was considered statistically significant.

RESULTS : The DLR signature showed the best discriminative power for predicting TERT promoter mutations, with AUCs of 0.990 and 0.890 in the training and external validation cohorts, respectively. In the validation cohort, the DLR signature outperformed the CDLR nomogram, although not significantly (P = 0.670), and significantly outperformed the clinical models.

DATA CONCLUSION : The multiparametric MRI-based DLR signature showed promising performance for assessing TERT promoter mutations in patients with GBM and could provide information to guide individualized treatment.

LEVEL OF EVIDENCE : 3

TECHNICAL EFFICACY : Stage 2.

Zhang Hongbo, Zhang Hanwen, Zhang Yuze, Zhou Beibei, Wu Lei, Lei Yi, Huang Biao

2023-Mar-10

deep learning radiomics, glioblastoma, multiparametric magnetic resonance imaging, telomerase reverse transcriptase
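The TERT abstract reports discrimination as an AUC and lists the Mann-Whitney U test among its feature-selection tools; the two are closely linked, because the AUC equals the Mann-Whitney U statistic divided by the number of (positive, negative) pairs. The sketch below computes the AUC that way, counting ties as half a pair. The scores and labels are made-up toy values, not data from the study.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the normalized Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs in which the positive case scores higher,
    with ties counted as one half."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    u = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                u += 1.0
            elif p == q:
                u += 0.5
    return u / (len(pos) * len(neg))


# Hypothetical model scores for three mutated (1) and three wild-type (0) cases.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc_mann_whitney(scores, labels))  # 8 of 9 pairs ranked correctly
```

The pairwise loop is O(n_pos × n_neg), which is fine for cohorts of a few hundred patients; a rank-based formula gives the same value faster for larger data.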

Surgery

Lymph Node Metastases in Papillary Thyroid Carcinoma can be Predicted by a Convolutional Neural Network: a Multi-Institution Study.

In The Annals of otology, rhinology, and laryngology

OBJECTIVES : The presence of nodal metastases in patients with papillary thyroid carcinoma (PTC) has both staging and treatment implications. However, lymph nodes are often not removed during thyroidectomy. Prior work has demonstrated the capability of artificial intelligence (AI) to predict the presence of nodal metastases in PTC based on the primary tumor histopathology alone. This study aimed to replicate these results with multi-institutional data.

METHODS : Cases of conventional PTC were identified from the records of 2 large academic institutions. Only patients with complete pathology data, including at least 3 sampled lymph nodes, were included in the study. Tumors were designated "positive" if at least 5 lymph nodes were positive for metastasis. First, algorithms were trained separately on each institution's data and tested independently on the other institution's data. Then, the data sets were combined, and new algorithms were developed and tested. The primary tumors were randomized into 2 groups, one to train the algorithm and another to test it. A low level of supervision was used to train the algorithm. Board-certified pathologists annotated the slides. HALO-AI convolutional neural network and image analysis software were used for training and testing. Receiver operating characteristic (ROC) curves and the Youden J statistic were used for the primary analysis.

RESULTS : A total of 420 cases were analyzed, 45% of which were negative. When tested on the other institution's data, the best-performing single-institution algorithm had an area under the curve (AUC) of 0.64, with a sensitivity of 65% and a specificity of 61%. The best-performing combined-institution algorithm had an AUC of 0.84, with a sensitivity of 68% and a specificity of 91%.

CONCLUSION : A convolutional neural network can produce an accurate and robust algorithm that is capable of predicting nodal metastases from primary PTC histopathology alone even in the setting of multi-institutional data.

Esce Antoinette, Redemann Jordan P, Olson Garth T, Hanson Joshua A, Agarwal Shweta, Yenwongfai Leonard, Ferreira Juanita, Boyd Nathan H, Bocklage Thèrése, Martin David R

2023-Mar-10

head and neck and endocrine pathology, head and neck surgery, histopathology, lymphatic metastasis, papillary thyroid cancer
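The PTC study uses the Youden J statistic, J = sensitivity + specificity − 1, to pick an operating point on the ROC curve. A minimal sketch, assuming a case is called positive when its score is at or above the threshold and that ties in J are resolved in favor of the lowest threshold; the scores and labels are hypothetical toy values, not study data.

```python
from typing import Optional, Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float, float, float]:
    """Return (threshold, J, sensitivity, specificity) for the threshold that
    maximizes J = sensitivity + specificity - 1, calling a case positive
    when score >= threshold. Ties in J keep the lowest threshold."""
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    best: Optional[Tuple[float, float, float, float]] = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, label in zip(scores, labels) if s >= t and label == 1)
        tn = sum(1 for s, label in zip(scores, labels) if s < t and label == 0)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[1]:
            best = (t, j, sens, spec)
    if best is None:
        raise ValueError("scores must be non-empty")
    return best


# Hypothetical slide-level scores: 1 = node-positive, 0 = node-negative.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
threshold, j, sens, spec = youden_threshold(scores, labels)
print(threshold, round(j, 3), sens, round(spec, 3))  # prints: 0.35 0.667 1.0 0.667
```

Here thresholds 0.35 and 0.8 tie at J = 2/3 but trade sensitivity for specificity in opposite directions, which is why the tie-breaking rule should be stated explicitly.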

Oncology

Development of multiple AI pipelines that predict neoadjuvant chemotherapy response of breast cancer using H&E-stained tissues.

In The journal of pathology. Clinical research

In recent years, the treatment of breast cancer has advanced dramatically, and neoadjuvant chemotherapy (NAC) has become a common treatment, especially for locally advanced breast cancer. However, other than the breast cancer subtype, no clear factor indicating sensitivity to NAC has been identified. In this study, we attempted to use artificial intelligence (AI) to predict the effect of preoperative chemotherapy from hematoxylin and eosin (H&E)-stained images of pathological tissue obtained by needle biopsy before chemotherapy. Applications of AI to pathological images typically use a single machine learning model, such as a support vector machine (SVM) or a deep convolutional neural network (CNN). However, cancer tissues are extremely diverse, and learning from a realistic number of cases limits the prediction accuracy of any single model. In this study, we propose a novel pipeline system that uses three independent models, each focusing on a different characteristic of cancer atypia. Our system uses a CNN model to learn structural atypia from image patches, and SVM and random forest models to learn nuclear atypia from fine-grained nuclear features extracted by image analysis methods. The system predicted NAC response with 95.15% accuracy on a test set of 103 unseen cases. We believe that this AI pipeline system will contribute to the adoption of personalized medicine in NAC therapy for breast cancer.

Shen Bin, Saito Akira, Ueda Ai, Fujita Koji, Nagamatsu Yui, Hashimoto Mikihiro, Kobayashi Masaharu, Mirza Aashiq H, Graf Hans Peter, Cosatto Eric, Hazama Shoichi, Nagano Hiroaki, Sato Eiichi, Matsubayashi Jun, Nagao Toshitaka, Cheng Esther, Hoda Syed Af, Ishikawa Takashi, Kuroda Masahiko

2023-Mar-10

artificial intelligence, breast cancer, digital pathology, neoadjuvant chemotherapy
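The NAC abstract describes three independent models (a CNN for structural atypia, and SVM and random forest models for nuclear atypia) but does not say how their outputs are combined. Purely as an assumption for illustration, the sketch below combines per-case binary predictions by majority vote, which with three models and two classes can never tie. The model names and predictions are hypothetical toy values, chosen so that each model makes one different error, to show how non-overlapping errors can cancel out under voting.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(predictions: Dict[str, List[int]]) -> List[int]:
    """Combine per-case binary predictions from several models by majority
    vote. With an odd number of models and binary labels, ties cannot occur."""
    per_model = list(predictions.values())
    n_cases = len(per_model[0])
    return [Counter(m[i] for m in per_model).most_common(1)[0][0] for i in range(n_cases)]


def accuracy(pred: List[int], truth: List[int]) -> float:
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Hypothetical outputs (1 = responder) for five cases; each model errs on a different case.
truth = [1, 0, 1, 1, 0]
predictions = {
    "cnn_structural": [1, 0, 1, 0, 0],
    "svm_nuclear": [1, 1, 1, 1, 0],
    "rf_nuclear": [0, 0, 1, 1, 0],
}
combined = majority_vote(predictions)
for name, pred in predictions.items():
    print(name, accuracy(pred, truth))  # each model: 0.8
print("vote", accuracy(combined, truth))  # combined: 1.0
```

Voting helps only when the models' errors overlap little; if all three models failed on the same cases, the vote would simply reproduce those failures.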