
Surgery

Computer Vision Analysis of Specimen Mammography to Predict Margin Status.

In medRxiv : the preprint server for health sciences

Intra-operative specimen mammography is a valuable tool in breast cancer surgery, providing immediate assessment of margins for a resected tumor. However, the accuracy of specimen mammography in detecting microscopic margin positivity is low. We sought to develop a deep learning-based model to predict the pathologic margin status of resected breast tumors using specimen mammography. A dataset of specimen mammography images matched with pathology reports describing margin status was collected. Models pre-trained on radiologic images were developed and compared with models pre-trained on non-medical images. Model performance was assessed using sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC). The dataset included 821 images, 53% of which had positive margins. For three of the four model architectures tested, models pre-trained on radiologic images outperformed domain-agnostic models. The highest-performing model, InceptionV3, showed a sensitivity of 84%, a specificity of 42%, and an AUROC of 0.71. These results compare favorably with the published literature on surgeon and radiologist interpretation of specimen mammography. With further development, these models could assist clinicians in identifying positive margins intra-operatively and decrease the rate of positive margins and re-operation in breast-conserving surgery.
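Sensitivity and specificity both come from the confusion matrix of predicted versus pathologic margin status, and the cutoff used to turn a model's output probability into a hard "positive margin" call trades one against the other. The sketch below illustrates that relationship on made-up toy labels; the function names and the 0.5 default cutoff are hypothetical and are not taken from the study.

```python
from typing import Sequence


def threshold(probabilities: Sequence[float], cutoff: float = 0.5) -> list[int]:
    """Turn predicted probabilities into hard labels (1 = positive margin) at a chosen cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels where 1 = positive margin."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity: fraction of truly positive margins that the model flags.
    sensitivity = tp / (tp + fn)
    # Specificity: fraction of truly negative margins that the model clears.
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Toy example (not study data): lowering the cutoff catches more positives
# but clears fewer negatives.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.7, 0.3, 0.6, 0.55, 0.2, 0.1]
print(sensitivity_specificity(labels, threshold(probs, 0.5)))   # (0.75, 0.5)
print(sensitivity_specificity(labels, threshold(probs, 0.25)))  # (1.0, 0.5)
```

A high-sensitivity, low-specificity operating point like the one reported here corresponds to a cutoff that leans toward flagging margins as positive.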

Chen Kevin A, Kirchoff Kathryn E, Butler Logan R, Holloway Alexa D, Kapadia Muneera R, Gallagher Kristalyn K, Gomez Shawn M

2023-Mar-08

General

Sequence characteristics and an accurate model of high-occupancy target loci in the human genome.

In bioRxiv : the preprint server for biology

Enhancers and promoters are thought to be bound by a small set of transcription factors (TFs) in a sequence-specific manner. This assumption has come under increasing skepticism as ChIP-seq datasets have expanded. In particular, high-occupancy target (HOT) loci attract hundreds of TFs with seemingly no detectable correlation between ChIP-seq peaks and the presence of DNA-binding motifs. Here, we used 1,003 TF ChIP-seq datasets in HepG2, K562, and H1 cells to analyze patterns of ChIP-seq peak co-occurrence in combination with functional genomics datasets. We identified 43,891 HOT loci located in promoter (53%) and enhancer (47%) regions and determined that HOT promoters regulate housekeeping genes, whereas HOT enhancers are involved in extremely tissue-specific processes. HOT loci form the foundation of human super-enhancers and evolve under strong negative selection, with some of them being ultraconserved regions. Sequence-based classification of HOT loci using deep learning suggests that their formation is driven by sequence features, and the density of ChIP-seq peaks correlates with sequence features. Based on their affinities for promoters and enhancers, we detected five distinct clusters of TFs that form the core of HOT loci. We also observed that HOT loci are enriched in 3D chromatin hubs and disease-causal variants. In a challenge to the classical model of enhancer activity, we report an abundance of HOT loci in the human genome, with 51% of all ChIP-seq binding events committed to HOT locus formation, and propose a model of HOT locus formation based on the existence of large transcriptional condensates.
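The core idea behind HOT loci is peak co-occurrence: a locus counts as "high-occupancy" when many distinct TFs have ChIP-seq peaks overlapping it. The abstract does not give the exact locus definition or threshold, so the sketch below is only an illustration under assumed inputs: half-open `(chromosome, start, end)` intervals, a dictionary of peaks per TF, and a hypothetical `min_tfs` cutoff. It uses a naive pairwise overlap test for clarity; genome-scale data would call for interval trees or a tool such as bedtools.

```python
Interval = tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def tf_occupancy(loci: list[Interval], peaks_by_tf: dict[str, list[Interval]]) -> dict[Interval, set[str]]:
    """Map each candidate locus to the set of TFs with at least one overlapping peak."""
    occupancy: dict[Interval, set[str]] = {locus: set() for locus in loci}
    for locus in loci:
        for tf, peaks in peaks_by_tf.items():
            if any(overlaps(locus, peak) for peak in peaks):
                occupancy[locus].add(tf)
    return occupancy


def flag_hot_loci(loci: list[Interval], peaks_by_tf: dict[str, list[Interval]], min_tfs: int) -> list[Interval]:
    """Return loci bound by at least `min_tfs` distinct TFs (hypothetical cutoff)."""
    occupancy = tf_occupancy(loci, peaks_by_tf)
    return [locus for locus in loci if len(occupancy[locus]) >= min_tfs]


loci = [("chr1", 100, 200), ("chr1", 500, 600)]
peaks = {
    "TF_A": [("chr1", 150, 160)],
    "TF_B": [("chr1", 190, 250)],
    "TF_C": [("chr1", 550, 560), ("chr2", 100, 200)],
}
print(flag_hot_loci(loci, peaks, min_tfs=2))  # [('chr1', 100, 200)]
```

Counting distinct TFs rather than raw peaks keeps one TF with several nearby peaks from inflating a locus's occupancy.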

Hudaiberdiev Sanjarbek, Ovcharenko Ivan

2023-Feb-05

General

MOVER: Medical Informatics Operating Room Vitals and Events Repository.

In medRxiv : the preprint server for health sciences

Artificial Intelligence (AI) holds great promise for transforming the healthcare industry. However, despite its potential, AI has yet to see widespread deployment in clinical settings, in large part because of the lack of publicly available clinical data and the lack of transparency in published AI algorithms. Few clinical data repositories are publicly accessible to researchers for training and testing AI algorithms, and even fewer contain specialized data from the perioperative setting. To address this gap, we present and release the Medical Informatics Operating Room Vitals and Events Repository (MOVER), which includes data from 58,799 unique patients and 83,468 surgeries collected at the UCI Medical Center over a period of seven years. MOVER is freely available to all researchers who sign a data usage agreement, and we hope that it will accelerate the integration of AI into healthcare settings, ultimately leading to improved patient outcomes.

Samad Muntaha, Rinehart Joseph, Angel Mirana, Kanomata Yuzo, Baldi Pierre, Cannesson Maxime

2023-Mar-12

General

Preterm Preeclampsia Risk Modelling: Examining Hemodynamic, Biochemical, and Biophysical Markers Prior to Pregnancy.

In medRxiv : the preprint server for health sciences

Preeclampsia (PE) is a leading cause of maternal and perinatal death globally and can lead to unplanned preterm birth. Prediction of risk for preterm or early-onset PE has been investigated primarily after conception, particularly in the early and mid-gestational periods. However, there is a distinct clinical advantage in identifying individuals at risk for PE prior to conception, when a wider array of preventive interventions is available. In this work, we leverage machine learning techniques to identify potential pre-pregnancy biomarkers of PE in a sample of 80 women, 10 of whom were diagnosed with preterm preeclampsia during their subsequent pregnancy. We explore biomarkers derived from hemodynamic, biophysical, and biochemical measurements and several modeling approaches. A support vector machine (SVM) optimized with stochastic gradient descent yields the highest overall performance, with ROC AUC and detection rate of up to 0.88 and 0.70, respectively, under subject-wise cross-validation. The best-performing models leverage biophysical and hemodynamic biomarkers. While preliminary, these results indicate the promise of a machine learning-based approach for detecting individuals who are at risk of developing preterm PE before they become pregnant. These efforts may inform gestational planning and care, reducing the risk of adverse PE-related outcomes.
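The abstract names two ingredients: a linear SVM fitted with stochastic gradient descent, and subject-wise cross-validation. It does not say which implementation was used (scikit-learn's `SGDClassifier` with `loss="hinge"` is one common library route), so the from-scratch sketch below is an assumption, not the authors' code. It uses a Pegasos-style update on the L2-regularized hinge loss with labels in {-1, +1}, plus a hypothetical helper that assigns whole subjects to folds so that no subject's data appears in both training and test splits. The toy features are invented.

```python
import random
from typing import Sequence


def train_linear_svm_sgd(X: Sequence[Sequence[float]], y: Sequence[int],
                         lam: float = 0.01, epochs: int = 50, seed: int = 0) -> list[float]:
    """Pegasos-style SGD on the L2-regularized hinge loss; labels must be -1 or +1.

    A constant 1.0 feature is appended, so the last weight acts as a (regularized) bias.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Shrink the weights (gradient step on the L2 penalty)...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ...and add the hinge-loss subgradient only for margin violators.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision_function(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed score; positive means the +1 class."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def subject_wise_folds(subject_ids: Sequence[str], k: int) -> list[list[int]]:
    """Assign whole subjects to k folds; returns sample indices per fold."""
    subjects = sorted(set(subject_ids))
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    return [[i for i, s in enumerate(subject_ids) if fold_of[s] == f] for f in range(k)]


X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm_sgd(X, y)
print([1 if decision_function(w, x) > 0 else -1 for x in X])  # [1, 1, 1, -1, -1, -1]
print(subject_wise_folds(["a", "a", "b", "c", "c", "d"], k=2))  # [[0, 1, 3, 4], [2, 5]]
```

Splitting by subject rather than by sample matters whenever one person contributes several measurements, because otherwise near-duplicate data can leak from training into testing and inflate the cross-validated AUC.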

CLINICAL RELEVANCE : This work considers the development and optimization of pre-pregnancy biomarkers for improving the identification of preterm (early-onset) preeclampsia risk prior to conception.

Loftness Bryn C, Bernstein Ira, McBride Carole A, Cheney Nick, McGinnis Ellen W, McGinnis Ryan S

2023-Mar-06

General

Synthesize Extremely High-dimensional Longitudinal Electronic Health Records via Hierarchical Autoregressive Language Model.

In Research square

Synthetic electronic health records (EHRs) that are both realistic and privacy-preserving can serve as an alternative to real EHRs for machine learning (ML) modeling and statistical analysis. However, generating high-fidelity, granular EHR data in its original, high-dimensional form is challenging for existing methods because of the complexities inherent in high-dimensional data. In this paper, we propose the Hierarchical Autoregressive Language mOdel (HALO) for generating longitudinal, high-dimensional EHR data that preserve the statistical properties of real EHRs and can be used to train accurate ML models without privacy concerns. Designed as a hierarchical autoregressive model, HALO generates a probability density function of medical codes, clinical visits, and patient records, allowing realistic EHR data to be generated in its original, unaggregated form without the need for variable selection or aggregation. Our model also produces high-quality continuous variables in a longitudinal and probabilistic manner. We conducted extensive experiments demonstrating that HALO can generate high-fidelity EHR data with high-dimensional disease code probabilities (d ≈ 10,000), disease code co-occurrence probabilities within a visit (d ≈ 1,000,000), and conditional probabilities across consecutive visits (d ≈ 5,000,000), achieving an R² correlation above 0.9 with real EHR data. Compared with the leading baseline, HALO improves predictive modeling by over 17% in predictive accuracy and perplexity on a held-out test set of real EHR data. This performance enables downstream ML models trained on its synthetic data to achieve accuracy comparable to models trained on real data (0.938 area under the ROC curve with HALO data vs. 0.943 with real data). Finally, combining real and synthetic data improves the accuracy of ML models beyond that achieved using real EHR data alone.
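The fidelity figures above compare code-level statistics between real and synthetic EHRs. As an illustration of the simplest of these, the unigram disease code probabilities, the sketch below computes, for each code in a vocabulary, the fraction of visits containing it, then scores agreement as the squared Pearson correlation. The nested-list record layout, the per-visit normalization, and reading "R²" as squared Pearson correlation are all assumptions made for this example; the toy records are invented.

```python
from collections import Counter
from typing import Sequence

# A patient is a list of visits; a visit is a set of medical codes (assumed layout).
Patient = list[set[str]]


def code_probabilities(records: Sequence[Patient], vocab: Sequence[str]) -> list[float]:
    """Fraction of all visits in which each code in `vocab` appears."""
    counts: Counter = Counter()
    n_visits = 0
    for patient in records:
        for visit in patient:
            n_visits += 1
            counts.update(visit)  # each code counted at most once per visit
    return [counts[c] / n_visits for c in vocab]


def r_squared(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov * cov / (var_a * var_b)


vocab = ["A", "B", "C"]
real = [[{"A", "B"}, {"A"}], [{"C"}]]
synthetic = [[{"A"}, {"A", "C"}], [{"B"}], [{"A"}]]
p_real = code_probabilities(real, vocab)            # [2/3, 1/3, 1/3]
p_synth = code_probabilities(synthetic, vocab)      # [0.75, 0.25, 0.25]
print(r_squared(p_real, p_synth))
```

The same comparison extends to within-visit co-occurrence pairs or consecutive-visit transitions by swapping the vocabulary of single codes for pairs, which is how the dimensionality grows from about 10,000 to millions.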

Theodorou Brandon, Xiao Cao, Sun Jimeng

2023-Mar-10

General

Direct prediction of Homologous Recombination Deficiency from routine histology in ten different tumor types with attention-based Multiple Instance Learning: a development and validation study.

In medRxiv : the preprint server for health sciences

BACKGROUND : Homologous Recombination Deficiency (HRD) is a pan-cancer predictive biomarker that identifies patients who benefit from therapy with PARP inhibitors (PARPi). However, testing for HRD is highly complex. Here, we investigated whether deep learning can predict HRD status solely from routine hematoxylin and eosin (H&E) histology images in ten cancer types.

METHODS : We developed a fully automated deep learning pipeline with attention-weighted multiple instance learning (attMIL) to predict HRD status from histology images. A combined genomic scar HRD score, which integrated loss of heterozygosity (LOH), telomeric allelic imbalance (TAI), and large-scale state transitions (LST), was calculated from whole genome sequencing data for n = 4,565 patients from two independent cohorts. The primary statistical endpoint was the area under the receiver operating characteristic curve (AUROC) for predicting genomic scar HRD status at a clinically used cutoff value.
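The two pieces of this pipeline can be sketched as follows, with loudly stated assumptions. First, the ground-truth label: the sketch treats the combined genomic scar score as the unweighted sum LOH + TAI + LST and binarizes it at a cutoff passed in as a parameter, because the abstract does not state the value. Second, the slide-level model: attMIL is sketched here as the attention pooling of Ilse et al. (2018), in which each tile embedding gets a learned score, the scores are softmax-normalized into attention weights, and the slide embedding is the attention-weighted sum of tile embeddings. The weight matrices `V`, `w`, and `beta` would be learned in practice; here they are hand-set for illustration, and the exact architecture used in the study is not given in the abstract.

```python
import math
from typing import Sequence


def hrd_label(loh: int, tai: int, lst: int, cutoff: int) -> int:
    """Binary HRD label from an assumed unweighted sum score (LOH + TAI + LST)."""
    return int(loh + tai + lst >= cutoff)


def attention_mil_pool(tiles: Sequence[Sequence[float]],
                       V: Sequence[Sequence[float]],
                       w: Sequence[float]) -> tuple[list[float], list[float]]:
    """Attention pooling over tile embeddings h_k.

    score_k = w . tanh(V h_k); a = softmax(scores); z = sum_k a_k h_k.
    Returns (slide embedding z, attention weights a).
    """
    scores = []
    for h in tiles:
        hidden = [math.tanh(sum(v_ij * h_j for v_ij, h_j in zip(row, h))) for row in V]
        scores.append(sum(w_i * u_i for w_i, u_i in zip(w, hidden)))
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(tiles[0])
    z = [sum(a * h[j] for a, h in zip(attn, tiles)) for j in range(dim)]
    return z, attn


def slide_probability(z: Sequence[float], beta: Sequence[float], bias: float = 0.0) -> float:
    """Logistic classifier head on the pooled slide embedding."""
    logit = sum(b * zj for b, zj in zip(beta, z)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


tiles = [[1.0, 0.0], [0.0, 1.0]]  # two toy tile embeddings
z, attn = attention_mil_pool(tiles, V=[[1.0, 0.0], [0.0, 1.0]], w=[1.0, -1.0])
print(attn)  # the first tile receives more attention
print(slide_probability(z, beta=[2.0, -2.0]))
```

For the label, 42 is a threshold widely used for this unweighted sum score, but whether the study used it is an assumption; the sketch therefore requires the cutoff as an explicit argument. A side benefit of attention pooling is interpretability: the weights `attn` indicate which tiles drove the slide-level prediction.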

RESULTS : We found that HRD status is predictable in tumors of the endometrium, pancreas, and lung, reaching cross-validated AUROCs of 0.79, 0.58, and 0.66, respectively. Predictions generalized well to an external cohort, with AUROCs of 0.93, 0.81, and 0.73, respectively. Additionally, an HRD classifier trained on breast cancer yielded an AUROC of 0.78 in internal validation and was able to predict HRD in endometrial, prostate, and pancreatic cancer with AUROCs of 0.87, 0.84, and 0.67, indicating a shared HRD-like phenotype across tumor entities.
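AUROC, the endpoint reported throughout these results, has a direct probabilistic reading: it is the chance that a randomly chosen HRD-positive case receives a higher model score than a randomly chosen HRD-negative case, with ties counted as half. The sketch below computes it exactly that way (the Mann-Whitney formulation) using pairwise comparisons for clarity; the scores are made up, and library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:          # O(P * N) pairwise comparison, fine for illustration
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
print(auroc([1, 0], [0.3, 0.3]))                  # 0.5
```

Under this reading an AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which helps put values such as 0.58 and 0.93 in context.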

CONCLUSION : In this study, we show that HRD is directly predictable from H&E slides using attMIL within and across ten different tumor types.

Lavinia Loeffler Chiara Maria, El Nahhas Omar S M, Muti Hannah Sophie, Seibel Tobias, Cifci Didem, van Treeck Marko, Gustav Marco, Carrero Zunamys I, Gaisa Nadine T, Lehmann Kjong-Van, Leary Alexandra, Selenica Pier, Reis-Filho Jorge S, Bruechle Nadina Ortiz, Kather Jakob Nikolas

2023-Mar-10