Receive a weekly summary and discussion of the top papers of the week by leading researchers in the field.

General

Symptom extraction from the narratives of personal experiences with COVID-19 on Reddit

ArXiv Preprint

Social media discussion of COVID-19 provides a rich source of information about how the virus affects people's lives, qualitatively different from traditional public health datasets. In particular, when individuals self-report their experiences over the course of the illness on social media, it allows identification of the emotions that each stage of symptoms engenders in the patient. Posts to the Reddit forum r/COVID19Positive contain first-hand accounts from COVID-19-positive patients, giving insight into personal struggles with the virus. These posts often feature a temporal structure indicating the number of days after symptom onset the text refers to. Using topic modelling and sentiment analysis, we quantify the change in discussion of COVID-19 throughout individuals' experiences over the first 14 days after symptom onset. Discourse on early symptoms such as fever, cough, and sore throat was concentrated towards the beginning of the posts, while language indicating breathing issues peaked around day ten. Some conversation around critical cases was also identified and appeared at a roughly constant rate. We identified two clear clusters of positive and negative emotions associated with the evolution of these symptoms and mapped their relationships. Our results provide a perspective on the patient experience of COVID-19 that complements other medical data streams and can potentially reveal when mental health issues might appear.
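As a rough illustration of the temporal text analysis described here (a minimal pure-Python sketch, not the authors' topic-modelling or sentiment pipeline; the posts, day indices, and symptom vocabulary below are invented for the example), one can bucket symptom-term mentions by the day since symptom onset:

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus: (day_since_symptom_onset, post_text) pairs,
# mimicking the temporal structure of r/COVID19Positive posts.
posts = [
    (1, "day 1: fever and a sore throat, mild cough"),
    (2, "fever broke overnight but the cough is worse"),
    (9, "day 9: short of breath, breathing feels tight"),
    (10, "breathing is still hard, chest pressure"),
]

# Illustrative symptom vocabulary (not the paper's topic-model output).
SYMPTOM_TERMS = {"fever", "cough", "throat", "breath", "breathing"}

def term_frequency_by_day(posts):
    """Count symptom-term mentions, grouped by day since symptom onset."""
    freq = defaultdict(Counter)
    for day, text in posts:
        for token in text.lower().replace(",", " ").replace(":", " ").split():
            if token in SYMPTOM_TERMS:
                freq[day][token] += 1
    return freq

freq = term_frequency_by_day(posts)
```

Plotting such per-day counts over a large corpus is one simple way to see the pattern the abstract reports: fever/cough terms concentrated early, breathing terms peaking later.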

Curtis Murray, Lewis Mitchell, Jonathan Tuke, Mark Mackay


Oncology

Use of Deep Learning to Develop and Analyze Computational Hematoxylin and Eosin Staining of Prostate Core Biopsy Images for Tumor Diagnosis.

In JAMA Network Open

Importance : Histopathological diagnosis of tumors from tissue biopsy after hematoxylin and eosin (H&E) dye staining is the criterion standard for oncological care, but H&E staining requires trained operators, dyes and reagents, and precious tissue samples that cannot be reused.

Objectives : To use deep learning algorithms to develop models that perform accurate computational H&E staining of native nonstained prostate core biopsy images and to develop methods for interpretation of H&E staining deep learning models and analysis of computationally stained images by computer vision and clinical approaches.

Design, Setting, and Participants : This cross-sectional study used hundreds of thousands of native nonstained RGB (red, green, and blue channel) whole slide image (WSI) patches of prostate core tissue biopsies obtained from excess tissue material from prostate core biopsies performed in the course of routine clinical care between January 7, 2014, and January 7, 2017, at Brigham and Women's Hospital, Boston, Massachusetts. Biopsies were registered with their H&E-stained versions. Conditional generative adversarial neural networks (cGANs) that automate conversion of native nonstained RGB WSI to computational H&E-stained images were then trained. Deidentified whole slide images of prostate core biopsy and medical record data were transferred to Massachusetts Institute of Technology, Cambridge, for computational research. Results were shared with physicians for clinical evaluations. Data were analyzed from July 2018 to February 2019.

Main Outcomes and Measures : Methods for detailed computer vision image analytics, visualization of trained cGAN model outputs, and clinical evaluation of virtually stained images were developed. The main outcome was interpretable deep learning models and computational H&E-stained images that achieved high performance in these metrics.

Results : Among 38 patients who provided samples, single core biopsy images were extracted from each whole slide, resulting in 102 individual nonstained and H&E dye-stained image pairs that were compared with matched computationally stained and unstained images. Calculations showed high similarities between computationally and H&E dye-stained images, with a mean (SD) structural similarity index (SSIM) of 0.902 (0.026), Pearson correlation coefficient (PCC) of 0.962 (0.096), and peak signal to noise ratio (PSNR) of 22.821 (1.232) dB. A second cGAN performed accurate computational destaining of H&E-stained images back to their native nonstained form, with a mean (SD) SSIM of 0.900 (0.030), PCC of 0.963 (0.011), and PSNR of 25.646 (1.943) dB compared with native nonstained images. A single-blind prospective study computed approximately 95% pixel-by-pixel overlap among prostate tumor annotations provided by 5 board-certified pathologists on computationally stained images, compared with those on H&E dye-stained images. This study also presented the first visualization and explanation of neural network kernel activation maps during H&E staining and destaining of RGB images by cGANs. High similarities between kernel activation maps of computationally and H&E-stained images (mean-squared errors <0.0005) provide additional mathematical and mechanistic validation of the staining system.
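The image-similarity metrics reported here are standard and straightforward to reproduce. A minimal pure-Python sketch of two of them, PSNR and the Pearson correlation coefficient, computed over flattened pixel vectors (the values in the usage example are invented, not the study's data):

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-sized images
    given as flat pixel lists; higher means more similar."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

def pearson(img_a, img_b):
    """Pearson correlation coefficient between two pixel vectors."""
    n = len(img_a)
    mean_a = sum(img_a) / n
    mean_b = sum(img_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(img_a, img_b))
    var_a = sum((a - mean_a) ** 2 for a in img_a)
    var_b = sum((b - mean_b) ** 2 for b in img_b)
    return cov / math.sqrt(var_a * var_b)
```

In practice these would be computed per image pair (as here, per slide) and then averaged; SSIM, the third reported metric, additionally compares local luminance, contrast, and structure windows.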

Conclusions and Relevance : These findings suggest that computational H&E staining of native unlabeled RGB images of prostate core biopsy could reproduce Gleason grade tumor signatures that were easily assessed and validated by clinicians. Methods for benchmarking, visualization, and clinical validation of deep learning models and virtually H&E-stained images communicated in this study have wide applications in clinical informatics and oncology research. Clinical researchers may use these systems for early indications of possible abnormalities in native nonstained tissue biopsies prior to histopathological workflows.

Rana Aman, Lowe Alarice, Lithgow Marie, Horback Katharine, Janovitz Tyler, Da Silva Annacarolina, Tsai Harrison, Shanmugam Vignesh, Bayat Akram, Shah Pratik


General

Placental Flattening via Volumetric Parameterization.

In Medical Image Computing and Computer-Assisted Intervention (MICCAI)

We present a volumetric mesh-based algorithm for flattening the placenta to a canonical template to enable effective visualization of local anatomy and function. Monitoring placental function in vivo promises to support pregnancy assessment and to improve care outcomes. We aim to alleviate visualization and interpretation challenges presented by the shape of the placenta when it is attached to the curved uterine wall. To do so, we flatten the volumetric mesh that captures placental shape to resemble the well-studied ex vivo shape. We formulate our method as a map from the in vivo shape to a flattened template that minimizes the symmetric Dirichlet energy to control distortion throughout the volume. Local injectivity is enforced via constrained line search during gradient descent. We evaluate the proposed method on 28 placenta shapes extracted from MRI scans in a clinical study of placental function. We achieve sub-voxel accuracy in mapping the boundary of the placenta to the template while successfully controlling distortion throughout the volume. We illustrate how the resulting mapping of the placenta enhances visualization of placental anatomy and function. Our implementation is freely available at
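The distortion measure named here, the symmetric Dirichlet energy, can be written down concretely. A minimal sketch for a single 2x2 Jacobian (a simplification of the paper's volumetric 3D tetrahedral setting; the example matrices are invented):

```python
def symmetric_dirichlet_2d(J):
    """Symmetric Dirichlet energy ||J||_F^2 + ||J^-1||_F^2 of a 2x2
    Jacobian J (a list of two rows). The energy blows up as det(J) -> 0,
    which is what penalizes element collapse; a flipped (det <= 0)
    element is treated as infinitely bad, mirroring the injectivity
    constraint enforced in the paper via constrained line search."""
    (a, b), (c, d) = J
    det = a * d - b * c
    if det <= 0:
        return float("inf")  # non-injective (inverted) element
    fro2 = a * a + b * b + c * c + d * d
    # For a 2x2 matrix, ||J^-1||_F^2 = ||J||_F^2 / det^2.
    return fro2 + fro2 / det ** 2

identity = [[1.0, 0.0], [0.0, 1.0]]   # distortion-free map: energy 4
squashed = [[1.0, 0.0], [0.0, 0.1]]   # strong anisotropic squash
```

The minimum value (4 in 2D, attained at rotations) makes the energy a natural "distance from isometry"; gradient descent with a line search that never crosses det(J) = 0 keeps every element uninverted, which is the local-injectivity guarantee the abstract refers to.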

Abulnaga S Mazdak, Turk Esra Abaci, Bessmeltsev Mikhail, Grant P Ellen, Solomon Justin, Golland Polina


Anatomy visualization, Fetal MRI, Flattening, Injective maps, Placenta, Volumetric mesh parameterization

General

Unsupervised Deep Learning for Bayesian Brain MRI Segmentation.

In Medical Image Computing and Computer-Assisted Intervention (MICCAI)

Probabilistic atlas priors have been commonly used to derive adaptive and robust brain MRI segmentation algorithms. Widely-used neuroimage analysis pipelines rely heavily on these techniques, which are often computationally expensive. In contrast, there has been a recent surge of approaches that leverage deep learning to implement segmentation tools that are computationally efficient at test time. However, most of these strategies rely on learning from manually annotated images. These supervised deep learning methods are therefore sensitive to the intensity profiles in the training dataset. To develop a deep learning-based segmentation model for a new image dataset (e.g., of different contrast), one usually needs to create a new labeled training dataset, which can be prohibitively expensive, or rely on suboptimal ad hoc adaptation or augmentation approaches. In this paper, we propose an alternative strategy that combines a conventional probabilistic atlas-based segmentation with deep learning, enabling one to train a segmentation model for new MRI scans without the need for any manually segmented images. Our experiments include thousands of brain MRI scans and demonstrate that the proposed method achieves good accuracy for a brain MRI segmentation task for different MRI contrasts, requiring only approximately 15 seconds at test time on a GPU.
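The atlas-prior idea can be illustrated at the level of a single voxel: the posterior over labels is proportional to the atlas prior times an intensity likelihood. A minimal pure-Python sketch with an invented two-label Gaussian model (this is the classical Bayesian ingredient the paper builds on, not its network or fitted parameters):

```python
import math

def gaussian(x, mu, sigma):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_label(intensity, prior, params):
    """Posterior over labels at one voxel: atlas prior times a Gaussian
    intensity likelihood, normalized over labels.
    prior:  {label: atlas probability at this voxel}
    params: {label: (mu, sigma) of the intensity model}"""
    unnorm = {k: prior[k] * gaussian(intensity, *params[k]) for k in prior}
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}

# Hypothetical two-label example (gray vs white matter); all numbers invented.
prior = {"GM": 0.6, "WM": 0.4}
params = {"GM": (80.0, 10.0), "WM": (120.0, 10.0)}
post = posterior_label(115.0, prior, params)  # bright voxel -> WM favored
```

Because the likelihood parameters are estimated from the images themselves rather than from manual labels, this per-voxel formulation is what lets the approach adapt to a new MRI contrast without a newly annotated training set; the paper's contribution is amortizing this inference with a neural network so it runs in seconds.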

Dalca Adrian V, Yu Evan, Golland Polina, Fischl Bruce, Sabuncu Mert R, Iglesias Juan Eugenio


Bayesian Modeling, Brain MRI, Convolutional Neural Networks, Deep Learning, Segmentation, Unsupervised learning

General

Disease Knowledge Transfer across Neurodegenerative Diseases.

In Medical Image Computing and Computer-Assisted Intervention (MICCAI)

We introduce Disease Knowledge Transfer (DKT), a novel technique for transferring biomarker information between related neurodegenerative diseases. DKT infers robust multimodal biomarker trajectories in rare neurodegenerative diseases, even when only limited unimodal data are available, by transferring information from larger multimodal datasets of common neurodegenerative diseases. DKT is a joint-disease generative model of biomarker progressions that exploits biomarker relationships shared across diseases. Our proposed method allows, for the first time, the estimation of plausible multimodal biomarker trajectories in Posterior Cortical Atrophy (PCA), a rare neurodegenerative disease for which only unimodal MRI data are available. For this we train DKT on a combined dataset containing subjects with two distinct diseases and different amounts of available data: 1) a larger, multimodal typical Alzheimer's disease (tAD) dataset from the TADPOLE Challenge, and 2) a smaller, unimodal PCA dataset from the Dementia Research Centre (DRC), for which only a limited number of MRI scans are available. Although validation is challenging due to the lack of data in PCA, we validate DKT on synthetic data and on two patient datasets (the TADPOLE and PCA cohorts), showing that it can recover the ground-truth parameters in simulation and predict unseen biomarkers on the two patient datasets. While we demonstrate DKT on Alzheimer's variants, we note that DKT is generalisable to other related neurodegenerative diseases. Source code for DKT is available online:
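Disease progression models of this kind commonly represent each biomarker's trajectory as a sigmoid in disease stage, with subjects from different cohorts placed on a shared timeline. A minimal sketch of such a parametric trajectory (an illustrative functional form often used in this literature, not DKT's actual model or any fitted parameters):

```python
import math

def sigmoid_trajectory(t, lo, hi, slope, t_mid):
    """Sigmoidal biomarker trajectory over disease stage t:
    rises from baseline `lo` to plateau `hi`, steepest at t_mid."""
    return lo + (hi - lo) / (1.0 + math.exp(-slope * (t - t_mid)))

# Hypothetical atrophy biomarker: 0 (healthy) -> 1 (fully abnormal),
# transition centered at stage 0 with unit slope (invented numbers).
early = sigmoid_trajectory(-5.0, 0.0, 1.0, 1.0, 0.0)
mid   = sigmoid_trajectory(0.0, 0.0, 1.0, 1.0, 0.0)
late  = sigmoid_trajectory(5.0, 0.0, 1.0, 1.0, 0.0)
```

The transfer idea is then that trajectories of different biomarkers are tied together through shared, disease-agnostic relationships, so parameters learned from a data-rich disease (tAD) constrain the trajectories inferred for a data-poor one (PCA).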

Marinescu Răzvan V, Lorenzi Marco, Blumberg Stefano B, Young Alexandra L, Planell-Morell Pere, Oxtoby Neil P, Eshaghi Arman, Yong Keir X, Crutch Sebastian J, Golland Polina, Alexander Daniel C


Alzheimer’s Disease, Disease Progression Modelling, Manifold Learning, Posterior Cortical Atrophy, Transfer Learning

General

Ensembles of Hydrophobicity Scales as Potent Classifiers for Chimeric Virus-Like Particle Solubility - An Amino Acid Sequence-Based Machine Learning Approach.

In Frontiers in bioengineering and biotechnology

Virus-like particles (VLPs) are protein-based nanoscale structures that show high potential as immunotherapeutics or cargo delivery vehicles. Chimeric VLPs are decorated with foreign peptides, resulting in structures that confer immune responses against the displayed epitope. However, insertion of foreign sequences often results in insoluble proteins, calling for methods capable of assessing a VLP candidate's solubility in silico. The prediction of VLP solubility requires a model that can identify critical hydrophobicity-related parameters, distinguishing between VLP-forming aggregation and aggregation leading to insoluble virus protein clusters. Therefore, we developed and implemented a soft ensemble vote classifier (sEVC) framework based on chimeric hepatitis B core antigen (HBcAg) amino acid sequences and 91 publicly available hydrophobicity scales. Based on each hydrophobicity scale, an individual decision tree was induced as a classifier in the sEVC. An embedded feature selection algorithm and stratified sampling proved beneficial for model construction. In a learning experiment, model performance was explored over the space of training set size and number of classifiers included in the sEVC. Additionally, seven models were created from training data of 24-384 chimeric HBcAg constructs, which were validated by 100-fold Monte Carlo cross-validation. The models predicted external test sets of 184-544 chimeric HBcAg constructs. The best models showed a Matthews correlation coefficient of >0.6 on both the validation and the external test set. Feature selection was evaluated for classifiers with the best and worst performance in the chimeric HBcAg VLP solubility scenario. Analysis of the associated hydrophobicity scales allowed for retrieval of biological information related to the mechanistic backgrounds of VLP solubility, suggesting a special role of arginine for VLP assembly and solubility. In the future, the developed sEVC could further be applied to hydrophobicity-related problems in other domains, such as monoclonal antibodies.
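The soft-voting step of an sEVC is simple to state: average the class-probability vectors produced by the individual per-scale classifiers and pick the class with the highest mean probability. A minimal pure-Python sketch (the probability vectors below are invented; in the paper they would come from the 91 per-hydrophobicity-scale decision trees):

```python
def soft_vote(prob_lists):
    """Soft ensemble vote: average class-probability vectors from the
    individual classifiers (one per hydrophobicity scale) and return
    (winning class index, averaged probability vector)."""
    n = len(prob_lists)
    k = len(prob_lists[0])
    avg = [sum(p[j] for p in prob_lists) / n for j in range(k)]
    return max(range(k), key=lambda j: avg[j]), avg

# Hypothetical outputs of three tree classifiers for the two classes
# (index 0 = insoluble, index 1 = soluble); numbers are invented.
probs = [[0.8, 0.2], [0.4, 0.6], [0.2, 0.8]]
label, avg = soft_vote(probs)  # soluble wins despite one dissenting tree
```

Unlike hard majority voting, soft voting lets a confident minority classifier outweigh lukewarm opponents, which is why embedded feature selection (dropping the weakest scales) matters for the ensemble's overall performance.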

Vormittag Philipp, Klamp Thorsten, Hubbuch Jürgen


feature selection, hydrophobicity, hydrophobicity scales, machine learning, solubility, virus-like particles