
General

Ambiguity in medical concept normalization: An analysis of types and coverage in electronic health record datasets.

In Journal of the American Medical Informatics Association : JAMIA

OBJECTIVES : Normalizing mentions of medical concepts to standardized vocabularies is a fundamental component of clinical text analysis. Ambiguity (words or phrases that may refer to different concepts) has been extensively researched as part of information extraction from biomedical literature, but less is known about the types and frequency of ambiguity in clinical text. This study characterizes the distribution and distinct types of ambiguity exhibited by benchmark clinical concept normalization datasets, in order to identify directions for advancing medical concept normalization research.

MATERIALS AND METHODS : We identified ambiguous strings in datasets derived from the 2 available clinical corpora for concept normalization and categorized the distinct types of ambiguity they exhibited. We then compared observed string ambiguity in the datasets with potential ambiguity in the Unified Medical Language System (UMLS) to assess how representative available datasets are of ambiguity in clinical language.

RESULTS : We found that <15% of strings were ambiguous within the datasets, while over 50% were ambiguous in the UMLS, indicating only partial coverage of clinical ambiguity. The percentage of strings in common between any pair of datasets ranged from 2% to only 36%; of these, 40% were annotated with different sets of concepts, severely limiting generalization. Finally, we observed 12 distinct types of ambiguity, distributed unequally across the available datasets, reflecting diverse linguistic and medical phenomena.

DISCUSSION : Existing datasets are not sufficient to cover the diversity of clinical concept ambiguity, limiting both training and evaluation of normalization methods for clinical text. Additionally, the UMLS offers important semantic information for building and evaluating normalization methods.

CONCLUSIONS : Our findings identify 3 opportunities for concept normalization research, including a need for ambiguity-specific clinical datasets and leveraging the rich semantics of the UMLS in new methods and evaluation measures for normalization.

Newman-Griffis Denis, Divita Guy, Desmet Bart, Zirikly Ayah, Rosé Carolyn P, Fosler-Lussier Eric

2020-Dec-15

Unified Medical Language System, controlled, machine learning, natural language processing, semantics, vocabulary
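The RESULTS above rest on two simple counts: the share of distinct strings annotated with more than one concept, and, for strings two datasets have in common, the share annotated with different concept sets. A minimal Python sketch of both counts follows. It assumes each dataset is a list of (mention string, concept ID) pairs, that strings are compared case-insensitively, and that "in common" is measured against the union of both datasets' strings; the function names and the `CUI-n` identifiers are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def concept_sets(annotations):
    """Map each lower-cased mention string to the set of concept IDs it was annotated with."""
    sets = defaultdict(set)
    for string, concept_id in annotations:
        sets[string.lower()].add(concept_id)
    return dict(sets)


def ambiguity_rate(annotations):
    """Fraction of distinct strings annotated with more than one concept."""
    sets = concept_sets(annotations)
    ambiguous = sum(1 for concepts in sets.values() if len(concepts) > 1)
    return ambiguous / len(sets)


def overlap_stats(annotations_a, annotations_b):
    """Return (shared strings / union of strings, fraction of shared strings with different concept sets)."""
    a, b = concept_sets(annotations_a), concept_sets(annotations_b)
    shared = a.keys() & b.keys()
    union = a.keys() | b.keys()
    disagree = sum(1 for s in shared if a[s] != b[s])
    shared_fraction = len(shared) / len(union)
    disagree_fraction = disagree / len(shared) if shared else 0.0
    return shared_fraction, disagree_fraction


# Hypothetical toy datasets; the concept IDs are placeholders, not real UMLS CUIs.
dataset_1 = [("cold", "CUI-1"), ("Cold", "CUI-2"), ("fever", "CUI-3"), ("MS", "CUI-4")]
dataset_2 = [("cold", "CUI-1"), ("ms", "CUI-5"), ("cough", "CUI-6")]

print(ambiguity_rate(dataset_1))            # 1 of 3 distinct strings is ambiguous
print(overlap_stats(dataset_1, dataset_2))  # (0.5, 1.0)
```

Comparing concept *sets* rather than single labels mirrors the abstract's phrasing ("annotated with different sets of concepts"); a stricter or looser matching rule would change the disagreement figure.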

General

The default network of the human brain is associated with perceived social isolation.

In Nature communications ; h5-index 260.0

Humans survive and thrive through social exchange. Yet, social dependency also comes at a cost. Perceived social isolation, or loneliness, affects physical and mental health, cognitive performance, and overall life expectancy, and it increases vulnerability to Alzheimer's disease-related dementias. Despite its severe consequences for behavior and health, the neural basis of loneliness remains elusive. Using the UK Biobank population imaging-genetics cohort (n = ~40,000, aged 40-69 years when recruited, mean age = 54.9), we test for signatures of loneliness in grey matter morphology, intrinsic functional coupling, and fiber tract microstructure. The loneliness-linked neurobiological profiles converge on a collection of brain regions known as the 'default network'. This higher associative network shows more consistent loneliness associations in grey matter volume than other cortical brain networks. Lonely individuals display stronger functional communication in the default network, and greater microstructural integrity of its fornix pathway. The findings fit with the possibility that the up-regulation of these neural circuits supports mentalizing, reminiscence and imagination to fill the social void.

Spreng R Nathan, Dimas Emile, Mwilambwe-Tshilobo Laetitia, Dagher Alain, Koellinger Philipp, Nave Gideon, Ong Anthony, Kernbach Julius M, Wiecki Thomas V, Ge Tian, Li Yue, Holmes Avram J, Yeo B T Thomas, Turner Gary R, Dunbar Robin I M, Bzdok Danilo

2020-Dec-15

General

Emerging Materials for Neuromorphic Devices and Systems.

In iScience

Neuromorphic devices and systems have attracted attention as next-generation computing platforms due to their high efficiency in processing complex data. So far, they have been demonstrated using both machine-learning software and complementary metal-oxide-semiconductor-based hardware. However, these approaches have drawbacks in power consumption and learning speed. An energy-efficient neuromorphic computing system requires hardware that can mimic the functions of a brain. Therefore, various materials have been introduced for the development of neuromorphic devices. Here, recent advances in neuromorphic devices are reviewed. First, the functions of biological synapses and neurons are discussed. Next, deep neural networks and spiking neural networks are described. Then, the operation mechanism and the neuromorphic functions of emerging devices are reviewed. Finally, the challenges and prospects for developing neuromorphic devices that use emerging materials are discussed.

Kim Min-Kyu, Park Youngjun, Kim Ik-Jyae, Lee Jang-Sik

2020-Dec-18

Devices, Electronic Materials, Materials Design, Memory Structure
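The review describes spiking neural networks among the models that neuromorphic hardware aims to run. As a rough illustration of the neuron behavior such hardware has to emulate, here is a minimal leaky integrate-and-fire (LIF) neuron, a standard textbook spiking unit, integrated with a forward-Euler step. The LIF model is chosen here only as an example and is not necessarily the device model discussed in the review; all parameter values are arbitrary, dimensionless assumptions.

```python
def simulate_lif(input_current, dt=1.0, tau=20.0, v_rest=0.0,
                 v_threshold=1.0, v_reset=0.0, resistance=1.0):
    """Simulate a leaky integrate-and-fire neuron.

    Returns the membrane-potential trace and the time steps at which spikes occurred.
    """
    v = v_rest
    trace, spikes = [], []
    for step, current in enumerate(input_current):
        # Forward-Euler update of tau * dv/dt = -(v - v_rest) + R * I
        v += (dt / tau) * (-(v - v_rest) + resistance * current)
        if v >= v_threshold:
            spikes.append(step)  # emit a spike...
            v = v_reset          # ...and reset the membrane potential
        trace.append(v)
    return trace, spikes


# A constant suprathreshold input produces regular spiking;
# a weak input settles below threshold and never fires.
_, strong_spikes = simulate_lif([2.0] * 100)
_, weak_spikes = simulate_lif([0.5] * 100)
print(strong_spikes[:3], weak_spikes)
```

The leak term is what distinguishes this unit from a plain integrator: input that is too weak to push the potential past threshold before it decays produces no spikes at all.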

General

Revealing Epigenetic Factors of circRNA Expression by Machine Learning in Various Cellular Contexts.

In iScience

Circular RNAs (circRNAs) have been identified as naturally occurring RNAs that are highly represented in the eukaryotic transcriptome. Although a large number of circRNAs have been reported, the underlying regulatory mechanism of circRNA biogenesis remains largely unknown. Here, we integrated in-depth multi-omics data including epigenome, transcriptome, and non-coding RNA and identified candidate circRNAs in six cellular contexts. Next, circRNAs were divided into two classes (high versus low) according to their expression levels. Machine learning models were constructed that predicted circRNA expression levels based on 11 different histone modifications and host gene expression. We found that the models achieve high accuracy in predicting highly versus lowly expressed circRNAs. Furthermore, the expression levels of host genes of circRNAs, H3K36me3, H3K79me2, and H4K20me1 contributed greatly to the classification models in six cellular contexts. In summary, all these results suggest that epigenetic modifications, particularly histone modifications, can effectively predict expression levels of circRNAs.

Zhang Mengying, Xu Kang, Fu Limei, Wang Qi, Chang Zhenghong, Zou Haozhe, Zhang Yan, Li Yongsheng

2020-Dec-18

Bioinformatics, Omics, Transcriptomics

Internal Medicine

Machine Learning Analysis of the Bleomycin Mouse Model Reveals the Compartmental and Temporal Inflammatory Pulmonary Fingerprint.

In iScience

The bleomycin mouse model is an extensively used model to study pulmonary fibrosis; however, the inflammatory cell kinetics and their compartmentalization are still incompletely understood. Here we assembled historical flow cytometry data, totaling 303 samples and 16 inflammatory-cell populations, and applied advanced data modeling and machine learning methods to conclusively detail these kinetics. Three days post-bleomycin, the inflammatory profile was typified by acute innate inflammation, pronounced neutrophilia, especially of SiglecF+ neutrophils, and alveolar macrophage loss. Between 14 and 21 days, rapid responders were increasingly replaced by T and B cells and monocyte-derived alveolar macrophages. Multicolour imaging revealed the spatial-temporal cell distribution and the close association of T cells with deposited collagen. Unbiased immunophenotyping and data modeling exposed the dynamic shifts in immune-cell composition over the course of bleomycin-triggered lung injury. These results provide a reference point for future investigations, and the workflow can easily be applied to the analysis of other datasets.

Bordag Natalie, Biasin Valentina, Schnoegl Diana, Valzano Francesco, Jandl Katharina, Nagy Bence M, Sharma Neha, Wygrecka Malgorzata, Kwapiszewska Grazyna, Marsh Leigh M

2020-Dec-18

Artificial Intelligence, Immune Response, Immunology

General

Cellpose: a generalist algorithm for cellular segmentation.

In Nature methods ; h5-index 152.0

Many biological applications require the segmentation of cell bodies, membranes and nuclei from microscopy images. Deep learning has enabled great progress on this problem, but current methods are specialized for images that have large training datasets. Here we introduce a generalist, deep learning-based segmentation method called Cellpose, which can precisely segment cells from a wide range of image types and does not require model retraining or parameter adjustments. Cellpose was trained on a new dataset of highly varied images of cells, containing over 70,000 segmented objects. We also demonstrate a three-dimensional (3D) extension of Cellpose that reuses the two-dimensional (2D) model and does not require 3D-labeled data. To support community contributions to the training data, we developed software for manual labeling and for curation of the automated results. Periodically retraining the model on the community-contributed data will ensure that Cellpose improves constantly.

Stringer Carsen, Wang Tim, Michaelos Michalis, Pachitariu Marius

2020-Dec-14

General

High-throughput phenotyping with temporal sequences.

In Journal of the American Medical Informatics Association : JAMIA

OBJECTIVE : High-throughput electronic phenotyping algorithms can accelerate translational research using data from electronic health record (EHR) systems. The temporal information buried in EHRs is often underutilized in developing computational phenotypic definitions. This study aims to develop a high-throughput phenotyping method, leveraging temporal sequential patterns from EHRs.

MATERIALS AND METHODS : We develop a representation mining algorithm to extract 5 classes of representations from EHR diagnosis and medication records: the aggregated vector of the records (aggregated vector representation), the standard sequential patterns (sequential pattern mining), the transitive sequential patterns (transitive sequential pattern mining), and 2 hybrid classes. Using EHR data on 10 phenotypes from the Mass General Brigham Biobank, we train and validate phenotyping algorithms.

RESULTS : Phenotyping with temporal sequences resulted in a superior classification performance across all 10 phenotypes compared with the standard representations in electronic phenotyping. The high-throughput algorithm's classification performance was superior or similar to the performance of previously published electronic phenotyping algorithms. We characterize and evaluate the top transitive sequences of diagnosis records paired with the records of risk factors, symptoms, complications, medications, or vaccinations.

DISCUSSION : The proposed high-throughput phenotyping approach enables seamless discovery of sequential record combinations that may be difficult to infer from raw EHR data. Transitive sequences offer more accurate characterization of the phenotype, compared with its individual components, and reflect the actual lived experiences of the patients with that particular disease.

CONCLUSION : Sequential data representations provide a precise mechanism for incorporating raw EHR records into downstream machine learning. Our approach starts with user interpretability and works backward to the technology.

Estiri Hossein, Strasser Zachary H, Murphy Shawn N

2020-Dec-14

electronic health records, phenotyping, sequential pattern mining, temporal data representation
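The core idea behind transitive sequential patterns is that every ordered pair of records in a patient's history, not just adjacent ones, becomes a candidate feature. The sketch below is a simplified reading of that idea, not the authors' algorithm: it assumes each patient's history is a list of (date, code) tuples, keeps a pair only when the first record is strictly earlier, and omits the frequency filtering and hybrid representations the paper describes. The code strings are hypothetical.

```python
from datetime import date
from itertools import combinations


def transitive_sequences(records):
    """Return the set of (earlier_code, later_code) pairs for one patient.

    records: iterable of (date, code) tuples in any order.
    Every pair in which the first record strictly precedes the second is kept,
    including non-adjacent ones, which is what makes the pattern "transitive".
    """
    ordered = sorted(records)  # chronological order (ties broken by code)
    pairs = set()
    for (d1, c1), (d2, c2) in combinations(ordered, 2):
        if d1 < d2 and c1 != c2:
            pairs.add((c1, c2))
    return pairs


patient = [
    (date(2019, 5, 2), "rx:antihypertensive"),
    (date(2018, 1, 10), "dx:hypertension"),
    (date(2020, 3, 7), "dx:stroke"),
    (date(2020, 3, 7), "rx:anticoagulant"),  # same day as the stroke: no order inferred
]
for pair in sorted(transitive_sequences(patient)):
    print(" -> ".join(pair))
```

Because all ordered pairs are kept, a history of n records yields up to n(n-1)/2 candidate features, which is why some form of downstream feature selection is needed in practice.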

Oncology

Automated model versus treating physician for predicting survival time of patients with metastatic cancer.

In Journal of the American Medical Informatics Association : JAMIA

OBJECTIVE : Being able to predict a patient's life expectancy can help doctors and patients prioritize treatments and supportive care. For predicting life expectancy, physicians have been shown to outperform traditional models that use only a few predictor variables. It is possible that a machine learning model that uses many predictor variables and diverse data sources from the electronic medical record can improve on physicians' performance. For patients with metastatic cancer, we compared accuracy of life expectancy predictions by the treating physician, a machine learning model, and a traditional model.

MATERIALS AND METHODS : A machine learning model was trained using 14 600 metastatic cancer patients' data to predict each patient's distribution of survival time. Data sources included note text, laboratory values, and vital signs. From 2015 to 2016, 899 patients receiving radiotherapy for metastatic cancer were enrolled in a study in which their radiation oncologist estimated life expectancy. Survival predictions were also made by the machine learning model and a traditional model using only performance status. Performance was assessed with area under the curve for 1-year survival and calibration plots.

RESULTS : The radiotherapy study included 1190 treatment courses in 899 patients. A total of 879 treatment courses in 685 patients were included in this analysis. Median overall survival was 11.7 months. Physicians, machine learning model, and traditional model had area under the curve for 1-year survival of 0.72 (95% CI 0.63-0.81), 0.77 (0.73-0.81), and 0.68 (0.65-0.71), respectively.

CONCLUSIONS : The machine learning model's predictions were more accurate than those of the treating physician or a traditional model.

Gensheimer Michael F, Aggarwal Sonya, Benson Kathryn R K, Carter Justin N, Henry A Solomon, Wood Douglas J, Soltys Scott G, Hancock Steven, Pollom Erqi, Shah Nigam H, Chang Daniel T

2020-Dec-14

machine learning, natural language processing, neoplasms, prognosis, radiotherapy
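The comparison above is summarized as area under the ROC curve (AUC) for 1-year survival. The AUC equals the probability that a randomly chosen patient who was alive at one year received a higher predicted survival score than a randomly chosen patient who was not. A minimal pairwise sketch follows, assuming every patient's 1-year status is known; the study itself must also handle censored follow-up, which this sketch ignores, and the scores and labels are made up for illustration.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    credit = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                credit += 1.0
            elif p == n:
                credit += 0.5
    return credit / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of surviving 1 year, and observed 1-year status (1 = alive).
predicted = [0.9, 0.8, 0.4, 0.3]
alive_at_1_year = [1, 0, 1, 0]
print(pairwise_auc(predicted, alive_at_1_year))  # 0.75
```

The double loop is quadratic in cohort size; a rank-based computation (the Mann-Whitney U statistic) gives the same value more efficiently for large cohorts.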

General

Learning grain boundary segregation energy spectra in polycrystals.

In Nature communications ; h5-index 260.0

The segregation of solute atoms at grain boundaries (GBs) can profoundly impact the structural properties of metallic alloys, inducing effects that range from strengthening to embrittlement. Although segregation is known to be anisotropic, there is limited understanding of how solute segregation tendencies vary across the full, multidimensional GB space, which is critically important in polycrystals where much of that space is represented. Here we develop a machine learning framework that can accurately predict the segregation tendency (quantified by the segregation enthalpy spectrum) of solute atoms at GB sites in polycrystals, based solely on the undecorated (pre-segregation) local atomic environment of such sites. We then use the learning framework to scan across the alloy space and build an extensive database of segregation energy spectra for more than 250 metal-based binary alloys. The resulting machine learning models and segregation database are key to unlocking the full potential of GB segregation as an alloy design tool, and enable the design of microstructures that maximize the useful impacts of segregation.

Wagih Malik, Larsen Peter M, Schuh Christopher A

2020-Dec-11
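A segregation spectrum is a distribution of per-site segregation energies (enthalpies) rather than a single number. One common way to turn such a spectrum into a grain-boundary solute concentration is a Fermi-Dirac (McLean-type) isotherm applied site by site and then averaged. The abstract does not say how the spectra are used downstream, so this isotherm, the sign convention (a negative energy favors segregation), and the numbers below are assumptions for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def site_occupancy(delta_e_ev, x_bulk, temperature_k):
    """Equilibrium solute fraction at one GB site (Fermi-Dirac / McLean-type form).

    delta_e_ev < 0 means the solute prefers the GB site over the bulk.
    """
    boltzmann_factor = math.exp(delta_e_ev / (BOLTZMANN_EV_PER_K * temperature_k))
    return 1.0 / (1.0 + (1.0 - x_bulk) / x_bulk * boltzmann_factor)


def gb_solute_fraction(spectrum_ev, x_bulk, temperature_k):
    """Average occupancy over all GB sites, each site weighted equally."""
    return sum(site_occupancy(e, x_bulk, temperature_k) for e in spectrum_ev) / len(spectrum_ev)


# Hypothetical spectrum: a few strongly favorable sites among mostly neutral ones.
spectrum = [-0.40, -0.25, -0.10, 0.0, 0.0, 0.05, 0.10]
print(gb_solute_fraction(spectrum, x_bulk=0.01, temperature_k=600.0))
```

Treating each site separately is the point of using a spectrum: a handful of strongly favorable sites can dominate the GB solute content even when the average segregation energy is close to zero.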

General

Machine learning uncovers independently regulated modules in the Bacillus subtilis transcriptome.

In Nature communications ; h5-index 260.0

The transcriptional regulatory network (TRN) of Bacillus subtilis coordinates cellular functions of fundamental interest, including metabolism, biofilm formation, and sporulation. Here, we use unsupervised machine learning to modularize the transcriptome and quantitatively describe regulatory activity under diverse conditions, creating an unbiased summary of gene expression. We obtain 83 independently modulated gene sets that explain most of the variance in expression and demonstrate that 76% of them represent the effects of known regulators. The TRN structure and its condition-dependent activity uncover putative or recently discovered roles for at least five regulons, such as a relationship between histidine utilization and quorum sensing. The TRN also facilitates quantification of population-level sporulation states. As this TRN covers the majority of the transcriptome and concisely characterizes the global expression state, it could inform research on nearly every aspect of transcriptional regulation in B. subtilis.

Rychel Kevin, Sastry Anand V, Palsson Bernhard O

2020-Dec-11

General

A social engineering model for poverty alleviation.

In Nature communications ; h5-index 260.0

Poverty, the quintessential denominator of a developing nation, has been traditionally defined against an arbitrary poverty line; individuals (or countries) below this line are deemed poor and those above it, not so! This has two pitfalls. First, absolute reliance on a single poverty line, based on basic food consumption, and not on total consumption distribution, is only a partial poverty index at best. Second, a single expense descriptor is an exogenous quantity that does not evolve from income-expenditure statistics. Using extensive income-expenditure statistics from India, here we show how a self-consistent endogenous poverty line can be derived from an agent-based stochastic model of market exchange, combining all expenditure modes (basic food, other food and non-food), whose parameters are probabilistically estimated using advanced Machine Learning tools. Our mathematical study establishes a consumption based poverty measure that combines labor, commodity, and asset market outcomes, delivering an excellent tool for economic policy formulation.

Chattopadhyay Amit K, Kumar T Krishna, Rice Iain

2020-Dec-11

Pathology

Deep learning-based cross-classifications reveal conserved spatial behaviors within tumor histological images.

In Nature communications ; h5-index 260.0

Histopathological images are a rich but incompletely explored data type for studying cancer. Manual inspection is time consuming, making it challenging to use for image data mining. Here we show that convolutional neural networks (CNNs) can be systematically applied across cancer types, enabling comparisons to reveal shared spatial behaviors. We develop CNN architectures to analyze 27,815 hematoxylin and eosin scanned images from The Cancer Genome Atlas for tumor/normal, cancer subtype, and mutation classification. Our CNNs are able to classify TCGA pathologist-annotated tumor/normal status of whole slide images (WSIs) in 19 cancer types with consistently high AUCs (0.995 ± 0.008), as well as subtypes with lower but significant accuracy (AUC 0.87 ± 0.1). Remarkably, tumor/normal CNNs trained on one tissue are effective in others (AUC 0.88 ± 0.11), with classifier relationships also recapitulating known adenocarcinoma, carcinoma, and developmental biology. Moreover, classifier comparisons reveal intra-slide spatial similarities, with an average tile-level correlation of 0.45 ± 0.16 between classifier pairs. Breast cancers, bladder cancers, and uterine cancers have spatial patterns that are particularly easy to detect, suggesting these cancers can be canonical types for image analysis. Patterns for TP53 mutations can also be detected, with WSI self- and cross-tissue AUCs ranging from 0.65 to 0.80. Finally, we comparatively evaluate CNNs on 170 breast and colon cancer images with pathologist-annotated nuclei, finding that both cellular and intercellular regions contribute to CNN accuracy. These results demonstrate the power of CNNs not only for histopathological classification, but also for cross-comparisons to reveal conserved spatial behaviors across tumors.

Noorbakhsh Javad, Farahmand Saman, Foroughi Pour Ali, Namburi Sandeep, Caruana Dennis, Rimm David, Soltanieh-Ha Mohammad, Zarringhalam Kourosh, Chuang Jeffrey H

2020-Dec-11
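The "average tile-level correlation between classifier pairs" can be read as follows: for each pair of classifiers, correlate their per-tile predictions on the same slide, then average over all pairs. The abstract does not name the correlation measure, so the Pearson coefficient below is an assumption, as are the classifier names and scores.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of tile scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def mean_pairwise_correlation(tile_scores):
    """Average Pearson correlation over all pairs of classifiers.

    tile_scores: dict mapping classifier name -> list of per-tile scores,
    with the same tile order for every classifier.
    """
    values = [pearson(tile_scores[a], tile_scores[b])
              for a, b in combinations(sorted(tile_scores), 2)]
    return sum(values) / len(values)


# Hypothetical tumor/normal classifiers trained on different tissues, scored on one slide's tiles.
scores = {
    "breast_cnn": [0.9, 0.8, 0.2, 0.1, 0.7],
    "bladder_cnn": [0.8, 0.9, 0.3, 0.2, 0.5],
    "uterine_cnn": [0.6, 0.7, 0.4, 0.1, 0.8],
}
print(round(mean_pairwise_correlation(scores), 3))
```

Correlating tile-level outputs, rather than slide-level labels, is what lets such a comparison say whether two classifiers respond to the same regions of a slide.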

General

A mathematical model of local and global attention in natural scene viewing.

In PLoS computational biology

Understanding the decision process underlying gaze control is an important question in cognitive neuroscience with applications in diverse fields ranging from psychology to computer vision. The decision for choosing an upcoming saccade target can be framed as a selection process between two states: Should the observer further inspect the information near the current gaze position (local attention) or continue with exploration of other patches of the given scene (global attention)? Here we propose and investigate a mathematical model motivated by switching between these two attentional states during scene viewing. The model is derived from a minimal set of assumptions that generates realistic eye movement behavior. We implemented a Bayesian approach for model parameter inference based on the model's likelihood function. In order to simplify the inference, we applied data augmentation methods that allowed the use of conjugate priors and the construction of an efficient Gibbs sampler. This approach turned out to be numerically efficient and permitted fitting interindividual differences in saccade statistics. Thus, the main contribution of our modeling approach is twofold: first, we propose a new model for saccade generation in scene viewing; second, we demonstrate the use of novel methods from Bayesian inference in the field of scan path modeling.

Malem-Shinitski Noa, Opper Manfred, Reich Sebastian, Schwetlick Lisa, Seelig Stefan A, Engbert Ralf

2020-Dec-14
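The inference strategy above relies on a general fact: when every full conditional distribution has a conjugate form, a Gibbs sampler can simply alternate exact draws from those conditionals. The paper's saccade model and its data-augmentation scheme are not reproduced here; instead, the sketch applies the same idea to the simplest textbook case, normally distributed data with unknown mean and precision, a normal prior on the mean and a gamma prior on the precision. Priors, iteration counts, and data are illustrative assumptions.

```python
import math
import random


def gibbs_normal(data, n_iter=3000, burn_in=500, m0=0.0, s0_sq=100.0, a0=1.0, b0=1.0, seed=0):
    """Gibbs sampler for y_i ~ Normal(mu, 1/tau) with semi-conjugate priors.

    Priors: mu ~ Normal(m0, s0_sq), tau ~ Gamma(shape=a0, rate=b0).
    Returns the post-burn-in list of (mu, tau) draws.
    """
    rng = random.Random(seed)
    n, total = len(data), sum(data)
    mu, tau = 0.0, 1.0
    draws = []
    for it in range(n_iter):
        # mu | tau, y is Normal: combine prior precision with data precision.
        post_prec = 1.0 / s0_sq + n * tau
        post_mean = (m0 / s0_sq + tau * total) / post_prec
        mu = rng.gauss(post_mean, math.sqrt(1.0 / post_prec))
        # tau | mu, y is Gamma; random.gammavariate takes (shape, scale = 1 / rate).
        sum_sq = sum((y - mu) ** 2 for y in data)
        tau = rng.gammavariate(a0 + n / 2.0, 1.0 / (b0 + 0.5 * sum_sq))
        if it >= burn_in:
            draws.append((mu, tau))
    return draws


data_rng = random.Random(1)
observations = [data_rng.gauss(3.0, 1.0) for _ in range(200)]
draws = gibbs_normal(observations)
mu_hat = sum(m for m, _ in draws) / len(draws)
tau_hat = sum(t for _, t in draws) / len(draws)
print(round(mu_hat, 2), round(tau_hat, 2))
```

Neither conditional requires a Metropolis acceptance step, which is the practical payoff of conjugacy that the abstract alludes to.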

General

Mapping molar shapes on signaling pathways.

In PLoS computational biology

A major challenge in evolutionary developmental biology is to understand how genetic mutations underlie phenotypic changes. In principle, selective pressures on the phenotype screen the gene pool of the population. Teeth are an excellent model for understanding evolutionary changes in the genotype-phenotype relationship since they exist throughout vertebrates. Genetically modified mice (mutants) with abnormalities in teeth have been used to explore tooth development. The relationship between signaling pathways and molar shape, however, remains elusive due to the high intrinsic complexity of tooth crowns. This hampers our understanding of the extent to which developmental factors explored in mutants explain developmental and phenotypic variation in natural species that represent the consequence of natural selection. Here we combine a novel morphometric method with two kinds of data mining techniques to extract data sets from the three-dimensional surface models of lower first molars: i) machine learning to maximize classification accuracy of 22 mutants, and ii) phylogenetic signal for 31 Murinae species. Major shape variation among mutants is explained by the number of cusps and cusp distribution on a tooth crown. The distribution of mutant mice in morphospace suggests a nonlinear relationship between the signaling pathways and molar shape variation. Comparative analysis of mutants and wild murines reveals that mutant variation overlaps naturally occurring diversity, including more ancestral and derived morphologies. However, taxa with transverse lophs are not fully covered by mutant variation, suggesting experimentally unexplored developmental factors in the evolutionary radiation of murines.

Morita Wataru, Morimoto Naoki, Jernvall Jukka

2020-Dec

General

Five novel clinical phenotypes for critically ill patients with mechanical ventilation in intensive care units: a retrospective and multi database study.

In Respiratory research ; h5-index 45.0

BACKGROUND : Although protective mechanical ventilation (MV) has been used in a variety of applications, lung injury may occur in both patients with and without acute respiratory distress syndrome (ARDS). The purpose of this study is to use machine learning to identify clinical phenotypes for critically ill patients with MV in intensive care units (ICUs).

METHODS : A retrospective cohort study was conducted with 5013 patients who had undergone MV and treatment in the Department of Critical Care Medicine, Peking Union Medical College Hospital. Statistical and machine learning methods were used. All the data used in this study, including demographics, vital signs, circulation parameters and mechanical ventilator parameters, etc., were automatically extracted from the electronic health record (EHR) system. An external database, Medical Information Mart for Intensive Care III (MIMIC III), was used for validation.

RESULTS : Phenotypes were derived from a total of 4009 patients who underwent MV using a latent profile analysis of 22 variables. The associations between the phenotypes and disease severity and clinical outcomes were assessed. Another 1004 patients in the database were enrolled for validation. Of the five derived phenotypes, phenotype I was the most common subgroup (n = 2174; 54.2%) and was mostly composed of the postoperative population. Phenotype II (n = 480; 12.0%) led to the most severe conditions. Phenotype III (n = 241; 6.01%) was associated with high positive end-expiratory pressure (PEEP) and low mean airway pressure. Phenotype IV (n = 368; 9.18%) was associated with high driving pressure, and younger patients comprised a large proportion of the phenotype V group (n = 746; 18.6%). In addition, we found that the mortality rate of Phenotype IV was significantly higher than that of the other phenotypes. In this subgroup, the number of patients in the sequential organ failure assessment (SOFA) score segment (9,22] was 198, the number of deaths was 88, and the mortality rate was higher than 44%. However, the cumulative 28-day mortality of Phenotypes IV and II, which were 101 of 368 (27.4%) and 87 of 480 (18.1%) unique patients, respectively, was significantly higher than those of the other phenotypes. There were consistent phenotype distributions and differences in biomarker patterns by phenotype in the validation cohort, and external verification with MIMIC III further generated supportive results.

CONCLUSIONS : Five clinical phenotypes were correlated with different disease severities and clinical outcomes, which suggested that these phenotypes may help in understanding heterogeneity in MV treatment effects.

Su Longxiang, Zhang Zhongheng, Zheng Fanglan, Pan Pan, Hong Na, Liu Chun, He Jie, Zhu Weiguo, Long Yun, Liu Dawei

2020-Dec-10

Clinical phenotype, Critically ill patients, Machine learning, Mechanical ventilation

Public Health

Comparison of Use of Health Care Services and Spending for Unauthorized Immigrants vs Authorized Immigrants or US Citizens Using a Machine Learning Model.

In JAMA network open

Importance : Knowledge about use of health care services (health care utilization) and expenditures among unauthorized immigrant populations is uncertain because of limitations in ascertaining legal status in population data.

Objective : To examine health care utilization and expenditures that are attributable to unauthorized and authorized immigrants vs US-born individuals.

Design, Setting, and Participants : This cross-sectional study used the data on documentation status from the Los Angeles Family and Neighborhood Survey (LAFANS) to develop a random forest classifier machine learning model. K-fold cross-validation was used to test model performance. The LAFANS is a randomized, multilevel, in-person survey of households residing in Los Angeles County, California, consisting of 2 waves. Wave 1 began in April 2000 and ended in January 2002, and wave 2 began in August 2006 and ended in December 2008. The machine learning model was then applied to a nationally representative database, the 2016-2017 Medical Expenditure Panel Survey (MEPS), to predict health care expenditures and utilization among unauthorized and authorized immigrants and US-born individuals. A generalized linear model analyzed health care expenditures. Logistic regression modeling estimated dichotomous use of emergency department (ED), inpatient, outpatient, and office-based physician visits by immigrant group, adjusting for confounding factors. Data were analyzed from May 1, 2019, to October 14, 2020.

Exposures : Self-reported immigration status (US-born, authorized, and unauthorized status).

Main Outcomes and Measures : Annual health care expenditures per capita and use of ED, outpatient, inpatient, and office-based physician care.

Results : Of 47 199 MEPS respondents with nonmissing data, 35 079 (74.3%) were US born, 10 816 (22.9%) were authorized immigrants, and 1304 (2.8%) were unauthorized immigrants (51.7% female; mean age, 47.6 [95% CI, 47.4-47.8] years). Compared with authorized immigrants and US-born individuals, unauthorized immigrants were more likely to be aged 18 to 44 years (80.8%), Latino (96.3%), and Spanish speaking (95.2%) and to have less than 12 years of education (53.7%). Nearly half of unauthorized immigrants (47.1%) were uninsured compared with 15.9% of authorized immigrants and 6.0% of US-born individuals. Mean annual health care expenditures per person were $1629 (95% CI, $1330-$1928) for unauthorized immigrants, $3795 (95% CI, $3555-$4035) for authorized immigrants, and $6088 (95% CI, $5935-$6242) for US-born individuals.

Conclusions and Relevance : Contrary to much political discourse in the US, this cross-sectional study found no evidence that unauthorized immigrants are a substantial economic burden on safety net facilities such as EDs. This study illustrates the value of machine learning in the study of unauthorized immigrants using large-scale, secondary databases.

Wilson Fernando A, Zallman Leah, Pagán José A, Ortega Alexander N, Wang Yang, Tatar Moosa, Stimpson Jim P

2020-Dec-01
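The design above trains a random forest on LAFANS and checks it with k-fold cross-validation before applying it to MEPS. The abstract does not give the number of folds, so k is left as a parameter in this sketch of how the folds can be formed; fitting the random forest on each training split is omitted, and the shuffle-then-round-robin fold assignment is an assumption.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for k folds; every sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # round-robin split into k near-equal folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_indices(n_samples=10, k=5):
    print(len(train), sorted(test))
```

Each model is trained on k - 1 folds and scored on the held-out fold, so the averaged score estimates performance on respondents the classifier has not seen.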

General

Integration and Co-design of Memristive Devices and Algorithms for Artificial Intelligence.

In iScience

Memristive devices share remarkable similarities to biological synapses, dendrites, and neurons at both the physical mechanism level and unit functionality level, making the memristive approach to neuromorphic computing a promising technology for future artificial intelligence. However, these similarities do not directly transfer to the success of efficient computation without device and algorithm co-designs and optimizations. Contemporary deep learning algorithms demand the memristive artificial synapses to ideally possess analog weighting and linear weight-update behavior, requiring substantial device-level and circuit-level optimization. Such co-design and optimization have been the main focus of memristive neuromorphic engineering, which often abandons the "non-ideal" behaviors of memristive devices, although many of them resemble what has been observed in biological components. Novel brain-inspired algorithms are being proposed to utilize such behaviors as unique features to further enhance the efficiency and intelligence of neuromorphic computing, which calls for collaborations among electrical engineers, computing scientists, and neuroscientists.

Wang Wei, Song Wenhao, Yao Peng, Li Yang, Van Nostrand Joseph, Qiu Qinru, Ielmini Daniele, Yang J Joshua

2020-Dec-18

Computer Architecture, Hardware Co-design, Materials Science

Public Health

Assessment of Machine Learning to Estimate the Individual Treatment Effect of Corticosteroids in Septic Shock.

In JAMA network open

Importance : The survival benefit of corticosteroids in septic shock remains uncertain.

Objective : To estimate the individual treatment effect (ITE) of corticosteroids in adults with septic shock in intensive care units using machine learning and to evaluate the net benefit of corticosteroids when the decision to treat is based on the individual estimated absolute treatment effect.

Design, Setting, and Participants : This cohort study used individual patient data from 4 trials on steroid supplementation in adults with septic shock as a training cohort to model the ITE using an ensemble machine learning approach. Data from a double-blinded, placebo-controlled randomized clinical trial comparing hydrocortisone with placebo were used for external validation. Data analysis was conducted from September 2019 to February 2020.

Exposures : Intravenous hydrocortisone 50 mg dose every 6 hours for 5 to 7 days with or without enteral 50 μg of fludrocortisone daily for 7 days. The control was either the placebo or usual care.

Main Outcomes and Measures : All-cause 90-day mortality.

Results : A total of 2548 participants were included in the development cohort, with median (interquartile range [IQR]) age of 66 (55-76) years and 1656 (65.0%) men. The median (IQR) Simplified Acute Physiology Score (SAPS II) was 55 (42-69), and the median (IQR) Sepsis-related Organ Failure Assessment score on day 1 was 11 (9-13). The crude pooled relative risk (RR) of death at 90 days was 0.89 (95% CI, 0.83 to 0.96) in favor of corticosteroids. According to the optimal individual model, the estimated median absolute risk reduction was 2.90% (95% CI, 2.79% to 3.01%). In the external validation cohort of 75 patients, the area under the curve of the optimal individual model was 0.77 (95% CI, 0.59 to 0.92). For any number willing to treat (NWT; defined as the acceptable number of people to treat to avoid 1 additional outcome considering the risk of harm associated with the treatment) less than 25, the net benefit of treating all patients vs treating nobody was negative. When the NWT was 25, the net benefit was 0.01 for the "treat all with hydrocortisone" strategy, -0.01 for the "treat all with hydrocortisone and fludrocortisone" strategy, 0.06 for the "treat by SAPS II" strategy, and 0.31 for the "treat by optimal individual model" strategy. The net benefit of the SAPS II and the optimal individual model treatment strategies converged to zero for a smaller number willing to treat, but the individual model was consistently superior to the model based on the SAPS II score.

Conclusions and Relevance : These findings suggest that an individualized treatment strategy to decide which patient with septic shock to treat with corticosteroids yielded positive net benefit regardless of potential corticosteroid-associated side effects.

Pirracchio Romain, Hubbard Alan, Sprung Charles L, Chevret Sylvie, Annane Djillali

2020-Dec-01

General General

Impacts of speciation and extinction measured by an evolutionary decay clock.

In Nature ; h5-index 368.0

The hypothesis that destructive mass extinctions enable creative evolutionary radiations (creative destruction) is central to classic concepts of macroevolution1,2. However, the relative impacts of extinction and radiation on the co-occurrence of species have not been directly quantitatively compared across the Phanerozoic eon. Here we apply machine learning to generate a spatial embedding (multidimensional ordination) of the temporal co-occurrence structure of the Phanerozoic fossil record, covering 1,273,254 occurrences in the Paleobiology Database for 171,231 embedded species. This facilitates the simultaneous comparison of macroevolutionary disruptions, using measures independent of secular diversity trends. Among the 5% most significant periods of disruption, we identify the 'big five' mass extinction events2, seven additional mass extinctions, two combined mass extinction-radiation events and 15 mass radiations. In contrast to narratives that emphasize post-extinction radiations1,3, we find that the proportionally most comparable mass radiations and extinctions (such as the Cambrian explosion and the end-Permian mass extinction) are typically decoupled in time, refuting any direct causal relationship between them. Moreover, in addition to extinctions4, evolutionary radiations themselves cause evolutionary decay (modelled co-occurrence probability and shared fraction of species between times approaching zero), a concept that we describe as destructive creation. A direct test of the time to over-threshold macroevolutionary decay4 (shared fraction of species between two times ≤ 0.1), counted by the decay clock, reveals saw-toothed fluctuations around a Phanerozoic mean of 18.6 million years. As the Quaternary period began at a below-average decay-clock time of 11 million years, modern extinctions further increase life's decay-clock debt.
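The over-threshold decay test described above (shared fraction of species between two times ≤ 0.1, counted by the decay clock) can be illustrated with a minimal sketch. It assumes the shared fraction is the proportion of species present at a starting time that are still present at a later time, and it works on raw presence sets; the study itself derives co-occurrence from a machine-learned embedding, so the function names, input format, and definition here are illustrative assumptions only.

```python
def shared_fraction(earlier, later):
    """Fraction of species in `earlier` that also occur in `later` (assumed definition)."""
    if not earlier:
        return 0.0
    return len(earlier & later) / len(earlier)


def decay_clock(assemblages, start, threshold=0.1):
    """Count time bins from `start` until the shared fraction first drops to <= threshold.

    `assemblages` is a hypothetical time-ordered list of sets of species names.
    Returns None if the threshold is never reached within the series.
    """
    base = assemblages[start]
    for t in range(start + 1, len(assemblages)):
        if shared_fraction(base, assemblages[t]) <= threshold:
            return t - start
    return None


if __name__ == "__main__":
    series = [
        set("abcdefghij"),  # 10 species at the starting time
        set("abcdefgxyz"),  # 7 of the original 10 remain
        set("aklmnopqrs"),  # only 1 of the original 10 remains
    ]
    print(decay_clock(series, start=0))  # 2
```

With real data each bin would be a dated geological interval, so the returned count would be converted into millions of years before comparing it with a mean such as 18.6 million years.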

Hoyal Cuthill Jennifer F, Guttenberg Nicholas, Budd Graham E

2020-Dec-09

Public Health Public Health

On collaborative reinforcement learning to optimize the redistribution of critical medical supplies throughout the COVID-19 pandemic.

In Journal of the American Medical Informatics Association : JAMIA

OBJECTIVE : This work investigates how reinforcement learning and deep learning models can facilitate the near-optimal redistribution of medical equipment in order to bolster public health responses to future crises similar to the COVID-19 pandemic.

MATERIALS AND METHODS : The system is simulated using disease impact statistics from the Institute for Health Metrics and Evaluation (IHME), the Centers for Disease Control and Prevention, and the Census Bureau [1, 2, 3]. We present a robust pipeline for data preprocessing, future demand inference, and a redistribution algorithm that can be adopted across broad scales and applications.

RESULTS : The reinforcement learning redistribution algorithm demonstrates performance optimality of 93% to 95%. Performance improves consistently with the number of randomly selected states participating in the exchange, with average shortage reductions ranging from 78.74% (± 30.8) in simulations with 5 states to 93.50% (± 0.003) with 50 states.

CONCLUSION : These findings bolster confidence that reinforcement learning techniques can reliably guide resource allocation for future public health emergencies.

Bednarski Bryan P, Singh Akash Deep, Jones William M

2020-Dec-09

Allocation, Artificial Intelligence, Coronavirus, Machine Learning, Resource

General General

Predicting materials properties without crystal structure: deep representation learning from stoichiometry.

In Nature communications ; h5-index 260.0

Machine learning has the potential to accelerate materials discovery by accurately predicting materials properties at a low computational cost. However, the model inputs remain a key stumbling block. Current methods typically use descriptors constructed from knowledge of either the full crystal structure - therefore only applicable to materials with already characterised structures - or structure-agnostic fixed-length representations hand-engineered from the stoichiometry. We develop a machine learning approach that takes only the stoichiometry as input and automatically learns appropriate and systematically improvable descriptors from data. Our key insight is to treat the stoichiometric formula as a dense weighted graph between elements. Compared to the state of the art for structure-agnostic methods, our approach achieves lower errors with less data.
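The key idea above, treating a stoichiometric formula as a dense weighted graph between elements, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the parser handles only simple formulas (no parentheses, hydrates, or charges), node weights are taken to be fractional amounts, and the graph connects every ordered pair of distinct elements. Whether self-connections are included in the published model is not stated here, so that choice is an assumption.

```python
import re
from itertools import permutations


def parse_formula(formula):
    """Parse a simple formula such as 'Fe2O3' into {element: amount}.

    Assumption: no parentheses, hydrates or charges; a missing count means 1.
    """
    counts = {}
    for element, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[element] = counts.get(element, 0.0) + (float(amount) if amount else 1.0)
    return counts


def stoichiometry_graph(formula):
    """Return node weights (fractional amounts) and a dense list of directed edges."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    nodes = {element: n / total for element, n in counts.items()}
    # Dense graph: every ordered pair of distinct elements is connected.
    edges = list(permutations(nodes, 2))
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = stoichiometry_graph("Fe2O3")
    print(nodes)  # {'Fe': 0.4, 'O': 0.6}
    print(edges)  # [('Fe', 'O'), ('O', 'Fe')]
```

In a learned model, each node would additionally carry an element embedding, and the fractional amounts would weight the messages passed along these edges; that part is omitted here.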

Goodall Rhys E A, Lee Alpha A

2020-Dec-08

General General

A machine learning toolkit for genetic engineering attribution to facilitate biosecurity.

In Nature communications ; h5-index 260.0

The promise of biotechnology is tempered by its potential for accidental or deliberate misuse. Reliably identifying telltale signatures characteristic of different genetic designers, termed 'genetic engineering attribution', would deter misuse, yet the problem is still considered unsolved. Here, we show that recurrent neural networks trained on DNA motifs and basic phenotype data can reach 70% attribution accuracy in distinguishing between over 1,300 labs. To make these models usable in practice, we introduce a framework for weighing predictions against other investigative evidence using calibration, and bring our model to within 1.6% of perfect calibration. Additionally, we demonstrate that simple models can accurately predict both the nation-state-of-origin and ancestor labs, forming the foundation of an integrated attribution toolkit which should promote responsible innovation and international security alike.

Alley Ethan C, Turpin Miles, Liu Andrew Bo, Kulp-McDowall Taylor, Swett Jacob, Edison Rey, Von Stetina Stephen E, Church George M, Esvelt Kevin M

2020-12-08

Radiology Radiology

Development and Validation of a Preoperative Magnetic Resonance Imaging Radiomics-Based Signature to Predict Axillary Lymph Node Metastasis and Disease-Free Survival in Patients With Early-Stage Breast Cancer.

In JAMA network open

Importance : Axillary lymph node metastasis (ALNM) status, typically estimated using an invasive procedure with a high false-negative rate, strongly affects the prognosis of recurrence in breast cancer. However, preoperative noninvasive tools to accurately predict ALNM status and disease-free survival (DFS) are lacking.

Objective : To develop and validate dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) radiomic signatures for preoperative identification of ALNM and to assess individual DFS in patients with early-stage breast cancer.

Design, Setting, and Participants : This retrospective prognostic study included patients with histologically confirmed early-stage breast cancer diagnosed at 4 hospitals in China from July 3, 2007, to September 21, 2019, randomly divided (7:3) into development and validation cohorts. All patients underwent preoperative MRI scans, were treated with surgery and sentinel lymph node biopsy or ALN dissection, and were pathologically examined to determine the ALNM status. Data analysis was conducted from February 15, 2019, to March 20, 2020.

Exposure : Clinical and DCE-MRI radiomic signatures.

Main Outcomes and Measures : The primary end points were ALNM and DFS.

Results : This study included 1214 women (median [IQR] age, 47 [42-55] years), split into development (849 [69.9%]) and validation (365 [30.1%]) cohorts. The radiomic signature identified ALNM in the development and validation cohorts with areas under the curve (AUCs) of 0.88 and 0.85, respectively, and the clinical-radiomic nomogram accurately predicted ALNM in the development and validation cohorts (AUC, 0.92 and 0.90, respectively) based on a least absolute shrinkage and selection operator (LASSO)-logistic regression model. The radiomic signature predicted 3-year DFS in the development and validation cohorts (AUC, 0.81 and 0.73, respectively), and the clinical-radiomic nomogram could discriminate high-risk from low-risk patients in the development cohort (hazard ratio [HR], 0.04; 95% CI, 0.01-0.11; P < .001) and the validation cohort (HR, 0.04; 95% CI, 0.004-0.32; P < .001) based on a random forest-Cox regression model. The clinical-radiomic nomogram was associated with 3-year DFS in the development and validation cohorts (AUC, 0.89 and 0.90, respectively). The decision curve analysis demonstrated that the clinical-radiomic nomogram displayed better clinical predictive usefulness than the clinical or radiomic signature alone.

Conclusions and Relevance : This study described the application of MRI-based machine learning in patients with breast cancer, presenting novel individualized clinical decision nomograms that could be used to predict ALNM status and DFS. The clinical-radiomic nomograms were useful in clinical decision-making associated with personalized selection of surgical interventions and therapeutic regimens for patients with early-stage breast cancer.

Yu Yunfang, Tan Yujie, Xie Chuanmiao, Hu Qiugen, Ouyang Jie, Chen Yongjian, Gu Yang, Li Anlin, Lu Nian, He Zifan, Yang Yaping, Chen Kai, Ma Jiafan, Li Chenchen, Ma Mudi, Li Xiaohong, Zhang Rong, Zhong Haitao, Ou Qiyun, Zhang Yiwen, He Yufang, Li Gang, Wu Zhuo, Su Fengxi, Song Erwei, Yao Herui

2020-Dec-01

Radiology Radiology

Pulmonary Ventilation Maps Generated with Free-breathing Proton MRI and a Deep Convolutional Neural Network.

In Radiology ; h5-index 91.0

Background Hyperpolarized noble gas MRI helps measure lung ventilation, but clinical translation remains limited. Free-breathing proton MRI may help quantify lung function using existing MRI systems without contrast material and may assist in providing information about ventilation not visible to the eye or easily extracted with segmentation methods. Purpose To explore the use of deep convolutional neural networks (DCNNs) to generate synthetic MRI ventilation scans from free-breathing MRI (deep learning [DL] ventilation MRI)-derived specific ventilation maps as a surrogate for noble gas MRI and to validate this approach across a wide range of lung diseases. Materials and Methods In this secondary analysis of prospective trials, 114 paired noble gas MRI and two-dimensional free-breathing MRI scans were obtained in healthy volunteers with no history of chronic or acute respiratory disease and in study participants with a range of different obstructive lung diseases, including asthma, bronchiectasis, chronic obstructive pulmonary disease, and non-small-cell lung cancer between September 2013 and April 2018 (ClinicalTrials.gov identifiers: NCT03169673, NCT02351141, NCT02263794, NCT02282202, NCT02279329, and NCT02002052). A U-Net-based DCNN model was trained to map free-breathing proton MRI to hyperpolarized helium 3 (3He) MRI ventilation and validated using sixfold cross-validation. During training, the DCNN ventilation maps were compared with noble gas MRI scans using the Pearson correlation coefficient (r) and mean absolute error. DCNN ventilation images were segmented for ventilation and ventilation defects and were compared with noble gas MRI scans using the Dice similarity coefficient (DSC). Relationships were evaluated with the Spearman correlation coefficient (rS). Results One hundred fourteen study participants (mean age, 56 years ± 15 [standard deviation]; 66 women) were evaluated. As compared with 3He MRI, DCNN model ventilation maps had a mean r value of 0.87 ± 0.08. The mean DSC for DL ventilation MRI and 3He MRI ventilation was 0.91 ± 0.07. The ventilation defect percentage for DL ventilation MRI was highly correlated with 3He MRI ventilation defect percentage (rS = 0.83, P < .001, mean bias = -2.0% ± 5). Both DL ventilation MRI (rS = -0.51, P < .001) and 3He MRI (rS = -0.61, P < .001) ventilation defect percentage were correlated with the forced expiratory volume in 1 second. The DCNN model required approximately 2 hours for training and approximately 1 second to generate a ventilation map. Conclusion In participants with diverse pulmonary pathologic findings, deep convolutional neural networks generated ventilation maps from free-breathing proton MRI trained with a hyperpolarized noble-gas MRI ventilation map data set. The maps showed correlation with noble gas MRI ventilation and pulmonary function measurements. © RSNA, 2020 See also the editorial by Vogel-Claussen in this issue.
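The two segmentation metrics reported above, the Dice similarity coefficient and the ventilation defect percentage, can be sketched on flat binary masks. The Dice formula, 2|A∩B| / (|A| + |B|), is standard; the ventilation defect percentage is assumed here to be defect voxels divided by lung voxels, and the function names and the empty-mask convention are illustrative choices, not taken from the study.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient, 2|A∩B| / (|A| + |B|), for two flat binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * intersection / total


def ventilation_defect_percentage(defect_mask, lung_mask):
    """Defect voxels inside the lung as a percentage of lung voxels (assumed definition)."""
    lung = sum(1 for v in lung_mask if v)
    if lung == 0:
        raise ValueError("lung mask is empty")
    defect = sum(1 for d, l in zip(defect_mask, lung_mask) if d and l)
    return 100.0 * defect / lung


if __name__ == "__main__":
    print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
    print(ventilation_defect_percentage([1, 0, 0, 0], [1, 1, 1, 1]))  # 25.0
```

Real MRI masks are 2D or 3D arrays; flattening them to one dimension before calling these functions gives the same result because both metrics depend only on voxel counts.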

Capaldi Dante P I, Guo Fumin, Xing Lei, Parraga Grace

2020-Dec-08

General General

Quantifying the influence of mutation detection on tumour subclonal reconstruction.

In Nature communications ; h5-index 260.0

Whole-genome sequencing can be used to estimate subclonal populations in tumours and this intra-tumoural heterogeneity is linked to clinical outcomes. Many algorithms have been developed for subclonal reconstruction, but their variabilities and consistencies are largely unknown. We evaluate sixteen pipelines for reconstructing the evolutionary histories of 293 localized prostate cancers from single samples, and eighteen pipelines for the reconstruction of 10 tumours with multi-region sampling. We show that predictions of subclonal architecture and timing of somatic mutations vary extensively across pipelines. Pipelines show consistent types of biases, with those incorporating SomaticSniper and Battenberg preferentially predicting homogeneous cancer cell populations and those using MuTect tending to predict multiple populations of cancer cells. Subclonal reconstructions using multi-region sampling confirm that single-sample reconstructions systematically underestimate intra-tumoural heterogeneity, predicting on average fewer than half of the cancer cell populations identified by multi-region sequencing. Overall, these biases suggest caution in interpreting specific architectures and subclonal variants.

Liu Lydia Y, Bhandari Vinayak, Salcedo Adriana, Espiritu Shadrielle M G, Morris Quaid D, Kislinger Thomas, Boutros Paul C

2020-12-07

General General

Phase imaging with computational specificity (PICS) for measuring dry mass changes in sub-cellular compartments.

In Nature communications ; h5-index 260.0

Due to its specificity, fluorescence microscopy has become a quintessential imaging tool in cell biology. However, photobleaching, phototoxicity, and related artifacts continue to limit fluorescence microscopy's utility. Recently, it has been shown that artificial intelligence (AI) can transform one form of contrast into another. We present phase imaging with computational specificity (PICS), a combination of quantitative phase imaging and AI, which provides information about unlabeled live cells with high specificity. Our imaging system allows for automatic training, while inference is built into the acquisition software and runs in real-time. Applying the computed fluorescence maps back to the quantitative phase imaging (QPI) data, we measured the growth of both nuclei and cytoplasm independently, over many days, without loss of viability. Using a QPI method that suppresses multiple scattering, we measured the dry mass content of individual cell nuclei within spheroids. In its current implementation, PICS offers a versatile quantitative technique for continuous simultaneous monitoring of individual cellular components in biological applications where long-term label-free imaging is desirable.

Kandel Mikhail E, He Yuchen R, Lee Young Jae, Chen Taylor Hsuan-Yu, Sullivan Kathryn Michele, Aydin Onur, Saif M Taher A, Kong Hyunjoon, Sobh Nahil, Popescu Gabriel

2020-12-07

Public Health Public Health

Improving the informativeness of Mendelian disease-derived pathogenicity scores for common disease.

In Nature communications ; h5-index 260.0

Despite considerable progress on pathogenicity scores prioritizing variants for Mendelian disease, little is known about the utility of these scores for common disease. Here, we assess the informativeness of Mendelian disease-derived pathogenicity scores for common disease and improve upon existing scores. We first apply stratified linkage disequilibrium (LD) score regression to evaluate published pathogenicity scores across 41 common diseases and complex traits (average N = 320K). Several of the resulting annotations are informative for common disease, even after conditioning on a broad set of functional annotations. We then improve upon published pathogenicity scores by developing AnnotBoost, a machine learning framework to impute and denoise pathogenicity scores using a broad set of functional annotations. AnnotBoost substantially increases the informativeness for common disease of both previously uninformative and previously informative pathogenicity scores, implying that Mendelian and common disease variants share similar properties. The boosted scores also produce improvements in heritability model fit and in classifying disease-associated, fine-mapped SNPs. Our boosted scores may improve fine-mapping and candidate gene discovery for common disease.

Kim Samuel S, Dey Kushal K, Weissbrod Omer, Márquez-Luna Carla, Gazal Steven, Price Alkes L

2020-12-07

General General

Rationally patterned electrode of direct-current triboelectric nanogenerators for ultrahigh effective surface charge density.

In Nature communications ; h5-index 260.0

For triboelectric nanogenerators (TENGs), a new-era energy harvesting technology, enhancing the triboelectric charge density is crucial for large-scale application in the Internet of Things (IoT) and artificial intelligence (AI). Here, a microstructure-designed direct-current TENG (MDC-TENG) with a rationally patterned electrode structure is presented that enhances the effective surface charge density by increasing the efficiency of contact electrification. The MDC-TENG achieves a record-high charge density of ~5.4 mC m⁻², more than twice the state of the art for AC-TENGs and more than 10 times that of previous DC-TENGs. The MDC-TENG realizes both a miniaturized device and high output performance, and its effective charge density can be further improved as the device size increases. Our work not only provides a miniaturization strategy for TENGs used in IoT and AI as energy supplies or self-powered sensors, but also presents a paradigm shift for large-scale energy harvesting by TENGs.

Zhao Zhihao, Dai Yejing, Liu Di, Zhou Linglin, Li Shaoxin, Wang Zhong Lin, Wang Jie

2020-Dec-03

General General

Identifying signals associated with psychiatric illness utilizing language and images posted to Facebook.

In NPJ schizophrenia

Prior research has identified associations between social media activity and psychiatric diagnoses; however, diagnoses are rarely clinically confirmed. Toward the goal of applying novel approaches to improve outcomes, research using real patient data is necessary. We collected 3,404,959 Facebook messages and 142,390 images across 223 participants (mean age = 23.7 years; 41.7% male): individuals with schizophrenia spectrum disorders (SSD), individuals with mood disorders (MD), and healthy volunteers (HV). Using machine learning, we analyzed features from content uploaded up to 18 months before the first hospitalization and built classifiers that distinguished SSD and MD from HV, and SSD from MD. Classification achieved AUCs of 0.77 (HV vs. MD), 0.76 (HV vs. SSD), and 0.72 (SSD vs. MD). Participants with SSD used more (P < 0.01) perception words (hear, see, feel) than those with MD or HV. Participants with SSD and MD used more (P < 0.01) swear words than HV. Participants with SSD were more likely to express negative emotions than HV (P < 0.01). Participants with MD used more words related to biological processes (blood/pain) than HV (P < 0.01). The height and width of photos posted by participants with SSD and MD were smaller (P < 0.01) than those posted by HV. Photos posted by participants with MD contained more blue and less yellow (P < 0.01). Closer to hospitalization, use of punctuation increased (SSD vs. HV), use of negative emotion words increased (MD vs. HV), and use of swear words increased (P < 0.01) for SSD and MD compared with HV. Machine-learning algorithms are capable of differentiating SSD and MD using Facebook activity alone over a year in advance of hospitalization. Integrating Facebook data with clinical information could one day serve to inform clinical decision-making.

Birnbaum Michael L, Norel Raquel, Van Meter Anna, Ali Asra F, Arenare Elizabeth, Eyigoz Elif, Agurto Carla, Germano Nicole, Kane John M, Cecchi Guillermo A

2020-Dec-03

General General

Highly multiplexed spatial mapping of microbial communities.

In Nature ; h5-index 368.0

Mapping the complex biogeography of microbial communities in situ with high taxonomic and spatial resolution poses a major challenge because of the high density1 and rich diversity2 of species in environmental microbiomes and the limitations of optical imaging technology3-6. Here we introduce high-phylogenetic-resolution microbiome mapping by fluorescence in situ hybridization (HiPR-FISH), a versatile technology that uses binary encoding, spectral imaging and decoding based on machine learning to create micrometre-scale maps of the locations and identities of hundreds of microbial species in complex communities. We show that 10-bit HiPR-FISH can distinguish between 1,023 isolates of Escherichia coli, each fluorescently labelled with a unique binary barcode. HiPR-FISH, in conjunction with custom algorithms for automated probe design and analysis of single-cell images, reveals the disruption of spatial networks in the mouse gut microbiome in response to treatment with antibiotics, and the longitudinal stability of spatial architectures in the human oral plaque microbiome. Combined with super-resolution imaging, HiPR-FISH shows the diverse strategies of ribosome organization that are exhibited by taxa in the human oral microbiome. HiPR-FISH provides a framework for analysing the spatial ecology of environmental microbial communities at single-cell resolution.
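The figure of 1,023 distinguishable isolates is consistent with simple binary arithmetic: 10 bits give 2^10 = 1,024 combinations, and excluding the all-zero code (no probe signal) leaves 1,023. The sketch below enumerates such barcodes and decodes per-bit intensities by plain thresholding. Thresholding is a simplified stand-in for the machine-learning decoder described in the abstract, and the function names and the 0.5 cut-off are hypothetical.

```python
from itertools import product


def all_barcodes(n_bits):
    """Every non-zero n-bit binary barcode (the all-zero code would carry no label)."""
    return [bits for bits in product((0, 1), repeat=n_bits) if any(bits)]


def decode(intensities, threshold=0.5):
    """Threshold per-bit intensities into a barcode tuple; None if no bit is on.

    Simple thresholding is a hypothetical stand-in for a learned decoder.
    """
    bits = tuple(1 if value >= threshold else 0 for value in intensities)
    return bits if any(bits) else None


if __name__ == "__main__":
    print(len(all_barcodes(10)))  # 1023
    print(decode([0.9, 0.1, 0.8]))  # (1, 0, 1)
```

In practice spectral crosstalk between fluorophores makes the per-bit intensities far noisier than this toy input, which is why a trained classifier rather than a fixed threshold is used for decoding.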

Shi Hao, Shi Qiaojuan, Grodner Benjamin, Lenz Joan Sesing, Zipfel Warren R, Brito Ilana Lauren, De Vlaminck Iwijn

2020-Dec-02

General General

Inference in artificial intelligence with deep optics and photonics.

In Nature ; h5-index 368.0

Artificial intelligence tasks across numerous applications require accelerators for fast and low-power execution. Optical computing systems may be able to meet these domain-specific needs but, despite half a century of research, general-purpose optical computing systems have yet to mature into a practical technology. Artificial intelligence inference, however, especially for visual computing applications, may offer opportunities for inference based on optical and photonic systems. In this Perspective, we review recent work on optical computing for artificial intelligence applications and discuss its promise and challenges.

Wetzstein Gordon, Ozcan Aydogan, Gigan Sylvain, Fan Shanhui, Englund Dirk, Soljačić Marin, Denz Cornelia, Miller David A B, Psaltis Demetri

2020-Dec

General General

Drug2ways: Reasoning over causal paths in biological networks for drug discovery.

In PLoS computational biology

Elucidating the causal mechanisms responsible for disease can reveal potential therapeutic targets for pharmacological intervention and, accordingly, guide drug repositioning and discovery. In essence, the topology of a network can reveal the impact a drug candidate may have on a given biological state, leading the way for enhanced disease characterization and the design of advanced therapies. Network-based approaches, in particular, are highly suited for these purposes as they hold the capacity to identify the molecular mechanisms underlying disease. Here, we present drug2ways, a novel methodology that leverages multimodal causal networks for predicting drug candidates. Drug2ways implements an efficient algorithm which reasons over causal paths in large-scale biological networks to propose drug candidates for a given disease. We validate our approach using clinical trial information and demonstrate how drug2ways can be used for multiple applications to identify: i) single-target drug candidates, ii) candidates with polypharmacological properties that can optimize multiple targets, and iii) candidates for combination therapy. Finally, we make drug2ways available to the scientific community as a Python package that enables conducting these applications on multiple standard network formats.
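The idea of reasoning over causal paths can be illustrated with a small signed directed graph, where activating edges carry +1 and inhibiting edges carry -1, so the sign of a path is the product of its edge signs. This is a simplified sketch, not the drug2ways package or its API: the graph format, the maximum path length, and the scoring rule (fraction of drug-to-disease paths that are inhibitory) are all assumptions made for illustration.

```python
def signed_paths(graph, source, target, max_length):
    """Yield the sign (+1 or -1) of every simple path from source to target.

    `graph` maps node -> list of (neighbor, sign) pairs (hypothetical format).
    Paths longer than `max_length` edges are not explored.
    """
    stack = [(source, [source], 1)]
    while stack:
        node, path, sign = stack.pop()
        if node == target:
            yield sign
            continue
        if len(path) - 1 >= max_length:
            continue
        for neighbor, edge_sign in graph.get(node, []):
            if neighbor not in path:  # keep paths simple (no revisited nodes)
                stack.append((neighbor, path + [neighbor], sign * edge_sign))


def inhibiting_fraction(graph, drug, disease, max_length=4):
    """Fraction of drug-to-disease paths whose overall sign is inhibitory."""
    signs = list(signed_paths(graph, drug, disease, max_length))
    if not signs:
        return None
    return sum(1 for s in signs if s < 0) / len(signs)


if __name__ == "__main__":
    toy = {
        "drug": [("A", -1), ("B", 1), ("C", 1)],
        "A": [("disease", 1)],
        "B": [("disease", -1)],
        "C": [("disease", 1)],
    }
    print(round(inhibiting_fraction(toy, "drug", "disease"), 3))  # 0.667
```

A candidate whose paths mostly inhibit the disease node would be prioritized under this toy rule; real networks require bounding the path length, since the number of simple paths grows rapidly with it.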

Rivas-Barragan Daniel, Mubeen Sarah, Guim Bernat Francesc, Hofmann-Apitius Martin, Domingo-Fernández Daniel

2020-Dec-02

General General

Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure.

In Nature communications ; h5-index 260.0

Understanding the genetic regulatory code governing gene expression is an important challenge in molecular biology. However, how individual coding and non-coding regions of the gene regulatory structure interact and contribute to mRNA expression levels remains unclear. Here we apply deep learning on over 20,000 mRNA datasets to examine the genetic regulatory code controlling mRNA abundance in 7 model organisms ranging from bacteria to humans. In all organisms, we can predict mRNA abundance directly from DNA sequence, with up to 82% of the variation of transcript levels encoded in the gene regulatory structure. By searching for DNA regulatory motifs across the gene regulatory structure, we discover that motif interactions could explain the whole dynamic range of mRNA levels. Co-evolution across coding and non-coding regions suggests that it is not single motifs or regions, but the entire gene regulatory structure and specific combination of regulatory elements that define gene expression levels.

Zrimec Jan, Börlin Christoph S, Buric Filip, Muhammad Azam Sheikh, Chen Rhongzen, Siewers Verena, Verendel Vilhelm, Nielsen Jens, Töpel Mats, Zelezniak Aleksej

2020-12-01

oncology Oncology

Leveraging multi-way interactions for systematic prediction of pre-clinical drug combination effects.

In Nature communications ; h5-index 260.0

We present comboFM, a machine learning framework for predicting the responses of drug combinations in pre-clinical studies, such as those based on cell lines or patient-derived cells. comboFM models the cell context-specific drug interactions through higher-order tensors, and efficiently learns latent factors of the tensor using powerful factorization machines. The approach enables comboFM to leverage information from previous experiments performed on similar drugs and cells when predicting responses of new combinations in as yet untested cells; thereby, it achieves highly accurate predictions despite sparsely populated data tensors. We demonstrate high predictive performance of comboFM in various prediction scenarios using data from cancer cell line pharmacogenomic screens. Subsequent experimental validation of a set of previously untested drug combinations further supports the practical and robust applicability of comboFM. For instance, we confirm a novel synergy between anaplastic lymphoma kinase (ALK) inhibitor crizotinib and proteasome inhibitor bortezomib in lymphoma cells. Overall, our results demonstrate that comboFM provides an effective means for systematic pre-screening of drug combinations to support precision oncology applications.
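comboFM learns latent factors with factorization machines; the sketch below shows only the standard second-order factorization machine, which is simpler than the higher-order model the abstract describes. The input vector x might, hypothetically, concatenate one-hot drug identities, concentrations, and a one-hot cell line. The pairwise term uses the well-known O(kn) identity, and a naive double loop is included so the two can be checked against each other; all names and example values are illustrative.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction.

    y = w0 + sum_i w_i x_i + sum_{i<j} <V_i, V_j> x_i x_j, with the pairwise term
    computed in O(k n) as 0.5 * sum_f [(sum_i V_if x_i)^2 - sum_i (V_if x_i)^2].
    """
    n = len(x)
    k = len(V[0]) if V else 0
    y = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y


def fm_predict_naive(x, w0, w, V):
    """Same model computed with explicit pairwise sums, for checking."""
    y = w0 + sum(wi * xi for wi, xi in zip(w, x))
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            y += sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
    return y


if __name__ == "__main__":
    x = [1, 0, 1]                   # hypothetical sparse feature vector
    w0, w = 0.5, [0.1, 0.2, 0.3]    # bias and linear weights
    V = [[1, 0], [0, 1], [2, 1]]    # one k=2 latent vector per feature
    print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

Both functions return the same value (about 2.9 here); the factorized form is what makes training feasible when x is long and sparse, because each interaction weight is expressed through shared latent vectors rather than learned independently.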

Julkunen Heli, Cichonska Anna, Gautam Prson, Szedmak Sandor, Douat Jane, Pahikkala Tapio, Aittokallio Tero, Rousu Juho

2020-12-01

oncology Oncology

Development of a "meta-model" to address missing data, predict patient-specific cancer survival and provide a foundation for clinical decision support.

In Journal of the American Medical Informatics Association : JAMIA

OBJECTIVE : Like most real-world data, electronic health record (EHR)-derived data from oncology patients typically exhibit wide interpatient variability in terms of available data elements. This interpatient variability leads to missing data and can present critical challenges in developing and implementing predictive models to underlie clinical decision support for patient-specific oncology care. Here, we sought to develop a novel ensemble approach to addressing missing data that we term the "meta-model" and apply the meta-model to patient-specific cancer prognosis.

MATERIALS AND METHODS : Using real-world data, we developed a suite of individual random survival forest models to predict survival in patients with advanced lung cancer, colorectal cancer, and breast cancer. Individual models varied by the predictor data used. We combined models for each cancer type into a meta-model that predicted survival for each patient using a weighted mean of the individual models for which the patient had all requisite predictors.

RESULTS : The meta-model significantly outperformed many of the individual models and performed similarly to the best performing individual models. Comparisons of the meta-model to a more traditional imputation-based method of addressing missing data supported the meta-model's utility.

CONCLUSIONS : We developed a novel machine learning-based strategy to underpin clinical decision support and predict survival in cancer patients despite missing data. The meta-model may more generally provide a tool for addressing missing data across a variety of clinical prediction problems. Moreover, the meta-model may address other challenges in clinical predictive modeling, including model extensibility and the integration of predictive algorithms trained across different institutions and datasets.

Baron Jason M, Paranjape Ketan, Love Tara, Sharma Vishakha, Heaney Denise, Prime Matthew

2020-Dec-01

clinical decision support, imputation, machine learning, meta-model, missing data, survival

Surgery Surgery

Lung transplantation for patients with severe COVID-19.

In Science translational medicine ; h5-index 138.0

Lung transplantation can potentially be a life-saving treatment for patients with non-resolving COVID-19-associated respiratory failure. Concerns limiting lung transplantation include recurrence of SARS-CoV-2 infection in the allograft, technical challenges imposed by viral-mediated injury to the native lung, and the potential risk for allograft infection by pathogens causing ventilator-associated pneumonia in the native lung. Importantly, the native lung might recover, resulting in long-term outcomes preferable to those of transplantation. Here, we report the results of lung transplantation in three patients with non-resolving COVID-19-associated respiratory failure. We performed single-molecule fluorescence in situ hybridization (smFISH) to detect both positive and negative strands of SARS-CoV-2 RNA in explanted lung tissue from the three patients and in additional control lung tissue samples. We conducted extracellular matrix imaging and single-cell RNA sequencing on explanted lung tissue from the three patients who underwent transplantation and on warm post-mortem lung biopsies from two patients who had died from COVID-19-associated pneumonia. Lungs from these five patients with prolonged COVID-19 disease were free of SARS-CoV-2 as detected by smFISH, but pathology showed extensive evidence of injury and fibrosis that resembled end-stage pulmonary fibrosis. Using machine learning, we compared single-cell RNA sequencing data from the lungs of patients with late-stage COVID-19 to that from the lungs of patients with pulmonary fibrosis and identified similarities in gene expression across cell lineages. Our findings suggest that some patients with severe COVID-19 develop fibrotic lung disease for which lung transplantation is their only option for survival.

Bharat Ankit, Querrey Melissa, Markov Nikolay S, Kim Samuel, Kurihara Chitaru, Garza-Castillon Rafael, Manerikar Adwaiy, Shilatifard Ali, Tomic Rade, Politanska Yuliya, Abdala-Valencia Hiam, Yeldandi Anjana V, Lomasney Jon W, Misharin Alexander V, Budinger G R Scott

2020-Nov-30

Radiology Radiology

A clinically applicable deep-learning model for detecting intracranial aneurysm in computed tomography angiography images.

In Nature communications ; h5-index 260.0

Intracranial aneurysm is a common life-threatening disease. Computed tomography angiography is recommended as the standard diagnostic tool, yet interpretation can be time-consuming and challenging. We present a deep-learning model trained on 1,177 bone-removal computed tomography angiography cases verified by digital subtraction angiography. The model tolerates variation in image quality and was tested on data from different scanner manufacturers. Simulated real-world studies were conducted in consecutive internal and external cohorts, in which the model achieved higher patient-level and lesion-level sensitivity than radiologists and expert neurosurgeons. In a dedicated cohort of patients with suspected acute ischemic stroke, 99.0% of the cases predicted as negative could be trusted with high confidence, potentially reducing human workload. A prospective study is warranted to determine whether the algorithm could improve patient care compared with clinicians' assessment.
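Patient-level and lesion-level sensitivity can diverge when one patient has several aneurysms, and the 99.0% figure for predicted-negative cases reads like a negative predictive value. The sketch below shows one common way to compute all three, assuming these standard definitions; the abstract does not spell out the paper's exact counting rules, and the `Case` fields and toy cohort are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Case:
    true_lesions: int         # aneurysms confirmed by the reference standard
    detected_lesions: int     # how many of those the model found
    predicted_positive: bool  # whether the model flagged the case at all


def lesion_sensitivity(cases: List[Case]) -> float:
    """Fraction of all true aneurysms that were detected."""
    return sum(c.detected_lesions for c in cases) / sum(c.true_lesions for c in cases)


def patient_sensitivity(cases: List[Case]) -> float:
    """Fraction of patients with >= 1 aneurysm in whom >= 1 aneurysm was detected."""
    positives = [c for c in cases if c.true_lesions > 0]
    return sum(1 for c in positives if c.detected_lesions > 0) / len(positives)


def negative_predictive_value(cases: List[Case]) -> float:
    """Among cases the model called negative, the fraction that truly had no aneurysm."""
    predicted_neg = [c for c in cases if not c.predicted_positive]
    return sum(1 for c in predicted_neg if c.true_lesions == 0) / len(predicted_neg)


# Hypothetical toy cohort of six patients.
cohort = [
    Case(2, 2, True),   # two aneurysms, both found
    Case(1, 0, False),  # one aneurysm, missed
    Case(1, 1, True),
    Case(0, 0, False),
    Case(0, 0, False),
    Case(0, 0, False),
]

print(lesion_sensitivity(cohort), patient_sensitivity(cohort),
      negative_predictive_value(cohort))
```

In this toy cohort lesion-level sensitivity is 3/4 while patient-level sensitivity is 2/3, which is why the abstract reports the two separately.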

Shi Zhao, Miao Chongchang, Schoepf U Joseph, Savage Rock H, Dargis Danielle M, Pan Chengwei, Chai Xue, Li Xiu Li, Xia Shuang, Zhang Xin, Gu Yan, Zhang Yonggang, Hu Bin, Xu Wenda, Zhou Changsheng, Luo Song, Wang Hao, Mao Li, Liang Kongming, Wen Lili, Zhou Longjiang, Yu Yizhou, Lu Guang Ming, Zhang Long Jiang

2020-Nov-30

General General

Development and validation of a real-time artificial intelligence-assisted system for detecting early gastric cancer: A multicentre retrospective diagnostic study.

In EBioMedicine

BACKGROUND : We aimed to develop and validate a real-time deep convolutional neural network (DCNN) system for detecting early gastric cancer (EGC).

METHODS : All 45,240 endoscopic images from 1,364 patients were divided into a training dataset (35,823 images from 1,085 patients) and a validation dataset (9,417 images from 279 patients). Another 1,514 images from three other hospitals were used for external validation. We compared the diagnostic performance of the DCNN system with that of endoscopists, and then evaluated the performance of endoscopists with and without the assistance of the system. Thereafter, we evaluated the diagnostic ability of the DCNN system on video streams. Accuracy, sensitivity, specificity, positive predictive value, negative predictive value and Cohen's kappa coefficient were measured to assess detection performance.
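All six measures listed above can be derived from a single 2 × 2 confusion matrix. The sketch below uses the standard definitions; the abstract does not say which pairs of ratings the paper's kappa compares, so here kappa is assumed to measure agreement between predictions and the reference diagnosis. The counts in the example are hypothetical.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard detection metrics from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Cohen's kappa: observed agreement corrected for agreement expected by chance,
    # where chance agreement uses the marginal rates of predictions and reference.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (accuracy - p_expected) / (1 - p_expected),
    }


# Hypothetical counts: 100 EGC images and 100 non-EGC images.
m = binary_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy is 0.85 but chance agreement is 0.5, so kappa is 0.7: kappa discounts the agreement a classifier would reach by guessing at the observed rates.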

FINDINGS : The DCNN system showed good performance in EGC detection in the validation datasets, with accuracy (85.1%-91.2%), sensitivity (85.9%-95.5%), specificity (81.7%-90.3%), and AUC (0.887-0.940). The DCNN system showed better diagnostic performance than endoscopists and improved the performance of endoscopists. The DCNN system was able to process oesophagogastroduodenoscopy (OGD) video streams to detect EGC lesions in real time.

INTERPRETATION : We developed a real-time DCNN system for EGC detection with high accuracy and stability. Multicentre prospective validation is needed to acquire high-level evidence for its clinical application.

FUNDING : This work was supported by the National Natural Science Foundation of China (grant nos. 81672935 and 81871947), Jiangsu Clinical Medical Center of Digestive System Diseases and Gastrointestinal Cancer (grant no. YXZXB2016002), and Nanjing Science and Technology Development Foundation (grant no. 2017sb332019).

Tang Dehua, Wang Lei, Ling Tingsheng, Lv Ying, Ni Muhan, Zhan Qiang, Fu Yiwei, Zhuang Duanming, Guo Huimin, Dou Xiaotan, Zhang Wei, Xu Guifang, Zou Xiaoping

2020-Nov-27

Artificial intelligence, Convolutional neural network, Detection, Early gastric cancer

Public Health Public Health

Identifying longevity associated genes by integrating gene expression and curated annotations.

In PLoS computational biology

Aging is a complex process with poorly understood genetic mechanisms. Recent studies have sought to classify genes as pro-longevity or anti-longevity using a variety of machine learning algorithms. However, it is not clear which types of features are best for optimizing classification performance and which algorithms are best suited to this task. Further, performance assessments based on held-out test data are lacking. We systematically compare five popular classification algorithms using gene ontology and gene expression datasets as features to predict the pro-longevity versus anti-longevity status of genes for two model organisms (C. elegans and S. cerevisiae) using the GenAge database as ground truth. We find that elastic net penalized logistic regression performs particularly well at this task. Using elastic net, we make novel predictions of pro- and anti-longevity genes that are not currently in the GenAge database.
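Elastic net penalized logistic regression adds a mix of L1 (sparsity) and L2 (shrinkage) penalties to the logistic loss, so irrelevant features such as uninformative annotations can be driven exactly to zero. The sketch below fits it by proximal gradient descent in pure Python. It is an illustration, not the authors' pipeline: the glmnet-style parameterization (`lam`, `alpha`), the step size, the solver and the toy features are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _soft_threshold(v: float, t: float) -> float:
    # Proximal operator of t*|v|: the step that produces exact zeros.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_elastic_net_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                           lam: float = 0.1, alpha: float = 0.5,
                           step: float = 0.5, iters: int = 2000) -> Tuple[List[float], float]:
    """Minimize (1/n) sum log(1 + exp(-y_i (w.x_i + b)))
    + lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2), labels y_i in {-1, +1}.
    The intercept b is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            g = -yi * _sigmoid(-yi * z)  # derivative of the log-loss w.r.t. z
            for j in range(p):
                grad_w[j] += g * xi[j] / n
            grad_b += g / n
        # Gradient step on the smooth part (loss + L2 term), then the L1 prox.
        w = [_soft_threshold(w[j] - step * (grad_w[j] + lam * (1 - alpha) * w[j]),
                             step * lam * alpha) for j in range(p)]
        b -= step * grad_b
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical gene features: [informative annotation, noise 1, noise 2];
# label +1 = pro-longevity, -1 = anti-longevity.
X = [[1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1],
     [0, 1, 1], [0, 1, -1], [0, -1, 1], [0, -1, -1]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w, b = fit_elastic_net_logreg(X, y)
print(w, b)
```

The two noise features end with weights of exactly 0.0 while the informative one stays positive. In practice one would use a tested library, for example scikit-learn's `LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=...)`, and tune the penalty on held-out data.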

Townes F William, Carr Kareem, Miller Jeffrey W

2020-Nov-30

oncology Oncology

Evaluation of Deep Learning to Augment Image-Guided Radiotherapy for Head and Neck and Prostate Cancers.

In JAMA network open

Importance : Personalized radiotherapy planning depends on high-quality delineation of target tumors and surrounding organs at risk (OARs). This process places an additional time burden on oncologists and introduces variability both between experts and between institutions.

Objective : To explore clinically acceptable autocontouring solutions that can be integrated into existing workflows and used in different domains of radiotherapy.

Design, Setting, and Participants : This quality improvement study used a multicenter imaging data set comprising 519 pelvic and 242 head and neck computed tomography (CT) scans from 8 distinct clinical sites, obtained from patients diagnosed with either prostate or head and neck cancer. The scans were acquired as part of treatment dose planning for patients who received intensity-modulated radiation therapy between October 2013 and February 2020. Fifteen different OARs were manually annotated by expert readers and radiation oncologists. The models were trained on a subset of the data set to automatically delineate OARs and were evaluated on both internal and external data sets. Data analysis was conducted from October 2019 to September 2020.

Main Outcomes and Measures : The autocontouring solution was evaluated on external data sets, and its accuracy was quantified with volumetric agreement and surface distance measures. Models were benchmarked against expert annotations in an interobserver variability (IOV) study. Clinical utility was evaluated by measuring the time spent manually correcting autogenerated contours and annotating from scratch.
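The abstract names "volumetric agreement and surface distance measures" without defining them. The sketch below assumes the Dice coefficient for volumetric overlap and a symmetric mean surface distance, with masks stored as sets of voxel coordinates. The representation, function names and voxel spacing parameter are hypothetical, and the brute-force nearest-point search is only suitable for small illustrative masks (real pipelines typically use distance transforms).

```python
from math import dist
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice = 2|A & B| / (|A| + |B|); 1.0 means perfect volumetric overlap."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def surface(mask: Set[Voxel]) -> Set[Voxel]:
    """Voxels of the mask with at least one 6-connected neighbour outside it."""
    return {(x, y, z) for (x, y, z) in mask
            if any((x + dx, y + dy, z + dz) not in mask for dx, dy, dz in NEIGHBOURS)}


def mean_surface_distance(a: Set[Voxel], b: Set[Voxel],
                          spacing: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Symmetric mean distance (in spacing units) between the surfaces of a and b."""
    if not a or not b:
        raise ValueError("both masks must be non-empty")
    sa, sb = surface(a), surface(b)

    def scaled(v: Voxel) -> Tuple[float, ...]:
        return tuple(c * s for c, s in zip(v, spacing))

    def directed(src: Set[Voxel], dst: Set[Voxel]) -> List[float]:
        return [min(dist(scaled(p), scaled(q)) for q in dst) for p in src]

    d = directed(sa, sb) + directed(sb, sa)
    return sum(d) / len(d)


# A 4x4x4 "organ" and the same contour shifted by one voxel along x.
ref = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
pred = {(x + 1, y, z) for (x, y, z) in ref}
print(dice(ref, pred), mean_surface_distance(ref, pred))
```

The two measures are complementary: Dice summarizes overlap of whole volumes (0.75 for this one-voxel shift), while surface distance reports boundary error in physical units, which is closer to what a clinician corrects by hand.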

Results : A total of 519 participants' (519 [100%] men; 390 [75%] aged 62-75 years) pelvic CT images and 242 participants' (184 [76%] men; 194 [80%] aged 50-73 years) head and neck CT images were included. The models achieved levels of clinical accuracy within the bounds of expert IOV for 13 of 15 structures (eg, left femur, κ = 0.982; brainstem, κ = 0.806) and performed consistently well across both external and internal data sets (eg, mean [SD] Dice score for left femur, internal vs external data sets: 98.52% [0.50] vs 98.04% [1.02]; P = .04). The correction time of autogenerated contours on 10 head and neck and 10 prostate scans was a mean of 4.98 (95% CI, 4.44-5.52) min/scan and 3.40 (95% CI, 1.60-5.20) min/scan, respectively, to reach clinically acceptable accuracy, whereas contouring the same scans from scratch took 73.25 (95% CI, 68.68-77.82) min/scan and 86.75 (95% CI, 75.21-92.29) min/scan, respectively, accounting for a 93% reduction in time.
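The abstract does not state which comparison the "93% reduction" refers to. Recomputing it from the reported mean times per scan is a quick consistency check. The helper name below is hypothetical, and the calculation uses only the means, ignoring the confidence intervals.

```python
def percent_reduction(corrected_min: float, scratch_min: float) -> float:
    """Percentage of contouring time saved by correcting instead of drawing from scratch."""
    return 100.0 * (1.0 - corrected_min / scratch_min)


head_neck = percent_reduction(4.98, 73.25)
prostate = percent_reduction(3.40, 86.75)
pooled = percent_reduction(4.98 + 3.40, 73.25 + 86.75)  # equal number of scans per site

print(round(head_neck, 1), round(prostate, 1), round(pooled, 1))
```

The head and neck scans give about 93.2%, the prostate scans about 96.1% and the pooled mean about 94.8%. The reported 93% therefore appears to match the head and neck figure, which is the more conservative of the two.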

Conclusions and Relevance : In this study, the models achieved levels of clinical accuracy within expert IOV while reducing manual contouring time and performing consistently well across previously unseen heterogeneous data sets. With the availability of open-source libraries and reliable performance, this creates significant opportunities for the transformation of radiation treatment planning.

Oktay Ozan, Nanavati Jay, Schwaighofer Anton, Carter David, Bristow Melissa, Tanno Ryutaro, Jena Rajesh, Barnett Gill, Noble David, Rimmer Yvonne, Glocker Ben, O’Hara Kenton, Bishop Christopher, Alvarez-Valle Javier, Nori Aditya

2020-Nov-02