
General

A greedy classifier optimization strategy to assess ion channel blocking activity and pro-arrhythmia in hiPSC-cardiomyocytes.

In PLoS computational biology

Novel studies conducting cardiac safety assessment using human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) are promising but might be limited by their specificity and predictivity. It is often challenging to correctly classify ion channel blockers or to sufficiently predict the risk for Torsade de Pointes (TdP). In this study, we developed a method combining in vitro and in silico experiments to improve machine learning approaches in delivering fast and reliable prediction of drug-induced ion-channel blockade and proarrhythmic behaviour. The algorithm is based on the construction of a dictionary and a greedy optimization, leading to the definition of optimal classifiers. Finally, we present a numerical tool that can accurately predict compound-induced pro-arrhythmic risk and involvement of sodium, calcium and potassium channels, based on hiPSC-CM field potential data.

Raphel Fabien, De Korte Tessa, Lombardi Damiano, Braam Stefan, Gerbeau Jean-Frederic

2020-Sep-25
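The abstract describes the method only at a high level (a dictionary of candidates plus a greedy optimization yielding optimal classifiers). As a loose illustration of the greedy-selection idea only, and not the authors' actual algorithm, here is a minimal forward-selection sketch in Python on synthetic data:

```python
# Hypothetical sketch: greedily grow a feature set from a "dictionary" of
# candidates, keeping whichever addition improves cross-validated accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0
while remaining:
    # Score each candidate feature when added to the current set.
    scores = {
        j: cross_val_score(
            LogisticRegression(max_iter=1000), X[:, selected + [j]], y, cv=5
        ).mean()
        for j in remaining
    }
    j_best = max(scores, key=scores.get)
    if scores[j_best] <= best_score:  # stop when no candidate improves the score
        break
    best_score = scores[j_best]
    selected.append(j_best)
    remaining.remove(j_best)

print(f"selected features: {selected}, CV accuracy: {best_score:.3f}")
```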

General

Machine learning identifies scale-free properties in disordered materials.

In Nature communications ; h5-index 260.0

The vast amount of design freedom in disordered systems expands the parameter space for signal processing. However, this large degree of freedom has hindered the deterministic design of disordered systems for target functionalities. Here, we employ a machine learning approach for predicting and designing wave-matter interactions in disordered structures, thereby identifying scale-free properties for waves. To abstract and map the features of wave behaviors and disordered structures, we develop disorder-to-localization and localization-to-disorder convolutional neural networks, each of which enables the instantaneous prediction of wave localization in disordered structures and the instantaneous generation of disordered structures from given localizations. We demonstrate that the structural properties of the network architectures lead to the identification of scale-free disordered structures having heavy-tailed distributions, thus achieving multiple orders of magnitude improvement in robustness to accidental defects. Our results verify the critical role of neural network structures in determining machine-learning-generated real-space structures and their defect immunity.

Yu Sunkyu, Piao Xianji, Park Namkyoo

2020-09-24

Radiology

Rapid vessel segmentation and reconstruction of head and neck angiograms using 3D convolutional neural network.

In Nature communications ; h5-index 260.0

Manual postprocessing of computed tomography angiography (CTA) images by technologists is extremely labor intensive and error prone. We propose an artificial intelligence reconstruction system supported by an optimized physiological anatomical-based 3D convolutional neural network that can automatically achieve CTA reconstruction in healthcare services. This system is trained and tested with 18,766 head and neck CTA scans from 5 tertiary hospitals in China collected between June 2017 and November 2018. The overall reconstruction accuracy of the independent testing dataset is 0.931. It is clinically applicable due to its consistency with manually processed images, which achieves a qualification rate of 92.1%. After five months of application, this system reduces the time consumed from 14.22 ± 3.64 min to 4.94 ± 0.36 min, the number of clicks from 115.87 ± 25.9 to 4, and the labor force from 3 to 1 technologist. Thus, the system facilitates clinical workflows and provides an opportunity for clinical technologists to improve humanistic patient care.

Fu Fan, Wei Jianyong, Zhang Miao, Yu Fan, Xiao Yueting, Rong Dongdong, Shan Yi, Li Yan, Zhao Cheng, Liao Fangzhou, Yang Zhenghan, Li Yuehua, Chen Yingmin, Wang Ximing, Lu Jie

2020-09-24

General

Classification of estrogenic compounds by coupling high content analysis and machine learning algorithms.

In PLoS computational biology

Environmental toxicants affect human health in various ways. Of the thousands of chemicals present in the environment, those with adverse effects on the endocrine system are referred to as endocrine-disrupting chemicals (EDCs). Here, we focused on a subclass of EDCs that impacts the estrogen receptor (ER), a pivotal transcriptional regulator in health and disease. Estrogenic activity of compounds can be measured by many in vitro or cell-based high throughput assays that record various endpoints from large pools of cells, and increasingly at the single-cell level. To simultaneously capture multiple mechanistic ER endpoints in individual cells that are affected by EDCs, we previously developed a sensitive high throughput/high content imaging assay that is based upon a stable cell line harboring a visible multicopy ER responsive transcription unit and expressing a green fluorescent protein (GFP) fusion of ER. High content analysis generates voluminous multiplex data comprised of minable features that describe numerous mechanistic endpoints. In this study, we present a machine learning pipeline for rapid, accurate, and sensitive assessment of the endocrine-disrupting potential of benchmark chemicals based on data generated from high content analysis. The multidimensional imaging data was used to train a classification model to ultimately predict the impact of unknown compounds on the ER, either as agonists or antagonists. To this end, both linear logistic regression and nonlinear Random Forest classifiers were benchmarked and evaluated for predicting the estrogenic activity of unknown compounds. Furthermore, through feature selection, data visualization, and model discrimination, the most informative features were identified for the classification of ER agonists/antagonists. The results of this data-driven study showed that highly accurate and generalized classification models with a minimum number of features can be constructed without loss of generality, where these machine learning models serve as a means for rapid mechanistic/phenotypic evaluation of the estrogenic potential of many chemicals.

Mukherjee Rajib, Beykal Burcu, Szafran Adam T, Onel Melis, Stossi Fabio, Mancini Maureen G, Lloyd Dillon, Wright Fred A, Zhou Lan, Mancini Michael A, Pistikopoulos Efstratios N

2020-Sep-24
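For readers wanting to experiment with the kind of benchmark described above (logistic regression vs Random Forest with feature selection), here is a minimal scikit-learn sketch; the data are a synthetic stand-in for the high-content imaging features, and importance-based selection is one simple choice, not necessarily the paper's:

```python
# Hypothetical benchmark: select a small informative feature subset, then
# compare a linear and a nonlinear classifier by cross-validated AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=50, n_informative=8,
                           random_state=0)  # stand-in for imaging features

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:10]  # 10 most informative

for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                    ("random forest", RandomForestClassifier(random_state=0))]:
    auc = cross_val_score(model, X[:, top], y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: CV AUC = {auc:.3f}")
```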

Radiology

Validation of a Deep Learning Algorithm for the Detection of Malignant Pulmonary Nodules in Chest Radiographs.

In JAMA network open

Importance : The improvement of pulmonary nodule detection, which is a challenging task when using chest radiographs, may help to elevate the role of chest radiographs for the diagnosis of lung cancer.

Objective : To assess the performance of a deep learning-based nodule detection algorithm for the detection of lung cancer on chest radiographs from participants in the National Lung Screening Trial (NLST).

Design, Setting, and Participants : This diagnostic study used data from participants in the NLST to assess the performance of a deep learning-based artificial intelligence (AI) algorithm for the detection of pulmonary nodules and lung cancer on chest radiographs using separate training (in-house) and validation (NLST) data sets. Baseline (T0) posteroanterior chest radiographs from 5485 participants (full T0 data set) were used to assess lung cancer detection performance, and a subset of 577 of these images (nodule data set) was used to assess nodule detection performance. Participants aged 55 to 74 years who currently or formerly (ie, quit within the past 15 years) smoked cigarettes for 30 pack-years or more were enrolled in the NLST at 23 US centers between August 2002 and April 2004. Information on lung cancer diagnoses was collected through December 31, 2009. Analyses were performed between August 20, 2019, and February 14, 2020.

Exposures : Abnormality scores produced by the AI algorithm.

Main Outcomes and Measures : The performance of an AI algorithm for the detection of lung nodules and lung cancer on radiographs, with lung cancer incidence and mortality as primary end points.

Results : A total of 5485 participants (mean [SD] age, 61.7 [5.0] years; 3030 men [55.2%]) were included, with a median follow-up duration of 6.5 years (interquartile range, 6.1-6.9 years). For the nodule data set, the sensitivity and specificity of the AI algorithm for the detection of pulmonary nodules were 86.2% (95% CI, 77.8%-94.6%) and 85.0% (95% CI, 81.9%-88.1%), respectively. For the detection of all cancers, the sensitivity was 75.0% (95% CI, 62.8%-87.2%), the specificity was 83.3% (95% CI, 82.3%-84.3%), the positive predictive value was 3.8% (95% CI, 2.6%-5.0%), and the negative predictive value was 99.8% (95% CI, 99.6%-99.9%). For the detection of malignant pulmonary nodules in all images of the full T0 data set, the sensitivity was 94.1% (95% CI, 86.2%-100.0%), the specificity was 83.3% (95% CI, 82.3%-84.3%), the positive predictive value was 3.4% (95% CI, 2.2%-4.5%), and the negative predictive value was 100.0% (95% CI, 99.9%-100.0%). In digital radiographs of the nodule data set, the AI algorithm had higher sensitivity (96.0% [95% CI, 88.3%-100.0%] vs 88.0% [95% CI, 75.3%-100.0%]; P = .32) and higher specificity (93.2% [95% CI, 89.9%-96.5%] vs 82.8% [95% CI, 77.8%-87.8%]; P = .001) for nodule detection compared with the NLST radiologists. For malignant pulmonary nodule detection on digital radiographs of the full T0 data set, the sensitivity of the AI algorithm was higher (100.0% [95% CI, 100.0%-100.0%] vs 94.1% [95% CI, 82.9%-100.0%]; P = .32) compared with the NLST radiologists, and the specificity (90.9% [95% CI, 89.6%-92.1%] vs 91.0% [95% CI, 89.7%-92.2%]; P = .91), positive predictive value (8.2% [95% CI, 4.4%-11.9%] vs 7.8% [95% CI, 4.1%-11.5%]; P = .65), and negative predictive value (100.0% [95% CI, 100.0%-100.0%] vs 99.9% [95% CI, 99.8%-100.0%]; P = .32) were similar to those of NLST radiologists.

Conclusions and Relevance : In this study, the AI algorithm performed better than NLST radiologists for the detection of pulmonary nodules on digital radiographs. When used as a second reader, the AI algorithm may help to detect lung cancer.

Yoo Hyunsuk, Kim Ki Hwan, Singh Ramandeep, Digumarthy Subba R, Kalra Mannudeep K

2020-Sep-01
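All of the sensitivity, specificity, PPV, and NPV figures quoted above derive from a 2×2 confusion matrix. A minimal helper makes the definitions explicit (the counts below are illustrative, not the study's):

```python
# Diagnostic metrics from raw confusion-matrix counts.
def diagnostic_metrics(tp, fp, fn, tn):
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

print(diagnostic_metrics(tp=48, fp=880, fn=16, tn=4541))
```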

Internal Medicine

Risk prediction of delirium in hospitalized patients using machine learning: An implementation and prospective evaluation study.

In Journal of the American Medical Informatics Association : JAMIA

OBJECTIVE : Machine learning models trained on electronic health records have achieved high prognostic accuracy in test datasets, but little is known about their embedding into clinical workflows. We implemented a random forest-based algorithm to identify hospitalized patients at high risk for delirium, and evaluated its performance in a clinical setting.

MATERIALS AND METHODS : Delirium was predicted at admission and recalculated on the evening of admission. The defined prediction outcome was a delirium diagnosis coded for the current hospital stay. During 7 months of prospective evaluation, 5530 predictions were analyzed. In addition, 119 predictions for internal medicine patients were compared with ratings of clinical experts in a blinded and nonblinded setting.

RESULTS : During clinical application, the algorithm achieved a sensitivity of 74.1% and a specificity of 82.2%. Discrimination on prospective data (area under the receiver-operating characteristic curve = 0.86) was as good as in the test dataset, but calibration was poor. The predictions correlated strongly with delirium risk perceived by experts in the blinded (r = 0.81) and nonblinded (r = 0.62) settings. A major advantage of our setting was the timely prediction without additional data entry.

DISCUSSION : The implemented machine learning algorithm achieved a stable performance predicting delirium in high agreement with expert ratings, but improvement of calibration is needed. Future research should evaluate the acceptance of implemented machine learning algorithms by health professionals.

CONCLUSIONS : Our study provides new insights into the implementation process of a machine learning algorithm into a clinical workflow and demonstrates its predictive power for delirium.

Jauk Stefanie, Kramer Diether, Großauer Birgit, Rienmüller Susanne, Avian Alexander, Berghold Andrea, Leodolter Werner, Schulz Stefan

2020-Sep-24

Machine learning, clinical decision support, delirium, electronic health records, prospective studies
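The distinction the authors draw, good discrimination but poor calibration, separates ranking quality (AUC) from the accuracy of the predicted probabilities themselves. A minimal scikit-learn sketch checking both on synthetic, deliberately overconfident scores:

```python
# Discrimination vs calibration on made-up predictions: the ordering is
# good (high AUC) while the probabilities are systematically distorted.
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
# Overconfident scores: right ordering, distorted probabilities.
y_prob = np.clip(y_true * 0.7 + rng.normal(0.15, 0.2, 1000), 0, 1)

print("AUC:", round(roc_auc_score(y_true, y_prob), 3))
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=5)
for observed, predicted in zip(frac_pos, mean_pred):
    print(f"predicted {predicted:.2f} -> observed {observed:.2f}")
```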

General

The 2019 National Natural language processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task on clinical concept normalization for clinical records.

In Journal of the American Medical Informatics Association : JAMIA

OBJECTIVE : Track 3 of the 2019 National Natural language processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task focused on medical concept normalization (MCN) in clinical records. This track aimed to assess the state of the art in identifying and matching salient medical concepts to a controlled vocabulary. In this paper, we describe the task and the data set used, compare the participating systems, present results, identify the strengths and limitations of the current state of the art, and identify directions for future research.

MATERIALS AND METHODS : Participating teams were provided with narrative discharge summaries in which text spans corresponding to medical concepts were identified. This paper refers to these text spans as mentions. Teams were tasked with normalizing these mentions to concepts, represented by concept unique identifiers, within the Unified Medical Language System. Submitted systems represented 4 broad categories of approaches: cascading dictionary matching, cosine distance, deep learning, and retrieve-and-rank systems. Disambiguation modules were common across all approaches.

RESULTS : A total of 33 teams participated in the MCN task. The best-performing team achieved an accuracy of 0.8526. The median and mean performances among all teams were 0.7733 and 0.7426, respectively.

CONCLUSIONS : Overall performance among the top 10 teams was high. However, several mention types were challenging for all teams. These included mentions requiring disambiguation of misspelled words, acronyms, abbreviations, and mentions with more than 1 possible semantic type. Also challenging were complex mentions of long, multi-word terms that may require new ways of extracting and representing mention meaning, the use of domain knowledge, parse trees, or hand-crafted rules.

Henry Sam, Wang Yanshan, Shen Feichen, Uzuner Ozlem

2020-Sep-24

clinical narratives, concept normalization, machine learning, natural language processing

General

Third-order nanocircuit elements for neuromorphic engineering.

In Nature ; h5-index 368.0

Current hardware approaches to biomimetic or neuromorphic artificial intelligence rely on elaborate transistor circuits to simulate biological functions. However, these can instead be more faithfully emulated by higher-order circuit elements that naturally express neuromorphic nonlinear dynamics [1-4]. Generating neuromorphic action potentials in a circuit element theoretically requires a minimum of third-order complexity (for example, three dynamical electrophysical processes) [5], but there have been few examples of second-order neuromorphic elements, and no previous demonstration of any isolated third-order element [6-8]. Using both experiments and modelling, here we show how multiple electrophysical processes, including Mott transition dynamics, form a nanoscale third-order circuit element. We demonstrate simple transistorless networks of third-order elements that perform Boolean operations and find analogue solutions to a computationally hard graph-partitioning problem. This work paves a way towards very compact and densely functional neuromorphic computing primitives, and energy-efficient validation of neuroscientific models.

Kumar Suhas, Williams R Stanley, Wang Ziwen

2020-Sep

General

Tracking historical changes in trustworthiness using machine learning analyses of facial cues in paintings.

In Nature communications ; h5-index 260.0

Social trust is linked to a host of positive societal outcomes, including improved economic performance, lower crime rates and more inclusive institutions. Yet, the origins of trust remain elusive, partly because social trust is difficult to document over time. Building on recent advances in social cognition, we design an algorithm to automatically generate trustworthiness evaluations for the facial action units (smile, eyebrows, etc.) of European portraits in large historical databases. Our results show that trustworthiness in portraits increased over the period 1500-2000, paralleling the decline of interpersonal violence and the rise of democratic values observed in Western Europe. Further analyses suggest that this rise in trustworthiness displays is associated with increased living standards.

Safra Lou, Chevallier Coralie, Grèzes Julie, Baumard Nicolas

2020-Sep-22

General

Alcoholic liver disease: A registry view on comorbidities and disease prediction.

In PLoS computational biology

Alcohol-related liver disease (ALD) is the cause of more than half of all liver-related deaths. Sustained excess drinking causes fatty liver and alcohol-related steatohepatitis, which may progress to alcoholic liver fibrosis (ALF) and eventually to alcohol-related liver cirrhosis (ALC). Unfortunately, it is difficult to identify patients with early-stage ALD, as these are largely asymptomatic. Consequently, the majority of ALD patients are only diagnosed once ALD has reached decompensated cirrhosis, a symptomatic phase marked by the development of complications such as bleeding and ascites. The main goal of this study is to discover relevant upstream diagnoses that help to understand the development of ALD, and to highlight meaningful downstream diagnoses that represent its progression to liver failure. Here, we use data from the Danish health registries covering the entire population of Denmark over nineteen years (1996-2014) to examine whether it is possible to identify patients likely to develop ALF or ALC based on their past medical history. To this end, we explore a knowledge discovery approach using high-dimensional statistical and machine learning techniques to extract and analyze data from the Danish National Patient Registry. Consistent with the late diagnosis of ALD, we find that ALC is the most common form of ALD in the registry data and that ALC patients have a strong over-representation of diagnoses associated with liver dysfunction. By contrast, we identify a small number of patients diagnosed with ALF who appear to be much less sick than those with ALC. We perform a matched case-control study using the group of patients with ALC as cases and their matched patients with non-ALD as controls. Machine learning models (SVM, RF, LightGBM, and NaiveBayes) trained and tested on the set of ALC patients achieve a high classification performance (AUC = 0.89). When testing the same trained models on the small set of ALF patients, their performance unsurprisingly drops considerably (AUC = 0.67 for NaiveBayes). The statistical and machine learning results underscore small groups of upstream and downstream comorbidities that accurately detect ALC patients and show promise in the prediction of ALF. Some of these groups are conditions either caused by alcohol or by malnutrition associated with alcohol overuse. Others are comorbidities either related to trauma and lifestyle or to complications of cirrhosis, such as oesophageal varices. Our findings highlight the potential of this approach to uncover knowledge in registry data related to ALD.

Grissa Dhouha, Nytoft Rasmussen Ditlev, Krag Aleksander, Brunak Søren, Juhl Jensen Lars

2020-Sep-22
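For a sense of the multi-model comparison reported above, here is a minimal sketch looping several scikit-learn classifiers over synthetic data; LightGBM is omitted to keep the dependencies to scikit-learn, and nothing here reproduces the registry data or results:

```python
# Hypothetical multi-model AUC comparison on a synthetic binary task.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=30, random_state=1)

models = {
    "SVM": SVC(probability=True, random_state=1),
    "RandomForest": RandomForestClassifier(random_state=1),
    "NaiveBayes": GaussianNB(),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUC = {auc:.2f}")
```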

Radiology

Performance of a Deep Learning Algorithm Compared with Radiologic Interpretation for Lung Cancer Detection on Chest Radiographs in a Health Screening Population.

In Radiology ; h5-index 91.0

Background The performance of a deep learning algorithm for lung cancer detection on chest radiographs in a health screening population is unknown. Purpose To validate a commercially available deep learning algorithm for lung cancer detection on chest radiographs in a health screening population. Materials and Methods Out-of-sample testing of a deep learning algorithm was retrospectively performed using chest radiographs from individuals undergoing a comprehensive medical check-up between July 2008 and December 2008 (validation test). To evaluate the algorithm performance for visible lung cancer detection, the area under the receiver operating characteristic curve (AUC) and diagnostic measures, including sensitivity and false-positive rate (FPR), were calculated. The algorithm performance was compared with that of radiologists using the McNemar test and the Moskowitz method. Additionally, the deep learning algorithm was applied to a screening cohort undergoing chest radiography between January 2008 and December 2012, and its performances were calculated. Results In a validation test comprising 10 285 radiographs from 10 202 individuals (mean age, 54 years ± 11 [standard deviation]; 5857 men) with 10 radiographs of visible lung cancers, the algorithm's AUC was 0.99 (95% confidence interval: 0.97, 1), and it showed comparable sensitivity (90% [nine of 10 radiographs]) to that of the radiologists (60% [six of 10 radiographs]; P = .25) with a higher FPR (3.1% [319 of 10 275 radiographs] vs 0.3% [26 of 10 275 radiographs]; P < .001). In the screening cohort of 100 525 chest radiographs from 50 070 individuals (mean age, 53 years ± 11; 28 090 men) with 47 radiographs of visible lung cancers, the algorithm's AUC was 0.97 (95% confidence interval: 0.95, 0.99), and its sensitivity and FPR were 83% (39 of 47 radiographs) and 3% (2999 of 100 478 radiographs), respectively. Conclusion A deep learning algorithm detected lung cancers on chest radiographs with a performance comparable to that of radiologists, which will be helpful for radiologists in healthy populations with a low prevalence of lung cancer. © RSNA, 2020 Online supplemental material is available for this article. See also the editorial by Armato in this issue.

Lee Jong Hyuk, Sun Hye Young, Park Sunggyun, Kim Hyungjin, Hwang Eui Jin, Goo Jin Mo, Park Chang Min

2020-Sep-22
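The algorithm-vs-radiologist P values above come from the McNemar test, which considers only the cases where the two readers disagree. A minimal sketch using statsmodels with made-up counts:

```python
# McNemar test on paired reader results; only the discordant cells matter.
from statsmodels.stats.contingency_tables import mcnemar

# Rows: reader A correct/incorrect; columns: reader B correct/incorrect.
table = [[520, 32],
         [18, 30]]
result = mcnemar(table, exact=True)  # exact binomial test on the 32 vs 18 cells
print(f"statistic={result.statistic}, p-value={result.pvalue:.3f}")
```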

General

Addressing health disparities in the Food and Drug Administration's artificial intelligence and machine learning regulatory framework.

In Journal of the American Medical Informatics Association : JAMIA

The exponential growth of health data from devices, health applications, and electronic health records coupled with the development of data analysis tools such as machine learning offer opportunities to leverage these data to mitigate health disparities. However, these tools have also been shown to exacerbate inequities faced by marginalized groups. Focusing on health disparities should be part of good machine learning practice and regulatory oversight of software as medical devices. Using the Food and Drug Administration (FDA)'s proposed framework for regulating machine learning tools in medicine, I show that addressing health disparities during the premarket and postmarket stages of review can help anticipate and mitigate group harms.

Ferryman Kadija

2020-Sep-20

artificial intelligence, health disparities, health policy, machine learning

Public Health

Evaluating the informativeness of deep learning annotations for human complex diseases.

In Nature communications ; h5-index 260.0

Deep learning models have shown great promise in predicting regulatory effects from DNA sequence, but their informativeness for human complex diseases is not fully understood. Here, we evaluate genome-wide SNP annotations from two previous deep learning models, DeepSEA and Basenji, by applying stratified LD score regression to 41 diseases and traits (average N = 320K), conditioning on a broad set of coding, conserved and regulatory annotations. We aggregated annotations across all (respectively blood or brain) tissues/cell-types in meta-analyses across all (respectively 11 blood or 8 brain) traits. The annotations were highly enriched for disease heritability, but produced only limited conditionally significant results: non-tissue-specific and brain-specific Basenji-H3K4me3 for all traits and brain traits respectively. We conclude that deep learning models have yet to achieve their full potential to provide considerable unique information for complex disease, and that their conditional informativeness for disease cannot be inferred from their accuracy in predicting regulatory annotations.

Dey Kushal K, van de Geijn Bryce, Kim Samuel Sungil, Hormozdiari Farhad, Kelley David R, Price Alkes L

2020-Sep-17

General

Reinforcing materials modelling by encoding the structures of defects in crystalline solids into distortion scores.

In Nature communications ; h5-index 260.0

This work revises the concept of defects in crystalline solids and proposes a universal strategy for their characterization at the atomic scale using outlier detection based on statistical distances. The proposed strategy provides a generic measure that describes the distortion score of local atomic environments. This score facilitates automatic defect localization and enables a stratified description of defects, which makes it possible to distinguish the zones with different levels of distortion within the structure. This work proposes applications for advanced materials modelling ranging from the surrogate concept for the energy per atom to the selection of relevant information for evaluation of energy barriers from the mean force. Moreover, this concept can serve in the design of robust interatomic machine learning potentials and in high-throughput analysis of their databases. The proposed definition of defects opens up many perspectives for materials design and characterization, thereby promoting the development of novel techniques in materials science.

Goryaeva Alexandra M, Lapointe Clovis, Dai Chendi, Dérès Julien, Maillet Jean-Bernard, Marinica Mihai-Cosmin

2020-Sep-17

General

Trialstreamer: A living, automatically updated database of clinical trial reports.

In Journal of the American Medical Informatics Association : JAMIA

OBJECTIVE : Randomized controlled trials (RCTs) are the gold standard method for evaluating whether a treatment works in health care but can be difficult to find and make use of. We describe the development and evaluation of a system to automatically find and categorize all new RCT reports.

MATERIALS AND METHODS : Trialstreamer continuously monitors PubMed and the World Health Organization International Clinical Trials Registry Platform, looking for new RCTs in humans using a validated classifier. We combine machine learning and rule-based methods to extract information from the RCT abstracts, including free-text descriptions of trial PICO (populations, interventions/comparators, and outcomes) elements and map these snippets to normalized MeSH (Medical Subject Headings) vocabulary terms. We additionally identify sample sizes, predict the risk of bias, and extract text conveying key findings. We store all extracted data in a database, which we make freely available for download, and via a search portal, which allows users to enter structured clinical queries. Results are ranked automatically to prioritize larger and higher-quality studies.

RESULTS : As of early June 2020, we have indexed 673 191 publications of RCTs, of which 22 363 were published in the first 5 months of 2020 (142 per day). We additionally include 304 111 trial registrations from the International Clinical Trials Registry Platform. The median trial sample size was 66.

CONCLUSIONS : We present an automated system for finding and categorizing RCTs. This yields a novel resource: a database of structured information automatically extracted for all published RCTs in humans. We make daily updates of this database available on our website (https://trialstreamer.robotreviewer.net).

Marshall Iain J, Nye Benjamin, Kuiper Joël, Noel-Storr Anna, Marshall Rachel, Maclean Rory, Soboczenski Frank, Nenkova Ani, Thomas James, Wallace Byron C

2020-Sep-17

automatic database curation, evidence based medicine, randomized controlled trials, research synthesis

General

Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets.

In Nature communications ; h5-index 260.0

Haplotype reconstruction of distant genetic variants remains an unsolved problem due to the short-read length of common sequencing data. Here, we introduce HapTree-X, a probabilistic framework that utilizes latent long-range information to reconstruct unspecified haplotypes in diploid and polyploid organisms. It builds on the observation that differential allele-specific expression can link genetic variants from the same physical chromosome, thus enabling the use even of reads that cover only individual variants. We demonstrate HapTree-X's feasibility on in-house sequenced Genome in a Bottle RNA-seq and various whole exome, genome, and 10X Genomics datasets. HapTree-X produces more complete phases (up to 25%), even in clinically important genes, and phases more variants than other methods while maintaining similar or higher accuracy and being up to 10× faster than other tools. The advantage of HapTree-X's ability to use multiple lines of evidence, as well as to phase polyploid genomes in a single integrative framework, grows substantially as the amount of diverse data increases.

Berger Emily, Yorukoglu Deniz, Zhang Lillian, Nyquist Sarah K, Shalek Alex K, Kellis Manolis, Numanagić Ibrahim, Berger Bonnie

2020-09-16

General

Publisher Correction: Improving the accuracy of medical diagnosis with causal machine learning.

In Nature communications ; h5-index 260.0

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

Richens Jonathan G, Lee Ciarán M, Johri Saurabh

2020-09-16

General

DeepHE: Accurately predicting human essential genes based on deep learning.

In PLoS computational biology

Accurately predicting essential genes using computational methods can greatly reduce the effort of finding them via wet experiments, in terms of both time and resources, and further accelerate the process of drug discovery. Several computational methods have been proposed for predicting essential genes in model organisms by integrating multiple biological data sources, either via centrality measures or machine learning based methods. However, methods aiming to predict human essential genes remain limited, and their performance still needs improvement. In addition, most machine learning based essential gene prediction methods lack mechanisms for handling the imbalanced learning issue inherent in the essential gene prediction problem, which might be one factor affecting their performance. We propose a deep learning based method, DeepHE, to predict human essential genes by integrating features derived from sequence data and a protein-protein interaction (PPI) network. A deep learning based network embedding method is utilized to automatically learn features from the PPI network. In addition, 89 sequence features were derived from the DNA sequence and protein sequence of each gene. These two types of features are integrated to train a multilayer neural network. A cost-sensitive technique is used to address the imbalanced learning problem when training the deep neural network. The experimental results for predicting human essential genes show that our proposed method, DeepHE, can accurately predict human gene essentiality with an average AUC higher than 94%, an area under the precision-recall curve (AP) higher than 90%, and an accuracy higher than 90%. We also compare DeepHE with several widely used traditional machine learning models (SVM, Naïve Bayes, Random Forest, and Adaboost) using the same features and the same cost-sensitive technique to counter the imbalanced learning issue. The experimental results show that DeepHE significantly outperforms the compared machine learning models. We have demonstrated that human essential genes can be accurately predicted by designing an effective machine learning algorithm and integrating representative features captured from available biological data. The proposed deep learning framework is effective for this task.

Zhang Xue, Xiao Wangxin, Xiao Weijia

2020-Sep-16
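One common cost-sensitive technique for imbalanced deep learning, which may differ from the authors' exact scheme, is to up-weight the minority class in the loss. A minimal PyTorch sketch with hypothetical dimensions (the 89 sequence features come from the abstract; the 64-dimensional network embedding and the 10:1 weight are assumptions):

```python
# Cost-sensitive binary classifier: up-weight rare positives in the loss.
import torch
import torch.nn as nn

n_features = 89 + 64          # sequence features + assumed embedding size
model = nn.Sequential(
    nn.Linear(n_features, 128), nn.ReLU(),
    nn.Linear(128, 32), nn.ReLU(),
    nn.Linear(32, 1),         # one logit: essential vs non-essential
)
# Weight positives by the (assumed) inverse class ratio.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([10.0]))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.randn(256, n_features)            # synthetic stand-in data
y = (torch.rand(256, 1) < 0.09).float()     # ~9% positive class

for _ in range(100):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
```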

Public Health

Validation of a Machine Learning Model to Predict Childhood Lead Poisoning.

In JAMA network open

Importance : Childhood lead poisoning causes irreversible neurobehavioral deficits, but current practice is secondary prevention.

Objective : To validate a machine learning (random forest) prediction model of elevated blood lead levels (EBLLs) by comparison with a parsimonious logistic regression.

Design, Setting, and Participants : This prognostic study for temporal validation of multivariable prediction models used data from the Women, Infants, and Children (WIC) program of the Chicago Department of Public Health. Participants included a development cohort of children born from January 1, 2007, to December 31, 2012, and a validation WIC cohort born from January 1 to December 31, 2013. Blood lead levels were measured until December 31, 2018. Data were analyzed from January 1 to October 31, 2019.

Exposures : Blood lead level test results; lead investigation findings; housing characteristics, permits, and violations; and demographic variables.

Main Outcomes and Measures : Incident EBLL (≥6 μg/dL). Models were assessed using the area under the receiver operating characteristic curve (AUC) and confusion matrix metrics (positive predictive value, sensitivity, and specificity) at various thresholds.

Results : Among 6812 children in the WIC validation cohort, 3451 (50.7%) were female, 3057 (44.9%) were Hispanic, 2804 (41.2%) were non-Hispanic Black, 458 (6.7%) were non-Hispanic White, and 442 (6.5%) were Asian (mean [SD] age, 5.5 [0.3] years). The median year of housing construction was 1919 (interquartile range, 1903-1948). Random forest AUC was 0.69 compared with 0.64 for logistic regression (difference, 0.05; 95% CI, 0.02-0.08). When predicting the 5% of children at highest risk to have EBLLs, random forest and logistic regression models had positive predictive values of 15.5% and 7.8%, respectively (difference, 7.7%; 95% CI, 3.7%-11.3%), sensitivity of 16.2% and 8.1%, respectively (difference, 8.1%; 95% CI, 3.9%-11.7%), and specificity of 95.5% and 95.1% (difference, 0.4%; 95% CI, 0.0%-0.7%).

Conclusions and Relevance : The machine learning model outperformed regression in predicting childhood lead poisoning, especially in identifying children at highest risk. Such a model could be used to target the allocation of lead poisoning prevention resources to these children.

Potash Eric, Ghani Rayid, Walsh Joe, Jorgensen Emile, Lohff Cortland, Prachand Nik, Mansour Raed

2020-Sep-01
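The "5% of children at highest risk" comparison above amounts to thresholding predicted risk at the 95th percentile and reading off confusion-matrix metrics at that cutoff. A minimal sketch with synthetic scores (none of the study's numbers are reproduced):

```python
# PPV, sensitivity, and specificity when flagging the top 5% of risk scores.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.random(6812) < 0.05            # ~5% outcome prevalence
y_score = 0.3 * y_true + rng.random(6812)   # noisy risk scores

cutoff = np.quantile(y_score, 0.95)          # flag top 5% as high risk
flagged = y_score >= cutoff

tp = np.sum(flagged & y_true)
ppv = tp / flagged.sum()
sensitivity = tp / y_true.sum()
specificity = np.sum(~flagged & ~y_true) / np.sum(~y_true)
print(f"PPV={ppv:.3f}, sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
```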

Surgery

Reporting of demographic data and representativeness in machine learning models using electronic health records.

In Journal of the American Medical Informatics Association : JAMIA

OBJECTIVE : The development of machine learning (ML) algorithms to address a variety of issues faced in clinical practice has increased rapidly. However, questions have arisen regarding biases in their development that can affect their applicability in specific populations. We sought to evaluate whether studies developing ML models from electronic health record (EHR) data report sufficient demographic data on the study populations to demonstrate representativeness and reproducibility.

MATERIALS AND METHODS : We searched PubMed for articles applying ML models to improve clinical decision-making using EHR data. We limited our search to papers published between 2015 and 2019.

RESULTS : Across the 164 studies reviewed, demographic variables were inconsistently reported and/or included as model inputs. Race/ethnicity was not reported in 64%; gender and age were not reported in 24% and 21% of studies, respectively. Socioeconomic status of the population was not reported in 92% of studies. Studies that mentioned these variables often did not report if they were included as model inputs. Few models (12%) were validated using external populations. Few studies (17%) open-sourced their code. Populations in the ML studies include higher proportions of White and Black yet fewer Hispanic subjects compared to the general US population.

DISCUSSION : The demographic characteristics of study populations are poorly reported in the ML literature based on EHR data. Demographic representativeness in training data and model transparency is necessary to ensure that ML models are deployed in an equitable and reproducible manner. Wider adoption of reporting guidelines is warranted to improve representativeness and reproducibility.

Bozkurt Selen, Cahan Eli M, Seneviratne Martin G, Sun Ran, Lossio-Ventura Juan A, Ioannidis John P A, Hernandez-Boussard Tina

2020-Sep-16

clinical decision support, bias, transparency, demographic data, electronic health record, machine learning

Dermatology

Age and life expectancy clocks based on machine learning analysis of mouse frailty.

In Nature communications ; h5-index 260.0

The identification of genes and interventions that slow or reverse aging is hampered by the lack of non-invasive metrics that can predict the life expectancy of pre-clinical models. Frailty Indices (FIs) in mice are composite measures of health that are cost-effective and non-invasive, but whether they can accurately predict health and lifespan is not known. Here, mouse FIs are scored longitudinally until death and machine learning is employed to develop two clocks. A random forest regression is trained on FI components for chronological age to generate the FRIGHT (Frailty Inferred Geriatric Health Timeline) clock, a strong predictor of chronological age. A second model is trained on remaining lifespan to generate the AFRAID (Analysis of Frailty and Death) clock, which accurately predicts life expectancy and the efficacy of a lifespan-extending intervention up to a year in advance. Adoption of these clocks should accelerate the identification of longevity genes and aging interventions.

Schultz Michael B, Kane Alice E, Mitchell Sarah J, MacArthur Michael R, Warner Elisa, Vogel David S, Mitchell James R, Howlett Susan E, Bonkowski Michael S, Sinclair David A

2020-09-15
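As a rough illustration of the FRIGHT-clock idea, random forest regression from frailty-index items to chronological age, here is a minimal scikit-learn sketch; the data are synthetic, and the 31-item count is only an assumption matching a typical mouse frailty index:

```python
# Hypothetical "age clock": regress chronological age on frailty items.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_mice, n_items = 500, 31                   # assumed 31 frailty-index items
age = rng.uniform(12, 30, n_mice)           # age in months
frailty = rng.random((n_mice, n_items)) * (age / 30)[:, None]  # items worsen with age

clock = RandomForestRegressor(n_estimators=300, random_state=0)
pred = cross_val_predict(clock, frailty, age, cv=5)
print("mean absolute error (months):", np.abs(pred - age).mean().round(2))
```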

General

Deep learning enabled smart mats as a scalable floor monitoring system.

In Nature communications ; h5-index 260.0

Toward smart buildings and smart homes, the floor, as one of our most frequently used interactive interfaces, can be implemented with embedded sensors to extract abundant sensory information without the privacy concerns of video recording. Yet previously developed floor sensors are normally of small scale, high implementation cost, large power consumption, and complicated device configuration. Here we show a smart floor monitoring system through the integration of self-powered triboelectric floor mats and deep learning-based data analytics. The floor mats are fabricated with unique "identity" electrode patterns using a low-cost and highly scalable screen printing technique, enabling a parallel connection to reduce the system complexity and the deep-learning computational cost. The stepping position, activity status, and identity information can be determined from instant analysis of the sensory data. This smart floor technology can establish the foundation for using the floor as a functional interface for diverse applications in smart buildings/homes, e.g., intelligent automation, healthcare, and security.

Shi Qiongfeng, Zhang Zixuan, He Tianyiyi, Sun Zhongda, Wang Bingjie, Feng Yuqin, Shan Xuechuan, Salam Budiman, Lee Chengkuo

2020-Sep-14

General

Resilience of clinical text de-identified with "hiding in plain sight" to hostile reidentification attacks by human readers.

In Journal of the American Medical Informatics Association : JAMIA

OBJECTIVE : Effective, scalable de-identification of personally identifying information (PII) for information-rich clinical text is critical to support secondary use, but no method is 100% effective. The hiding-in-plain-sight (HIPS) approach attempts to solve this "residual PII problem." HIPS replaces PII tagged by a de-identification system with realistic but fictitious (resynthesized) content, making it harder to detect remaining unredacted PII.

MATERIALS AND METHODS : Using 2000 representative clinical documents from 2 healthcare settings (4000 total), we used a novel method to generate 2 de-identified 100-document corpora (200 documents total) in which PII tagged by a typical automated machine-learned tagger was replaced by HIPS-resynthesized content. Four readers conducted aggressive reidentification attacks to isolate leaked PII: 2 readers from within the originating institution and 2 external readers.

RESULTS : Overall, mean recall of leaked PII was 26.8% and mean precision was 37.2%. Mean recall was 9% (mean precision = 37%) for patient ages, 32% (mean precision = 26%) for dates, 25% (mean precision = 37%) for doctor names, 45% (mean precision = 55%) for organization names, and 23% (mean precision = 57%) for patient names. Recall was 32% (precision = 40%) for internal and 22% (precision = 33%) for external readers.

DISCUSSION AND CONCLUSIONS : Approximately 70% of leaked PII "hiding" in a corpus de-identified with HIPS resynthesis is resilient to detection by human readers in a realistic, aggressive reidentification attack scenario, more than double the rate reported in previous studies but less than the rate reported for an attack assisted by machine learning methods.

Carrell David S, Malin Bradley A, Cronkite David J, Aberdeen John S, Clark Cheryl, Li Muqun Rachel, Bastakoty Dikshya, Nyemba Steve, Hirschman Lynette

2020-Sep-15

biomedical research, confidentiality, de-identification, electronic health records, natural language processing, privacy

General

Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies.

In Journal of the American Medical Informatics Association : JAMIA

OBJECTIVE : Predictive disease modeling using electronic health record data is a growing field. Although clinical data in their raw form can be used directly for predictive modeling, it is a common practice to map data to standard terminologies to facilitate data aggregation and reuse. There is, however, a lack of systematic investigation of how different representations could affect the performance of predictive models, especially in the context of machine learning and deep learning.

MATERIALS AND METHODS : We projected the input diagnosis data in the Cerner HealthFacts database to the Unified Medical Language System (UMLS) and 5 other terminologies, including CCS, CCSR, ICD-9, ICD-10, and PheWAS, and evaluated the prediction performance of these terminologies on 2 different tasks: the risk prediction of heart failure in diabetes patients and the risk prediction of pancreatic cancer. Two popular models were evaluated: logistic regression and a recurrent neural network.

RESULTS : For logistic regression, UMLS delivered the best area under the receiver operating characteristic curve (AUROC) in both the heart failure (81.15%) and pancreatic cancer (80.53%) tasks. For the recurrent neural network, UMLS worked best for pancreatic cancer prediction (AUROC 82.24%) and was second (AUROC 85.55%) only to PheWAS (AUROC 85.87%) for heart failure prediction.

DISCUSSION/CONCLUSION : In our experiments, terminologies with larger vocabularies and finer-grained representations were associated with better prediction performances. In particular, UMLS is consistently 1 of the best-performing ones. We believe that our work may help to inform better designs of predictive models, although further investigation is warranted.

Rasmy Laila, Tiryaki Firat, Zhou Yujia, Xiang Yang, Tao Cui, Xu Hua, Zhi Degui

2020-Sep-15

UMLS, electronic health records, predictive modeling, terminology representation

Public Health

Electronic health record-based disease surveillance systems: A systematic literature review on challenges and solutions.

In Journal of the American Medical Informatics Association : JAMIA

OBJECTIVE : Disease surveillance systems are expanding using electronic health records (EHRs). However, there are many challenges in this regard. In the present study, the solutions and challenges of implementing EHR-based disease surveillance systems (EHR-DS) have been reviewed.

MATERIALS AND METHODS : We searched the related keywords in ProQuest, PubMed, Web of Science, Cochrane Library, Embase, and Scopus. Then, we assessed and selected articles using the inclusion and exclusion criteria and, finally, classified the identified solutions and challenges.

RESULTS : Finally, 50 studies were included, and 52 unique solutions and 47 challenges were organized into 6 main themes (policy and regulatory, technical, management, standardization, financial, and data quality). The results indicate that, due to the multifaceted nature of the challenges, implementing EHR-DS is neither low cost nor easy and requires a variety of interventions. On the one hand, the most common challenges include the need to invest significant time and resources; the poor data quality in EHRs; difficulty in analyzing, cleaning, and accessing unstructured data; data privacy and security; and the lack of interoperability standards. On the other hand, the most common solutions are the use of natural language processing and machine learning algorithms for unstructured data; the use of appropriate technical solutions for data retrieval, extraction, identification, and visualization; the collaboration of health and clinical departments to access data; standardizing EHR content for public health; and using a unique health identifier for individuals.

CONCLUSIONS : EHR systems have an important role in modernizing disease surveillance systems. However, there are many problems and challenges facing the development and implementation of EHR-DS that need to be appropriately addressed.

Aliabadi Ali, Sheikhtaheri Abbas, Ansari Hossein

2020-Sep-14

challenges, disease surveillance, electronic health record, public health, solutions

Surgery

NuSeT: A deep learning tool for reliably separating and analyzing crowded cells.

In PLoS computational biology

Segmenting cell nuclei within microscopy images is a ubiquitous task in biological research and clinical applications. Unfortunately, segmenting low-contrast overlapping objects that may be tightly packed is a major bottleneck in standard deep learning-based models. We report a Nuclear Segmentation Tool (NuSeT) based on deep learning that accurately segments nuclei across multiple types of fluorescence imaging data. Using a hybrid network consisting of U-Net and Region Proposal Networks (RPN), followed by a watershed step, we have achieved superior performance in detecting and delineating nuclear boundaries in 2D and 3D images of varying complexities. By using foreground normalization and additional training on synthetic images containing non-cellular artifacts, NuSeT improves nuclear detection and reduces false positives. NuSeT addresses common challenges in nuclear segmentation such as variability in nuclear signal and shape, limited training sample size, and sample preparation artifacts. Compared to other segmentation models, NuSeT consistently fares better in generating accurate segmentation masks and assigning boundaries for touching nuclei.

Yang Linfeng, Ghosh Rajarshi P, Franklin J Matthew, Chen Simon, You Chenyu, Narayan Raja R, Melcher Marc L, Liphardt Jan T

2020-Sep-14
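The watershed step mentioned above is a standard way to split touching objects after a network produces a foreground mask. A minimal scikit-image sketch on a synthetic mask of two overlapping "nuclei" (NuSeT's actual pipeline feeds its network outputs into this stage):

```python
# Marker-based watershed: split touching objects via the distance transform.
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

# Two overlapping disks standing in for touching nuclei.
mask = np.zeros((80, 80), dtype=bool)
yy, xx = np.ogrid[:80, :80]
mask |= (yy - 40) ** 2 + (xx - 30) ** 2 < 15 ** 2
mask |= (yy - 40) ** 2 + (xx - 52) ** 2 < 15 ** 2

distance = ndi.distance_transform_edt(mask)
peaks = peak_local_max(distance, labels=mask.astype(int), min_distance=10)
markers = np.zeros_like(distance, dtype=int)
markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)

labels = watershed(-distance, markers, mask=mask)
print("separated objects:", labels.max())   # expected: 2
```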

General

Who Gets Credit for AI-Generated Art?

In iScience

The recent sale of an artificial intelligence (AI)-generated portrait for $432,000 at Christie's art auction has raised questions about how credit and responsibility should be allocated to individuals involved and how the anthropomorphic perception of the AI system contributed to the artwork's success. Here, we identify natural heterogeneity in the extent to which different people perceive AI as anthropomorphic. We find that differences in the perception of AI anthropomorphicity are associated with different allocations of responsibility to the AI system and credit to different stakeholders involved in art production. We then show that perceptions of AI anthropomorphicity can be manipulated by changing the language used to talk about AI (as a tool versus an agent), with consequences for artists and AI practitioners. Our findings shed light on what is at stake when we anthropomorphize AI systems and offer an empirical lens to reason about how to allocate credit and responsibility to human stakeholders.

Epstein Ziv, Levine Sydney, Rand David G, Rahwan Iyad

2020-Aug-29

Artificial Intelligence, Computer Science, Economics

Ophthalmology

Aberrant expression of PAX6 gene associated with classical aniridia: identification and functional characterization of novel noncoding mutations.

In Journal of human genetics

PAX6 is essential for ocular morphogenesis and is known to be highly sensitive to changes in gene expression, where neither over- nor under-expression ensures normal ocular development. Two unrelated probands with classical aniridia, who were previously considered "PAX6-negative", were studied by whole-genome sequencing. Through the use of multiple in silico deep learning-based algorithms, we identified two novel putative causal mutations, c.-133_-132del in the 5' untranslated region (5'-UTR) and c.-52 + 5G>A in an intron upstream of the PAX6 gene. Luciferase activity was significantly increased and VAX2 binding was disrupted with the former 5'-UTR variant compared with the wild-type sequence, which resulted in a striking overexpression of PAX6. The minigene assay showed that the c.-52 + 5G>A mutation caused defective splicing, which resulted in the formation of truncated transcripts.

Lee Junwon, Suh Yoonjong, Jeong Han, Kim Gu-Hwan, Byeon Suk Ho, Han Jinu, Lim Hyun Taek

2020-Sep-12

General

Detection and segmentation of morphologically complex eukaryotic cells in fluorescence microscopy images via feature pyramid fusion.

In PLoS computational biology

Detection and segmentation of macrophage cells in fluorescence microscopy images is a challenging problem, mainly due to crowded cells, variation in shapes, and morphological complexity. We present a new deep learning approach for cell detection and segmentation that incorporates previously learned nucleus features. A novel fusion of feature pyramids for nucleus detection and segmentation with feature pyramids for cell detection and segmentation is used to improve performance on a microscopic image dataset created by us and provided for public use, containing both nucleus and cell signals. Our experimental results indicate that cell detection and segmentation performance significantly benefit from the fusion of previously learned nucleus features. The proposed feature pyramid fusion architecture clearly outperforms a state-of-the-art Mask R-CNN approach for cell detection and segmentation with relative mean average precision improvements of up to 23.88% and 23.17%, respectively.

Korfhage Nikolaus, Mühling Markus, Ringshandl Stephan, Becker Anke, Schmeck Bernd, Freisleben Bernd

2020-Sep-08

Radiology

Novel Approaches to Screening for Breast Cancer.

In Radiology ; h5-index 91.0

Screening for breast cancer reduces breast cancer-related mortality and earlier detection facilitates less aggressive treatment. Unfortunately, current screening modalities are imperfect, suffering from limited sensitivity and high false-positive rates. Novel techniques in the field of breast imaging may soon play a role in breast cancer screening: digital breast tomosynthesis, contrast material-enhanced spectral mammography, US (automated three-dimensional breast US, transmission tomography, elastography, optoacoustic imaging), MRI (abbreviated and ultrafast, diffusion-weighted imaging), and molecular breast imaging. Artificial intelligence and radiomics have the potential to further improve screening strategies. Furthermore, nonimaging-based screening tests such as liquid biopsy and breathing tests may transform the screening landscape. © RSNA, 2020 Online supplemental material is available for this article.

Mann Ritse M, Hooley Regina, Barr Richard G, Moy Linda

2020-Sep-08

General

Synthetic minority oversampling of vital statistics data with generative adversarial networks.

In Journal of the American Medical Informatics Association : JAMIA

OBJECTIVE : Minority oversampling is a standard approach used for adjusting the ratio between the classes on imbalanced data. However, established methods often provide modest improvements in classification performance when applied to data with extremely imbalanced class distributions and to mixed-type data. This is usual for vital statistics data, in which the outcome incidence dictates the number of positive observations. In this article, we developed a novel neural network-based oversampling method called actGAN (activation-specific generative adversarial network) that can derive useful synthetic observations that increase prediction performance in this context.

MATERIALS AND METHODS : From vital statistics data, the outcome of early stillbirth was chosen to be predicted based on demographics, pregnancy history, and infections. The data contained 363 560 live births and 139 early stillbirths, resulting in class imbalance of 99.96% and 0.04%. The hyperparameters of actGAN and a baseline method SMOTE-NC (Synthetic Minority Over-sampling Technique-Nominal Continuous) were tuned with Bayesian optimization, and both were compared against a cost-sensitive learning-only approach.

RESULTS : While SMOTE-NC provided mixed results, actGAN consistently improved the true positive rate at a clinically significant false positive rate, as well as the area under the receiver-operating characteristic curve.

DISCUSSION : Including an activation-specific output layer in the generator network of actGAN enables the addition of information about the underlying data structure, which outperforms the nominal mechanism of SMOTE-NC.

CONCLUSIONS : actGAN provides an improvement to the prediction performance for our learning task. Our developed method could be applied to other mixed-type data prediction tasks that are known to be afflicted by class imbalance and limited data availability.

Koivu Aki, Sairanen Mikko, Airola Antti, Pahikkala Tapio

2020-Sep-04

artificial intelligence, deep learning, machine learning, stillbirth, vital statistics
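The SMOTE-NC baseline is available in the imbalanced-learn package as SMOTENC. A minimal usage sketch on synthetic mixed-type data (the feature meanings in the comments are invented, not the study's variables):

```python
# SMOTE-NC oversampling of a rare class with one categorical column.
import numpy as np
from imblearn.over_sampling import SMOTENC

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.integers(0, 3, n),     # categorical feature (e.g., infection type)
    rng.normal(size=n),        # continuous feature (e.g., maternal age)
    rng.normal(size=n),
])
y = (rng.random(n) < 0.01).astype(int)   # ~1% minority class

smote = SMOTENC(categorical_features=[0], random_state=0)
X_res, y_res = smote.fit_resample(X, y)
print("before:", np.bincount(y), "after:", np.bincount(y_res))
```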

General

Shape-to-graph mapping method for efficient characterization and classification of complex geometries in biological images.

In PLoS computational biology

With the ever-increasing quality and quantity of imaging data in biomedical research comes the demand for computational methodologies that enable efficient and reliable automated extraction of the quantitative information contained within these images. One of the challenges in providing such methodology is the need for tailoring algorithms to the specifics of the data, limiting their areas of application. Here we present a broadly applicable approach to quantification and classification of complex shapes and patterns in biological or other multi-component formations. This approach integrates the mapping of all shape boundaries within an image onto a global information-rich graph and machine learning on the multidimensional measures of the graph. We demonstrated the power of this method by (1) extracting subtle structural differences from visually indistinguishable images in our phenotype rescue experiments using the endothelial tube formations assay, (2) training the algorithm to identify biophysical parameters underlying the formation of different multicellular networks in our simulation model of collective cell behavior, and (3) analyzing the response of U2OS cell cultures to a broad array of small molecule perturbations.

Pilcher William, Yang Xingyu, Zhurikhina Anastasia, Chernaya Olga, Xu Yinghan, Qiu Peng, Tsygankov Denis

2020-Sep

Ophthalmology Ophthalmology

Cost-effectiveness of Autonomous Point-of-Care Diabetic Retinopathy Screening for Pediatric Patients With Diabetes.

In JAMA ophthalmology ; h5-index 58.0

Importance : Screening for diabetic retinopathy is recommended for children with type 1 diabetes (T1D) and type 2 diabetes (T2D), yet screening rates remain low. Point-of-care diabetic retinopathy screening using autonomous artificial intelligence (AI) has become available, providing immediate results in the clinic setting, but the cost-effectiveness of this strategy compared with standard examination is unknown.

Objective : To assess the cost-effectiveness of detecting and treating diabetic retinopathy and its sequelae among children with T1D and T2D using AI diabetic retinopathy screening vs standard screening by an eye care professional (ECP).

Design, Setting, and Participants : In this economic evaluation, parameter estimates were obtained from the literature from 1994 to 2019 and assessed from March 2019 to January 2020. Parameters included out-of-pocket cost for autonomous AI screening, ophthalmology visits, and treating diabetic retinopathy; probability of undergoing standard retinal examination; relative odds of undergoing screening; and sensitivity, specificity, and diagnosability of the ECP screening examination and autonomous AI screening.

Main Outcomes and Measures : Costs or savings to the patient based on mean patient payment for diabetic retinopathy screening examination and cost-effectiveness based on costs or savings associated with the number of true-positive results identified by diabetic retinopathy screening.

Results : In this study, the expected true-positive proportions for standard ophthalmologic screening by an ECP were 0.006 for T1D and 0.01 for T2D, and the expected true-positive proportions for autonomous AI were 0.03 for T1D and 0.04 for T2D. In the base case scenario of 20% adherence, use of autonomous AI was estimated to result in a higher mean patient payment ($8.52 for T1D and $10.85 for T2D) than conventional ECP screening ($7.91 for T1D and $8.20 for T2D). However, autonomous AI screening was the preferred strategy when at least 23% of patients adhered to diabetic retinopathy screening.
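
A back-of-the-envelope calculation using only the figures quoted above illustrates the cost-effectiveness argument: dividing the mean patient payment by the expected true-positive proportion gives an approximate cost per case detected. This ratio is an illustration only, not the paper's full decision model.

```python
# (mean patient payment, expected true-positive proportion) from the abstract.
scenarios = {
    "T1D, ECP": (7.91, 0.006),
    "T1D, AI":  (8.52, 0.03),
    "T2D, ECP": (8.20, 0.01),
    "T2D, AI":  (10.85, 0.04),
}
for name, (payment, tp) in scenarios.items():
    print(f"{name}: ~${payment / tp:,.0f} per true positive")
# AI costs slightly more per screen but far less per case detected,
# consistent with it becoming the preferred strategy above ~23% adherence.
```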

Conclusions and Relevance : These results suggest that point-of-care diabetic retinopathy screening using autonomous AI systems is effective and cost saving for children with diabetes and their caregivers at recommended adherence rates.

Wolf Risa M, Channa Roomasa, Abramoff Michael D, Lehmann Harold P

2020-Sep-03

General General

Representation of features as images with neighborhood dependencies for compatibility with convolutional neural networks.

In Nature communications ; h5-index 260.0

Deep learning with convolutional neural networks (CNNs) has shown great promise in image-based classification and enhancement but is often unsuitable for predictive modeling using features without spatial correlations. We present a feature representation approach termed REFINED (REpresentation of Features as Images with NEighborhood Dependencies) to arrange high-dimensional vectors in a compact image form conducive to CNN-based deep learning. We consider the similarities between features to generate a concise feature map in the form of a two-dimensional image by minimizing the pairwise distance values following a Bayesian metric multidimensional scaling approach. We hypothesize that this approach enables embedded feature extraction and, integrated with CNN-based deep learning, can boost predictive accuracy. We illustrate the superior predictive capabilities of the proposed framework, compared with state-of-the-art methodologies, in drug sensitivity prediction scenarios using synthetic datasets, drug chemical descriptors as predictors from NCI60, and both transcriptomic information and drug descriptors as predictors from GDSC.
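
The following simplified Python sketch conveys the gist of the approach using classical metric MDS from scikit-learn in place of the paper's Bayesian formulation; the data, grid size, and the crude rank-based snapping of features to pixels are assumptions for illustration.

```python
# Simplified REFINED-style feature-to-image mapping via classical MDS.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))           # 200 samples, 16 features (toy data)

# Feature-feature dissimilarity from absolute correlation.
dist = 1.0 - np.abs(np.corrcoef(X.T))
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)

# Crude rank-based snap of each feature to a cell of a 4x4 grid;
# REFINED optimizes this assignment far more carefully.
side = 4
order = np.lexsort((coords[:, 1], coords[:, 0]))
images = np.zeros((X.shape[0], side, side))
for rank, f in enumerate(order):
    r, c = divmod(rank, side)
    images[:, r, c] = X[:, f]            # pixel (r, c) carries feature f
# `images` can now be fed to a standard 2D CNN.
```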

Bazgir Omid, Zhang Ruibo, Dhruba Saugato Rahman, Rahman Raziur, Ghosh Souparno, Pal Ranadip

2020-Sep-01

General General

Astrocyte-mediated switch in spike timing-dependent plasticity during hippocampal development.

In Nature communications ; h5-index 260.0

Presynaptic spike timing-dependent long-term depression (t-LTD) at hippocampal CA3-CA1 synapses is evident until the 3rd postnatal week in mice, disappearing during the 4th week. At more mature stages, we found that the protocol that previously induced t-LTD instead induced t-LTP. We characterized this form of t-LTP and the mechanisms involved in its induction, as well as those driving the switch from t-LTD to t-LTP. We found that this t-LTP is expressed presynaptically at CA3-CA1 synapses, as evidenced by analyses of the coefficient of variation, number of failures, paired-pulse ratio, and miniature responses. Additionally, this form of presynaptic t-LTP does not require NMDARs but instead requires the activation of mGluRs, the entry of Ca2+ into the postsynaptic neuron through L-type voltage-dependent Ca2+ channels, and the release of Ca2+ from intracellular stores. Nitric oxide is also required as a messenger from the postsynaptic neuron. Crucially, the release of adenosine and glutamate by astrocytes is required for t-LTP induction and for the switch from t-LTD to t-LTP. Thus, we have discovered a developmental switch of synaptic transmission from t-LTD to t-LTP at hippocampal CA3-CA1 synapses in which astrocytes play a central role, and we reveal a form of presynaptic LTP and the rules for its induction.

Falcón-Moya Rafael, Pérez-Rodríguez Mikel, Prius-Mengual José, Andrade-Talavera Yuniesky, Arroyo-García Luis E, Pérez-Artés Rocío, Mateos-Aparicio Pedro, Guerra-Gomes Sónia, Oliveira João Filipe, Flores Gonzalo, Rodríguez-Moreno Antonio

2020-Sep-01

Internal Medicine Internal Medicine

Assessment of a Deep Learning Model to Predict Hepatocellular Carcinoma in Patients With Hepatitis C Cirrhosis.

In JAMA network open

Importance : Deep learning, a family of machine learning models that use artificial neural networks, has achieved great success at predicting outcomes in nonmedical domains.

Objective : To examine whether deep learning recurrent neural network (RNN) models that use raw longitudinal data extracted directly from electronic health records outperform conventional regression models in predicting the risk of developing hepatocellular carcinoma (HCC).

Design, Setting, and Participants : This prognostic study included 48 151 patients with hepatitis C virus (HCV)-related cirrhosis in the national Veterans Health Administration who had at least 3 years of follow-up after the diagnosis of cirrhosis. Patients were identified by having at least 1 positive HCV RNA test between January 1, 2000, and January 1, 2016, and were followed up from the diagnosis of cirrhosis to January 1, 2019, for the development of incident HCC. A total of 3 models predicting HCC during a 3-year period were developed and compared, as follows: (1) logistic regression (LR) with cross-sectional inputs (cross-sectional LR); (2) LR with longitudinal inputs (longitudinal LR); and (3) RNN with longitudinal inputs. Data analysis was conducted from April 2018 to August 2020.
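
For orientation, the PyTorch sketch below shows the general shape of the third model class: an RNN that consumes a sequence of longitudinal EHR measurements per patient and outputs a 3-year risk. The feature count, sequence length, and LSTM architecture are illustrative assumptions rather than the study's specification.

```python
import torch
import torch.nn as nn

class LongitudinalRiskRNN(nn.Module):
    """Toy RNN mapping a patient's longitudinal measurements to a risk score."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, time, n_features)
        _, (h, _) = self.rnn(x)           # final hidden state summarizes history
        return torch.sigmoid(self.head(h[-1])).squeeze(-1)

model = LongitudinalRiskRNN(n_features=12)
visits = torch.randn(8, 10, 12)           # 8 patients, 10 time points, 12 labs/vitals
risk = model(visits)                       # per-patient 3-year risk estimate
```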

Exposures : Development of HCC.

Main Outcomes and Measures : Area under the receiver operating characteristic curve, area under the precision-recall curve, and Brier score.

Results : During a mean (SD) follow-up of 11.6 (5.0) years, 10 741 of 48 151 patients (22.3%) developed HCC (annual incidence, 3.1%), and a total of 52 983 samples (51 948 [98.0%] from men) were collected. Patients who developed HCC within 3 years were older than patients who did not (mean [SD] age, 58.2 [6.6] years vs 56.9 [6.9] years). RNN models achieved a better mean (SD) area under the receiver operating characteristic curve (0.759 [0.009]) and mean (SD) Brier score (0.136 [0.003]) than cross-sectional LR (0.689 [0.009] and 0.149 [0.003], respectively) and longitudinal LR (0.682 [0.007] and 0.150 [0.003], respectively) models. Using the RNN model, the samples in the mean (SD) highest 51% (1.5%) of HCC risk, in which 80% of all HCCs occurred, or the mean (SD) highest 66% (1.2%) of HCC risk, in which 90% of all HCCs occurred, could potentially be targeted. Among samples from patients who achieved sustained virologic response, the performance of the RNN models was even better (mean [SD] area under the receiver operating characteristic curve, 0.806 [0.025]; mean [SD] Brier score, 0.117 [0.007]).

Conclusions and Relevance : In this study, deep learning RNN models outperformed conventional LR models, suggesting that RNN models could be used to identify patients with HCV-related cirrhosis with a high risk of developing HCC for risk-based HCC outreach and surveillance strategies.

Ioannou George N, Tang Weijing, Beste Lauren A, Tincopa Monica A, Su Grace L, Van Tony, Tapper Elliot B, Singal Amit G, Zhu Ji, Waljee Akbar K

2020-Sep-01

General General

Predicting complications of diabetes mellitus using advanced machine learning algorithms.

In Journal of the American Medical Informatics Association : JAMIA

OBJECTIVE : We sought to predict whether patients with type 2 diabetes mellitus (DM2) would develop any of 10 selected complications. Accurate prediction of complications could support more targeted measures to prevent or slow their development.

MATERIALS AND METHODS : Experiments were conducted on the Healthcare Cost and Utilization Project State Inpatient Databases of California for the period 2003 to 2011. Recurrent neural network (RNN) long short-term memory (LSTM) and RNN gated recurrent unit (GRU) deep learning methods were designed and compared with random forest and multilayer perceptron traditional models. Prediction accuracy for the selected complications was compared across 3 settings corresponding to the minimum number of hospitalizations between the diabetes diagnosis and the diagnosis of a complication.
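
As a rough sketch of the RNN GRU variant (not the authors' implementation), the PyTorch snippet below embeds per-hospitalization diagnosis codes, runs a GRU over the hospitalization sequence, and predicts the probability of one complication; the vocabulary size, dimensions, and padding scheme are hypothetical.

```python
import torch
import torch.nn as nn

class DiagnosisGRU(nn.Module):
    """Toy GRU over sequences of hospitalization diagnosis codes."""
    def __init__(self, n_codes=5000, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_codes, emb, padding_idx=0)
        self.gru = nn.GRU(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, codes):             # codes: (batch, n_hospitalizations)
        _, h = self.gru(self.embed(codes))
        return torch.sigmoid(self.head(h[-1])).squeeze(-1)

model = DiagnosisGRU()
# 4 patients, up to 4 hospitalizations each (0 = padding).
seqs = torch.tensor([[12, 40, 7, 0], [3, 3, 90, 11],
                     [5, 0, 0, 0], [8, 62, 9, 21]])
p_complication = model(seqs)
# The compared LSTM variant swaps the recurrent cell (and its state unpacking).
```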

RESULTS : The diagnosis domain was used for experiments. The best results were achieved with the RNN GRU model, followed by the RNN LSTM model. Prediction accuracy with the RNN GRU model ranged from 73% (myocardial infarction) to 83% (chronic ischemic heart disease), whereas the accuracy of the traditional models ranged from 66% to 76%.

DISCUSSION : The number of hospitalizations was an important factor for prediction accuracy: experiments with 4 hospitalizations achieved significantly better accuracy than those with 2. To achieve improved accuracy, the deep learning models required training on at least 1000 patients, and accuracy dropped significantly when training datasets contained only 500 patients. Prediction accuracy also decreased as the prediction period lengthened. Among individual complications, the best accuracy was achieved for depressive disorder and chronic ischemic heart disease.

CONCLUSIONS : Based on these results, the RNN GRU model was the best choice for this type of electronic medical record data.

Ljubic Branimir, Hai Ameen Abdel, Stanojevic Marija, Diaz Wilson, Polimac Daniel, Pavlovski Martin, Obradovic Zoran

2020-Sep-01

RNN models, deep learning, diabetes mellitus, diabetes mellitus complications, machine learning