Radiology

The sub-millisievert era in CTCA: the technical basis of the new radiation dose approach.

In La Radiologia medica

Computed tomography coronary angiography (CTCA) has become a cornerstone in the diagnosis of heart disease. Although cardiac imaging, together with interventional procedures, is responsible for approximately 40% of the cumulative effective dose in medical imaging, a substantial reduction in radiation dose has been achieved over the last decade, marking the beginning of the sub-millisievert (sub-mSv) era in CTCA. The main technical means of reducing radiation dose in CTCA are the use of a low tube voltage, the adoption of a prospective electrocardiogram-triggered spiral protocol, and the application of tube current modulation with iterative reconstruction. Nevertheless, radiation doses for CTCA examinations vary widely between radiology departments. Moreover, dose exposure in CTCA is extremely important because the benefit-risk calculus in comparison with other modalities also depends on it. Finally, because anatomical evaluation does not adequately predict the hemodynamic relevance of coronary stenosis, a low radiation dose in routine CTCA would allow wider use of myocardial CT perfusion, fractional flow reserve-CT, dual-energy CT and artificial intelligence, shifting the focus from morphological assessment to a comprehensive morphological and functional evaluation of the stenosis. Therefore, the aim of this work is to summarize the correct use of these techniques so that CTCA becomes an established low-radiation-dose examination for the assessment of coronary artery disease.

Schicchi Nicolò, Fogante Marco, Palumbo Pierpaolo, Agliata Giacomo, Esposto Pirani Paolo, Di Cesare Ernesto, Giovagnoni Andrea


Cardiac CT, Coronary CT, Dual-source CT, High-pitch protocol, Radiation dose, Radiation reduction

General

Accurate and efficient structure-based computational mutagenesis for modeling fluorescence levels of Aequorea victoria green fluorescent protein mutants.

In Protein engineering, design & selection : PEDS

A computational mutagenesis technique was used to characterize the structural effects associated with over 46 000 single and multiple amino acid variants of Aequorea victoria green fluorescent protein (GFP), whose functional effects (fluorescence levels) were recently measured by experimental researchers. For each GFP mutant, the approach generated a single score reflecting the overall change in sequence-structure compatibility relative to native GFP, as well as a vector of environmental perturbation (EP) scores characterizing the impact at all GFP residue positions. A significant GFP structure-function relationship (P < 0.0001) was elucidated by comparing the sequence-structure compatibility scores with the functional data. Next, the computed vectors for GFP mutants were used to train predictive models of fluorescence by implementing random forest (RF) classification and tree regression machine learning algorithms. Classification performance reached 0.93 for sensitivity, 0.91 for precision and 0.90 for balanced accuracy, and regression models led to Pearson's correlation as high as r = 0.83 between experimental and predicted GFP mutant fluorescence. An RF model trained on a subset of over 1000 experimental single residue GFP mutants with measured fluorescence was used for predicting the 3300 remaining unstudied single residue mutants, with results complementing known GFP biochemical and biophysical properties. In addition, models trained on the subset of experimental GFP mutants harboring multiple residue replacements successfully predicted fluorescence of the single residue GFP mutants. The models developed for this study were accurate and efficient, and their predictions outperformed those of several related state-of-the-art methods.
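The classification and regression metrics quoted above (balanced accuracy and Pearson's r) can be made concrete with a small sketch. The abstract does not describe how the metrics were implemented, so the function names and the toy inputs below are illustrative assumptions, not the study's code or data.

```python
from math import sqrt


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    specificity = tn / (tn + fp)  # fraction of true negatives recovered
    return (sensitivity + specificity) / 2


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy data only (hypothetical labels and fluorescence values).
labels_true = [1, 1, 1, 0, 0]
labels_pred = [1, 1, 0, 0, 1]
print(balanced_accuracy(labels_true, labels_pred))

measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(pearson_r(measured, predicted))
```

Balanced accuracy averages the per-class recall, which makes it less sensitive to class imbalance than plain accuracy.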

Masso Majid


GFP, machine learning, prediction, structure–function relationships

General

Resilience of clinical text de-identified with "hiding in plain sight" to hostile reidentification attacks by human readers.

In Journal of the American Medical Informatics Association : JAMIA

OBJECTIVE : Effective, scalable de-identification of personally identifying information (PII) for information-rich clinical text is critical to support secondary use, but no method is 100% effective. The hiding-in-plain-sight (HIPS) approach attempts to solve this "residual PII problem." HIPS replaces PII tagged by a de-identification system with realistic but fictitious (resynthesized) content, making it harder to detect remaining unredacted PII.

MATERIALS AND METHODS : Starting from 2000 representative clinical documents from each of 2 healthcare settings (4000 total), we used a novel method to generate 2 de-identified 100-document corpora (200 documents total) in which PII tagged by a typical automated machine-learned tagger was replaced by HIPS-resynthesized content. Four readers conducted aggressive reidentification attacks to isolate leaked PII: 2 readers from within the originating institution and 2 external readers.

RESULTS : Overall, mean recall of leaked PII was 26.8% and mean precision was 37.2%. Mean recall was 9% (mean precision = 37%) for patient ages, 32% (mean precision = 26%) for dates, 25% (mean precision = 37%) for doctor names, 45% (mean precision = 55%) for organization names, and 23% (mean precision = 57%) for patient names. Recall was 32% (precision = 40%) for internal and 22% (precision = 33%) for external readers.

DISCUSSION AND CONCLUSIONS : Approximately 70% of leaked PII "hiding" in a corpus de-identified with HIPS resynthesis is resilient to detection by human readers in a realistic, aggressive reidentification attack scenario, more than double the rate reported in previous studies but less than the rate reported for an attack assisted by machine learning methods.
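To make the attack metrics concrete, the following sketch shows one way precision and recall could be scored for a reader who flags suspected leaked PII, and how the "resilient" share relates to recall. The item representation, the names, and the toy data are assumptions for illustration; the abstract does not describe the study's scoring procedure in this detail.

```python
def precision_recall(flagged, leaked):
    """Score a reader's guesses against the items that truly leaked.

    flagged: items the reader marked as real (leaked) PII
    leaked:  items that actually are leaked PII
    """
    flagged, leaked = set(flagged), set(leaked)
    hits = flagged & leaked
    precision = len(hits) / len(flagged) if flagged else 0.0
    recall = len(hits) / len(leaked) if leaked else 0.0
    return precision, recall


def resilient_fraction(recall):
    """Share of leaked PII that the reader failed to detect."""
    return 1.0 - recall


# Hypothetical toy example: (document id, text span) pairs.
leaked = {("doc1", "Dr. Lee"), ("doc2", "03/14"), ("doc3", "Mercy Clinic")}
flagged = {("doc1", "Dr. Lee"), ("doc1", "Dr. Patel"),
           ("doc2", "05/02"), ("doc4", "Anna Smith")}
print(precision_recall(flagged, leaked))  # (0.25, 0.333...)

# With the reported overall mean recall of 26.8%:
print(resilient_fraction(0.268))  # about 0.732, i.e. roughly 70%
```

The second call shows why a mean recall of 26.8% corresponds to roughly 70% of leaked PII going undetected.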

Carrell David S, Malin Bradley A, Cronkite David J, Aberdeen John S, Clark Cheryl, Li Muqun Rachel, Bastakoty Dikshya, Nyemba Steve, Hirschman Lynette


biomedical research, confidentiality, de-identification, electronic health records, natural language processing, privacy

General

Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies.

In Journal of the American Medical Informatics Association : JAMIA

OBJECTIVE : Predictive disease modeling using electronic health record data is a growing field. Although clinical data in their raw form can be used directly for predictive modeling, it is a common practice to map data to standard terminologies to facilitate data aggregation and reuse. There is, however, a lack of systematic investigation of how different representations could affect the performance of predictive models, especially in the context of machine learning and deep learning.

MATERIALS AND METHODS : We projected the input diagnoses data in the Cerner HealthFacts database to Unified Medical Language System (UMLS) and 5 other terminologies, including CCS, CCSR, ICD-9, ICD-10, and PheWAS, and evaluated the prediction performances of these terminologies on 2 different tasks: the risk prediction of heart failure in diabetes patients and the risk prediction of pancreatic cancer. Two popular models were evaluated: logistic regression and a recurrent neural network.

RESULTS : For logistic regression, using UMLS delivered the optimal area under the receiver operating characteristic curve (AUROC) results in both the heart failure in diabetes patients (81.15%) and pancreatic cancer (80.53%) tasks. For the recurrent neural network, UMLS worked best for pancreatic cancer prediction (AUROC 82.24%) and was second (AUROC 85.55%) only to PheWAS (AUROC 85.87%) for prediction of heart failure in diabetes patients.
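As a reminder of what the AUROC figures above measure, here is a minimal sketch that computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count as half). The function name and the toy scores are illustrative assumptions, not the study's evaluation code.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy risk scores (hypothetical): 3 of the 4 positive/negative pairs are ordered correctly.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

This pairwise form is quadratic in the number of cases, so it suits small illustrations; rank-based implementations are preferable for large datasets.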

DISCUSSION/CONCLUSION : In our experiments, terminologies with larger vocabularies and finer-grained representations were associated with better prediction performances. In particular, UMLS is consistently 1 of the best-performing ones. We believe that our work may help to inform better designs of predictive models, although further investigation is warranted.

Rasmy Laila, Tiryaki Firat, Zhou Yujia, Xiang Yang, Tao Cui, Xu Hua, Zhi Degui


UMLS, electronic health records, predictive modeling, terminology representation

Public Health

An Innovative Artificial Intelligence-Based App for the Diagnosis of Gestational Diabetes Mellitus (GDM-AI): Development Study.

In Journal of medical Internet research ; h5-index 88.0

BACKGROUND : Gestational diabetes mellitus (GDM) can cause adverse consequences to both mothers and their newborns. However, pregnant women living in low- and middle-income areas or countries often fail to receive early clinical interventions at local medical facilities due to restricted availability of GDM diagnosis. The outstanding performance of artificial intelligence (AI) in disease diagnosis in previous studies demonstrates its promising applications in GDM diagnosis.

OBJECTIVE : This study aims to investigate the implementation of a well-performing AI algorithm for GDM diagnosis in a setting that requires less medical equipment and fewer staff, and to establish an app based on the AI algorithm. This study also explores possible progress if our app is widely used.

METHODS : An AI model that included 9 algorithms was trained on data from 12,304 consenting pregnant outpatients who received a test for GDM in the obstetrics and gynecology department of the First Affiliated Hospital of Jinan University, a local hospital in South China, between November 2010 and October 2017. GDM was diagnosed according to American Diabetes Association (ADA) 2011 diagnostic criteria. Age and fasting blood glucose were chosen as critical parameters. For validation, we performed k-fold cross-validation (k=5) on the internal dataset and used an external validation dataset that included 1655 cases from the Prince of Wales Hospital, the affiliated teaching hospital of the Chinese University of Hong Kong, a non-local hospital. Accuracy, sensitivity, and other criteria were calculated for each algorithm.
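The k-fold cross-validation step (k=5) described above can be sketched as follows. This is a minimal illustration of how an internal dataset of 12,304 cases might be split into five train/validation partitions. The shuffling, the seed, and the absence of stratification are assumptions, since the abstract does not specify them.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: shuffle before splitting
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Each case is used for validation exactly once across the 5 folds.
for fold_number, (train, validation) in enumerate(k_fold_indices(12304), start=1):
    print(fold_number, len(train), len(validation))
```

Each algorithm would be fitted on the training indices and scored on the held-out fold, with the five scores averaged. The external dataset is kept entirely separate from this loop.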

RESULTS : On the external validation dataset, the areas under the receiver operating characteristic curve (AUROC) for support vector machine (SVM), random forest, AdaBoost, k-nearest neighbors (kNN), naive Bayes (NB), decision tree, logistic regression (LR), eXtreme gradient boosting (XGBoost), and gradient boosting decision tree (GBDT) were 0.780, 0.657, 0.736, 0.669, 0.774, 0.614, 0.769, 0.742, and 0.757, respectively. SVM also retained high performance in other criteria. The specificity of SVM remained at 100% in the external validation set, with an accuracy of 88.7%.

CONCLUSIONS : Our prospective, multicenter study is the first clinical study to support GDM diagnosis for pregnant women in resource-limited areas using only the fasting blood glucose value, the patient's age, and a smartphone connected to the internet. Our study proved that SVM can achieve accurate diagnosis with lower operating cost and higher efficacy. Our study (referred to as the GDM-AI study, ie, the study of AI-based diagnosis of GDM) also shows that our app has a promising future in improving the quality of maternal health for pregnant women, precision medicine, and long-distance medical care. We recommend that future work expand the dataset scope and replicate the process to validate the performance of the AI algorithms.

Shen Jiayi, Chen Jiebin, Zheng Zequan, Zheng Jiabin, Liu Zherui, Song Jian, Wong Sum Yi, Wang Xiaoling, Huang Mengqi, Fang Po-Han, Jiang Bangsheng, Tsang Winghei, He Zonglin, Liu Taoran, Akinwunmi Babatunde, Wang Chi Chiu, Zhang Casper J P, Huang Jian, Ming Wai-Kit


AI, app, application, artificial intelligence, diabetes, diagnosis, disease diagnosis, gestational diabetes, innovation, maternal health care, rural, women

Public Health

Machine Learning-Based DNA Methylation Score for Fetal Exposure to Maternal Smoking: Development and Validation in Samples Collected from Adolescents and Adults.

In Environmental health perspectives ; h5-index 89.0

BACKGROUND : Fetal exposure to maternal smoking during pregnancy is associated with the development of noncommunicable diseases in the offspring. Maternal smoking may induce such long-term effects through persistent changes in the DNA methylome, which therefore hold the potential to be used as a biomarker of this early life exposure. With declining costs for measuring DNA methylation, we aimed to develop a DNA methylation score that can be used on adolescent DNA methylation data and thereby generate a score for in utero cigarette smoke exposure.

METHODS : We used machine learning methods to create a score reflecting exposure to maternal smoking during pregnancy. This score is based on peripheral blood measurements of DNA methylation (Illumina's Infinium HumanMethylation450K BeadChip). The score was developed and tested in the Raine Study with data from 995 white 17-y-old participants using 10-fold cross-validation. The score was further tested and validated in independent data from the Northern Finland Birth Cohort 1986 (NFBC1986) (16-y-olds) and 1966 (NFBC1966) (31-y-olds). Further, three previously proposed DNA methylation scores were applied for comparison. The final score was developed with 204 CpGs using elastic net regression.
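For readers unfamiliar with elastic net regression, the penalized objective it minimizes can be written in a common general form. The loss function and the values of the tuning parameters used in the study are not given in the abstract, so the symbols below are generic assumptions: $\ell$ is the loss (for example, the logistic loss for a binary exposure label), $x_i$ is the vector of CpG methylation values for participant $i$, $\lambda \ge 0$ sets the overall penalty strength, and $\alpha \in [0, 1]$ mixes the two penalties.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}\;
    \frac{1}{n} \sum_{i=1}^{n} \ell\!\left(y_i,\; \beta_0 + x_i^{\top}\beta\right)
    + \lambda \left[ \alpha \,\lVert \beta \rVert_1
    + \frac{1-\alpha}{2}\, \lVert \beta \rVert_2^2 \right]
```

The $\lVert \beta \rVert_1$ term drives many coefficients to exactly zero, which is how a score built from a small subset of CpGs (204 in this study) can emerge from array-wide data, while the $\lVert \beta \rVert_2^2$ term stabilizes the fit when neighboring CpGs are correlated. In practice, $\lambda$ and $\alpha$ are typically chosen by cross-validation.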

RESULTS : Sensitivity and specificity values for the best performing previously developed classifier ("Reese Score") were 88% and 72% for Raine, 87% and 61% for NFBC1986 and 72% and 70% for NFBC1966, respectively; corresponding figures using the elastic net regression approach were 91% and 76% (Raine), 87% and 75% (NFBC1986), and 72% and 78% for NFBC1966.

CONCLUSION : We have developed a DNA methylation score for exposure to maternal smoking during pregnancy, outperforming the three previously developed scores. One possible application of the current score could be for model adjustment purposes or to assess its association with distal health outcomes where part of the effect can be attributed to maternal smoking. Further, it may provide a biomarker for fetal exposure to maternal smoking.

Rauschert Sebastian, Melton Phillip E, Heiskala Anni, Karhunen Ville, Burdge Graham, Craig Jeffrey M, Godfrey Keith M, Lillycrop Karen, Mori Trevor A, Beilin Lawrence J, Oddy Wendy H, Pennell Craig, Järvelin Marjo-Riitta, Sebert Sylvain, Huang Rae-Chi