
General

Predicting Daily Sheltering Arrangements among Youth Experiencing Homelessness Using Diary Measurements Collected by Ecological Momentary Assessment.

In International journal of environmental research and public health; h5-index 73.0

Youths experiencing homelessness (YEH) often cycle between various sheltering locations, including spending nights on the streets, in shelters and with others. Few studies have explored patterns of daily sheltering over time. A total of 66 participants completed 724 ecological momentary assessments of daily sleeping arrangements. Analyses applied a hypothesis-generating machine learning algorithm (component-wise gradient boosting) to build interpretable models that select only the best predictors of daily sheltering from a large set of 92 variables while accounting for the correlated nature of the data. Sheltering was examined as a three-category outcome comparing nights spent literally homeless, unstably housed or at a shelter. The final model retained 15 predictors. These predictors included (among others) specific stressors (e.g., not having a place to stay, parenting and hunger), discrimination (by a friend or nonspecified other; due to race or homelessness), being arrested and synthetic cannabinoid use (a.k.a., "kush"). The final model successfully classified the categorical outcome. These results have implications for developing just-in-time adaptive interventions to improve the lives of YEH.
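Component-wise gradient boosting performs variable selection because each iteration updates only the single predictor that best fits the current residuals. The Python sketch below illustrates that idea under simplifying assumptions that differ from the study: a continuous outcome with squared-error loss, one linear base learner per predictor, and independent observations (the study modeled a three-category outcome and accounted for repeated measurements). The function name `componentwise_l2_boost` and the defaults for `n_iter` and `nu` are hypothetical.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_iter=200, nu=0.1):
    """Simplified component-wise L2 boosting with linear base learners.

    Assumption for illustration: squared-error loss, independent rows,
    and no constant (zero-variance) columns in X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center predictors and outcome so each base learner needs no intercept.
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    intercept = y.mean()
    resid = y - intercept
    coef = np.zeros(X.shape[1])
    ss = (Xc ** 2).sum(axis=0)  # per-predictor sum of squares
    for _ in range(n_iter):
        # Least-squares slope of the residuals on each predictor separately.
        slopes = Xc.T @ resid / ss
        # Drop in residual sum of squares if predictor j were fitted: slope^2 * ss.
        gain = slopes ** 2 * ss
        j = int(np.argmax(gain))
        # Update only the best predictor, shrunk by the step size nu.
        coef[j] += nu * slopes[j]
        resid -= nu * slopes[j] * Xc[:, j]
    # Convert back to the uncentered scale of X.
    intercept -= x_mean @ coef
    return intercept, coef
```

Predictors that are never selected keep a coefficient of exactly zero, so the number of iterations (together with `nu`) controls how many variables enter the model.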

Suchting Robert, Businelle Michael S, Hwang Stephen W, Padhye Nikhil S, Yang Yijiong, Santa Maria Diane M


daily sleeping arrangement, data science, electronic momentary assessment, machine learning, youth experiencing homelessness

Oncology

Parental Attitudes toward Artificial Intelligence-Driven Precision Medicine Technologies in Pediatric Healthcare.

In Children (Basel, Switzerland)

Precision medicine relies upon artificial intelligence (AI)-driven technologies that raise ethical and practical concerns. In this study, we developed and validated a measure of parental openness and concerns with AI-driven technologies in their child's healthcare. In this cross-sectional survey, we enrolled parents of children <18 years in 2 rounds for exploratory (n = 418) and confirmatory (n = 386) factor analysis. We developed a 12-item measure of parental openness to AI-driven technologies, and a 33-item measure identifying concerns that parents found important when considering these technologies. We also evaluated associations between openness and attitudes, beliefs, personality traits, and demographics. Parents (N = 804) reported mean openness to AI-driven technologies of M = 3.4/5, SD = 0.9. We identified seven concerns that parents considered important when evaluating these technologies: quality/accuracy, privacy, shared decision making, convenience, cost, human element of care, and social justice. In multivariable linear regression, parental openness was positively associated with quality (beta = 0.23), convenience (beta = 0.16), and cost (beta = 0.11), as well as faith in technology (beta = 0.23) and trust in health information systems (beta = 0.12). Parental openness was negatively associated with the perceived importance of shared decision making (beta = -0.16) and being female (beta = -0.12). Developers might support parental openness by addressing these concerns during the development and implementation of novel AI-driven technologies.
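The abstract reports associations as beta values without stating how they were scaled. Assuming they are standardized regression coefficients, the Python sketch below shows how such coefficients can be obtained from a multivariable linear regression by z-scoring the outcome and every predictor before an ordinary least-squares fit. The function name is hypothetical, and the sketch omits missing-data handling and significance testing.

```python
import numpy as np

def standardized_betas(X, y):
    """Standardized OLS coefficients, one per column of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # z-score each predictor and the outcome using the sample SD (ddof=1).
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # With centered X and y the OLS intercept is exactly zero,
    # so no constant column is needed.
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta
```

Each standardized beta equals the raw slope multiplied by the predictor's SD and divided by the outcome's SD, which makes coefficients for differently scaled items (e.g., a 5-point openness scale and a binary sex indicator) comparable in magnitude.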

Sisk Bryan A, Antes Alison L, Burrous Sara, DuBois James M


artificial intelligence, biomedical technology, child health, ethics, machine learning, pediatrics, personalized medicine, precision medicine

General

Accurate Blood-Based Diagnostic Biosignatures for Alzheimer's Disease via Automated Machine Learning.

In Journal of clinical medicine

Alzheimer's disease (AD) is the most common form of neurodegenerative dementia, and its timely diagnosis remains a major challenge in biomarker discovery. In the present study, we used the AutoML tool Just Add Data Bio (JADBIO) to analyze publicly available high-throughput, low-sample-size -omics datasets from studies of AD blood and to construct accurate predictive models for use as diagnostic biosignatures. Using data from AD patients and age- and sex-matched cognitively healthy individuals, we produced three best-performing diagnostic biosignatures specific for the presence of AD: (A) a 506-feature transcriptomic dataset from 48 AD patients and 22 controls yielded a miRNA-based biosignature via Support Vector Machines with three miRNA predictors (AUC 0.975 (0.906, 1.000)); (B) a 38,327-feature transcriptomic dataset from 134 AD patients and 100 controls yielded six statistically equivalent mRNA-based signatures via Classification Random Forests with 25 mRNA predictors (AUC 0.846 (0.778, 0.905)); and (C) a 9483-feature proteomic dataset from 25 AD patients and 37 controls yielded a protein-based biosignature via Ridge Logistic Regression with seven protein predictors (AUC 0.921 (0.849, 0.972)). These performance metrics were also validated through the JADBIO pipeline, confirming their stability. In conclusion, using the automated machine learning tool JADBIO, we produced accurate predictive biosignatures from the available low-sample-size -omics data. These results offer options for minimally invasive blood-based diagnostic tests for AD, pending clinical validation with the corresponding laboratory assays. They also highlight the value of AutoML in biomarker discovery.
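Each AUC above is followed by an interval in parentheses. As a sketch of what such numbers summarize, the Python code below computes the AUC as the Mann-Whitney probability that a randomly chosen AD case scores higher than a randomly chosen control, plus a simple percentile bootstrap interval. This is not JADBIO's estimation protocol, whose resampling scheme the abstract does not describe; the function names and defaults are hypothetical.

```python
import random

def auc(scores, labels):
    """ROC AUC as P(random positive outranks random negative); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC, resampling cases with replacement."""
    if 0 not in labels or 1 not in labels:
        raise ValueError("need both classes")
    rng = random.Random(seed)
    idx = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        s = [scores[i] for i in sample]
        l = [labels[i] for i in sample]
        if 0 in l and 1 in l:  # skip resamples that contain only one class
            stats.append(auc(s, l))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

With very small groups such as 25 cases and 37 controls, intervals like these are wide, which is one reason the authors note that clinical validation is still required.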

Karaglani Makrina, Gourlia Krystallia, Tsamardinos Ioannis, Chatzaki Ekaterini


Alzheimer’s disease, blood, classifier, machine learning, predictive model

Pathology

De-identifying free text of Japanese electronic health records.

In Journal of biomedical semantics; h5-index 23.0

BACKGROUND : An increasing number of electronic data sources are becoming available in the healthcare domain. Electronic health records (EHRs), with their vast amounts of potentially available data, can greatly improve healthcare. Although EHR de-identification is necessary to protect personal information, automatic de-identification of Japanese-language EHRs has not been studied sufficiently. This study aimed to improve de-identification performance for Japanese EHRs using classical machine learning, deep learning, and rule-based methods, depending on the dataset.

RESULTS : Using three datasets, we implemented de-identification systems for Japanese EHRs and compared the de-identification performance of rule-based, Conditional Random Fields (CRF), and Long Short-Term Memory (LSTM)-based methods. Gold-standard de-identification tags were annotated manually for age, hospital, person, sex, and time. We used different combinations of our datasets to train and evaluate our three methods. Our best F1-scores were 84.23, 68.19, and 81.67 points for the MedNLP dataset, a dummy EHR dataset written by a medical doctor, and a Pathology Report dataset, respectively. Our LSTM-based method performed best on every dataset except MedNLP, for which the rule-based method was best. Even so, the LSTM-based method achieved a good score of 83.07 points on the MedNLP dataset, only 1.16 points below the best score obtained by the rule-based method. These results suggest that the LSTM adapted well to the differing characteristics of our datasets. On the Pathology Report dataset, our LSTM-based method outperformed our CRF-based method by 7.41 F1-score points. This report is the first study to apply this LSTM-based method to any de-identification task for Japanese EHRs.
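The F1-scores above combine precision and recall over the tagged entities and are reported on a 0-100 point scale. The Python sketch below assumes exact span matching on (start, end, tag) tuples, using the tag types listed in the abstract; the paper's actual matching criterion (exact versus partial overlap, token versus character offsets) is not stated, so treat the function and its inputs as hypothetical.

```python
def span_f1(gold, pred):
    """Exact-match entity-level precision, recall and F1, in points (0-100).

    gold and pred are collections of (start, end, tag) tuples, where tag is
    one of e.g. "age", "hospital", "person", "sex", "time".
    """
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # spans whose offsets and tag both match
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision * 100, recall * 100, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision * 100, recall * 100, f1 * 100
```

Under this criterion a span found at the right offsets but with the wrong tag (for example "sex" instead of "age") counts as both a false positive and a false negative.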

CONCLUSIONS : Our LSTM-based machine learning method generally extracted the named entities to be de-identified with better performance than our rule-based methods. However, machine learning methods are inadequate for processing rarely occurring expressions. Our future work will specifically examine combining LSTM and rule-based methods to achieve better performance. Our current level of performance is sufficiently high and exceeds that of publicly available Japanese de-identification tools. Therefore, our system will be applied to actual de-identification tasks in hospitals.

Kajiyama Kohei, Horiguchi Hiromasa, Okumura Takashi, Morita Mizuki, Kano Yoshinobu


De-identification, Electronic health records, Japanese language

General

Utilizing imbalanced electronic health records to predict acute kidney injury by ensemble learning and time series model.

In BMC medical informatics and decision making; h5-index 38.0

BACKGROUND : Acute Kidney Injury (AKI) is a common complication among Intensive Care Unit (ICU) patients and is associated with high cost, high morbidity and high mortality. Because early prediction of AKI is critical for patient outcomes and data mining is a powerful prediction tool, many AKI prediction models based on machine learning methods have been proposed. Our work is motivated by the observation that AKI incidence evolves as a temporal sequence shaped by the joint action of patients' daily drug combinations and their physiological indexes. However, most existing models have not considered this temporal correlation. Moreover, because clinical data are sparse, high-dimensional and highly imbalanced, ideal performance is hard to achieve.

METHODS : We develop a fast, simple and low-cost model based on an ensemble learning algorithm, named the Ensemble Time Series Model (ETSM). In addition to using vital signs and laboratory results as explicit indicators, ETSM explores the effect of drug combinations as possible implicit indicators for AKI prediction. The model transforms temporal medication information into a multidimensional vector to capture and measure the cumulative drug effects that may cause AKI.
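The abstract does not specify how ETSM turns the medication timeline into a vector. One plausible encoding, shown here purely as a hypothetical illustration, assigns each drug in a fixed vocabulary an exponentially decayed sum of its past administrations, so that repeated and recent doses accumulate. The half-life parameter, the function name, and the drug identifiers are all assumptions, and this sketch does not model interactions between co-administered drugs.

```python
import math

def cumulative_drug_vector(daily_meds, drug_vocab, half_life_days=2.0):
    """Hypothetical encoding: one decayed cumulative-exposure value per drug.

    daily_meds: list of sets in chronological order, one per ICU day, each
    holding the drugs given that day; the last entry is the most recent day.
    A dose given k days before the last day contributes exp(-decay * k).
    """
    decay = math.log(2) / half_life_days
    index = {drug: i for i, drug in enumerate(drug_vocab)}
    vec = [0.0] * len(drug_vocab)
    last = len(daily_meds) - 1
    for day, drugs in enumerate(daily_meds):
        weight = math.exp(-decay * (last - day))
        for drug in drugs:
            if drug in index:  # ignore drugs outside the vocabulary
                vec[index[drug]] += weight
    return vec
```

A drug given on several recent days ends up with a larger value than one given once long ago, which is the kind of cumulative effect the abstract describes.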

RESULTS : We compare ETSM with state-of-the-art models on the ICUC and MIMIC III datasets. In our experiments, the model achieves satisfactory performance (ICUC: AUC 0.81 at 24 hours ahead and 0.78 at 48 hours ahead; MIMIC III: AUC 0.95 at both 24 and 48 hours ahead). We also compare the effects of different sampling and feature generation methods on model performance. In the ablation study, we confirm that medication information improves model performance (24 hours ahead: AUC increased from 0.74 to 0.81). We also find that model performance is closely related to the degree of class balance in the derivation dataset, and we identify the optimal ratio of majority-class size to minority-class size for AKI prediction.
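One common way to control the ratio of majority-class to minority-class size is random undersampling of the majority class. The abstract does not say which sampling methods were compared, so the Python sketch below is an illustrative assumption: it keeps every minority-class row (presumably the AKI cases) and randomly retains at most `ratio` majority-class rows per minority row. The function name and parameters are hypothetical.

```python
import random
from collections import Counter

def undersample_to_ratio(rows, labels, ratio, seed=0):
    """Randomly drop majority-class rows so that majority:minority <= ratio.

    Assumes exactly two classes; all minority-class rows are kept and the
    original row order is preserved.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (major, n_major), (minor, n_minor) = counts.most_common()
    keep_major = min(n_major, int(ratio * n_minor))
    rng = random.Random(seed)
    major_idx = [i for i, y in enumerate(labels) if y == major]
    minor_idx = [i for i, y in enumerate(labels) if y == minor]
    chosen = sorted(minor_idx + rng.sample(major_idx, keep_major))
    return [rows[i] for i in chosen], [labels[i] for i in chosen]
```

Sweeping `ratio` over several values and comparing validation AUC is one way to search for a good balance level; the rebalancing should be applied only to training data so that evaluation reflects the real class distribution.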

CONCLUSIONS : ETSM is an effective method for the early prediction of AKI. The results confirm that AKI incidence is related to clinical medication. Compared with other prediction methods, ETSM provides comparable performance and better interpretability.

Wang Yuan, Wei Yake, Yang Hao, Li Jingwei, Zhou Yubo, Wu Qin


Acute kidney injury (AKI), Drug combination, ETSM, Ensemble learning, Prediction

General

Viral pandemic preparedness: A pluripotent stem cell-based machine-learning platform for simulating SARS-CoV-2 infection to enable drug discovery and repurposing.

In Stem cells translational medicine

Infection with the SARS-CoV-2 virus has rapidly become a global pandemic for which we were unprepared. Several clinical trials using previously approved drugs and drug combinations are urgently underway to improve our current situation. Unfortunately, a vaccine option is optimistically at least a year away. For future viral pandemic preparedness, it is imperative that we have a rapid screening technology for drug discovery and repurposing. The primary purpose of this research project was to evaluate the DeepNEU stem-cell based platform by creating and validating computer simulations of artificial lung cells infected with SARS-CoV-2 to enable the rapid identification of antiviral therapeutic targets and drug repurposing. The data generated from this project indicate that (a) human alveolar type lung cells can be simulated by DeepNEU (v5.0), (b) these simulated cells can then be infected with simulated SARS-CoV-2 virus, (c) the unsupervised learning system performed well in all simulations based on available published wet lab data, and (d) the platform identified potentially effective anti-SARS-CoV-2 combinations of known drugs for urgent clinical study. The data also suggest that DeepNEU can identify potential therapeutic targets for expedited vaccine development. Based on published data and the current DeepNEU results, we conclude that continued development of the DeepNEU platform will improve our preparedness for and response to future viral outbreaks. This can be achieved through rapid identification of potential therapeutic options for clinical testing as soon as the viral genome has been confirmed.

Esmail Sally, Danter Wayne R


DeepNEU, SARS-CoV-2, antiviral, drug discovery and repurposing, pandemic preparedness, unsupervised learning