General

Increasing the Density of Laboratory Measures for Machine Learning Applications.

In Journal of clinical medicine

BACKGROUND : The imputation of missingness is a key step in Electronic Health Records (EHR) mining, as it can significantly affect the conclusions derived from the downstream analysis in translational medicine. The missingness of laboratory values in EHR is not at random, yet imputation techniques tend to disregard this key distinction. Consequently, the development of an adaptive imputation strategy designed specifically for EHR is an important step in improving the data imbalance and enhancing the predictive power of modeling tools for healthcare applications.

METHOD : We analyzed the laboratory measures derived from Geisinger's EHR on patients in three distinct cohorts: patients tested for Clostridioides difficile (Cdiff) infection, patients with a diagnosis of inflammatory bowel disease (IBD), and patients with a diagnosis of hip or knee osteoarthritis (OA). We extracted laboratory measures coded with Logical Observation Identifiers Names and Codes (LOINC) and excluded those with 75% or more missingness. The comorbidities, primary or secondary diagnoses, and active problem lists were also extracted. The adaptive imputation strategy was designed based on a hybrid approach. The comorbidity patterns of patients were transformed into latent patterns and then clustered. Imputation was performed on a cluster of patients for each cohort independently to show the generalizability of the method. The results were compared with imputation applied to the complete dataset without incorporating the information from comorbidity patterns.

RESULTS : We analyzed a total of 67,445 patients (11,230 IBD patients, 10,000 OA patients, and 46,215 patients tested for C. difficile infection). We extracted 495 LOINC codes and 11,230 diagnosis codes for the IBD cohort, 8160 diagnosis codes for the Cdiff cohort, and 2042 diagnosis codes for the OA cohort based on the primary/secondary diagnosis and active problem list in the EHR. Overall, the greatest improvement from this strategy was observed when the laboratory measures had a higher level of missingness. The best root mean square error (RMSE) difference for each dataset was recorded as -35.5 for the Cdiff, -8.3 for the IBD, and -11.3 for the OA dataset.

CONCLUSIONS : An adaptive imputation strategy designed specifically for EHR that uses complementary information from the clinical profile of the patient can be used to improve the imputation of missing laboratory values, especially when laboratory codes with high levels of missingness are included in the analysis.
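
The comparison described in the abstract above (imputing within clusters of similar patients versus imputing over the whole dataset, then comparing RMSE) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy values, the cluster labels (taken as given rather than derived from comorbidity patterns), and the use of simple mean imputation, since the abstract does not name the underlying imputer.

```python
import math
from statistics import mean


def impute_mean(rows, groups=None):
    """Fill None entries with a column mean, taken globally or per group."""
    if groups is None:
        groups = [0] * len(rows)  # a single group: plain global imputation
    n_cols = len(rows[0])
    # Global column means serve as a fallback when a group has no observed
    # value for a column (assumes each column has at least one observation).
    global_means = [
        mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)
    ]
    filled = [list(r) for r in rows]
    for g in set(groups):
        members = [i for i, gi in enumerate(groups) if gi == g]
        for j in range(n_cols):
            observed = [rows[i][j] for i in members if rows[i][j] is not None]
            fill = mean(observed) if observed else global_means[j]
            for i in members:
                if filled[i][j] is None:
                    filled[i][j] = fill
    return filled


def rmse(truth, filled, masked):
    """Root mean square error over the cells that were hidden."""
    return math.sqrt(mean((truth[i][j] - filled[i][j]) ** 2 for i, j in masked))


# Hypothetical lab values for six patients in two clusters (toy data).
truth = [[1.0, 10.0], [1.2, 11.0], [0.8, 9.0],
         [5.0, 30.0], [5.2, 31.0], [4.8, 29.0]]
clusters = [0, 0, 0, 1, 1, 1]   # assumed to come from an earlier clustering step
masked = [(0, 1), (4, 0)]        # cells hidden to simulate missingness

observed = [list(r) for r in truth]
for i, j in masked:
    observed[i][j] = None

rmse_global = rmse(truth, impute_mean(observed), masked)
rmse_cluster = rmse(truth, impute_mean(observed, clusters), masked)
rmse_difference = rmse_cluster - rmse_global  # negative means clustering helped
```

In this toy setup a negative `rmse_difference` plays the same role as the negative RMSE differences reported in the results; the magnitudes are not comparable to the study's figures.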

Abedi Vida, Li Jiang, Shivakumar Manu K, Avula Venkatesh, Chaudhary Durgesh P, Shellenberger Matthew J, Khara Harshit S, Zhang Yanfei, Lee Ming Ta Michael, Wolk Donna M, Yeasin Mohammed, Hontecillas Raquel, Bassaganya-Riera Josep, Zand Ramin


C. difficile infection, EHR, complex diseases, electronic health records, imputation, inflammatory bowel disease, laboratory measures, machine learning, medical informatics, osteoarthritis

Public Health

A Sentiment Analysis Approach to Predict an Individual's Awareness of the Precautionary Procedures to Prevent COVID-19 Outbreaks in Saudi Arabia.

In International journal of environmental research and public health ; h5-index 73.0

In March 2020, the World Health Organization (WHO) declared the outbreak of Coronavirus disease 2019 (COVID-19) a pandemic, which affected all countries worldwide. During the outbreak, public sentiment analyses contributed valuable information toward making appropriate public health responses. This study aims to develop a model that predicts an individual's awareness of the precautionary procedures in five main regions in Saudi Arabia. In this study, a dataset of Arabic COVID-19-related tweets posted during the curfew period was collected. The dataset was processed and analyzed with several machine learning predictive models: Support Vector Machine (SVM), K-nearest neighbors (KNN), and Naïve Bayes (NB), along with the N-gram feature extraction technique. The results show that applying the SVM classifier along with bigram Term Frequency-Inverse Document Frequency (TF-IDF) features outperformed the other models, with an accuracy of 85%. The results of awareness prediction showed that the south region had the highest level of awareness of COVID-19 containment measures, whereas the middle region had the lowest. The proposed model can support the medical sectors and decision-makers in deciding on the appropriate procedures for each region based on their attitudes towards the pandemic.
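
As an illustration of the bigram TF-IDF features mentioned in the abstract above, the following minimal sketch computes them in plain Python. The English example sentences, the whitespace tokenizer, and the unsmoothed `tf * log(N / df)` weighting are all assumptions for illustration; the abstract does not specify the weighting variant, and library implementations usually add smoothing and normalization. The classifier itself is not shown.

```python
import math
from collections import Counter


def bigrams(text):
    """Split on whitespace and join each pair of adjacent tokens."""
    tokens = text.lower().split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def tfidf_bigram(docs):
    """Return one {bigram: weight} dict per document, weight = tf * log(N / df)."""
    counts = [Counter(bigrams(d)) for d in docs]
    # Document frequency: in how many documents each bigram appears.
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append(
            {t: (k / total) * math.log(n_docs / df[t]) for t, k in c.items()}
        )
    return vectors


# Hypothetical stand-ins for preprocessed tweets.
docs = [
    "stay home stay safe",
    "wear a mask outside",
    "stay home and wear a mask",
]
vectors = tfidf_bigram(docs)
```

A bigram that occurs in only one document, such as "stay safe", receives a higher weight than one shared across documents, such as "stay home"; vectors of this kind would then be passed to a classifier such as an SVM.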

Aljameel Sumayh S, Alabbad Dina A, Alzahrani Norah A, Alqarni Shouq M, Alamoudi Fatimah A, Babili Lana M, Aljaafary Somiah K, Alshamrani Fatima M


Arabic sentiment analysis, K-nearest neighbor, N-gram, Twitter, machine learning, natural language processing, naïve bayes, support vector machine

General

Designing a Randomized Trial with an Age Simulation Suit: Representing People with Health Impairments.

In Healthcare (Basel, Switzerland)

Due to demographic change, there is an increasing demand for professional care services that cannot be met by the available caregivers. To enable adequate care by relieving informal and formal caregivers, the independence of people with chronic diseases has to be preserved for as long as possible. Assistance approaches that promote physical activity, a main predictor of independence, can be used for this purpose. One challenge is to design and test such approaches without affecting the people in focus. In this paper, we propose a design for a randomized trial in which young and healthy participants wear an age simulation suit to generate reference data representing people with health impairments. To this end, we focus on situations of increased physical activity.

Timm Ingo J, Spaderna Heike, Rodermund Stephanie C, Lohr Christian, Buettner Ricardo, Berndt Jan Ole


experimental design, fitness tracker, physical activity, questionnaires

General

Probabilistic Predictions with Federated Learning.

In Entropy (Basel, Switzerland)

Probabilistic predictions with machine learning are important in many applications. These are commonly made with Bayesian learning algorithms. However, Bayesian learning methods are computationally expensive in comparison with non-Bayesian methods. Furthermore, the data used to train these algorithms are often distributed over a large group of end devices. Federated learning can be applied in this setting in a communication-efficient and privacy-preserving manner but does not include predictive uncertainty. To represent predictive uncertainty in federated learning, we propose introducing uncertainty in the aggregation step of the algorithm by treating the set of local weights as a posterior distribution for the weights of the global model. We compare our approach to state-of-the-art Bayesian and non-Bayesian probabilistic learning algorithms. By applying proper scoring rules to evaluate the predictive distributions, we show that our approach can achieve performance similar to what the benchmark would achieve in a non-distributed setting.
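
The aggregation idea in the abstract above, treating the set of local weights as a distribution over the global weights, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes a two-parameter linear model, equal weighting of clients, and an independent Gaussian fitted to each parameter. The client weights are made up.

```python
import random
from statistics import mean, stdev


def aggregate(local_weights):
    """Per-parameter mean and standard deviation across clients."""
    columns = list(zip(*local_weights))
    return [mean(c) for c in columns], [stdev(c) for c in columns]


def predict(x, mu, sigma, n_samples=2000, seed=0):
    """Sample global weights and return the predictive mean and spread at x."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_samples):
        # Draw one (slope, intercept) pair from the fitted Gaussians.
        w, b = (rng.gauss(m, s) for m, s in zip(mu, sigma))
        preds.append(w * x + b)
    return mean(preds), stdev(preds)


# Hypothetical (slope, intercept) pairs returned by five clients.
local_weights = [(1.9, 0.4), (2.1, 0.6), (2.0, 0.5), (2.2, 0.5), (1.8, 0.5)]
mu, sigma = aggregate(local_weights)

mean_at_0, spread_at_0 = predict(0.0, mu, sigma)
mean_at_3, spread_at_3 = predict(3.0, mu, sigma)
```

Plain federated averaging would keep only `mu`; keeping `sigma` as well lets the global model return a predictive distribution. In this sketch the spread grows with `x` because disagreement between clients about the slope matters more for larger inputs.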

Thorgeirsson Adam Thor, Gauterin Frank


Bayesian deep learning, federated learning, predictive uncertainty, probabilistic machine learning

Radiology

Development and Validation of an Automated Radiomic CT Signature for Detecting COVID-19.

In Diagnostics (Basel, Switzerland)

The coronavirus disease 2019 (COVID-19) outbreak has reached pandemic status. Drastic measures of social distancing are enforced in society, and healthcare systems are being pushed to and beyond their limits. To help in the fight against this threat to human health, a fully automated AI framework was developed to extract radiomics features from volumetric chest computed tomography (CT) exams. The detection model was developed on a dataset of 1381 patients (181 COVID-19 patients plus 1200 non-COVID control patients). A second, independent dataset of 197 RT-PCR confirmed COVID-19 patients and 500 control patients was used to assess the performance of the model. Diagnostic performance was assessed by the area under the receiver operating characteristic curve (AUC). The model had an AUC of 0.882 (95% CI: 0.851-0.913) in the independent test dataset (641 patients). The optimal decision threshold, considering the cost of false negatives twice as high as the cost of false positives, resulted in an accuracy of 85.18%, a sensitivity of 69.52%, a specificity of 91.63%, a negative predictive value (NPV) of 94.46% and a positive predictive value (PPV) of 59.44%. Benchmarked against RT-PCR confirmed cases of COVID-19, our AI framework can accurately differentiate COVID-19 from routine clinical conditions in a fully automated fashion. It can thus provide rapid, accurate diagnosis in patients suspected of COVID-19 infection, facilitating the timely implementation of isolation procedures and early intervention.
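
The metrics reported in the abstract above all derive from a confusion matrix at a chosen decision threshold, and the threshold choice weights false negatives twice as heavily as false positives. The sketch below shows one way to do both. The scores and labels are hypothetical, and the exhaustive search over observed scores is an assumption, since the abstract does not describe how the threshold was optimized.

```python
def confusion(scores, labels, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    return tp, fp, tn, fn


def best_threshold(scores, labels, fn_cost=2.0, fp_cost=1.0):
    """Pick the observed score that minimizes fn_cost * FN + fp_cost * FP."""
    best = None
    for t in sorted(set(scores)):
        _, fp, _, fn = confusion(scores, labels, t)
        cost = fn_cost * fn + fp_cost * fp
        if best is None or cost < best[0]:
            best = (cost, t)
    return best[1]


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard definitions of the five metrics quoted in the abstract."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical model scores and ground-truth labels (1 = COVID-19, 0 = control).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

threshold = best_threshold(scores, labels)  # false negatives cost twice as much
metrics = diagnostic_metrics(*confusion(scores, labels, threshold))
```

Raising `fn_cost` pushes the selected threshold lower, trading specificity and PPV for sensitivity and NPV.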

Guiot Julien, Vaidyanathan Akshayaa, Deprez Louis, Zerka Fadila, Danthine Denis, Frix Anne-Noëlle, Thys Marie, Henket Monique, Canivet Gregory, Mathieu Stephane, Eftaxia Evanthia, Lambin Philippe, Tsoutzidis Nathan, Miraglio Benjamin, Walsh Sean, Moutschen Michel, Louis Renaud, Meunier Paul, Vos Wim, Leijenaar Ralph T H, Lovinfosse Pierre


COVID-19, artificial intelligence, computed tomography, machine learning, radiomics

Radiology

Factors associated with worsening oxygenation in patients with non-severe COVID-19 pneumonia.

In Tuberculosis and respiratory diseases

Background : This study aimed to determine parameters for worsening oxygenation in non-severe COVID-19 pneumonia.

Methods : This retrospective cohort study included patients with confirmed COVID-19 pneumonia in a public hospital in South Korea. The worsening oxygenation group was defined as those who had SpO2 ≤ 94% or who received oxygen or mechanical ventilation (MV) throughout the clinical course, versus the non-worsening group, who had no respiratory event. Parameters were compared, and the extent of viral pneumonia on the initial chest CT was calculated using artificial intelligence (AI) and measured visually by a radiologist.

Results : We included 136 patients, with 32 (23.5%) in the worsening oxygenation group, of whom two needed MV and one died. Initial vital signs and duration of symptoms showed no difference between the two groups; however, univariate logistic regression analysis revealed that a variety of parameters at admission were associated with an increased risk of a desaturation event. To reduce potential bias, a subset of patients was studied in a sex-, age-, and comorbid illness-matched case-control analysis using propensity scores (n=52). In this subset, ferritin ≥ 280 μg/L (p=0.029), LDH ≥ 240 U/L (p=0.029), pneumonia volume (p=0.021) and extent (p=0.030) by AI, and visual severity scores (p=0.042) were the predictive parameters for worsening oxygenation.

Conclusion : Our study shows that initial CT evaluated by AI or visual severity scoring, as well as serum markers of inflammation at admission, are significantly associated with worsening oxygenation in this COVID-19 pneumonia cohort.

Hahm Cho Rom, Lee Young Kyung, Oh Dong Hyun, Ahn Mi Young, Choi Jae-Phil, Kang Na Ree, Oh Jungkyun, Choi Hanzo, Kim Suhyun


COVID-19, Computed tomography, Oxygenation, Pneumonia, artificial intelligence