
General

DeepMiceTL: a deep transfer learning based prediction of mice cardiac conduction diseases using early electrocardiograms.

In Briefings in bioinformatics

Cardiac conduction disease is a major cause of morbidity and mortality worldwide. Early detection of these diseases has considerable clinical significance, since preventive treatment is most successful before more severe arrhythmias occur. However, developing such early screening tools is challenging due to the lack of early electrocardiograms (ECGs) recorded before symptoms occur in patients. Mouse models are widely used in cardiac arrhythmia research. The goal of this paper is to develop deep learning models to predict cardiac conduction diseases in mice using their early ECGs. We hypothesize that mutant mice exhibit subtle abnormalities in their early ECGs before severe arrhythmias appear. These subtle patterns are hard to identify by eye but can be detected by deep learning. We propose a deep transfer learning model, DeepMiceTL, which leverages knowledge from human ECGs to learn mouse ECG patterns. We further apply Bayesian optimization and k-fold cross-validation to tune the hyperparameters of DeepMiceTL. Our results show that DeepMiceTL achieves promising performance (F1-score: 83.8%, accuracy: 84.8%) in predicting the occurrence of cardiac conduction diseases from early mouse ECGs. This study is among the first to use state-of-the-art deep transfer learning to identify ECG patterns during the early course of cardiac conduction disease in mice. Our approach could not only aid cardiac conduction disease research in mice, but also suggests the feasibility of early clinical diagnosis of human cardiac conduction diseases and other types of cardiac arrhythmias using deep transfer learning in the future.
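
As a rough illustration of the cross-species transfer pattern the abstract describes (pretraining on plentiful human ECGs, then fine-tuning on scarce mouse ECGs), here is a minimal PyTorch sketch. The architecture, tensor shapes, and training settings are illustrative assumptions, not the published DeepMiceTL design.

```python
# Illustrative sketch of cross-species transfer learning for ECG classification:
# pretrain a small 1-D CNN on human ECGs, then fine-tune it on mouse ECGs.
# Layer sizes and segment lengths are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class ECGNet(nn.Module):
    def __init__(self, n_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.head(self.features(x).squeeze(-1))

# 1) "Pretrain" on human ECGs (random stand-in data; the real step would use a
#    large labeled human ECG corpus).
human = ECGNet(n_classes=5)
opt_h = torch.optim.Adam(human.parameters(), lr=1e-3)
x_h = torch.randn(8, 1, 1000)              # 8 single-lead human ECG segments
loss = nn.CrossEntropyLoss()(human(x_h), torch.randint(0, 5, (8,)))
loss.backward()
opt_h.step()

# 2) Transfer: copy the convolutional features into a new model for the binary
#    mouse task, freeze them, and fine-tune only the new classification head.
mouse = ECGNet(n_classes=2)
mouse.features.load_state_dict(human.features.state_dict())
for p in mouse.features.parameters():
    p.requires_grad = False

opt_m = torch.optim.Adam(mouse.head.parameters(), lr=1e-3)
x_m = torch.randn(8, 1, 1000)              # mouse ECG segments, same assumed length
opt_m.zero_grad()
nn.CrossEntropyLoss()(mouse(x_m), torch.randint(0, 2, (8,))).backward()
opt_m.step()
```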

Liao Ying, Xiang Yisha, Zheng Mingjie, Wang Jun

2023-Mar-18

Bayesian optimization, cardiac conduction disease, deep transfer learning, electrocardiogram (ECG), mouse model

Oncology

Overinterpretation of findings in machine learning prediction model studies in oncology: a systematic review.

In Journal of clinical epidemiology ; h5-index 60.0

BACKGROUND : In biomedical research, spin is the overinterpretation of findings, and it is a growing concern. To date, the presence of spin has not been evaluated in prognostic model research in oncology, including studies developing and validating models for individualised risk prediction.

STUDY DESIGN AND SETTING : We conducted a systematic review, searching MEDLINE and EMBASE for oncology-related studies that developed and validated a prognostic model using machine learning, published between 01/01/2019 and 05/09/2019. We used existing spin frameworks and described areas of highly suggestive spin practices.

RESULTS : We included 62 publications (including 152 developed models; 37 validated models). Reporting was inconsistent between the methods and results sections in 27% of studies, due to additional analyses and selective reporting. Thirty-two studies (out of 36 applicable studies) reported comparisons between developed models in their discussion, and predominantly used discrimination measures to support their claims (78%). Thirty-five studies (56%) used an overly strong or leading word in their title, abstract, results, discussion or conclusion.

CONCLUSION : The potential for spin needs to be considered when reading, interpreting, and using studies that developed and validated prognostic models in oncology. Researchers should carefully report their prognostic model research using words that reflect their actual results and strength of evidence.

Dhiman Paula, Ma Jie, Andaur Navarro Constanza L, Speich Benjamin, Bullock Garrett, Damen Johanna Aa, Hooft Lotty, Kirtley Shona, Riley Richard D, Van Calster Ben, Moons Karel Gm, Collins Gary S

2023-Mar-17

machine learning, prediction model, spin

General

Utilizing Shared Frailty with the Cox Proportional Hazards Regression: Post Discharge Survival Analysis of CHF Patients.

In Journal of biomedical informatics ; h5-index 55.0

Understanding patients' survival probability, as well as the factors affecting it, is a significant concern for researchers and practitioners, particularly for patients with severe chronic illnesses such as congestive heart failure (CHF). CHF is a clinical syndrome characterized by comorbidities and adverse medical events. Risk stratification to identify the patients most likely to die shortly after hospital discharge can improve the quality of care by better allocating organizational resources and personalized interventions. Probability assessment improves clinical decision-making, contributes to personalized care, and saves costs. One of the most informative indices is the time to an adverse event for each patient, commonly analyzed using survival analysis methods; these methods, however, are often challenging to implement due to the complexity of medical data. Numerous studies have used the Cox proportional hazards (PH) regression method to generate the survival distribution pattern and the factors affecting survival. This model, although advantageous for survival analysis, assumes homogeneity of the hazard ratio across patients and independence of the observations' survival times. These assumptions are often violated in real-world data, especially when the dataset comprises readmission data for chronically ill patients, since these recurring observations are inherently dependent. This study ran the Cox PH regression on a feature set selected by machine learning algorithms from a rich hospital dataset. The event modeled was patient mortality within 90 days post-hospital discharge. The sample comprised medical records of patients hospitalized more than once in the Sheba Medical Center in Israel, with CHF as the primary diagnosis. We modeled the survival of CHF patients using the Cox PH regression with and without the shared frailty correction, which addresses the shortcomings of the Cox model, and compared the results of the two models. The results demonstrate that the shared frailty correction, which was statistically significant in our analysis, improved the performance of the basic Cox PH model. While this is the main contribution, we also show that this model outperforms two commonly used measures (ADHERE and EFFECT) for predicting early mortality of CHF patients. The results thus illustrate how applying advanced analytics can outperform traditional methods. An additional contribution is the feature set selected using machine learning methods, which differs from those used in the extant literature.
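
For readers who want to experiment with the idea, here is a minimal sketch in Python using lifelines. Note that lifelines does not implement a shared frailty term; the sketch fits a standard Cox PH model and then uses clustered robust standard errors (cluster_col) as a stand-in for handling repeated admissions per patient, whereas the paper's shared frailty correction corresponds to a random effect per patient (e.g., R's coxph with a frailty(patient_id) term). All column names and values below are hypothetical.

```python
# Hedged sketch: a Cox PH model on recurrent-admission data, with and without
# an adjustment for within-patient dependence. lifelines has no shared-frailty
# term, so clustered (sandwich) standard errors are used as an approximation.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 3, 3, 4, 4],            # repeated admissions
    "time_to_event": [30, 85, 12, 60, 90, 90, 45, 90],  # days, capped at 90
    "death_90d": [0, 1, 1, 0, 0, 0, 1, 0],
    "age": [70, 70, 82, 82, 65, 65, 77, 77],
    "ejection_fraction": [35, 35, 25, 25, 45, 45, 30, 30],
})

# Baseline Cox PH model (assumes independent observations).
cph = CoxPHFitter()
cph.fit(df.drop(columns="patient_id"),
        duration_col="time_to_event", event_col="death_90d")

# Same model with standard errors clustered by patient, a common approximation
# when repeated admissions from the same patient are dependent.
cph_cluster = CoxPHFitter()
cph_cluster.fit(df, duration_col="time_to_event", event_col="death_90d",
                cluster_col="patient_id")
cph_cluster.print_summary()
```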

Ben-Assuli Ofir, Ramon-Gonen Roni, Heart Tsipi, Jacobi Arie, Klempfner Robert

2023-Mar-17

Cox proportional hazards regression, congestive heart failure, shared frailty, statistical methods, survival analysis

Internal Medicine

Representing and Utilizing Clinical Textual Data for Real World Studies: An OHDSI Approach.

In Journal of biomedical informatics ; h5-index 55.0

Clinical documentation in electronic health records contains crucial narratives and details about patients and their care. Natural language processing (NLP) can unlock the information conveyed in clinical notes and reports, and thus plays a critical role in real-world studies. The NLP Working Group at the Observational Health Data Sciences and Informatics (OHDSI) consortium was established to develop methods and tools to promote the use of textual data and NLP in real-world observational studies. In this paper, we describe a framework for representing and utilizing textual data in real-world evidence generation, including representations of information from clinical text in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), the workflow and tools that were developed to extract, transform and load (ETL) data from clinical notes into tables in OMOP CDM, as well as current applications and specific use cases of the proposed OHDSI NLP solution at large consortia and individual institutions with English textual data. Challenges faced and lessons learned during the process are also discussed to provide valuable insights for researchers who are planning to implement NLP solutions in real-world studies.
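
In the OMOP CDM, NLP output is stored in the NOTE_NLP table, linked to the source NOTE row. The following is a hedged Python sketch of that ETL step: run an extractor over notes and load the structured mentions into NOTE_NLP. The extractor is a toy placeholder, only a representative subset of NOTE_NLP columns is shown, and the concept IDs are illustrative.

```python
# Hedged ETL sketch: extract concept mentions from clinical notes and load them
# into the OMOP CDM NOTE_NLP table. `run_nlp` stands in for whatever NLP system
# is used (cTAKES, MedSpaCy, etc.); a subset of NOTE_NLP columns is shown.
import sqlite3
from datetime import date

def run_nlp(note_text: str) -> list[dict]:
    """Toy placeholder extractor: one mention per hard-coded keyword."""
    mentions = []
    for term, concept_id in [("heart failure", 316139), ("diabetes", 201820)]:
        offset = note_text.lower().find(term)
        if offset >= 0:
            mentions.append({"lexical_variant": term,
                             "note_nlp_concept_id": concept_id,
                             "offset": offset})
    return mentions

conn = sqlite3.connect(":memory:")  # stand-in for the CDM database
conn.execute("CREATE TABLE note (note_id INTEGER, note_text TEXT)")
conn.execute("""CREATE TABLE note_nlp (
    note_nlp_id INTEGER PRIMARY KEY, note_id INTEGER, "offset" TEXT,
    lexical_variant TEXT, note_nlp_concept_id INTEGER,
    nlp_system TEXT, nlp_date TEXT, term_exists TEXT)""")
conn.execute("INSERT INTO note VALUES (1, 'History of heart failure and diabetes.')")

# ETL: read notes, run the extractor, load structured mentions into NOTE_NLP.
for note_id, text in conn.execute("SELECT note_id, note_text FROM note"):
    for m in run_nlp(text):
        conn.execute(
            """INSERT INTO note_nlp (note_id, "offset", lexical_variant,
               note_nlp_concept_id, nlp_system, nlp_date, term_exists)
               VALUES (?, ?, ?, ?, ?, ?, ?)""",
            (note_id, str(m["offset"]), m["lexical_variant"],
             m["note_nlp_concept_id"], "toy-extractor",
             date.today().isoformat(), "Y"))
conn.commit()
```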

Keloth Vipina K, Banda Juan M, Gurley Michael, Heider Paul M, Kennedy Georgina, Liu Hongfang, Liu Feifan, Miller Timothy, Natarajan Karthik, V Patterson Olga, Peng Yifan, Raja Kalpana, Reeves Ruth M, Rouhizadeh Masoud, Shi Jianlin, Wang Xiaoyan, Wang Yanshan, Wei Wei-Qi, Williams Andrew E, Zhang Rui, Belenkaya Rimma, Reich Christian, Blacketer Clair, Ryan Patrick, Hripcsak George, Elhadad NoƩmie, Xu Hua

2023-Mar-17

Electronic health records, Natural language processing, Real-world study

General

A deep learning-based novel approach to generate continuous daily stream nitrate concentration for nitrate data-sparse watersheds.

In The Science of the total environment

High-frequency stream nitrate concentration provides critical insights into nutrient dynamics and can help to improve the effectiveness of management decisions to maintain a sustainable ecosystem. However, nitrate monitoring is conventionally conducted through lab analysis of in situ water samples and is typically at coarse temporal resolution. In the last decade, many agencies have started collecting high-frequency (5-60 min interval) nitrate data using optical sensors. The hypothesis of the study is that data-driven models can learn the trend and temporal variability in nitrate concentration from high-frequency sensor-based nitrate data in the region and generate continuous nitrate data for unavailable data periods and data-limited locations. A Long Short-Term Memory (LSTM) model-based framework was developed to estimate continuous daily stream nitrate for dozens of gauge locations in Iowa, USA. The promising results supported the hypothesis: the LSTM model demonstrated a median test-period Nash-Sutcliffe efficiency (NSE) of 0.75 and RMSE of 1.53 mg/L for estimating continuous daily nitrate concentration at 42 sites, an unprecedented performance level. Twenty-one sites (50% of all sites) and thirty-four sites (76% of all sites) demonstrated NSE > 0.75 and NSE > 0.50, respectively. The average nitrate concentration of neighboring sites was identified as a crucial determinant of continuous daily nitrate concentration. Seasonal model performance evaluation showed that the model performed effectively in the summer and fall seasons. About 26 sites showed correlations > 0.60 between estimated nitrate concentration and discharge. The concentration-discharge (c-Q) relationship analysis showed that the study watersheds had four dominant nitrate transport patterns from landscapes to streams with increasing discharge, with the flushing pattern being the most dominant. Stream nitrate estimation is often impeded by data inadequacy; the modeling framework can be used to generate temporally continuous nitrate data for data-limited regions with a nearby sensor-based nitrate gauge. Watershed planners and policymakers could utilize the continuous nitrate data to gain more information on the regional nitrate status and design conservation practices accordingly.
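
A minimal sketch of the kind of LSTM regressor described here, mapping a window of daily driver series (for example, discharge and the mean nitrate of neighboring sensor-equipped sites) to daily nitrate concentration, together with the Nash-Sutcliffe efficiency used for evaluation. Feature choices, window length, and layer sizes are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch, not the paper's configuration: an LSTM that maps a window of
# daily driver series to the nitrate concentration on the window's last day.
import torch
import torch.nn as nn

class NitrateLSTM(nn.Module):
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, days, n_features)
        h, _ = self.lstm(x)
        return self.out(h[:, -1, :])       # predict the window's last day

def nse(sim: torch.Tensor, obs: torch.Tensor) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / total variance of the observations."""
    return float(1 - ((obs - sim) ** 2).sum() / ((obs - obs.mean()) ** 2).sum())

model = NitrateLSTM(n_features=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch: 16 windows of 30 days x [discharge, neighboring-site mean nitrate].
x = torch.randn(16, 30, 2)
y = torch.randn(16, 1)                     # observed nitrate (standardized)

opt.zero_grad()
loss = nn.MSELoss()(model(x), y)
loss.backward()
opt.step()
print(f"training loss {loss.item():.3f}, NSE {nse(model(x).detach(), y):.3f}")
```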

Saha Gourab, Rahmani Farshid, Shen Chaopeng, Li Li, Raj Cibin

2023-Mar-17

C-Q relationship, Deep learning, LSTM, Machine learning, Stream nitrate modeling, Water quality

General

Assessment of removal rate coefficient in vertical flow constructed wetland employing machine learning for low organic loaded systems.

In Bioresource technology

Secondary datasets of 42 low organic loading vertical flow constructed wetlands (LOLVFCWs) were assessed to optimize their area requirements for nitrogen (N) and phosphorus (P) removal. Significant variation in removal rate coefficients (k20) (0.002-0.464 m d⁻¹) indicated scope for optimization. Data classification based on nitrogen loading rate, temperature and depth reduced the relative standard deviations of the k20 values only in some cases. As an alternative method of deriving k20 values, the effluent concentrations of the targeted pollutants were predicted using two machine learning approaches, multiple linear regression (MLR) and support vector regression (SVR). The latter performed better (R² = 0.87-0.9; RMSE = 0.08-3.64), as validated using primary data from a lab-scale VFCW. The generated model equations for predicting effluent parameters and computing corresponding k20 values can assist in customized designs that attain the desired nutrient removal standards with minimal surface area.
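
The two steps the abstract combines can be sketched as follows: predict effluent concentration with SVR, then back-calculate a first-order areal removal rate coefficient and normalize it to 20 °C. The first-order relation k = q · ln(Cin/Cout), with q the hydraulic loading rate in m/d, and the Arrhenius factor θ = 1.06 are common constructed-wetland conventions assumed here, not values taken from the paper; all data below are synthetic.

```python
# Hedged sketch: SVR predicts effluent concentration, then a first-order areal
# rate coefficient k is back-calculated and normalized to 20 °C. Conventions
# (k = q * ln(Cin/Cout), theta = 1.06) are assumptions, not the paper's values.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Toy training data: columns = [influent conc. (mg/L), hydraulic loading rate
# q (m/d), temperature (°C)]; target = effluent concentration (mg/L), generated
# from the first-order model C_out = C_in * exp(-k/q) plus noise.
c_in = rng.uniform(20, 80, 60)
q = rng.uniform(0.05, 0.4, 60)
temp = rng.uniform(5, 30, 60)
k_true = 0.05 * 1.06 ** (temp - 20)           # m/d, temperature-dependent
c_out = np.clip(c_in * np.exp(-k_true / q) + rng.normal(0, 1, 60), 0.1, None)

X = np.column_stack([c_in, q, temp])
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
svr.fit(X, c_out)

# Back-calculate the areal rate coefficient from a predicted effluent value
# and normalize it to 20 °C with the Arrhenius correction.
c_in_new, q_new, t_new = 50.0, 0.2, 12.0
c_pred = float(svr.predict([[c_in_new, q_new, t_new]])[0])
k_T = q_new * np.log(c_in_new / c_pred)       # first-order: k = q * ln(Cin/Cout)
k_20 = k_T / 1.06 ** (t_new - 20)             # normalize to 20 °C
print(f"predicted effluent = {c_pred:.1f} mg/L, k20 = {k_20:.3f} m/d")
```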

Soti Abhishek, Singh Saurabh, Verma Vishesh, Mohan Kulshreshtha Niha, Brighu Urmila, Kalbar Pradip, Bhushan Gupta Akhilendra

2023-Mar-17

Kickuth kinetic approach, low organic loading vertical flow constructed wetland, remediation efficiency, removal rate coefficient