General

Identification of Latent Risk Clinical Attributes for Children Born Under IUGR Condition Using Machine Learning Techniques.

In Computer methods and programs in biomedicine

BACKGROUND AND OBJECTIVE : Intrauterine Growth Restriction (IUGR) is a condition in which a fetus does not grow to the expected weight during pregnancy. Several causes are well documented in the literature, such as maternal disorders and genetic influences. Beyond the risks during pregnancy and labour, the long-term impact of IUGR on child development is a research area in its own right. The main objective of this work is to propose a machine learning solution that identifies the most important physiological, clinical, or socioeconomic features correlated with a previous IUGR condition 10 years after birth.

METHODS : In this work, 41 IUGR (18 male) and 34 Non-IUGR (22 male) children were followed up, on average, 9 years after birth (9.1786 ± 0.6784 years old). A group of machine learning algorithms is proposed to classify children previously identified as born under IUGR condition, based on 24-hour monitoring of ECG (Holter) and blood pressure (ABPM) and on other clinical and socioeconomic attributes. In addition, a relevance determination algorithm based on the classifier is proposed to determine the importance of the considered features.
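
The abstract does not specify the classifier or the relevance determination algorithm. As a rough illustration of classifier-based feature relevance in general, the sketch below uses permutation importance (a swapped-in, generic technique, not the authors' method) with a nearest-centroid classifier. The toy data, feature columns, and all function names are hypothetical.

```python
import random

def fit_centroids(X, y):
    """Return {label: mean feature vector} for a nearest-centroid classifier."""
    centroids = {}
    for lab in set(y):
        rows = [x for x, t in zip(X, y) if t == lab]
        centroids[lab] = [sum(r[j] for r in rows) / len(rows)
                          for j in range(len(rows[0]))]
    return centroids

def predict(centroids, x):
    """Assign x to the label whose centroid is closest (squared Euclidean)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))

def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(centroids, X, y, repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(centroids, X, y)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(repeats):
            col = [x[j] for x in X]
            rng.shuffle(col)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, col)]
            drops.append(base - accuracy(centroids, X_perm, y))
        scores.append(sum(drops) / repeats)
    return scores

# Hypothetical toy data: column 0 separates the groups, column 1 does not.
X = [[0.0, 0.2], [1.0, 0.8], [0.5, 0.5], [1.5, 0.1],
     [10.0, 0.3], [11.0, 0.9], [10.5, 0.4], [11.5, 0.6]]
y = ["IUGR"] * 4 + ["Non-IUGR"] * 4

model = fit_centroids(X, y)
importance = permutation_importance(model, X, y)
print(importance)  # column 0 should score higher than column 1
```

A feature whose shuffling leaves accuracy unchanged contributes nothing to the decision, while a large drop marks a feature the classifier depends on. With only 75 children, as in the study, such scores would need to be computed on held-out data or under cross-validation to be meaningful.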

RESULTS : The proposed classification solution achieved accuracy up to 94.73%, and better performance than seven state-of-the-art machine learning algorithms. Also, relevant latent factors related to HRV and BP monitoring are proposed, such as: day-time heart rate (day-time HR), day-night systolic blood pressure (day-night SBP), 24-hour standard deviation (SD) of SBP, dropped, morning cortisol creatinine, 24-hour mean of SDs of all NN intervals for each 5 minutes segment (24-hour SDNNi), among others.

CONCLUSION : Given the high accuracy of the proposed solution, the classification system and the indication of relevant attributes may support medical teams in the clinical monitoring of IUGR children during their childhood development.

Nguyen Van Sau, Lobo Marques J A, Biala T A, Li Ye


ABPM (Ambulatory Blood Pressure Monitoring), HRV (Heart Rate Variability), IUGR (Intrauterine Growth Restriction), Machine Learning

Public Health

Natural language processing and entrustable professional activity text feedback in surgery: A machine learning model of resident autonomy.

In American journal of surgery

BACKGROUND : Entrustable Professional Activities (EPAs) contain narrative 'entrustment roadmaps' designed to describe specific behaviors associated with different entrustment levels. However, these roadmaps were created using expert committee consensus, with little data available for guidance. Analysis of actual EPA assessment narrative comments using natural language processing may enhance our understanding of resident entrustment in actual practice.

METHODS : All text comments associated with EPA microassessments at a single institution were combined. EPA-entrustment level pairs (e.g. Gallbladder Disease-Level 1) were identified as documents. Latent Dirichlet Allocation (LDA), a common machine learning algorithm, was used to identify latent topics in the documents associated with a single EPA. These topics were then reviewed for interpretability by human raters.
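
The document-construction step described above can be sketched as follows: all comments that share an EPA and an entrustment level are merged into one document, which is then reduced to word counts, the usual input to LDA. The records, the tokenization, and the function names are hypothetical, and the LDA fit itself is not shown.

```python
from collections import Counter, defaultdict

# Hypothetical microassessment records: (EPA, entrustment level, comment).
records = [
    ("Gallbladder Disease", 1, "Needed step by step guidance"),
    ("Gallbladder Disease", 1, "Required guidance with port placement"),
    ("Gallbladder Disease", 4, "Performed the case independently"),
]

def build_documents(records):
    """Concatenate all comments sharing an (EPA, level) pair into one document."""
    grouped = defaultdict(list)
    for epa, level, comment in records:
        grouped[(epa, level)].append(comment)
    return {key: " ".join(comments) for key, comments in grouped.items()}

def bag_of_words(text):
    """Lowercase whitespace tokenization; a real pipeline would also
    strip punctuation and stop words before topic modelling."""
    return Counter(text.lower().split())

documents = build_documents(records)
counts = {key: bag_of_words(text) for key, text in documents.items()}

for key, bag in counts.items():
    print(key, bag.most_common(3))
```

Treating each EPA-level pair as a single document is what allows a topic to be compared against an entrustment level afterwards: a topic that loads almost entirely on one document corresponds to one level.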

RESULTS : Over 18 months, 1015 faculty EPA microassessments were collected from 64 faculty for 80 residents. LDA analysis identified topics that mapped 1:1 to EPA entrustment levels (Gammas >0.99). These LDA topics appeared to trend coherently with entrustment levels (words demonstrating high entrustment were consistently found in high entrustment topics, words demonstrating low entrustment were found in low entrustment topics).

CONCLUSIONS : LDA is capable of identifying topics relevant to progressive surgical entrustment and autonomy in EPA comments. These topics provide insight into key behaviors that drive different levels of resident autonomy and may allow for data-driven revision of EPA entrustment maps.

Stahl Christopher C, Jung Sarah A, Rosser Alexandra A, Kraut Aaron S, Schnapp Benjamin H, Westergaard Mary, Hamedani Azita G, Minter Rebecca M, Greenberg Jacob A


Assessment, Entrustable professional activities, Feedback, Natural language processing, Surgery education

Public Health

Trends and influencing factors of plasma folate levels in Chinese women at mid-pregnancy, late pregnancy, and lactation periods.

In The British journal of nutrition

Folate status for women during early pregnancy has been investigated, but data for women during mid-pregnancy, late pregnancy, or lactation are sparse or lacking. Between May and July 2014, we conducted a cross-sectional study in 1211 pregnant and lactating women from three representative regions in China. Approximately 135 women were enrolled in each stratum by physiologic period (mid-pregnancy, late pregnancy, or lactation) and region (south, central, or north). Plasma folate concentrations were measured by microbiological assay. The adjusted medians (interquartile ranges [IQR]) of folate concentration decreased from 28.8 (19.9 - 38.2) nmol/L in mid-pregnancy to 18.6 (13.2 - 26.4) nmol/L in late pregnancy, and to 17.0 (12.3 - 22.5) nmol/L in lactation (P for trend <0.001). Overall, lower folate concentrations were more likely to be observed in women residing in the northern region, with younger age, higher pre-pregnancy BMI, lower education or multiparity, and in lactating women who underwent a caesarean delivery or breastfed exclusively. In total, 380 (31.4%) women had a suboptimal folate status (folate concentration <13.5 nmol/L). Women who were in late pregnancy or lactation, resided in the northern region, were multiparous, or had a low education level had a higher risk of suboptimal folate status, while older women had a lower risk. In conclusion, maternal plasma folate concentrations decreased as pregnancy progressed and were influenced by geographic region and maternal sociodemographic characteristics. Future studies are warranted to assess the necessity of folic acid supplementation during later pregnancy and lactation, especially for women at a higher risk of folate depletion.

Zhou Yu-Bo, Si Ke-Yi, Li Hong-Tian, Li Xiu-Cui, Meng Ying, Liu Jian-Meng


China, influencing factor, lactation, late pregnancy, mid-pregnancy, plasma folate

Oncology

A machine learning-based clinical tool for diagnosing myopathy using multi-cohort microarray expression profiles.

In Journal of translational medicine

BACKGROUND : Myopathies are a heterogeneous collection of disorders characterized by dysfunction of skeletal muscle. In practice, myopathies are frequently encountered by physicians and precise diagnosis remains a challenge in primary care. Molecular expression profiles show promise for disease diagnosis in various pathologies. We propose a novel machine learning-based clinical tool for predicting muscle disease subtypes using multi-cohort microarray expression data.

MATERIALS AND METHODS : Muscle tissue samples originated from 1260 patients with muscle weakness. Data were curated from 42 independent cohorts with expression profiles in public microarray gene expression repositories, which represent a broad range of patient ages and peripheral muscles. Cohorts were categorized into five muscle disease subtypes: immobility, inflammatory myopathies, intensive care unit acquired weakness (ICUAW), congenital, and chronic systemic disease. The data contain expression values for 34,099 genes. Data augmentation techniques were used to address class imbalances in the muscle disease subtypes. Support vector machine (SVM) models were trained on two-thirds of the 1260 samples, using the top gene signature selected by analysis of variance (ANOVA). The model was validated in the remaining samples using the area under the receiver operating characteristic curve (AUC). Gene enrichment analysis was used to identify enriched biological functions in the gene signature.
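
To make the validation metric concrete, the sketch below computes a one-vs-rest AUC from classifier scores using the rank-based (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive sample receives the higher score. The class labels reuse subtype names from the abstract, but the scores and function names are hypothetical and the SVM itself is not reproduced.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative; tied pairs count as half (Mann-Whitney U formulation)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_labels(classes, target):
    """Binarize a multi-class label list: 1 for the target class, else 0."""
    return [1 if c == target else 0 for c in classes]

# Hypothetical held-out samples and decision scores for the ICUAW class.
classes = ["ICUAW", "immobility", "ICUAW", "congenital", "immobility"]
scores_icuaw = [0.9, 0.4, 0.35, 0.2, 0.1]

labels = one_vs_rest_labels(classes, "ICUAW")
icuaw_auc = auc(scores_icuaw, labels)
print(round(icuaw_auc, 3))  # 5 of 6 positive/negative pairs are ordered correctly
```

Because AUC depends only on how positives rank against negatives, it is not inflated by the majority class the way accuracy is. Oversampling can still change it indirectly by changing the fitted model, and synthetic samples in an evaluation set would change it directly.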

RESULTS : The AUC ranges from 0.611 to 0.649 in the observed imbalanced data. Overall, using the augmented data, chronic systemic disease was the best predicted class with AUC 0.872 (95% confidence interval (CI): 0.824-0.920). The least discriminated classes were ICUAW with AUC 0.777 (95% CI: 0.668-0.887) and immobility with AUC 0.789 (95% CI: 0.716-0.861). Disease-specific gene set enrichment results showed that the gene signature was enriched in biological processes including neural precursor cell proliferation for ICUAW and aerobic respiration for congenital (false discovery rate q-value < 0.001).

CONCLUSION : Our results present a well-performing molecular classification tool with the selected gene markers for muscle disease classification. In practice, this tool addresses an important gap in the literature on myopathies and presents a potentially useful clinical tool for muscle disease subtype diagnosis.

Tran Andrew, Walsh Chris J, Batt Jane, Dos Santos Claudia C, Hu Pingzhao


Biomarker, Clinical tool, Machine learning, Microarray, Muscle diseases

General

Fully-automated deep-learning segmentation of pediatric cardiovascular magnetic resonance of patients with complex congenital heart diseases.

In Journal of cardiovascular magnetic resonance : official journal of the Society for Cardiovascular Magnetic Resonance

BACKGROUND : For the growing patient population with congenital heart disease (CHD), improving clinical workflow, accuracy of diagnosis, and efficiency of analyses are considered unmet clinical needs. Cardiovascular magnetic resonance (CMR) imaging offers non-invasive and non-ionizing assessment of CHD patients. However, although CMR data facilitates reliable analysis of cardiac function and anatomy, clinical workflow mostly relies on manual analysis of CMR images, which is time consuming. Thus, an automated and accurate segmentation platform exclusively dedicated to pediatric CMR images can significantly improve the clinical workflow, as the present work aims to establish.

METHODS : Training artificial intelligence (AI) algorithms for CMR analysis requires large annotated datasets, which are not readily available for pediatric subjects, particularly CHD patients. To mitigate this issue, we devised a novel method that uses a generative adversarial network (GAN) to augment the training dataset by generating synthetic CMR images and their corresponding chamber segmentations. In addition, we trained and validated a deep fully convolutional network (FCN) on a dataset consisting of [Formula: see text] pediatric subjects with complex CHD, which we made publicly available. The Dice metric, Jaccard index, and Hausdorff distance, as well as clinically relevant volumetric indices, are reported to assess and compare our platform with other algorithms, including U-Net and cvi42, which is used in clinics.
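
For reference, the three overlap metrics named above can be computed on binary masks as in the minimal sketch below. The two small masks are hypothetical, and the Hausdorff distance here is taken over all foreground pixels in pixel units, whereas segmentation toolkits often compute it on contours and in millimetres.

```python
import math

def foreground(mask):
    """Set of (row, col) coordinates where the binary mask is non-zero."""
    return {(r, c) for r, row in enumerate(mask)
            for c, v in enumerate(row) if v}

def dice(a, b):
    """Dice = 2|A ∩ B| / (|A| + |B|); defined as 1.0 when both masks are empty."""
    A, B = foreground(a), foreground(b)
    if not A and not B:
        return 1.0
    return 2 * len(A & B) / (len(A) + len(B))

def jaccard(a, b):
    """Jaccard = |A ∩ B| / |A ∪ B|; defined as 1.0 when both masks are empty."""
    A, B = foreground(a), foreground(b)
    if not A and not B:
        return 1.0
    return len(A & B) / len(A | B)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between the two foreground point sets."""
    A, B = foreground(a), foreground(b)
    if not A or not B:
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    def directed(P, Q):
        return max(min(math.dist(p, q) for q in Q) for p in P)
    return max(directed(A, B), directed(B, A))

# Hypothetical manual and automated segmentations of one chamber.
manual = [[0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
auto = [[0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]

print(dice(manual, auto), jaccard(manual, auto), hausdorff(manual, auto))
```

Dice and Jaccard measure area overlap and are related by Dice = 2J / (1 + J), while the Hausdorff distance reports the worst-case boundary disagreement, so a segmentation can score well on overlap and still show a large Hausdorff distance because of a single outlying region.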

RESULTS : For the congenital CMR dataset, our FCN model yields an average Dice metric of [Formula: see text] and [Formula: see text] for LV at end-diastole and end-systole, respectively, and [Formula: see text] and [Formula: see text] for RV at end-diastole and end-systole, respectively. On the same dataset, cvi42 resulted in [Formula: see text], [Formula: see text], [Formula: see text] and [Formula: see text] for LV and RV at end-diastole and end-systole, and the U-Net architecture resulted in [Formula: see text], [Formula: see text], [Formula: see text] and [Formula: see text] for LV and RV at end-diastole and end-systole, respectively.

CONCLUSIONS : The chamber segmentation results from our fully-automated method showed strong agreement with manual segmentation, and no statistically significant difference was found by two independent statistical analyses, whereas the cvi42 and U-Net segmentation results failed to pass the t-test. Based on these outcomes, it can be inferred that, by taking advantage of GANs, our method is clinically relevant and can be used for pediatric and congenital CMR segmentation and analysis.

Karimi-Bidhendi Saeed, Arafati Arghavan, Cheng Andrew L, Wu Yilei, Kheradvar Arash, Jafarkhani Hamid


CMR image analysis, Complex CHD analysis, Deep learning, Fully convolutional networks, Generative adversarial networks, Machine learning

Pathology

Development and validation of a 25-Gene Panel urine test for prostate cancer diagnosis and potential treatment follow-up.

In BMC medicine ; h5-index 89.0

BACKGROUND : Heterogeneity of prostate cancer (PCa) contributes to inaccurate cancer screening and diagnosis, unnecessary biopsies, and overtreatment. We intended to develop non-invasive urine tests for accurate PCa diagnosis to avoid unnecessary biopsies.

METHODS : Using a machine learning program, we identified a 25-Gene Panel classifier for distinguishing PCa and benign prostate. A non-invasive test using pre-biopsy urine samples collected without digital rectal examination (DRE) was used to measure gene expression of the panel using cDNA preamplification followed by real-time qRT-PCR. The 25-Gene Panel urine test was validated in independent multi-center retrospective and prospective studies. The diagnostic performance of the test was assessed against the pathological diagnosis from biopsy by discriminant analysis. Uni- and multivariate logistic regression analysis was performed to assess its diagnostic improvement over PSA and risk factors. In addition, the 25-Gene Panel urine test was used to identify clinically significant PCa. Furthermore, the 25-Gene Panel urine test was assessed in a subset of patients to examine if cancer was detected after prostatectomy.

RESULTS : The 25-Gene Panel urine test accurately detected cancer and benign prostate with AUC of 0.946 (95% CI 0.963-0.929) in the retrospective cohort (n = 614), AUC of 0.901 (0.929-0.873) in the prospective cohort (n = 396), and AUC of 0.936 (0.956-0.916) in the large combination cohort (n = 1010). It greatly improved diagnostic accuracy over PSA and risk factors (p < 0.0001). When it was combined with PSA, the AUC increased to 0.961 (0.980-0.942). Importantly, the 25-Gene Panel urine test was able to accurately identify clinically significant and insignificant PCa with AUC of 0.928 (95% CI 0.947-0.909) in the combination cohort (n = 727). In addition, it was able to show the absence of cancer after prostatectomy with high accuracy.

CONCLUSIONS : The 25-Gene Panel urine test is the first highly accurate and non-invasive liquid biopsy method without DRE for PCa diagnosis. In clinical practice, it may be used for identifying patients in need of biopsy for cancer diagnosis and patients with clinically significant cancer for immediate treatment, and potentially assisting cancer treatment follow-up.

Johnson Heather, Guo Jinan, Zhang Xuhui, Zhang Heqiu, Simoulis Athanasios, Wu Alan H B, Xia Taolin, Li Fei, Tan Wanlong, Johnson Allan, Dizeyi Nishtman, Abrahamsson Per-Anders, Kenner Lukas, Feng Xiaoyan, Zou Chang, Xiao Kefeng, Persson Jenny L, Chen Lingwu


Clinically significant prostate cancer, Gene Panel, Prostate cancer, Prostate cancer diagnosis, Prostate cancer treatment follow-up, Urine test