Public Health

Machine Learning-Based DNA Methylation Score for Fetal Exposure to Maternal Smoking: Development and Validation in Samples Collected from Adolescents and Adults.

In Environmental health perspectives ; h5-index 89.0

BACKGROUND : Fetal exposure to maternal smoking during pregnancy is associated with the development of noncommunicable diseases in the offspring. Maternal smoking may induce such long-term effects through persistent changes in the DNA methylome, which therefore hold the potential to be used as a biomarker of this early life exposure. With declining costs for measuring DNA methylation, we aimed to develop a DNA methylation score that can be used on adolescent DNA methylation data and thereby generate a score for in utero cigarette smoke exposure.

METHODS : We used machine learning methods to create a score reflecting exposure to maternal smoking during pregnancy. This score is based on peripheral blood measurements of DNA methylation (Illumina's Infinium HumanMethylation450K BeadChip). The score was developed and tested in the Raine Study with data from 995 white 17-y-old participants using 10-fold cross-validation. The score was further tested and validated in independent data from the Northern Finland Birth Cohort 1986 (NFBC1986) (16-y-olds) and 1966 (NFBC1966) (31-y-olds). Further, three previously proposed DNA methylation scores were applied for comparison. The final score was developed with 204 CpGs using elastic net regression.

RESULTS : Sensitivity and specificity values for the best performing previously developed classifier ("Reese Score") were 88% and 72% for Raine, 87% and 61% for NFBC1986 and 72% and 70% for NFBC1966, respectively; corresponding figures using the elastic net regression approach were 91% and 76% (Raine), 87% and 75% (NFBC1986), and 72% and 78% for NFBC1966.
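The sensitivity and specificity figures above are proportions taken from a confusion matrix. As a minimal sketch of how they are computed (the function name and the toy labels are illustrative assumptions, not the study's code or data):

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Labels are assumed to be coded 1 = exposed, 0 = unexposed.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # exposed, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # exposed, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # unexposed, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # unexposed, flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels invented for illustration only (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity is the share of truly exposed participants that the score flags; specificity is the share of unexposed participants that it correctly clears.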

CONCLUSION : We have developed a DNA methylation score for exposure to maternal smoking during pregnancy, outperforming the three previously developed scores. One possible application of the current score could be for model adjustment purposes or to assess its association with distal health outcomes where part of the effect can be attributed to maternal smoking. Further, it may provide a biomarker for fetal exposure to maternal smoking.

Rauschert Sebastian, Melton Phillip E, Heiskala Anni, Karhunen Ville, Burdge Graham, Craig Jeffrey M, Godfrey Keith M, Lillycrop Karen, Mori Trevor A, Beilin Lawrence J, Oddy Wendy H, Pennell Craig, Järvelin Marjo-Riitta, Sebert Sylvain, Huang Rae-Chi


Radiology

Machine Learning-Based MRI Texture Analysis to Predict the Histologic Grade of Oral Squamous Cell Carcinoma.

In AJR. American journal of roentgenology

OBJECTIVE : This study aimed to explore the performance of machine learning (ML)-based MRI texture analysis in discriminating between well-differentiated (WD) oral squamous cell carcinoma (OSCC) and moderately or poorly differentiated OSCC.

MATERIALS AND METHODS : The study enrolled 80 patients with pathologically confirmed OSCC (18 WD OSCCs and 62 moderately or poorly differentiated OSCCs) who underwent pretreatment MRI. ROIs were manually delineated to cover the entire tumor to the greatest possible extent on T2-weighted imaging and contrast-enhanced T1-weighted imaging, and 1118 texture features were extracted. Dimension reduction was performed using reproducibility analysis by two radiologists, collinearity analysis, and feature selection with a minimum-redundancy maximum-relevance algorithm. Models were created using random forest (RF), artificial neural network, and logistic regression (LR), each alone and with a synthetic minority oversampling technique (SMOTE). Classifier performance was assessed using 10-fold cross-validation.

RESULTS : Dimension reduction steps yielded eight texture features, including four features from each sequence. None of the clinical variables was selected. Among the eight texture features, five and seven texture features showed significant differences between the two groups in the actual data and balanced data, respectively (p < 0.05). All classifiers with SMOTE performed better than the same classifiers without it. The RF classifier with SMOTE achieved the best performance, with an area under the ROC curve of 0.936 and accuracy of 86.3%.

CONCLUSION : ML-based MRI texture analysis provides a promising noninvasive approach for predicting the histologic grade of OSCC.
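The abstract names SMOTE but does not describe its mechanics. The sketch below illustrates the core idea under simplifying assumptions: each synthetic minority sample is placed at a random point on the line segment between a real minority sample and one of its nearest minority neighbours. The function name, the parameters, and the toy feature vectors are hypothetical and are not taken from the paper.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between minority samples.

    minority: list of equal-length feature vectors from the minority class.
    k: number of nearest minority neighbours to choose from (assumed value).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j]))
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy two-feature minority class, invented for illustration only.
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [4.0, 4.0]]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points), "synthetic samples generated")
```

Because every synthetic point is an interpolation of two real points, oversampling of this kind is usually applied only inside the training folds so that test folds contain real cases only; whether the study did so is not stated in the abstract.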

Ren Jiliang, Qi Meng, Yuan Ying, Duan Shaofeng, Tao Xiaofeng


MRI, head and neck cancer, machine learning, texture analysis

Surgery

Machine Learning Predicts the Fall Risk of Total Hip Arthroplasty Patients Based on Wearable Sensor Instrumented Performance Tests.

In The Journal of arthroplasty ; h5-index 65.0

BACKGROUND : The prevalence of falls affects the wellbeing of aging adults and places an economic burden on the healthcare system. Integration of wearable sensors into existing fall risk assessment tools enables objective data collection that describes the functional ability of patients. In this study, supervised machine learning was applied to sensor-derived metrics to predict the fall risk of patients following total hip arthroplasty.

METHODS : At preoperative, 2-week, and 6-week postoperative appointments, patients (n = 72) were instrumented with sensors while they performed the timed-up-and-go walking test. Preoperative and 2-week postoperative data were used to form the feature sets and 6-week total times were used as labels. Support vector machine and linear discriminant analysis classifier models were developed and tested on various combinations of feature sets and feature reduction schemes. Using a 10-fold leave-some-subjects-out testing scheme, the accuracy, sensitivity, specificity, and area under the receiver-operator curve (AUC) were evaluated for all models.
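A leave-some-subjects-out scheme keeps all data from one patient on the same side of each train/test split, so a model is never tested on a patient it has already seen. A minimal sketch of such a split follows; the function name, the round-robin fold assignment, and the toy subject IDs are assumptions made for illustration, not the authors' procedure.

```python
def subject_folds(subject_ids, n_folds=10):
    """Return a list of (train_idx, test_idx) pairs grouped by subject.

    Subjects are assigned to folds round-robin (an assumed, simple rule),
    so every row of a given subject lands in exactly one test fold.
    """
    subjects = sorted(set(subject_ids))
    if n_folds < 2 or n_folds > len(subjects):
        raise ValueError("n_folds must be between 2 and the number of subjects")
    fold_of = {s: i % n_folds for i, s in enumerate(subjects)}
    folds = []
    for f in range(n_folds):
        test_idx = [i for i, s in enumerate(subject_ids) if fold_of[s] == f]
        train_idx = [i for i, s in enumerate(subject_ids) if fold_of[s] != f]
        folds.append((train_idx, test_idx))
    return folds


# Toy data: one entry per recording, several recordings per patient.
subject_ids = ["p1", "p1", "p2", "p2", "p3", "p4", "p4", "p5", "p6"]
folds = subject_folds(subject_ids, n_folds=3)
for train_idx, test_idx in folds:
    print("test subjects:", sorted({subject_ids[i] for i in test_idx}))
```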

RESULTS : A high performance model (accuracy = 0.87, sensitivity = 0.97, specificity = 0.46, AUC = 0.82) was obtained with a support vector machine classifier using sensor-derived metrics from only the preoperative appointment. An overall improved performance (accuracy = 0.90, sensitivity = 0.93, specificity = 0.59, AUC = 0.88) was achieved with a linear discriminant analysis classifier when 2-week postoperative data were added to the preoperative data.

CONCLUSION : The high accuracy of the fall risk prediction models is valuable for patients, clinicians, and the healthcare system. High-risk patients can implement preventative measures and low-risk patients can be directed to enhanced recovery care programs.

Polus Jennifer S, Bloomfield Riley A, Vasarhelyi Edward M, Lanting Brent A, Teeter Matthew G


fall risk, machine learning, timed-up-and-go test, total hip arthroplasty, wearable sensors

Public Health

Infodemiological study to understand the community risk perceptions of COVID-19 outbreak in South Korea.

In Journal of medical Internet research ; h5-index 88.0

BACKGROUND : South Korea is among the best-performing countries in tackling the coronavirus pandemic by utilizing mass drive-through testing, facemask use, and extensive social distancing. However, understanding the patterns of risk perception could also facilitate effective risk communication to minimize the impacts of disease spread during this crisis.

OBJECTIVE : We attempted to explore patterns of community health risk perceptions of COVID-19 in South Korea using Internet search data.

METHODS : Google Trends (GT) and NAVER relative search volumes (RSVs) data were collected using COVID-19-related terms in the Korean language and were retrieved according to time, gender, age group, type of device, and location. Online queries were compared to the number of daily new COVID-19 cases and tests reported in the Kaggle open-access dataset for the period from December 5, 2019 to May 31, 2020. Spearman's rank correlation coefficients were employed to assess whether correlations between new COVID-19 cases and Internet searches were affected by time. We also constructed a prediction model of new COVID-19 cases using the number of COVID-19 cases, tests, GT, and NAVER RSVs in lag periods of 1 to 3 days. Single and multiple regressions were employed using backward elimination and a variance inflation factor (VIF) of <5.
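To make the rank-correlation and lag ideas concrete, the sketch below computes Spearman's coefficient between a search-volume series shifted by a given number of days and a daily case series. Combining the two in one function is an illustrative assumption, not necessarily how the authors set up their analysis; all function names and the toy series are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def lagged_spearman(searches, cases, lag):
    """Spearman correlation of searches on day t - lag with cases on day t."""
    if lag < 0 or lag >= len(searches):
        raise ValueError("lag must be non-negative and shorter than the series")
    x = searches[: len(searches) - lag]
    y = cases[lag:]
    return pearson(average_ranks(x), average_ranks(y))


# Toy daily series invented for illustration only (not study data).
searches = [10, 20, 15, 40, 35, 60, 55, 80]
cases = [1, 2, 4, 3, 8, 7, 12, 11]
rho_lag1 = lagged_spearman(searches, cases, lag=1)
print(f"Spearman rho at lag 1: {rho_lag1:.3f}")
```

In this toy example the case series was built to follow the search series by exactly one day in rank order, so the lag-1 coefficient is 1; real series would give weaker values.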

RESULTS : Numbers of COVID-19-related queries in South Korea increased during local events including local transmission, approval of coronavirus test kits, implementation of coronavirus drive-through tests, a facemask shortage, and a widespread campaign for social distancing as well as during international events such as the announcement of a Public Health Emergency of International Concern by the World Health Organization. Online queries were also stronger in women (r=0.763~0.823; p<0.05), and age groups of ≤29 (r=0.726~0.821; p<0.05), 30~44 (r=0.701~0.826; p<0.05), and ≥50 years (r=0.706~0.725; p<0.05). In terms of spatial distribution, GT and NAVER RSVs were higher in affected areas. Moreover, greater correlations were found in mobile searches (r=0.704~0.804; p<0.05) compared to those of desktop searches (r=0.705~0.717; p<0.05), indicating changing behaviors in searching for online health information during the outbreak. Those varied Internet searches related to COVID-19 represented community health risk perceptions. In addition, as a country with a high number of coronavirus tests, results showed that adults perceived coronavirus test-related information as being more important than disease-related knowledge. Meanwhile, younger and older age groups had different perceptions. Moreover, NAVER RSVs can potentially be used for health risk perception assessments and disease predictions. Adding COVID-19-related searches provided by NAVER could increase the performance of the model compared to that of the COVID-19 case-based model and potentially be used to predict epidemic curves.

CONCLUSIONS : The use of both GT and NAVER RSVs to explore patterns of community health risk perceptions could be beneficial for targeting risk communication from several perspectives, including time, population characteristics, and location.


Husnayain Atina, Shim Eunha, Fuad Anis, Su Emily Chia-Yu


Pathology

RCNN for Region of Interest Detection in Whole Slide Images

ArXiv Preprint

Digital pathology has attracted significant attention in recent years. Analysis of Whole Slide Images (WSIs) is challenging because they are very large, i.e., of Giga-pixel resolution. Identifying Regions of Interest (ROIs) is the first step for pathologists to analyse further the regions of diagnostic interest for cancer detection and other anomalies. In this paper, we investigate the use of RCNN, which is a deep machine learning technique, for detecting such ROIs only using a small number of labelled WSIs for training. For experimentation, we used real WSIs from a public hospital pathology service in Western Australia. We used 60 WSIs for training the RCNN model and another 12 WSIs for testing. The model was further tested on a new set of unseen WSIs. The results show that RCNN can be effectively used for ROI detection from WSIs.

A Nugaliyadde, Kok Wai Wong, Jeremy Parry, Ferdous Sohel, Hamid Laga, Upeka V. Somaratne, Chris Yeomans, Orchid Foster


General

Predicting fine spatial scale traffic noise using mobile measurements and machine learning.

In Environmental science & technology ; h5-index 132.0

Environmental noise has been associated with a variety of health endpoints including cardiovascular disease, sleep disturbance, depression, and psychosocial stress. Most population noise exposure comes from vehicular traffic, which has large fine-scale spatial variability that is difficult to characterize using traditional fixed-site measurement techniques. To address this challenge, we collected A-weighted, equivalent noise (LAeq in decibels, dB) data on hour-long foot journeys around 16 locations throughout Long Beach, CA, and trained four machine learning models (linear regression, random forest, extreme gradient boosting, and a neural network) to predict noise with 20 m resolution. Input variables to the models included traffic metrics, road network features, meteorological conditions, and land use type. Among all machine learning models, extreme gradient boosting had the best results in validation tests (leave-one-route-out R² = 0.71, root mean square error (RMSE) 4.54 dB; 5-fold R² = 0.96, RMSE 1.8 dB). Local traffic volume was the most important predictor of noise; road features, land use, and meteorology including humidity, temperature, and wind speed also contributed. We show that a novel, on-foot mobile noise measurement method coupled with machine learning approaches enables highly accurate prediction of small-scale spatial patterns in traffic-related noise over a mixed-use urban area.
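The two validation metrics quoted above, the coefficient of determination and RMSE, can be computed from held-out measurements and predictions as in the following minimal sketch. The function names and the toy noise levels are illustrative assumptions, not values from the study.

```python
import math


def rmse(measured, predicted):
    """Root mean square error, in the units of the measurements (here dB)."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("R squared is undefined for constant measurements")
    return 1 - ss_res / ss_tot


# Toy held-out noise levels in dB, invented for illustration only.
measured = [60.0, 65.0, 70.0, 75.0]
predicted = [62.0, 64.0, 71.0, 73.0]
print(f"RMSE = {rmse(measured, predicted):.2f} dB")
print(f"R^2  = {r_squared(measured, predicted):.2f}")
```

In a leave-one-route-out test, `measured` and `predicted` would come from a route that was excluded from training entirely, which is a stricter check than random 5-fold splits of points from all routes.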

Yin Xiaozhe, Fallah-Shorshani Masoud, McConnell Rob, Fruin Scott, Franklin Meredith