General

Predicting Writing Styles of Web-Based Materials for Children's Health Education Using the Selection of Semantic Features: Machine Learning Approach.

In JMIR medical informatics ; h5-index 23.0

BACKGROUND : Medical writing styles can have an impact on the understandability of health educational resources. Amid current web-based health information research, there is a dearth of research-based evidence that demonstrates what constitutes the best practice of the development of web-based health resources on children's health promotion and education.

OBJECTIVE : Using authoritative and highly influential web-based children's health educational resources from the Nemours Foundation, the largest not-for-profit organization promoting children's health and well-being, we aimed to develop machine learning algorithms to discriminate and predict the writing styles of health educational resources on children versus adult health promotion using a variety of health educational resources aimed at the general public.

METHODS : The natural language features used as predictor variables of the algorithms were selected first by automatic feature selection, using ridge classifier, support vector machine, extreme gradient boost tree, and recursive feature elimination, and then revised by education experts. We compared algorithms using the automatically selected (n=19) and linguistically enhanced (n=20) feature sets, using the initial feature set (n=115) as the baseline.

RESULTS : Using five-fold cross-validation, compared with the baseline (115 features), the Gaussian Naive Bayes model (20 features) achieved statistically higher mean sensitivity (P=.02; 95% CI -0.016 to 0.1929), mean specificity (P=.02; 95% CI -0.016 to 0.199), mean area under the receiver operating characteristic curve (P=.02; 95% CI -0.007 to 0.140), and mean macro F1 (P=.006; 95% CI 0.016-0.167). The statistically improved performance of the final model (20 features) is in contrast to the statistically insignificant changes between the original feature set (n=115) and the automatically selected features (n=19): mean sensitivity (P=.13; 95% CI -0.1699 to 0.0681), mean specificity (P=.10; 95% CI -0.1389 to 0.4017), mean area under the receiver operating characteristic curve (P=.008; 95% CI 0.0059-0.1126), and mean macro F1 (P=.98; 95% CI -0.0555 to 0.0548). This demonstrates the importance and effectiveness of combining automatic feature selection and expert-based linguistic revision to develop the most effective machine learning algorithms from high-dimensional data sets.
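The evaluation metrics named in the results can be illustrated with a minimal sketch. It assumes the standard confusion-matrix definitions of sensitivity, specificity, and macro F1 for a two-class problem; the function name `binary_metrics` and the label encoding are hypothetical and are not taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, and macro F1 for two-class labels.

    Assumes standard confusion-matrix definitions; the paper's exact
    computation (e.g. averaging across the five folds) is not shown here.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def safe_div(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    sensitivity = safe_div(tp, tp + fn)   # true positive rate
    specificity = safe_div(tn, tn + fp)   # true negative rate
    f1_pos = safe_div(2 * tp, 2 * tp + fp + fn)
    f1_neg = safe_div(2 * tn, 2 * tn + fn + fp)
    macro_f1 = (f1_pos + f1_neg) / 2      # unweighted mean over both classes
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "macro_f1": macro_f1,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

In a cross-validated setting such as the one described, these values would typically be computed per fold and then averaged.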

CONCLUSIONS : We developed new evaluation tools for the discrimination and prediction of writing styles of web-based health resources for children's health education and promotion among parents and caregivers of children. User-adaptive automatic assessment of web-based health content holds great promise for distant and remote health education among young readers. Our study leveraged the precision and adaptability of machine learning algorithms and insights from health linguistics to help advance this significant yet understudied area of research.

Xie Wenxiu, Ji Meng, Liu Yanmeng, Hao Tianyong, Chow Chi-Yin


health educational resource development, health linguistics, machine learning, online health education

General

Automated detection of muscle fatigue conditions from cyclostationary based geometric features of surface electromyography signals.

In Computer methods in biomechanics and biomedical engineering

In this study, an attempt has been made to develop an automated muscle fatigue detection system using cyclostationary based geometric features of surface electromyography (sEMG) signals. For this purpose, signals are acquired from fifty-eight healthy volunteers under dynamic muscle fatiguing contractions. The sEMG signals are preprocessed, and epochs of the signals under nonfatigue and fatigue conditions are considered for the analysis. A computationally effective fast Fourier transform based accumulation algorithm is adapted to compute the spectral correlation density coefficients. The boundary of the spectral density coefficients in the complex plane is obtained using the alpha shape method. The geometric features, namely perimeter, area, circularity, bending energy, eccentricity, and inertia, are extracted from the shape, and machine learning models based on multilayer perceptron (MLP) and extreme learning machine (ELM) are developed using these biomarkers. The results show that cyclostationarity increases in the fatigue condition. All the extracted features are found to differ significantly between the two conditions. The ELM model based on prominent features classifies the sEMG signals with a maximum accuracy of 94.09% and F-score of 93.75%. Therefore, the proposed approach appears to be useful for analysing fatiguing contractions in neuromuscular conditions.
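Three of the geometric features listed above (perimeter, area, circularity) can be sketched for a closed boundary in the complex plane. This is an illustration under stated assumptions, not the authors' implementation: the boundary vertices are assumed to be already ordered (the alpha shape step is not reproduced), circularity is taken as the common definition 4πA/P², and the function name `boundary_features` is hypothetical.

```python
import math


def boundary_features(boundary):
    """Perimeter, area, and circularity of a closed polygon.

    `boundary` is a list of complex numbers giving the ordered vertices
    of the shape in the complex plane (an assumed input format).
    """
    n = len(boundary)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")

    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        z1 = boundary[i]
        z2 = boundary[(i + 1) % n]  # wrap around to close the polygon
        # Shoelace term: x1*y2 - x2*y1
        twice_area += z1.real * z2.imag - z2.real * z1.imag
        perimeter += abs(z2 - z1)   # edge length

    area = abs(twice_area) / 2.0
    # Assumed definition: equals 1 for a circle, smaller for other shapes.
    circularity = 4.0 * math.pi * area / perimeter ** 2 if perimeter else 0.0
    return {"perimeter": perimeter, "area": area, "circularity": circularity}


if __name__ == "__main__":
    unit_square = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j]
    print(boundary_features(unit_square))
```

The remaining features named in the abstract (bending energy, eccentricity, inertia) are omitted here because the abstract does not give their definitions.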

K Divya Bharathi, P A Karthick, S Ramakrishnan


Fatigue analysis, artificial neural networks, cyclostationarity, geometric features, surface electromyography

Public Health

Exploring Feasibility of Multivariate Deep Learning Models in Predicting COVID-19 Epidemic.

In Frontiers in public health

Background: Mathematical models are powerful tools to study COVID-19. However, one fundamental challenge in current modeling approaches is the lack of accurate and comprehensive data. Complex epidemiological systems such as COVID-19 are especially challenging to the commonly used mechanistic model when our understanding of this pandemic rapidly refreshes.

Objective: We aim to develop a data-driven workflow to extract, process, and develop deep learning (DL) methods to model the COVID-19 epidemic. We provide an alternative modeling approach to complement the current mechanistic modeling paradigm.

Method: We extensively searched, extracted, and annotated relevant datasets from over 60 official press releases in Hubei, China, in 2020. Multivariate long short-term memory (LSTM) models were developed with different architectures to track and predict multivariate COVID-19 time series for 1, 2, and 3 days ahead. As a comparison, univariate LSTMs were also developed to track new cases, total cases, and new deaths.

Results: A comprehensive dataset with 10 variables was retrieved and processed for 125 days in Hubei. Multivariate LSTM had reasonably good predictability on new deaths, hospitalization of both severe and critical patients, total discharges, and total monitored in hospital. Multivariate LSTM showed better results for new and total cases, and new deaths for 1-day-ahead prediction than univariate counterparts, but not for 2-day and 3-day-ahead predictions. Besides, more complex LSTM architecture seemed not to increase overall predictability in this study.

Conclusion: This study demonstrates the feasibility of DL models to complement current mechanistic approaches when the exact epidemiological mechanisms are still under investigation.
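Predicting a multivariate series 1, 2, or 3 days ahead first requires framing the daily records as supervised (input window, target) pairs. The sketch below shows one common way to do that. The lookback length, the function name `make_windows`, and the toy data are assumptions for illustration; the abstract does not state how the authors built their training samples.

```python
def make_windows(series, lookback, horizon):
    """Turn a daily multivariate series into (input window, target) pairs.

    series   : list of rows, one row of variables per day
    lookback : number of past days in each input window (assumed value)
    horizon  : how many days ahead the target lies (1, 2, or 3 in the text)
    """
    if lookback < 1 or horizon < 1:
        raise ValueError("lookback and horizon must be positive")

    inputs, targets = [], []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        end = start + lookback
        inputs.append(series[start:end])          # days start .. end-1
        targets.append(series[end + horizon - 1])  # the day `horizon` ahead
    return inputs, targets


if __name__ == "__main__":
    # Toy two-variable series: row d is [d, 10*d]. Not data from the study.
    toy = [[d, 10 * d] for d in range(6)]
    for h in (1, 2, 3):
        x, y = make_windows(toy, lookback=3, horizon=h)
        print(h, len(x), y)
```

A univariate comparison model of the kind mentioned in the abstract would use the same framing with a single variable per row.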

Chen Shi, Paul Rajib, Janies Daniel, Murphy Keith, Feng Tinghao, Thill Jean-Claude


COVID-19, deep learning, epidemic, modeling, multivariate

General

Machine Learning Derived Blueprint for Rational Design of the Effective Single-Atom Cathode Catalyst of the Lithium-Sulfur Battery.

In The journal of physical chemistry letters ; h5-index 129.0

The "shuttle effect" and sluggish kinetics at the cathode significantly hinder further improvement of the lithium-sulfur (Li-S) battery, a candidate for next-generation energy storage technology. Herein, machine learning based on high-throughput density functional theory calculations is employed to establish the pattern of polysulfide adsorption and to screen supported single-atom catalysts (SACs). The adsorptions are classified into two categories, which successfully distinguishes S-S bond breaking from the others. Moreover, a general trend of polysulfide adsorption was established with regard to both the kind of metal and the nitrogen configurations on the support. The regression model has a mean absolute error of 0.14 eV, indicating faithful predictive ability. Based on the adsorption energy of soluble polysulfides and the overpotential, the most promising SAC was proposed, and a volcano curve was found. Finally, a reactivity map is supplied to guide SAC design for the Li-S battery.

Lian Zan, Yang Min, Jan Faheem, Li Bo


General

Applied Machine Learning for Prediction of CO2 Adsorption on Biomass Waste-Derived Porous Carbons.

In Environmental science & technology ; h5-index 132.0

Biomass waste-derived porous carbons (BWDPCs) are a class of complex materials that are widely used in sustainable waste management and carbon capture. However, their diverse textural properties, the presence of various functional groups, and the varied temperatures and pressures to which they are subjected during CO2 adsorption make it challenging to understand the underlying mechanism of CO2 adsorption. Here, we compiled a data set including 527 data points collected from peer-reviewed publications and applied machine learning to systematically map CO2 adsorption as a function of the textural and compositional properties of BWDPCs and adsorption parameters. Various tree-based models were devised, where the gradient boosting decision trees (GBDTs) had the best predictive performance with R² of 0.98 and 0.84 on the training and test data, respectively. Further, the BWDPCs in the compiled data set were classified into regular porous carbons (RPCs) and heteroatom-doped porous carbons (HDPCs), where again the GBDT model had R² of 0.99 and 0.98 on the training and 0.86 and 0.79 on the test data for the RPCs and HDPCs, respectively. Feature importance revealed the significance of adsorption parameters, textural properties, and compositional properties in the order of precedence for BWDPC-based CO2 adsorption, effectively guiding the synthesis of porous carbons for CO2 adsorption applications.
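The training and test scores quoted above are coefficients of determination. As a minimal sketch, assuming the usual definition (one minus the ratio of residual to total sum of squares), the score could be computed as follows. The function name `r_squared` and the sample values are hypothetical, and the authors' own tooling is not specified in the abstract.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true is constant, so the score is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up adsorption values for illustration only.
    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(r_squared(measured, predicted))
```

A gap between the training and test scores, such as the 0.98 versus 0.84 reported here, is the usual quantity to inspect when judging how well a fitted model generalizes.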

Yuan Xiangzhou, Suvarna Manu, Low Sean, Dissanayake Pavani Dulanja, Lee Ki Bong, Li Jie, Wang Xiaonan, Ok Yong Sik


carbon materials, gas adsorption and separation, gradient boosting decision trees, low carbon technology, machine learning, sustainable waste management

General

Exploiting the power of information in medical education.

In Medical teacher

The explosion of medical information demands a thorough reconsideration of medical education, including what we teach and assess, how we educate, and whom we educate. Physicians of the future will need to be self-aware, self-directed, resource-effective team players who can synthesize and apply summarized information and communicate clearly. Training in metacognition, data science, informatics, and artificial intelligence is needed. Education programs must shift focus from content delivery to providing students explicit scaffolding for future learning, such as the Master Adaptive Learner model. Additionally, educators should leverage informatics to improve the process of education and foster individualized, precision education. Finally, attributes of the successful physician of the future should inform adjustments in recruitment and admissions processes. This paper explores how member schools of the American Medical Association Accelerating Change in Medical Education Consortium adjusted all aspects of educational programming in acknowledgment of the rapid expansion of information.

Cutrer William B, Spickard W Anderson, Triola Marc M, Allen Bradley L, Spell Nathan, Herrine Steven K, Dalrymple John L, Gorman Paul N, Lomis Kimberly D


Metacognition, active learning, artificial intelligence, clinical informatics, electronic health record, medical education