General

Predicting self-harm within six months after initial presentation to youth mental health services: A machine learning study.

In PloS one ; h5-index 176.0

BACKGROUND : A priority for health services is to reduce self-harm in young people. Predicting self-harm is challenging due to its rarity and complexity; however, this does not preclude the utility of prediction models to improve decision-making regarding a service response in terms of more detailed assessments and/or intervention. The aim of this study was to predict self-harm within six months after initial presentation.

METHOD : The study included 1962 young people (12-30 years) presenting to youth mental health services in Australia. Six machine learning algorithms were trained and tested with ten repeats of ten-fold cross-validation. The net benefit of these models was evaluated using decision curve analysis.

RESULTS : Out of 1962 young people, 320 (16%) engaged in self-harm in the six months after first assessment and 1642 (84%) did not. The top 25% of young people as ranked by mean predicted probability accounted for 51.6%-56.2% of all who engaged in self-harm. By the top 50%, this increased to 82.1%-84.4%. Models demonstrated fair overall prediction (AUROCs: 0.744-0.755) and calibration, which indicates that predicted probabilities were close to the true probabilities (Brier scores: 0.185-0.196). The net benefit of these models was positive and superior to the 'treat everyone' strategy. The strongest predictors were (in ranked order): a history of self-harm, age, social and occupational functioning, sex, bipolar disorder, psychosis-like experiences, treatment with antipsychotics, and a history of suicide ideation.
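The quantities reported above (net benefit against a 'treat everyone' strategy, Brier score, and the share of cases captured in the top-ranked fraction) can be illustrated with a minimal sketch. It uses the standard textbook definitions rather than the authors' code; the function names and the toy data are assumptions made for illustration only.

```python
import math


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on everyone with predicted risk >= threshold.

    Standard decision-curve definition: TP/n - FP/n * pt / (1 - pt).
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'treat everyone' reference strategy."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def capture_rate(y_true, y_prob, top_fraction):
    """Share of all positive cases found in the top-ranked fraction.

    Assumes y_true contains at least one positive case.
    """
    k = math.ceil(top_fraction * len(y_true))
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: pair[0], reverse=True)
    captured = sum(y for _, y in ranked[:k])
    return captured / sum(y_true)


if __name__ == "__main__":
    # Toy data (hypothetical, not from the study): 1 = self-harm, 0 = none.
    outcomes = [1, 0, 1, 0]
    risks = [0.9, 0.1, 0.8, 0.4]
    print(net_benefit(outcomes, risks, 0.5))
    print(net_benefit_treat_all(outcomes, 0.5))
    print(brier_score(outcomes, risks))
    print(capture_rate(outcomes, risks, 0.25))
```

A model is useful at a given threshold when its net benefit exceeds both zero ('treat no one') and the 'treat everyone' value at that threshold.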

CONCLUSION : Prediction models for self-harm may have utility to identify a large subpopulation who would benefit from further assessment and targeted (low intensity) interventions. Such models could enhance health service approaches to identify and reduce self-harm, a considerable source of distress, morbidity, ongoing health care utilisation and mortality.

Iorfino Frank, Ho Nicholas, Carpenter Joanne S, Cross Shane P, Davenport Tracey A, Hermens Daniel F, Yee Hannah, Nichles Alissa, Zmicerevska Natalia, Guastella Adam, Scott Elizabeth, Hickie Ian B


General

An Evolutionary Multitasking-Based Feature Selection Method for High-Dimensional Classification.

In IEEE transactions on cybernetics

Feature selection (FS) is an important data preprocessing technique in data mining and machine learning, which aims to select a small subset of informative features to increase the performance and reduce the dimensionality. Particle swarm optimization (PSO) has been successfully applied to FS due to being efficient and easy to implement. However, most of the existing PSO-based FS methods tend to become trapped in local optima and are computationally expensive on high-dimensional data. Multifactorial optimization (MFO), as an effective evolutionary multitasking paradigm, has been widely used for solving complex problems through implicit knowledge transfer between related tasks. Inspired by MFO, this study proposes a novel PSO-based FS method to solve high-dimensional classification via information sharing between two related tasks generated from a dataset. To be specific, two related tasks about the target concept are established by evaluating the importance of features. A new crossover operator, called assortative mating, is applied to share information between these two related tasks. In addition, two mechanisms, a variable-range strategy and a subset updating mechanism, are developed to reduce the search space and maintain the diversity of the population, respectively. The results show that the proposed FS method can achieve higher classification accuracy with a smaller feature subset in a reasonable time than the state-of-the-art FS methods on the examined high-dimensional classification problems.
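For readers unfamiliar with PSO-based feature selection, the sketch below shows the plain single-task baseline that such methods build on: each particle is a vector in [0, 1], and a feature counts as selected when its coordinate exceeds a threshold. It does not implement the multitasking, assortative mating, variable-range, or subset updating mechanisms described above; the function names, parameter defaults, and the toy fitness function are all assumptions for illustration.

```python
import random

# Hypothetical toy problem: features 0 and 3 are the informative ones.
INFORMATIVE = {0, 3}


def toy_fitness(subset):
    """Reward informative features, penalise subset size (stand-in for accuracy)."""
    return len(INFORMATIVE & set(subset)) - 0.1 * len(subset)


def pso_feature_selection(fitness, n_features, n_particles=20, n_iters=50,
                          w=0.7, c1=1.5, c2=1.5, select_threshold=0.5, seed=0):
    """Plain continuous PSO with threshold decoding; returns (subset, fitness)."""
    rng = random.Random(seed)

    def decode(position):
        # A feature is selected when its coordinate exceeds the threshold.
        return [i for i, x in enumerate(position) if x > select_threshold]

    pos = [[rng.random() for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(decode(p)) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep positions inside the unit hypercube.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = fitness(decode(pos[i]))
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return decode(gbest), gbest_fit


if __name__ == "__main__":
    selected, score = pso_feature_selection(toy_fitness, n_features=6)
    print(selected, score)
```

In a real setting the fitness function would typically be a cross-validated classification accuracy on the selected columns, which is what makes the search expensive on high-dimensional data.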

Chen Ke, Xue Bing, Zhang Mengjie, Zhou Fengyu


General

Direct Kinetic Fingerprinting for High-Accuracy Single-Molecule Counting of Diverse Disease Biomarkers.

In Accounts of chemical research ; h5-index 162.0

Conspectus: Methods for detecting and quantifying disease biomarkers in biofluids with high specificity and sensitivity play a pivotal role in enabling clinical diagnostics, including point-of-care tests. The most widely used molecular biomarkers include proteins, nucleic acids, hormones, metabolites, and other small molecules. While numerous methods have been developed for analyzing biomarkers, most techniques are challenging to implement for clinical use due to insufficient analytical performance, high cost, and/or other practical shortcomings. For instance, the detection of cell-free nucleic acid (cfNA) biomarkers by digital PCR and next-generation sequencing (NGS) requires time-consuming nucleic acid extraction steps, often introduces enzymatic amplification bias, and can be costly when high specificity is required. While several amplification-free methods for detecting cfNAs have been reported, these techniques generally suffer from low specificity and sensitivity. Meanwhile, the quantification of protein biomarkers is generally performed using immunoassays such as enzyme-linked immunosorbent assay (ELISA); the analytical performance of these methods is often limited by the availability of antibodies with high affinity and specificity as well as the significant nonspecific binding of antibodies to assay surfaces. To address the drawbacks of existing biomarker detection methods and establish a universal diagnostics platform capable of detecting different types of analytes, we have developed an amplification-free approach, named single-molecule recognition through equilibrium Poisson sampling (SiMREPS), for the detection of diverse biomarkers with arbitrarily high specificity and single-molecule sensitivity. SiMREPS utilizes the transient, reversible binding of fluorescent detection probes to immobilized target molecules to generate kinetic fingerprints that are detected by single-molecule fluorescence microscopy.
The analysis of these kinetic fingerprints enables nearly perfect discrimination between specific binding to target molecules and any nonspecific binding. Early proof-of-concept studies demonstrated the in vitro detection of miRNAs with a limit of detection (LOD) of approximately 1 fM and >500-fold selectivity for single-nucleotide polymorphisms. The SiMREPS approach was subsequently expanded to the detection of rare mutant DNA alleles from biofluids at mutant allele fractions of as low as 1 in 1 million, corresponding to a specificity of >99.99999%. Recently, SiMREPS was generalized to protein quantification using dynamically binding antibody probes, permitting LODs in the low-femtomolar to attomolar range. Finally, SiMREPS has been demonstrated to be suitable for the in situ detection of miRNAs in cultured cells, the quantification of small-molecule toxins and drugs, and the monitoring of telomerase activity at the single-molecule level. In this Account, we discuss the principles of SiMREPS for the highly specific and sensitive detection of molecular analytes, including considerations for assay design. We discuss the generality of SiMREPS for the detection of very disparate analytes and provide an overview of data processing methods, including the expansion of the dynamic range using super-resolution analysis and the improvement of performance using deep learning algorithms. Finally, we describe current challenges, opportunities, and future directions for the SiMREPS approach.

Mandal Shankar, Li Zi, Chatterjee Tanmay, Khanna Kunal, Montoya Karen, Dai Liuhan, Petersen Chandler, Li Lidan, Tewari Muneesh, Johnson-Buck Alexander, Walter Nils G


Radiology

Detecting MLC modeling errors using radiomics-based machine learning in patient-specific QA with an EPID for intensity-modulated radiation therapy.

In Medical physics ; h5-index 59.0

PURPOSE : We sought to develop machine learning models to detect multileaf collimator (MLC) modeling errors with the use of radiomic features of fluence maps measured in patient-specific quality assurance (QA) for intensity-modulated radiation therapy (IMRT) with an electronic portal imaging device (EPID).

METHODS : Fluence maps measured with EPID for 38 beams from 19 clinical IMRT plans were assessed. Plans with various degrees of error in MLC modeling parameters (i.e., MLC transmission factor [TF] and dosimetric leaf gap [DLG]) and plans with an MLC positional error for comparison were created. For a total of 152 error plans for each type of error, we calculated fluence difference maps for each beam by subtracting the calculated maps from the measured maps. A total of 837 radiomic features were extracted from each fluence difference map, and we determined the number of features used for the training dataset in the machine learning models by using random forest regression. Machine learning models using the five typical algorithms (decision tree, kNN, SVM, logistic regression, and random forest) for binary classification between the error-free plan and the plan with the corresponding error for each type of error were developed. We used part of the total dataset to perform four-fold cross-validation to tune the models, and we used the remaining test dataset to evaluate the performance of the developed models. A gamma analysis was also performed between the measured and calculated fluence maps with the criteria of 3%/2 mm and 2%/2 mm for all of the types of error.
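The fluence difference map described above is an element-wise subtraction of the calculated map from the measured map. The sketch below shows that step plus three simple first-order statistics as a stand-in for feature extraction; it is not the study's 837-feature radiomics pipeline, and the function names and toy maps are assumptions for illustration.

```python
from statistics import mean, pstdev


def fluence_difference(measured, calculated):
    """Element-wise difference map: measured minus calculated."""
    same_shape = len(measured) == len(calculated) and all(
        len(m_row) == len(c_row) for m_row, c_row in zip(measured, calculated)
    )
    if not same_shape:
        raise ValueError("measured and calculated maps must have the same shape")
    return [
        [m - c for m, c in zip(m_row, c_row)]
        for m_row, c_row in zip(measured, calculated)
    ]


def first_order_features(diff_map):
    """A few illustrative first-order statistics of a difference map."""
    values = [v for row in diff_map for v in row]
    return {
        "mean": mean(values),
        "std": pstdev(values),  # population standard deviation over all pixels
        "max_abs": max(abs(v) for v in values),
    }


if __name__ == "__main__":
    # Toy 2x2 maps (hypothetical values, arbitrary units).
    measured = [[1.0, 2.0], [3.0, 4.0]]
    calculated = [[1.0, 1.0], [1.0, 1.0]]
    diff = fluence_difference(measured, calculated)
    print(diff)
    print(first_order_features(diff))
```

Feature vectors of this kind, one per beam, would then be the inputs to a binary classifier that separates error-free plans from plans with a given error type.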

RESULTS : The radiomic features and their optimal number were similar for the models for the TF and the DLG error detection, but differed from those for the MLC positional error. The highest sensitivity was obtained as 0.913 for the TF error with SVM and logistic regression, 0.978 for the DLG error with kNN and SVM, and 1.000 for the MLC positional error with kNN, SVM and random forest. The highest specificity was obtained as 1.000 for the TF error with a decision tree, SVM and logistic regression, 1.000 for the DLG error with a decision tree, logistic regression and random forest, and 0.909 for the MLC positional error with a decision tree and logistic regression. The gamma analysis showed the poorest performance, with sensitivities of 0.737 for the TF error and the DLG error and 0.882 for the MLC positional error at 3%/2 mm. The addition of another type of error to fluence maps significantly reduced the sensitivity for the TF and the DLG error, whereas no effect was observed for the MLC positional error detection.

CONCLUSIONS : Compared to the conventional gamma analysis, the radiomics-based machine learning models showed higher sensitivity and specificity in detecting a single type of the MLC modeling error and the MLC positional error. Although the developed models need further improvement for detecting multiple types of error, radiomics-based IMRT QA was shown to be a promising approach for detecting the MLC modeling error.

Sakai Madoka, Nakano Hisashi, Kawahara Daisuke, Tanabe Satoshi, Takizawa Takeshi, Narita Akihiro, Yamada Takumi, Sakai Hironori, Ueda Masataka, Sasamoto Ryuta, Kaidu Motoki, Aoyama Hidefumi, Ishikawa Hiroyuki, Utsunomiya Satoru


IMRT QA, machine learning, quality assurance, radiomics

General

Characterization of gait variability in multiple system atrophy and Parkinson's disease.

In Journal of neurology

BACKGROUND : Gait impairment is a pivotal feature of parkinsonian syndromes and increased gait variability is associated with postural instability and a higher risk of falls.

OBJECTIVES : We compared gait variability at different walking velocities between and within groups of patients with Parkinson-variant multiple system atrophy, idiopathic Parkinson's disease, and a control group of older adults.

METHODS : Gait metrics were recorded in 11 multiple system atrophy patients, 12 Parkinson's disease patients, and 18 controls using sensor-based gait analysis. Gait variability was analyzed for stride, swing and stance time, stride length and gait velocity. Values were compared between and within the groups at self-paced comfortable, fast and slow walking speed.
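The abstract does not state how variability was quantified. One common choice in gait research is the coefficient of variation (standard deviation divided by the mean, in percent) across strides; the sketch below assumes that definition, and the function name and stride values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    if len(values) < 2:
        raise ValueError("need at least two strides to estimate variability")
    average = mean(values)
    if average == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return 100.0 * stdev(values) / average


if __name__ == "__main__":
    # Toy stride times in seconds (hypothetical values).
    stride_times = [1.0, 1.1, 0.9, 1.0]
    print(coefficient_of_variation(stride_times))
```

The same function applies unchanged to swing time, stance time, stride length, or gait velocity series.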

RESULTS : Multiple system atrophy patients displayed higher gait variability except for stride time at all velocities compared with controls, while Parkinson's patients did not. Compared with Parkinson's disease, multiple system atrophy patients displayed higher variability of swing time, stride length and gait velocity at comfortable speed and at slow speed for swing and stance time, stride length and gait velocity (all P < 0.05). Stride time variability was significantly higher in slow compared to comfortable walking in patients with multiple system atrophy (P = 0.014). Variability parameters significantly correlated with the postural instability/gait difficulty subscore in both disease groups. Conversely, significant correlations between variability parameters and MDS-UPDRS III score were observed only for multiple system atrophy patients.

CONCLUSION : This analysis suggests that gait variability parameters reflect the major axial impairment and postural instability displayed by multiple system atrophy patients compared with Parkinson's disease patients and controls.

Sidoroff Victoria, Raccagni Cecilia, Kaindlstorfer Christine, Eschlboeck Sabine, Fanciulli Alessandra, Granata Roberta, Eskofier Björn, Seppi Klaus, Poewe Werner, Willeit Johann, Kiechl Stefan, Mahlknecht Philipp, Stockner Heike, Marini Kathrin, Schorr Oliver, Rungger Gregorio, Klucken Jochen, Wenning Gregor, Gaßner Heiko


Gait analysis, Gait variability, Multiple system atrophy, Parkinson’s disease, Wearable sensors

General

Automatically Explaining Machine Learning Prediction Results on Asthma Hospital Visits in Patients With Asthma: Secondary Analysis.

In JMIR medical informatics ; h5-index 23.0

BACKGROUND : Asthma is a major chronic disease that poses a heavy burden on health care. To facilitate the allocation of care management resources aimed at improving outcomes for high-risk patients with asthma, we recently built a machine learning model to predict asthma hospital visits in the subsequent year in patients with asthma. Our model is more accurate than previous models. However, like most machine learning models, it offers no explanation of its prediction results. This creates a barrier for use in care management, where interpretability is desired.

OBJECTIVE : This study aims to develop a method to automatically explain the prediction results of the model and recommend tailored interventions without lowering the performance measures of the model.

METHODS : Our data were imbalanced, with only a small portion of data instances linking to future asthma hospital visits. To handle imbalanced data, we extended our previous method of automatically offering rule-formed explanations for the prediction results of any machine learning model on tabular data without lowering the model's performance measures. In a secondary analysis of the 334,564 data instances from Intermountain Healthcare between 2005 and 2018 used to form our model, we employed the extended method to automatically explain the prediction results of our model and recommend tailored interventions. The patient cohort consisted of all patients with asthma who received care at Intermountain Healthcare between 2005 and 2018, and resided in Utah or Idaho as recorded at the visit.

RESULTS : Our method explained the prediction results for 89.7% (391/436) of the patients with asthma who, per our model's correct prediction, were likely to incur asthma hospital visits in the subsequent year.

CONCLUSIONS : This study is the first to demonstrate the feasibility of automatically offering rule-formed explanations for the prediction results of any machine learning model on imbalanced tabular data without lowering the performance measures of the model. After further improvement, our asthma outcome prediction model coupled with the automatic explanation function could be used by clinicians to guide the allocation of limited asthma care management resources and the identification of appropriate interventions.

Luo Gang, Johnson Michael D, Nkoy Flory L, He Shan, Stone Bryan L


asthma, forecasting, machine learning, patient care management