
Radiology

Radiomic Immunophenotyping of GSEA-Assessed Immunophenotypes of Glioblastoma and Its Implications for Prognosis: A Feasibility Study.

In Cancers

Characterization of immunophenotypes in glioblastoma (GBM) is important for therapeutic stratification and helps predict treatment response and prognosis. Radiomics can be used to predict molecular subtypes and gene expression levels. However, whether radiomics aids immunophenotype prediction is still unknown. In this study, to classify immunophenotypes in patients with GBM, we developed machine learning-based magnetic resonance (MR) radiomic models to evaluate the enrichment levels of four immune subsets: cytotoxic T lymphocytes (CTLs), activated dendritic cells, regulatory T cells (Tregs), and myeloid-derived suppressor cells (MDSCs). Independent testing data and the leave-one-out cross-validation method were used to evaluate model effectiveness and model performance, respectively. We identified five immunophenotypes (G1 to G5) based on the enrichment levels of the four immune subsets. G2 had the worst prognosis and was characterized by high enrichment of MDSCs and low enrichment of CTLs. G3 had the best prognosis and was characterized by low enrichment of MDSCs and Tregs and high enrichment of CTLs. The contrast-enhanced T1-weighted MR radiomics models of the enrichment levels of the four immune subsets reached an average accuracy of 79%, and their outputs predicted G2, G3, and the "immune-cold" phenotype (G1). Our radiomic immunophenotyping models feasibly characterize the immunophenotypes of GBM and can predict patient prognosis.
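
The leave-one-out cross-validation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the 1-nearest-neighbour classifier, the two-dimensional toy features, and the "low"/"high" enrichment labels are all assumptions chosen only to keep the example self-contained.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def nearest_neighbour_predict(train: List[Sample], x: Sequence[float]) -> str:
    """Return the label of the training sample closest to x (squared Euclidean distance)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda s: sq_dist(s[0], x))[1]


def leave_one_out_accuracy(
    data: List[Sample],
    predict: Callable[[List[Sample], Sequence[float]], str],
) -> float:
    """Hold out each sample once, train on the rest, and return the fraction predicted correctly."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every sample except the held-out one
        if predict(train, x) == y:
            correct += 1
    return correct / len(data)


# Hypothetical toy data: two made-up radiomic features per patient.
toy_data: List[Sample] = [
    ((0.10, 0.20), "low"), ((0.20, 0.10), "low"), ((0.15, 0.25), "low"),
    ((0.90, 0.80), "high"), ((0.80, 0.90), "high"), ((0.85, 0.75), "high"),
]
toy_accuracy = leave_one_out_accuracy(toy_data, nearest_neighbour_predict)
```

With n samples the model is fitted n times, which is affordable for the small cohorts typical of radiogenomic studies and uses all but one sample for training in every fold.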

Hsu Justin Bo-Kai, Lee Gilbert Aaron, Chang Tzu-Hao, Huang Shiu-Wen, Le Nguyen Quoc Khanh, Chen Yung-Chieh, Kuo Duen-Pang, Li Yi-Tien, Chen Cheng-Yu


first-order statistics, glioblastoma, gray-level co-occurrence matrix, gray-level run length matrix, immunophenotypes, radiogenomics

General

Telemonitoring Parkinson's disease using machine learning by combining tremor and voice analysis.

In Brain informatics

BACKGROUND : As the aged population grows, the number of people affected by Parkinson's disease (PD) is also rising. Unfortunately, due to insufficient resources and awareness in underdeveloped countries, proper and timely PD detection is highly challenging. Moreover, PD symptoms are not the same in all patients, nor do they all become pronounced at the same stage of the illness. Therefore, this work aims to combine more than one symptom (rest tremor and voice degradation) by collecting data remotely using smartphones, and to detect PD with the help of a cloud-based machine learning system for telemonitoring PD patients in developing countries.

METHOD : The proposed system receives rest tremor and vowel phonation data acquired by smartphones with built-in accelerometer and voice recorder sensors. The data are first collected from diagnosed PD patients and healthy people to build and optimize machine learning models for higher performance. After that, data from newly suspected PD patients are collected, and the trained algorithms are applied to detect PD. Based on the majority vote of those algorithms, patients detected as having PD are connected with a nearby neurologist for consultation. After receiving patients' feedback following diagnosis by the neurologist, the system may update the model by retraining on the latest data. The system also periodically asks the detected patients to upload new data to track their disease progression.
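
The majority-vote step can be sketched as follows. This is an illustrative sketch only: the label strings and the function name `majority_vote` are hypothetical, and the abstract does not say how ties are handled, so this version simply rejects them.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label predicted by most classifiers; raise ValueError on an empty input or a tie."""
    if not predictions:
        raise ValueError("no predictions to vote on")
    ranked = Counter(predictions).most_common()  # labels sorted by descending count
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        raise ValueError("tie between labels; no majority")
    return ranked[0][0]


# Hypothetical per-classifier decisions for one new recording.
votes = {"kNN": "PD", "SVM": "healthy", "NB": "PD"}
decision = majority_vote(list(votes.values()))
```

With three classifiers and two labels a tie cannot occur, which is one practical reason to vote over an odd number of models.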

RESULT : The highest accuracy in PD detection using offline data was [Formula: see text] from voice data and [Formula: see text] from tremor data when used separately. In both cases, k-nearest neighbors (kNN) achieved higher accuracy than support vector machine (SVM) and naive Bayes (NB). Applying the maximum relevance minimum redundancy (MRMR) feature selection method showed that selecting different feature sets based on the patient's gender could improve the detection accuracy. This study's novelty is the application of ensemble averaging to the combined decisions generated from the analysis of voice and tremor data. The average accuracy of PD detection reached [Formula: see text] when ensemble averaging was performed on the majority vote from kNN, SVM, and NB.

CONCLUSION : The proposed system can detect PD using a cloud-based system for computation, data preservation, and regular monitoring of voice and tremor samples captured by smartphones. Thus, this system can help healthcare authorities ensure that the older population in developing countries has access to a better medical diagnosis system, especially during a pandemic such as COVID-19, when in-person monitoring is minimal.

Sajal Md Sakibur Rahman, Ehsan Md Tanvir, Vaidyanathan Ravi, Wang Shouyan, Aziz Tipu, Mamun Khondaker Abdullah Al


Accelerometer, Machine-learning, Parkinson’s, Telemonitoring, Tremor

General

A Review of Deep Learning Methods for Irregularly Sampled Medical Time Series Data

ArXiv Preprint

Irregularly sampled time series (ISTS) data has irregular temporal intervals between observations and different sampling rates between sequences. ISTS commonly appears in healthcare, economics, and geoscience. In the medical environment especially, the widely used Electronic Health Records (EHRs) contain abundant irregularly sampled medical time series (ISMTS) data. Developing deep learning methods on EHR data is critical for personalized treatment, precise diagnosis, and medical management. However, it is challenging to apply deep learning models directly to ISMTS data. On the one hand, ISMTS data has both intra-series and inter-series relations, so both the local and global structures should be considered. On the other hand, methods should balance the trade-off between task accuracy and model complexity while retaining generality and interpretability. Many existing works have tried to solve these problems and have achieved good results. In this paper, we review these deep learning methods from the perspectives of technology and task. Under the technology-driven perspective, we summarize them into two categories: missing data-based methods and raw data-based methods. Under the task-driven perspective, we also summarize them into two categories: data imputation-oriented and downstream task-oriented. For each of them, we point out their advantages and disadvantages. Moreover, we implement some representative methods and compare them on four medical datasets with two tasks. Finally, we discuss the challenges and opportunities in this area.
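
To make the "missing data-based" view concrete, the sketch below discretizes one irregular series onto a fixed grid and records which bins were observed and how long ago the last observation occurred. It is an assumption-laden illustration rather than a method taken from the review: the function name `to_regular_grid`, the "last observation in a bin wins" rule, and the time-gap convention are choices made here for clarity.

```python
from typing import List, Optional, Tuple


def to_regular_grid(
    observations: List[Tuple[float, float]],  # (timestamp, value) pairs, irregularly spaced
    start: float,
    step: float,
    n_bins: int,
) -> Tuple[List[Optional[float]], List[int], List[float]]:
    """Map irregular observations onto n_bins regular bins of width `step`.

    Returns (values, mask, delta):
      values[i] is the last value observed in bin i, or None if the bin is empty;
      mask[i]   is 1 if bin i holds an observation, else 0;
      delta[i]  is the time elapsed since the most recent observed bin before bin i
                (0.0 for the first bin; it keeps growing across empty bins).
    """
    values: List[Optional[float]] = [None] * n_bins
    for t, v in sorted(observations):
        idx = int((t - start) // step)
        if 0 <= idx < n_bins:
            values[idx] = v  # a later observation in the same bin overwrites an earlier one
    mask = [0 if v is None else 1 for v in values]
    delta = [0.0] * n_bins
    for i in range(1, n_bins):
        delta[i] = step if mask[i - 1] else step + delta[i - 1]
    return values, mask, delta


# Hypothetical vital-sign readings taken at irregular times (hours since admission).
readings = [(0.5, 98.0), (3.2, 101.0), (3.8, 102.0)]
grid_values, grid_mask, grid_delta = to_regular_grid(readings, start=0.0, step=1.0, n_bins=5)
```

Methods in the "raw data-based" category would instead consume the `(timestamp, value)` pairs directly, avoiding the information loss that occurs here when two readings fall in the same bin.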

Chenxi Sun, Hongda Shen, Moxian Song, Hongyan Li


General

Optimizing High-Performance Computing Systems for Biomedical Workloads.

In IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum : [proceedings]

The productivity of computational biologists is limited by the speed of their workflows and subsequent overall job throughput. Because most biomedical researchers are focused on better understanding scientific phenomena rather than developing and optimizing code, a computing and data system implemented in an adventitious and/or non-optimized manner can impede the progress of scientific discovery. In our experience, most computational life-science applications do not leverage the full capabilities of high-performance computing, so tuning a system for these applications is especially critical. To optimize a system effectively, systems staff must understand the effects of the applications on the system. Effective stewardship of the system includes an analysis of the impact of the applications on the compute cores, file system, resource manager, and queuing policies. The resulting improved system design and the enactment of a sustainability plan help to enable a long-term resource for productive computational and data science. We present a case study of a typical biomedical computational workload at a leading academic medical center supporting over $100 million per year in computational biology research. Over the past eight years, our high-performance computing system has enabled over 900 biomedical publications in four major areas: genetics and population analysis, gene expression, machine learning, and structural and chemical biology. We have upgraded the system several times in response to trends, actual usage, and user feedback. Major components crucial to this evolution include scheduling structure and policies, memory size, compute type and speed, parallel file system capabilities, and deployment of cloud technologies. We evolved a 70-teraflop machine into a 1.4-petaflop machine in seven years and grew our user base nearly 10-fold. For long-term stability and sustainability, we established a chargeback fee structure.
Our overarching guiding principle for each progression has been to increase scientific throughput and enable enhanced scientific fidelity with minimal impact on existing user workflows or code. This highly constrained system optimization has presented unique challenges, leading us to adopt new approaches to provide constructive pathways forward. We share our practical strategies resulting from our ongoing growth and assessments.

Kovatch Patricia, Gai Lili, Cho Hyung Min, Fluder Eugene, Jiang Dansha


cloud technologies, computational biology, genomics, high performance computing, parallel file systems, scheduling, sustainability, system optimization

General

Inferring an animal's environment through biologging: quantifying the environmental influence on animal movement.

In Movement ecology

Background : Animals respond to environmental variation by changing their movement in a multifaceted way. Recent advancements in biologging increasingly allow for detailed measurements of the multifaceted nature of movement, from descriptors of animal movement trajectories (e.g., using GPS) to descriptors of body part movements (e.g., using tri-axial accelerometers). Because this multivariate richness of movement data complicates inference on the environmental influence on animal movement, studies generally use simplified movement descriptors in statistical analyses. However, doing so limits the inference on the environmental influence on movement, because such inference requires that the multivariate richness of movement data be fully considered in the analysis.

Methods : We propose a data-driven analytic framework, based on existing methods, to quantify the environmental influence on animal movement that can accommodate the multifaceted nature of animal movement. Instead of fitting a simplified movement descriptor to a suite of environmental variables, our proposed framework centres on predicting an environmental variable from the full set of multivariate movement data. The measure of fit of this prediction is taken to be the metric that quantifies how much of the environmental variation relates to the multivariate variation in animal movement. We demonstrate the usefulness of this framework through a case study about the influence of grass availability and time since milking on cow movements using machine learning algorithms.
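
The core idea, predicting an environmental variable from movement data and reading the measure of fit as the strength of the relationship, can be sketched as below. Several assumptions are made purely for illustration: the coefficient of determination (R²) stands in for the unspecified "measure of fit", a one-feature least-squares line stands in for the machine learning algorithms, and all data values are invented.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept with a single feature."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y_true: List[float], y_pred: List[float]) -> float:
    """Share of the variance in y_true accounted for by y_pred (1.0 is a perfect fit)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: one movement feature (e.g. hourly neck activity) per observation,
# and the environmental variable to be predicted (e.g. grass availability).
train_movement = [0.1, 0.4, 0.5, 0.9, 1.2]
train_environment = [1.0, 1.9, 2.1, 3.2, 3.9]
test_movement = [0.2, 0.7, 1.0]
test_environment = [1.3, 2.6, 3.4]

slope, intercept = fit_line(train_movement, train_environment)
test_predictions = [slope * m + intercept for m in test_movement]
fit_on_held_out = r_squared(test_environment, test_predictions)
```

Scoring on held-out observations rather than on the training data matters here, because a flexible model fitted to many movement features could otherwise report a high fit that reflects overfitting rather than a real environmental influence.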

Results : We show that on a one-hour timescale 37% of the variation in grass availability and 33% of the variation in time since milking influenced cow movements. Grass availability mostly influenced the cows' neck movement during grazing, while time since milking mostly influenced the movement through the landscape and the shared variation of accelerometer and GPS data (e.g., activity patterns). Furthermore, this framework proved to be insensitive to spurious correlations between environmental variables in quantifying the influence on animal movement.

Conclusions : Not only is our proposed framework well-suited to study the environmental influence on animal movement; we argue that it can also be applied in any field that uses multivariate biologging data, e.g., animal physiology, to study the relationships between animals and their environment.

Supplementary information : Supplementary information accompanies this paper at 10.1186/s40462-020-00228-4.

Eikelboom J A J, de Knegt H J, Klaver M, van Langevelde F, van der Wal T, Prins H H T


Behaviour classification, Collective movement, Cows, Foraging, Group dynamics, Lactation, Machine learning, Random forest regression, Resource availability, Support vector machine

General

Lightme: analysing language in internet support groups for mental health.

In Health information science and systems

Background : Assisting moderators in triaging harmful posts in Internet Support Groups is important to ensure their safe use. Automated text classification methods that analyse the language expressed in posts on online forums are a promising solution.

Methods : Natural Language Processing and Machine Learning technologies were used to build a triage post classifier using a dataset from a mental health forum for young people.

Results : Compared with the state of the art, a solution based mainly on features from lexical resources achieved the best classification performance (52%) for crisis posts, the most severe class. Six salient linguistic characteristics were found when analysing the crisis posts: (1) posts expressing hopelessness, (2) short posts expressing concise negative emotional responses, (3) long posts expressing variations of emotions, (4) posts expressing dissatisfaction with available health services, (5) posts utilising storytelling, and (6) posts expressing users seeking advice from peers during a crisis.

Conclusion : It is possible to build a competitive triage classifier using features derived only from the textual content of the post. Further research needs to be done in order to translate our quantitative and qualitative findings into features, as it may improve overall performance.

Ferraro Gabriela, Loo Gee Brendan, Ji Shenjia, Salvador-Carulla Luis