
General

Regulatory genes identification within functional genomics experiments for tissue classification into binary classes via machine learning techniques.

In JPMA. The Journal of the Pakistan Medical Association

OBJECTIVE : The aim of this study is to filter out the most informative genes that mainly regulate the target tissue class, increase classification accuracy, reduce the curse of dimensionality, and discard redundant and irrelevant genes.

METHODS : This paper presents a gene selection approach based on bagging sub-forest (BSF). The proposed method derives gene importance from the idea underlying the standard random forest algorithm. The new method is compared with three state-of-the-art methods: Wilcoxon, masked painter, and proportional overlapped score (POS). All methods were applied to 5 datasets: Colon, Lymph node breast cancer, Leukaemia, Serrated colorectal carcinomas, and Breast Cancer. For the comparison, the top 20 genes were selected with each gene selection method, and random forest (RF) and support vector machine (SVM) classifiers were then applied to the datasets restricted to the selected genes to assess predictive performance. Classification accuracy, Brier score, and sensitivity were used as performance measures.

RESULTS : With both random forest and SVM classifiers, the proposed method gave better results than the other feature selection methods on all the datasets.

CONCLUSIONS : The proposed method showed improved performance in terms of classification accuracy, Brier score and sensitivity, and hence, could be used as a novel method for gene selection to classify tissue samples into their correct classes.
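The three performance measures named in this abstract can be sketched for a binary classifier as follows. This is a minimal illustration, not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def brier_score(y_true, p_pos):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, p_pos)) / len(y_true)


def sensitivity(y_true, y_pred):
    """True positives divided by all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


# Toy data, invented for illustration only (not taken from the study).
y_true = [1, 0, 1, 1, 0]
p_pos = [0.9, 0.2, 0.4, 0.8, 0.6]               # predicted P(class = 1)
y_pred = [1 if p >= 0.5 else 0 for p in p_pos]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred))
print(brier_score(y_true, p_pos))
print(sensitivity(y_true, y_pred))
```

Higher accuracy and sensitivity are better, whereas a lower Brier score indicates better-calibrated probability estimates.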

Wazir Bushra, Khan Dost Muhammad, Khalil Umair, Hamraz Muhammad, Gul Naz, Khan Zardad


Gene selection, classification, random forest, cancer, microarray gene expression

General

Risk Stratification for Early Detection of Diabetes and Hypertension in Resource-Limited Settings: Machine Learning Analysis.

In Journal of medical Internet research ; h5-index 88.0

BACKGROUND : The impending scale-up of noncommunicable disease screening programs in low- and middle-income countries, coupled with limited health resources, requires that such programs be as accurate as possible at identifying patients at high risk.

OBJECTIVE : The aim of this study was to develop machine learning-based risk stratification algorithms for diabetes and hypertension that are tailored for the at-risk population served by community-based screening programs in low-resource settings.

METHODS : We trained and tested our models by using data from 2278 patients collected by community health workers through door-to-door and camp-based screenings in the urban slums of Hyderabad, India between July 14, 2015 and April 21, 2018. We determined the best models for predicting short-term (2-month) risk of diabetes and hypertension (a model for diabetes and a model for hypertension) and compared these models to previously developed risk scores from the United States and the United Kingdom by using prediction accuracy as characterized by the area under the receiver operating characteristic curve (AUC) and the number of false negatives.

RESULTS : We found that models based on random forest had the highest prediction accuracy for both diseases and were able to outperform the US and UK risk scores in terms of AUC by 35.5% for diabetes (improvement of 0.239 from 0.671 to 0.910) and 13.5% for hypertension (improvement of 0.094 from 0.698 to 0.792). For a fixed screening specificity of 0.9, the random forest model was able to reduce the expected number of false negatives by 620 patients per 1000 screenings for diabetes and 220 patients per 1000 screenings for hypertension. This improvement reduces the cost of incorrect risk stratification by US $1.99 (or 35%) per screening for diabetes and US $1.60 (or 21%) per screening for hypertension.

CONCLUSIONS : In the next decade, health systems in many countries are planning to spend significant resources on noncommunicable disease screening programs and our study demonstrates that machine learning models can be leveraged by these programs to effectively utilize limited resources by improving risk stratification.
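The comparison "for a fixed screening specificity of 0.9" can be illustrated with a short sketch: choose the score threshold at which at least 90% of true negatives fall below it, then count the positives that are missed. The function names and the toy risk scores below are hypothetical; the abstract does not describe how the authors implemented this step.

```python
def threshold_at_specificity(scores, labels, target_spec):
    """Smallest threshold with specificity >= target_spec.

    A sample is predicted positive when its score is >= the threshold,
    so true negatives are the label-0 samples scoring below it.
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    for thr in sorted(set(scores)) + [float("inf")]:
        true_neg = sum(1 for s in negatives if s < thr)
        if true_neg / len(negatives) >= target_spec:
            return thr


def false_negatives(scores, labels, thr):
    """Number of true positives whose score falls below the threshold."""
    return sum(1 for s, y in zip(scores, labels) if y == 1 and s < thr)


# Toy risk scores, invented for illustration (10 negatives, 4 positives).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.1, 0.2, 0.3, 0.2, 0.7,
          0.6, 0.8, 0.35, 0.9]
labels = [0] * 10 + [1] * 4

thr = threshold_at_specificity(scores, labels, 0.9)
print(thr, false_negatives(scores, labels, thr))
```

Running the same procedure on two competing risk scores and comparing the false-negative counts gives the kind of per-1000-screenings difference reported in the results.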

Boutilier Justin J, Chan Timothy C Y, Ranjan Manish, Deo Sarang


diabetes, global health, hypertension, machine learning, screening

Oncology

Adjuvant and Neoadjuvant Treatment of Triple-Negative Breast Cancer With Chemotherapy.

In Cancer journal (Sudbury, Mass.)

Triple-negative breast cancer (TNBC) accounts for 15% to 20% of all invasive breast carcinomas and is defined by the lack of estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2. Although TNBC is characterized by high rates of disease recurrence and worse survival, it is significantly more sensitive to chemotherapy as compared with other breast cancer subtypes. Accordingly, despite great efforts in the genomic characterization of TNBC, chemotherapy still represents the cornerstone of treatment. For the majority of patients with early-stage TNBC, sequential anthracycline- and taxane-based neoadjuvant chemotherapy (NACT) represents the standard therapeutic approach, with pathological complete response that strongly correlates with long-term survival outcomes. However, some issues about the optimal neoadjuvant regimen, as well as the effective role of chemotherapy in patients with residual disease after NACT, are still debated. Herein, we will review the current evidence that guides the use of (neo)adjuvant chemotherapy in patients with early-stage TNBC. Furthermore, we will discuss current controversies, including the incorporation of platinum compounds into the neoadjuvant backbone and the optimal treatment for patients with residual disease after NACT. Lastly, we will outline potential future directions that can guide treatment escalation and de-escalation, as well as the development of new therapies. In our view, the application of multi-omics technologies, liquid biopsy assays, and machine learning algorithms is strongly warranted to pave the way toward personalized anticancer treatment for early-stage TNBC.

Marra Antonio, Curigliano Giuseppe

General

ADOPT: automatic deep learning and optimization-based approach for detection of novel coronavirus COVID-19 disease using X-ray images.

In Journal of biomolecular structure & dynamics

In the hospital, because of the rise in cases daily, there are a small number of COVID-19 test kits available. For this purpose, a rapid alternative diagnostic choice to prevent COVID-19 spread among individuals must be implemented as an automatic detection method. In this article, the multi-objective optimization and deep learning-based technique for identifying infected patients with coronavirus using X-rays is proposed. J48 decision tree approach classifies the deep feature of corona affected X-ray images for the efficient detection of infected patients. In this study, 11 different convolutional neural network-based (CNN) models (AlexNet, VGG16, VGG19, GoogleNet, ResNet18, ResNet50, ResNet101, InceptionV3, InceptionResNetV2, DenseNet201 and XceptionNet) are developed for detection of infected patients with coronavirus pneumonia using X-ray images. The efficiency of the proposed model is tested using k-fold cross-validation method. Moreover, the parameters of CNN deep learning model are tuned using multi-objective spotted hyena optimizer (MOSHO). Extensive analysis shows that the proposed model can classify the X-ray images at a good accuracy, precision, recall, specificity and F1-score rates. Extensive experimental results reveal that the proposed model outperforms competitive models in terms of well-known performance metrics. Hence, the proposed model is useful for real-time COVID-19 disease classification from X-ray chest images. Communicated by Ramaswamy H. Sarma.

Dhiman Gaurav, Chang Victor, Kant Singh Krishna, Shankar Achyut


CNN, COVID-19, Coronavirus, J48, MOSHO, deep learning, optimization

General

On the usage of average Hausdorff distance for segmentation performance assessment: hidden error when used for ranking.

In European radiology experimental

Average Hausdorff distance is a widely used performance measure to calculate the distance between two point sets. In medical image segmentation, it is used to compare ground truth images with segmentations allowing their ranking. We identified, however, ranking errors of average Hausdorff distance making it less suitable for applications in segmentation performance assessment. To mitigate this error, we present a modified calculation of this performance measure that we have coined "balanced average Hausdorff distance". To simulate segmentations for ranking, we manually created non-overlapping segmentation errors common in magnetic resonance angiography cerebral vessel segmentation as our use-case. Adding the created errors consecutively and randomly to the ground truth, we created sets of simulated segmentations with increasing number of errors. Each set of simulated segmentations was ranked using both performance measures. We calculated the Kendall rank correlation coefficient between the segmentation ranking and the number of errors in each simulated segmentation. The rankings produced by balanced average Hausdorff distance had a significantly higher median correlation (1.00) than those by average Hausdorff distance (0.89). In 200 total rankings, the former misranked 52 whilst the latter misranked 179 segmentations. Balanced average Hausdorff distance is more suitable for rankings and quality assessment of segmentations than average Hausdorff distance.
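The abstract gives no formulas, so the following is only a sketch under stated assumptions. It assumes the average Hausdorff distance is the mean of the two directed average distances, each normalized by the size of its own point set, and that the balanced variant normalizes both directed sums by the number of ground-truth points. The function names and the toy point sets are hypothetical.

```python
import math


def directed_sum(a, b):
    """Sum, over points in a, of the distance to the nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a)


def average_hausdorff(gt, seg):
    # ASSUMED definition: each directed sum is divided by its own set size.
    return (directed_sum(gt, seg) / len(gt)
            + directed_sum(seg, gt) / len(seg)) / 2


def balanced_average_hausdorff(gt, seg):
    # ASSUMED definition: both directed sums are divided by the
    # ground-truth size, so extra segmented points cannot dilute the error.
    return (directed_sum(gt, seg) + directed_sum(seg, gt)) / (2 * len(gt))


# Toy 2D point sets, invented for illustration.
gt = [(0, 0), (1, 0), (2, 0), (3, 0)]
seg_one_error = gt + [(0, 5)]                      # one false-positive point
seg_four_errors = gt + [(0, 5), (1, 5), (2, 5), (3, 5)]

for seg in (seg_one_error, seg_four_errors):
    print(average_hausdorff(gt, seg), balanced_average_hausdorff(gt, seg))
```

Under these assumptions, each added false-positive point also enlarges the denominator of the unbalanced measure, which is one way a segmentation with more errors can receive a smaller penalty per error than expected; the balanced form keeps the denominator fixed.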

Aydin Orhun Utku, Taha Abdel Aziz, Hilbert Adam, Khalil Ahmed A, Galinovic Ivana, Fiebach Jochen B, Frey Dietmar, Madai Vince Istvan


Average Hausdorff distance, Cerebral angiography, Cerebral arteries, Image processing (computer-assisted)

General

Outcome prediction in aneurysmal subarachnoid hemorrhage: a comparison of machine learning methods and established clinico-radiological scores.

In Neurosurgical review ; h5-index 27.0

Reliable prediction of outcomes of aneurysmal subarachnoid hemorrhage (aSAH) based on factors available at patient admission may support responsible allocation of resources as well as treatment decisions. Radiographic and clinical scoring systems may help clinicians estimate disease severity, but their predictive value is limited, especially in devising treatment strategies. In this study, we aimed to examine whether a machine learning (ML) approach using variables available on admission may improve outcome prediction in aSAH compared to established scoring systems. Combined clinical and radiographic features as well as standard scores (Hunt & Hess, WFNS, BNI, Fisher, and VASOGRADE) available on patient admission were analyzed using a consecutive single-center database of patients that presented with aSAH (n = 388). Different ML models (seven algorithms including three types of traditional generalized linear models, as well as a tree boosting algorithm, a support vector machine classifier (SVMC), a Naive Bayes (NB) classifier, and a multilayer perceptron (MLP) artificial neural net) were trained for single features, scores, and combined features with a random split into training and test sets (4:1 ratio), ten-fold cross-validation, and 50 shuffles. For combined features, feature importance was calculated. There was no difference in performance between traditional and other ML applications using traditional clinico-radiographic features. Also, no relevant difference was identified between a combined set of clinico-radiological features available on admission (highest AUC 0.78, tree boosting) and the best performing clinical score GCS (highest AUC 0.76, tree boosting). GCS and age were the most important variables for the feature combination. In this cohort of patients with aSAH, the performance of functional outcome prediction by machine learning techniques was comparable to traditional methods and established clinical scores. Future work is necessary to examine input variables other than traditional clinico-radiographic features and to evaluate whether a higher performance for outcome prediction in aSAH can be achieved.
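The evaluation scheme described above (random 4:1 split into training and test sets, repeated over 50 shuffles) can be sketched as index generation. This is an illustration only: the function name, the seed, and the rounding of the test-set size are assumptions, and the ten-fold cross-validation step, whose exact placement the abstract does not specify, is left out.

```python
import random


def shuffled_splits(n_samples, n_shuffles=50, test_fraction=0.2, seed=0):
    """Yield (train_indices, test_indices) for repeated random 4:1 splits."""
    rng = random.Random(seed)          # assumed fixed seed for repeatability
    indices = list(range(n_samples))
    n_test = round(n_samples * test_fraction)
    for _ in range(n_shuffles):
        rng.shuffle(indices)
        # Slicing copies the lists, so earlier splits are not overwritten.
        yield indices[n_test:], indices[:n_test]


# n = 388 is the cohort size reported in the abstract.
splits = list(shuffled_splits(388))
train_idx, test_idx = splits[0]
print(len(splits), len(train_idx), len(test_idx))
```

A model would be fitted on each training set and scored on the matching test set, and the resulting 50 scores summarized, for example by their median.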

Dengler Nora Franziska, Madai Vince Istvan, Unteroberdörster Meike, Zihni Esra, Brune Sophie Charlotte, Hilbert Adam, Livne Michelle, Wolf Stefan, Vajkoczy Peter, Frey Dietmar


Aneurysmal subarachnoid hemorrhage, Artificial neural net, Deep learning, Outcome prediction, Tree boosting