Machine learning techniques are increasingly used in the analysis of high throughput genome sequencing data to better understand the disease process and design of therapeutic modalities. In the current study, we have applied state of the art machine learning (ML) algorithms (Random Forest (RF), Support Vector Machine Radial Kernel (svmR), Adaptive Boost (AdaBoost), averaged Neural Network (avNNet), and Gradient Boosting Machine (GBM)) to stratify the HNSCC patients in early and late clinical stages (TNM) and to predict the risk using miRNAs expression profiles. A six miRNA signature was identified that can stratify patients in the early and late stages. The mean accuracy, sensitivity, specificity, and area under the curve (AUC) was found to be 0.84, 0.87, 0.78, and 0.82, respectively indicating the robust performance of the generated model. The prognostic signature of eight miRNAs was identified using LASSO (least absolute shrinkage and selection operator) penalized regression. These miRNAs were found to be significantly associated with overall survival of the patients. The pathway and functional enrichment analysis of the identified biomarkers revealed their involvement in important cancer pathways such as GP6 signalling, Wnt signalling, p53 signalling, granulocyte adhesion, and dipedesis. To the best of our knowledge, this is the first such study and we hope that these signature miRNAs will be useful for the risk stratification of patients and the design of therapeutic modalities.
Kumar Sugandh, Patnaik Srinivas, Dixit Anshuman
Biomarker, Head and neck cancer, Machine learning, TNM stage, mRNA, microRNA