In Computers in Biology and Medicine
Diagnosis of Parkinson's disease (PD) remains a challenge in clinical practice, largely because of the lack of peripheral blood markers. Transcriptomic analysis of blood samples has emerged as a potential means of identifying biomarkers and gene signatures of PD. In this context, classification algorithms can help detect data patterns, such as phenotypes and transcriptional signatures, with potential diagnostic application. In this study, we performed a gene expression meta-analysis of the blood transcriptome of PD patients and healthy controls to identify a gene set capable of predicting PD using classification algorithms. We examined microarray data from public repositories and, after a systematic review, selected 4 independent cohorts (GSE6613, GSE57475, GSE72267 and GSE99039) comprising 711 samples (388 idiopathic PD and 323 healthy individuals). Initially, analysis of differentially expressed genes yielded minimal overlap among datasets. To circumvent this, we carried out a meta-analysis of 17,712 genes across datasets and calculated weighted mean Hedges' g effect sizes. From the top 100 positive and negative gene effect sizes, collinearity recognition and recursive feature elimination algorithms were used to generate a 59-gene signature of idiopathic PD. This signature was evaluated with 9 classification algorithms and 4 sample size-adjusted training groups, yielding 36 models. Of these, 33 showed accuracy higher than the no-information rate, and 2 models built on Support Vector Machine Regression achieved the best accuracy in predicting PD and healthy control samples. In summary, the gene meta-analysis and machine learning methodology employed herein identified a gene set capable of accurately predicting idiopathic PD in blood samples.
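The effect-size step can be illustrated with a short sketch. This is not the authors' pipeline: it assumes the standard bias-corrected Hedges' g with its usual large-sample variance, and a fixed-effect inverse-variance weighted mean across cohorts (the study may have used different weights, for example a random-effects model). The function names `hedges_g` and `weighted_mean_g` are hypothetical, and the sketch handles a single gene, with each study supplying that gene's expression values for cases and controls.

```python
import math
from statistics import mean, variance


def hedges_g(cases, controls):
    """Bias-corrected standardized mean difference (Hedges' g) and its variance."""
    n1, n2 = len(cases), len(controls)
    # Pooled standard deviation from sample (n - 1) variances.
    s_pooled = math.sqrt(
        ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / (n1 + n2 - 2)
    )
    d = (mean(cases) - mean(controls)) / s_pooled
    # Small-sample correction factor J applied to Cohen's d.
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    g = j * d
    # Approximate sampling variance of d, scaled by J^2 for g.
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return g, j ** 2 * var_d


def weighted_mean_g(studies):
    """Pooled g for one gene across studies.

    Assumption: fixed-effect inverse-variance weights (w_i = 1 / var_i).
    `studies` is a list of (cases, controls) expression-value pairs.
    """
    effects = [hedges_g(cases, controls) for cases, controls in studies]
    weights = [1 / v for _, v in effects]
    g_bar = sum(w * g for w, (g, _) in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))  # standard error of the pooled g
    return g_bar, se
```

For example, with toy values (not data from the cohorts), `hedges_g([1, 2, 3], [0, 1, 2])` gives g ≈ 0.8 with variance ≈ 0.48. Repeating the pooled calculation for every gene and ranking genes by the pooled g would give the positive and negative extremes from which a signature can then be selected.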
Falchetti Marcelo, Prediger Rui Daniel, Zanotto-Filho Alfeu
Blood biopsy, Classification algorithms, Effect size, Microarray, Molecular signature