In BMC Psychiatry
BACKGROUND : Machine learning (ML) algorithms and methods offer powerful tools for analyzing large, complex genomic datasets. Our goal was to compare the genomic architecture of schizophrenia (SCZ) and autism spectrum disorder (ASD) using ML.
METHODS : In this paper, we used regularized gradient boosted machines to analyze whole-exome sequencing (WES) data from individuals with SCZ or ASD in order to identify the genetic features that best distinguish the two disorders. We further demonstrated a gene clustering method that highlights which subsets of the genes identified by the ML algorithm are mutated concurrently in affected individuals and are central to each disease (i.e., ASD vs. SCZ "hub" genes).
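As an illustration of what "regularized gradient boosting" means for case/case classification, the sketch below boosts depth-1 trees on the logistic loss using second-order (Newton) leaf weights with an L2 penalty `lam`, in the style of XGBoost. It is a minimal sketch under stated assumptions, not the authors' pipeline: the encoding of each gene as a binary "carries a qualifying variant" indicator, the use of stumps instead of deeper trees, the hyperparameters, and the toy data are all hypothetical, and population-structure correction is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, g, h, lam):
    """Pick the binary feature whose 0/1 split gives the largest regularized gain.

    Leaf weights use the Newton step w = -G / (H + lam), where lam is an
    (assumed) L2 penalty on leaf weights that shrinks them toward zero.
    """
    best_score, best_j, best_w = -1.0, None, None
    for j in range(len(X[0])):
        G, H = [0.0, 0.0], [0.0, 0.0]  # gradient/hessian sums for leaves x=0 and x=1
        for xi, gi, hi in zip(X, g, h):
            G[xi[j]] += gi
            H[xi[j]] += hi
        score = sum(G[k] ** 2 / (H[k] + lam) for k in (0, 1))
        if score > best_score:
            best_score, best_j = score, j
            best_w = (-G[0] / (H[0] + lam), -G[1] / (H[1] + lam))
    return best_j, best_w

def fit_gbm(X, y, n_rounds=20, eta=0.3, lam=1.0):
    """Boost stumps on the log loss; returns a list of (feature, (w0, w1))."""
    margin = [0.0] * len(X)
    model = []
    for _ in range(n_rounds):
        p = [sigmoid(m) for m in margin]
        g = [pi - yi for pi, yi in zip(p, y)]   # first derivative of log loss
        h = [pi * (1.0 - pi) for pi in p]       # second derivative of log loss
        j, (w0, w1) = fit_stump(X, g, h, lam)
        w = (eta * w0, eta * w1)                # shrinkage by the learning rate
        model.append((j, w))
        margin = [m + w[xi[j]] for m, xi in zip(margin, X)]
    return model

def predict_proba(model, x):
    return sigmoid(sum(w[x[j]] for j, w in model))

# Hypothetical toy data (not from the study): 3 genes per individual,
# label 1 = SCZ, 0 = ASD.
X = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [0, 1, 0], [0, 1, 1], [0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]
model = fit_gbm(X, y)
preds = [int(predict_proba(model, x) >= 0.5) for x in X]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(accuracy)  # 1.0 on this linearly separable toy set
```

On real WES data, accuracy would of course be measured on a held-out testing set rather than on the training rows, and the features selected most often by the trees are the candidates for "important distinguishing genetic features".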
RESULTS : After correcting for population structure, we found that SCZ and ASD cases could be successfully separated based on genetic information, with 86-88% accuracy on the testing dataset. Through bioinformatic analysis, we explored whether combinations of genes concurrently mutated in patients with the same condition ("hub" genes) belong to specific pathways. Several themes were associated with ASD, including calcium ion transmembrane transport, immune system/inflammation, synapse organization, and retinoid metabolic process. For SCZ, ion transmembrane transport, neurotransmitter transport, and microtubule/cytoskeleton processes were highlighted.
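One simple way to make "genes concurrently mutated in patients with the same condition" concrete is a co-mutation graph: genes are nodes, and two genes are linked when at least `min_support` affected individuals carry mutations in both. The sketch below then reports the highest-degree genes as hubs and the connected components as gene groups. This is a hypothetical simplification for illustration only, not the clustering method used in the paper; the threshold, the degree-based hub definition, and the placeholder gene names are all assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations

def co_mutation_counts(patients):
    """Count, for each gene pair, how many patients carry mutations in both."""
    pairs = Counter()
    for genes in patients:
        for a, b in combinations(sorted(genes), 2):
            pairs[(a, b)] += 1
    return pairs

def co_mutation_graph(patients, min_support=2):
    """Adjacency sets linking gene pairs co-mutated in >= min_support patients."""
    adj = defaultdict(set)
    for (a, b), n in co_mutation_counts(patients).items():
        if n >= min_support:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def hub_genes(adj, top_k=3):
    """Genes with the most co-mutation partners (ties broken by name)."""
    return sorted(adj, key=lambda gene: (-len(adj[gene]), gene))[:top_k]

def gene_modules(adj):
    """Connected components of the thresholded co-mutation graph."""
    seen, modules = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        modules.append(sorted(comp))
    return modules

# Hypothetical cases of one condition: each set lists the mutated genes
# (restricted to genes selected by the classifier) in one individual.
cases = [
    {"GENE_A", "GENE_B"}, {"GENE_A", "GENE_B"},
    {"GENE_A", "GENE_C"}, {"GENE_A", "GENE_C"},
    {"GENE_A", "GENE_D"}, {"GENE_A", "GENE_D"},
    {"GENE_B", "GENE_E"},
]
adj = co_mutation_graph(cases, min_support=2)
print(hub_genes(adj, top_k=1))  # ['GENE_A']
print(gene_modules(adj))        # [['GENE_A', 'GENE_B', 'GENE_C', 'GENE_D']]
```

Running the same procedure separately on ASD and SCZ cases would yield condition-specific hubs and gene groups, which could then be tested for pathway enrichment.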
CONCLUSIONS : Our manuscript introduces a novel comparative approach for studying the genetic architecture of genetically related diseases with complex inheritance and highlights genetic similarities and differences between ASD and SCZ.
Sardaar Sameer, Qi Bill, Dionne-Laporte Alexandre, Rouleau Guy A, Rabbany Reihaneh, Trakadis Yannis J
Autism spectrum disorder, Genomic, Machine learning, Schizophrenia, Unsupervised clustering