In Genetic Epidemiology
Gene-gene interaction (G × G) is thought to fill the gap between the estimated heritability of complex diseases and the limited proportion of that heritability explained by identified single-nucleotide polymorphisms. Current tools for exploring G × G were mostly developed for case-control designs, with less consideration given to their application in families. Family-based studies are robust to bias arising from population stratification in genetic studies and are helpful in understanding G × G. We proposed new algorithms, epistasis sparse factor analysis (EPISFA) and epistasis sparse factor analysis for linkage disequilibrium (EPISFA-LD), based on unsupervised machine learning, to screen for G × G. Extensive simulations were performed to compare EPISFA/EPISFA-LD with a classical family-based algorithm, family-based multifactor dimensionality reduction (FAM-MDR). The results showed that EPISFA/EPISFA-LD combines high power with computational efficiency, can be applied in family designs, and is applicable to high-dimensional datasets. Finally, we applied EPISFA/EPISFA-LD to a real dataset drawn from the Fangshan/family-based Ischemic Stroke Study in China. Five G × G pairs were discovered by EPISFA/EPISFA-LD: three were also verified by other algorithms (FAM-MDR and logistic regression), and two were identified by EPISFA/EPISFA-LD only. The results from EPISFA might offer new insights into the genetic etiology of complex diseases. EPISFA/EPISFA-LD was implemented in R. All relevant source code and simulated data can be freely downloaded from https://github.com/doublexism/episfa.
Xiang Xiao, Wang Siyue, Liu Tianyi, Wang Mengying, Li Jiawen, Jiang Jin, Wu Tao, Hu Yonghua
family designs, gene-gene interaction, ischemic stroke, sparse factor analysis, unsupervised machine learning