In Twin research and human genetics : the official journal of the International Society for Twin Studies
Accurate zygosity determination is a fundamental step in twin research. Although DNA-based testing is the gold standard for determining zygosity, collecting biological samples is not feasible in all research settings or all families. Previous work has demonstrated the feasibility of zygosity estimation based on questionnaire (physical similarity) data in older twins, but the extent to which this is also a reliable approach in infancy is less well established. Here, we report the accuracy of different questionnaire-based zygosity determination approaches (traditional and machine learning) in 5.5 month-old twins. The participant cohort comprised 284 infant twin pairs (128 dizygotic and 156 monozygotic) who participated in the Babytwins Study Sweden (BATSS). Manual scoring based on an established technique validated in older twins accurately predicted 90.49% of the zygosities with a sensitivity of 91.65% and specificity of 89.06%. The machine learning approach improved the prediction accuracy to 93.10%, with a sensitivity of 91.30% and specificity of 94.29%. Additionally, we quantified the systematic impact of zygosity misclassification on estimates of genetic and environmental influences using simulation-based sensitivity analysis on a separate data set to show the implication of our machine learning accuracy gain. In conclusion, our study demonstrates the feasibility of determining zygosity in very young infant twins using a questionnaire with four items and builds a scalable machine learning model with better metrics, thus a viable alternative to DNA tests in large-scale infant twin studies.
Hardiansyah Irzam, Hamrefors Linnea, Siqueiros Monica, Falck-Ytter Terje, Tammimies Kristiina
ACE model, Zygosity prediction, bootstrapping, computational statistics, machine learning, twins questionnaire method