In Psychological Methods
Determining the number of factors is one of the most crucial decisions a researcher has to face when conducting an exploratory factor analysis. As no common factor retention criterion can be seen as generally superior, a new approach is proposed that combines extensive data simulation with state-of-the-art machine learning algorithms. First, data were simulated under a broad range of realistic conditions, and 3 algorithms were trained using specially designed features based on the correlation matrices of the simulated data sets. Subsequently, the new approach was compared with 4 common factor retention criteria in a large-scale simulation experiment with regard to its accuracy in determining the correct number of factors. Sample size, variables per factor, correlations between factors, primary and cross-loadings, and the correct number of factors were varied to gain comprehensive knowledge of the efficiency of the new method. A gradient boosting model outperformed all other criteria, so in a second step, we improved this model by tuning several hyperparameters of the algorithm and using common retention criteria as additional features. This model reached an out-of-sample accuracy of 99.3% (the pretrained model can be obtained from https://osf.io/mvrau/). A great advantage of this approach is that both the data basis (e.g., using ordinal data) and the set of features can be continuously extended to improve predictive performance and increase generalizability. (PsycINFO Database Record (c) 2020 APA, all rights reserved).
Goretzko David, Bühner Markus
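
The first step described in the abstract (simulate data with a known number of factors, then derive features from each correlation matrix) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the function names, the simplified simulation design (orthogonal factors, equal primary loadings, no cross-loadings), and the handful of features (for example, the count of eigenvalues above 1). The authors' actual feature set and simulation conditions are not specified in the abstract. The sketch uses only the Python standard library, so eigenvalues are computed with a small Jacobi rotation routine.

```python
import math
import random


def simulate_factor_data(n_obs, n_factors, vars_per_factor, loading, rng):
    """Draw data from a simple orthogonal factor model (illustrative design only)."""
    unique_sd = math.sqrt(1.0 - loading ** 2)  # keeps each variable at unit variance
    rows = []
    for _ in range(n_obs):
        row = []
        for _ in range(n_factors):
            factor_score = rng.gauss(0.0, 1.0)
            for _ in range(vars_per_factor):
                row.append(loading * factor_score + unique_sd * rng.gauss(0.0, 1.0))
        rows.append(row)
    return rows


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    norms = [math.sqrt(sum(c[j] ** 2 for c in centered)) for j in range(p)]
    corr = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            value = sum(c[i] * c[j] for c in centered) / (norms[i] * norms[j])
            corr[i][j] = corr[j][i] = value
    return corr


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, descending."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Angle that zeroes a[i][j], folded into [-pi/4, pi/4].
                theta = 0.5 * math.atan2(2.0 * a[i][j], a[j][j] - a[i][i])
                if theta > math.pi / 4:
                    theta -= math.pi / 2
                elif theta < -math.pi / 4:
                    theta += math.pi / 2
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns i and j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                for k in range(p):  # rotate rows i and j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def correlation_features(corr, n_obs):
    """A few hypothetical features derived from a correlation matrix."""
    p = len(corr)
    eigenvalues = symmetric_eigenvalues(corr)
    off_diag = [abs(corr[i][j]) for i in range(p) for j in range(i + 1, p)]
    return {
        "n_obs": n_obs,
        "n_vars": p,
        "n_eigen_gt_1": sum(1 for e in eigenvalues if e > 1.0),
        "first_eigen_share": eigenvalues[0] / p,
        "mean_abs_corr": sum(off_diag) / len(off_diag),
    }


def build_training_set(n_datasets, seed=0):
    """Simulate labelled examples: features plus the true number of factors."""
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n_datasets):
        n_factors = rng.randint(1, 4)
        n_obs = rng.choice([250, 500])
        rows = simulate_factor_data(n_obs, n_factors, 4, rng.uniform(0.6, 0.8), rng)
        features.append(correlation_features(correlation_matrix(rows), n_obs))
        labels.append(n_factors)
    return features, labels
```

The feature dictionaries and labels returned by `build_training_set` could then be passed to any classifier, such as a gradient boosting model. That training step is omitted here because it depends on the chosen library.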