In PloS one ; h5-index 176.0
RESEARCH MOTIVATION : The digital divide affecting elderly individuals has recently been intensifying. Moreover, because the level of digital technology use varies from person to person, a digital divide may exist even within the elderly population. Given the accelerating digital transformation of society, elderly individuals are highly likely to be experiencing many difficulties in daily life, and these difficulties need to be addressed and managed promptly.
RESEARCH OBJECTIVE : This study aims to predict the digital divide in the elderly population and provide essential insights into managing it. To this end, predictive analysis is performed using public data and machine learning techniques.
METHODS AND MATERIALS : This study used data from the '2020 Report on Digital Information Divide Survey' published by the Korea National Information Society Agency. Various independent variables were considered in establishing the prediction model. Ten variables with high importance for predicting the digital divide were identified and used as the key independent variables to make the model easier to analyze. The data were divided into 70% for training and 30% for testing; the model was trained on the training set, and its predictive accuracy was evaluated on the test set. Prediction accuracy was compared across logistic regression (LR), support vector machine (SVM), K-nearest neighbor (KNN), decision tree (DT), and eXtreme gradient boosting (XGBoost) models. A convolutional neural network (CNN) was used to further improve the accuracy. In addition, variable importance was analyzed using data from 2019, before the COVID-19 outbreak, and the results were compared with those from 2020.
RESULTS : In the 2020 data, the variables with high importance for predicting the digital divide of elderly individuals were the demographic perspective, internet usage perspective, self-efficacy perspective, and social connectedness perspective. In the 2019 data, these variables, as well as the social support perspective, were highly important. The highest prediction accuracy was achieved by the CNN-based model (accuracy: 80.4%), followed by the XGBoost model (accuracy: 79%) and the LR model (accuracy: 78.3%). The DT model had the lowest accuracy (72.6%).
DISCUSSION : The results suggest that, as digital transformation accelerates in various fields, support that strengthens the practical connection of elderly individuals through digital devices is becoming more critical than ever. In addition, to obtain higher prediction accuracy, classification algorithms from various academic fields should be used comprehensively when constructing a classification model.
CONCLUSION : The academic significance of this study is that the CNN, which is often employed in image and video processing, was extended to a social science field and applied to structured data to improve the accuracy of the prediction model. The practical significance is twofold. First, the prediction models and analytical methodologies proposed in this article can be applied to classify elderly people affected by the digital divide, and the trained models can be used to identify people in younger generations who may be affected by it. Second, for managing individuals affected by a digital divide, the self-efficacy perspective on acquiring and using ICTs and the social connectedness perspective are suggested in addition to the demographic perspective and the internet usage perspective.
Park Jung Ryeol, Feng Yituo