In IEEE access : practical innovations, open solutions
Newspapers are very important for a society as they inform citizens about the events around them and how they can impact their life. Their importance becomes more crucial and indispensable in the times of health crisis such as the current COVID-19 pandemic. Since the starting of this pandemic newspapers are providing rich information to the public about various issues such as the discovery of a new strain of coronavirus, lockdown and other restrictions, government policies, and information related to the vaccine development for the same. In this scenario, analysis of emergent and widely reported topics/themes/issues and associated sentiments from various countries can help us better understand the COVID-19 pandemic. In our research, the database of more than 100,000 COVID-19 news headlines and articles were analyzed using top2vec (for topic modeling) and RoBERTa (for sentiment classification and analysis). Our topic modeling results highlighted that education, economy, US, and sports are some of the most common and widely reported themes across UK, India, Japan, South Korea. Further, our sentiment classification model achieved 90% validation accuracy and the analysis showed that the worst affected country, i.e. the UK (in our dataset) also has the highest percentage of negative sentiment.
Ghasiya Piyush, Okamura Koji
COVID-19, RoBERTa, Top2Vec, machine learning, natural language processing, newspaper, sentiment analysis, topic modeling