In Social network analysis and mining
The wide popularity of Twitter as a medium of exchanging activities, entertainment, and information is attracted spammers to discover it as a stage to spam clients and spread misinformation. It poses the challenge to the researchers to identify malicious content and user profiles over Twitter such that timely action can be taken. Many previous works have used different strategies to overcome this challenge and combat spammer activities on Twitter. In this work, we develop various models that utilize different features such as profile-based features, content-based features, and hybrid features to identify malicious content and classify it as spam or not-spam. In the first step, we collect and label a large dataset from Twitter to create a spam detection corpus. Then, we create a set of rich features by extracting various features from the collected dataset. Further, we apply different machine learning, ensemble, and deep learning techniques to build the prediction models. We performed a comprehensive evaluation of different techniques over the collected dataset and assessed the performance for accuracy, precision, recall, and f1-score measures. The results showed that the used different sets of learning techniques have achieved a higher performance for the tweet spam classification. In most cases, the values are above 90% for different performance measures. These results show that using profile, content, user, and hybrid features for suspicious tweets detection helps build better prediction models.
Gangwar Surendra Singh, Rathore Santosh Singh, Chouhan Satyendra Singh, Soni Sanskar
Machine learning techniques, Natural language processing, Social network, Suspicious content detection, User-content features