Sentiment Analysis via Semi-Supervised Learning: A Model based on Dynamic Threshold and Multi-classifiers

Sentiment analysis has become a very popular research topic, especially for retrieving valuable information from online environments. Most existing sentiment studies are based on supervised learning, which requires a sufficient amount of labeled data. In practice, however, sentiment analysis often faces a shortage of labeled data, because labeling large amounts of data is expensive and time-consuming. To handle the scenario of insufficient initial labeled data, we propose a novel semi-supervised model based on a dynamic threshold and multiple classifiers. In particular, the training data are auto-labeled iteratively by the proposed dynamic threshold algorithm, in which a dynamic threshold function sets the thresholds used to select auto-labeled data. The function considers both the quality and the quantity of the auto-labeled data. In addition, the proposed weighted voting strategy combines multiple support vector machine classifiers by taking into account the performance gap among the classifiers. The performance of the proposed model is validated through experiments on real datasets. Compared with two existing models, the proposed model achieves the highest sentiment analysis accuracy across datasets with different sizes of initial labeled training data.
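The iterative auto-labeling idea can be illustrated with a minimal sketch. The abstract does not give the actual dynamic threshold function or the voting weights, so everything concrete here is an assumption made for illustration: the linearly decaying `dynamic_threshold` schedule, its `start`, `floor`, and `decay` parameters, the use of plain weights (for example, validation accuracies) in `weighted_vote`, and the representation of each classifier as a callable that returns the probability of the positive class. In the paper the classifiers are support vector machines; any model that yields a confidence score could be plugged in.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a "classifier" is any callable mapping a text sample to
# P(positive) in [0, 1]. The paper uses SVMs; this sketch stays model-agnostic.
Classifier = Callable[[str], float]


def dynamic_threshold(iteration: int, start: float = 0.95,
                      floor: float = 0.70, decay: float = 0.05) -> float:
    """Hypothetical schedule, not the paper's function.

    Starts strict (favoring label quality) and relaxes each iteration
    (favoring label quantity), never dropping below `floor`.
    """
    return max(floor, start - decay * iteration)


def weighted_vote(classifiers: Sequence[Classifier],
                  weights: Sequence[float], sample: str) -> float:
    """Combine P(positive) from all classifiers, weighted by e.g. accuracy."""
    total = sum(weights)
    return sum(w * clf(sample) for clf, w in zip(classifiers, weights)) / total


def select_auto_labels(classifiers: Sequence[Classifier],
                       weights: Sequence[float],
                       unlabeled: Sequence[str],
                       iteration: int) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Return (auto-labeled samples, samples left unlabeled) for one round."""
    threshold = dynamic_threshold(iteration)
    selected: List[Tuple[str, int]] = []
    remaining: List[str] = []
    for sample in unlabeled:
        p_pos = weighted_vote(classifiers, weights, sample)
        confidence = max(p_pos, 1.0 - p_pos)  # confidence in the predicted class
        if confidence >= threshold:
            selected.append((sample, 1 if p_pos >= 0.5 else 0))
        else:
            remaining.append(sample)
    return selected, remaining
```

In a full loop, the selected samples would be added to the labeled set, the classifiers retrained, the weights re-estimated, and the iteration counter incremented until no unlabeled data remain or no sample clears the threshold.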