Recent Advances in Computer Science and Communications

ISSN (Print): 2666-2558
ISSN (Online): 2666-2566

Research Article

Sentiment Classification Using Feature Selection Techniques for Text Data Composed of Heterogeneous Sources

Author(s): Vaishali Arya* and Rashmi Agrawal

Volume 15, Issue 2, 2022

Published on: 18 August, 2020

Page: [207 - 214] Pages: 8

DOI: 10.2174/2666255813999200818133555

Abstract

Aims: This study analyzes feature selection techniques for sentiment classification of text data drawn from heterogeneous sources.

Objectives: The objective of this work is to analyze feature selection techniques for text gathered from different sources in order to increase the accuracy of sentiment classification on microblogs.

Methods: Three feature selection techniques, Bag-of-Words (BOW), TF-IDF, and Word2Vec, were applied to find the most suitable technique for heterogeneous datasets.
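
To make the difference between the first two representations concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The tokenizer, the toy sentences, the function names, and the unsmoothed weighting tf * log(N / df) are all assumptions chosen for illustration; the abstract does not state which TF-IDF variant was used. Word2Vec is omitted because it requires a trained embedding model.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, raw count vectors) for a list of documents."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, TF-IDF vectors) using tf * log(N / df).

    This is one common unsmoothed variant, assumed here for illustration.
    """
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    vectors = []
    for vec in counts:
        total = sum(vec) or 1  # guard against empty documents
        vectors.append(
            [(c / total) * math.log(n_docs / df[j]) for j, c in enumerate(vec)]
        )
    return vocab, vectors


# Hypothetical toy corpus, not data from the study.
docs = [
    "the phone is great",
    "the battery is terrible",
    "the screen is great",
]
vocab, bow = bag_of_words(docs)
_, weighted = tf_idf(docs)
```

In this toy corpus, BOW gives "the" the same count as "great" in the first sentence, while the TF-IDF variant above assigns "the" a weight of zero because it occurs in every document. Either set of vectors could then be passed to a classifier such as an SVM.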

Results: TF-IDF outperforms the other two selected feature selection techniques for sentiment classification with an SVM classifier.

Conclusion: Feature selection is an integral part of data preprocessing and is important for machine learning algorithms to achieve good classification accuracy. It is therefore essential to identify the most suitable approach for data from heterogeneous sources. Heterogeneous sources are rich in information and play an important role in developing models for adaptable systems. With this in mind, we compared the three techniques on heterogeneous-source data and found TF-IDF to be the most suitable across the data types examined: balanced or imbalanced, single-source or multiple-source. In all of these cases, TF-IDF was the most promising approach for classifying user sentiment.

Keywords: Feature selection, sentiment classifier, machine learning, heterogeneous source, bag of words, TF-IDF, Word2Vec.

