Persian Slang Text Conversion to Formal and Deep Learning of Persian Short Texts on Social Media for Sentiment Classification

Khazeni, M.; Heydari, M.; Albadvi, A.

doi:10.22061/jecei.2024.10745.731

Journal of Electrical and Computer Engineering Innovations (JECEI)
Article 3, Volume 13, Issue 1, Farvardin 2025, Pages 27-42
Article type: Original Research Paper
DOI: 10.22061/jecei.2024.10745.731
Authors
M. Khazeni; M. Heydari*; A. Albadvi
Department of IT Engineering, Faculty of Industrial and Systems Engineering, Tarbiat Modares University, Tehran, Iran.
Received: 31 Farvardin 1403; Revised: 18 Mordad 1403; Accepted: 24 Mordad 1403 (Solar Hijri calendar)
Abstract
Background and Objectives: The lack of a suitable tool for analyzing conversational Persian text has made many analyses of such text, including sentiment analysis, difficult. This research tries to make these texts easier for machines to understand by providing PSC (Persian Slang Convertor), a tool that converts conversational text into formal text, and to improve sentiment learning on short Persian texts by combining PSC with recent deep learning methods.
Methods: More than 10 million unlabeled texts from various social networks and movie subtitles (as conversational text) and about 10 million news texts (as formal text) were used to train the unsupervised models and to implement the formalization tool. A set of 60,000 Instagram user comments labeled positive, negative, or neutral served as supervised data for training the sentiment classification model for short texts. Methods such as LSTM, CNN, BERT, and ELMo were used, together with training techniques such as learning rate decay, regularization, and dropout. The best accuracy was achieved with LSTM.
Results: The formalization tool converted 57% of the words in the conversational corpus. Combining the formalizer, a FastText model, and a deep LSTM network yielded an accuracy of 81.91% on the test data.
Conclusion: Models were pre-trained on unlabeled data, and in some cases existing pre-trained models such as ParsBERT were used. A model was then implemented to classify the sentiment of short Persian texts using labeled data.
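The abstract reports the share of corpus words that the formalizer converted but does not describe how PSC performs the conversion. The sketch below only illustrates how such a word-level conversion rate could be measured. It assumes a plain lookup table and whitespace tokenization; the table entries, `formalize`, and `conversion_rate` are hypothetical names introduced for illustration and are not taken from PSC.

```python
# Hypothetical colloquial-to-formal lookup table (illustrative entries only,
# not PSC's actual lexicon).
SLANG_TO_FORMAL = {
    "اگه": "اگر",    # "if"
    "یه": "یک",      # "one / a"
    "نون": "نان",    # "bread"
    "خونه": "خانه",  # "house"
    "دیگه": "دیگر",  # "other / anymore"
}


def formalize(text, table):
    """Replace each whitespace-separated token found in `table`.

    Returns the converted text, the number of tokens that changed,
    and the total number of tokens.
    """
    tokens = text.split()
    converted = [table.get(tok, tok) for tok in tokens]
    changed = sum(1 for old, new in zip(tokens, converted) if old != new)
    return " ".join(converted), changed, len(tokens)


def conversion_rate(corpus, table):
    """Fraction of all tokens in `corpus` that the lookup table converted."""
    changed_total = 0
    token_total = 0
    for text in corpus:
        _, changed, count = formalize(text, table)
        changed_total += changed
        token_total += count
    return changed_total / token_total if token_total else 0.0


if __name__ == "__main__":
    corpus = ["اگه یه نون", "خونه خوب است"]
    for line in corpus:
        print(formalize(line, SLANG_TO_FORMAL)[0])
    print(f"converted: {conversion_rate(corpus, SLANG_TO_FORMAL):.0%}")
```

A real converter would also need to handle affixes, spelling variants, and context-dependent forms, which a flat lookup table cannot capture.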
Keywords
Natural Language Processing; Persian Conversational Text; Sentiment Analysis; Deep Learning
