An Effective Ensemble of Deep and Machine Learning Methods for Classifying the Expertise Shape of CQA Users

Nemati, S.

doi:10.22061/jecei.2024.10621.724

Journal of Electrical and Computer Engineering Innovations (JECEI)
Article 10, Volume 12, Issue 2, Mehr 2024, Pages 409-424
Article type: Original Research Paper
DOI: 10.22061/jecei.2024.10621.724
Author
S. Nemati*
Computer Engineering Department, Faculty of Technology and Engineering, Shahrekord University, Shahrekord, Iran.
Received: 02 Bahman 1402; Revised: 21 Farvardin 1403; Accepted: 27 Farvardin 1403
Abstract
Background and Objectives: Community question-answering (CQA) websites have become increasingly popular as platforms for seeking and sharing knowledge. Identifying users with a particular shape of expertise on these websites benefits both companies and individuals. In particular, finding users who have a general understanding of certain areas but lack expertise in other fields is crucial for companies that are planning internship programs. These users, called dash-shaped users, are willing to work for low wages and have the potential to develop quickly into skilled professionals, which minimizes the risk of unsuccessful recruitment. Because of their vast number of users, CQA websites are a valuable resource for finding individuals with various levels of expertise. This study is the first to directly classify CQA users based solely on the textual content of their posts.
Methods: To achieve this objective, we propose an ensemble of advanced deep learning algorithms and traditional machine learning methods for the binary classification of CQA users into two categories: those with dash-shaped expertise and those without. The proposed method uses stacked generalization to fuse the results of the deep and machine learning methods. To evaluate the effectiveness of our approach, we conducted extensive experiments on three large datasets covering the Android, C#, and Java topics, extracted from the Stack Overflow website.
Results: The results on the Stack Overflow datasets demonstrate that our ensemble method not only outperforms the baseline methods, which include seven traditional machine learning models and six deep models, but also achieves higher performance than state-of-the-art deep models, by an average of 10% in accuracy and F1-measure.
Conclusion: The results confirm that users of CQA websites can be classified using only the textual content of their questions. Specifically, by using the contextual content of the questions, the proposed model can detect dash-shaped users precisely. Moreover, the proposed model is not limited to detecting dash-shaped users: it can also classify other shapes of expertise, such as T- and C-shaped users, which are valuable for forming agile software teams. Additionally, our model can be used as a filter for downstream applications such as intern recommendation.
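
The stacked generalization step described under Methods can be illustrated with a minimal, dependency-free sketch. It is not the paper's implementation: the two base learners (a small naive Bayes model and a token-overlap scorer) are simple stand-ins for the deep and machine learning models, the logistic meta-learner is an assumed choice, and the posts, labels, and all names (`NaiveBayes`, `OverlapModel`, `LogisticMeta`, `fit_stack`, `predict`) are hypothetical. The point it shows is the mechanism: base models produce out-of-fold probabilities, and a meta-learner is trained on those probabilities to produce the final dash-shaped / not dash-shaped decision.

```python
import math
from collections import Counter

# Made-up toy posts; label 1 = dash-shaped user, 0 = other (hypothetical data).
POSTS = [
    "how do i start an android activity",
    "what is a java class and how do i use it",
    "profiling jit inlining and escape analysis in hotspot",
    "lock free ring buffer with memory barriers in c#",
    "how do i show a toast message in android",
    "what is the difference between int and string",
    "tuning garbage collector pauses for low latency services",
    "custom classloader isolation for plugin sandboxing",
]
LABELS = [1, 1, 0, 0, 1, 1, 0, 0]

def tokenize(text):
    return text.lower().split()

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

class NaiveBayes:
    """Stand-in base learner: multinomial naive Bayes with add-one smoothing."""
    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter(labels)
        for text, y in zip(texts, labels):
            self.counts[y].update(tokenize(text))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict_proba(self, text):
        """Return the estimated probability of class 1 for one post."""
        totals = {c: sum(self.counts[c].values()) + len(self.vocab) for c in (0, 1)}
        log_odds = math.log((self.docs[1] + 1) / (self.docs[0] + 1))
        for tok in tokenize(text):
            if tok in self.vocab:  # unseen tokens are ignored
                log_odds += math.log((self.counts[1][tok] + 1) / totals[1])
                log_odds -= math.log((self.counts[0][tok] + 1) / totals[0])
        return sigmoid(log_odds)

class OverlapModel:
    """Stand-in base learner: compares token overlap with each class."""
    def fit(self, texts, labels):
        self.seen = {0: set(), 1: set()}
        for text, y in zip(texts, labels):
            self.seen[y].update(tokenize(text))
        return self

    def predict_proba(self, text):
        toks = tokenize(text)
        hits1 = sum(t in self.seen[1] for t in toks)
        hits0 = sum(t in self.seen[0] for t in toks)
        return (hits1 + 1) / (hits1 + hits0 + 2)

class LogisticMeta:
    """Meta-learner: logistic regression trained by stochastic gradient descent."""
    def fit(self, rows, labels, epochs=200, lr=0.5):
        self.w = [0.0] * len(rows[0])
        self.b = 0.0
        for _ in range(epochs):
            for x, y in zip(rows, labels):
                err = self.predict_proba(x) - y
                self.w = [w - lr * err * xi for w, xi in zip(self.w, x)]
                self.b -= lr * err
        return self

    def predict_proba(self, x):
        return sigmoid(self.b + sum(w * xi for w, xi in zip(self.w, x)))

def fit_stack(texts, labels, base_classes, k=2):
    """Train base learners and a meta-learner with k-fold stacking."""
    meta_rows = [None] * len(texts)
    for fold in range(k):
        train = [i for i in range(len(texts)) if i % k != fold]
        held = [i for i in range(len(texts)) if i % k == fold]
        bases = [cls().fit([texts[i] for i in train], [labels[i] for i in train])
                 for cls in base_classes]
        # Out-of-fold predictions: each post is scored by models that never saw it.
        for i in held:
            meta_rows[i] = [b.predict_proba(texts[i]) for b in bases]
    meta = LogisticMeta().fit(meta_rows, labels)
    # Refit the base learners on all data for use at prediction time.
    bases = [cls().fit(texts, labels) for cls in base_classes]
    return bases, meta

def predict(stack, text):
    """Return 1 (dash-shaped) or 0 for a single post."""
    bases, meta = stack
    return int(meta.predict_proba([b.predict_proba(text) for b in bases]) >= 0.5)

stack = fit_stack(POSTS, LABELS, [NaiveBayes, OverlapModel])
print(predict(stack, "how do i create a button in android"))
```

Training the meta-learner on out-of-fold probabilities rather than on in-sample predictions keeps it from simply trusting whichever base model overfits the training posts most. In a real setting each base learner would be replaced by a trained deep or machine learning model that exposes a comparable probability output.
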
Keywords
Shape of Expertise; Deep Learning; Machine Learning; Ensemble Method; Community Question Answering
