Presenting a Model of Data Anonymization in Big Data in the Context of In-Memory Processing Framework

Shamsinejad, E.; Banirostam, T.; Pedram, M. M.; Rahmani, A. M.

doi:10.22061/jecei.2023.9737.651

Journal of Electrical and Computer Engineering Innovations (JECEI)
Article 6, Volume 12, Issue 1, Farvardin 2024, Pages 79-98
Article type: Original Research Paper
Authors
E. Shamsinejad¹; T. Banirostam¹*; M. M. Pedram²; A. M. Rahmani³
¹Department of Computer Engineering, Central Tehran Branch, Islamic Azad University, Tehran, Iran.
²Electrical and Computer Engineering Department, Kharazmi University, Tehran, Iran.
³Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran.
Received: 30 Farvardin 1402; Revised: 14 Tir 1402; Accepted: 01 Mordad 1402
Abstract
Background and Objectives: With the rapid growth of social networks, extracting valuable information from their voluminous data sources while protecting privacy and preventing the disclosure of unique data is among the most challenging tasks. This paper presents a model for maintaining privacy in big data.
Methods: The proposed model is implemented with the Spark in-memory processing tool in four steps. The first step loads the raw data from HDFS into RDDs. The second step determines m clusters and their cluster heads. The third step places the produced tuples in separate RDDs in parallel. The fourth step releases the anonymized clusters. The model is based on the K-means clustering algorithm, runs in the Spark framework, and uses the capabilities of the RDD and MLlib components. Determining the optimized cluster head for each tuple's content, taking the data type into account and applying the formula of the proposed solution, leads to the release of data in the optimized cluster with the lowest rate of data loss and identity disclosure.
Results: Using the Spark framework factors and the optimized clusters of the K-means algorithm, the execution time of the proposed model over different data sizes (in megabytes) depends on multiple expiration times and the purposeful elimination of clusters, and its data loss rates are based on two-level clustering. According to the simulation results, as the volume of data increases, the rate of data loss decreases compared to the FADS and FAST clustering algorithms, which is due to the increase of records in the proposed model. The formula presented in the proposed model also simplifies the determination of the multiple selected attributes. According to the presented results and under 2-anonymity, the cost factor reaches its lowest value, 0.20, at k=9.
Conclusion: The proposed model strikes a balance between high-speed execution, minimal data loss, and minimal data disclosure. It also provides a parallel algorithm that increases the efficiency of anonymizing data streams while decreasing the information loss rate.
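The four steps named in the Methods paragraph (load the data, determine m clusters and cluster heads, group the tuples, release the anonymized clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain Python lists in place of Spark RDDs and MLlib, and the helper names (`kmeans_clusters`, `enforce_k`, `generalize`, `anonymize`), the merge rule for undersized clusters, the range-based information-loss measure, and the sample records are all assumptions made for illustration. The abstract does not give the paper's own formula, so none is reproduced here.

```python
import random
from collections import defaultdict


def dist2(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(rows):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def kmeans_clusters(records, m, iters=20, seed=0):
    """Plain k-means; returns clusters as lists of record indices."""
    rng = random.Random(seed)
    heads = [records[i] for i in rng.sample(range(len(records)), m)]
    for _ in range(iters):
        groups = defaultdict(list)
        for i, r in enumerate(records):
            nearest = min(range(m), key=lambda c: dist2(r, heads[c]))
            groups[nearest].append(i)
        # Recompute each cluster head; keep the old head if a cluster is empty.
        heads = [centroid([records[i] for i in groups[c]]) if groups[c] else heads[c]
                 for c in range(m)]
    return [idx for idx in groups.values() if idx]


def enforce_k(records, clusters, k):
    """Assumed rule: merge any cluster smaller than k into its nearest neighbour."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        small = min(clusters, key=len)
        if len(small) >= k:
            break
        clusters.remove(small)
        head = centroid([records[i] for i in small])
        target = min(clusters,
                     key=lambda c: dist2(head, centroid([records[i] for i in c])))
        target.extend(small)
    return clusters


def generalize(records, clusters):
    """Replace every value with the (min, max) range of its cluster."""
    released = {}
    for c in clusters:
        cols = list(zip(*(records[i] for i in c)))
        ranges = tuple((min(col), max(col)) for col in cols)
        for i in c:
            released[i] = ranges
    return released


def information_loss(records, released):
    """Assumed measure: mean released range width relative to the global range."""
    cols = list(zip(*records))
    spans = [max(c) - min(c) or 1 for c in cols]
    total = sum((hi - lo) / s
                for ranges in released.values()
                for (lo, hi), s in zip(ranges, spans))
    return total / (len(records) * len(cols))


def anonymize(records, m, k):
    """Cluster, enforce a minimum group size of k, then generalize."""
    clusters = enforce_k(records, kmeans_clusters(records, m), k)
    return generalize(records, clusters)


# Hypothetical quasi-identifiers: (age, hours worked per week).
records = [(23, 40), (25, 38), (24, 45), (47, 50), (52, 60),
           (49, 55), (31, 20), (35, 25), (33, 22)]
released = anonymize(records, m=3, k=2)
for i in sorted(released):
    print(records[i], "->", released[i])
print("information loss:", round(information_loss(records, released), 3))
```

In a Spark setting, the clustering step would presumably be delegated to MLlib's K-means and the per-cluster generalization would run in parallel over partitions; the sketch only shows the logic of grouping records so that each released group contains at least k of them.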
Keywords
Big Data; Anonymity; Confidentiality; Data Disclosure; Privacy
