|Journal of Electrical and Computer Engineering Innovations (JECEI)|
|Article 12, Volume 6, Issue 2, Mehr (September-October) 2018, Pages 251-271|
|Article type: Original Research Paper|
|Digital Object Identifier (DOI): 10.22061/jecei.2019.5243.206|
|I. Behravan1; S.H. Zahiri*2; S.M. Razavi3; R. Trasarti4|
|1Department of Electrical Engineering, PhD student, University of Birjand, email@example.com|
|2Department of Electrical Engineering, Faculty of Engineering, University of Birjand, firstname.lastname@example.org|
|3Department of Electrical Engineering, Faculty of Engineering, University of Birjand,|
|4KDD lab, ISTI-CNR, Pisa, Italy, email@example.com|
|Received: 19 Tir 1396; Revised: 25 Bahman 1396; Accepted: 31 Ordibehesht 1397|
|Background and Objectives: Big data refers to huge datasets with a large number of objects and a large number of dimensions. Mining such datasets and extracting knowledge from them is beyond the capability of conventional data mining algorithms, including clustering algorithms, classification algorithms, and feature selection methods.
Methods: Clustering is the process of dividing the data points of a dataset into groups (clusters) based on their similarities and dissimilarities. It is an unsupervised learning method that discovers useful information and hidden patterns in raw data. In this research, a new clustering method for big datasets is introduced, based on the Particle Swarm Optimization (PSO) algorithm. The proposed method is a two-stage algorithm: it first searches the solution space for the proper number of clusters and then searches for the positions of the centroids.
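The second stage described above, searching for centroid positions with PSO, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the number of clusters `k` is already known, uses the within-cluster sum of squared distances as the fitness function, and picks conventional PSO parameters (`w`, `c1`, `c2`). All function and parameter names are hypothetical, and the first stage (estimating the number of clusters) is not reproduced because the abstract does not specify it.

```python
import random


def sse(centroids, points):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


def pso_centroids(points, k, n_particles=20, n_iters=60,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k centroids with a basic global-best PSO (illustrative only)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle encodes k candidate centroids, initialised on random data points.
    pos = [[list(rng.choice(points)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_cost = [sse(p, points) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = [c[:] for c in pbest[g]], pbest_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard velocity update: inertia + cognitive + social terms.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            cost = sse(pos[i], points)
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = [c[:] for c in pos[i]], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = [c[:] for c in pos[i]], cost
    return gbest, gbest_cost


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Two synthetic 2-D blobs, used only to exercise the sketch.
    data = ([(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(50)]
            + [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(50)])
    centroids, cost = pso_centroids(data, k=2)
    print(centroids, cost)
```

The pure-Python loops keep the sketch dependency-free; on datasets of the size the paper targets, a vectorized or distributed fitness evaluation would be needed.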
Results: The performance of the proposed method is evaluated on 13 synthetic datasets and compared to X-means using two evaluation metrics: the Rand index and the normalized mutual information (NMI) index. The results demonstrate the superiority of the proposed method over X-means on all of the synthetic datasets. Furthermore, a biological microarray dataset is used to evaluate the proposed method further. Finally, 2 real big mobility datasets, containing the trajectories traveled by several cars in the city of Pisa, are analyzed using the proposed clustering method. The first dataset contains the trajectories recorded on Sundays and the second contains the trajectories recorded on Mondays over 5 weeks. The results show that people choose more diverse destinations on Sundays, although the Sunday dataset has fewer trajectories.
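For readers unfamiliar with the two metrics named above, the following is a small sketch of how the Rand index and NMI can be computed from two label assignments. The abstract does not state which NMI normalization was used; this sketch assumes normalization by the arithmetic mean of the two entropies, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import log


def rand_index(a, b):
    """Fraction of point pairs on which the two labelings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Mutual information normalized by the mean of the two entropies
    (the normalization is an assumption, not taken from the paper)."""
    n = len(a)
    ca, cb, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * log(c * n / (ca[x] * cb[y]))
             for (x, y), c in joint.items())
    denom = (entropy(a) + entropy(b)) / 2
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    predicted = [1, 1, 0, 0]  # same partition, different label names
    print(rand_index(truth, predicted), nmi(truth, predicted))
```

Both metrics depend only on the partition, not on the label names, which is why they suit comparisons between clustering algorithms. The pairwise Rand computation here is quadratic in the number of points, so a contingency-table formulation would be preferable for large datasets.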
Conclusion: Finding the number of clusters is a major challenge, especially for big datasets. The results achieved by the proposed method show its strong performance in detecting the number of clusters for high-dimensional and massive datasets. The results also demonstrate the power and effectiveness of swarm intelligence methods in solving hard and complex optimization problems.
©2018 The author(s). This is an open access article distributed under the terms of the Creative Commons Attribution (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, as long as the original authors and source are cited. No permission is required from the authors or the publishers.
|Big data clustering; Mobility dataset; K-means; Swarm intelligence; Particle swarm optimization|