Kabiri Rad, S., Afshin, V., Zahiri, S. H. (1403 SH). An Effective Heart Disease Prediction Model Using Deep Learning-based Dimensionality Reduction on Imbalanced Data. Technology of Education (فناوری آموزش), 317-330. doi: 10.22061/jecei.2024.10847.742
1 Department of Computer Science, Birjand University of Technology, Birjand, Iran.
2 Department of Electrical Engineering, Faculty of Engineering, University of Birjand, Birjand, Iran.
Received: 05 Tir 1403
Revised: 12 Mehr 1403
Accepted: 24 Mehr 1403
Abstract
Background and Objectives: In high-volume, high-dimensional datasets, the distribution of samples becomes sparse, and problems such as redundant or irrelevant features arise. Dimensionality reduction techniques exploit correlations between features to map the original features into a lower-dimensional space, which usually reduces the computational burden and improves performance. In this paper, we study the problem of predicting heart disease when the dataset is large and/or one class has significantly fewer instances than the others.
Methods: We investigated three prominent dimensionality reduction techniques, Principal Component Analysis (PCA), Information Bottleneck (IB), and Variational Autoencoder (VAE), in combination with popular classification algorithms. To give the classifier adequate samples from every class, an efficient data balancing technique is used to compensate for the shortage of positive samples relative to negative ones. Among the available data balancing methods, a SMOTE-based method is selected; it generates new samples at the boundary of the sample distribution and avoids synthesizing noisy or redundant data.
Results: The experimental results show that the VAE-based method outperforms the other dimensionality reduction algorithms on the performance measures. The proposed hybrid method improves accuracy to 97.1% and sensitivity to 99.2%.
Conclusion: Combining VAE with oversampling algorithms can significantly improve both system performance and computational time.
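The oversampling step mentioned in the Methods can be illustrated with a minimal sketch. It assumes plain SMOTE-style interpolation between minority-class neighbours, not the boundary-focused variant the paper selects, and the function names `smote` and `balance`, the neighbour count `k`, and the toy data are all hypothetical choices made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Assumption: plain SMOTE interpolation, not the paper's boundary variant.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def balance(X, y, positive=1, seed=0):
    """Oversample the positive class until both classes have equal size."""
    pos = [x for x, label in zip(X, y) if label == positive]
    neg_count = len(X) - len(pos)
    extra = smote(pos, max(0, neg_count - len(pos)), seed=seed)
    return X + extra, y + [positive] * len(extra)


# Toy data (hypothetical): 3 positive samples, 6 negative samples.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]] + [[5.0 + i, 5.0] for i in range(6)]
y = [1, 1, 1] + [0] * 6
X_balanced, y_balanced = balance(X, y)
```

In a pipeline like the one described, the balanced samples would then go through a dimensionality reduction step (PCA, IB, or VAE) before classification. The order of those steps here is an assumption, since the abstract does not state it.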