NodeFetch: High Performance Graph Processing using Processing in Memory
Journal of Electrical and Computer Engineering Innovations (JECEI)
Volume 9, Issue 1, April 2021, Pages 67-74 | Full Text (855.99 K)
Article Type: Original Research Paper
DOI: 10.22061/jecei.2020.7453.393
M. Mosayebi; M. Dehyadegari*
Department of Computer Systems Architecture, Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran.
Received: 05 Ordibehesht 1399 (24 April 2020); Revised: 24 Shahrivar 1399 (14 September 2020); Accepted: 25 Aban 1399 (15 November 2020)
Background and Objectives: Graph processing is gaining increasing attention in the era of big data. However, graph processing applications are highly memory-intensive due to the irregular nature of graphs. Processing-in-memory (PIM) is an old idea that has recently been revisited thanks to advances in technology, specifically the ability to manufacture 3D-stacked chips. PIM proposes enriching memory units with computational capabilities to reduce the cost of data movement between the processor and the memory system. Given these recent advances, the approach appears to be a promising way of handling large-scale graph processing.
Methods: This paper explores a real-world PIM technology, the Hybrid Memory Cube (HMC), to improve graph processing efficiency by reducing irregular access patterns and improving temporal locality. We propose NodeFetch, a new method for accessing a node and its neighbors while processing a graph, realized by adding a new command to the HMC system.
Results: Simulation results on a set of real-world graphs show that the proposed idea achieves an average speedup of 3.3x and a 69% reduction in energy consumption over the baseline PIM architecture, HMC.
Conclusion: Most techniques in the field of processing-in-memory employ methods that reduce data movement between the processor and memory. This paper proposes a method that reduces graph processing execution time and energy consumption by reducing cache misses during graph processing.
Keywords: Graph Processing; Hybrid Memory Cube (HMC); Processing in Memory