Porting of Finite Element Integration Algorithm to Xeon Phi Coprocessor-based HPC Architectures

Filip Krużel; Krzysztof Banaś; Mauro Iacomo

doi:10.24423/cames.578

Authors

Filip Krużel, Cracow University of Technology, Kraków, Poland
Krzysztof Banaś, AGH University of Science and Technology, Kraków, Poland, http://orcid.org/0000-0002-4045-1530
Mauro Iacomo, The University of Campania Luigi Vanvitelli, Caserta, Italy, http://orcid.org/0000-0002-2089-975X

Abstract

In this article, we describe the implementation of the finite element numerical integration algorithm for the Xeon Phi coprocessor. The coprocessor was a specialized many-core unit for calculations, with performance comparable to that of the corresponding GPUs. Its main advantages were the built-in 512-bit vector registers and the ease of porting existing codes from traditional x86 architectures. We port the code developed for a standard CPU to the coprocessor and compare its performance with our OpenCL implementation of the numerical integration algorithm, previously developed for GPUs. The GPU code is tuned for the coprocessor by our auto-tuning mechanism. The tests cover two types of problems, two types of approximation, and two types of elements. The timing results allow comparing the performance of highly optimized CPU and GPU codes with that of the Xeon Phi coprocessor. The article thus addresses whether such massively parallel architectures perform better with the CPU or the GPU programming approach. Furthermore, we compare the Xeon Phi architecture with Intel's i9 13900K CPU, the latest available at the time of writing, to determine whether the old Xeon Phi architecture remains competitive in today's computing landscape. Our findings provide insights for selecting the most suitable hardware for numerical computations and the appropriate algorithmic design.

Keywords:

CPU, optimization, parallelization, vectorization, Intel Xeon Phi

