Applications and Techniques on the Road to Exascale Computing

Title: Applications and Techniques on the Road to Exascale Computing
Publication Type: Book Chapter
Year of Publication: 2012
Authors: Karwacki M., Stpiczyński P.
Editors: De Bosschere K., D'Hollander E.H., Joubert G.R., Padua D., Peters F., Sawyer M.
Book Title: Advances in Parallel Computing
Volume: 22
Chapter: Improving performance of triangular Matrix-Vector BLAS routines on GPUs
Publisher: IOS Press
ISBN Number: 978-1-61499-040-6
Abstract: CUBLAS is a widely used implementation of BLAS (Basic Linear Algebra Subprograms) for NVIDIA CUDA Graphics Processing Units (GPUs). The aim of this paper is to show that the performance of selected Level 2 BLAS routines for triangular matrices can be improved using optimization techniques suited to GPUs, such as shared memory and coalesced memory access. We present new implementations of the routines xTRMV and xTRSV. Experiments carried out on two GPU architectures, Tesla M2050 and GeForce GTX 260, show that these new implementations are up to 500% faster than the corresponding routines from the CUBLAS library.
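
For orientation, the two operations named in the abstract can be sketched as a plain CPU reference in C. This is only an illustration of what xTRMV (triangular matrix-vector product) and xTRSV (triangular solve) compute. It is not the chapter's GPU implementation and not the CUBLAS API. The function names `trmv_lower_ref` and `trsv_lower_ref` are hypothetical, and the sketch assumes single precision, a lower-triangular matrix with a non-unit diagonal, and BLAS-style column-major storage with leading dimension `lda`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference for xTRMV: x := A * x, with A lower triangular
 * (non-unit diagonal), stored column-major: A(i,j) is a[i + j * lda]. */
void trmv_lower_ref(size_t n, const float *a, size_t lda, float *x)
{
    /* Walk rows bottom-up: row i only reads x[0..i], and those entries
     * have not been overwritten yet, so the update can be done in place. */
    for (size_t i = n; i-- > 0;) {
        float sum = 0.0f;
        for (size_t j = 0; j <= i; ++j)
            sum += a[i + j * lda] * x[j];
        x[i] = sum;
    }
}

/* Hypothetical reference for xTRSV: solve A * x = b by forward
 * substitution. On entry x holds b; on exit it holds the solution. */
void trsv_lower_ref(size_t n, const float *a, size_t lda, float *x)
{
    for (size_t i = 0; i < n; ++i) {
        float sum = x[i];
        /* Subtract the contributions of the already-solved unknowns. */
        for (size_t j = 0; j < i; ++j)
            sum -= a[i + j * lda] * x[j];
        /* A zero diagonal entry would make the matrix singular. */
        assert(a[i + i * lda] != 0.0f);
        x[i] = sum / a[i + i * lda];
    }
}
```

In the solve, each unknown depends on all earlier ones, which makes it harder to parallelize than the product. The inner loops also step through a row of a column-major matrix with stride `lda`, which hints at why memory access patterns matter for these routines.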