Using autotuning for accelerating tensor contraction on graphics processing units (GPUs)

Publication Type thesis
School or College College of Engineering
Department Computing
Author Rivera, Axel Y.
Title Using autotuning for accelerating tensor contraction on graphics processing units (GPUs)
Date 2014-12
Description Tensors are mathematical representations of physical entities that have magnitude in multiple directions. Tensor contraction is a way of creating these objects using the Einstein summation convention. It is commonly used in physics and chemistry for solving problems such as spectral element and coupled cluster computations. Mathematically, tensor contraction operations can be reduced to expressions similar to matrix multiplications. However, linear algebra libraries (e.g., BLAS and LAPACK) perform poorly on the small matrix sizes that commonly arise in certain tensor contraction computations. Another challenge in the computation of tensor contraction is the difference between the mathematical representation and an efficient implementation. This thesis proposes a framework that allows users to express a tensor contraction problem in a high-level mathematical representation and transform it into a linear algebra expression that is mapped to a high-performance implementation. The framework produces code that takes advantage of the parallelism that graphics processing units (GPUs) provide. It relies on autotuning to find the preferred implementation that achieves high performance on the available device. Performance results from the benchmarks tested, nekbone and NWChem, show that the code produced by the framework achieves speedups of 8.56x and 14.25x, respectively, over the sequential version on an NVIDIA Tesla C2050 GPU, and speedups of 8.87x and 17.62x on an NVIDIA Tesla K20c GPU. The parallel decompositions found by the tool were also tested with an OpenACC implementation, achieving speedups of 8.87x and 10.42x for nekbone and 7.25x and 10.34x for NWChem compared to the default choices made by the OpenACC compiler. The contributions of this work are: (1) a simplified interface that allows the user to express tensor contraction using a high-level representation and transform it into high-performance code; (2) a decision algorithm that explores a set of optimization strategies for achieving performance; and (3) a demonstration that this approach can achieve better performance than OpenACC and can be used to accelerate OpenACC.
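Illustration (not drawn from the thesis itself): the reduction mentioned in the description, that a tensor contraction written in Einstein summation form is equivalent to a matrix multiplication over suitably reshaped operands, can be checked in a few lines of Python/NumPy. The index names and sizes below are arbitrary assumptions chosen only for this sketch, not the kernels studied in the thesis.

import numpy as np

# Illustrative sizes only; the thesis targets the small extents that arise
# in spectral-element (nekbone) and coupled-cluster (NWChem) kernels.
I, L, J, K = 8, 8, 8, 8

A = np.random.rand(I, L)          # 2-D tensor A[i, l]
B = np.random.rand(L, J, K)       # 3-D tensor B[l, j, k]

# Einstein-summation form of the contraction C[i, j, k] = sum_l A[i, l] * B[l, j, k]
C_einsum = np.einsum('il,ljk->ijk', A, B)

# The same contraction reduced to a plain matrix multiplication:
# flatten the (j, k) index pair of B, multiply, then restore the shape.
C_matmul = (A @ B.reshape(L, J * K)).reshape(I, J, K)

assert np.allclose(C_einsum, C_matmul)

Because the extents in the targeted benchmarks are small, the resulting matrices are tiny, which is why the description notes that general-purpose BLAS/LAPACK routines perform poorly and autotuned GPU code is attractive.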
Type Text
Publisher University of Utah
Subject Autotuning; GPU; Tensor contraction
Dissertation Institution University of Utah
Dissertation Name Master of Science
Language eng
Rights Management Copyright © Axel Y. Rivera 2014
Format Medium application/pdf
Format Extent 1,137,271 bytes
Identifier etd3/id/3332
ARK ark:/87278/s66q55gv
Setname ir_etd
ID 196897
Reference URL https://collections.lib.utah.edu/ark:/87278/s66q55gv