CHIUW 2015 Hot Topics Abstracts
Saturday, June 13, 2015, 4-5pm
-
The Chapel Memory Consistency Model
Sung-Eun Choi, Michael Ferguson, Elliot Ronaghan, Greg Titus (Cray Inc.)
-
Abstract: A memory consistency model is an important element of a parallel
language because it allows programmers to understand programs
and it allows implementations to optimize. In this talk, we will
describe ongoing work on Chapel's memory consistency model. We will
present the high-level design goals of the draft model, discuss how
it avoids a pitfall or two from other languages' models, and give a
detailed description of one aspect that is unique to Chapel.
-
Fast Fourier Transforms in Chapel
Doru Thom Popovici, Franz Franchetti (Carnegie Mellon University)
-
Abstract: The fast Fourier transform (FFT) is an important building block for a
multitude of scientific applications from the High Performance
Computing (HPC) community. Rather than implementing the FFT from
scratch, application developers typically rely on pre-built and tuned
FFT libraries such as the Intel MKL or FFTW. These libraries obtain
performance by mapping efficient algorithms to the hardware features
of the architecture that they target. This presentation will give
insight into how we can map efficient recursive mixed-radix FFT
algorithms to the language features provided by Chapel to facilitate
parallel and distributed computation. The ultimate goal is to obtain
a Chapel implementation of the FFT algorithm whose performance is
competitive with existing tuned libraries. To
achieve this goal we plan to understand and uncover the optimizations
that are required to bridge the performance gap between current Chapel
code and library code. We plan to apply the lessons learned from
building the Spiral autotuning and program generation system.
-
A Preliminary Performance Comparison of Chapel to MPI and MPI/OpenMP
Laura Brown (US Army Engineer Research and Development Center)
-
Abstract: As the High Performance Computing community moves toward peta- and
exascale computing, we need to begin evaluating alternatives, such as
Chapel, to MPI for parallel computing in order to achieve optimal
efficiency and scalability on large HPC systems. Any viable
alternatives, though, will need to be easy to use and provide
performance comparable to (or better than) that of MPI. As part of a larger study
of parallel programming languages, I translated a small, non-trivial
program into Chapel and evaluated its performance on a large
production system. Then, these results were compared to the observed
performance of runs made with MPI and MPI/OpenMP versions of this
program. This talk will discuss the outcome of this study, along with
my initial impressions of Chapel as a usable parallel programming
language.
-
Data flow programming—a high performance and highly
complicated programming concept?
Jens Breitbart (Technische Universität München)
-
Abstract: This talk gives a short introduction to GASPI, which is a low-level
one-sided communication library using synchronization mechanisms
similar to those available in Chapel. GASPI has been developed with
a strict focus on performance, and applications using GASPI often beat
tuned MPI applications. However, most users have been unable to
utilize the power of the library due to the complexity that arises
from the synchronization primitives. The talk will focus on the issues
that users faced when using GASPI and how Chapel may provide a
better end-user experience while still providing high performance.
-
If you can dodge a wrench, you can dodge a ball
Dylan Stark, George Stelle (Sandia National Laboratories)
-
Abstract: This talk will focus on the importance of low-level runtime
configuration choices for achieving high performance, and how
application and node architecture can necessitate different choices.
The Chapel programming language significantly lowers the
programmability barrier for writing parallel applications by providing
clean semantics and abstractions for managing concurrency and data.
Nevertheless, concurrent execution and dynamic management of on-node
parallel resources are the responsibility of the underlying task layer.
In the case of the current default, Sandia's Qthreads library, we show
that mindful configuration for the node architecture and application
is essential, and that making the wrong choices can be ruinous. That
is to say, by avoiding performance pitfalls in the underlying thread
layer (the wrench), we improve the likelihood of avoiding performance
pitfalls in the higher level language (the ball).
-
A Progress Report on COHX: Chapel on HSA + XTQ
Mauricio Breternitz, Bibek Ghimire, Mike Chu, Steve Reinhardt (Advanced Micro Devices (AMD))
-
Abstract: We report on our experience porting Chapel to the eXtended Task
Queueing model (XTQ), an extension to the Heterogeneous System
Architecture (HSA). HSA enables user-level tasking via
architecturally defined task enqueueing to CPU and GPU task queues.
XTQ extends HSA by enabling cross-node task queueing via RDMA access
to HSA queues on remote nodes. We describe our approach and experience
in porting Chapel to utilize XTQ. This comprises identifying and
insulating the Chapel runtime components that are updated to support
this organization. We also describe initial experience with running
Chapel-generated XTQ-enabled binaries in two environments: an
emulation layer, which provides the XTQ API and runs on an HSA-enabled
InfiniBand-connected cluster, and a gem5-based simulation model, which
provides the XTQ API via a NIC device. The XTQ API is implemented and
presented as an extension to the Portals 4 interface, underneath
Chapel's GASNet layer. Initial microbenchmark results indicate
potential speedup via low-latency intra- and inter-node task
enqueueing.