CHIUW 2019
The ACM SIGPLAN 6th Annual
Chapel Implementers and Users Workshop
co-located with PLDI 2019
/ ACM FCRC 2019
Saturday–Sunday June 22–23, 2019
Phoenix, AZ, USA
Saturday June 22, 2019 (mini-conference day)
Sunday June 23, 2019 (coding day)
Introduction: CHIUW 2019—the ACM SIGPLAN 6th annual Chapel Implementers and Users Workshop, to be held in conjunction with PLDI 2019 / ACM FCRC 2019—continues our annual series of workshops focused on bringing developers and users of the Chapel language (chapel-lang.org) together to present and discuss work being done across the community. Attendance is open to anyone interested in Chapel, from the most seasoned Chapel user or developer to someone simply curious to learn more.
Registration: Register for CHIUW 2019 via the PLDI registration page. If you will only be attending CHIUW, select the PLDI'19 Workshops and Tutorials Saturday option. If you wish to attend additional days at PLDI and/or FCRC, look around on the registration site as there are many activities to choose from. Note that early registration closes on May 24th. Also, note that registration is not required to attend the coding day on Sunday.
Saturday June 22, 2019 (Mini-Conference Day)
Phoenix Convention Center, room 212a
8:15 - 9:00: | Continental Breakfast (catered by PLDI) |
9:00 - 9:30: | Chapel 101 (Optional) [slides] |
Brad Chamberlain (Cray Inc.) | |
This is a completely optional session, held prior to the official start of the workshop, designed for those who are new to Chapel and looking for a crash course, or for those who would simply appreciate a refresher. | |
9:30 - 10:00: | Welcome, State of the Project [slides | video] |
Benjamin Robbins, Brad Chamberlain (Cray Inc.) | |
This session will serve as a welcome to and overview of CHIUW 2019, along with a brief summary of highlights and milestones achieved within the Chapel project since last year. | |
Session chair: Michelle Strout (University of Arizona) |
10:00 - 10:25: | GPUIterator: Bridging the Gap between Chapel and GPU Platforms [slides | video] |
Akihiro Hayashi (Rice University), Sri Raj Paul, Vivek Sarkar (Georgia Institute of Technology) | |
Abstract: PGAS (Partitioned Global Address Space) programming models were originally designed to facilitate productive parallel programming at both the intra-node and inter-node levels in homogeneous parallel machines. However, there is a growing need to support accelerators, especially GPU accelerators, in heterogeneous nodes in a cluster. Among high-level PGAS programming languages, Chapel is well suited for this task due to its use of locales and domains to help abstract away low-level details of data and compute mappings for different compute nodes, as well as for different processing units (CPU vs. GPU) within a node. In this paper, we address some of the key limitations of past approaches to mapping Chapel onto GPUs as follows. First, we introduce a Chapel module, GPUIterator, which is a portable programming interface that supports GPU execution of a Chapel forall loop. This module makes it possible for Chapel programmers to easily use hand-tuned native GPU programs/libraries, which is an important requirement in practice since there is still a big performance gap between compiler-generated GPU code and hand-tuned GPU code; hand-optimization of CPU-GPU data transfers is also an important contributor to this performance gap. Second, though Chapel programs are regularly executed on multi-node clusters, past work on GPU enablement of Chapel programs mainly focused on single-node execution. In contrast, our work supports execution across multiple CPU+GPU nodes by accepting Chapel's distributed domains. Third, our approach supports hybrid execution of a Chapel parallel (forall) loop across both a GPU and CPU cores, which is beneficial for specific platforms. Our preliminary performance evaluations show that the use of the GPUIterator is a promising approach for Chapel programmers to easily utilize a single or multiple CPU+GPU node(s) while maintaining portability. | |
10:25 - 10:50: | Calling Chapel Code: Interoperability Improvements [slides | video | code] |
Lydia Duncan, David Iten (Cray Inc.) | |
Abstract: Since CHIUW last year, the Chapel team has undertaken an effort to improve the ability to call Chapel code from other languages. This talk will cover a few areas of improvement: using Chapel code as a library from C, Python, and Fortran; and in addition, improvements to array interoperation. | |
11:00 - 11:20: | Coffee Break (catered by PLDI) |
Session chair: David Wonnacott (Haverford College) |
11:20 - 11:45: | Towards Radix Sorting in the Chapel Standard Library [slides] |
Michael Ferguson (Cray Inc.) | |
Abstract: This talk will discuss recent work improving the Sort module of the Chapel programming language. It will discuss an interface design to support radix sort, describe the implementation of radix sort, compare the performance of this implementation to sort libraries in other languages, and finally discuss distributed sorting. | |
11:45 - 12:10: | Implementing Stencil Problems in Chapel: An Experience Report [slides] |
Per Fuchs, Pieter Hijma (Vrije Universiteit Amsterdam), Clemens Grelck (University of Amsterdam) | |
Abstract: Stencil operations represent a fundamental class of algorithms in high-performance computing. We are interested in what level of performance can be expected from a high-productivity language such as Chapel. To this end, we discuss four different implementations of a generic stencil operation with a convergence check after each iteration. We start with a sequential implementation, followed by a global-view implementation that we experiment with both on a 16-core multi-core system and on a cluster with up to 16 such nodes using domain maps. We finish with a local-view implementation that explicitly encodes all design decisions with respect to parallel execution. This paper is set up as a two-stage experience report: We mainly report our findings from the users' perspective without any feedback from the Chapel implementers. We then report additional analysis performed under guidance of the Chapel team. Our experimental findings show that Chapel performs as expected on a single node. However, it does not achieve the expected levels of performance on our multi-node system, either with the data-parallel global-view approach or with the task-parallel local-view code. We discuss the root causes of our reduced performance in detail and report possible solutions. | |
12:10 - 12:35: | Chapel Unblocked: Recent Communication Optimizations in Chapel [slides | video] |
Elliot Ronaghan, Ben Harshbarger, Gregory Titus, Michael Ferguson (Cray Inc.) | |
Abstract: This talk will highlight communication optimizations made to the Chapel compiler and runtime over the past year. It will focus on improvements to core benchmarks that have benefited from fine-grained and bulk communication optimizations as well as remote task-spawning improvements. Several benchmarks, including HPC Challenge (HPCC) RandomAccess, HPCC Stream Triad, and the integer sort code ISx, will be briefly introduced, and a relevant performance optimization will be showcased for each. These benchmarks represent core idioms that are common in many HPC applications. Performance results on up to 1,024 nodes (25,000 cores) will demonstrate that with each release Chapel is becoming more competitive against hand-tuned MPI+OpenMP, SHMEM, and UPC. | |
12:35 - 2:00: | Lunch (catered by PLDI) |
Session chair: Brad Chamberlain (Cray Inc.) |
2:00 - 3:00: | Programming Abstractions for Orchestration of HPC Scientific Computing [slides | video] |
Anshu Dubey (Argonne National Laboratory / University of Chicago) | |
Abstract: Application developers are confronted with three axes of increasing complexity going forward: increasing heterogeneity in computing platforms at all levels, increasing heterogeneity in solvers and data management, and moving existing code bases to future programming models. While the first two will dictate which future programming models may deliver the needed performance, the third will determine their adoption. However, it is clear that the infrastructure backbone of large-scale multiphysics software has to orchestrate data and task movement between devices. The lifecycle of scientific software is several times that of platforms; therefore, any orchestration mechanism must have flexibility and configurability to remain usable on future platforms. In this presentation I will outline a model of an orchestration framework and the demands that it will place on programming models and languages. | |
Bio: Anshu Dubey is a Computer Scientist in the Mathematics and Computer Science Division at Argonne National Laboratory and a Senior Scientist at large at the University of Chicago. She leads the Earth and Space Sciences sub-area within the Applications Development focus area of the US-DOE Exascale Computing Project. She is the lead software architect for FLASH, a multiphysics, multiscale HPC code that multiple science and engineering domains use as their community code. In the past she has held positions at the University of Chicago, where she was the Associate Director at the Flash Center for Computational Science. | |
Session chair: Michael Ferguson (Cray Inc.) |
3:00 - 3:25: | Arkouda: Interactive Data Exploration Backed by Chapel [slides | video] |
Michael Merrill, William Reus, Timothy Neumann (DOD) | |
Abstract: Exploratory data analysis (EDA) is the prerequisite for all data science. EDA is non-negotiably interactive (by far the most popular environment for EDA is a Jupyter notebook) and, as datasets grow, increasingly computationally intensive. Several existing projects attempt to combine interactivity and distributed computation using programming paradigms and tools from cloud computing, but none of these projects have come close to meeting our needs for high-performance EDA. To fill this gap, we have developed a prototype, called arkouda, that allows a user to interactively issue massively parallel computations on distributed data. We designed the API of arkouda to closely mimic NumPy, the underlying computational library used in approximately 80% of EDA workflows (based on a sample of Jupyter notebooks). Our vision is that users will import arkouda as a Python module in place of NumPy (e.g. “import arkouda as np”) and use familiar NumPy functions and syntax to interact with arrays of data residing on an HPC system. The computational heart of arkouda is a Chapel interpreter that accepts a pre-defined set of commands from the Python frontend and uses Chapel’s built-in machinery for multi-locale and multithreaded execution. While arkouda, in our experience, comes closer than anything else to enabling high-performance EDA, the process of developing arkouda has also helped identify ways Chapel must improve in order to become a truly productive language for data science. | |
3:30 - 4:00: | Break (catered by PLDI) |
Session chair: Michael Ferguson (Cray Inc.) |
4:00 - 4:25: | Chapel Graph Library (CGL) [slides | video] |
Louis Jenkins (University of Rochester), Marcin Zalewski (Pacific Northwest National Laboratory) | |
Abstract: In this talk, I summarize prior work on the Chapel HyperGraph Library (CHGL) and the Chapel Aggregation Library (CAL), and introduce the more general Chapel Graph Library (CGL). CGL is being designed to enable global-view programming, such that locality is abstracted from the user. CGL is also being designed in a way that is similar to Chapel's multiresolution design philosophy, where graphs are implemented in terms of hypergraphs, and where both the underlying hypergraph and overlying graphs are available for use. Some of the kinds of graphs being designed are bipartite graphs, directed and undirected graphs, and even trees. | |
4:25 - 4:50: | Chapel in Cray HPO [slides] |
Benjamin Albrecht, Alex Heye, Benjamin Robbins (Cray Inc.) | |
Abstract: Cray HPO is a module of the data science workflow framework known as CrayAI. This module was released on Urika XC 1.2 and Urika CS 1.1, making Cray HPO the first Cray product built with Chapel. This talk will cover the importance, the technical aspects, and the broader vision for Cray HPO including Chapel’s role in the project. | |
Session chair: Benjamin Robbins (Cray Inc.) |
4:50 - 5:30: | Lightning Talks and Flash Discussions |
This final session will feature short (5–10 minute, depending on number of participants) time slots in which community members can give short talks, lead discussions on current hot topics of interest, do demos, etc. Sign up on-site or let us know of your interest beforehand. | |
5:30 - : | Adjourn for Dinner (in ad hoc groups or on your own) |
Sunday June 23, 2019 (Coding Day)
all day (start time 9am): | Chapel Coding Day |
CHIUW's coding day is an annual chance to work cooperatively on coding problems or discussion topics while we're in one place. Members of the core Chapel team will be available to partner with members of the community on topics of interest. If you would like to participate in a pair-programming or collaborative activity on this day, please let us know at chapel_submissions@cray.com or on-site at CHIUW. | |
General Chair:
- Benjamin Robbins, Cray Inc.
- Michael Ferguson, Cray Inc.
- Nikhil Padmanabhan, Yale University
- Brad Chamberlain (chair), Cray Inc.
- Maryam Dehnavi (co-chair), University of Toronto
- Rafael Asenjo, University of Malaga
- Michael Ferguson, Cray Inc.
- Oscar Hernandez, ORNL
- Hang Liu, UMass Lowell
- Nikhil Padmanabhan, Yale University
- Tyler Simon, UMBC
- Didem Unat, Koç University
- Ana Lucia Varbanescu, University of Amsterdam
- Rich Vuduc, Georgia Tech
- David Wonnacott, Haverford College
Call For Participation (for archival purposes)