CHIUW 2020

The 7th Annual
Chapel Implementers and Users Workshop

affiliated with IPDPS 2020


 

Friday May 22, 2020
8:30am–4:30pm PDT (GMT–7)
free and online via Zoom (due to Covid-19)

 

Introduction: CHIUW 2020 is the 7th annual Chapel Implementers and Users Workshop, organized in conjunction with IPDPS 2020. CHIUW serves as a forum where users and developers of the Chapel language (chapel-lang.org) can gather to report on work being done with Chapel, exchange ideas, and forge new collaborations. Anyone interested in parallel programming and Chapel is encouraged to attend CHIUW, from long-term enthusiasts to those simply curious to learn more.

Format: Due to Covid-19, CHIUW 2020 will be held online in a virtual workshop format. Talks will be given by speakers, either live or via pre-recorded videos (linked below when available). Each talk will be followed by a short question-and-answer session. Short breaks between speakers and sessions will be used to deal with any challenges that come up as a result of the distributed setting. Due to the large number of timezones involved, there will not be any formal meal breaks, but you're encouraged to eat while watching the talks or during the breaks.

Registration: In its online format, CHIUW 2020 is free and open to the public. Anyone interested in attending, listening, and participating is welcome to do so as long as appropriate conduct is observed. Registering with IPDPS is requested, though not required, so that they can get an approximate headcount for the workshop. Registering will also permit you to access the online proceedings for IPDPS. To register, visit the IPDPS home page and click the red "Register today" button.

 

Program
Time (PDT)
Pre-Workshop
 
anytime:  Chapel 101 [slides | video]
Brad Chamberlain (Cray, a Hewlett Packard Enterprise Company)
This is a completely optional talk that will be posted prior to the official start of the workshop for those who are new to Chapel and looking for a crash-course, or for those who would simply appreciate a refresher.
 
 
Welcome to CHIUW 2020
 
8:30–9:00:  Welcome, State of the Project [slides | video]
Brad Chamberlain (Cray, a Hewlett Packard Enterprise company)
This session will serve as a welcome to and overview of CHIUW 2020, along with a brief summary of highlights and milestones achieved within the Chapel project since last year.
 
 
Break / Technology Check
9:00–9:15:  Break: We'll use this initial, longer break to make sure that the streaming technology is generally working for people before proceeding.
 
 
Chapel Language Evolution
Session chair: Nikhil Padmanabhan (Yale)
 
9:15–9:40: Towards Stability in the Chapel Language [slides | video w/ Q&A]
Michael Ferguson (Cray, a Hewlett Packard Enterprise company)
Abstract: Language stability is an important upcoming feature of the Chapel programming language. Chapel users have both requested big changes to the language and also requested that the language become stable. This talk will discuss recent efforts to complete the big changes to the Chapel language so that the language can stabilize.
 
9:45–10:10: Visibility Control: Use and Import Statement Improvements [slides | video w/ Q&A]
Lydia Duncan (Cray, a Hewlett Packard Enterprise company)
Abstract: The Chapel team has recently returned to finalizing our inter-module interactions, especially regarding use and import statements. This talk will describe the work that has occurred in this area since Chapel 1.19.
 
 
Break
10:10–10:20:  Break
 
 
Arkouda
Session chair: Brad Chamberlain (Cray/HPE)
 
10:20–11:20:  Keynote: Arkouda: Chapel-Powered, Interactive Supercomputing for Data Science [slides | video | Q&A]
William Reus (U.S. DOD)
Abstract: Data science and high-performance computing (HPC) should be a great match: with datasets growing well beyond the memory of a single node and computations becoming ever more communication-intensive, the need for HPC in data science seems clear. And yet, there remains a frustrating gap between the two disciplines. One major reason is that data science is an interactive sport—data scientists overwhelmingly gravitate towards interactive platforms (e.g. Jupyter notebooks) and interpreted languages (e.g. Python)—whereas the culture of HPC tends to eschew interactivity in favor of compiled programs and batch jobs. While HPC practitioners prize computational efficiency, data scientists live by the very different maxim of rapid hypothesis testing and have demonstrated that they are willing to ignore HPC technologies entirely rather than give up interactivity. Bridging this gap entails a change in thinking about the purpose of an HPC and how it should be used.

This talk motivates and demonstrates the interactive use of up to hundreds of HPC nodes in data science workflows with an open-source package called Arkouda, which exposes massively parallel, distributed NumPy-like arrays to a Jupyter notebook running Python 3. We have chosen the NumPy format and Jupyter and Python as front-end technologies in order to conform to interfaces familiar to data scientists. Because Arkouda arrays can be constructed from and exported to NumPy arrays, users can perform heavy computations on hundreds of HPC nodes and bring back small sets of results for rich introspection in a single-node Python environment. In the future, we plan to use this interoperability as a template for bringing existing HPC codes into an interactive framework, much as NumPy has brought optimized C and Fortran routines into interactive Python workflows.

Meanwhile, the computational heart of Arkouda is a relatively compact yet highly scalable Chapel interpreter that implements a powerful set of data science primitives. Functionally, this interpreter comprises a dispatcher, modular data transformations, and a zero-copy, in-memory object store, all implemented in about 12,000 lines of Chapel. While these components and design principles appear in other open-source projects, the competitive advantage of Arkouda comes from the unique position Chapel holds as a productive language with performance and scaling on par with industry-standard HPC technologies. For this reason, Arkouda is small enough to be maintainable while achieving good scaling on communication-intensive primitives (e.g. argsort) up to at least 512 nodes of a Cray XC.

Bio: Dr. Reus is a physical chemist by training, having earned his Ph.D. from Harvard in the field of molecular electronics. Since graduate school, he has been cross-training in statistics and parallel computing in order to apply his scientific expertise to problems in cyberdefense. Dr. Reus lives near the Chesapeake Bay with his wife and three children.
 
11:25–11:50: Squeezing Performance out of Arkouda [slides | video w/ Q&A]
Elliot Ronaghan (Cray, a Hewlett Packard Enterprise company)
Abstract: This talk will highlight optimizations made to Arkouda, a Python package backed by Chapel that provides a key subset of the popular NumPy and Pandas interfaces at HPC scales. Optimizations such as aggregating communication have significantly improved Arkouda’s performance across a wide range of architectures. Key optimizations and benchmark results will be shown on architectures including a single-node server, Ethernet and InfiniBand clusters, and a 512-node Cray supercomputer.
 
 
Break
11:50–12:00:  Break
 
 
Applications of Chapel
Session chair: Engin Kayraklioglu (Cray/HPE)
 
12:00–12:25:  Development of Parallel CFD Applications on Distributed Memory with Chapel [slides | video | Q&A]
Matthieu Parenteau, Simon Bourgault-Cote, Frederic Plante, Eric Laurendeau (Polytechnique Montreal)
Abstract: Traditionally, Computational Fluid Dynamics (CFD) software uses the Message Passing Interface (MPI) to handle parallelism over distributed memory systems. For a new developer, such as a student or a new employee, the barrier to entry can be high, and more training is required for each particular software package, which slows down research on the actual science. The Chapel programming language offers an interesting alternative for research and development of CFD applications.

In this paper, the development of two CFD applications is presented: the first as an experiment in rewriting a 2D structured flow solver, and the second as a from-scratch implementation of a 3D unstructured multi-physics research simulation package. Details are given on both applications, with emphasis on the Chapel features that benefited the code design, in particular by improving flexibility and extending to distributed memory. Some performance pitfalls are discussed, along with solutions to avoid them.

The performance of the unstructured software is then studied and compared to a traditional open-source CFD software package programmed in C++ with MPI for communication (SU2). The results show that our Chapel implementation achieves performance similar to that of other CFD software written in C and C++, thus confirming that Chapel is a viable language for high-performance CFD applications.

 
12:30–12:55:  Computing Hypergraph Homology in Chapel [slides | video w/ Q&A]
Jesun S. Firoz (Pacific Northwest National Laboratory), Louis Jenkins (University of Rochester), Cliff Joslyn, Brenda Praggastis, Emilie Purvine, Mark Raugas (Pacific Northwest National Laboratory)
Abstract: In this paper, we discuss our experience in implementing homology computation, in particular the Betti number calculations, in the Chapel Hypergraph Library (CHGL). Given a dataset represented as a hypergraph, a Betti number for a particular dimension k indicates how many k-dimensional ‘voids’ are present in the dataset. Computing the Betti numbers involves various array-centric and linear algebra operations. We demonstrate that implementing these operations in Chapel is both concise and intuitive. In addition, we show that Chapel provides language constructs for implementing parallel and distributed execution of the linear algebra kernels with minimal effort. Syntactically, Chapel provides the succinctness of Python while delivering performance comparable to C++-based packages and better than Julia-based packages for calculating the Betti numbers.
 
1:00–1:15:  Exploring Chapel Productivity Using Some Graph Algorithms [slides | video w/ Q&A]
Richard F. Barrett, Jeanine Cook, Stephen L. Olivier, Omar Aaziz, Christopher D. Jenkins, Courtenay T. Vaughan (Sandia National Laboratories)
Abstract: A broad set of data science and engineering questions may be organized as graphs, providing a powerful means for describing relational data. Although experts now routinely compute graph algorithms on huge, unstructured graphs using high performance computing (HPC) or cloud resources, this practice hasn’t yet broken into the mainstream. Such computations require great expertise, yet users often need rapid prototyping and development to quickly customize existing code. Toward that end, we are exploring the use of the Chapel programming language as a means of making some important graph analytics more accessible, examining the breadth of characteristics that would make for a productive programming environment, one that is expressive, performant, portable, and robust.

In this talk we describe our early explorations of this space, based on miniTri, a miniapp from the Mantevo suite, and the mean hitting time algorithm, one of the analytics being explored within Grafiki, both of which are designed for use on distributed memory parallel processing environments. These implementations have been posed in terms of key linear algebra operations and algorithms, specifically sparse matrix-matrix multiplication, operating on integer datatypes, and the Conjugate Gradient method, based on a graph Laplacian matrix.

1:20–1:45:  Simulating Ultralight Dark Matter in Chapel [slides | video w/ Q&A]
Nikhil Padmanabhan (Yale University), Elliot Ronaghan (Cray, a Hewlett Packard Enterprise Company), J. Luna Zagorac (Yale University), Richard Easther (University of Auckland)
Abstract: This talk summarizes the development of a Chapel astrophysics code to simulate the dynamics of ultralight dark matter. The talk has three broad goals: (i) to demonstrate that current versions of Chapel can achieve good performance and scalability for real-world workloads, (ii) to describe the experience of writing a research code in Chapel, and (iii) to highlight some of the simulation results we have achieved using this code. This project originated from a lightning talk at CHIUW 2019, and is a Chapel-centric update of results at PAW-ATM 2019.
 
 
Break
1:45–1:55:  Break
 
 
Chapel on GPUs
Session chair: Benjamin Robbins (Cray/HPE)
 
1:55–2:20: Exploring a Multi-Resolution GPU Programming Model for Chapel [slides | video | Q&A]
Akihiro Hayashi, Sri Raj Paul, Vivek Sarkar (Georgia Institute of Technology)
Abstract: While PGAS (Partitioned Global Address Space) programming models facilitate productive parallel programming at both the intra-node and inter-node levels in homogeneous parallel systems, one open question is better support for accelerators, in particular, GPUs. We believe Chapel is well suited for this task due to its use of locales and domains to help abstract away low-level details of data and compute mappings for different compute nodes, as well as for different processing units (CPU vs. GPU) within a node. However, the shortcomings of past approaches on mapping Chapel onto GPUs include 1) no automatic code generation support for distributed or hybrid execution of forall loops across multiple CPU+GPU nodes and 2) no appropriate level of abstraction of the GPU API (i.e., programmers have to directly handle raw CUDA/HIP/OpenCL APIs if the automatic approach does not work). In this talk, we explore a GPU programming model that complies with Chapel’s multi-resolution concept, where programmers have the option of providing a high-level specification and also of diving into lower-level details to incrementally evolve their implementations for improved performance on multiple CPU+GPU nodes.
 
2:25–2:40: Chapel on Accelerators [slides | video | Q&A]
Rahul Ghangas, Josh Milthorpe (The Australian National University)
Abstract: This talk introduces the “Chapel on Accelerators” project, which proposes compiler extensions to provide Chapel's high-level constructs for hardware accelerators, mainly GPGPUs. GPUs form an important part of scientific computing, from physical simulations to deep learning, where offloading computation to GPUs can reduce running time by an order of magnitude. The project currently uses static/template OpenCL kernels to do computation on the GPU. Static kernels are used for well-known expressions, while template kernels are modified at compile time to support complex expressions. Current support includes offloading of promoted arithmetic operators and reduce expressions. During the talk, we discuss the current state of the project, its goals, and next steps. Finally, this talk also discusses an experimental idea, the “GPUArrays” library, which is a purely Chapel-based library that supports offloading of various array operations to the GPU. GPUArrays uses a lazy-evaluation-based approach to construct OpenCL kernels by grouping together multiple expressions involving GPUArrays.
 
 
Break
2:40–2:50:  Break
 
 
Implementing Chapel
Session chair: Ben Albrecht (Cray/HPE)
 
2:50–3:15: Paving the way for Distributed Non-Blocking Algorithms and Data Structures in the Partitioned Global Address Space model [slides | video | Q&A]
Garvit Dewan (Indian Institute of Technology, Roorkee), Louis Jenkins (University of Rochester)
Abstract: The partitioned global address space memory model has bridged the gap between shared and distributed memory, and with this bridge comes the ability to adapt shared memory concepts, such as non-blocking programming, to distributed systems such as supercomputers. To enable non-blocking algorithms, we present ways to perform scalable atomic operations on objects in remote memory via remote direct memory access and pointer compression. As a solution to the problem of concurrent-safe reclamation of memory in a distributed system, we adapt Epoch-Based Memory Reclamation to distributed memory and implement it such that it supports global-view programming. This construct is designed and implemented for the Chapel programming language but can be adapted and generalized to work on other languages and libraries.
 
3:20–3:35:  An Automated Machine Learning Approach for Data Locality Optimizations in Chapel [slides | video w/ Q&A]
Engin Kayraklioglu (Cray, a Hewlett Packard Enterprise company), Tarek El-Ghazawi (The George Washington University)
Abstract: This talk will cover a machine learning approach for automated locality optimizations in Chapel. With this approach, applications that do not have any optimizations specific to distributed memory can scale with almost no programmer effort.
 
 
Open Session
Session chair: Michael Ferguson (Cray/HPE)
 
3:35–?:??:  Open Discussion Session
This final session is designed to support open discussion and interaction among the CHIUW attendees, similar to what we'd normally do over dinner and drinks, though the precise format will be TBD depending on the number and energy of participants. If you would like to propose something specific here, please let us know.
 

 

Chapel Coding Day

 
TBD:  Chapel Coding Day
Traditionally, CHIUW has included a second day designed to support coding together while we're in one location. But since we won't be co-located this year, any coding day will take place either later or in small, individually scheduled groups, depending on the level of interest (a good topic of discussion for the final session on the 22nd).
 

 

Committee

General Chair:
  • Benjamin Robbins, Cray, a Hewlett Packard Enterprise Company

Steering Committee:
  • Michael Ferguson, Cray, a Hewlett Packard Enterprise Company
  • Mike Merrill, U.S. DOD
  • Nikhil Padmanabhan, Yale University
  • Marcin Zalewski, NVIDIA

Program Committee:
  • Brad Chamberlain (chair), Cray, a Hewlett Packard Enterprise Company
  • Cathie Olschanowsky (co-chair), Boise State University
  • Maryam Dehnavi, University of Toronto
  • Clemens Grelck, University of Amsterdam
  • Paul H. Hargrove, Lawrence Berkeley National Laboratory
  • Engin Kayraklioglu, Cray, a Hewlett Packard Enterprise Company
  • Milind Kulkarni, Purdue University
  • Josh Milthorpe, Australian National University
  • Tyler Simon, UMBC
  • Christian Terboven, RWTH Aachen University
  • Rich Vuduc, Georgia Tech
  • Marcin Zalewski, NVIDIA

 

Call For Participation (for archival purposes)