The Chapel Parallel Programming Language

 

CHIUW 2018

The 5th Annual Chapel Implementers and Users Workshop

Friday May 25, 2018 (mini-conference day)
Saturday May 26, 2018 (code camp)
 

 
32nd IEEE International Parallel & Distributed Processing Symposium
JW Marriott Parq Vancouver, Vancouver, British Columbia, Canada

Introduction: CHIUW 2018—the fifth annual Chapel Implementers and Users Workshop, to be held in conjunction with IEEE IPDPS 2018—will continue our annual series of workshops designed to bring developers and users of the Chapel language (chapel-lang.org) together to present and discuss work being done across the broad open-source community. Attendance is open to anyone interested in Chapel, from the most seasoned Chapel user or developer to someone simply curious to learn more.

Registration: Register for CHIUW 2018 via the IPDPS registration site. If you're only attending CHIUW, select a one-day registration. To attend other days at IPDPS, select from the other options, as appropriate.

 

Friday, May 25, 2018 (Mini-Conference Day)

 
Pre-Workshop
 
8:30 - 9:00:  Chapel 101 (Optional)
TBD (Cray Inc.)
This is a completely optional session held prior to the official start of the workshop for those who are new to Chapel and looking for a crash-course, or for those who would simply like a refresher.
 
 
Introduction (session 1)
9:00 - 9:30:  Welcome, State of the Project
Brad Chamberlain (Cray Inc.)
 
 
Morning Break
9:30 - 10:00:  Break (catered by IPDPS)
 
 
Applications of Chapel (session 2)
Session chair: TBD
 
10:00 - 10:20:  Parallel Sparse Tensor Decomposition in Chapel
Thomas Rolinger (University of Maryland), Tyler Simon (Laboratory for Physical Sciences), Christopher Krieger (Laboratory for Physical Sciences)
Abstract: In big-data analytics, using tensor decomposition to extract patterns from large, sparse multivariate data is a popular technique. Many challenges exist for designing parallel, high performance tensor decomposition algorithms due to irregular data accesses and the growing size of tensors that are processed. There have been many efforts at implementing shared-memory algorithms for tensor decomposition, most of which have focused on the traditional C/C++ with OpenMP framework. However, Chapel is becoming an increasingly popular programming language due to its expressiveness and simplicity for writing scalable parallel programs. In this work, we port a state-of-the-art C/OpenMP parallel sparse tensor decomposition tool, SPLATT, to Chapel. We present a performance study that investigates bottlenecks in our Chapel code and discuss approaches for improving its performance. Also, we discuss features in Chapel that would have been beneficial to our porting effort. We demonstrate that our Chapel code is competitive with the C/OpenMP code for both runtime and scalability, achieving 83%-96% performance of the original code and near-linear scalability up to 32 cores.
 
10:20 - 10:40: Iterator-Based Optimization of Imperfectly-Nested Loops
Daniel Feshbach (Haverford College), Mary Glaser (Haverford College), Michelle Strout (University of Arizona), and David Wonnacott (Haverford College)
Abstract: Effective optimization of dense array codes often depends upon the selection of the appropriate execution order for the iterations of nested loops. Tools based on the Polyhedral Model have demonstrated dramatic success in performing such optimizations on many such codes, but others remain an area of active research, leaving programmers to optimize code in other ways.

Bertolacci et al. demonstrated that programmer-defined iterators can be used to explore iteration-space reorderings, and that Cray’s compiler for the Chapel language can optimize such codes to be competitive with polyhedral tools. This “iterator-based” approach allows programmers to explore iteration orderings not identified by automatic optimizers, but was only demonstrated for perfectly-nested loops, and lacked any system for warning about an iterator that would produce an incorrect result.

We have now addressed these shortcomings of iterator-based loop optimization, and explored the use of our improved techniques to optimize the imperfectly-nested loops that form the core of Nussinov’s algorithm for RNA secondary-structure prediction. Our C++ iterator provides performance that equals the fastest C code, several times faster than was achieved by using the same C compiler on the code with the original iteration ordering, or the code produced by the Pluto loop optimizer. Our Chapel iterators produce run-time that is competitive with the equivalent iterator-free Chapel code, though the Chapel performance still does not equal that of the C/C++ code.

We have also implemented an iterator that produces an incorrect-but-fast version of Nussinov’s algorithm, and used this iterator to illustrate our approaches to error-detection. Manual application of our compile-time error-detection algorithm (which has yet to be integrated into a compiler) identifies this error, as does the run-time approach that we use for codes on which the static test proves inconclusive.

 
10:40 - 11:00:  Investigating Data Layout Transformations in Chapel
Apan Qasem (Texas State University), Ashwin Aji, and Mike Chu (AMD)
Abstract: Heterogeneous node architectures are quickly becoming the de facto choice in scalable supercomputers. Efficient layout and placement of shared data structures is critical in attaining desired performance on such systems. However, with most high-level programming languages, the programmer has to manually explore the optimal data organization strategy for their workloads. This paper explores automatic and semi-automatic data layout transformations for heterogeneous memory architectures using Chapel as a reference high-level language. We first identify computation and data access patterns that are problematic for hybrid nodes, then propose solutions to rectify these situations by converting inferior data layouts to efficient ones, and finally outline implementation strategies in Chapel. We demonstrate that the domain map feature in Chapel can be leveraged to implement sophisticated layout transforms for heterogeneous memory systems. Preliminary evaluation shows that the proposed transformations can make up to an order-of-magnitude difference in performance for GPU kernels with certain characteristics.
 
 
Quick Break
11:00 - 11:10:  Quick Break
 
 
Chapel Design and Evolution (session 3)
Session chair: TBD
 
11:10 - 11:30:  Transitioning from Constructors to Initializers in Chapel [extended abstract]
Lydia Duncan and Michael Noakes (Cray Inc.)
 
11:30 - 11:50:  RCUArray: An RCU-like Parallel-Safe Distributed Resizable Array
Louis Jenkins (Bloomsburg University)
Abstract: I present RCUArray, a parallel-safe distributed array that allows for read and update operations to occur concurrently with a resize. As Chapel lacks thread-local and task-local storage, I also present a novel extension to the Read-Copy-Update synchronization strategy that functions without the need for either. At 32 nodes with 44 cores per node, the RCUArray’s relative performance to an unsynchronized Chapel block distributed array is as little as 20% for read and update operations, but with runtime support for zero-overhead RCU and thread-local or task-local storage it has the potential to be near-equivalent; relative performance for resize operations is as much as 3600% due to the novel design.
 
11:50 - 12:10:  Adding Lifetime Checking to Chapel [extended abstract]
Michael Ferguson (Cray Inc.)
 
 
Lunch
12:10 - 1:40:  Lunch (in ad hoc groups or on your own)
 
 
Keynote Talk
Session chair: TBD
1:40 - 2:40:  Title TBD
Katherine Yelick (UC Berkeley / Lawrence Berkeley National Laboratory)
Abstract: TBD

Bio: Katherine (Kathy) Yelick is a Professor of Electrical Engineering and Computer Sciences at UC Berkeley and the Associate Laboratory Director (ALD) for Computing Sciences at Lawrence Berkeley National Laboratory. Her research is in high performance computing, programming languages, compilers, parallel algorithms, and automatic performance tuning. She currently leads the Berkeley UPC project and co-leads the Berkeley Benchmarking and Optimization (Bebop) group. As ALD for Computing Sciences at LBNL, she oversees the National Energy Research Scientific Computing Center (NERSC), the Energy Sciences Network (ESnet) and the Computational Research Division (CRD), which covers applied math, computer science, data science and computational science.

A longer bio and CV can be found here.

 
 
Chapel Performance (session 4)
Session chair: TBD
 
2:40 - 3:00: Tales from the Trenches: Whipping Chapel Performance into Shape [extended abstract]
Elliot Ronaghan, Ben Harshbarger, and Greg Titus (Cray Inc.)
 
 
Afternoon Break
 
3:00 - 3:30:  Break (catered by IPDPS)
 
 
Tools (session 5)
Session chair: TBD
 
3:30 - 3:50: Purity: An Integrated, Fine-Grain, Data-Centric, Communication Profiler for the Chapel Language
Richard Johnson and Jeffrey Hollingsworth (University of Maryland)
Abstract: We present Purity, a configurable, data-centric, communication profiler for the Chapel language that analyzes memory and communication access patterns in a multi-node PGAS environment. By integrating Purity into the compiler and runtime framework of Chapel we can instrument Chapel programs to capture memory and communication operations and produce both online and fine-grain post execution reporting. Our profiler is equipped with a sampling mechanism for reducing overhead, handles complex data structures, and generates detailed execution profiles that map data motion to the variable, field, loop, and node levels for both distributed and non-distributed instantiations. In a case study, Purity provided valuable insight into task and data locality which allowed us to develop a programmatic solution for reducing nearly 90% of remote operations in SSCA#2.
 
3:50 - 4:10:  ChplBlamer: A Data-centric and Code-centric Combined Profiler for Multi-locale Chapel Programs [extended abstract]
Hui Zhang and Jeffrey Hollingsworth (University of Maryland)
 
4:10 - 4:30:  Mason, Chapel's Package Manager [extended abstract]
Ben Albrecht (Cray Inc.), Sam Partee (Haverford College), Ben Harshbarger, and Preston Sahabu (Cray Inc.)
 
 
Lightning Talks and Flash Discussions (session 6)
Session chair: TBD
4:30 - 5:30:  Lightning Talks and Flash Discussions
This final session will feature 5–10 minute time slots (depending on the number of participants) in which community members can give short talks, lead discussions on current hot topics of interest, do demos, etc. Sign up on-site or let us know of your interest beforehand.
 
5:30 -       :  Adjourn for Dinner (in ad hoc groups or on your own)
   

 

Saturday, May 26, 2018 (Code Camp)

 
Chapel Code Camp
(room: TBD)
 
8:30 - ?:??: Chapel Code Camp
The Chapel code camp is an annual chance to work cooperatively on coding problems or discussion topics while we're in one place. Members of the core Chapel team will be on-hand to partner with members of the community on topics of interest. If you're interested in attending and/or submitting an activity to pursue, please see the call for participation for more information.
 
 
Lunch
 
12:00 -      : Lunch (in ad hoc groups or on your own)

 

Committee

General Chairs:

  • Michael Ferguson, Cray Inc.
  • Nikhil Padmanabhan (co-chair), Yale University
Program Committee:
  • Brad Chamberlain (chair), Cray Inc.
  • Aparna Chandramowlishwaran (co-chair), University of California, Irvine
  • Mike Chu, AMD
  • Anshu Dubey, Argonne National Laboratory
  • Jonathan Dursi, The Hospital for Sick Children, Toronto
  • Hal Finkel, Argonne National Laboratory
  • Marta Garcia Gasulla, Barcelona Supercomputing Center
  • Clemens Grelck, University of Amsterdam
  • Jeff Hammond, Intel
  • Bryce Lelbach, Nvidia
  • Michelle Strout, University of Arizona
  • Kenjiro Taura, University of Tokyo
  • David Wonnacott, Haverford College

 

Call For Participation (for archival purposes)