Tutorials
Recorded Chapel Tutorial (75 minutes)
Presented at ChapelCon 2024, June 2024.
Full Day Chapel Tutorial
- Slides:
- zip file of sample codes from slides
Papers
Featured Publications
This paper uses Chapel in a novel knowledge-sharing setting to support a general parallel framework for calibrating distributed hydrologic models. The approach is unique due to the use of a novel search algorithm as well as its interoperability with C#, fault tolerance, parallelism, and reliability.
This paper presents a compiler optimization that targets irregular memory access patterns in Chapel programs. Specifically, it uses static analysis to identify irregular memory accesses to distributed arrays in parallel loops and employs code transformations to generate an inspector and executor to perform selective data replication at runtime.
This paper revisits the design and implementation of tree search algorithms for multiple GPUs, with attention to scalability and productivity, using Chapel. The proposed algorithm exploits Chapel's distributed iterators by combining a partial search strategy with pre-compiled CUDA kernels for more efficient exploitation of the intra-node parallelism.
This paper describes a Computational Fluid Dynamics framework being developed using Chapel by a team at Polytechnique Montreal. The use of Chapel is described, and scaling results are given on up to 9k cores of a Cray XC. Comparisons are made against well-established CFD software packages.
Recent Publications
This work presents a local search for automatic parameterization of ChapelBB, a distributed tree search application for solving combinatorial optimization problems written in Chapel. The main objective of the proposed heuristic is to overcome the limitation of manual parameterization, which covers a limited feasible space.
This paper describes an implementation of Chapel's arrays that leverages the language's support for user-defined data distributions to implement the array using fabric-attached memory (FAM) rather than simply local DRAM.
This work compares the performance of Chapel-based fractal generation on shared- and distributed-memory platforms with corresponding OpenMP and MPI+X implementations.
Chapel Overviews
This is currently the best introduction to Chapel's history, motivating themes, and features. It also provides a brief summary of current and future activities at the time of writing. An early pre-print of this chapter was made available under the name A Brief Overview of Chapel.
This is an early overview of Chapel's themes and main language concepts.
Chapel Project Updates
This paper describes the progress that has been made with Chapel since the HPCS program wrapped up.
This paper provides a snapshot of the Chapel project at the juncture between the end of the HPCS project and the start of the next phase in Chapel's development. It covers past successes, current status, and future directions.
Chapel Optimizations
This paper describes a pair of recent compiler optimizations focused on reducing communication overheads in Chapel, leveraging Chapel's high-level abstractions—one that strength reduces local array accesses, and a second which aggregates communications to amortize overheads.
This paper describes an approach for efficiently training machine learning models that improve application execution times and scalability on distributed memory systems. It analyzes the fine-grained communication profile of the application with small input data, then predicts the communication patterns for more realistic inputs and coarsens the communication.
This paper describes how LLVM passes can optimize communication in PGAS languages like Chapel. In particular, by representing potentially remote addresses using a distinct address space, existing LLVM optimization passes can be used to reduce communication.
This paper describes an optimization implemented for Chapel in which the runtime library aggregates puts and gets in accordance with Chapel's memory consistency model in order to reduce the potential overhead of doing fine-grained communications.
Applications of Chapel
This paper compares Chapel with Julia, Python/Numba, and C+OpenMP in terms of performance, scalability and productivity. Two parallel metaheuristics are implemented for solving the 3D Quadratic Assignment Problem (Q3AP), using thread-based parallelism on a multi-core shared-memory computer. The paper also evaluates and compares the performance of the languages for a parallel fitness evaluation loop, using four different test functions with different computational characteristics. The authors provide feedback on the implementation and parallelization process in each language.
This paper applies hypergraph analytics over gigascale DNS data using CHGL, performing compute-intensive calculations for data reduction and segmentation. Identified portions are then sent to HNX for both exploratory analysis and knowledge discovery targeting known tactics, techniques, and procedures.
This paper uses Chapel to study the design and implementation of distributed Branch-and-Bound algorithms for solving large combinatorial optimization problems. Experiments on the proposed algorithms are performed using the Flow-shop scheduling problem as a test-case. The Chapel-based application is compared to a state-of-the-art MPI+Pthreads-based counterpart in terms of performance, scalability, and productivity.
This paper compares implementations of Breadth-First Search and Triangle Counting in Chapel and UPC++.
Multiresolution Chapel Features
This paper describes how users can create parallel iterators that support zippered iteration in Chapel, demonstrating them via several examples that partition iteration spaces statically and dynamically.
This paper builds on our HotPAR 2010 paper by describing the programmer's role in implementing user-defined distributions and layouts in Chapel.
This paper describes our approach and software framework for implementing user-defined distributions and memory layouts using Chapel's domain map concept.
Chapel Tools
This paper describes a tool that uses a combination of data-centric and code-centric information to relate performance profiling information back to user-level data structures and source code in Chapel programs.
This paper proposes a high-level, data-centric profiler to analyze how distributed arrays are used by each locale.
Chapel Explorations
This paper describes a high-level, easy-to-use language feature to improve data locality efficiently.
This paper explores the expression of parameterized diamond-shaped time-space tilings in Chapel, demonstrating competitive performance with C+OpenMP along with significant software engineering benefits due to Chapel's support for parallel iterators.
Chapel Historical Papers
This is the original Chapel paper which lays out some of our motivation and foundations for exploring the language. Note that the language has evolved significantly since this paper was published, but it remains an interesting historical artifact.
Presentations
Featured Presentations
This introduction to Chapel provides the language's motivation and brief comparisons with familiar languages and HPC programming models. It then introduces some of Chapel's core features for parallelism and locality, showing how they have recently been extended to also support GPUs. It wraps up by providing a peek into some of the flagship applications that are using Chapel.
This is a talk with demos that introduces the use of Chapel to program GPUs in a vendor-neutral manner.
This lightning talk illustrates Chapel implementing several variants of Bale IndexGather—a random access computation: serially, for multicore processors, for GPUs, and on supercomputers. It touts the benefits of parallel languages in making such computations straightforward, yet fast and scalable.
This talk demonstrates how, as a parallel language, Chapel's standard library can easily support parallel implementations, permitting codes that are as succinct as other popular languages to outperform them by 10x–400x. It is also unique among the talks here in that it includes a live demo of programming using Chapel and Arkouda.
Featured Presentations by Chapel Users
This talk describes the use of Chapel to estimate the biodiversity of coral reefs using satellite image analysis.
This talk describes the use of Chapel to compute exact diagonalization methods on distributed systems, as used when simulating small quantum systems.
Other Recent Presentations
This talk gives an introduction to Chapel's support for GPU programming, including live demos on AMD and NVIDIA GPUs.
This talk introduces Chapel's support for GPU programming through user codes making use of it today and sample code segments.
This talk provides an in-depth introduction to Chapel's support for GPU programming from motivation to key concepts, applications, implementation approach, and ongoing work.
This is an introduction to the motivation, capabilities, and performance of Arkouda, supporting interactive data science for Python users at massive scales.
Timeless Talks
This CHIUW keynote describes CHAMPS, a ~48k-line framework written in Chapel for 3D unstructured computational fluid dynamics (CFD), while also providing an introduction to the role of HPC in Aerodynamics. The productivity benefits that Chapel brings to the CHAMPS team's work are made clear.
This CHIUW keynote describes Arkouda, a Python package that provides a NumPy-like interface implemented using a Chapel server that scales to dozens of Terabytes of data at interactive rates.
This keynote by Jonathan Dursi presents a survey of modern parallel computing frameworks as seen through the filter of the speaker's applications background, and describes Chapel's unique position within that landscape.
This was the keynote talk at CHIUW 2016, reporting on the personal experiences of an Astrophysics Professor who's been looking at using Chapel in his research.
Chapel Overviews
This 10-minute talk provides a very brief introduction to Chapel, highlighting recent advances such as support for GPUs and user applications.
This talk provides background on Chapel, such as how it compares to other mainstream languages and HPC programming models, along with some of its benefits in the Arkouda and CHAMPS applications.
Applications of Chapel
This talk summarizes Chapel's use in CHAMPS and Arkouda, including some recent scaling results, and summarizes the use of traditional Chapel features to target GPUs in a vendor-neutral manner.
This keynote demonstrates how Chapel's support for task-parallelism is being used to express a wide variety of computations while also generating good performance and scalability.
This talk describes a use of Chapel to explore dark matter in cosmological models.
This talk describes the role of Chapel in supporting Exploratory Data Analysis (EDA) in Arkouda.
GPU Computing in Chapel
This talk describes Chapel's recently added support for GPU programming, detailing the programming model and code generation strategy.
Implementing and Optimizing Chapel
This talk gives a peek into what's required to compile some of Chapel's key features, and describes a pair of optimizations that are made possible through its unique features.
This keynote describes various forms of optimized and aggregated communications in Chapel for sparse communication patterns as exhibited by HPCC RA, Bale IndexGather, or Arkouda. Approaches include asynchronous fine-grain communications, manual copies expressed using Chapel's global namespace, and aggregation via user-level abstractions or compiler transformations.
This talk describes the use of Chapel's task-based parallel features to optimize communication through compiler analysis and/or user-defined aggregation abstractions.
This talk describes the Chapel memory consistency model and how it enables two communication optimizations that have been implemented for Chapel.
This talk is a fairly comprehensive overview of Chapel's themes, features, and status, with a bit more emphasis on the implementation and multiresolution design of the language than a typical talk allows for.
Chapel Design and Philosophy
This talk provides an update to the DOE community about recent Chapel progress, along with a retrospective about how we got here and some research challenges going forward.
This keynote provided a review of some of the productivity metrics that were pursued under the DARPA HPCS program, but then argued that productivity seems like a very personal/social decision and that it therefore should be studied in forums supporting personal/social decisions. Two specific proposals are made.
This talk surveys past approaches to benchmarking from a language designer's perspective, rating them along various axes of importance. It wraps up by advocating for an HPC equivalent to the Computer Language Benchmarks game.
This keynote talk reflects on some of the successes of ZPL's support for data-parallel array-based programming, lists reasons that ZPL was ultimately limited, and describes how we addressed those limitations in Chapel's design.
This talk and poster provide an introduction to Chapel's hierarchical locales, a Chapel concept for making the language and user codes future-proof against changes in node architecture.
This talk briefly summarizes productivity-oriented metrics work undertaken by the Cray Cascade project during the HPCS program, along with a few anecdotal instances of Chapel productivity. It also provides some of Brad's personal takeaways from the experience.
This talk lists some of the things that we think make HPC programming non-productive today and gives examples of how we are trying to address them in Chapel.
This talk considers five design decisions that parallel language designers should wrestle with and how Chapel's design deals with them.