The Chapel developer community is excited to announce the release of Chapel version 1.32! To obtain a copy, please refer to the Downloading Chapel page on the Chapel website.
Highlights of Chapel 1.32
Chapel 2.0 Release Candidate
The main highlight of Chapel 1.32 is that it is a release candidate for our forthcoming Chapel 2.0 release! If you’re not familiar with the concept of Chapel 2.0, it is intended to be a release that declares a core subset of the language and library features as ‘stable’. These features are ones that we intend to support in their current form going forward, such that code relying on them will not break across releases. Meanwhile, other features will be considered ‘unstable’, implying that they are ones where we are still learning from user experiences and refining interfaces before considering them to be stabilized. Unstable features may continue evolving after the 2.0 release, either by improving them until they too are stable, or replacing them with other, more stable features.
Chapel 1.32 being a 2.0 release candidate means that this is a key time for Chapel users to give us feedback about aspects of our design that they would like to see change prior to the 2.0 release. Users may also want to compile their programs with the `--warn-unstable` flag in order to identify any unstable features that they are currently relying upon. Reliance on such features could motivate you to advocate for stabilizing those features sooner, or you could simply view it as an opportunity to be aware that those features may continue to evolve over time. We are generally interested in hearing about which unstable features user code is currently relying upon, to help with our own prioritization efforts.
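As a minimal sketch of that workflow, the following shell snippet compiles a program with unstable-feature warnings enabled. The file name myProgram.chpl is a placeholder rather than anything from this release, and the guard simply skips the step when chpl or the file is not available.

```shell
# Placeholder source file; substitute your own program.
src="myProgram.chpl"

if command -v chpl >/dev/null 2>&1 && [ -f "$src" ]; then
  # --warn-unstable asks the compiler to report uses of unstable features.
  chpl --warn-unstable "$src"
else
  echo "skipping: chpl or $src not found"
fi
```

Any warnings reported this way are a natural starting point for the feedback requested above.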
Users with feedback about 2.0 readiness or the stability of current features are encouraged to share it with us on Chapel’s Discourse user forum or as a GitHub issue.
As part of the team’s push to make this a worthy Chapel 2.0 release candidate, Chapel 1.32 contains a large number of improvements to the language, compiler, and libraries. Some of these changes include:
- new warnings to encourage a programming style in which generic types are more clearly visible in a program’s source code
- a change in the default intent for arrays and record receivers (i.e., `this`) to `const` for greater uniformity with other types
- revised definitions of the compiler’s interpretation of `const` intents and default return/yield intents
- significant improvements to ranges, domains, and distributions, including converting distribution types to records, obviating the need for the `dmap` type
- major improvements to the `IO`, `Math`, `BigInteger`, and `Time` modules, including a new IO serialization framework for specifying how to read and write types to files orthogonally from the file’s format (see below for more detail)
For more information about these changes, and many others not summarized here, refer to the CHANGES.md file, documentation for Chapel 1.32, or forthcoming release note slides.
GPU Improvements
Version 1.32 includes significant improvements to Chapel’s support for vendor-neutral GPU programming, both in terms of performance and capabilities.
Key performance improvements include:
- compiler optimizations to reduce the number of pointer dereferences when accessing arrays within GPU kernels
- switching the default memory allocation scheme for arrays to ‘array_on_device’ mode, in which an array’s data is stored directly on the GPU rather than in managed memory
- a reduction in overheads when invoking math routines within GPU kernels by eliminating unnecessary boilerplate wrapper code
- using per-task GPU streams, which can enable communication-computation overlap to improve performance
The non-trivial impact of these optimizations can be seen in the following graphs, which show the improvements that have occurred in a Chapel port of the SHOC Sort benchmark on both NVIDIA and AMD GPUs. Note that the second graph includes data transfer times while the first does not.
Chapel’s support for AMD effectively reaches feature parity with NVIDIA in this release, largely due to the addition of a number of math routines that had not been supported for AMD in Chapel 1.31. In addition, the Chapel compiler’s `--savec` flag can now be used to inspect the assembly code generated when targeting AMD GPUs.
Meanwhile, when targeting NVIDIA GPUs, Chapel 1.32 adds support for generating multi-architecture binaries by setting `CHPL_GPU_ARCH` to a comma-separated list of target architectures.
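As a hedged sketch of what such a setting might look like, the snippet below selects two NVIDIA targets. The architecture names sm_70 and sm_80 are example values chosen for illustration, and the snippet assumes a Chapel installation configured for GPU support via CHPL_LOCALE_MODEL=gpu.

```shell
# Assumed settings for a GPU-enabled Chapel configuration.
export CHPL_LOCALE_MODEL=gpu

# Example values only: a comma-separated list of target architectures.
export CHPL_GPU_ARCH=sm_70,sm_80

# Print one target per line to confirm how the list splits.
echo "$CHPL_GPU_ARCH" | tr ',' '\n'
```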
See the latest GPU Programming technical note for additional details about these changes and Chapel’s overall support for GPUs in 1.32.
Support for Co-Locales
Since its inception, Chapel has preferred to represent each compute node as a single top-level locale, using multitasking to implement any intra-node parallelism. This approach has been beneficial in many problem domains where running a process per core could result in larger memory requirements or poor surface-to-volume effects due to the amount of SPMD parallelism (Single Program, Multiple Data: a static and coarse-grained style of parallelism in which multiple copies of the same program are executed, e.g., one per processor core).
However, as modern compute nodes have begun to support multiple NICs (Network Interface Chips, which permit processes to communicate with remote nodes), this traditional approach has faced challenges. Specifically, it is unduly complicated to have a single locale (UNIX process) leverage multiple NICs effectively; yet using just one NIC leaves potential performance benefits on the floor by not exercising the network to its full capacity.
To address this, Chapel 1.32 introduces user-facing support for co-locales, in which multiple locales can be mapped to a single compute node. Using co-locales can lead to performance improvements by making better use of the network and/or reducing the number of memory references that cross between sockets. For example, the following charts show improvements to a pair of benchmarks when run using two locales per node on a dual-NIC HPE Cray EX system using Slingshot 11:
Current support is limited to running a locale per socket on a given compute node, and is also limited to certain platforms and configurations:
- HPE Cray EX platforms with Slingshot 11 when using `CHPL_COMM=ofi`
- InfiniBand-based systems when using `CHPL_COMM=gasnet` with `CHPL_COMM_SUBSTRATE=ibv`
- Configurations using `CHPL_LAUNCHER=slurm-srun` or `pbs-gasnetrun_ibv`
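For instance, one combination drawn from the list above could be expressed as the following environment settings. This is only a sketch: it assumes an HPE Cray EX system with Slingshot 11 and a Chapel installation built for this configuration.

```shell
# One supported combination from the list above (HPE Cray EX, Slingshot 11).
export CHPL_COMM=ofi
export CHPL_LAUNCHER=slurm-srun

# Show the resulting settings.
echo "CHPL_COMM=$CHPL_COMM CHPL_LAUNCHER=$CHPL_LAUNCHER"
```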
To opt in to using co-locales, specify the number of locales for your Chapel program using a product of nodes and locales per node. For example, the following invocation:
$ ./myChapelProgram -nl 8x2
says to run the Chapel program on 8 nodes with 2 locales per node, for a total of 16 locales.
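The arithmetic behind that argument can be sketched with a small shell function. The helper total_locales is hypothetical and not part of Chapel; it only illustrates how a nodes-by-locales-per-node value expands into a total locale count.

```shell
# Hypothetical helper (not part of Chapel): expand an "-nl" argument of the
# form NODESxLOCALES_PER_NODE into the total number of locales.
total_locales() {
  case "$1" in
    *x*)
      nodes="${1%%x*}"      # text before the "x"
      per_node="${1#*x}"    # text after the "x"
      echo $(( nodes * per_node ))
      ;;
    *)
      echo "$1"             # no "x": the value is already the locale count
      ;;
  esac
}

total_locales 8x2   # prints 16
```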
For more information on using co-locales with Chapel, please refer to the online documentation.
IO Serialization Framework
The IO serialization framework that was prototyped in Chapel 1.31 is now used by default for calls like `writeln()` and `read()`, and it is also available for use with types written by end-users.
As an illustration, consider an example that prints an array of integers in a couple of different formats. A normal `writeln()` prints the array to the standard console output (`stdout`) using Chapel’s traditional format: one element at a time, separated by spaces. Creating a variant of `stdout` that uses the JSON serializer for all `write()`s called on it, and then writing the array to that output stream, prints it using standard JSON formatting instead. Other current serializers support binary, YAML, and Chapel syntax as alternate formats.
The new serialization framework also includes deserializers, which support reading values back in from the given format. And most importantly, users can now define their own methods specifying how their types should be written or read. This can be done in a format-neutral manner for simplicity, or in a way that’s sensitive to the output format when needed. For more information on defining these methods, please refer to their online documentation.
Improved ARM64 Support
Thanks to our colleagues on the Qthreads team at Sandia National Laboratories, support for ARM64 chips is significantly improved in Chapel 1.32. Specifically, this release bundles version 1.19 of Qthreads, in which task creation and switching have been re-implemented using assembly code for ARM64 chips. This can dramatically reduce multitasking overheads when using Chapel’s preferred `CHPL_TASKS=qthreads` mode.
As a simple illustration, the following table shows the impact of this fast task switching on a 16-node run of Bale Index Gather using various implementation strategies:
Approach | w/out fast tasks | with fast tasks | improvement |
---|---|---|---|
ordered | 70.7 MB/s/node | 84.7 MB/s/node | 1.20x |
ordered, oversubscribed | 86.3 MB/s/node | 140.4 MB/s/node | 1.63x |
unordered | 147.5 MB/s/node | 152.3 MB/s/node | 1.03x |
aggregated | 1352.0 MB/s/node | 1448.5 MB/s/node | 1.07x |
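The improvement column is the ratio of the two throughput figures in each row. As a quick check, the following sketch recomputes it with awk from the values in the table; the row labels are shortened here for convenience.

```shell
# Recompute "improvement" as (with fast tasks) / (without fast tasks).
# Input columns: label | without fast tasks | with fast tasks (MB/s/node).
speedups=$(awk -F'|' '{ printf "%s %.2fx\n", $1, $3 / $2 }' <<'EOF'
ordered|70.7|84.7
ordered-oversubscribed|86.3|140.4
unordered|147.5|152.3
aggregated|1352.0|1448.5
EOF
)

echo "$speedups"
```

The largest gain appears in the oversubscribed case, which is consistent with task switching being exercised most heavily there.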
In addition, Qthreads 1.19 also improved portability for ARM64-based platforms. This enables the use of `CHPL_TASKS=qthreads` on a wider variety of systems, such as M1/M2 Macs, where it is now the default.
And much more…
Beyond the highlights mentioned here, Chapel 1.32 contains numerous other improvements to Chapel’s features and interfaces, such as:
- initial support for array allocations that will throw if the system is out of memory
- a more robust set of types and routines for dealing with C pointer types, particularly with respect to `const`-ness
- initial support for interface declarations, to opt in to special methods like the serialization methods mentioned above
- features for power users to better understand the vectorization and transformation of their Chapel programs
- support for selecting between processor types on chips with heterogeneous processing units
For a more complete list of changes in Chapel 1.32, please refer to its CHANGES.md file.
For More Information
For questions about any of the changes in this release, please reach out to the developer community on Discourse.
As always, we’re interested in feedback on how we can help make the Chapel language, libraries, implementation, and tools more useful to you in your work.
And, as always, thanks to everyone who contributed to the Chapel 1.32 release!