Chapel Launchers

When compiling Chapel programs for multiple locales, the compiler typically generates two binaries (e.g., myprogram and myprogram_real). The first is a launcher binary that executes the appropriate command(s) to get your program up and running on multiple locales, while the second contains your actual program code.

The goals of the launcher binary are:

- to wrap details of job startup in a portable way so that new users can quickly get Chapel programs up and running on an unfamiliar platform.
- to perform command-line parsing and error checking prior to waiting in a queue or firing off a parallel job in order to save time and resources related to simple errors/typos in the command line.
- to preserve Chapel’s global-view programming model by permitting the user to run their program using a single binary (corresponding to the single logical task that executes main()) without getting bogged down in questions of numbers of nodes, numbers of cores per node, numbers of program instances to start up, etc.
- if necessary, to coordinate runtime functional activity, such as I/O.

Executing a Chapel program using the verbose (-v) flag will typically print out the command(s) used to launch the program, along with any environment variables the launcher set on its behalf. It will also cause the program itself to print additional information about how it configured itself, though most of this will be of more interest to Chapel developers than regular users.

Executing using the help (-h/--help) flag will typically print out any launcher-specific options in addition to the normal help message for the program itself.

You can also execute the Chapel launcher with the --dry-run flag. This will not actually run or launch the user program, but instead simply print the same thing as -v: the command(s) that would have been used to launch the program, along with any environment variables the launcher would have set on its behalf. Note that --dry-run will also cause batch and other files created for the system launcher to be left behind so you can inspect and/or reuse them. Normally these are removed by the Chapel launcher when the program finishes. An example of such a file would be the sbatch file created when a Slurm-based launcher is used and CHPL_LAUNCHER_USE_SBATCH is set.
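
As a rough illustration of the --dry-run flag described above, the following sketch wraps it in a small helper. The helper name show_launch and the program name ./myprogram are hypothetical; only the -nl and --dry-run flags come from this document.

```shell
# Hypothetical helper: show how a launcher binary would start a program.
# $1 is the launcher binary produced by chpl; $2 is the number of locales.
show_launch() {
  prog=$1
  nl=$2
  if [ ! -x "$prog" ]; then
    echo "launcher binary not found: $prog" >&2
    return 1
  fi
  # Print the launch command(s) and environment settings without running.
  "$prog" -nl "$nl" --dry-run
}
```

For example, show_launch ./myprogram 2 would print what the launcher would do for a two-locale run, assuming ./myprogram was built for multiple locales.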

Currently Supported Launchers

Currently supported launchers include:

Launcher Name	Description
amudprun	GASNet launcher for the UDP substrate
aprun	Cray application launcher using aprun
gasnetrun_ibv	GASNet launcher for the InfiniBand substrate
gasnetrun_mpi	GASNet launcher for the MPI substrate
mpirun4ofi	provisional launcher for CHPL_COMM=ofi on non-Cray systems
lsf-gasnetrun_ibv	GASNet launcher for LSF (bsub) and the InfiniBand substrate
pals	Cray application launcher for PALS on HPE Cray EX systems
pbs-aprun	Cray application launcher for PBS (qsub) + aprun
pbs-gasnetrun_ibv	GASNet launcher for PBS (qsub) and the InfiniBand substrate
slurm-gasnetrun_ibv	GASNet launcher for Slurm and the InfiniBand substrate
slurm-gasnetrun_mpi	GASNet launcher for Slurm and the MPI substrate
slurm-gasnetrun_ofi	GASNet launcher for Slurm and the OFI substrate
slurm-srun	native Slurm launcher
smp	GASNet launcher for the shared-memory substrate
none	do not use a launcher

A specific launcher can be explicitly requested by setting the CHPL_LAUNCHER environment variable. For the specific case of the mpirun4ofi launcher, please see Using Chapel with libfabric.

If CHPL_LAUNCHER is left unset, a default is picked as follows:

- if CHPL_COMM is gasnet and CHPL_COMM_SUBSTRATE is udp, CHPL_LAUNCHER is set to amudprun
- otherwise, if CHPL_TARGET_PLATFORM is cray-xc or hpe-cray-ex:

  If	CHPL_LAUNCHER
  both aprun and srun in user’s path	none
  aprun in user’s path	aprun
  srun in user’s path	slurm-srun
  otherwise	none

- otherwise, if CHPL_TARGET_PLATFORM is cray-cs and CHPL_COMM is gasnet and salloc is in the user’s path:

  If	CHPL_LAUNCHER
  CHPL_COMM_SUBSTRATE=ibv	slurm-gasnetrun_ibv
  CHPL_COMM_SUBSTRATE=mpi	slurm-gasnetrun_mpi

- otherwise, if CHPL_TARGET_PLATFORM is cray-cs and srun is in the user’s path, CHPL_LAUNCHER is set to slurm-srun
- otherwise, if CHPL_COMM is gasnet:

  If	CHPL_LAUNCHER
  CHPL_COMM_SUBSTRATE=ibv	gasnetrun_ibv
  CHPL_COMM_SUBSTRATE=mpi	gasnetrun_mpi
  CHPL_COMM_SUBSTRATE=smp	smp
  otherwise	none

- otherwise, CHPL_LAUNCHER is set to none
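
The selection rules above can be read as a chain of checks. The following is a minimal sketch of that chain, not Chapel's actual implementation; the function names in_path and default_chpl_launcher are hypothetical. One assumption is marked in the code: the cray-cs table lists no outcome for substrates other than ibv and mpi, so the sketch prints none in that case.

```shell
# Sketch only: mirrors the documented rules, not Chapel's actual code.

# Succeeds if the named command is found in the user's path.
in_path() { command -v "$1" >/dev/null 2>&1; }

# Prints the launcher the documented rules would pick for the current
# environment. The body runs in a subshell so helper variables do not leak.
default_chpl_launcher() (
  comm=${CHPL_COMM:-}
  substrate=${CHPL_COMM_SUBSTRATE:-}
  platform=${CHPL_TARGET_PLATFORM:-}

  if [ "$comm" = gasnet ] && [ "$substrate" = udp ]; then
    echo amudprun
  elif [ "$platform" = cray-xc ] || [ "$platform" = hpe-cray-ex ]; then
    if in_path aprun && in_path srun; then
      echo none
    elif in_path aprun; then
      echo aprun
    elif in_path srun; then
      echo slurm-srun
    else
      echo none
    fi
  elif [ "$platform" = cray-cs ] && [ "$comm" = gasnet ] && in_path salloc; then
    case $substrate in
      ibv) echo slurm-gasnetrun_ibv ;;
      mpi) echo slurm-gasnetrun_mpi ;;
      *)   echo none ;;  # assumption: no other substrate is documented here
    esac
  elif [ "$platform" = cray-cs ] && in_path srun; then
    echo slurm-srun
  elif [ "$comm" = gasnet ]; then
    case $substrate in
      ibv) echo gasnetrun_ibv ;;
      mpi) echo gasnetrun_mpi ;;
      smp) echo smp ;;
      *)   echo none ;;
    esac
  else
    echo none
  fi
)
```

Calling default_chpl_launcher with the relevant CHPL_ variables set prints the name the rules would select, which can be a quick way to sanity-check an environment before compiling.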

If the launcher binary does not work for your system (e.g., due to an installation-specific configuration), you can often use the --dry-run flag to capture the commands that the launcher would have executed on your behalf and customize them for your needs.

Forwarding Environment Variables

Chapel launchers generally arrange for environment variables to be forwarded to worker processes. However, this strategy is not always reliable. The remote system may override some environment variables, and some launchers might not correctly forward all environment variables.

CHPL_RT_MASTERIP

This environment variable is used to specify the IP address which should be used to connect. By default, the node creating the connection will pass the result of gethostname() on to the nodes that need to connect to it, which will resolve that to an IP address using gethostbyname().

When CHPL_COMM == gasnet, this will also be used to set the value of GASNET_MASTERIP, which corresponds to the hostname of the master node (see https://gasnet.lbl.gov/dist/udp-conduit/README ).
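
A minimal sketch of setting this variable before launching is shown below. The address is a placeholder taken from the reserved documentation range, not a value from this document. One situation where an override might be useful (an assumption, not stated above) is when the default hostname lookup resolves to an address the other nodes cannot reach.

```shell
# 192.0.2.10 is a placeholder from the documentation address range;
# replace it with an address of the master node that other nodes can reach.
export CHPL_RT_MASTERIP=192.0.2.10
```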

CHPL_RT_WORKERIP

This environment variable is used to specify the IP address which should be used to communicate between worker nodes. By default, worker nodes will communicate among themselves using the same interface used to connect to the master node (see CHPL_RT_MASTERIP, above).

When CHPL_COMM == gasnet, this will also be used to set the value of GASNET_WORKERIP (see https://gasnet.lbl.gov/dist/udp-conduit/README ).

Using Slurm

To use native Slurm, set:

export CHPL_LAUNCHER=slurm-srun

On Cray systems, this will happen automatically if srun is found in your path, but not when both srun and aprun are found in your path. Native Slurm is the best option where it works, but at the time of this writing, there are problems with it when combined with CHPL_COMM=gasnet and the UDP or InfiniBand conduits. So, for these configurations please see:

- Using Chapel with InfiniBand for information about using Slurm with InfiniBand.
- Using the UDP Conduit with Slurm for information about using Slurm with the UDP conduit.

Common Slurm Settings

Optionally, you can specify a node access mode by setting the environment variable CHPL_LAUNCHER_NODE_ACCESS. It will default to exclusive access, but can be overridden to:
- shared to give shared access to nodes
- unset to use the system default and not specify a node access mode
- exclusive to give exclusive access to nodes (this is the default)
For example, to grant shared node access, set:
```
export CHPL_LAUNCHER_NODE_ACCESS=shared
```
Optionally, you can specify a Slurm partition by setting the environment variable CHPL_LAUNCHER_PARTITION. For example, to use the ‘debug’ partition, set:
```
export CHPL_LAUNCHER_PARTITION=debug
```
Optionally, you can specify a Slurm nodelist by setting the environment variable CHPL_LAUNCHER_NODELIST. For example, to use node nid00001, set:
```
export CHPL_LAUNCHER_NODELIST=nid00001
```
Optionally, you can specify a Slurm constraint by setting the environment variable CHPL_LAUNCHER_CONSTRAINT. For example, to use nodes with the ‘cal’ feature (as defined in the slurm.conf file), set:
```
export CHPL_LAUNCHER_CONSTRAINT=cal
```
Optionally, you can specify a Slurm account by setting the environment variable CHPL_LAUNCHER_ACCOUNT. For example, to use the account ‘acct’, set:
```
export CHPL_LAUNCHER_ACCOUNT=acct
```
If the environment variable CHPL_LAUNCHER_USE_SBATCH is defined then sbatch is used to launch the job to the queue system, rather than running it interactively as usual. In this mode, the output will be written by default to a file called <executableName>.<jobID>.out. The environment variable CHPL_LAUNCHER_SLURM_OUTPUT_FILENAME can be used to specify a different filename for the output.
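
For illustration, the batch mode described above might be enabled as follows. This is a sketch under assumptions: the text only says the variable must be defined, so the value 1 is an arbitrary choice, and the output file name is hypothetical.

```shell
# The document only requires this variable to be defined; 1 is arbitrary.
export CHPL_LAUNCHER_USE_SBATCH=1
# Hypothetical output file name, overriding <executableName>.<jobID>.out.
export CHPL_LAUNCHER_SLURM_OUTPUT_FILENAME=myprogram.out
```

With these set, running a program built with a Slurm-based launcher should submit it through sbatch and write its output to the named file.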

Using any SSH-based launcher with Slurm

It is possible to use any SSH-based launcher with Slurm, with some additional effort. This strategy can come in handy if other launchers are not working. However, launchers such as slurm-srun and slurm-gasnetrun_ibv offer a better experience.

First, let’s see how to use an SSH-based launcher with an interactive salloc session. Here we will assume the UDP conduit, but any other launcher supporting SSH can be configured analogously.

# Compile a sample program
chpl -o hello6-taskpar-dist examples/hello6-taskpar-dist.chpl

# Reserve 2 nodes for an interactive run
salloc -N 2
# Then, within the salloc shell

  # Specify that ssh should be used
  export GASNET_SPAWNFN=S
  # Specify the list of nodes to use; SSH_SERVERS can also be used
  export GASNET_SSH_SERVERS=`scontrol show hostnames | xargs echo`
  # Run the program on the 2 reserved nodes.
  ./hello6-taskpar-dist -nl 2

This strategy can also be used within an sbatch script. Here is an example script to save to the file job.bash:

#!/bin/bash
#SBATCH -t 0:10:0
#SBATCH --nodes=2
#SBATCH --exclusive
#SBATCH --partition=chapel
#SBATCH --output=job.output

export GASNET_SPAWNFN=S
export GASNET_SSH_SERVERS=`scontrol show hostnames | xargs echo`

./hello6-taskpar-dist -nl 2

To run this job, use:

sbatch job.bash

and when it completes, the output will be available in job.output as specified in job.bash.
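
Both examples above build GASNET_SSH_SERVERS by piping a host list through xargs echo. The sketch below shows what that pipeline does, using a stand-in function in place of scontrol show hostnames (which prints one hostname per line); the function name list_hosts and the node names are made up.

```shell
# Stand-in for `scontrol show hostnames`; the node names are made up.
list_hosts() { printf '%s\n' nid00001 nid00002; }

# xargs echo joins the lines into a single space-separated list.
servers=$(list_hosts | xargs echo)
echo "$servers"
```

This prints nid00001 nid00002 on a single line, which is the form assigned to GASNET_SSH_SERVERS in the examples above.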

Changing the _real binary suffix

In order to support profiling tools that produce new binaries for the launcher to execute, the suffix of the real binary executed by the launcher may be changed with the CHPL_LAUNCHER_SUFFIX environment variable. If this variable is unset, the suffix defaults to “_real”, matching the compiler’s output.
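
As a small illustration of the naming involved, the sketch below sets a hypothetical suffix and prints the binary name that a launcher called myprogram would then be expected to execute. The suffix _real_prof and the program name are assumptions, and the echo line only mimics the naming; it is not the launcher's own code.

```shell
# Hypothetical suffix for a binary produced by a profiling tool.
export CHPL_LAUNCHER_SUFFIX=_real_prof

# Illustration only: launcher name plus suffix, defaulting to _real.
echo "myprogram${CHPL_LAUNCHER_SUFFIX:-_real}"
```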

Bypassing the launcher

If the Chapel launcher capability fails you completely, set CHPL_LAUNCHER to none, recompile, and execute the resulting binary according to the following rules using tools and queueing mechanisms appropriate for your system:

- On most systems, the number of locales should be equal to the number of nodes on which you execute. That in turn should match the number of copies of the program that you are running.
- Some queueing systems require you to specify the number of cores to use per node. For best results, you will typically want to use all of them. All intra-node parallelism is typically implemented using Chapel’s threading layer (e.g., pthreads), so extra copies of the binary are not required per core.
- In our experience, this technique does not work for InfiniBand configurations.
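
To make the first rule concrete, here is a sketch that prints a command line starting one copy of the program per node, with the locale count matching the node count. It assumes Slurm's srun is the system tool in use and that the binary was compiled with CHPL_LAUNCHER set to none; the helper name manual_launch_cmd and the program name are hypothetical. Whether such a command is sufficient on a given system depends on its configuration.

```shell
# Hypothetical helper: print an srun command line that starts one copy of
# a launcher-less binary ($1) on each of N nodes ($2) and passes N as -nl.
manual_launch_cmd() {
  echo "srun --nodes=$2 --ntasks-per-node=1 $1 -nl $2"
}

manual_launch_cmd ./myprogram 2
```

The helper only prints the command so that it can be reviewed and adapted before being run.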

Additional launchers

In addition to the supported launchers listed above, there are several others that are not actively maintained but may still work.

Launcher Name	Description
mpirun	launch using mpirun (no mpi comm currently)
