When compiling Chapel programs for multiple locales, the compiler typically creates a launcher binary that executes the appropriate command(s) to get your program started. In this case, two binaries are typically generated (e.g., myprogram and myprogram_real): the first contains code to get your program up and running on multiple locales, while the second contains your actual program code.
The goals of the launcher binary are:
- to wrap details of job startup in a portable way so that new users can quickly get Chapel programs up and running on an unfamiliar platform.
- to perform command-line parsing and error checking prior to waiting in a queue or firing off a parallel job in order to save time and resources related to simple errors/typos in the command line.
- to preserve Chapel's global-view programming model by permitting the user to run their program using a single binary (corresponding to the single logical task that executes main()) without getting bogged down in questions of numbers of nodes, numbers of cores per node, numbers of program instances to start up, etc.
- if necessary, to coordinate runtime functional activity, such as I/O.
Executing a Chapel program using the verbose (-v) flag will typically print out the command(s) used to launch the program.
Executing using the help (--help) flag will typically print out any launcher-specific options in addition to the normal help message for the program itself.
Currently Supported Launchers
Currently supported launchers include:
|amudprun||GASNet launcher for programs running over UDP|
|aprun||Cray application launcher using aprun|
|gasnetrun_ibv||GASNet launcher for programs running over Infiniband|
|gasnetrun_mpi||GASNet launcher for programs using the MPI conduit|
|gasnetrun_ofi||GASNet launcher for programs using the OFI conduit|
|gasnetrun_psm||GASNet launcher for programs running over OmniPath|
|lsf-gasnetrun_ibv||GASNet launcher using LSF (bsub) over Infiniband|
|pbs-aprun||Cray application launcher using PBS (qsub) + aprun|
|pbs-gasnetrun_ibv||GASNet launcher using PBS (qsub) over Infiniband|
|slurm-gasnetrun_ibv||GASNet launcher using SLURM over Infiniband|
|slurm-srun||native SLURM launcher|
|smp||GASNet launcher for programs running over shared-memory|
|none||do not use a launcher|
A specific launcher can be explicitly requested by setting the CHPL_LAUNCHER environment variable. If left unset, a default is picked as follows:
- if CHPL_PLATFORM is cray-xc, cray-xe, or cray-xk:
|If||CHPL_LAUNCHER|
|both aprun and srun in user's path||none|
|aprun in user's path||aprun|
|srun in user's path||slurm-srun|
|otherwise||none|
- otherwise, if CHPL_COMM is gasnet:
|If||CHPL_LAUNCHER|
|CHPL_COMM_SUBSTRATE=ibv||gasnetrun_ibv|
|CHPL_COMM_SUBSTRATE=mpi||gasnetrun_mpi|
|CHPL_COMM_SUBSTRATE=mxm||gasnetrun_ibv|
|CHPL_COMM_SUBSTRATE=ofi||gasnetrun_ofi|
|CHPL_COMM_SUBSTRATE=psm||gasnetrun_psm|
|CHPL_COMM_SUBSTRATE=smp||smp|
|CHPL_COMM_SUBSTRATE=udp||amudprun|
|otherwise||none|
- otherwise, CHPL_LAUNCHER is set to none.
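To make the default-selection rules concrete, here is a minimal sketch that encodes them as a shell function. The function names default_chpl_launcher and effective_chpl_launcher are hypothetical, the sketch takes the platform, comm layer, substrate, and aprun/srun availability as arguments rather than detecting them, and it assumes the substrate-based rules apply only when CHPL_COMM is gasnet. It is not Chapel's own implementation.

```shell
# Hypothetical sketch of the documented default rules for CHPL_LAUNCHER.
# Arguments: platform, comm layer, comm substrate, and whether aprun and
# srun are in the user's path ("yes" or "no"). Not Chapel's actual code.
default_chpl_launcher() {
  local platform=$1 comm=$2 substrate=$3 have_aprun=$4 have_srun=$5

  # Cray platforms: choose based on which launch tools are available.
  case "$platform" in
    cray-xc|cray-xe|cray-xk)
      if [ "$have_aprun" = yes ] && [ "$have_srun" = yes ]; then
        echo none
      elif [ "$have_aprun" = yes ]; then
        echo aprun
      elif [ "$have_srun" = yes ]; then
        echo slurm-srun
      else
        echo none
      fi
      return
      ;;
  esac

  # Assumed: substrate-based choices apply only to the GASNet comm layer.
  if [ "$comm" = gasnet ]; then
    case "$substrate" in
      ibv|mxm) echo gasnetrun_ibv ;;
      mpi)     echo gasnetrun_mpi ;;
      ofi)     echo gasnetrun_ofi ;;
      psm)     echo gasnetrun_psm ;;
      smp)     echo smp ;;
      udp)     echo amudprun ;;
      *)       echo none ;;
    esac
  else
    echo none
  fi
}

# A non-empty CHPL_LAUNCHER overrides the computed default.
effective_chpl_launcher() {
  echo "${CHPL_LAUNCHER:-$(default_chpl_launcher "$@")}"
}
```

For example, `default_chpl_launcher linux64 gasnet udp no no` prints amudprun, while setting CHPL_LAUNCHER explicitly makes effective_chpl_launcher ignore the rules entirely.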
If the launcher binary does not work for your system (e.g., due to an installation-specific configuration), you can often use the -v flag to capture the commands that the launcher executes on your behalf and customize them for your needs.
Forwarding Environment Variables
Chapel launchers generally arrange for environment variables to be forwarded to worker processes. However, this strategy is not always reliable. The remote system may override some environment variables, and some launchers might not correctly forward all environment variables.
Using Slurm
To use native Slurm, set:
    export CHPL_LAUNCHER=slurm-srun
On Cray systems, this will happen automatically if srun is found in your path, but not when both srun and aprun are found in your path. Native Slurm is the best option where it works, but at the time of this writing, it has problems when combined with the UDP or InfiniBand conduits. For these configurations, please see the Chapel documentation for the UDP and InfiniBand conduits.
Common Slurm Settings
Before running, you will need to set the amount of time to request from SLURM. For example, the following requests 15 minutes:
    export CHPL_LAUNCHER_WALLTIME=00:15:00
Another Slurm variable that usually needs to be set is the Slurm partition to use. For example, set the Slurm partition to 'debug' with the commands:
    export SALLOC_PARTITION=debug
    export SLURM_PARTITION=$SALLOC_PARTITION
If needed, you can request a specific node feature from SLURM by putting it in the CHPL_LAUNCHER_CONSTRAINT environment variable. For example, to use nodes with the 'cal' feature (as defined in the slurm.conf file), set:
    export CHPL_LAUNCHER_CONSTRAINT=cal
If this environment variable is undefined, SLURM may use any node in the system.
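The settings in this section can be gathered into one setup snippet, sketched below. It assumes CHPL_LAUNCHER_WALLTIME is the variable that carries the time request; the partition 'debug' and the feature 'cal' are placeholders for your site's values. The check_slurm_settings function is a hypothetical pre-flight helper, not part of Chapel, that reports empty settings before you submit, in the same spirit as the launcher's goal of catching simple mistakes before waiting in a queue.

```shell
# Select the native Slurm launcher.
export CHPL_LAUNCHER=slurm-srun
# Request 15 minutes of wall-clock time (HH:MM:SS); assumed variable name.
export CHPL_LAUNCHER_WALLTIME=00:15:00
# Placeholder partition; substitute your site's partition name.
export SALLOC_PARTITION=debug
export SLURM_PARTITION=$SALLOC_PARTITION
# Placeholder node feature, as defined in slurm.conf.
export CHPL_LAUNCHER_CONSTRAINT=cal

# Hypothetical pre-flight check: warn about each required setting that is
# empty or unset, and return the number of missing settings.
check_slurm_settings() {
  missing=0
  for var in CHPL_LAUNCHER CHPL_LAUNCHER_WALLTIME SLURM_PARTITION; do
    # Portable indirect lookup of the variable named by $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "warning: $var is not set" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Running check_slurm_settings after sourcing the snippet succeeds silently; unsetting one of the variables makes it print a warning and return a non-zero status.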
If the environment variable CHPL_LAUNCHER_USE_SBATCH is defined, sbatch is used to submit the job to the queue system rather than running it interactively as usual. In this mode, the output is written by default to a file called <executableName>.<jobID>.out. The CHPL_LAUNCHER_SLURM_OUTPUT_FILENAME environment variable can be used to specify a different filename for the output.
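The output-file naming rule for sbatch mode can be sketched as a small shell function. The helper name sbatch_output_file is hypothetical and only mirrors the documented rule: <executableName>.<jobID>.out, unless CHPL_LAUNCHER_SLURM_OUTPUT_FILENAME is set. The value given to CHPL_LAUNCHER_USE_SBATCH is arbitrary, since the text only requires that the variable be defined.

```shell
# Hypothetical helper mirroring the documented naming rule for sbatch
# output; not part of the Chapel launcher itself.
sbatch_output_file() {
  exe=$1 job_id=$2
  if [ -n "${CHPL_LAUNCHER_SLURM_OUTPUT_FILENAME:-}" ]; then
    # An explicit filename overrides the default.
    echo "$CHPL_LAUNCHER_SLURM_OUTPUT_FILENAME"
  else
    # Default: <executableName>.<jobID>.out
    echo "$exe.$job_id.out"
  fi
}

# Submit jobs through sbatch instead of running them interactively.
export CHPL_LAUNCHER_USE_SBATCH=true
```

With no override, an executable named hello queued as job 12345 would write to hello.12345.out.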
Using any SSH-based launcher with Slurm
It is possible to use any SSH-based launcher with Slurm, with some additional effort. This strategy can come in handy if other launchers are not working. However, launchers such as slurm-srun and slurm-gasnetrun_ibv offer a better experience.
First, let's see how to use an SSH-based launcher with an interactive salloc session. Here we will assume the UDP conduit, but any other launcher supporting SSH can be configured analogously.
    # Compile a sample program
    chpl -o hello6-taskpar-dist examples/hello6-taskpar-dist.chpl
    # Reserve 2 nodes for an interactive run
    salloc -N 2
    # Then, within the salloc shell:
    # Specify that ssh should be used
    export GASNET_SPAWNFN=S
    # Specify the list of nodes to use
    export GASNET_SSH_SERVERS=`scontrol show hostnames | xargs echo`
    # Run the program on the 2 reserved nodes
    ./hello6-taskpar-dist -nl 2
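The `scontrol show hostnames | xargs echo` step in the commands above turns Slurm's one-hostname-per-line output into a single space-separated line. The sketch below shows that transformation without a Slurm allocation; fake_scontrol_hostnames is a hypothetical stand-in for scontrol, and the hostnames are placeholders.

```shell
# Hypothetical stand-in for `scontrol show hostnames`, which prints one
# hostname per line for the current allocation.
fake_scontrol_hostnames() {
  printf '%s\n' nid00010 nid00011
}

# xargs passes all input lines as arguments to a single echo, joining
# them with spaces on one line.
GASNET_SSH_SERVERS=$(fake_scontrol_hostnames | xargs echo)
export GASNET_SSH_SERVERS
echo "$GASNET_SSH_SERVERS"
```

Inside a real salloc shell or sbatch script, the command substitution would use scontrol directly, exactly as in the commands above.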
This strategy can also be used within an sbatch script. Here is an example script to save to the file job.bash:
    #!/bin/bash
    #SBATCH -t 0:10:0
    #SBATCH --nodes=2
    #SBATCH --exclusive
    #SBATCH --partition=chapel
    #SBATCH --output=job.output
    export GASNET_SPAWNFN=S
    export GASNET_SSH_SERVERS=`scontrol show hostnames | xargs echo`
    ./hello6-taskpar-dist -nl 2
To run this job, use:
    sbatch job.bash
When it completes, the output will be available in job.output, as specified in job.bash.
Changing the _real binary suffix
In order to support profiling tools that produce new binaries for the launcher to execute, the suffix of the real binary executed by the launcher may be changed with the CHPL_LAUNCHER_SUFFIX environment variable. If this variable is unset, the suffix defaults to "_real", matching the compiler's output.
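The naming rule can be sketched as follows, assuming CHPL_LAUNCHER_SUFFIX is the variable that holds the suffix. The helper real_binary_name is hypothetical and only illustrates how the program name and suffix combine; it is not the launcher's code.

```shell
# Hypothetical sketch: the real binary is the program name followed by
# CHPL_LAUNCHER_SUFFIX, or "_real" when that variable is unset or empty.
real_binary_name() {
  echo "$1${CHPL_LAUNCHER_SUFFIX:-_real}"
}
```

For instance, if a profiling tool (hypothetically) wrote an instrumented myprogram_profiled, exporting CHPL_LAUNCHER_SUFFIX=_profiled would point the myprogram launcher at it instead of myprogram_real.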
Bypassing the launcher¶
If the Chapel launcher capability fails you completely, set
CHPL_LAUNCHER to none, recompile, and execute the resulting binary
according to the following rules using tools and queueing mechanisms
appropriate for your system:
- on most systems, the number of locales should be equal to the number of nodes on which you execute. That in turn should match the number of copies of the program that you are running.
- some queueing systems require you to specify the number of cores to use per node. For best results, you will typically want to use all of them. All intra-node parallelism is typically implemented using Chapel's threading layer (e.g., pthreads), so extra copies of the binary are not required per core.
- in our experience, this technique does not work for InfiniBand configurations.
In addition to the supported launchers listed above, there are several others that are not actively maintained but may still work.
|loadleveler||launch using IBM loadleveler (still needs refining)|
|mpirun||launch using mpirun (no mpi comm currently)|