Using Chapel with InfiniBand

This document describes how to run Chapel across multiple machines in an InfiniBand cluster. For general information about running Chapel in a multilocale configuration, see Multilocale Chapel Execution.

Avoiding Slow Job Launch

We've observed very slow job launch on some InfiniBand systems. Limiting the memory available for communication resolved the problem, for example with:

export GASNET_PHYSMEM_MAX=1G

Note that setting GASNET_PHYSMEM_MAX will limit the amount of memory available to Chapel programs if CHPL_GASNET_SEGMENT=fast.

It might also be necessary to set GASNET_PHYSMEM_NOPROBE=1, especially if you increase the value of GASNET_PHYSMEM_MAX.

It's probably a good idea to start with this variable set and to try removing it once everything else is working.
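
As a minimal sketch, the two settings mentioned above can be combined in the shell that launches the program. The 1G value is simply the example value used earlier, not a tuned recommendation, and the final line only prints the settings so they can be checked before launching:

```shell
# Limit the memory GASNet may use for communication (example value from above)
export GASNET_PHYSMEM_MAX=1G

# Optionally skip GASNet's physical-memory probe; this might be needed,
# especially with larger GASNET_PHYSMEM_MAX values
export GASNET_PHYSMEM_NOPROBE=1

# Show the GASNET_PHYSMEM_* settings now present in the environment
env | grep '^GASNET_PHYSMEM_' | sort
```

Once job launch is working, either setting can be dropped again with unset to see whether it is still needed on your system.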

Using Slurm for Job Launch

For clusters using Slurm, there are a few options:

  1. The current best option for InfiniBand+Slurm is CHPL_LAUNCHER=slurm-gasnetrun_ibv:

    export CHPL_COMM=gasnet
    export CHPL_COMM_SUBSTRATE=ibv
    
    export CHPL_LAUNCHER=slurm-gasnetrun_ibv
    
    # Rebuild the Chapel runtime for these settings
    cd $CHPL_HOME
    make
    
    # Compile a sample program
    chpl -o hello6-taskpar-dist examples/hello6-taskpar-dist.chpl
    

    See Using Slurm for other available options, such as setting the time limit or selecting the type of node. Some settings might be required by your Slurm configuration. Setting these variables is typically necessary:

    # Specify the Slurm partition to use
    export CHPL_LAUNCHER_PARTITION=debug
    
    # Run the sample program
    ./hello6-taskpar-dist -nl 2
    
  2. An alternative is to use an ssh spawner and configure it to use the nodes allocated by Slurm.

    export CHPL_COMM=gasnet
    export CHPL_COMM_SUBSTRATE=ibv
    
    export CHPL_LAUNCHER=gasnetrun_ibv
    
    # Rebuild the Chapel runtime for these settings
    cd $CHPL_HOME
    make
    
    # Compile a sample program
    chpl -o hello6-taskpar-dist examples/hello6-taskpar-dist.chpl
    

    Now, to run a program, reserve some nodes with salloc. Then, within the resulting shell, configure GASNet to spawn processes over SSH and run the program:

    # Reserve 2 nodes for an interactive run
    salloc -N 2
    # Then, within the salloc shell
    
      # Specify that ssh should be used
      export GASNET_IBV_SPAWNER=ssh
      # Run the program on the 2 reserved nodes.
      # gasnetrun_ibv will use the nodes Slurm allocated above.
      ./hello6-taskpar-dist -nl 2
    

    This technique is also possible when using sbatch. In that case, make sure your sbatch script includes the line:

    export GASNET_IBV_SPAWNER=ssh
    

    See Using any SSH-based launcher with Slurm for more information on these techniques.

  3. A further alternative is to configure GASNet to use mpirun to launch your program. mpirun might already be configured to work with Slurm. See Using MPI for Job Launch.
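
For the sbatch variant of option 2, a batch script might look like the sketch below. The file name job.sbatch and the 2-node request are illustrative assumptions, and your site may require further #SBATCH directives (partition, time limit, account). The snippet only writes the script to disk so that it can be inspected before it is submitted:

```shell
# Write a hypothetical batch script; the name job.sbatch is illustrative.
cat > job.sbatch <<'EOF'
#!/bin/bash
#SBATCH --nodes=2

# Specify that ssh should be used by gasnetrun_ibv
export GASNET_IBV_SPAWNER=ssh

# Run the program on the 2 nodes of this allocation
./hello6-taskpar-dist -nl 2
EOF

# On a Slurm system, submit the script with:
#   sbatch job.sbatch
```

Keeping the export inside the script, rather than relying on the submitting shell's environment, makes the job independent of how environment variables are propagated at your site.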

Using SSH for Job Launch

To launch InfiniBand jobs with SSH, use the following:

export CHPL_COMM=gasnet
export CHPL_COMM_SUBSTRATE=ibv

export CHPL_LAUNCHER=gasnetrun_ibv

# Rebuild the Chapel runtime for these settings
cd $CHPL_HOME
make

# Compile a sample program
chpl -o hello6-taskpar-dist examples/hello6-taskpar-dist.chpl

# Specify that ssh should be used
export GASNET_IBV_SPAWNER=ssh
# Specify the nodes to run on
export GASNET_SSH_SERVERS="host1 host2 host3 ..."
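
If the node names are kept in a file rather than typed by hand, the space-separated list shown above can be built from it. The sketch below assumes a hypothetical host file with one node name per line; host1 and host2 are placeholders, and the temporary file stands in for whatever file you actually maintain:

```shell
# Hypothetical host file, one node name per line (host1/host2 are placeholders)
hostfile="$(mktemp)"
printf '%s\n' host1 host2 > "$hostfile"

# Join the lines with single spaces, matching the list format shown above
GASNET_SSH_SERVERS="$(paste -s -d ' ' "$hostfile")"
export GASNET_SSH_SERVERS
rm -f "$hostfile"

# Print the result so it can be checked before launching
echo "GASNET_SSH_SERVERS=$GASNET_SSH_SERVERS"
```

With the variable set, the compiled program can then be started the same way as in the Slurm examples, passing the number of locales with -nl.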

Using MPI for Job Launch

To launch InfiniBand jobs with mpirun, first make sure that mpicc is available and that MPI programs launch correctly with mpirun. You'll also want to confirm that GASNet detects MPI in its configuration output. Then use the following:

export CHPL_COMM=gasnet
export CHPL_COMM_SUBSTRATE=ibv

export CHPL_LAUNCHER=gasnetrun_ibv

# Rebuild the Chapel runtime for these settings
cd $CHPL_HOME
make

# Compile a sample program
chpl -o hello6-taskpar-dist examples/hello6-taskpar-dist.chpl

# Specify that MPI should be used
export GASNET_IBV_SPAWNER=mpi
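
Part of the prerequisite check described at the start of this section can be sketched as a small preflight step run before rebuilding the runtime. The helper name require_tool is hypothetical, and the check only confirms that mpicc and mpirun are on PATH. It does not verify that MPI programs actually launch or that GASNet detected MPI:

```shell
# Hypothetical helper: report whether a required tool is on PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check the MPI tools named in this section; record any that are missing
missing=0
for tool in mpicc mpirun; do
  require_tool "$tool" || missing=1
done

if [ "$missing" -ne 0 ]; then
  echo "MPI tools not found; fix PATH before building with the mpi spawner" >&2
fi
```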

See Also

For more information on these and other available GASNet options, including configuring to launch through MPI, please refer to GASNet's official InfiniBand conduit documentation, which can also be found in $CHPL_HOME/third-party/gasnet/gasnet-src/ibv-conduit/README.