Using Chapel on Intel "Knights Landing"

The following information is assembled to help Chapel users get up and running on Intel Xeon Phi, Knights Landing (KNL).

The initial implementation has only been tested on Cray machines where the KNL chip is used as a self-hosted processor. While we have not explicitly tested on other platforms, or with KNL as a Coprocessor, we don't know of any reason an advanced user couldn't run on such a platform.

Getting started

By and large running on KNL on a Cray XC will be the same as running on a Xeon based Cray XC. See Using Chapel on Cray Systems for more info.

In order to take better advantage of the AVX-512 micro-architecture you'll want to have a KNL targeting module loaded. For example:

module load craype-mic-knl

You'll also want to ensure you have a new enough target compiler loaded that is KNL/AVX-512 ready. We recommend using at least gcc 5.3, cce 8.5, or intel 16.

We provide a KNL locale model for making use of the MCDRAM (High Bandwidth Memory/HBM). Please see KNL Locale Model for details.

KNL provides many memory configurations and clustering modes. If a program will manage HBM explicitly using the KNL locale model, the flat memory configuration is a good place to start. If not, the cache configuration will use the HBM as a level 3 cache. Two in-between configurations are available, equal for 50% cache/50% explicitly managed, and split for 25% cache/75% explicitly managed. It is likely that the highest performing configuration is different for different programs, so it pays to experiment.

So far, the clustering modes that seem to have the most promise for Chapel programs are quad and snc4. The quad mode, contrary to its name, places all the KNL cores in one NUMA node. The snc4 mode splits the cores equally into four NUMA nodes. Again, experimentation will tell which works best for a given program, and other clustering modes may be worth trying as well.

The method of choosing the clustering mode and memory configuration is different for different machine installations. Please see your system administrator for more information on this. When using Slurm, a common method is to use the --constraint= flag to srun. For Chapel, this flag can be set using the CHPL_LAUNCHER_CONSTRAINT environment variable. For example, to use the snc4 clustering mode and the flat memory model, the following would be used.

export CHPL_LAUNCHER_CONSTRAINT=snc4,flat