The Chapel Parallel Programming Language

 

Chapel Performance Tips

Use the --fast flag

Once you have a Chapel program that you believe to be correct and want to run performance timings on it, make sure to compile with the --fast flag. This turns off a number of execution-time checks, turns on back-end compiler optimizations, and is key to achieving competitive performance with Chapel. See its entry on the compiler man page for details.
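As a minimal sketch of such a timing run, the program below times a simple loop with the `stopwatch` type from the standard `Time` module (older Chapel releases call this type `Timer`). The file name `timing.chpl`, the problem size `n`, and the loop body are assumptions made for illustration.

```chapel
// timing.chpl (hypothetical file name)
// Compile for timing runs:  chpl --fast timing.chpl
// Run:                      ./timing --n=50000000
use Time;

config const n = 10000000;   // illustrative problem size, settable at run time

proc main() {
  var A: [1..n] real;

  var t: stopwatch;
  t.start();
  forall i in 1..n with (ref A) do
    A[i] = 2.0 * i;          // without --fast, each access is bounds-checked
  t.stop();

  writeln("A[n] = ", A[n]);  // use the result so the loop is not dead code
  writeln("elapsed: ", t.elapsed(), " seconds");
}
```

Building the same file with and without `--fast` and comparing the reported times gives a rough sense of how much the default checks cost for a given code.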

Why don't we compile Chapel programs with --fast by default? Because then most user coding errors, like out-of-bounds indexing or nil dereferences, would get reported to us as bugs. We believe that having Chapel catch such errors by default, and then requiring users to take off the safety belt once they're ready to go fast, is most productive for everyone.

Check your Communications

Most poor performance in multi-locale Chapel programs is due to inadvertently doing too much communication. Though Chapel makes it trivial to refer to remote values, doing so frequently can kill performance. You can instrument your program to see where communication is being introduced using the CommDiagnostics module or the chplvis tool.
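A minimal sketch of instrumenting a code region with CommDiagnostics might look like the following. The file name `commdiag.chpl`, the array `A`, and the problem size `n` are assumptions for illustration, and the counts reported will vary with the Chapel version, communication layer, and compiler optimizations (for example, remote caching can reduce the number of gets).

```chapel
// commdiag.chpl (hypothetical file name)
// Requires a multi-locale build (CHPL_COMM set to something other than "none"):
//   chpl commdiag.chpl && ./commdiag -nl 2
use CommDiagnostics;

config const n = 1000;        // illustrative problem size

proc main() {
  var A: [1..n] int = 1;      // allocated on Locale 0, where main() starts

  startCommDiagnostics();     // begin counting communication on all locales
  on Locales[numLocales-1] {  // run this block on the last locale
    var sum = 0;
    for i in 1..n do
      sum += A[i];            // each A[i] is a remote access when numLocales > 1
    writeln("sum = ", sum);
  }
  stopCommDiagnostics();      // stop counting

  printCommDiagnosticsTable();  // per-locale table of gets, puts, on-statements, etc.
}
```

A large number of small gets attributed to a locale is a typical sign of fine-grained remote access like the loop above.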

Once you've found a section of your Chapel program that communicates more than it should, there are a variety of ways to fix it, including caching remote values manually or using advanced language features for asserting locality. If you need help with these...
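As one hedged illustration of caching remote values manually, the sketch below copies a remote array into a local one before looping over it, so the loop reads local memory rather than issuing a remote access per element. The names `A`, `localA`, and `n` are hypothetical, and whether this pays off depends on how much of the data is actually used.

```chapel
config const n = 1000;        // illustrative problem size

proc main() {
  var A: [1..n] int = 1;      // allocated on Locale 0, where main() starts

  on Locales[numLocales-1] {
    const localA = A;         // one whole-array copy into this locale's memory
    var sum = 0;
    for i in 1..n do
      sum += localA[i];       // local reads only
    writeln("sum = ", sum);   // prints "sum = 1000" for the default n
  }
}
```

The trade-off is extra memory for the copy and, if the remote data changes, the need to refresh the cached copy yourself.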

Engage the Chapel Team

Though Chapel performance has improved by leaps and bounds in recent years and can typically be made competitive with C + MPI + OpenMP, there are still plenty of cases where our implementation doesn't do as well as it should. Because it can be difficult to tell whether a performance problem is due to an issue on your side or ours, please don't hesitate to contact the Chapel development team through the channels available to users to get help debugging your code's performance. In addition to saving you time and frustration, this can provide us with valuable feedback about where we should do further tuning and optimization.