LLVM Support¶
The Chapel compiler can be built with LLVM support in order to enable the following features:
extern block support (see C Interoperability). This feature uses the clang parser. Note that it is not necessary to use the LLVM code generator in order to use extern block support.
LLVM code generator. The LLVM code generator is the default when the Chapel compiler is built with LLVM. It can be selected with CHPL_TARGET_COMPILER=llvm and toggled off with e.g. CHPL_TARGET_COMPILER=gnu.
Experimental LLVM communication optimizations. You can activate these communication optimizations with --llvm-wide-opt. Some benchmark programs run faster with these LLVM communication optimizations.
Building the LLVM support¶
To build the compiler with LLVM support for extern blocks, LLVM code
generation, and --llvm-wide-opt:
source ./util/setchplenv.bash
export CHPL_LLVM=bundled
# or, if you have already installed compatible LLVM libraries
# export CHPL_LLVM=system
make # you might want to do e.g. make -j 16 for a parallel build
Note:
If you have a built LLVM in third-party/llvm/install, the default is to use it even if you forget to export CHPL_LLVM=bundled. You can override this default by setting CHPL_LLVM=none.
The Makefile in third-party/llvm will unpack the LLVM and Clang source releases and build them.
LLVM code generation has not been tested on all supported configurations, and some features (such as building a library instead of an executable) are not yet supported.
You can set the environment variable CHPL_LLVM_DEVELOPER to request a debug build of LLVM.
Activating the LLVM support¶
Compiling a program with an LLVM-enabled chpl will use the LLVM backend by default, but this can be controlled with CHPL_TARGET_COMPILER.
If you pass a --savec directory, the LLVM backend will emit two .bc files
in that directory:
chpl__module.bc is the version that will be linked.
chpl__module-nopt.bc is the generated code without optimizations applied.
Passing --fast will cause LLVM optimizations to run.
The --ccflags option can control which LLVM optimizations are run, using the
same syntax as flags to clang.
Additionally, if you compile a program with --llvm-wide-opt --fast,
you will allow LLVM optimizations to work with global memory. For
example, the Loop Invariant Code Motion (LICM) optimization might be able
to hoist an access of a remote variable (i.e., a ‘get’) out of a loop.
This optimization has produced better performance with some benchmarks.
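To make the LICM example concrete, here is a minimal C sketch of the effect, not the code the Chapel compiler actually emits. The names remote_int, remote_get_int, sum_scaled_naive, and sum_scaled_hoisted are hypothetical, and the ‘get’ is simulated by a counter rather than by real communication. It shows a loop-invariant remote read being fetched once per iteration versus once in total.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a variable that lives on a remote node.
   This is an illustration only and is not part of the Chapel runtime. */
typedef struct {
    int value;      /* the value stored on the "remote" node */
    int get_count;  /* number of simulated 'get' operations issued */
} remote_int;

/* Simulated 'get': each call stands for one communication event. */
int remote_get_int(remote_int *r) {
    r->get_count++;
    return r->value;
}

/* Before hoisting: the loop-invariant remote value is fetched on
   every iteration, so n iterations cost n 'get' operations. */
long sum_scaled_naive(remote_int *scale, const int *data, size_t n) {
    assert(data != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += (long)data[i] * remote_get_int(scale);
    }
    return total;
}

/* After hoisting: the 'get' is issued once, ahead of the loop.
   The n == 0 guard keeps the fetch from running when the loop
   body would never have executed. */
long sum_scaled_hoisted(remote_int *scale, const int *data, size_t n) {
    assert(data != NULL || n == 0);
    if (n == 0) {
        return 0;
    }
    int s = remote_get_int(scale);  /* single 'get' for the whole loop */
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += (long)data[i] * s;
    }
    return total;
}
```

Both functions return the same sum; only the number of simulated ‘get’ operations differs (n in the first, at most one in the second).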
Caveats:
--llvm-wide-opt may add communication to or from a task’s stack, so it may not function correctly for combinations of tasking and communication layers in which some task has a stack outside of an acceptable region for communication (e.g. operations on the initial ‘main’ thread may fail with CHPL_COMM=gasnet, CHPL_GASNET_SEGMENT=fast).
How --llvm-wide-opt works¶
Communication optimization within LLVM uses the address space feature of LLVM
to create a conceptual global address space. In particular, when
--llvm-wide-opt is enabled, the Chapel compiler generates a load, store, or
memcpy using an address space 100 pointer instead of generating a call to the
runtime functions for ‘put’ or ‘get’. Address space 100 pointers represent
global memory, while address space 0 pointers continue to represent local
memory. The existing LLVM optimization passes will operate normally on
these address space 100 operations. The LLVM documentation describes these
optimizations and which are normally run.
Because it may be necessary to build a global pointer or to gather information
from it (for example, when constructing a global pointer from a node number and
a local address, or extracting the node number or the address), the LLVM code
generated with --llvm-wide-opt includes calls to nonexistent functions to
mark these operations:
.gf.addr extracts an address from a global pointer
.gf.loc extracts a locale from a global pointer
.gf.node extracts a node number from a global pointer
.gf.make constructs a global pointer from a locale and an address
.gf.g2w converts a global pointer to a wide pointer
.gf.w2g converts a wide pointer to a global pointer
These functions will be replaced with the usual runtime functions once all global pointers are lowered into wide pointers by the global-to-wide pass.
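As a rough illustration of what three of these marker operations compute once global pointers have been lowered to packed pointers (see the global-to-wide pass below), here is a minimal C sketch. The layout is an assumption made for the example: a 64-bit word with the node number in the upper 16 bits and the local address in the lower 48 bits. The C names gf_make, gf_node, and gf_addr are hypothetical stand-ins for .gf.make, .gf.node, and .gf.addr, and the node number stands in for the locale.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for this sketch only: 16 bits of node number above
   48 bits of local address. The real packed format may differ. */
#define NODE_BITS 16u
#define ADDR_BITS (64u - NODE_BITS)
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

typedef uint64_t packed_ptr;

/* Analogue of .gf.make: build a global pointer from a node number
   (standing in for the locale) and a local address. */
packed_ptr gf_make(uint32_t node, uint64_t addr) {
    assert(node < (1u << NODE_BITS));   /* node must fit in 16 bits */
    assert((addr & ~ADDR_MASK) == 0);   /* address must fit in 48 bits */
    return ((uint64_t)node << ADDR_BITS) | addr;
}

/* Analogue of .gf.node: extract the node number. */
uint32_t gf_node(packed_ptr p) {
    return (uint32_t)(p >> ADDR_BITS);
}

/* Analogue of .gf.addr: extract the local address portion. */
uint64_t gf_addr(packed_ptr p) {
    return p & ADDR_MASK;
}
```

Under these assumptions, gf_node(gf_make(n, a)) yields n and gf_addr(gf_make(n, a)) yields a for any node and address that fit the assumed widths.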
After the usual LLVM optimization passes run, two Chapel LLVM passes run:
aggregate-global-ops bundles together sequences of loads or sequences of stores on adjacent global memory locations into a single memcpy. That way, adjacent loads will generate a single ‘get’ instead of several ‘get’ calls.
global-to-wide converts operations on address space 100 pointers, notably including load, store, memcpy, and memset operations, into calls to the Chapel runtime. It converts address space 100 pointers into packed pointers and any of the special function calls (e.g. .gf.addr to extract the local address portion of a global pointer) into the usual operations on a packed pointer. In the future, we would like to support converting address space 100 pointers into the usual Chapel wide pointer format.
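The effect of aggregate-global-ops can be sketched in C as follows. This is a simplified model rather than the pass itself: remote_mem, remote_get_bytes, point3, and the two load functions are hypothetical names, and the ‘get’ is simulated by a local memcpy plus a counter. It contrasts three loads of adjacent fields, each issuing its own ‘get’, with a single memcpy-style transfer covering the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of memory on a remote node. */
typedef struct {
    unsigned char bytes[64];
    int get_count;  /* number of simulated 'get' operations issued */
} remote_mem;

/* Simulated 'get': copy len bytes starting at a remote offset into a
   local buffer. Each call stands for one communication event. */
void remote_get_bytes(remote_mem *m, void *dst, size_t offset, size_t len) {
    assert(offset <= sizeof m->bytes && len <= sizeof m->bytes - offset);
    memcpy(dst, m->bytes + offset, len);
    m->get_count++;
}

/* Three fields that sit at adjacent addresses. */
typedef struct {
    int x, y, z;
} point3;

/* Without aggregation: one 'get' per adjacent load. */
point3 load_fields_separately(remote_mem *m, size_t base) {
    point3 p;
    remote_get_bytes(m, &p.x, base + offsetof(point3, x), sizeof p.x);
    remote_get_bytes(m, &p.y, base + offsetof(point3, y), sizeof p.y);
    remote_get_bytes(m, &p.z, base + offsetof(point3, z), sizeof p.z);
    return p;
}

/* With aggregation: the adjacent loads become a single transfer. */
point3 load_fields_aggregated(remote_mem *m, size_t base) {
    point3 p;
    remote_get_bytes(m, &p, base, sizeof p);
    return p;
}
```

Both functions produce the same point3; the first issues three simulated ‘get’ operations and the second issues one.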