LLVM Support¶
The Chapel compiler uses LLVM support by default where possible. LLVM support enables the following features:
- extern block support (see C Interoperability). This feature uses the clang parser. Note that it is not necessary to use the LLVM code generator in order to use extern block support.
- LLVM code generator. The LLVM code generator is the default when the Chapel compiler is built with LLVM. It can be selected with CHPL_TARGET_COMPILER=llvm and toggled off with, e.g., CHPL_TARGET_COMPILER=gnu.
- Experimental LLVM communication optimizations. These can be activated with --llvm-wide-opt. Some benchmark programs run faster with these LLVM communication optimizations.
Enabling the LLVM support¶
Please see Setting up Your Environment for Chapel for more information about enabling LLVM support. In many cases, it amounts to installing a compatible LLVM package, setting export CHPL_LLVM=system, and then rebuilding the compiler.
Note that, when using CHPL_LLVM=bundled, you can set the environment variable CHPL_LLVM_DEVELOPER to request a debug build of LLVM.
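For example, on a system that already has a compatible LLVM package installed, enabling LLVM support might look like the following sketch (the exact package installation and rebuild steps depend on your platform; $CHPL_HOME is assumed to point at your Chapel source tree):

    export CHPL_LLVM=system
    cd $CHPL_HOME
    make    # rebuild the compiler so it picks up the system LLVM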
Compiling a program with an LLVM-enabled chpl will use the LLVM backend by default, but this can be controlled with CHPL_TARGET_COMPILER, e.g. with CHPL_TARGET_COMPILER=llvm or CHPL_TARGET_COMPILER=gnu.
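As a sketch, assuming a program file named hello.chpl, selecting the backend for an individual compile might look like:

    # use the LLVM backend (the default for an LLVM-enabled chpl)
    CHPL_TARGET_COMPILER=llvm chpl hello.chpl -o hello

    # use a C-generating backend instead
    CHPL_TARGET_COMPILER=gnu chpl hello.chpl -o hello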
Inspecting the Generated Code¶
It is possible to request output of the LLVM IR or assembly code generated by the compilation process. To do so, use the experimental --llvm-print-ir and --llvm-print-ir-stage flags.
- --llvm-print-ir accepts a comma-separated list of function names to show.
- --llvm-print-ir-stage indicates which compiled form of the functions to show, where the options include:
  - none – LLVM IR without any LLVM optimization
  - basic – LLVM IR with basic LLVM optimization
  - full – LLVM IR with full LLVM optimization
  - asm – resulting assembly code (including all optimization)
  - every – show LLVM IR after every optimization pass possible
In addition, the LLVM IR can be explored if you pass --savec with a directory. In the passed directory, the LLVM backend will emit two .bc files:
- chpl__module.bc is the version that will be linked
- chpl__module-nopt.bc is the generated code without optimizations applied
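For example, assuming a directory named tmp and that LLVM's llvm-dis tool is available on your PATH, the emitted bitcode could be inspected roughly like this:

    chpl --savec tmp hello.chpl
    llvm-dis tmp/chpl__module.bc -o module.ll            # the version that will be linked
    llvm-dis tmp/chpl__module-nopt.bc -o module-nopt.ll  # before LLVM optimizations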
Inspecting Individual LLVM Passes¶
When debugging LLVM optimizations, it can be useful to inspect which passes have run and what they changed. The Chapel compiler supports the flag --llvm-print-passes, which will print all the LLVM passes that will be run. These pass names are printed as a pipeline, so the output can be fed into something like opt --passes='...'.
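As a rough sketch, assuming the pipeline is written to standard output and that the bitcode files from --savec are available, the printed pipeline could be replayed with LLVM's opt (file names here are illustrative):

    chpl --llvm-print-passes hello.chpl > passes.txt
    opt --passes="$(cat passes.txt)" tmp/chpl__module-nopt.bc -o replayed.bc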
This information can be combined with dumping LLVM IR, so that developers can
focus on the LLVM IR level transformations without needing to worry about the
frontend. The following flags are very useful for printing and manipulating
LLVM IR. All should be passed as --mllvm <flag>, for example --mllvm --print-after-all.
--print-before=<PASSES>
Enables printing the LLVM IR before each pass. Takes a comma-separated list of LLVM passes.
--print-before-all
Enables printing the LLVM IR before every pass.
--print-after=<PASSES>
Enables printing the LLVM IR after each pass. Takes a comma-separated list of LLVM passes.
--print-after-all
Enables printing the LLVM IR after every pass.
--print-module-scope
When printing LLVM IR, always print the module-level scope. This flag generally allows the output to be passed to opt separately from Chapel.
--filter-print-funcs=<FUNCTIONS>
When printing LLVM IR, only print the IR for the listed functions. Takes a comma-separated list of LLVM IR function names.
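For example, a sketch that watches what LLVM's licm pass does to a single function (the function name chpl_gen_main is used purely as an illustration) might be:

    chpl --mllvm --print-after=licm \
         --mllvm --filter-print-funcs=chpl_gen_main \
         hello.chpl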
Optimization Options¶
Passing --fast will cause LLVM optimizations to run. The --ccflags option can control which LLVM optimizations are run, using the same syntax as flags to clang.
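A minimal sketch, assuming a file hello.chpl; the particular clang-style flag shown (-ffast-math) is only an illustration of what can be forwarded through --ccflags:

    chpl --fast hello.chpl -o hello
    chpl --fast --ccflags="-ffast-math" hello.chpl -o hello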
Experimental optimization with --llvm-wide-opt¶
If you compile a program with --fast and the experimental flag --llvm-wide-opt, LLVM optimizations are allowed to work with global memory. For example, the Loop Invariant Code Motion (LICM) optimization might be able to hoist an access of a remote variable, i.e. a ‘get’, out of a loop. This optimization has produced better performance with some benchmarks.
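For example, enabling these optimizations might look like the following sketch (the program name is illustrative):

    chpl --fast --llvm-wide-opt myBenchmark.chpl -o myBenchmark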
Please see LLVM-based Communication Optimizations for PGAS Programs by Hayashi et al. for more information about this flag and its implementation. Note that locality optimizations and transfer coalescing are not yet available in Chapel releases.
Caveats:
--llvm-wide-opt may add communication to or from a task’s stack, so it may not function correctly for combinations of tasking and communication layers in which some task has a stack outside of an acceptable region for communication. At this point all communication layers should support communication to or from a task’s stack, but this situation comes up rarely.
Communication optimization within LLVM uses the address space feature of LLVM
in order to create a conceptual global address space. In particular, instead of
generating a call to the runtime functions to ‘put’ or ‘get’, when
--llvm-wide-opt
is enabled, the Chapel compiler will generate a load,
store, or memcpy using an address space 100 pointer. Address space 100 pointers
represent global memory - and address space 0 pointers continue to represent
local memory. The existing LLVM optimization passes will operate normally on
these address space 100 operations. The LLVM documentation describes these
optimizations and which are normally run.
Because it may be necessary to build a global pointer or to gather information
from it - for example when constructing a global pointer from a node number and
a local address, or extracting the node number or the address - the LLVM code
generated with --llvm-wide-opt
includes calls to nonexistent functions to
mark these operations:
- .gf.addr extracts an address from a global pointer
- .gf.loc extracts a locale from a global pointer
- .gf.node extracts a node number from a global pointer
- .gf.make constructs a global pointer from a locale and an address
- .gf.g2w converts a global pointer to a wide pointer
- .gf.w2g converts a wide pointer to a global pointer
These functions will be replaced with the usual runtime functions once all global pointers are lowered into wide pointers by the global-to-wide pass.
After the usual LLVM optimization passes run, two Chapel LLVM passes run:
aggregate-global-ops bundles together sequences of loads or sequences of stores on adjacent global memory locations into a single memcpy. That way, adjacent loads will generate a single ‘get’ instead of several ‘get’ calls.
global-to-wide converts operations on address space 100 pointers, notably including load, store, memcpy, and memset operations, into calls to the Chapel runtime. It converts address space 100 pointers into packed pointers and any of the special function calls (e.g. .gf.addr to extract the local address portion of a global pointer) into the usual operations on a packed pointer. In the future, we would like to support converting address space 100 pointers into the usual Chapel wide pointer format.
Inspecting LLVM Optimizations¶
It may be useful to determine if specific LLVM optimizations ran and what the results were. LLVM remarks allow optimization passes to report what happened.
To request optimization remarks, use the experimental --llvm-remarks and --llvm-remarks-function flags.
- --llvm-remarks accepts a regular expression which matches and filters optimization pass names.
  - ‘.’ – shows remarks for all optimization passes
  - inline – shows remarks for any optimization pass which matches ‘inline’
  - (slp|loop)-vectorize – shows remarks for any optimization pass which matches ‘slp-vectorize’ or ‘loop-vectorize’
- --llvm-remarks-function accepts a comma-separated list of function names to show. Not passing this flag will show remarks for all functions.
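For instance, a sketch requesting vectorization remarks for one function (the function name computeKernel and the file hello.chpl are placeholders) could be:

    chpl --fast -g --llvm-remarks '(slp|loop)-vectorize' \
         --llvm-remarks-function computeKernel \
         hello.chpl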
These flags are also affected by whether -g is set and whether CHPL_DEVELOPER / --[no]-devel is set. Without -g, the ability of LLVM to map remarks back to Chapel source code is limited; the compiler makes a best-effort attempt to get Chapel source code information. If the compiler is run in developer mode and no function filters are set, it will output remarks for all code, including standard and internal modules. Otherwise, remarks will be limited to user modules only.
Note
Introducing debug symbols with -g
or changing the state of CHPL_DEVELOPER
may change what optimizations can be done.