Chapel Testing System¶
The Chapel testing system is a key piece of technology for the Chapel developer. We use it as a harness for doing test-driven development, for performing sanity checks on code before committing it, for bug and issue tracking, and for nightly correctness and performance regression testing. Getting really comfortable with it is one of the most important things a developer can do early in the development cycle.
The tests for the testing system are located in
The main script that drives the test system itself is
$CHPL_HOME/util/start_test, though it relies on several helper scripts
This document provides only a high-level introduction to the testing system. For further details, ask a core Chapel developer for suggestions. You can also get a sense for the test system by looking through the test directory itself to see how it is used in practice.
How to Make¶
A Correctness Test¶
Though trivial, this test is available at $CHPL_HOME/test/Samples/Correctness in the Chapel source repository
A simplest use of the test system is to create a
.chpl file containing
some Chapel code and a
.good file containing the expected output. For
example, given a directory containing two files:
The test system can be exercised by invoking:
This is assuming
$CHPL_HOME/util/ is in the user’s $PATH, which is
taken care of when sourcing
This will cause the compiler to compile hi.chpl. If compiling hi.chpl does not
cause a compilation failure, start_test will then execute the resulting binary.
The concatenation of the compiler and executable output will then be compared
.good file. A transcript of the test system’s actions is
printed to the console and also stored in
$CHPL_HOME/test/Logs/ by default.
For more information on using
start_test, see Invoking start_test.
Outside Arguments or Settings¶
In addition to the simplest form of test shown above, the test system supports a number of additional options for creating more complex tests.
These options are all specified using files in the same directory as the test.
Some files apply to a directory as a whole while others will apply to a single
test by sharing the same base filename. Those files which impact the entire
directory are named in upper case, e.g.
They can be overridden or augmented with test-specific settings using the same
name but in lower case, e.g.
To specify arguments to the compiler, provide a
file for the test. All options for a single compilation should be on the same
line - specifying multiple lines will result in multiple compilations of the
For instance, to specify that the program should be compiled statically, this file would be provided:
To specify that the program should be compiled once statically and once dynamically, the file would look like this:
Note that sometimes different compilation arguments will result in different output. Testing Different Behavior in Different Settings provides guidance on how a test could respond to different behavior without modifying the output that is generated.
Specification of arguments for execution time is performed similarly, using
.execopts file. Should both an
.execopts and a
.compopts file be provided for a test, their options will be used in
combination. For example, a test specified like this:
config var x = true; if (x) then writeln(5); else writeln(7);
will be compiled twice, and executed four times by
chpl --static multiple-options.chpl
chpl --dynamic multiple-options.chpl
Note that sometimes different execution arguments will result in different output. Testing Different Behavior in Different Settings provides guidance on how a test could respond to different behavior without modifying the output that is generated.
Environment variables can be set for a particular test or directory using a
EXECENV file. Each environment variable must be specified
on a separate line, but all will be set for a particular run.
Here is an example
Controlling How It Runs¶
The testing system has a variety of files that can fine tune when a test gets run.
If the test should only be compiled and not executed, mark it with an empty file
with the suffix
foo.noexec. If the test should not be
compiled or executed on its own (for instance, if it is solely a helper file for
another test), give an empty file with the suffix
.notest. A directory with
NOTEST file will similarly not be run by the testing system (unless
its contents are explicitly listed in the call to
Limiting Time Taken¶
start_test will kill a test that has taken longer than 300 seconds
to execute or has been compiling for longer than four times the execution
The execution timeout value can be overridden for a test by specifying the
number of seconds in a
.timeout file. It can be set either higher than the
default timeout (for tests that take an unusually long time to run) or lower
(for tests that are expected to finish very quickly). The former is used more
frequently, but the latter is useful when diagnosing a test failure - if the
test is usually quick but occasionally hangs, a smaller timeout value can help
speed up the time to run the testing system when the failure mode does occur.
Note that if the value in this file is longer than the global timeout, any
-num-trials value or
.perfnumtrials file will be ignored (see
A Performance Test for more details on the
Tests With Varying Output¶
Limiting Where the Test Runs¶
Sometimes a test is only applicable to certain test environments: it might rely
on multi-locale state, or change its behavior dramatically depending on if
optimizations are used, for instance. If a test is only intended to run in
certain settings, a
.skipif file should be used.
SKIPIF file or a test-specific
.skipif file can take
two forms. The first is a line separated list of easily computed conditions,
any one of which will cause the test not to run in that particular setting. For
instance, the following file would only allow
foo.chpl to run in a
CHPL_COMM != none
This is useful when the conditions required to skip a test can be easily
determined from the environment. A condition of
<= indicates that the test
should be skipped when the environment variable on the left contains the
contents on the right, while
>= indicates the opposite - this is useful for
imprecise matches, e.g.
CHPL_HOST_PLATFORM >= cygwin would cause the test to
run on both
The second form a
SKIPIF file can take is that of a script.
This form is intended for conditions that require some computation to determine,
or when the combination of conditions is necessary (i.e. this setting and
this setting are required for the behavior we want to avoid). The script can be
in any commonly supported scripting language, usually bash or python. The
SKIPIF file must have executable permissions for this form to
True to standard output will result in the test being
skipped, while printing
False will result in the test being run.
#!/usr/bin/env python3 import os print(os.getenv('CHPL_TEST_PERF') == 'on' and os.getenv('CHPL_ATOMICS') == 'locks')
would cause the test to be skipped when performance testing is done with CHPL_ATOMICS=locks, but not ordinary performance testing, or correctness testing with CHPL_ATOMICS=locks
Testing Different Behavior in Different Settings¶
If a test is intended to work in all settings but will have slightly different
behavior in some situations, it is appropriate to add additional
for those settings. Some of these additional
.good files will be used
automatically by the testing system, while others will need to be specified
explicitly in the
.execopts file for the test.
start_test automatically recognizes
.good files with prefixes for
--no-local, communication layer, locale model, and
chpldoc. For example:
.comm-none.good: used with CHPL_COMM=none (the unqualified
.goodfile will then apply for CHPL_COMM != none)
.no-local.good: used with
.na-none.good: used with CHPL_NETWORK_ATOMICS=none
.tasks-fifo.good: used with CHPL_TASKS=fifo
.doc.good: used when testing
lm- can be combined, in that order.
Requests can be made for supporting additional formats if a common format does not appear to be covered automatically.
If only some compilations or executions of a test need a specialized
file, a comment on the same line as the relevant options can be used. For
--x=true # foo.true.good --x=false # foo.false.good
will compare test output to
foo.true.good for the first execution and
foo.false.good for the second.
Any line that is unlabeled will use the default
.good for that test.
Undefined behavior will occur when both the
files specify a
.good file in this way.
Using precomp and prediff files¶
When creating a
.prediff file, the file must be an
executable. You can turn your script into an executable by running:
chmod +x foo.precomp.
A Performance Test¶
This section covers how to make a performance test, including:
how to indicate it is a performance test
how to specify which parts of the output should be tracked
how to validate the output
how to specify compilation and execution options that are different from the test’s normal run
how to track output for multiple tests
how to compare against a version written in C
how to graph the data that has been tracked
[Files used to illustrate the running example here can be found at $CHPL_HOME/test/Samples/Performance in the Chapel source repository]
Identifying Performance Keys¶
Most of the information above pertains to the creation of a correctness test, in
which the test’s output is compared to a
.good file. The testing system
also supports performance tests in which one or more values from a test’s output
can be tracked on a nightly basis and optionally graphed. Information about
running a performance test can be found in Performance Testing.
Performance tests are specified using a
.perfkeys file, which lists strings
that the test system should look for in the output serving as prefixes for a
piece of data to track. When crawling a directory hierarchy, only tests with
.perfkeys files will be considered when testing in performance mode. For
example, if a test named
foo.chpl generates output in the following format:
Time: 194.3 seconds Memory: 24GB Validation: SUCCESS
one could track the two numeric values using a
.perfkeys file as
As the test system runs, it will look for the specified performance keys in the test output and store the string following the key as part of the performance test output (described below). Note that one could also track the Validation string in this way, though there are better ways to track success/failure conditions, described in the next section.
Validating Performance Test Output¶
In addition to identifying key-value pairs to track, performance
testing can also do some simple validation of test output using
regular expression-based matching. A line starting with
reject:[<line#>:]) followed by a regular
expression will ensure that the test output contains (does not
contain) the given regular expression, and count any surprises as
failures in the testing results. The optional line# constrains what
line number the output must appear on, where a negative number
indicates that the counting should start at the end of the file.
For example, adding a third line to the
.perfkeys file, we can also
verify that the last line of output contains the string “SUCCESS”:
Time: Memory: verify:-1: SUCCESS
Accumulating Performance Data in .dat files¶
The values collected during performance testing are stored in a
.dat file in
the directory specified by
$CHPL_TEST_PERF_DIR (if undefined, the test
system defaults to
$CHPL_HOME/test/perfdat/<machineName>). Each time the
test is run in performance mode, a new line of data is added to the end of the
.dat file. The line will start with the date, and the data for each key
will be tab-separated. The base name for the
.dat file is taken from the
.perfkeys file. For example, the output for the test above would be stored
in a file named
Here is a sample
.dat file, for the performance test at
# Date Time: Memory: 03/26/18 194.3 24 04/02/18 194.3 24
Because the lines are tab-separated, the key will not necessarily “line up” visually with the corresponding header. Modifying these files by hand is inadvisable.
Performance tests submitted to the Chapel repository are run on a nightly basis,
.dat files. Modifications to the
specify them will impact the
.dat files that have already been
generated, so please be careful when updating already existing performance
Note that in practice, most tests are written to be run in both a
correctness and a performance mode, using a
bool config const to skip
the printing of nondeterministic data such as the Time (and possibly
Memory) values above. We tend to make tests run in performance mode
by default and use a
foo.execopts file to make the correctness testing
flip this switch (since end users will typically want the performance
data on and there’s nothing worse than firing off a long run only to
find you didn’t turn on the performance metrics).
Other Performance Testing Options¶
Like correctness testing, performance testing supports the ability to
specify different compiler and execution-time options, etc. This is
done using files, as in correctness testing, where the filenames tend
to start with
.perf*. For example,
specify compiler options that should be used when compiling the test
for performance mode while
foo.perfexecopts specifies execution-time
options for performance testing.
Comparing Multiple Versions¶
Most performance tests are most interesting when comparing multiple
things to one another – for example, multiple implementations of
an algorithm, a test compiled in various configurations, a Chapel vs.
C version, etc. The approach typically taken here is to have each
configuration write output to its own
.dat file and then to graph
columns from various
.dat files against one another.
To compare multiple distinct Chapel tests, the approach is easy;
simply make each one a performance test with a distinct name. (In
fact, Chapel performance tests must have unique names across the
entire testing system because all
.dat files are placed into a single
directory at the end; the system itself checks for conflicts and
complains if it finds any).
To compare a single Chapel test compiled or run in multiple
configurations, the approach taken is to use multi-line versions of
.perfexecopts files, where each line represents a
different configuration that should be tested. Each option line
should be concluded with a
# comment delimiter, after which a
.perfkeys file should be named. For example, to compare two
problem sizes, one might use:
--n=100 # bar-100.perfkeys --n=10000 # bar-10000.perfkeys
This would cause
bar.chpl to be compiled once and executed twice, one
--n=100 and the second time with
--n=10000. The first execution
bar-100.perfkeys for its performance keys and write its
bar-100.dat while the second would use
and write its output to
Comparing to a C version¶
To compare a C version of a test to a Chapel version, the C version of the test
must end with the suffix
.test.c for single locale tests and
for multilocale tests. Since
.dat files must have unique names, the base
name for the C test should vary from the Chapel equivalent. For example, I
might name the C version of the
foo.chpl performance test
Like any other test, the C test needs a
.good file for correctness testing
.perfkeys file for performance testing.
C versions do not have to be performance tests, but this is their most common use case.
Creating a graph comparing multiple variations¶
Once you are creating multiple
.dat files containing data you would
like to graph, you’ll create a
.graph file indicating which data from
.dat files should be graphed. For example, to compare the
timing data from the
foo-c.c tests described above, one
might use the following
foo.graph file (note that the graph file’s
basename need not have any relation to the tests it is graphing since
they are typically pulling from multiple
.dat files; making the
filename useful to human readers is the main consideration).
perfkeys: Time:, Time: files: foo.dat, foo-c.dat graphkeys: Chapel version, C version ylabel: Time (seconds) graphtitle: Sample Performance Test (Bogus)
Briefly, the following three entries need to have the same arity, corresponding to the lines in the graph:
perfkeys:is a comma-separated list of perfkeys to graph from…
files:…the comma-separated list of .dat files, respectively
graphkeys:this is a comma-separated list of strings to use in the graph’s legend.
The following two entries are singletons:
ylabel:a label for the graph’s y-axis (the x-axis will be the date the test was run by default)
graphtitle:a title for the graph as a whole
Finally, add the
.graph file to
$CHPL_HOME/test/GRAPHFILES. This file
is separated into a number of suites (indicated by comments) followed by graphs
that should appear in those suites (a graph may appear in multiple suites).
This file determines how graphs are organized on the Chapel performance graphing
webpages (currently hosted at
.graph file exists and is listed in
start_test -performance will cause the test system to not only create
.dat files, but also to create a graph as described in the .graph
file. To view the graph, point your browser to
$CHPL_TEST_PERF_DIR/<machinename>/html/index.html. Then select the
suite(s) in which your graph appears, and you should see data for it.
(Note that for a new graph with only one day of data, it can be hard
to see the singleton points at first).
Multilocale Performance Testing¶
Writing a performance test for multilocale setting has similarities to single locale performance testing and multilocale correctness testing. However, helper file suffixes differ from the previously covered ones as follows:
Single Locale Performance
Graph files for multilocale performance tests are listed in
Finally to run a multilocale performance test
start_test --perflabel ml-
must be used.
Multilocale Communication Counts Testing¶
Another type of multilocale testing is where the number of communication calls (e.g. GETs, PUTs, ONs) generated is tracked. These numbers can be obtained with the help of CommDiagnostics module and be printed out similar to printing out the time elapsed or throughput.
Communication counts testing is only applicable in a multilocale setting, and it
is similar to multilocale performance testing. However, for helper files
label is used instead of
Test Your Test Before Submitting¶
Before submitting your test for review, be sure that it works under
start_test --perflabel ml-(if applicable)
start_test --perflabel cc-(if applicable)
modes when running within the directory (or directories) in question. Nothing is more embarrassing than committing a test that doesn’t work on day one.
Once the test(s),
.graph files, and
GRAPHFILES are committed to the
Chapel repository, they will start showing up on the Chapel public
pages as well.
A Test That Tracks A Failure¶
The testing system also serves as our current system for tracking code-driven bugs and open issues. When a bug is encountered (either by a user or a developer), if it is not quickly resolved then it will be tracked by making what is known as a future.
When making a new test that is a future, follow the guidelines for making a
correctness test. Like normal correctness tests, a future will specify a
.good file with its intended output. However, the future is not expected to
match against the
.good file when the future is filed - developer effort is
usually required to fix the bug.
Once this test is created (or if a test already exists), add a
sharing the same base name as the test to mark it as a future. For example,
hi.future file would make the simple correctness test at the start
of this document into a future test.
Marking a test as a future causes it to be tested every night, but not to be
counted against the compiler’s success/failure statistics. If/when the future
.good file, developers will be alerted by the testing system.
The format of the
.future file itself is minimally structured. The
first line should contain the type of future (see list below) followed
by a brief (one 80-column line) description of the future, which ideally
reflects the associated GitHub issue title. The next line should contain the
associated GitHub issue number in the #issue-number format, e.g. #1.
The rest of the file is optional and free-form. It can be used over the future’s lifetime to describe in what way the test isn’t working or should be working, implementation notes, philosophical arguments, etc.
The current categories of futures reflect GitHub labels:
bug: this test exhibits a bug in the implementation
error message: this test correctly generates an error message, but the error message needs clarification/improvement
feature request: a way of filing a request for a particular feature in Chapel
performance: indicates a performance issue that needs to be addressed
design: this test raises a question about Chapel’s semantics that we ultimately need to address
portability: indicates a portability issue that needs to be addressed
unimplemented feature: this test uses features that are specified, but which have not yet been implemented.
Currently, it is mandatory to include a GitHub issue number with any new futures. That said, futures that pre-date Chapel’s adoption of GitHub issues may have a description instead of an issue number.
When filing a bug report as an issue, it is considered good practice to include a future for the issue tracked on the GitHub issues page.
Tracking Current Failure Mode¶
Sometimes a future will change its behavior, but not be resolved. The future
should be updated to continue to track the issue as much as possible - to alert
developers when this happens, it is necessary to track not only the expected
good output but also the output indicating the current failure. This is done
.bad file. The contents of a
.bad file are similar to a
file and should match the currently generated output of the test.
Tests whose current/
.bad output varies based on the compiler version number,
line numbers of standard modules and such are fragile since these things change
frequently; in such cases, either a
.prediff should be used to filter the
output before comparing to
.bad, or the
.bad should be omitted.
Ultimately, our intention is to support a library of common recipes for
files, but this has not been implemented yet.
An easy way to obtain this file is to run the future once using
the output for that configuration can then be found in a
.out.tmp file in
the same directory as the test.
Resolving a Future¶
There are three situations under which a future will get resolved.
A developer explicitly works on resolving the future.
A developer works on another feature or issue and as a consequence the future gets resolved.
This could happen if the two issues appeared to be unrelated, or if the existence of the future had been forgotten
A developer examines the future and determines the current behavior is correct
The developer may then either remove the supporting files for futures, or remove the test entirely.
A brief description of flags that can be used with
start_test itself can
be obtained by calling
The section titled A Correctness Test demonstrates invoking
on a single explicitly-named file. More generally,
start_test takes a list
of test and directory names on the command line and will run all tests
explicitly named or contained within the directories (or their subdirectories).
start_test foo.chpl bar/baz.chpl typeTests/ OOPTests/
will test the two explicitly-named tests (
bar/ directory). It will also recursively search for any tests
stored in the
If invoked without any arguments,
start_test will start in the current
directory and recursively look for tests in subdirectories.
If invoked with the
start_test will compile the
program and execute it with
--valgrind flag does the
same, plus it also runs the compiler under
valgrind, which increases
testing time compared to
--valgrindexe. To learn about best practices
To run performance testing, add the
--performance flag to
along with the traditional options. So for example, to run this
single test in performance mode, one could use:
start_test --performance foo.chpl
When crawling a directory hierarchy, only tests with
will be considered when testing in performance mode.
All performance tests are compiled with
--fast by default and
when it’s not problematic for the target configuration.
The output from a
start_test run will begin with a list of the settings
used, following the environment settings as obtained from
Setting up Your Environment for Chapel). This will be followed by
information from running the individual tests or directories.
The output from
start_test will end with the location of the log file
containing all the output from its execution, as well as a summary of all tests
that failed and any futures that were run. This will look something like this:
[Test Summary - 180328.134706] [Error matching program output for path/to/failing/correctness/test] Future (bug: description of bug from future file) [Error matching program output for path/to/failing/future] Future (bug: description of bug from future file) [Success matching program output for path/to/passing/future] [Summary: #Successes = 1 | #Failures = 1 | #Futures = 2 | #Warnings = 0 ] [Summary: #Passing Suppressions = 0 | #Passing Futures = 1 ] [END]
Successful tests will not be printed after the line beginning with
Summary unless they had a
.future file (see A Test That Tracks A
Failure for information about
When nightly testing is run, core developers will be notified of every configuration with a new failure, warning, passing suppression, and/or passing future.
Summary of Testing Files¶
The following table serves as a quick reference for the various test files, and as a table of contents for this page. It is not necessarily complete, and not all of it has been covered in this document. Please ask a member of the core team for more information on a specific file.
Using file base name,
foo for the filenames in this table.
Contents of file
Chapel test program to compile and run
Single locale C test program to compile and run. See Comparing to a C version for more information
Multilocale C test program to compile and run. See Comparing to a C version for more information
expected output of test program
line separated compiler flag configurations. See Compile-time Arguments for more information
directory-wide compiler flags
line separated runtime flag configurations. See Execution-time Arguments for more information
directory-wide runtime flags
line separated list of environment variables settings. See Environment Variables for more information
directory-wide environment variables
number of locales to use in multilocale run
directory-wide number of locales to use in multilocale run
line separated list of files to include when validating the expected output
directory-wide list of files to compare with output
script that is run on the test output, before taking the diff between the output and .good file
directory-wide script that is run over test output
script that is run prior to compilation of the test program
directory-wide script that is run prior to compilation
script that is run prior to execution of the test program
directory-wide script that is run prior to execution
Testing System Settings
line separated list of files to remove before the next test run
directory-wide list of files to remove before test runs
empty file. Indicates .chpl file should only be compiled, not executed. See Controlling How It Runs for more information.
empty file. Indicates the file should not be run explicitly See Controlling How It Runs for more information.
empty file. Indicates the directory should not be run
line separated list of conditions under which the test should not be run, or a script to compute the same. See Limiting Where the Test Runs for more information
same as above, but applied to the entire directory
line separated list of conditions under which the test is
expected to fail, or a script to compute the same. Note
that unless otherwise specified, a
time in seconds after which start_test should stop this test See Limiting Time Taken for more information
performance (replace “perf” with “ml-” and “cc-” as necessary)
compiler flags, overrides .compopts for –performance
directory-wide performance compiler flags
runtime flags, overrides .execopts for –performance
directory-wide performance runtime flags
environment variables, overrides .execenv for –performance
directory-wide performance environment variables
number of execution trials to run if no timeout specified
directory-wide number of execution trials to run
time in seconds after which start_test should stop this test
keys to search for in the output
Specifies which data files and perfkeys to graph, and contains meta-data associated with labeling data sets, axis, and graphs
Acts as an index that tracks all .graph that should be graphed.
Describes the future being tested, following the newline-separated format of: category, title, issue #
output generated on a failing test, to track if a known failing future begins failing a different way. See Tracking Current Failure Mode for more information