Forall Loops

View forallLoops.chpl on GitHub

This primer illustrates forall loops, which are a way to leverage data parallelism or engage user-defined parallel iterators.

Like serial for-loops, forall loops can iterate over a data structure, an iterator, or a zippered combination of these. Unlike for-loops, multiple iterations of a forall loop can potentially execute in parallel. Parallelism is determined by the data structure or iterator being iterated over, also known as the “iterable”.

Chapel has forall statements and forall expressions. Each form has two varieties: “must-parallel” and “may-parallel”.

  • The must-parallel forms are written using the forall keyword. They require that the iterable provide a parallel iterator. Note that there are no requirements on the behavior of the parallel iterator. For example, it can execute serially, in which case the “must-parallel” loop that invokes it also executes serially.

  • The may-parallel forms are written using brackets [ ] (“May-parallel” forall statement). They invoke the parallel iterator if the iterable provides it, and otherwise fall back on the serial iterator.

As with for-loops, the body of a forall statement is a statement or a block statement, whereas the body of a forall expression is an expression. Both kinds are shown in the following sections.

“Must-parallel” forall statement

In the following example, the forall loop iterates over the array indices in parallel.

config const n = 5;
var A: [1..n] real;

forall i in 1..n {
  A[i] = i;
}

writeln("After setting up, A is:");
writeln(A);
writeln();

If A were a distributed array (Distributions), each loop iteration would typically be executed on the locale where the corresponding array element resides.

“Must-parallel” forall expression

The following forall expression produces new values in parallel. We store these values in a new array.

var B = forall a in A do a * 3;

writeln("After initialization, B is:");
writeln(B);
writeln();

Zippered “must-parallel” forall statement

Forall loops support zippered iteration over multiple iterables similarly to serial for-loops. For a zippered forall loop, parallelism is determined by the “leader” iterable, which is the first data structure or iterator in the zippered list.

A zippered “must-parallel” forall loop requires that the leader iterable provide a “leader” iterator and all iterables provide “follower” iterators. These are described in the parallel iterators primer.

Here we illustrate zippering arrays and domains.

var C: [1..n] real;
forall (a, b, i) in zip(A, B, C.domain) do
  C[i] = a * 10 + b / 10 + i * 0.001;

writeln("After a zippered loop, C is:");
writeln(C);
writeln();

The leader iterable in this example is A. Since this array is not distributed, all loop iterations will be executed on the current locale.

“May-parallel” forall statement

The iterator onlySerial defined below does not have any parallel forms. Since [i in onlySerial(n)] is a may-parallel loop, it will accept the iterator, executing its iterations serially:

iter onlySerial(m: int) {
  for j in 1..m do
    yield j;
}

[i in onlySerial(n)] {
  writeln("in onlySerial iteration #", i);
}
writeln();

If the user had supplied a parallel overload of the onlySerial() iterator, the above loop would invoke it instead.

Using the following must-parallel loop would cause an error if onlySerial() does not have any parallel overloads:

forall i in onlySerial(n) { // error: a parallel iterator is not found
  writeln("in iteration #", i);
}

“May-parallel” forall expression

Given that these are default rectangular arrays and therefore provide parallel iterators, the following may-parallel expression will be computed in parallel:

var D = [(a,b,c) in zip(A,B,C)] a + c - b;

writeln("The result of may-parallel expression, D is:");
writeln(D);
writeln();

As with must-parallel zippered loops, here A is the leader iterable (Zippered “must-parallel” forall statement). Its parallel iterator will determine how this loop is parallelized. if A were a distributed array, its parallel iterator would also determine iteration locality.

Domains declared without a distribution (see Distributions), including default rectangular and default associative domains, as well as arrays over such domains, provide both serial and parallel iterators. So do domains distributed over standard multi-locale distributions, such as blockDist and cyclicDist, and arrays over such domains. The parallel iterators provided by standard multi-locale distributions place each loop iteration on the locale where the corresponding index or array element is placed.

Task Intents and Shadow Variables

A forall loop may refer to some variables declared outside the loop, known as “outer variables”. When it does, “shadow variables” are introduced. Each task created by the parallel iterator gets its own set of shadow variables, one per outer variable.

  • Each shadow variable behaves as if it were a formal argument of a function that implements the task’s work. (These “task functions” are described in the language spec). The outer variable is passed to this formal argument according to the argument intent associated with the shadow variable, which is called a “task intent”.

  • References within a task that seem to refer to an outer variable will actually be referring to the corresponding shadow variable owned by the task. If the parallel iterator causes multiple iterations of the loop to be executed by the same task, these iterations refer to the same set of shadow variables.

  • Each shadow variable is deallocated at the end of its task.

For most types, forall intents use the default argument intent (The Default Intent). For numeric types, this implies capturing the value of the outer variable by the time the task starts executing. Sync and atomic variables are passed by reference (Sync, Atomics). Arrays infer their default intent based upon the declaration of the array. Mutable arrays (e.g. declared with var or passed by ref intent) have a default intent of ref, while immutable arrays (e.g. declared with const or passed by const intent) have a default intent of const. These defaults are described in the language spec.

var outerIntVariable = 0;
proc updateOuterVariable() {
  outerIntVariable += 1;  // always refers to the outer variable
}
var outerAtomicVariable: atomic int;

forall i in 1..n {

  D[i] += 0.5; // if multiple iterations of the loop update the same
               // array element, it could lead to a data race

  outerAtomicVariable.add(1);  // ok: concurrent updates are atomic

  if i == 1 then           // ensure only one task updates outerIntVariable
    updateOuterVariable(); // to avoid the risk of a data race

  // the shadow variable always contains the value as of loop start
  writeln("shadow outerIntVariable is: ", outerIntVariable);
}

writeln();
writeln("After a loop with default intents, D is:");
writeln(D);
 // This variable is updated exactly once, so its value is 1.
writeln("outerIntVariable is: ", outerIntVariable);
 // This variable is incremented atomically n times, so its value is n.
writeln("outerAtomicVariable is: ", outerAtomicVariable.read());
writeln();

The task intents in, const in, ref, const ref, and reduce can be specified explicitly using a with clause.

An in or const in intent creates a copy of the outer variable for each task. A ref or const ref makes the shadow variable an alias for the outer variable. Updates to a ref shadow variable are reflected in the corresponding outer variable.

var outerRealVariable = 1.0;

forall i in 1..n with (in outerIntVariable,
                       ref outerRealVariable) {
  outerIntVariable += 1;    // a per-task copy, never accessed concurrently

  if i == 1 then            // ensure only one task accesses outerIntVariable
    outerRealVariable *= 2; // to avoid the risk of a data race
}

writeln("After a loop with explicit intents:");
 // This outer variable's value is unaffected by the loop
 // because its shadow variables have the 'in' intent.
writeln("outerIntVariable is: ", outerIntVariable);
writeln("outerRealVariable is: ", outerRealVariable);
writeln();

A reduce intent can be used to compute reductions. The value of each reduce-intent shadow variable at the end of its task is combined into its outer variable according to the specified reduction operation. Within the loop body, the shadow variable represents the accumulation state produced by this task so far, starting from the reduction identity value at task startup. Values can be combined onto this accumulation state using the reduction-specific operation or the reduce= operator.

 // The values of the outer variables before the loop will be included
 // in the reduction result.
writeln("outerIntVariable before the loop is: ", outerIntVariable);
var outerMaxVariable = 0;

forall i in 1..n with (+ reduce outerIntVariable,
                       max reduce outerMaxVariable) {
  outerIntVariable += i;
  if i % 2 == 0 then
    outerMaxVariable reduce= i;

  // The loop body can contain other code
  // regardless of reduce-related operations.
}

writeln("After a loop with reduce intents:");
writeln("outerIntVariable = ", outerIntVariable);
writeln("outerMaxVariable = ", outerMaxVariable);
writeln();

A with-clause can be used in a similar fashion with any flavor of forall loop.

Task-Private Variables

A task-private variable is similar to an in-intent or ref-intent shadow variable in that it is initialized at the beginning of its task and deallocated at the end of the task. However, a task-private variable is initialized without regard to any outer variable.

A task-private variable is introduced using a with-clause in a way similar to a regular var, const, ref, or const ref variable. A var or const variable must provide either its type or initializing expression, or both. As with a regular variable, it will be initialized to the default value of its type if the initializing expression is not given. A ref or const ref variable must have the initializing expression and cannot declare its type.

A var task-private variable could be used, for example, as a per-task scratch space that is never accessed concurrently.

forall i in 1..n with (var myReal: real,  // starts at 0 for each task
                       ref outerIntVariable, // a shadow variable
                       ref myRef = outerIntVariable) {

  myReal += 0.1;   // ok: never accessed concurrently

  if i == 1 then   // ensure only one task accesses outerIntVariable
    myRef *= 3;    // to avoid the risk of a data race
}

writeln("After a loop with task-private variables:");
 // outerIntVariable was updated through the task-private reference 'myRef'
writeln("outerIntVariable is: ", outerIntVariable);
writeln();

Task Intents Inside Record Methods

When the forall loop occurs inside a method on a record, the fields of the receiver record are represented in the loop body with shadow variables as if they were outer variables.

At present, record fields are always passed by the default intent (The Default Intent), so the fields of MyRecord cannot be modified inside of the first forall loop loop below.

To modify the fields within the body of a forall loop, use the ref intent for this in the with-clause of the forall loop, as in the second forall loop below.

record MyRecord {
  var arrField: [1..n] int;
  var intField: int;
}

proc ref MyRecord.myMethod() {
  forall i in 1..n {
    // intField += 1;     // would cause "illegal assignment" error
  }
  forall i in 1..n with (ref this) {
    arrField[i] = i * 2;
    if i == 1 then
      intField += 1;      // beware of potential for data races
  }
}

var myR = new MyRecord();
myR.myMethod();

writeln("After MyRecord.myMethod, myR is:");
writeln(myR);
writeln();