Vectorizing Iterator

Data parallel constructs (such as forall loops) are implicitly vectorizable. If the --vectorize compiler flag is thrown the Chapel compiler will emit vectorization hints to the backend compiler, though the effects will vary based on the target compiler.

In order to allow users to explicitly request vectorization, this prototype vectorizing iterator is being provided. Loops that invoke this iterator will be marked with vectorization hints, provided the --vectorize flag is thrown.

This iterator is currently available for all Chapel programs and does not require a use statement to make it available. In future releases it will be moved to a standard module and will likely require a use statement to make it available.

iter vectorizeOnly(iterables ...)

Vectorize only "wrapper" iterator:

This iterator wraps and vectorizes other iterators. It takes one or more iterables (an iterator or class/record with a these() iterator) and yields the same elements as the wrapped iterables.

This iterator exists to provide a way to vectorize data parallel loops without invoking a parallel iterator with the goal of avoiding task creation for loops with small trip counts or where task creation isn't desirable.

Data parallel operations in Chapel such as forall loops are order-independent. However, a forall is implemented in terms of either leader/follower or standalone iterators which typically create tasks. This iterator exists to allow vectorization of order-independent loops without requiring task creation. By using this wrapper iterator you are asserting that the loop is order-independent (and thus a candidate for vectorization) just as you are when using a forall loop.

When invoked from a serial for loop, this iterator will simply mark your iterator(s) as order-independent. When invoked from a parallel forall loop this iterator will implicitly be order-independent because of the semantics of a forall, and additionally it will invoke the serial iterator instead of the parallel iterators. For instance:

forall i in vectorizeOnly(1..10) do;
for    i in vectorizeOnly(1..10) do;

will both effectively generate:

CHPL_PRAGMA_IVDEP
for (i=0; i<=10; i+=1) {}

The vectorizeOnly iterator automatically handles zippering, so the zip keyword is not needed. For instance, to vectorize:

for (i, j) in zip(1..10, 1..10) do;

simply write:

for (i, j) in vectorizeOnly(1..10, 1..10) do;

Note that the use of zip is not explicitly prevented, but all iterators being zipped must be wrapped by a vectorizeOnly iterator. Future releases may explicitly prevent the use zip with this iterator.