.. default-domain:: chpl .. _Chapter-Task_Parallelism_and_Synchronization: ==================================== Task Parallelism and Synchronization ==================================== Chapel supports both task parallelism and data parallelism. This chapter details task parallelism as follows: - :ref:`Task_parallelism` introduces tasks and task parallelism. - :ref:`Begin` describes the begin statement, an unstructured way to introduce concurrency into a program. - :ref:`Synchronization_Variables` describes synchronization variables, an unstructured mechanism for synchronizing tasks. - :ref:`Atomic_Variables` describes atomic variables, a mechanism for supporting atomic operations. - :ref:`Cobegin` describes the cobegin statement, a structured way to introduce concurrency into a program. - :ref:`Coforall` describes the coforall loop, another structured way to introduce concurrency into a program. - :ref:`Task_Intents` specifies how variables from outer scopes are handled within ``begin``, ``cobegin`` and ``coforall`` statements. :ref:`Task_Private_Variables` are also available. - :ref:`Sync_Statement` describes the sync statement, a structured way to control parallelism. - :ref:`Serial` describes the serial statement, a structured way to suppress parallelism. - :ref:`Yield_Task_Execution` describes yielding the current tasks execution. .. _Task_parallelism: Tasks and Task Parallelism -------------------------- A Chapel *task* is a distinct context of execution that may be running concurrently with other tasks. Chapel provides a simple construct, the ``begin`` statement, to create tasks, introducing concurrency into a program in an unstructured way. In addition, Chapel introduces the type qualifier ``sync`` for synchronization between tasks. Chapel provides two constructs, the ``cobegin`` and ``coforall`` statements, to introduce concurrency in a more structured way. These constructs create multiple tasks but do not continue until these tasks have completed. In addition, Chapel provides two constructs, the ``sync`` and ``serial`` statements, to insert synchronization and suppress parallelism. All four of these constructs can be implemented through judicious uses of the unstructured task-parallel constructs described in the previous paragraph. Tasks are considered to be created when execution reaches the start of a ``begin``, ``cobegin``, or ``coforall`` statement. When the tasks are actually executed depends on the Chapel implementation and run-time execution state. A task is represented as a call to a *task function*, whose body contains the Chapel code for the task. Variables defined in outer scopes are considered to be passed into a task function by default intent, unless a different *task intent* is specified explicitly by a ``task-intent-clause``. Accesses to the same variable from different tasks are subject to the Memory Consistency Model (:ref:`Chapter-Memory_Consistency_Model`). Such accesses can result from aliasing due to ``ref`` argument intents or task intents, among others. .. _Begin: The Begin Statement ------------------- The begin statement creates a task to execute a statement. The syntax for the begin statement is given by .. code-block:: syntax begin-statement: 'begin' task-intent-clause[OPT] statement Control continues concurrently with the statement following the begin statement. *Example (beginUnordered.chpl)*. The code .. code-block:: chapel begin writeln("output from spawned task"); writeln("output from main task"); .. BLOCK-test-chapelprediff #!/usr/bin/env sh testname=$1 outfile=$2 sort $outfile > $outfile.2 mv $outfile.2 $outfile .. BLOCK-test-chapeloutput output from main task output from spawned task executes two ``writeln`` statements that output the strings to the terminal, but the ordering is purposely unspecified. There is no guarantee as to which statement will execute first. When the begin statement is executed, a new task is created that will execute the ``writeln`` statement within it. However, execution will continue immediately after task creation with the next statement. A begin statement creates a single task function, whose body is the body of the begin statement. The handling of the outer variables within the task function and the role of ``task-intent-clause`` are defined in :ref:`Task_Intents`. Yield and return statements are not allowed in begin blocks. Break and continue statements may not be used to exit a begin block. .. _Synchronization_Variables: Synchronization Variables ------------------------- Synchronization variables have a logical state associated with the value. The state of the variable is either *full* or *empty*. Normal reads of a synchronization variable cannot proceed until the variable’s state is full. Normal writes of a synchronization variable cannot proceed until the variable’s state is empty. The ``sync`` type qualifier precedes the type of the variable’s value in the declaration. ``sync`` is supported for the primitive types ``nothing``, ``bool``, ``int``, ``uint``, ``real``, ``imag``, ``complex``, ``range``, ``bytes``, and ``string`` ( :ref:`Primitive_Types`); for enumerated types ( :ref:`Enumerated_Types`); and for class types (:ref:`Class_Types`) and record types (:ref:`Record_Types`). For sync variables of class type, the full/empty state applies to the reference to the class object, not to its member fields. If a task attempts to read or write a synchronization variable that is not in the correct state, the task is suspended. When the variable transitions to the correct state, the task is resumed. If there are multiple tasks blocked waiting for the state transition one task is non-deterministically selected to proceed and the others continue to wait. A synchronization variable is specified with a ``sync`` type given by the following syntax: .. code-block:: syntax sync-type: 'sync' type-expression A default-initialized synchronization variable will be empty. A synchronization variable initialized from another expression will be full and store the value from that expression. *Example (beginWithSyncVar.chpl)*. The code .. code-block:: chapel class Tree { var isLeaf: bool; var left, right: unmanaged Tree?; var value: int; proc sum():int { if (isLeaf) then return value; var x: sync int; begin x.writeEF(left!.sum()); var y = right!.sum(); return x.readFE() + y; } } .. BLOCK-test-chapelpost var tree: unmanaged Tree = new unmanaged Tree(false, new unmanaged Tree(false, new unmanaged Tree(true, nil, nil, 1), new unmanaged Tree(true, nil, nil, 1), 1), new unmanaged Tree(false, new unmanaged Tree(true, nil, nil, 1), new unmanaged Tree(true, nil, nil, 1), 1), 1); writeln(tree.sum()); proc Tree.deinit() { if isLeaf then return; delete left; delete right; } delete tree; .. BLOCK-test-chapeloutput 4 the sync variable ``x`` is assigned by an asynchronous task created with the begin statement. The task returning the sum waits on the reading of ``x`` until it has been assigned. .. *Example (syncVar.chpl)*. The following code implements a simple split-phase barrier using a sync variable. .. BLOCK-test-chapelpre config const n = 44; proc work(i) { // do nothing } .. code-block:: chapel var count: sync int = n; // counter which also serves as a lock var release: sync bool; // barrier release forall t in 1..n do begin { work(t); var myc = count.readFE(); // read the count, set state to empty if myc!=1 { write("."); count.writeEF(myc-1); // update the count, set state to full // we could also do some work here before blocking release.readFF(); } else { release.writeEF(true); // last one here, release everyone writeln("done"); } } .. BLOCK-test-chapeloutput ...........................................done In each iteration of the forall loop after the work is completed, the task reads the ``count`` variable, which is used to tally the number of tasks that have arrived. All tasks except the last task to arrive will block while trying to read the variable ``release``. The last task to arrive will write to ``release``, setting its state to full at which time all the other tasks can be unblocked and run. If a formal argument with a default intent either has a synchronization type or the formal is generic (:ref:`Formal_Arguments_of_Generic_Type`) and the actual has a synchronization type, the actual must be an lvalue and is passed by reference. In these cases the formal itself is an lvalue, too. The actual argument is not read or written during argument passing; its state is not changed or waited on. The qualifier ``sync`` without the value type can be used to specify a generic formal argument that requires a ``sync`` actual. .. _Functions_on_Synchronization_Variables: Predefined Sync Methods ~~~~~~~~~~~~~~~~~~~~~~~ The following methods are defined for variables of ``sync`` type: .. include:: /builtins/ChapelSyncvar.rst .. _Atomic_Variables: .. _Functions_on_Atomic_Variables: Atomic Variables ---------------- ``atomic`` is a type qualifier that precedes the variable’s type in the declaration. An atomic variable is specified with an atomic type given by the following syntax: .. code-block:: syntax atomic-type: 'atomic' type-expression .. include:: /builtins/Atomics.rst .. _Cobegin: The Cobegin Statement --------------------- The cobegin statement is used to introduce concurrency within a block. The ``cobegin`` statement syntax is .. code-block:: syntax cobegin-statement: 'cobegin' task-intent-clause[OPT] block-statement A new task and a corresponding task function are created for each statement in the ``block-statement``. Control continues when all of the tasks have finished. The handling of the outer variables within each task function and the role of ``task-intent-clause`` are defined in :ref:`Task_Intents`. Return statements are not allowed in cobegin blocks. Yield statement may only be lexically enclosed in cobegin blocks in parallel iterators (:ref:`Parallel_Iterators`). Break and continue statements may not be used to exit a cobegin block. *Example (cobeginAndEquivalent.chpl)*. The cobegin statement .. BLOCK-test-chapelpre var sync1, sync2: sync int; proc stmt1() { sync1.readFE(); } proc stmt2() { sync2.readFE(); sync1.writeEF(1); } proc stmt3() { sync2.writeEF(1); } .. code-block:: chapel cobegin { stmt1(); stmt2(); stmt3(); } is equivalent to the following code that uses only begin statements and sync variables to introduce concurrency and synchronize: .. code-block:: chapel var s1, s2, s3: sync bool; begin { stmt1(); s1.writeEF(true); } begin { stmt2(); s2.writeEF(true); } begin { stmt3(); s3.writeEF(true); } s1.readFF(); s2.readFF(); s3.readFF(); Each begin statement is executed concurrently but control does not continue past the final line above until each of the sync variables is written, thereby ensuring that each of the functions has finished. .. _Coforall: The Coforall Loop ----------------- The coforall loop is a variant of the cobegin statement in loop form. The syntax for the coforall loop is given by .. code-block:: syntax coforall-statement: 'coforall' index-var-declaration 'in' iteratable-expression task-intent-clause[OPT] 'do' statement 'coforall' index-var-declaration 'in' iteratable-expression task-intent-clause[OPT] block-statement 'coforall' iteratable-expression task-intent-clause[OPT] 'do' statement 'coforall' iteratable-expression task-intent-clause[OPT] block-statement The ``coforall`` loop creates a separate task for each iteration of the loop. Control continues with the statement following the ``coforall`` loop after all tasks corresponding to the iterations of the loop have completed. The single task function created for a ``coforall`` and invoked by each task contains the loop body. The handling of the outer variables within the task function and the role of ``task-intent-clause`` are defined in :ref:`Task_Intents`. Return statements are not allowed in coforall blocks. Yield statement may only be lexically enclosed in coforall blocks in parallel iterators (:ref:`Parallel_Iterators`). Break and continue statements may not be used to exit a coforall block. *Example (coforallAndEquivalent.chpl)*. The coforall statement .. BLOCK-test-chapelpre iter iterator() { for i in 1..3 do yield i; } proc body() { } .. code-block:: chapel coforall i in iterator() { body(); } is equivalent to the following code that uses only begin statements and sync variables to introduce concurrency and synchronize: .. code-block:: chapel var runningCount: sync int = 1; var finished: sync bool; for i in iterator() { runningCount.writeEF(runningCount.readFE() + 1); begin { body(); var tmp = runningCount.readFE(); runningCount.writeEF(tmp-1); if tmp == 1 then finished.writeEF(true); } } var tmp = runningCount.readFE(); runningCount.writeEF(tmp-1); if tmp == 1 then finished.writeEF(true); finished.readFF(); Each call to ``body()`` executes concurrently because it is in a begin statement. The sync variable ``runningCount`` is used to keep track of the number of executing tasks plus one for the main task. When this variable reaches zero, the sync variable ``finished`` is used to signal that all of the tasks have completed. Thus control does not continue past the last line until all of the tasks have completed. .. _Task_Intents: Task Intents ------------ If a variable is referenced within the lexical scope of a ``begin``, ``cobegin``, or ``coforall`` statement and is declared outside that statement, it is subject to *task intents*. That is, this *outer variable* is considered to be passed as an actual argument to the corresponding task function at task creation time. All references to the variable within the task function implicitly refer to a *shadow variable*, i.e. the task function’s corresponding formal argument. When the task construct is inside a method on a record, accesses a field of ``this``, and does not contain an explicit task intent on ``this`` (see below), the field itself is treated as an outer variable. That is, it is passed as an actual argument to the task function and all references to the field within the task function implicitly refer to the corresponding shadow variable. Each formal argument of a task function has the default argument intent by default. See also :ref:`The_Default_Intent`. Note that the default intent usually allows the compiler to assume that the value will not be concurrently modified. For variables of primitive and class types, this has the effect of capturing the value of the variable at task creation time and referencing that value instead of the original variable within the lexical scope of the task construct. A formal can be given another argument intent explicitly by listing it with that intent in the optional ``task-intent-clause``. For example, for variables of most types, the ``ref`` intent allows the task construct to modify the corresponding original variable or to read its updated value after concurrent modifications. The syntax of the task intent clause is: .. code-block:: syntax task-intent-clause: 'with' ( task-intent-list ) task-intent-list: task-intent-item task-intent-item, task-intent-list task-intent-item: formal-intent identifier reduce-scan-operator 'reduce' identifier class-type 'reduce' identifier task-private-var-decl where the following intents can be used as a ``formal-intent``: ``ref``, ``in``, ``const``, ``const in``, ``const ref``. ``task-private-var-decl`` is defined in :ref:`Task_Private_Variables`. The ``reduce`` task intent specifies a reduction into the outer variable, which is provided to the right of the ``reduce`` keyword. The reduction operator is specified by either the ``reduce-scan-operator`` or the ``class-type`` in the same way as for a Reduction Expressions (see :ref:`reduce`). At the start of each task the corresponding shadow variable is initialized to the identity value of the reduction operator. Within the task it behaves as a regular variable. In addition, it can be the left-hand side of the ``reduce=`` operator, which accumulates its right-hand side onto the shadow variable. At the end of each task its shadow variable is combined into the outer variable. *Open issue*. How should ``reduce`` task intent be defined for ``begin`` tasks? A reduction is legal only when the task completes before the program has exited the dynamic scope of the outer variable. Reduce intents are currently work-in-progress. See also :ref:`Reduce Intents technical note `. The implicit treatment of outer scope variables as the task function’s formal arguments applies to both module level and local variables. It applies to variable references within the lexical scope of a task construct, but does not extend to its dynamic scope, i.e., to the functions called from the task(s) but declared outside of the lexical scope. The loop index variables of a ``coforall`` statement are not subject to such treatment within that statement; however, they are subject to such treatment within nested task constructs, if any. *Rationale*. The primary motivation for task intents is to avoid some races on scalar/record variables, which are possible when one task modifies a variable and another task reads it. Without task intents, for example, it would be easy to introduce and overlook a bug illustrated by this simplified example: .. code-block:: chapel { var i = 0; while i < 10 { begin { f(i); } i += 1; } } If all the tasks created by the ``begin`` statement start executing only after the ``while`` loop completes, and ``i`` within the ``begin`` is treated as a reference to the original ``i``, there will be ten tasks executing ``f(10)``. However, the user most likely intended to generate ten tasks executing ``f(0)``, ``f(1)``, ..., ``f(9)``. Task intents ensure that, regardless of the timing of task execution. Another motivation for task intents is that referring to a captured copy in a task is often more efficient than referring to the original variable. That’s because the copy is a local constant, e.g. it could be placed in a register when it fits. Without task intents, references to the original variable would need to be implemented using a pointer dereference. This is less efficient and can hinder optimizations in the surrounding code, for example loop-invariant code motion. Furthermore, in the above example the scope where ``i`` is declared may exit before all the ten tasks complete. Without task intents, the user would have to protect ``i`` to make sure its lexical scope doesn’t exit before the tasks referencing it complete. We decided to treat ``cobegin`` and ``coforall`` statements the same way as ``begin``. This is for consistency and to make the race-avoidance benefit available to more code. We decided to apply task intents to module level variables, in addition to local variables. Again, this is for consistency. One could view module level variables differently than local variables (e.g. a module level variable is “always available”), but we favored consistency over such an approach. We decided not to apply task intents to “closure” variables, i.e., the variables in the dynamic scope of a task construct. This is to keep this feature manageable, so that all variables subject to task intents can be obtained by examining just the lexical scope of the task construct. In general, the set of closure variables can be hard to determine, unwieldy to implement and reason about, it is unclear what to do with extern functions, etc. We do not provide ``inout`` or ``out`` as task intents because they will necessarily create a data race in a ``cobegin`` or ``coforall``. ``type`` and ``param`` intents are not available either as they do not seem useful as task intents. .. .. note:: *Future*. For a given intent, we would also like to provide a blanket clause, which would apply the intent to all variables. An example of syntax for a blanket ``ref`` intent would be ``ref *``. .. _Sync_Statement: The Sync Statement ------------------ The sync statement acts as a join of all dynamically encountered begins from within a statement. The syntax for the sync statement is given by .. code-block:: syntax sync-statement: 'sync' statement 'sync' block-statement Return statements are not allowed in sync statement blocks. Yield statement may only be lexically enclosed in sync statement blocks in parallel iterators (:ref:`Parallel_Iterators`). Break and continue statements may not be used to exit a sync statement block. *Example (syncStmt1.chpl)*. The sync statement can be used to wait for many dynamically created tasks. .. BLOCK-test-chapelpre config const n = 9; proc work() { write("."); } .. code-block:: chapel sync for i in 1..n do begin work(); .. BLOCK-test-chapelpost writeln("done"); .. BLOCK-test-chapeloutput .........done The for loop is within a sync statement and thus the tasks created in each iteration of the loop must complete before the continuing past the sync statement. .. *Example (syncStmt2.chpl)*. The sync statement .. BLOCK-test-chapelpre proc stmt1() { } proc stmt2() { } .. code-block:: chapel sync { begin stmt1(); begin stmt2(); } is similar to the following cobegin statement .. code-block:: chapel cobegin { stmt1(); stmt2(); } except that if begin statements are dynamically encountered when ``stmt1()`` or ``stmt2()`` are executed, then the former code will wait for these begin statements to complete whereas the latter code will not. .. _Serial: The Serial Statement -------------------- The ``serial`` statement can be used to dynamically disable parallelism. The syntax is: .. code-block:: syntax serial-statement: 'serial' expression[OPT] 'do' statement 'serial' expression[OPT] block-statement where the optional ``expression`` evaluates to a boolean value. If the expression is omitted, it is as though ’true’ were specified. Whatever the expression’s value, the statement following it is evaluated. If the expression is true, any dynamically encountered code that would normally create new tasks within the statement is instead executed by the original task without creating any new ones. In effect, execution is serialized. If the expression is false, code within the statement will generates task according to normal Chapel rules. *Example (serialStmt1.chpl)*. In the code .. BLOCK-test-chapelpre config const lo = 9; config const hi = 23; proc work(i) { if __primitive("task_get_serial") then writeln("serial ", i); } .. code-block:: chapel proc f(i) { serial i<13 { cobegin { work(i); work(i); } } } for i in lo..hi { f(i); } .. BLOCK-test-chapeloutput serial 9 serial 9 serial 10 serial 10 serial 11 serial 11 serial 12 serial 12 the serial statement in procedure f() inhibits concurrent execution of work() if the variable i is less than 13. .. *Example (serialStmt2.chpl)*. The code .. BLOCK-test-chapelpre proc stmt1() { write(1); } proc stmt2() { write(2); } proc stmt3() { write(3); } proc stmt4() { write(4); } var n = 3; .. code-block:: chapel serial { begin stmt1(); cobegin { stmt2(); stmt3(); } coforall i in 1..n do stmt4(); } is equivalent to .. code-block:: chapel stmt1(); { stmt2(); stmt3(); } for i in 1..n do stmt4(); .. BLOCK-test-chapelpost writeln(); .. BLOCK-test-chapeloutput 123444123444 because the expression evaluated to determine whether to serialize always evaluates to true. .. _Yield_Task_Execution: Yielding Task Execution ----------------------- Execution of the current task can be explicitly yielded with ``currentTask.yieldExecution()``, providing an opportunity for other tasks to execute.