UnorderedAtomics

Usage

use UnorderedAtomics;

or

import UnorderedAtomics;

Support for unordered non-fetching atomic operations.

Warning

This module represents work in progress. The API is unstable and likely to change over time.

This module provides unordered versions of the non-fetching atomic operations add(), sub(), or(), and(), and xor() for all int, uint, and real types. The results of these operations are not guaranteed to be visible until the issuing task or forall terminates, or until an explicit unorderedAtomicTaskFence(), but they can provide a significant speedup for bulk atomic operations that do not require ordering:

use UnorderedAtomics;

const numTasksPerLocale = here.maxTaskPar,
      iters = 10000;

var a: atomic int;

coforall loc in Locales do on loc do
  coforall 1..numTasksPerLocale do
    for i in 1..iters do
      a.unorderedAdd(i); // unordered atomic add

// no fence required, fenced at task termination

const itersSum = iters*(iters+1)/2, // sum from 1..iters
      numTasks = numLocales * numTasksPerLocale;
assert(a.read() == numTasks * itersSum);

Unordered atomic operations are not consistent with regular atomic operations, and their updates may not be visible until the task or forall that issued them terminates or until they are explicitly fenced with unorderedAtomicTaskFence():

use UnorderedAtomics;

var a: atomic int;
a.unorderedAdd(1);
writeln(a);        // can print 0 or 1
unorderedAtomicTaskFence();
writeln(a);        // prints 1

They are generally useful when you have a large batch of atomic updates to perform and the order of those updates does not matter.
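A histogram is one such batch workload: every update is an increment, and increments commute. The following is a minimal sketch rather than a tuned implementation; the names hist, numBuckets, and numUpdates and the bucket assignment i % numBuckets are illustrative assumptions, not part of this module.

```chapel
use UnorderedAtomics;

// Hypothetical sizes, chosen only for illustration.
config const numBuckets = 8,
             numUpdates = 1000;

// One atomic counter per bucket.
var hist: [0..#numBuckets] atomic int;

// Each iteration bumps one bucket. The increments commute, so the
// order in which they are applied does not affect the final counts.
forall i in 0..#numUpdates do
  hist[i % numBuckets].unorderedAdd(1);

// No explicit fence: the updates are fenced when the forall terminates.
var total = 0;
for h in hist do
  total += h.read();
assert(total == numUpdates);
```

Because no iteration reads a bucket while the loop is running, nothing in the loop depends on when an individual increment becomes visible, which is the situation unordered operations are meant for.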

Note

Currently, these operations are only optimized for CHPL_NETWORK_ATOMICS=ugni. Processor atomics and all other implementations fall back to ordered operations. Under ugni, these operations are buffered internally; when the buffers are flushed, the operations are performed all at once. Cray Linux Environment (CLE) 5.2.UP04 or newer is required for best performance. In our experience, unordered atomics can achieve up to a 5x performance improvement over ordered atomics on CLE 5.2.UP04 or newer.

proc ref AtomicT.unorderedAdd(val: valType): void

Unordered atomic add.

proc ref AtomicT.unorderedSub(val: valType): void

Unordered atomic sub.

proc ref AtomicT.unorderedOr(val: valType): void

Unordered atomic or.

proc ref AtomicT.unorderedAnd(val: valType): void

Unordered atomic and.

proc ref AtomicT.unorderedXor(val: valType): void

Unordered atomic xor.

proc unorderedAtomicTaskFence(): void

Fence any pending unordered atomics issued by the current task.
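The sketch below exercises each of the five operations and then fences them from within a single task. It is an illustration under stated assumptions: the variable names and bit patterns are made up, and the operations issued to any one variable are chosen so that they commute, since no ordering among them should be assumed.

```chapel
use UnorderedAtomics;

var counter: atomic int;
var mask: atomic uint;
var bits: atomic uint;
var toggles: atomic uint;

// Ordinary (ordered) write, so the and() below has bits to clear.
bits.write(0b1111: uint);

// The operations issued to each variable commute with one another,
// so the final values do not depend on the order they are applied in.
counter.unorderedAdd(5);
counter.unorderedSub(2);
mask.unorderedOr(0b0110: uint);
mask.unorderedOr(0b0011: uint);
bits.unorderedAnd(0b0011: uint);
toggles.unorderedXor(0b1010: uint);
toggles.unorderedXor(0b0110: uint);

// Before this fence, reads may or may not observe the updates above.
unorderedAtomicTaskFence();

// After the fence, the updates issued by this task are visible.
assert(counter.read() == 3);
assert(mask.read() == 0b0111: uint);
assert(bits.read() == 0b0011: uint);
assert(toggles.read() == 0b1100: uint);
```

Mixing non-commuting operations on the same variable, such as an unorderedOr() followed by an unorderedXor(), would make the result depend on an order these operations do not promise.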