BufferedAtomics

Usage

use BufferedAtomics;

This module provides buffered versions of non-fetching atomic operations for all int, uint, and real types. Buffered versions of add(), sub(), or(), and(), and xor() are provided. The operations are buffered internally, and the buffers are flushed implicitly when full or explicitly with flushAtomicBuff(). Buffering can provide a significant speedup for bulk atomic operations that do not require strict ordering:

use BufferedAtomics;

const numTasksPerLocale = here.maxTaskPar,
      iters = 10000;


var a: atomic int;

coforall loc in Locales do on loc do
  coforall 1..numTasksPerLocale do
    for i in 1..iters do
      a.addBuff(i);                   // buffered atomic add

flushAtomicBuff();                    // flush any pending operations (required)


const itersSum = iters*(iters+1)/2,   // sum from 1..iters
      numTasks = numLocales * numTasksPerLocale;
assert(a.read() == numTasks * itersSum);

Buffered atomic operations are not consistent with regular atomic operations: an update may not be visible until the buffers are explicitly flushed with flushAtomicBuff().

var a: atomic int;
a.addBuff(1);
writeln(a);        // can print 0 or 1
flushAtomicBuff();
writeln(a);        // prints 1

In general, buffered atomics are useful when you have a large batch of atomic updates to perform and the order of those operations does not matter.
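A typical order-independent batch is a histogram-style count. The following is a minimal sketch, assuming a small array of atomic counters that every locale updates. The names counts, numBuckets, and numUpdates are illustrative and not part of this module.

```chapel
use BufferedAtomics;

// Hypothetical problem size; both names are illustrative only.
config const numBuckets = 8,
             numUpdates = 1000;

// One atomic counter per bucket, stored on the locale running main.
var counts: [0..#numBuckets] atomic int;

// Every locale issues numUpdates buffered increments. The order in which
// they are applied does not affect the final counts.
coforall loc in Locales do on loc do
  for i in 0..#numUpdates do
    counts[i % numBuckets].addBuff(1);

// Pending operations are only guaranteed to be visible after a flush.
flushAtomicBuff();

var total = 0;
for c in counts do
  total += c.read();
assert(total == numLocales * numUpdates);
```

Only the final totals are read, and only after the flush, so it does not matter when each individual increment is applied.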

Note

Currently, these operations are only optimized for CHPL_NETWORK_ATOMICS=ugni. Processor atomics and any other implementation fall back to non-buffered operations. Under ugni, the operations are buffered internally and, when the buffers are flushed, performed all at once. Cray Linux Environment (CLE) 5.2.UP04 or newer is required for best performance. In our experience, buffered atomics can achieve up to a 5X performance improvement over non-buffered atomics on CLE 5.2.UP04 or newer, and up to a 2.5X improvement on older versions of CLE.

proc AtomicT.addBuff(value: T): void

Buffered atomic add.

proc AtomicT.subBuff(value: T): void

Buffered atomic sub.

proc AtomicT.orBuff(value: T): void

Buffered atomic or.

proc AtomicT.andBuff(value: T): void

Buffered atomic and.

proc AtomicT.xorBuff(value: T): void

Buffered atomic xor.
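The bitwise variants follow the same pattern as addBuff(). As a minimal sketch, assuming a single shared atomic uint used as a bit set (the name flags and the one-bit-per-locale scheme are illustrative only), each locale can set its own bit with orBuff():

```chapel
use BufferedAtomics;

// Hypothetical shared bit set; bit (id % 64) marks that a locale checked in.
var flags: atomic uint;

coforall loc in Locales do on loc do
  flags.orBuff((1:uint) << (here.id % 64));   // buffered atomic or

// The bits are only guaranteed to be visible after the flush.
flushAtomicBuff();

for loc in Locales do
  assert((flags.read() & ((1:uint) << (loc.id % 64))) != 0);
```

Setting a bit is idempotent and order-independent, so it suits buffering. A read of flags before the flush might not yet see every bit.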

proc flushAtomicBuff(): void

Flush any atomic operations that are still buffered. Note that this flushes any pending operations on all locales, not just the current locale.