Runtime Support for Atomics

The following information is meant to describe the underlying runtime support for Chapel's Atomic Variables.

For more information on Atomic Variables refer to the Chapel Language Specification, or for a list of available functions on Atomics see Atomics

For code examples using atomics, see the atomics.chpl primer.

Overview

Atomic variables are a built-in type that support predefined atomic operations. Currently, Chapel supports processor-provided atomic operations on bool, as well as all sizes of int, uint, and real for most backend compilers (see Setting up Your Environment for Chapel for the current list of supported compilers.) Initial support for network-provided atomic operations is also available. See the platform-specific documentation to check if network-based atomics are available for your platform

The choice of supported atomic variable types as well as the atomic operations were strongly influenced by the C11 standard. A notable exception is that Chapel supports atomic fetch-and-add/fetch-and-subtract operations on real types as well.

The specific implementation of atomics can be selected via the environment variable CHPL_ATOMICS. Similar to the other Chapel environment variables, an appropriate default is chosen when not specified, and not all implementations are available for all settings. Chapel currently supports three atomics implementations: cstdlib, intrinsics and locks. This environment variable also specifies the atomic implementation used by the Chapel runtime.

If compiler support for atomics is available, the atomic operations will be mapped down the appropriate compiler intrinsics which often map directly to processor atomics. If intrinsics are not available, the atomic implementation defaults to using locks in the form of Chapel's sync vars. As a result the locks implementation will be slower than the intrinsic implementation. Since Chapel's atomics were modeled after the C11 edition of the C standard, the cstdlib implementation is just a wrapper around C standard atomics. As C11 support becomes more prevalent and reliable, cstdlib will become the default in some configurations.

Currently, unless using network atomics, all remote atomic operations will result in the calling task effectively migrating to the locale on which the atomic variable was allocated and performing the atomic operations locally.

If supported, the network atomics implementation can be selected via the environment variable CHPL_NETWORK_ATOMICS. If set, all variables declared to be atomic will use the specified network's atomic operations. It is possible to override this default by using the undocumented internal function chpl__processorAtomicType() defined in $CHPL_HOME/modules/internal/Atomics.chpl. Over time we will add a more principled way for explicitly requesting processor atomics, and this function may disappear.

For more information about the runtime implementation see $CHPL_HOME/runtime/include/atomics/README.

Memory Order Notes

As mentioned in the spec, most atomic operations optionally take a memory order. However, for the intrinsics and locks implementations, this argument is ignored. The resulting effect is that all atomic operations are performed with memory_order_seq_cst (sequentially consistent) regardless of the actual order specified. The reason for this is because the compiler intrinsics used in the runtime have no way to specify memory order.

The cstdlib implementation uses the specified memory order.

Variances from the C standard

While Chapel atomics are modeled after the C standard there are some notable differences. The primary one is that Chapel supports fetch-and-add/fetch-and-subtract operations for real types. It should be noted that since there is virtually no hardware support for floating point atomics, our implementation is not very efficient.

As noted in the spec there a few additional methods in Chapel that are not in C11. They are peek, poke, and waitFor. peek and poke are supposed to be relaxed versions of read and write that allow users to perform reads and writes with more relaxed memory constraints. Currently they are implemented as reads and writes with memory_order_relaxed. waitFor is a method that waits until an atomic object has a specific value. It can yield to other tasks while waiting.

Chapel currently does not support the memory fences or the isLockFree method from the C11 spec. They are defined in the runtime but not in the modules. The primary reason that isLockFree is not available is that it may not be accurate for the intrinsics. Without examining each intrinsic operation for each compiler it is hard to know if they actually map down to lock free operations. threadFence and signalFence are also in the runtime but not in the modules. The primary reason for this is that there is no need for them with the intrinsics or locks implementations, where all our operations use memory_order_seq_cst. They will be added for use with the cstdlib implementation. The fences are used with other memory_orders to allow you to create safe programs when atomic operations are using non sequential memory orders.

Open issues

  • Atomic bools are only supported for the default size and not implemented for all sizes of bools.
  • The memory_order is currently ignored by the intrinsics and locks implementations.
  • The threadFence and signalFence methods need to be made available for use with nonsequential memory orders.

Additional References

  • See the section titled "Memory Consistency Model" in the Chapel Language Specification for more information on memory orders and Chapel's memory consistency model.