GPU
Usage
use GPU;
or
import GPU;
Provides utility functions for working with GPUs.
Warning
This module is unstable and its interface is subject to change in the future.
GPU support is a relatively new feature to Chapel and is under active development.
For the most up-to-date information about GPU support see the technical note about it.
- proc gpuWrite(const args ...?k)
This function is intended to be called from within a GPU kernel and is useful for debugging.
Currently, using write to send output to stdout will make a loop ineligible for GPU execution; use gpuWrite instead.
Currently, this function only works if values of type c_string are passed.
On NVIDIA GPUs, the written values are flushed to the terminal after the kernel has finished executing. Note that the buffer holding this output is limited to 1 MB.
- proc assertOnGpu()
Halts execution at runtime if called from outside a GPU. If used as the first line of a foreach or forall loop, it also performs a compile-time check that the loop is eligible for execution on a GPU.
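As a minimal sketch of where assertOnGpu is placed, the loop below runs on the first GPU sublocale. It assumes a GPU-enabled Chapel build with at least one GPU available; here.gpus comes from Chapel's GPU locale model as described in the technical note, and the names n and A are illustrative rather than part of this module.

```chapel
use GPU;

config const n = 16;       // illustrative problem size (assumption)

on here.gpus[0] {          // run on the first GPU sublocale
  var A: [0..#n] int;      // array declared while executing on the GPU sublocale
  foreach i in 0..#n {
    assertOnGpu();         // first line of the loop: compile-time eligibility
                           // check, plus a runtime halt if not on a GPU
    A[i] = 2 * i;
  }
  writeln(A);              // printed from host code after the loop completes
}
```

If the loop body contained something that prevents GPU execution, such as a call to write, the compile-time check would be expected to reject it rather than letting the loop silently run on the CPU.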
- proc gpuClock(): uint
Returns the value of a per-multiprocessor counter that increments every clock cycle. This function is meant to be called to time sections of code within a GPU-enabled loop.
- proc gpuClocksPerSec(devNum: int)
Returns the number of clock cycles per second of a GPU multiprocessor. Note: calling this function from within a kernel is not currently supported.
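The two clock procedures can be combined to estimate elapsed time for a section of kernel code. The sketch below is hedged: it assumes a GPU-enabled build, assumes that device number 0 refers to the first GPU, and uses the illustrative names cycles and sink, which are not part of this module. The cycle difference is computed inside the kernel, and the conversion to seconds happens on the host, since gpuClocksPerSec cannot be called from a kernel.

```chapel
use GPU;

on here.gpus[0] {
  var cycles: [0..0] uint;   // elapsed cycle count written by the kernel
  var sink: [0..0] int;      // stores the result so the timed work is observable

  foreach i in 0..0 {        // a single GPU thread does the timing
    const start = gpuClock();
    var acc = 0;
    for j in 1..1000 do acc += j;   // illustrative work being timed
    sink[i] = acc;
    const stop = gpuClock();
    cycles[i] = stop - start;
  }

  // Host side: convert cycles to seconds (devNum 0 is an assumption).
  const seconds = cycles[0]:real / gpuClocksPerSec(0):real;
  writeln("elapsed: ", seconds, " s");
}
```

Because the counter is per-multiprocessor, start and stop values are only meaningful when taken by the same thread, which is why the difference is computed inside the loop body.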
- proc syncThreads()
Synchronize threads within a GPU block.
Allocate block shared memory, enough to store size elements of eltType. Returns a CTypes.c_ptr to the allocated array. Note that although every thread in a block calls this procedure, the same shared array is returned to all of them.
- Arguments
eltType – the type of elements to allocate the array for.
size – the number of elements in each GPU thread block’s copy of the array.
- proc setBlockSize(blockSize: int)
Set the block size for kernels launched on the GPU.
- proc gpuAtomicAdd(ref x: ?T, val: T): void
When run on a GPU, atomically add ‘val’ to ‘x’ (result is stored in ‘x’).
- proc gpuAtomicSub(ref x: ?T, val: T): void
When run on a GPU, atomically subtract ‘val’ from ‘x’ (result is stored in ‘x’).
- proc gpuAtomicMin(ref x: ?T, val: T): void
When run on a GPU, atomically compare ‘x’ and ‘val’ and store the minimum in ‘x’.
- proc gpuAtomicMax(ref x: ?T, val: T): void
When run on a GPU, atomically compare ‘x’ and ‘val’ and store the maximum in ‘x’.
- proc gpuAtomicInc(ref x: ?T, val: T): void
When run on a GPU, atomically increment ‘x’ if the original value of ‘x’ is greater than or equal to ‘val’; if so, the result is stored in ‘x’.
- proc gpuAtomicDec(ref x: ?T, val: T): void
When run on a GPU, atomically determine if ‘x’ equals 0 or is greater than ‘val’. If so, store ‘val’ in ‘x’; otherwise, decrement ‘x’ by 1.
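The update rule described for gpuAtomicDec behaves like a counter that wraps back to ‘val’. To make that concrete, here is a sequential, CPU-only model of the rule. It is only an illustration of the description above: decModel is a hypothetical helper, not part of this module, and it is neither atomic nor executed on a GPU.

```chapel
// Sequential model of the update rule described for gpuAtomicDec.
// decModel is a hypothetical helper for illustration only.
proc decModel(ref x: uint(32), val: uint(32)) {
  if x == 0 || x > val then
    x = val;     // x is 0 or out of range: reset to val
  else
    x -= 1;      // otherwise: ordinary decrement
}

var x: uint(32) = 2;
const limit: uint(32) = 2;

for step in 1..4 {
  decModel(x, limit);
  writeln(x);    // prints 1, 0, 2, 1 on successive lines
}
```

Starting from 2 with a limit of 2, the value counts down to 0 and then wraps back to the limit, which is the pattern one would use for a cyclic index.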
- proc gpuAtomicAnd(ref x: ?T, val: T): void
When run on a GPU, atomically perform a bitwise ‘and’ operation on ‘x’ and ‘val’ and store the result in ‘x’.
- proc gpuAtomicOr(ref x: ?T, val: T): void
When run on a GPU, atomically perform a bitwise ‘or’ operation on ‘x’ and ‘val’ and store the result in ‘x’.
- proc gpuAtomicXor(ref x: ?T, val: T): void
When run on a GPU, atomically perform a bitwise ‘xor’ operation on ‘x’ and ‘val’ and store the result in ‘x’.
- proc gpuAtomicCAS(ref x: ?T, cmp: T, val: T): void
When run on a GPU, atomically compare the values of ‘x’ and ‘cmp’; if they are equal, store ‘val’ in ‘x’.
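The atomic procedures above share the same calling pattern: a reference to the shared location followed by the operand. The sketch below shows two of them updating single-element arrays from many GPU threads. It assumes a GPU-enabled build; the choice of uint(32) is an assumption about which element types the atomics accept, and the names n, count, and largest are illustrative.

```chapel
use GPU;

config const n = 1000;            // illustrative iteration count (assumption)

on here.gpus[0] {
  var count: [0..0] uint(32);     // shared counter
  var largest: [0..0] uint(32);   // shared running maximum

  foreach i in 1..n {
    gpuAtomicAdd(count[0], 1:uint(32));     // every iteration adds 1
    gpuAtomicMax(largest[0], i:uint(32));   // keep the largest index seen
  }

  writeln(count[0], " ", largest[0]);
}
```

Without the atomic procedures, concurrent updates to count[0] and largest[0] from different threads would race, so plain assignments in the loop body would not be a safe substitute.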