Locales

Chapel provides high-level abstractions that allow programmers to exploit locality by controlling the affinity of both data and tasks to abstract units of processing and storage capabilities called locales. The on-statement allows for the migration of tasks to remote locales.

Throughout this section, the term local will be used to describe the locale on which a task is running, the data located on this locale, and any tasks running on this locale. The term remote will be used to describe another locale, the data on another locale, and the tasks running on another locale.

Locales

A locale is a Chapel abstraction for a piece of a target architecture that has processing and storage capabilities. Generally speaking, the tasks running within a locale have roughly uniform access to values stored in the locale’s local memory and longer latencies for accessing the memories of other locales. As an example, a single shared memory machine would be defined as a single locale. In contrast, a cluster of network-connected multicore nodes would have a locale for each node.

Locale Types

The identifier locale is a type that abstracts a locale as described above. Both data and tasks can be associated with a value of locale type.

The default value for a variable with locale type is Locales[0].
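As a minimal sketch of working with locale values, the following declares one locale variable without an initializer and one copied from the predefined Locales array. The variable names defaultLoc and lastLoc are illustrative only and are not part of the language.

```chapel
// With no initializer, a locale variable takes the default value
// described above.
var defaultLoc: locale;

// Locale values can also be copied from the predefined Locales array.
var lastLoc: locale = Locales[numLocales-1];

writeln("default locale id: ", defaultLoc.id);
writeln("last locale id: ", lastLoc.id);
```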

Locale Methods

The locale type supports the following methods:

proc locale.hostname

Get the hostname of this locale.

Returns

the hostname of the compute node associated with the locale

Return type

string

proc locale.name

Get the name of this locale. In practice, this is often the same as the hostname, though in some cases (like when using local launchers), it may be modified.

Returns

locale name

Return type

string

proc locale.id

Get the unique integer identifier for this locale.

Returns

locale number, in the range 0..numLocales-1

Return type

int

proc locale.maxTaskPar

This is the maximum task concurrency that one can expect to achieve on this locale. The value is an estimate by the runtime tasking layer. Typically it is the number of physical processor cores available to the program. Creating more tasks than this will probably increase walltime rather than decrease it.
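The queries described so far can be combined into a short report, sketched below. It assumes nothing beyond the methods listed above. The printed values depend on the machine and launcher, so no output is shown.

```chapel
// Print basic identifying information for every locale in the program.
for loc in Locales {
  writeln("locale ", loc.id,
          ": name=", loc.name,
          ", hostname=", loc.hostname,
          ", maxTaskPar=", loc.maxTaskPar);
}
```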

proc locale.numPUs(logical: bool = false, accessible: bool = true)

A processing unit or PU is an instance of the processor architecture, basically the thing that executes instructions. locale.numPUs tells how many of these are present on this locale. It can count either physical PUs (commonly known as cores) or hardware threads such as hyperthreads and the like. It can also either take into account any OS limits on which PUs the program has access to or do its best to ignore such limits. By default it returns the number of accessible physical cores.

Arguments
  • logical : bool – Count logical PUs (hyperthreads and the like), or physical ones (cores)? Defaults to false, for cores.

  • accessible : bool – Count only PUs that can be reached, or all of them? Defaults to true, for accessible PUs.

Returns

number of PUs

Return type

int

Several things can cause the OS to limit the processor resources available to a Chapel program. On plain Linux systems, using the taskset(1) command will do it. On Cray systems, the CHPL_LAUNCHER_CORES_PER_LOCALE environment variable may do it indirectly, via the system job launcher. Also on Cray systems, using a system job launcher (aprun or slurm) to run a Chapel program manually may do it, as can running programs within Cray batch jobs that have been set up with limited processor resources.
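The four combinations of the logical and accessible arguments can be compared side by side, as in the sketch below. The constant names are illustrative. The actual counts depend on the hardware and on any OS limits such as those just described.

```chapel
// Defaults: physical cores that the program is allowed to use.
const physAccessible = here.numPUs();

// Hardware threads (e.g. hyperthreads) that the program is allowed to use.
const logicalAccessible = here.numPUs(logical=true);

// The same two counts, this time trying to ignore OS-imposed limits.
const physAll = here.numPUs(accessible=false);
const logicalAll = here.numPUs(logical=true, accessible=false);

writeln("physical/accessible: ", physAccessible);
writeln("logical/accessible:  ", logicalAccessible);
writeln("physical/all:        ", physAll);
writeln("logical/all:         ", logicalAll);
```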

proc locale.callStackSize

callStackSize holds the size of a task stack on a given locale. Thus, here.callStackSize is the size of the call stack for any task on the current locale, including the caller.

proc locale.runningTasks()

Returns

the number of tasks that have begun executing, but have not yet finished

Return type

int

Note that this number can exceed the number of non-idle threads because there are cases in which a thread is working on more than one task. As one example, in fifo tasking, when a parent task creates child tasks to execute the iterations of a coforall construct, the thread the parent is running on may temporarily suspend executing the parent task in order to help with the child tasks, until the construct completes. When this occurs the count of running tasks can include both the parent task and a child, although strictly speaking only the child is executing instructions.

As another example, any tasking implementation in which threads can switch from running one task to running another, such as qthreads, can have more tasks running than threads on which to run them.
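A small sketch of sampling the running-task count from inside a coforall follows. The array name counts and the choice of four tasks are arbitrary. The values observed depend on the tasking layer and on timing, so none are claimed here.

```chapel
// Each child task records the number of running tasks it observes.
var counts: [1..4] int;

writeln("before coforall: ", here.runningTasks());

coforall tid in 1..4 with (ref counts) do
  counts[tid] = here.runningTasks();

// The counts may include the suspended parent task, as noted above.
writeln("observed inside coforall: ", counts);
```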

The Predefined Locales Array

Chapel provides a predefined environment that stores information about the locales used during program execution. This execution environment contains definitions for the array of locales on which the program is executing (Locales), a domain for that array (LocaleSpace), and the number of locales (numLocales).

config const numLocales: int;
const LocaleSpace: domain(1) = {0..numLocales-1};
const Locales: [LocaleSpace] locale;

When a Chapel program starts, a single task executes main on Locales(0).
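The predefined names can be inspected directly. The sketch below only reads numLocales, LocaleSpace, and Locales. Its output depends on how many locales the program is launched with.

```chapel
writeln("running on ", numLocales, " locale(s); LocaleSpace = ", LocaleSpace);

// Walk the predefined array by index.
for i in LocaleSpace do
  writeln("Locales[", i, "] has id ", Locales[i].id);
```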

Note that the Locales array is typically defined such that distinct elements refer to distinct resources on the target parallel architecture. In particular, the Locales array itself should not be used in an oversubscribed manner in which a single processor resource is represented by multiple locale values (except during development). Oversubscription should instead be handled by creating an aggregate of locale values and referring to it in place of the Locales array.

Rationale.

This design choice encourages clarity in the program’s source text and enables more opportunities for optimization.

For development purposes, oversubscription is still very useful and this should be supported by Chapel implementations to allow development on smaller machines.

Example.

The code

const MyLocales: [0..numLocales*4-1] locale
               = [loc in 0..numLocales*4-1] Locales(loc%numLocales);
on MyLocales[i] ...

defines a new array MyLocales that is four times the size of the Locales array. Each locale is added to the MyLocales array four times in a round-robin fashion.
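A complete, hedged version of this oversubscription pattern might look as follows. The index range is written so that the array holds exactly 4*numLocales entries. The coforall loop is an assumed usage and is not part of the original fragment.

```chapel
// Four slots per locale, assigned round-robin.
const MyLocales: [0..numLocales*4-1] locale
               = [i in 0..numLocales*4-1] Locales[i % numLocales];

// One task per slot; several slots map to the same underlying locale.
coforall i in MyLocales.domain do
  on MyLocales[i] do
    writeln("slot ", i, " runs on locale ", here.id);
```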

The here Locale

A predefined constant locale here can be used anywhere in a Chapel program. It refers to the locale that the current task is running on.

Example.

The code

on Locales(1) {
  writeln(here.id);
}

results in the output 1 because the writeln statement is executed on locale 1.

The identifier here is not a keyword and can be overridden.

Querying the Locale of an Expression

The locale associated with an expression (where the expression is stored) is queried using the following syntax:

locale-query-expression:
  expression . 'locale'

When the expression is a class instance, the access returns the locale on which the object exists rather than the locale of the reference to it. If the expression is a value, it is considered local. The implementation may warn about this behavior. If the expression is a locale, it is returned directly.

Example.

Given a class C and a record R, the code

on Locales(1) {
  var x: int;
  var c: C?;
  var r: R;
  on Locales(2) {
    on Locales(3) {
      c = new C();
      r = new R();
    }
    writeln(x.locale.id);
    writeln(c.locale.id);
    writeln(r.locale.id);
  }
}

results in the output

1
3
1

The variable x is declared and exists on Locales(1). The variable c is a class reference. The reference exists on Locales(1) but the object itself exists on Locales(3). The locale access returns the locale where the object exists. Lastly, the variable r is a record and has value semantics. It exists on Locales(1) even though it is assigned a value on a remote locale.
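The same behavior can be sketched in a form that also runs on a single locale. The class and record bodies, the use of an unmanaged class, and the name target are assumptions made for illustration. The last locale stands in for Locales(3).

```chapel
class C { var v: int; }
record R { var v: int; }

var x: int;
var r: R;
var c: unmanaged C?;            // nilable, so it can start out empty

const target = Locales[numLocales-1];
on target {
  c = new unmanaged C(1);       // the object is allocated on target
  r = new R(2);                 // the value is copied back into r
}

const xLoc = x.locale.id;       // where the variable x is stored
const cLoc = c!.locale.id;      // where the object is stored
const rLoc = r.locale.id;       // where the record r is stored

writeln("x: ", xLoc, ", c: ", cLoc, ", r: ", rLoc);

delete c;                       // unmanaged objects are freed explicitly
```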

Module-scope constants that are not distributed in nature are replicated across all locales.

Example.

The code

const c = 10;
for loc in Locales do on loc do
    writeln(c.locale.id);

outputs

0
1
2
3
4

when running on 5 locales.

The On Statement

The on statement controls on which locale a block of code should be executed or data should be placed. The syntax of the on statement is given by

on-statement:
  'on' expression 'do' statement
  'on' expression block-statement

The locale of the expression is automatically queried as described in Querying the Locale of an Expression. Execution of the statement occurs on this specified locale and then continues after the on-statement.
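Because the locale of the expression is queried automatically, an on statement can be driven by data rather than by an explicit locale value. The sketch below uses a hypothetical Counter class. The object is created on the locale where the program starts, and the inner on statement returns there from the last locale.

```chapel
class Counter { var n: int; }

// Allocated on the locale where the program starts (locale 0).
const counter = new unmanaged Counter();
var ranOn = -1;

on Locales[numLocales-1] {
  // Move back to whichever locale stores the object.
  on counter {
    counter.n += 1;
    ranOn = here.id;
  }
}

const finalCount = counter.n;
writeln("increment ran on locale ", ranOn, "; count = ", finalCount);

delete counter;
```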

Return statements may not be lexically enclosed in on statements. Yield statements may only be lexically enclosed in on statements in parallel iterators (see Parallel Iterators).
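Given the restriction on return statements, one possible pattern is to store the result in a variable declared outside the on statement and return it afterwards. The procedure name idOfLastLocale is hypothetical.

```chapel
proc idOfLastLocale(): int {
  var result: int;
  on Locales[numLocales-1] {
    // Assigns to the variable that lives on the calling locale.
    result = here.id;
  }
  return result;                // the return sits outside the on statement
}

writeln(idOfLastLocale());
```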

One common code idiom in Chapel is the following, which spreads parallel tasks across the network-connected locales upon which the program is running:

coforall loc in Locales { on loc { ... } }
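A filled-in sketch of this idiom is shown below. The array ids is an assumed name. It is stored on the locale where the program starts, so each task performs a write back to that locale.

```chapel
// One slot per locale.
var ids: [LocaleSpace] int;

coforall loc in Locales with (ref ids) {
  on loc {
    // Runs on loc; here now refers to that locale.
    ids[here.id] = here.id;
  }
}

writeln(ids);
```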

Remote Variable Declarations

By default, new variables and data objects are created on the locale where the task is running. A variable can instead be placed on a particular locale, while remaining in scope after its declaration, by prefixing the declaration with an on clause. The syntax resembles that of the on statement but omits the do keyword and braces. It is given by:

remote-variable-declaration-statement:
  'on' expression variable-declaration-statement

Note

Support for this syntax is not yet implemented.