Locales¶
Chapel provides high-level abstractions that allow programmers to exploit locality by controlling the affinity of both data and tasks to abstract units of processing and storage capabilities called locales. The on-statement allows for the migration of tasks to remote locales.
Throughout this section, the term local will be used to describe the locale on which a task is running, the data located on this locale, and any tasks running on this locale. The term remote will be used to describe another locale, the data on another locale, and the tasks running on another locale.
Locales¶
A locale is a Chapel abstraction for a piece of a target architecture that has processing and storage capabilities. Generally speaking, the tasks running within a locale have roughly uniform access to values stored in the locale’s local memory and longer latencies for accessing the memories of other locales. As an example, a single shared memory machine would be defined as a single locale. In contrast, a cluster of network-connected multicore nodes would have a locale for each node.
Locale Types¶
The identifier locale is a type that abstracts a locale as described above. Both data and tasks can be associated with a value of locale type. The default value for a variable with locale type is Locales[0].
Locale Methods¶
The locale type supports the following methods:
- proc locale.hostname¶
Get the hostname of this locale.
- Returns
the hostname of the compute node associated with the locale
- Return type
string
- proc locale.name¶
Get the name of this locale. In practice, this is often the same as the hostname, though in some cases (like when using local launchers), it may be modified.
- Returns
locale name
- Return type
string
- proc locale.id¶
Get the unique integer identifier for this locale.
- Returns
locale number, in the range
0..numLocales-1
- Return type
int
- proc locale.maxTaskPar¶
This is the maximum task concurrency that one can expect to achieve on this locale. The value is an estimate by the runtime tasking layer. Typically it is the number of physical processor cores available to the program. Creating more tasks than this will probably increase walltime rather than decrease it.
- proc locale.numPUs(logical: bool = false, accessible: bool = true)¶
A processing unit, or PU, is an instance of the processor architecture: the thing that executes instructions. locale.numPUs tells how many of these are present on this locale. It can count either physical PUs (commonly known as cores) or hardware threads such as hyperthreads and the like. It can also either take into account any OS limits on which PUs the program has access to or do its best to ignore such limits. By default it returns the number of accessible physical cores.
- Arguments
logical : bool – Count logical PUs (hyperthreads and the like), or physical ones (cores)? Defaults to false, for cores.
accessible : bool – Count only PUs that can be reached, or all of them? Defaults to true, for accessible PUs.
- Returns
number of PUs
- Return type
int
Several things can cause the OS to limit the processor resources available to a Chapel program. On plain Linux systems, using the taskset(1) command will do it. On Cray systems, the CHPL_LAUNCHER_CORES_PER_LOCALE environment variable may do it indirectly, via the system job launcher. Also on Cray systems, using a system job launcher (aprun or slurm) to run a Chapel program manually may do it, as can running programs within Cray batch jobs that have been set up with limited processor resources.
- proc locale.callStackSize¶
callStackSize holds the size of a task stack on a given locale. Thus, here.callStackSize is the size of the call stack for any task on the current locale, including the caller.
- proc locale.runningTasks()¶
- Returns
the number of tasks that have begun executing, but have not yet finished
- Return type
int
Note that this number can exceed the number of non-idle threads because there are cases in which a thread is working on more than one task. As one example, in fifo tasking, when a parent task creates child tasks to execute the iterations of a coforall construct, the thread the parent is running on may temporarily suspend executing the parent task in order to help with the child tasks, until the construct completes. When this occurs the count of running tasks can include both the parent task and a child, although strictly speaking only the child is executing instructions.
As another example, any tasking implementation in which threads can switch from running one task to running another, such as qthreads, can have more tasks running than threads on which to run them.
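The methods listed above can be exercised with a short query loop. This is a minimal sketch that uses only the methods documented in this section; the values printed depend entirely on the machine and launcher, so no output is shown.

```chapel
// Print basic information about every locale the program is running on.
// All values are reported by the runtime and vary from system to system.
for loc in Locales {
  writeln("locale ", loc.id,
          ": name=", loc.name,
          ", hostname=", loc.hostname,
          ", maxTaskPar=", loc.maxTaskPar,
          ", physical PUs=", loc.numPUs(),
          ", logical PUs=", loc.numPUs(logical=true));
}

// runningTasks() counts tasks that have started but not yet finished
// on the current locale, including the task making the call.
writeln("tasks running on locale ", here.id, ": ", here.runningTasks());
```

Comparing numPUs() with numPUs(accessible=false) is one way to check whether the OS is restricting the cores available to the program.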
The Predefined Locales Array¶
Chapel provides a predefined environment that stores information about the locales used during program execution. This execution environment contains definitions for the array of locales on which the program is executing (Locales), a domain for that array (LocaleSpace), and the number of locales (numLocales).
config const numLocales: int;
const LocaleSpace: domain(1) = {0..numLocales-1};
const Locales: [LocaleSpace] locale;
When a Chapel program starts, a single task executes main on Locales(0).
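As a small illustration of the predefined names described above, the following sketch walks LocaleSpace and reads the corresponding element of Locales. It assumes, as is the case in practice, that element i of Locales is the locale whose id is i.

```chapel
// numLocales, LocaleSpace, and Locales are predefined; nothing is declared here.
writeln("running on ", numLocales, " locale(s)");

for i in LocaleSpace {
  // Assumption: the locale stored at index i reports id i.
  assert(Locales[i].id == i);
  writeln("Locales[", i, "] is named ", Locales[i].name);
}
```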
Note that the Locales array is typically defined such that distinct elements refer to distinct resources on the target parallel architecture. In particular, the Locales array itself should not be used in an oversubscribed manner in which a single processor resource is represented by multiple locale values (except during development). Oversubscription should instead be handled by creating an aggregate of locale values and referring to it in place of the Locales array.
Rationale.
This design choice encourages clarity in the program’s source text and enables more opportunities for optimization.
For development purposes, oversubscription is still very useful and this should be supported by Chapel implementations to allow development on smaller machines.
Example.
The code
const MyLocales: [0..numLocales*4-1] locale = [loc in 0..numLocales*4-1] Locales(loc%numLocales);
on MyLocales[i] ...
defines a new array MyLocales that is four times the size of the Locales array. Each locale is added to the MyLocales array four times in a round-robin fashion.
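A complete program built on this idea might look like the sketch below. It declares an aggregate with exactly numLocales*4 entries and starts one task per entry; the loop variable name and the printed message are illustrative only, and the order of the output lines is not deterministic.

```chapel
// An aggregate holding each locale four times, assigned round-robin.
const MyLocales: [0..numLocales*4-1] locale =
  [i in 0..numLocales*4-1] Locales[i % numLocales];

// One task per entry; each task runs on the locale stored in its slot.
coforall i in MyLocales.domain do
  on MyLocales[i] do
    writeln("slot ", i, " runs on locale ", here.id);
```

Because the oversubscription lives in MyLocales rather than in Locales, the mapping can be changed in one place without touching the rest of the program.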
The here Locale¶
A predefined constant locale here can be used anywhere in a Chapel program. It refers to the locale that the current task is running on.
Example.
The code
on Locales(1) {
  writeln(here.id);
}
results in the output
1
because the writeln statement is executed on locale 1.
The identifier here is not a keyword and can be overridden.
Querying the Locale of an Expression¶
The locale associated with an expression (where the expression is stored) is queried using the following syntax:
locale-query-expression:
expression . 'locale'
When the expression is a class, the access returns the locale on which the class object exists rather than the reference to the class. If the expression is a value, it is considered local. The implementation may warn about this behavior. If the expression is a locale, it is returned directly.
Example.
Given a class C and a record R, the code
on Locales(1) {
  var x: int;
  var c: C;
  var r: R;
  on Locales(2) {
    on Locales(3) {
      c = new C();
      r = new R();
    }
    writeln(x.locale.id);
    writeln(c.locale.id);
    writeln(r.locale.id);
  }
}
results in the output
1
3
1
The variable x is declared and exists on Locales(1). The variable c is a class reference. The reference exists on Locales(1) but the object itself exists on Locales(3). The locale access returns the locale where the object exists. Lastly, the variable r is a record and has value semantics. It exists on Locales(1) even though it is assigned a value on a remote locale.
Module-scope constants that are not distributed in nature are replicated across all locales.
Example.
The code
const c = 10;
for loc in Locales do
  on loc do
    writeln(c.locale.id);
outputs
0
1
2
3
4
when running on 5 locales.
The On Statement¶
The on statement controls on which locale a block of code should be executed or data should be placed. The syntax of the on statement is given by
on-statement:
'on' expression 'do' statement
'on' expression block-statement
The locale of the expression is automatically queried as described in Querying the Locale of an Expression. Execution of the statement occurs on this specified locale and then continues after the on-statement.
Return statements may not be lexically enclosed in on statements. Yield statements may be lexically enclosed in on statements only within parallel iterators (see Parallel Iterators).
One common code idiom in Chapel is the following, which spreads parallel tasks across the network-connected locales upon which the program is running:
coforall loc in Locales {
  on loc {
    ...
  }
}
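A runnable instance of this idiom, with the elided body filled in for illustration, might look as follows. The array name ids is hypothetical; each task records the id of the locale it is running on into an array that lives on the locale where the program started.

```chapel
// One slot per locale; the array itself is stored on the starting locale.
var ids: [LocaleSpace] int;

coforall loc in Locales {
  on loc {
    // Inside the on statement, `here` refers to loc, so this is a
    // remote write back to the array for every locale but the first.
    ids[loc.id] = here.id;
  }
}

// The coforall waits for all of its tasks, so every slot is filled here.
writeln(ids);
```

Execution continues after the coforall only once every task has finished, which is why the final writeln can read the whole array safely.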
Remote Variable Declarations¶
By default, when new variables and data objects are created, they are created in the locale where the task is running. Variables can be defined within an on-statement to define them on a particular locale such that the scope of the variables is outside the on-statement. This is accomplished using a similar syntax but omitting the do keyword and braces. The syntax is given by:
remote-variable-declaration-statement:
'on' expression variable-declaration-statement
Note
Support for this syntax is not yet implemented.