Module: CommDiagnostics

This module provides support for reporting and counting communication operations between network-connected locales. The operations include various kinds of remote reads (GETs), remote writes (PUTs), and remote forks (which are actually more like remote procedure calls, and are used to implement on statements). Callers can request on-the-fly output each time a remote operation occurs, or count such operations as they occur and retrieve the counts later. The former gives more detailed information but has much more overhead. The latter has much less overhead but only provides aggregate information.

On-the-fly Reporting

All forms of communication reporting and counting are done between pairs of function calls that turn it on and off. On-the-fly reporting across all locales is done like this:

startVerboseComm();
// between start/stop calls, report comm ops initiated on any locale
stopVerboseComm();

On-the-fly reporting for just the calling locale is similar. Only the procedure names change:

startVerboseCommHere();
// between start/stop calls, report comm ops initiated on this locale
stopVerboseCommHere();

In either case, the output produced consists of a line written to stdout for each communication operation. (Here stdout means the file associated with the process, not the Chapel channel with the same name.)

Consider this little example program:

use CommDiagnostics;
proc main() {
  startVerboseComm();
  var x: int = 1;
  on Locales(1) {     // should fork a blocking task onto locale 1
    x = x + 1;        // should invoke a remote get and a remote put
  }
  stopVerboseComm();
}

Executing this on two locales with the -nl 2 command line option results in the following output:

0: remote task created on 1
1: t.chpl:6: remote get from 0
1: t.chpl:6: remote put to 0

The initial number refers to the locale reporting the communication event. The file name and line number point to the place in the code that triggered the communication event. (For remote forks, file name and line number information is not yet reported.)
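The per-locale variants can be used the same way from inside an on statement. The following is a minimal sketch, assuming two locales (-nl 2) and reusing the variable x from the example above. With this placement, only communication initiated on locale 1 between the two calls should be reported, so the fork from locale 0, which happens before reporting is turned on, would not be expected to appear. The exact output is not shown here because it depends on the comm layer in use.

```chapel
use CommDiagnostics;
proc main() {
  var x: int = 1;
  on Locales(1) {
    // Turn reporting on for the current locale (locale 1) only.
    startVerboseCommHere();
    x = x + 1;        // x lives on locale 0, so this access is remote
    stopVerboseCommHere();
  }
}
```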

Counting Communication Operations

Counting communication operations requires a few more calls than just reporting them does. In particular, the counts have to be retrieved after they are collected and, if the counters have been used previously, they have to be reset before counting is turned on. Counting across all locales is done like this:

// (optional) if we counted previously, reset the counters to zero
resetCommDiagnostics();
startCommDiagnostics();
// between start/stop calls, count comm ops initiated on any locale
stopCommDiagnostics();
// retrieve the counts and report the results
writeln(getCommDiagnostics());

Counting on just the calling locale is similar. Just as for on-the-fly reporting, only the procedure names change:

// (optional) if we counted previously, reset the counters to zero
resetCommDiagnosticsHere();
startCommDiagnosticsHere();
// between start/stop calls, count comm ops initiated on this locale
stopCommDiagnosticsHere();
// retrieve the counts and report the results
writeln(getCommDiagnosticsHere());

The optional call to reset the counters is only needed when a program collects counts more than once. In this case, the counters have to be set back to zero before starting the second and succeeding counting periods. By far the most common situation is that programs only collect communication counts once per run, in which case this step is not needed.
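As an illustration of the multi-period case, here is a minimal sketch of a program that counts two separate phases and resets the counters between them. The phase labels and the statements inside each phase are made up for this example; only the CommDiagnostics calls come from this module. It assumes the program is run on two locales (-nl 2).

```chapel
use CommDiagnostics;
proc main() {
  var x: int = 1;

  // First counting period: counters start at zero, so no reset is needed.
  startCommDiagnostics();
  on Locales(1) do x = x + 1;
  stopCommDiagnostics();
  writeln("phase 1: ", getCommDiagnostics());

  // Zero the counters so phase 2 does not include phase 1's counts.
  resetCommDiagnostics();

  // Second counting period.
  startCommDiagnostics();
  on Locales(1) do x = x + 2;
  stopCommDiagnostics();
  writeln("phase 2: ", getCommDiagnostics());
}
```

Without the call to resetCommDiagnostics(), the counts printed for the second phase would also include the operations from the first.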

Note that the same internal mechanisms and counters are used for counting on all locales and counting on just the calling locale, so trying to do both at once may lead to surprising turn-on/turn-off behavior and/or incorrect results.
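One way to stay clear of that interaction is to use only one family of calls within a given counting period. The sketch below, which assumes two locales (-nl 2), counts on locale 1 alone using just the "Here" variants. The placement inside the on statement is an illustrative choice, not something this module requires.

```chapel
use CommDiagnostics;
proc main() {
  var x: int = 1;
  on Locales(1) {
    // Only the per-locale calls are used in this counting period.
    resetCommDiagnosticsHere();
    startCommDiagnosticsHere();
    x = x + 1;        // x lives on locale 0, so this access is remote
    stopCommDiagnosticsHere();
    // here.id is the id of the current locale (1 in this case).
    writeln("locale ", here.id, ": ", getCommDiagnosticsHere());
  }
}
```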

Consider this little example program:

use CommDiagnostics;
proc main() {
  startCommDiagnostics();
  var x: int = 1;
  on Locales(1) {     // should fork a blocking task onto locale 1
    x = x + 1;        // should invoke a remote get and a remote put
  }
  stopCommDiagnostics();
  writeln(getCommDiagnostics());
}

Executing this on two locales with the -nl 2 command line option results in the following output:

(get = 0, get_nb = 0, put = 0, put_nb = 0, test_nb = 0, wait_nb = 0, try_nb = 0, fork = 1, fork_fast = 0, fork_nb = 0) (get = 1, get_nb = 0, put = 1, put_nb = 0, test_nb = 0, wait_nb = 0, try_nb = 0, fork = 0, fork_fast = 0, fork_nb = 0)

The first parenthesized group contains the counts for locale 0, and the second contains the counts for locale 1. Each locale's counts cover the operations that locale initiated. So, for the instrumented section of this program, locale 0 initiated one blocking remote fork (the on statement), and locale 1 initiated one blocking remote get and one blocking remote put (the read and the write of x).
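Because getCommDiagnostics() returns one record per locale, the individual fields can also be read directly instead of printing the whole array. The following sketch totals a few of the counters across locales. The variable names (counts, totalGets, totalPuts, totalForks) are chosen for this example, and the field names are the ones listed in the record definition in this document.

```chapel
use CommDiagnostics;
proc main() {
  startCommDiagnostics();
  var x: int = 1;
  on Locales(1) {
    x = x + 1;
  }
  stopCommDiagnostics();

  // One commDiagnostics record per locale, indexed by locale id.
  const counts = getCommDiagnostics();

  // Accumulate the blocking get, put, and fork counts over all locales.
  var totalGets, totalPuts, totalForks: uint(64);
  for loc in LocaleSpace {
    totalGets += counts(loc).get;
    totalPuts += counts(loc).put;
    totalForks += counts(loc).fork;
  }
  writeln("gets = ", totalGets, ", puts = ", totalPuts, ", forks = ", totalForks);
}
```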

record chpl_commDiagnostics

Aggregated communication operation counts. This record type is defined in the same way by both the underlying comm layer(s) and this module, because we don’t have a good way to inherit types back and forth between the two. This first definition duplicates the one in the comm layer(s).

var get: uint(64)

blocking GETs, in which initiator waits for completion

var get_nb: uint(64)

non-blocking GETs

var put: uint(64)

blocking PUTs, in which initiator waits for completion

var put_nb: uint(64)

non-blocking PUTs

var test_nb: uint(64)

tests for non-blocking GET/PUT completions

var wait_nb: uint(64)

blocking waits for non-blocking GET/PUT completions

var try_nb: uint(64)

non-blocking waits for non-blocking GET/PUT completions

var fork: uint(64)

blocking remote executions, in which initiator waits for completion

var fork_fast: uint(64)

blocking remote executions performed by the target locale’s Active Message handler

var fork_nb: uint(64)

non-blocking remote executions

type commDiagnostics = chpl_commDiagnostics

The Chapel-level record type is an alias for the comm layer definition.

proc startVerboseComm()

Start on-the-fly reporting of communication initiated on any locale.

proc stopVerboseComm()

Stop on-the-fly reporting of communication initiated on any locale.

proc startVerboseCommHere()

Start on-the-fly reporting of communication initiated on this locale.

proc stopVerboseCommHere()

Stop on-the-fly reporting of communication initiated on this locale.

proc startCommDiagnostics()

Start counting communication operations across the whole program.

proc stopCommDiagnostics()

Stop counting communication operations across the whole program.

proc startCommDiagnosticsHere()

Start counting communication operations initiated on this locale.

proc stopCommDiagnosticsHere()

Stop counting communication operations initiated on this locale.

proc resetCommDiagnostics()

Reset aggregate communication counts across the whole program.

proc resetCommDiagnosticsHere()

Reset aggregate communication counts on the calling locale.

proc getCommDiagnostics()

Retrieve aggregate communication counts for the whole program.

Returns: array of counts of comm ops initiated on each locale
Return type: [LocaleSpace] commDiagnostics

proc getCommDiagnosticsHere()

Retrieve aggregate communication counts for this locale.

Returns: counts of comm ops initiated on this locale
Return type: commDiagnostics