BLAS

Usage

use BLAS;

or

import BLAS;

Submodules

Support for Basic Linear Algebra Subprograms (BLAS) kernel routines.

BLAS (Basic Linear Algebra Subprograms) are the de facto standard routines for low-level matrix, vector, and scalar operations. While netlib provides the official reference version of BLAS, this documentation refers to the MKL BLAS documentation, due to interface similarities.

This module is intended to work with non-distributed dense rectangular (DefaultRectangular) arrays.

Compiling with BLAS

In order to compile a Chapel program that uses this module, the BLAS and C_BLAS (C wrappers to BLAS) libraries must be installed on the system. The paths to both the cblas.h header file and BLAS library must be passed to the -I and -L compiler arguments. The library name, typically blas, must be passed to the -l argument as well.

The compilation command should look something like this:

chpl -I$PATH_TO_CBLAS_DIR \
     -L$PATH_TO_BLAS_LIBS -lblas source.chpl
BLAS Implementations:

There is a wide range of BLAS implementations available. This module was built and tested with netlib’s C_BLAS, but many other implementations are compatible as well. Using a version of BLAS optimized for the user’s system will yield the best performance.

There are a few known caveats with some popular implementations:

  • ATLAS

    • Compilation (linking) requires the additional flag: -latlas

  • MKL

    • Compilation requires the additional flag in order to use the MKL header instead: --set blasImpl=mkl

  • OpenBLAS

    • Compilation may generate warnings about incompatible pointer types, which may be ignored. These warnings are due to the header files of OpenBLAS differing from the reference C_BLAS prototypes for complex arguments by using float* and double* pointers, instead of void* pointers.

  • Cray LibSci

    • On Cray systems with the cray-libsci module loaded, no compiler flags should be necessary when compiling programs that use BLAS. This is typically loaded by default, but can be manually loaded with module load cray-libsci as well. Chapel programs compiled on Crays utilize the cc wrapper as the backend compiler, which implicitly links against the libsci library. Therefore, no additional steps are required of the user.

Chapel BLAS API

This module provides higher-level wrappers around the BLAS functions. These provide reasonable default values for many less commonly used arguments and determine the appropriate functions to call, based on the array element types, as well as the array dimensions. The other functionality is identical to the corresponding BLAS functions.

The names of these routines are identical to the corresponding BLAS functions, except that the type prefix is dropped. For instance, gemm is the wrapper for the [sdcz]gemm routines.

The native BLAS interface can still be accessed by calling routines from the C_BLAS submodule.

The ldA argument is omitted from the Chapel BLAS API. Chapel determines the dimensions of the matrices from the arrays that are passed in, even when one is passing in a sub-array such that the array elements are not contiguously stored in memory.

enum BlasImpl { blas, mkl, off }

Available BLAS implementations for blasImpl

enum constant blas
enum constant mkl
enum constant off
config param blasImpl = BlasImpl.blas

Specifies which header filename to include, based on the BLAS implementation.

Most BLAS implementations rely on cblas.h, which is used when blasImpl = blas, the default setting.

  • blas includes cblas.h (default)

  • mkl includes mkl_cblas.h

  • off includes nothing

config param blasHeader = ""

Manually specifies the header filename to include. This flag overrides the header determined by blasImpl.

This flag should only be necessary if using an BLAS implementation with a unique header name that is not supported by blasImpl. However, no guarantees can be made about this module working with untested implementations.

proc isBLASType(type t) param : bool

Return true if type is supported by BLAS

enum Order { Row = 101 : c_int, Col }

Define row or column order

enum constant Row = 101 : c_int
enum constant Col
enum Op { N = 111 : c_int, T, H }

Operation of matrix : none, transpose, or adjoint

enum constant N = 111 : c_int
enum constant T
enum constant H
enum Uplo { Upper = 121 : c_int, Lower }

Storage for symmetric matrices

enum constant Upper = 121 : c_int
enum constant Lower
enum Diag { NonUnit = 131 : c_int, Unit }

Assume a unit or non-unit diagonal

enum constant NonUnit = 131 : c_int
enum constant Unit
enum Side { Left = 141 : c_int, Right }

Operate on the left or right side

enum constant Left = 141 : c_int
enum constant Right
proc gemm(A: [?Adom], B: [?Bdom], ref C: [?Cdom], alpha, beta, opA: Op = Op.N, opB: Op = Op.N, order: Order = Order.Row)  where Adom.rank == 2 && Bdom.rank == 2 && Cdom.rank == 2

Wrapper for the GEMM routines:

C := alpha * op(A) * op(B) + beta * C
proc symm(A: [?Adom], B: [?Bdom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, side: Side = Side.Left, order: Order = Order.Row)  where Adom.rank == 2 && Bdom.rank == 2 && Cdom.rank == 2

Wrapper for the SYMM routines:

C := alpha * A * B + beta * C

or:

C := alpha * B * A + beta * C

where A is a symmetric matrix.

proc hemm(A: [?Adom], B: [?Bdom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, side: Side = Side.Left, order: Order = Order.Row)  where Adom.rank == 2 && Bdom.rank == 2 && Cdom.rank == 2

Wrapper for the HEMM routines:

C := alpha * A * B + beta * C

or:

C := alpha * B * A + beta * C

where A is an hermitian matrix.

proc syrk(A: [?Adom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row)  where Adom.rank == 2 && Cdom.rank == 2

Wrapper for the SYRK routines:

C := alpha * A * A**T + beta * C

or:

C := alpha * A**T * A + beta * C

where C is a symmetric matrix, and A**T is the transpose of A.

proc herk(A: [?Adom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row)  where Adom.rank == 2 && Cdom.rank == 2

Wrapper for the HERK routines:

C := alpha * A * A**H + beta * C

or:

C := alpha * A**H * A + beta * C

where C is an hermitian matrix, and A**H is the conjugate transpose of A.

proc syr2k(A: [?Adom], B: [?Bdom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row)  where Adom.rank == 2 && Bdom.rank == 2 && Cdom.rank == 2

Wrapper for the SYR2K routines:

C := alpha * A * B**T + alpha * B * A**T +  beta * C

or:

C := alpha * A**T * B + alpha * B**T * A +  beta * C

where C is a symmetric matrix, and A**T and B**T are the transpose of A and B , respectively.

proc her2k(A: [?Adom], B: [?Bdom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row)  where Adom.rank == 2 && Bdom.rank == 2 && Cdom.rank == 2

Wrapper for the HER2K routines:

C := alpha * A * B**H + conjg(alpha) * B * A**H +  beta * C

or:

C := alpha * A**H * B + conjg(alpha) * B**H * A +  beta * C

where C is an hermitian matrix, B**H is the conjugate transpose of B , conjg(alpha) is the complex conjugate of alpha.

proc trmm(A: [?Adom] ?eltType, ref B: [?Bdom] eltType, alpha, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, side: Side = Side.Left, diag: Diag = Diag.NonUnit, order: Order = Order.Row) throws  where Adom.rank == 2 && Bdom.rank == 2

Wrapper for the TRMM routines:

B := alpha * op(A) * B

or:

B := alpha * B * op(A)

where A is a triangular matrix.

Throws:

IllegalArgumentError – When B is a non-square array.

proc trsm(A: [?Adom], ref B: [?Bdom], alpha, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, side: Side = Side.Left, diag: Diag = Diag.NonUnit, order: Order = Order.Row) throws  where Adom.rank == 2 && Bdom.rank == 2

Wrapper for the TRSM routines:

op(A) * X = alpha * B

or:

X * op(A) = alpha * B

where A is a triangular matrix.

Throws:

IllegalArgumentError – When B is a non-square array.

proc gemv(A: [?Adom] ?eltType, x: [?xdom] eltType, ref y: [?ydom] eltType, alpha, beta, opA: Op = Op.N, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1)  where Adom.rank == 2 && xdom.rank == 1 && ydom.rank == 1

Wrapper for the GEMV routines:

y := alpha*op(A)*x + beta*y,

where A is an m``x``n matrix.

proc ger(ref A: [?Adom] ?eltType, X: [?Xdom] eltType, Y: [?Ydom] eltType, alpha, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1)  where Adom.rank == 2 && Xdom.rank == 1 && Ydom.rank == 1

Wrapper for GER routines:

A := alpha*x*y'+ A
proc gerc(ref A: [?Adom] ?eltType, X: [?Xdom] eltType, Y: [?Ydom] eltType, alpha: eltType, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1)  where Adom.rank == 2 && Xdom.rank == 1 && Ydom.rank == 1

Wrapper for GERC routines:

A := alpha*x*conjg(y') + A
proc geru(ref A: [?Adom] ?eltType, X: [?Xdom] eltType, Y: [?Ydom] eltType, alpha: eltType, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1)  where Adom.rank == 2 && Xdom.rank == 1 && Ydom.rank == 1

Wrapper for the GERU routines:

A := alpha*x*y' + A
proc hemv(A: [?Adom] ?eltType, X: [?vDom] eltType, ref Y: [vDom] eltType, alpha: eltType, beta: eltType, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1) throws  where Adom.rank == 2 && vDom.rank == 1

Wrapper for the HEMV routines:

y := alpha*A*x + beta*y
Throws:

IllegalArgumentError – When A is a non-square array.

proc her(ref A: [?Adom] ?eltType, X: [?vDom] eltType, alpha, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1) throws  where Adom.rank == 2 && vDom.rank == 1

Wrapper for the HER routines:

A := alpha*x*conjg(x') + A
Throws:

IllegalArgumentError – When A is a non-square array.

proc her2(ref A: [?Adom] ?eltType, X: [?vDom] eltType, Y: [vDom] eltType, alpha: eltType, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1) throws  where Adom.rank == 2 && vDom.rank == 1

Wrapper for HER2 routines:

A := alpha *x*conjg(y') + conjg(alpha)*y *conjg(x') + A
Throws:

IllegalArgumentError – When A is a non-square array.

proc symv(A: [?Adom] ?eltType, X: [?vDom] eltType, ref Y: [vDom] eltType, alpha, beta, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1) throws  where Adom.rank == 2 && vDom.rank == 1

Wrapper for the SYMV routines:

y := alpha*A*x + beta*y
Throws:

IllegalArgumentError – When A is a non-square array.

proc syr(ref A: [?Adom] ?eltType, X: [?vDom] eltType, alpha, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1) throws  where Adom.rank == 2 && vDom.rank == 1

Wrapper for SYR routines:

A := alpha*x*x' + A
Throws:

IllegalArgumentError – When A is a non-square array.

proc syr2(ref A: [?Adom] ?eltType, X: [?vDom] eltType, Y: [vDom] eltType, alpha, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1) throws  where Adom.rank == 2 && vDom.rank == 1

Wrapper for SYR2 routines:

A := alpha*x*y'+ alpha*y*x' + A
Throws:

IllegalArgumentError – When A is a non-square array.

proc trmv(A: [?Adom] ?eltType, ref X: [?vDom] eltType, trans: Op = Op.N, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, diag: Diag = Diag.NonUnit, incx: c_int = 1) throws  where Adom.rank == 2 && vDom.rank == 1

Wrapper for TRMV routines:

x := op(A)*x
Throws:

IllegalArgumentError – When A is a non-square array.

proc trsv(A: [?Adom] ?eltType, ref B: [?vDom] eltType, trans: Op = Op.N, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, diag: Diag = Diag.NonUnit, incx: c_int = 1) throws  where Adom.rank == 2 && vDom.rank == 1

Wrapper for the TRSV routines:

A*op(x) = b
Throws:

IllegalArgumentError – When A is a non-square array.

proc rotg(ref a: ?eltType, ref b: eltType, ref c: eltType, ref s: eltType)

Wrapper for the ROTG routines

Construct Givens plane rotation of point p defined by Cartesian coordinates (a, b):

| c  s |   |a|   |r|
|      | . | | = | |
|-s  c |   |b|   |0|

Input:

  • a: x-coordinate of point p

  • b: y-coordinate of point p

Output:

  • a: stores length vector (r) of inputs (a, b)

  • b: stores z parameter that is defined below

  • c: stores value of c element defined above

  • s: stores value of s element defined above

The z parameter (stored in b) is defined such that:

if |a| > |b| then z = s;
else if c != 0 then z = 1/c;
else z = 1
proc rotmg(ref d1: ?eltType, ref d2: eltType, ref b1: eltType, b2: eltType, ref P: [] eltType) throws

Wrapper for the ROTMG routines

Generate Givens rotation of points:

|b1|     |b1*sqrt(d1)|
|  | = H.|           |
|0 |     |b2*sqrt(d2)|

Input:

  • d1: Scaling factor for b1 (x-axis)

  • d2: Scaling factor for b2 (y-axis)

  • b1: x-coordinate of input vector

  • b2: y-coordinate of input vector

Output:

  • d1: Provides the first element the diagonal matrix

  • d2: Provides the second element the diagonal matrix

  • b1: Provides the b1 rotated (rotated x coordination) of the vector

  • P: Parameter array of 5 elements, detailed below:

                            |P[1] P[3]|
    if P[0] == -1 then  H = |         |
                            |P[2] P[4]|
    
                            |1.0  P[3]|
    if P[0] ==  0 then  H = |         |
                            |P[2]  1.0|
    
                            |P[1]  1.0|
    if P[0] == 1 then   H = |         |
                            |-1.0 P[4]|
    
                            |1   0|
    if P[0] == -2 then  H = |     |
                            |0   1|
    
Throws:

IllegalArgumentError – When P does not consist of exactly five elements.

proc rot(ref X: [?D] ?eltType, ref Y: [D] eltType, c: eltType, s: eltType, incY: c_int = 1, incX: c_int = 1)  where D.rank == 1

Wrapper for the ROT routines

Replaces the value elements of two vectors X and Y using the equation:

X[i] = c*X[i] + s*X[i]
Y[i] = c*Y[i] - s*X[i]

Input:

  • X: Input vector

  • Y: Input vector

  • c: Scalar parameter

  • s: Scalar parameter

  • incY: Defines the increment for the vector Y

  • incX: Defines the increment for the vector X

Output:

  • X: Vector with updated elements

  • Y: Vector with updated elements

proc rotm(ref X: [?D] ?eltType, ref Y: [D] eltType, ref P: [] eltType, incY: c_int = 1, incX: c_int = 1) throws  where D.rank == 1

Wrapper for the ROTM routines

Executes the modified Givens rotations with element wise substitution:

|X[i]|     |X[i]|
|    | = H.|    |
|Y[i]|     |Y[i]|

Input:

  • X: Input vector

  • Y: Input vector

  • incY: Defines the increment for the vector Y

  • incX: Defines the increment for the vector X

  • P: Parameter array of 5 elements, detailed below:

                            |P[1] P[3]|
    if P[0] == -1 then  H = |         |
                            |P[2] P[4]|
    
                            |1.0  P[3]|
    if P[0] ==  0 then  H = |         |
                            |P[2]  1.0|
    
                            |P[1]  1.0|
    if P[0] == 1 then   H = |         |
                            |-1.0 P[4]|
    
                            |1   0|
    if P[0] == -2 then  H = |     |
                            |0   1|
    

Output:

  • X: Vector with updated elements

  • Y: Vector with updated elements

Throws:

IllegalArgumentError – When P does not consist of exactly five elements.

proc swap(ref X: [?D] ?eltType, ref Y: [D] eltType, incY: c_int = 1, incX: c_int = 1)  where D.rank == 1

Wrapper for the SWAP routines

Exchanges elements of two vectors.

Input:

  • X: Input vector

  • Y: Input vector

  • incY: Defines the increment for the vector Y

  • incX: Defines the increment for the vector X

Output:

  • X: Contains input Y elements

  • Y: Contains input X elements

proc scal(ref X: [?D] ?eltType, alpha: eltType, incX: c_int = 1)  where D.rank == 1

Wrapper for the SCAL routines

Calculates the product of a vector X with scalar alpha:

X' = alpha*X

Input:

  • X: Input vector

  • alpha: Scalar parameter

Output:

  • X: Vector updated by the equation: X[i] = alpha*X[i]

proc copy(X: [?D] ?eltType, ref Y: [D] eltType, incY: c_int = 1, incX: c_int = 1)  where D.rank == 1

Wrapper for the COPY routines

Copies one vector (X the source) to another (Y the destination):

Y = X

Input:

  • X: Input vector

  • Y: Input vector

  • incX: Defines the increment for the vector X

  • incY: Defines the increment for the vector Y

Output:

Y: Contains the values copied from X vector

proc axpy(X: [?D] ?eltType, ref Y: [D] eltType, alpha: eltType, incY: c_int = 1, incX: c_int = 1)  where D.rank == 1

Wrapper for the AXPY routines

Computes the vector-scalar product of apha and X and adds the result to Y:

Y = alpha*X + Y

Input:

  • X: Input vector

  • alpha: Scalar parameter

  • incX: Defines the increment for the vector X

  • incY: Defines the increment for the vector Y

Output:

  • Y: Vector updated by the equation: Y[i] = alpha*X[i]+Y[i]

proc dot(X: [?xD] ?eltType, Y: [?yD] eltType, incY: c_int = 1, incX: c_int = 1) : eltType  where xD.rank == 1 && yD.rank == 1

Wrapper for DOT routines

Returns the dot product of two vectors:

X*Y
Arguments:
  • X – Input vector

  • Y – Input vector

  • incX – Defines the increment for the vector X

  • incY – Defines the increment for the vector Y

Returns:

Scalar value of dot product

proc dotu(X: [?D] ?eltType, Y: [D] eltType, incY: c_int = 1, incX: c_int = 1)  where D.rank == 1

Wrapper for DOTU routines (DOTU_SUB)

Obtains the dot product of two complex vectors:

X*Y
Arguments:
  • X – Input vector

  • Y – Input vector

  • incX – Defines the increment for the vector X

  • incY – Defines the increment for the vector Y

Returns:

The complex dot product

proc dotc(X: [?D] ?eltType, Y: [D] eltType, incY: c_int = 1, incX: c_int = 1)  where D.rank == 1

Wrapper for DOTC routines (DOTC_SUB)

Obtains the dot product of conjugated X vector with Y vector:

conj(X)*Y
Arguments:
  • X – Conjugated input vector

  • Y – Input vector

  • incX – Defines the increment for the vector X

  • incY – Defines the increment for the vector Y

Returns:

The complex dot product

proc dsdot(X: [?D] real(32), Y: [D] real(32), incY: c_int = 1, incX: c_int = 1) : real(64)  where D.rank == 1

Wrapper for SDOT routines (DSDOT variant)

Returns the dot product of two real(32) vectors as a real(64), using real(64) precision internally:

X*Y
Arguments:
  • X – Input vector

  • Y – Input vector

  • incX – Defines the increment for the vector X

  • incY – Defines the increment for the vector Y

Returns:

Scalar value of dot product

proc sdsdot(X: [?D] real(32), Y: [D] real(32), incY: c_int = 1, incX: c_int = 1) : real(32)  where D.rank == 1

Wrapper for SDOT routines (SDSDOT variant)

Returns the dot product of two real(32) vectors as a real(32), using real(64) precision internally:

X*Y
Arguments:
  • X – Input vector

  • Y – Input vector

  • incX – Defines the increment for the vector X

  • incY – Defines the increment for the vector Y

Returns:

Scalar value of dot product

proc nrm2(X: [?D] ?eltType, incX: c_int = 1)  where D.rank == 1

Wrapper for NRM2 routines

Returns the Euclidean norm of vector X:

||X||
Arguments:
  • X – Input vector

  • incX – Defines the increment for the vector X

Returns:

The 2-norm of X vector

proc asum(X: [?D] ?eltType, incX: c_int = 1)  where D.rank == 1

Wrapper for the ASUM routines

Returns the sum of the magnitude values of X elements:

|Re X[1]| + |Im X[1]| + |Re  X[2]| + |Im  X[2]|+ ... + |Re  X[N]| + |Im X[N]|.
Arguments:
  • X – Input vector

  • incX – Defines the increment for the vector X

Returns:

The 1-norm of X vector

proc amax(X: [?D] ?eltType, incX: c_int = 1)  where D.rank == 1

Wrapper for AMAX routines

Returns the index of element in the vector with maximum absolute value.

Arguments:
  • X – Input vector

  • incX – Defines the increment for the vector X

Returns:

The index of maximum absolute value