BLAS

Usage

use BLAS;

Submodules

Support for Basic Linear Algebra Subprograms (BLAS) kernel routines.

BLAS (Basic Linear Algebra Subprograms) are the de facto standard routines for low-level matrix, vector, and scalar operations. While netlib provides the official reference version of BLAS, this documentation refers to the MKL BLAS documentation, due to interface similarities.

This module is intended to work with non-distributed dense rectangular (DefaultRectangular) arrays.

Compiling with BLAS

In order to compile a Chapel program that uses this module, the BLAS and C_BLAS (C wrappers to BLAS) libraries must be installed on the system. The paths to both the cblas.h header file and BLAS library must be passed to the -I and -L compiler arguments. The library name, typically blas, must be passed to the -l argument as well.

The compilation command should look something like this:

chpl -I$PATH_TO_CBLAS_DIR \
     -L$PATH_TO_BLAS_LIBS -lblas source.chpl
BLAS Implementations:

There is a wide range of BLAS implementations available. This module was built and tested with netlib's C_BLAS, but many other implementations are compatible as well. Using a version of BLAS optimized for the user's system will yield the best performance.

There are a few known caveats with some popular implementations:

  • ATLAS
    • Compilation (linking) requires the additional flag: -latlas
  • MKL
    • Compilation requires the additional flag in order to use the MKL header instead: --set blasImpl=mkl
  • OpenBLAS
    • Compilation will generate warnings about incompatible pointer types, which may be ignored. These warnings are due to the header files of OpenBLAS differing from the reference C_BLAS prototypes for complex arguments by using float* and double* pointers, instead of void* pointers.
  • Cray LibSci
    • On Cray systems with the cray-libsci module loaded, no compiler flags should be necessary when compiling programs that use BLAS. This is typically loaded by default, but can be manually loaded with module load cray-libsci as well. Chapel programs compiled on Crays utilize the cc wrapper as the backend compiler, which implicitly links against the libsci library. Therefore, no additional steps are required of the user.

Chapel BLAS API

This module provides higher-level wrappers around the BLAS functions. These provide reasonable default values for many less commonly used arguments and determine the appropriate functions to call, based on the array element types, as well as the array dimensions. The other functionality is identical to the corresponding BLAS functions.

The names of these routines are identical to the corresponding BLAS functions, except that the type prefix is dropped. For instance, gemm is the wrapper for the [sdcz]gemm routines.

The native BLAS interface can still be accessed by calling routines from the C_BLAS submodule.

The ldA argument is omitted from the Chapel BLAS API. Chapel determines the dimensions of the matrices from the arrays that are passed in, even when one is passing in a sub-array such that the array elements are not contiguously stored in memory.

Warning

The CHPL_LOCALE_MODEL=numa configuration is currently not supported by this module.

enum BlasImpl { blas, mkl, none }

Available BLAS implementations for blasImpl

config param blasImpl = BlasImpl.blas

Specifies which header filename to include, based on the BLAS implementation.

Most BLAS implementations rely on cblas.h, which is used when blasImpl = blas, the default setting.

  • blas includes cblas.h (default)
  • mkl includes mkl_cblas.h
  • none includes nothing
config param blasHeader = ""

Manually specifies the header filename to include. This flag overrides the header determined by blasImpl.

This flag should only be necessary if using an BLAS implementation with a unique header name that is not supported by blasImpl. However, no guarantees can be made about this module working with untested implementations.

config param isBLAS_MKL = false

Deprecated. Use --set blasImpl=mkl instead

proc isBLASType(type t) param: bool

Return true if type is supported by BLAS

enum Order { Row = 101: c_int, Col }

Define row or column order

enum Op { N = 111: c_int, T, H }

Operation of matrix : none, transpose, or adjoint

enum Uplo { Upper = 121: c_int, Lower }

Storage for symmetric matrices

enum Diag { NonUnit = 131: c_int, Unit }

Assume a unit or non-unit diagonal

enum Side { Left = 141: c_int, Right }

Operate on the left or right side

proc gemm(A: [?Adom], B: [?Bdom], C: [?Cdom], alpha, beta, opA: Op = Op.N, opB: Op = Op.N, order: Order = Order.Row)

Wrapper for the GEMM routines:

C := alpha * op(A) * op(B) + beta * C
proc symm(A: [?Adom], B: [?Bdom], C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, side: Side = Side.Left, order: Order = Order.Row)

Wrapper for the SYMM routines:

C := alpha * A * B + beta * C

or:

C := alpha * B * A + beta * C

where A is a symmetric matrix.

proc hemm(A: [?Adom], B: [?Bdom], C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, side: Side = Side.Left, order: Order = Order.Row)

Wrapper for the HEMM routines:

C := alpha * A * B + beta * C

or:

C := alpha * B * A + beta * C

where A is an hermitian matrix.

proc syrk(A: [?Adom], C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row)

Wrapper for the SYRK routines:

C := alpha * A * A**T + beta * C

or:

C := alpha * A**T * A + beta * C

where C is a symmetric matrix, and A**T is the transpose of A.

proc herk(A: [?Adom], C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row)

Wrapper for the HERK routines:

C := alpha * A * A**H + beta * C

or:

C := alpha * A**H * A + beta * C

where C is an hermitian matrix, and A**H is the conjugate transpose of A.

proc syr2k(A: [?Adom], B: [?Bdom], C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row)

Wrapper for the SYR2K routines:

C := alpha * A * B**T + alpha * B * A**T +  beta * C

or:

C := alpha * A**T * B + alpha * B**T * A +  beta * C

where C is a symmetric matrix, and A**T and B**T are the transpose of A and B , respectively.

proc her2k(A: [?Adom], B: [?Bdom], C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row)

Wrapper for the HER2K routines:

C := alpha * A * B**H + conjg(alpha) * B * A**H +  beta * C

or:

C := alpha * A**H * B + conjg(alpha) * B**H * A +  beta * C

where C is an hermitian matrix, B**H is the conjugate transpose of B , conjg(alpha) is the complex conjugate of alpha.

proc trmm(A: [?Adom] ?eltType, B: [?Bdom] eltType, alpha, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, side: Side = Side.Left, diag: Diag = Diag.NonUnit, order: Order = Order.Row) throws

Wrapper for the TRMM routines:

B := alpha * op(A) * B

or:

B := alpha * B * op(A)

where A is a triangular matrix.

proc trsm(A: [?Adom], B: [?Bdom], alpha, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, side: Side = Side.Left, diag: Diag = Diag.NonUnit, order: Order = Order.Row) throws

Wrapper for the TRSM routines:

op(A) * X = alpha * B

or:

X * op(A) = alpha * B

where A is a triangular matrix.

proc gemv(A: [?Adom] ?eltType, x: [?xdom] eltType, y: [?ydom] eltType, alpha, beta, opA: Op = Op.N, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1)

Wrapper for the GEMV routines:

y := alpha*op(A)*x + beta*y,

where A is an m``x``n matrix.

proc ger(A: [?Adom] ?eltType, X: [?Xdom] eltType, Y: [?Ydom] eltType, alpha, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1)

Wrapper for GER routines:

A := alpha*x*y'+ A
proc gerc(A: [?Adom] ?eltType, X: [?Xdom] eltType, Y: [?Ydom] eltType, ref alpha: eltType, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1)

Wrapper for GERC routines:

A := alpha*x*conjg(y') + A
proc geru(A: [?Adom] ?eltType, X: [?Xdom] eltType, Y: [?Ydom] eltType, ref alpha: eltType, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1)

Wrapper for the GERU routines:

A := alpha*x*y' + A
proc hemv(A: [?Adom] ?eltType, X: [?vDom] eltType, Y: [vDom] eltType, ref alpha: eltType, ref beta: eltType, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1) throws

Wrapper for the HEMV routines:

y := alpha*A*x + beta*y
proc her(A: [?Adom] ?eltType, X: [?vDom] eltType, alpha, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1) throws

Wrapper for the HER routines:

A := alpha*x*conjg(x') + A
proc her2(A: [?Adom] ?eltType, X: [?vDom] eltType, Y: [vDom] eltType, ref alpha: eltType, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1) throws

Wrapper for HER2 routines:

A := alpha *x*conjg(y') + conjg(alpha)*y *conjg(x') + A
proc symv(A: [?Adom] ?eltType, X: [?vDom] eltType, Y: [vDom] eltType, alpha, beta, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1) throws

Wrapper for the SYMV routines:

y := alpha*A*x + beta*y
proc syr(A: [?Adom] ?eltType, X: [?vDom] eltType, alpha, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1) throws

Wrapper for SYR routines:

A := alpha*x*x' + A
proc syr2(A: [?Adom] ?eltType, X: [?vDom] eltType, Y: [vDom] eltType, alpha, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1) throws

Wrapper for SYR2 routines:

A := alpha*x*y'+ alpha*y*x' + A
proc trmv(A: [?Adom] ?eltType, X: [?vDom] eltType, trans: Op = Op.N, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, diag: Diag = Diag.NonUnit, incx: c_int = 1) throws

Wrapper for TRMV routines:

x := op(A)*x
proc trsv(A: [?Adom] ?eltType, B: [?vDom] eltType, trans: Op = Op.N, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, diag: Diag = Diag.NonUnit, incx: c_int = 1) throws

Wrapper for the TRSV routines:

A*op(x) = b
proc rotg(ref a: ?eltType, ref b: eltType, ref c: eltType, ref s: eltType)

Wrapper for the ROTG routines

Construct Givens plane rotation of point p defined by Cartesian coordinates (a, b):

| c  s |   |a|   |r|
|      | . | | = | |
|-s  c |   |b|   |0|

Input:

  • a: x-coordinate of point p
  • b: y-coordinate of point p

Output:

  • a: stores length vector (r) of inputs (a, b)
  • b: stores z parameter that is defined below
  • c: stores value of c element defined above
  • s: stores value of s element defined above

The z parameter (stored in b) is defined such that:

if |a| > |b| then z = s;
else if c != 0 then z = 1/c;
else z = 1
proc rotmg(ref d1: ?eltType, ref d2: eltType, ref b1: eltType, b2: eltType, P: [] eltType) throws

Wrapper for the ROTMG routines

Generate Givens rotation of points:

|b1|     |b1*sqrt(d1)|
|  | = H.|           |
|0 |     |b2*sqrt(d2)|

Input:

  • d1: Scaling factor for b1 (x-axis)
  • d2: Scaling factor for b2 (y-axis)
  • b1: x-coordinate of input vector
  • b2: y-coordinate of input vector

Output:

  • d1: Provides the first element the diagonal matrix

  • d2: Provides the second element the diagonal matrix

  • b1: Provides the b1 rotated (rotated x coordination) of the vector

  • P: Parameter array of 5 elements, detailed below:

                            |P[1] P[3]|
    if P[0] == -1 then  H = |         |
                            |P[2] P[4]|
    
                            |1.0  P[3]|
    if P[0] ==  0 then  H = |         |
                            |P[2]  1.0|
    
                            |P[1]  1.0|
    if P[0] == 1 then   H = |         |
                            |-1.0 P[4]|
    
                            |1   0|
    if P[0] == -2 then  H = |     |
                            |0   1|
    
proc rot(X: [?D] ?eltType, Y: [D] eltType, c: eltType, s: eltType, incY: c_int = 1, incX: c_int = 1)

Wrapper for the ROT routines

Replaces the value elements of two vectors X and Y using the equation:

X[i] = c*X[i] + s*X[i]
Y[i] = c*Y[i] - s*X[i]

Input:

  • X: Input vector
  • Y: Input vector
  • c: Scalar parameter
  • s: Scalar parameter
  • incY: Defines the increment for the vector Y
  • incX: Defines the increment for the vector X

Output:

  • X: Vector with updated elements
  • Y: Vector with updated elements
proc rotm(X: [?D] ?eltType, Y: [D] eltType, P: [] eltType, incY: c_int = 1, incX: c_int = 1) throws

Wrapper for the ROTM routines

Executes the modified Givens rotations with element wise substitution:

|X[i]|     |X[i]|
|    | = H.|    |
|Y[i]|     |Y[i]|

Input:

  • X: Input vector

  • Y: Input vector

  • incY: Defines the increment for the vector Y

  • incX: Defines the increment for the vector X

  • P: Parameter array of 5 elements, detailed below:

                            |P[1] P[3]|
    if P[0] == -1 then  H = |         |
                            |P[2] P[4]|
    
                            |1.0  P[3]|
    if P[0] ==  0 then  H = |         |
                            |P[2]  1.0|
    
                            |P[1]  1.0|
    if P[0] == 1 then   H = |         |
                            |-1.0 P[4]|
    
                            |1   0|
    if P[0] == -2 then  H = |     |
                            |0   1|
    

Output:

  • X: Vector with updated elements
  • Y: Vector with updated elements
proc swap(X: [?D] ?eltType, Y: [D] eltType, incY: c_int = 1, incX: c_int = 1)

Wrapper for the SWAP routines

Exchanges elements of two vectors.

Input:

  • X: Input vector
  • Y: Input vector
  • incY: Defines the increment for the vector Y
  • incX: Defines the increment for the vector X

Output:

  • X: Contains input Y elements
  • Y: Contains input X elements
proc scal(X: [?D] ?eltType, ref alpha: eltType, incX: c_int = 1)

Wrapper for the SCAL routines

Calculates the product of a vector X with scalar alpha:

X' = alpha*X

Input:

  • X: Input vector
  • alpha: Scalar parameter

Output:

  • X: Vector updated by the equation: X[i] = alpha*X[i]
proc copy(X: [?D] ?eltType, Y: [D] eltType, incY: c_int = 1, incX: c_int = 1)

Wrapper for the COPY routines

Copies one vector (X the source) to another (Y the destination):

Y = X

Input:

  • X: Input vector
  • Y: Input vector
  • incX: Defines the increment for the vector X
  • incY: Defines the increment for the vector Y

Output:

Y: Contains the values copied from X vector

proc axpy(X: [?D] ?eltType, Y: [D] eltType, ref alpha: eltType, incY: c_int = 1, incX: c_int = 1)

Wrapper for the AXPY routines

Computes the vector-scalar product of apha and X and adds the result to Y:

Y = alpha*X + Y

Input:

  • X: Input vector
  • alpha: Scalar parameter
  • incX: Defines the increment for the vector X
  • incY: Defines the increment for the vector Y

Output:

  • Y: Vector updated by the equation: Y[i] = alpha*X[i]+Y[i]
proc dot(X: [?xD] ?eltType, Y: [?yD] eltType, incY: c_int = 1, incX: c_int = 1): eltType

Wrapper for DOT routines

Returns the dot product of two vectors:

X*Y

Input:

  • X: Input vector
  • Y: Input vector
  • incX: Defines the increment for the vector X
  • incY: Defines the increment for the vector Y
Returns:Scalar value of dot product
proc dotu(X: [?D] ?eltType, Y: [D] eltType, incY: c_int = 1, incX: c_int = 1)

Wrapper for DOTU routines (DOTU_SUB)

Obtains the dot product of two complex vectors:

X*Y

Input:

  • X: Input vector
  • Y: Input vector
  • incX: Defines the increment for the vector X
  • incY: Defines the increment for the vector Y
Returns:The complex dot product
proc dotc(X: [?D] ?eltType, Y: [D] eltType, incY: c_int = 1, incX: c_int = 1)

Wrapper for DOTC routines (DOTC_SUB)

Obtains the dot product of conjugated X vector with Y vector:

conj(X)*Y

Input:

  • X: Conjugated input vector
  • Y: Input vector
  • incX: Defines the increment for the vector X
  • incY: Defines the increment for the vector Y
Returns:The complex dot product
proc dsdot(X: [?D] real(32), Y: [D] real(32), incY: c_int = 1, incX: c_int = 1): real(64)

Wrapper for SDOT routines (DSDOT variant)

Returns the dot product of two real(32) vectors as a real(64), using real(64) precision internally:

X*Y

Input:

  • X: Input vector
  • Y: Input vector
  • incX: Defines the increment for the vector X
  • incY: Defines the increment for the vector Y
Returns:Scalar value of dot product
proc sdsdot(X: [?D] real(32), Y: [D] real(32), incY: c_int = 1, incX: c_int = 1): real(32)

Wrapper for SDOT routines (SDSDOT variant)

Returns the dot product of two real(32) vectors as a real(32), using real(64) precision internally:

X*Y

Input:

  • X: Input vector
  • Y: Input vector
  • incX: Defines the increment for the vector X
  • incY: Defines the increment for the vector Y
Returns:Scalar value of dot product
proc nrm2(X: [?D] ?eltType, incX: c_int = 1)

Wrapper for NRM2 routines

Returns the Euclidean norm of vector X:

||X||

Input:

  • X: Input vector
  • incX: Defines the increment for the vector X
Returns:The 2-norm of X vector
proc asum(X: [?D] ?eltType, incX: c_int = 1)

Wrapper for the ASUM routines

Returns the sum of the magnitude values of X elements:

|Re X[1]| + |Im X[1]| + |Re  X[2]| + |Im  X[2]|+ ... + |Re  X[N]| + |Im X[N]|.

Input:

  • X: Input vector
  • incX: Defines the increment for the vector X
Returns:The 1-norm of X vector
proc amax(X: [?D] ?eltType, incX: c_int = 1)

Wrapper for AMAX routines

Returns the index of element in the vector with maximum absolute value.

Input:

  • X: Input vector
  • incX: Defines the increment for the vector X
Returns:The index of maximum absolute value