BLAS
Usage
use BLAS;
or
import BLAS;
Submodules
Support for Basic Linear Algebra Subprograms (BLAS) kernel routines.
BLAS (Basic Linear Algebra Subprograms) are the de facto standard routines for low-level matrix, vector, and scalar operations. While netlib provides the official reference version of BLAS, this documentation refers to the MKL BLAS documentation, due to interface similarities.
This module is intended to work with non-distributed dense rectangular
(DefaultRectangular) arrays.
Compiling with BLAS
In order to compile a Chapel program that uses this module, the
BLAS and C_BLAS (C wrappers to BLAS) libraries must be installed on the system.
The paths to both the cblas.h header file and BLAS library
must be passed to the -I and -L compiler arguments. The library name,
typically blas, must be passed to the -l argument as well.
The compilation command should look something like this:
chpl -I$PATH_TO_CBLAS_DIR \
-L$PATH_TO_BLAS_LIBS -lblas source.chpl
- BLAS Implementations:
There is a wide range of BLAS implementations available. This module was built and tested with netlib’s C_BLAS, but many other implementations are compatible as well. Using a version of BLAS optimized for the user’s system will yield the best performance.
There are a few known caveats with some popular implementations:
ATLAS
Compilation (linking) requires the additional flag:
-latlas
MKL
Compilation requires the additional flag in order to use the MKL header instead:
--set blasImpl=mkl
OpenBLAS
Compilation may generate warnings about incompatible pointer types, which may be ignored. These warnings are due to the header files of OpenBLAS differing from the reference C_BLAS prototypes for complex arguments by using
float*anddouble*pointers, instead ofvoid*pointers.
Cray LibSci
On Cray systems with the
cray-libscimodule loaded, no compiler flags should be necessary when compiling programs that use BLAS. This is typically loaded by default, but can be manually loaded withmodule load cray-libscias well. Chapel programs compiled on Crays utilize theccwrapper as the backend compiler, which implicitly links against the libsci library. Therefore, no additional steps are required of the user.
Chapel BLAS API
This module provides higher-level wrappers around the BLAS functions. These provide reasonable default values for many less commonly used arguments and determine the appropriate functions to call, based on the array element types, as well as the array dimensions. The other functionality is identical to the corresponding BLAS functions.
The names of these routines are identical to the corresponding BLAS functions,
except that the type prefix is dropped. For instance, gemm is the
wrapper for the [sdcz]gemm routines.
The native BLAS interface can still be accessed by calling routines from the
C_BLAS submodule.
The ldA argument is omitted from the Chapel BLAS API. Chapel determines the
dimensions of the matrices from the arrays that are passed in, even when one is
passing in a sub-array such that the array elements are not contiguously stored
in memory.
- enum BlasImpl { blas, mkl, off }
Available BLAS implementations for
blasImpl- enum constant blas
- enum constant mkl
- enum constant off
- config param blasImpl = BlasImpl.blas
Specifies which header filename to include, based on the BLAS implementation.
Most BLAS implementations rely on
cblas.h, which is used whenblasImpl = blas, the default setting.blasincludescblas.h(default)mklincludesmkl_cblas.hoffincludes nothing
- config param blasHeader = ""
Manually specifies the header filename to include. This flag overrides the header determined by
blasImpl.This flag should only be necessary if using an
BLASimplementation with a unique header name that is not supported byblasImpl. However, no guarantees can be made about this module working with untested implementations.
- proc isBLASType(type t) param : bool
Return true if type is supported by BLAS
- enum Order { Row = 101: c_int, Col }
Define row or column order
- enum constant Row = 101: c_int
- enum constant Col
- enum Op { N = 111: c_int, T, H }
Operation of matrix : none, transpose, or adjoint
- enum constant N = 111: c_int
- enum constant T
- enum constant H
- enum Uplo { Upper = 121: c_int, Lower }
Storage for symmetric matrices
- enum constant Upper = 121: c_int
- enum constant Lower
- enum Diag { NonUnit = 131: c_int, Unit }
Assume a unit or non-unit diagonal
- enum constant NonUnit = 131: c_int
- enum constant Unit
- enum Side { Left = 141: c_int, Right }
Operate on the left or right side
- enum constant Left = 141: c_int
- enum constant Right
- proc gemm(A: [?Adom], B: [?Bdom], ref C: [?Cdom], alpha, beta, opA: Op = Op.N, opB: Op = Op.N, order: Order = Order.Row) where Adom.rank == 2 && Bdom.rank == 2 && Cdom.rank == 2
Wrapper for the GEMM routines:
C := alpha * op(A) * op(B) + beta * C
- proc symm(A: [?Adom], B: [?Bdom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, side: Side = Side.Left, order: Order = Order.Row) where Adom.rank == 2 && Bdom.rank == 2 && Cdom.rank == 2
Wrapper for the SYMM routines:
C := alpha * A * B + beta * C
or:
C := alpha * B * A + beta * C
where
Ais a symmetric matrix.
- proc hemm(A: [?Adom], B: [?Bdom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, side: Side = Side.Left, order: Order = Order.Row) where Adom.rank == 2 && Bdom.rank == 2 && Cdom.rank == 2
Wrapper for the HEMM routines:
C := alpha * A * B + beta * C
or:
C := alpha * B * A + beta * C
where
Ais an hermitian matrix.
- proc syrk(A: [?Adom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row) where Adom.rank == 2 && Cdom.rank == 2
Wrapper for the SYRK routines:
C := alpha * A * A**T + beta * C
or:
C := alpha * A**T * A + beta * C
where
Cis a symmetric matrix, andA**Tis the transpose ofA.
- proc herk(A: [?Adom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row) where Adom.rank == 2 && Cdom.rank == 2
Wrapper for the HERK routines:
C := alpha * A * A**H + beta * C
or:
C := alpha * A**H * A + beta * C
where
Cis an hermitian matrix, andA**His the conjugate transpose ofA.
- proc syr2k(A: [?Adom], B: [?Bdom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row) where Adom.rank == 2 && Bdom.rank == 2 && Cdom.rank == 2
Wrapper for the SYR2K routines:
C := alpha * A * B**T + alpha * B * A**T + beta * C
or:
C := alpha * A**T * B + alpha * B**T * A + beta * C
where
Cis a symmetric matrix, andA**TandB**Tare the transpose ofAandB, respectively.
- proc her2k(A: [?Adom], B: [?Bdom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row) where Adom.rank == 2 && Bdom.rank == 2 && Cdom.rank == 2
Wrapper for the HER2K routines:
C := alpha * A * B**H + conjg(alpha) * B * A**H + beta * C
or:
C := alpha * A**H * B + conjg(alpha) * B**H * A + beta * C
where
Cis an hermitian matrix,B**His the conjugate transpose ofB,conjg(alpha)is the complex conjugate ofalpha.
- proc trmm(A: [?Adom] ?eltType, ref B: [?Bdom] eltType, alpha, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, side: Side = Side.Left, diag: Diag = Diag.NonUnit, order: Order = Order.Row) throws where Adom.rank == 2 && Bdom.rank == 2
Wrapper for the TRMM routines:
B := alpha * op(A) * B
or:
B := alpha * B * op(A)
where
Ais a triangular matrix.- Throws:
IllegalArgumentError – When B is a non-square array.
- proc trsm(A: [?Adom], ref B: [?Bdom], alpha, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, side: Side = Side.Left, diag: Diag = Diag.NonUnit, order: Order = Order.Row) throws where Adom.rank == 2 && Bdom.rank == 2
Wrapper for the TRSM routines:
op(A) * X = alpha * B
or:
X * op(A) = alpha * B
where
Ais a triangular matrix.- Throws:
IllegalArgumentError – When B is a non-square array.
- proc gemv(A: [?Adom] ?eltType, x: [?xdom] eltType, ref y: [?ydom] eltType, alpha, beta, opA: Op = Op.N, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1) where Adom.rank == 2 && xdom.rank == 1 && ydom.rank == 1
Wrapper for the GEMV routines:
y := alpha*op(A)*x + beta*y,
where
Ais anm``x``nmatrix.
- proc ger(ref A: [?Adom] ?eltType, X: [?Xdom] eltType, Y: [?Ydom] eltType, alpha, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1) where Adom.rank == 2 && Xdom.rank == 1 && Ydom.rank == 1
Wrapper for GER routines:
A := alpha*x*y'+ A
- proc gerc(ref A: [?Adom] ?eltType, X: [?Xdom] eltType, Y: [?Ydom] eltType, alpha: eltType, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1) where Adom.rank == 2 && Xdom.rank == 1 && Ydom.rank == 1
Wrapper for GERC routines:
A := alpha*x*conjg(y') + A
- proc geru(ref A: [?Adom] ?eltType, X: [?Xdom] eltType, Y: [?Ydom] eltType, alpha: eltType, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1) where Adom.rank == 2 && Xdom.rank == 1 && Ydom.rank == 1
Wrapper for the GERU routines:
A := alpha*x*y' + A
- proc hemv(A: [?Adom] ?eltType, X: [?vDom] eltType, ref Y: [vDom] eltType, alpha: eltType, beta: eltType, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1) throws where Adom.rank == 2 && vDom.rank == 1
Wrapper for the HEMV routines:
y := alpha*A*x + beta*y
- Throws:
IllegalArgumentError – When A is a non-square array.
- proc her(ref A: [?Adom] ?eltType, X: [?vDom] eltType, alpha, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1) throws where Adom.rank == 2 && vDom.rank == 1
Wrapper for the HER routines:
A := alpha*x*conjg(x') + A
- Throws:
IllegalArgumentError – When A is a non-square array.
- proc her2(ref A: [?Adom] ?eltType, X: [?vDom] eltType, Y: [vDom] eltType, alpha: eltType, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1) throws where Adom.rank == 2 && vDom.rank == 1
Wrapper for HER2 routines:
A := alpha *x*conjg(y') + conjg(alpha)*y *conjg(x') + A
- Throws:
IllegalArgumentError – When A is a non-square array.
- proc symv(A: [?Adom] ?eltType, X: [?vDom] eltType, ref Y: [vDom] eltType, alpha, beta, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1) throws where Adom.rank == 2 && vDom.rank == 1
Wrapper for the SYMV routines:
y := alpha*A*x + beta*y
- Throws:
IllegalArgumentError – When A is a non-square array.
- proc syr(ref A: [?Adom] ?eltType, X: [?vDom] eltType, alpha, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1) throws where Adom.rank == 2 && vDom.rank == 1
Wrapper for SYR routines:
A := alpha*x*x' + A
- Throws:
IllegalArgumentError – When A is a non-square array.
- proc syr2(ref A: [?Adom] ?eltType, X: [?vDom] eltType, Y: [vDom] eltType, alpha, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1) throws where Adom.rank == 2 && vDom.rank == 1
Wrapper for SYR2 routines:
A := alpha*x*y'+ alpha*y*x' + A
- Throws:
IllegalArgumentError – When A is a non-square array.
- proc trmv(A: [?Adom] ?eltType, ref X: [?vDom] eltType, trans: Op = Op.N, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, diag: Diag = Diag.NonUnit, incx: c_int = 1) throws where Adom.rank == 2 && vDom.rank == 1
Wrapper for TRMV routines:
x := op(A)*x
- Throws:
IllegalArgumentError – When A is a non-square array.
- proc trsv(A: [?Adom] ?eltType, ref B: [?vDom] eltType, trans: Op = Op.N, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, diag: Diag = Diag.NonUnit, incx: c_int = 1) throws where Adom.rank == 2 && vDom.rank == 1
Wrapper for the TRSV routines:
A*op(x) = b
- Throws:
IllegalArgumentError – When A is a non-square array.
- proc rotg(ref a: ?eltType, ref b: eltType, ref c: eltType, ref s: eltType)
Wrapper for the ROTG routines
Construct Givens plane rotation of point
pdefined by Cartesian coordinates(a, b):| c s | |a| |r| | | . | | = | | |-s c | |b| |0|
Input:
a: x-coordinate of pointpb: y-coordinate of pointp
Output:
a: stores length vector (r) of inputs(a, b)b: storeszparameter that is defined belowc: stores value ofcelement defined aboves: stores value ofselement defined above
The
zparameter (stored inb) is defined such that:if |a| > |b| then z = s; else if c != 0 then z = 1/c; else z = 1
- proc rotmg(ref d1: ?eltType, ref d2: eltType, ref b1: eltType, b2: eltType, ref P: [] eltType) throws
Wrapper for the ROTMG routines
Generate Givens rotation of points:
|b1| |b1*sqrt(d1)| | | = H.| | |0 | |b2*sqrt(d2)|
Input:
d1: Scaling factor forb1(x-axis)d2: Scaling factor forb2(y-axis)b1: x-coordinate of input vectorb2: y-coordinate of input vector
Output:
d1: Provides the first element the diagonal matrixd2: Provides the second element the diagonal matrixb1: Provides theb1rotated (rotated x coordination) of the vectorP: Parameter array of 5 elements, detailed below:|P[1] P[3]| if P[0] == -1 then H = | | |P[2] P[4]| |1.0 P[3]| if P[0] == 0 then H = | | |P[2] 1.0| |P[1] 1.0| if P[0] == 1 then H = | | |-1.0 P[4]| |1 0| if P[0] == -2 then H = | | |0 1|
- Throws:
IllegalArgumentError – When P does not consist of exactly five elements.
- proc rot(ref X: [?D] ?eltType, ref Y: [D] eltType, c: eltType, s: eltType, incY: c_int = 1, incX: c_int = 1) where D.rank == 1
Wrapper for the ROT routines
Replaces the value elements of two vectors
XandYusing the equation:X[i] = c*X[i] + s*X[i] Y[i] = c*Y[i] - s*X[i]
Input:
X: Input vectorY: Input vectorc: Scalar parameters: Scalar parameterincY: Defines the increment for the vectorYincX: Defines the increment for the vectorX
Output:
X: Vector with updated elementsY: Vector with updated elements
- proc rotm(ref X: [?D] ?eltType, ref Y: [D] eltType, ref P: [] eltType, incY: c_int = 1, incX: c_int = 1) throws where D.rank == 1
Wrapper for the ROTM routines
Executes the modified Givens rotations with element wise substitution:
|X[i]| |X[i]| | | = H.| | |Y[i]| |Y[i]|
Input:
X: Input vectorY: Input vectorincY: Defines the increment for the vectorYincX: Defines the increment for the vectorXP: Parameter array of 5 elements, detailed below:|P[1] P[3]| if P[0] == -1 then H = | | |P[2] P[4]| |1.0 P[3]| if P[0] == 0 then H = | | |P[2] 1.0| |P[1] 1.0| if P[0] == 1 then H = | | |-1.0 P[4]| |1 0| if P[0] == -2 then H = | | |0 1|
Output:
X: Vector with updated elementsY: Vector with updated elements
- Throws:
IllegalArgumentError – When P does not consist of exactly five elements.
- proc swap(ref X: [?D] ?eltType, ref Y: [D] eltType, incY: c_int = 1, incX: c_int = 1) where D.rank == 1
Wrapper for the SWAP routines
Exchanges elements of two vectors.
Input:
X: Input vectorY: Input vectorincY: Defines the increment for the vectorYincX: Defines the increment for the vectorX
Output:
X: Contains inputYelementsY: Contains inputXelements
- proc scal(ref X: [?D] ?eltType, alpha: eltType, incX: c_int = 1) where D.rank == 1
Wrapper for the SCAL routines
Calculates the product of a vector
Xwith scalar alpha:X' = alpha*X
Input:
X: Input vectoralpha: Scalar parameter
Output:
X: Vector updated by the equation:X[i] = alpha*X[i]
- proc copy(X: [?D] ?eltType, ref Y: [D] eltType, incY: c_int = 1, incX: c_int = 1) where D.rank == 1
Wrapper for the COPY routines
Copies one vector (
Xthe source) to another (Ythe destination):Y = X
Input:
X: Input vectorY: Input vectorincX: Defines the increment for the vectorXincY: Defines the increment for the vectorY
Output:
Y: Contains the values copied fromXvector
- proc axpy(X: [?D] ?eltType, ref Y: [D] eltType, alpha: eltType, incY: c_int = 1, incX: c_int = 1) where D.rank == 1
Wrapper for the AXPY routines
Computes the vector-scalar product of apha and
Xand adds the result toY:Y = alpha*X + Y
Input:
X: Input vectoralpha: Scalar parameterincX: Defines the increment for the vectorXincY: Defines the increment for the vectorY
Output:
Y: Vector updated by the equation:Y[i] = alpha*X[i]+Y[i]
- proc dot(X: [?xD] ?eltType, Y: [?yD] eltType, incY: c_int = 1, incX: c_int = 1) : eltType where xD.rank == 1 && yD.rank == 1
Wrapper for DOT routines
Returns the dot product of two vectors:
X*Y
- Arguments:
X – Input vector
Y – Input vector
incX – Defines the increment for the vector
XincY – Defines the increment for the vector
Y
- Returns:
Scalar value of dot product
- proc dotu(X: [?D] ?eltType, Y: [D] eltType, incY: c_int = 1, incX: c_int = 1) where D.rank == 1
Wrapper for DOTU routines (
DOTU_SUB)Obtains the dot product of two complex vectors:
X*Y
- Arguments:
X – Input vector
Y – Input vector
incX – Defines the increment for the vector
XincY – Defines the increment for the vector
Y
- Returns:
The complex dot product
- proc dotc(X: [?D] ?eltType, Y: [D] eltType, incY: c_int = 1, incX: c_int = 1) where D.rank == 1
Wrapper for DOTC routines (
DOTC_SUB)Obtains the dot product of conjugated X vector with Y vector:
conj(X)*Y
- Arguments:
X – Conjugated input vector
Y – Input vector
incX – Defines the increment for the vector
XincY – Defines the increment for the vector
Y
- Returns:
The complex dot product
- proc dsdot(X: [?D] real(32), Y: [D] real(32), incY: c_int = 1, incX: c_int = 1) : real(64) where D.rank == 1
Wrapper for SDOT routines (
DSDOTvariant)Returns the dot product of two
real(32)vectors as areal(64), usingreal(64)precision internally:X*Y
- Arguments:
X – Input vector
Y – Input vector
incX – Defines the increment for the vector
XincY – Defines the increment for the vector
Y
- Returns:
Scalar value of dot product
- proc sdsdot(X: [?D] real(32), Y: [D] real(32), incY: c_int = 1, incX: c_int = 1) : real(32) where D.rank == 1
Wrapper for SDOT routines (
SDSDOTvariant)Returns the dot product of two
real(32)vectors as areal(32), usingreal(64)precision internally:X*Y
- Arguments:
X – Input vector
Y – Input vector
incX – Defines the increment for the vector
XincY – Defines the increment for the vector
Y
- Returns:
Scalar value of dot product
- proc nrm2(X: [?D] ?eltType, incX: c_int = 1) where D.rank == 1
Wrapper for NRM2 routines
Returns the Euclidean norm of vector
X:||X||
- Arguments:
X – Input vector
incX – Defines the increment for the vector
X
- Returns:
The 2-norm of
Xvector
- proc asum(X: [?D] ?eltType, incX: c_int = 1) where D.rank == 1
Wrapper for the ASUM routines
Returns the sum of the magnitude values of X elements:
|Re X[1]| + |Im X[1]| + |Re X[2]| + |Im X[2]|+ ... + |Re X[N]| + |Im X[N]|.
- Arguments:
X – Input vector
incX – Defines the increment for the vector
X
- Returns:
The 1-norm of
Xvector