BLAS¶
Usage
use BLAS;
or
import BLAS;
Submodules
Support for Basic Linear Algebra Subprograms (BLAS) kernel routines.
BLAS (Basic Linear Algebra Subprograms) are the de facto standard routines for low-level matrix, vector, and scalar operations. While netlib provides the official reference version of BLAS, this documentation refers to the MKL BLAS documentation, due to interface similarities.
This module is intended to work with non-distributed dense rectangular
(DefaultRectangular
) arrays.
Compiling with BLAS¶
In order to compile a Chapel program that uses this module, the
BLAS and C_BLAS (C wrappers to BLAS) libraries must be installed on the system.
The paths to both the cblas.h
header file and BLAS library
must be passed to the -I
and -L
compiler arguments. The library name,
typically blas
, must be passed to the -l
argument as well.
The compilation command should look something like this:
chpl -I$PATH_TO_CBLAS_DIR \
-L$PATH_TO_BLAS_LIBS -lblas source.chpl
- BLAS Implementations:
There is a wide range of BLAS implementations available. This module was built and tested with netlib’s C_BLAS, but many other implementations are compatible as well. Using a version of BLAS optimized for the user’s system will yield the best performance.
There are a few known caveats with some popular implementations:
ATLAS
Compilation (linking) requires the additional flag:
-latlas
MKL
Compilation requires the additional flag in order to use the MKL header instead:
--set blasImpl=mkl
OpenBLAS
Compilation may generate warnings about incompatible pointer types, which may be ignored. These warnings are due to the header files of OpenBLAS differing from the reference C_BLAS prototypes for complex arguments by using
float*
anddouble*
pointers, instead ofvoid*
pointers.
Cray LibSci
On Cray systems with the
cray-libsci
module loaded, no compiler flags should be necessary when compiling programs that use BLAS. This is typically loaded by default, but can be manually loaded withmodule load cray-libsci
as well. Chapel programs compiled on Crays utilize thecc
wrapper as the backend compiler, which implicitly links against the libsci library. Therefore, no additional steps are required of the user.
Chapel BLAS API¶
This module provides higher-level wrappers around the BLAS functions. These provide reasonable default values for many less commonly used arguments and determine the appropriate functions to call, based on the array element types, as well as the array dimensions. The other functionality is identical to the corresponding BLAS functions.
The names of these routines are identical to the corresponding BLAS functions,
except that the type prefix is dropped. For instance, gemm
is the
wrapper for the [sdcz]gemm
routines.
The native BLAS interface can still be accessed by calling routines from the
C_BLAS
submodule.
The ldA
argument is omitted from the Chapel BLAS API. Chapel determines the
dimensions of the matrices from the arrays that are passed in, even when one is
passing in a sub-array such that the array elements are not contiguously stored
in memory.
- enum BlasImpl { blas, mkl, off }¶
Available BLAS implementations for
blasImpl
- enum constant blas¶
- enum constant mkl¶
- enum constant off¶
- config param blasImpl = BlasImpl.blas¶
Specifies which header filename to include, based on the BLAS implementation.
Most BLAS implementations rely on
cblas.h
, which is used whenblasImpl = blas
, the default setting.blas
includescblas.h
(default)mkl
includesmkl_cblas.h
off
includes nothing
- config param blasHeader = ""¶
Manually specifies the header filename to include. This flag overrides the header determined by
blasImpl
.This flag should only be necessary if using an
BLAS
implementation with a unique header name that is not supported byblasImpl
. However, no guarantees can be made about this module working with untested implementations.
- proc isBLASType(type t) param : bool¶
Return true if type is supported by BLAS
- enum Order { Row = 101 : c_int, Col }¶
Define row or column order
- enum constant Row = 101 : c_int¶
- enum constant Col¶
- enum Op { N = 111 : c_int, T, H }¶
Operation of matrix : none, transpose, or adjoint
- enum constant N = 111 : c_int¶
- enum constant T¶
- enum constant H¶
- enum Uplo { Upper = 121 : c_int, Lower }¶
Storage for symmetric matrices
- enum constant Upper = 121 : c_int¶
- enum constant Lower¶
- enum Diag { NonUnit = 131 : c_int, Unit }¶
Assume a unit or non-unit diagonal
- enum constant NonUnit = 131 : c_int¶
- enum constant Unit¶
- enum Side { Left = 141 : c_int, Right }¶
Operate on the left or right side
- enum constant Left = 141 : c_int¶
- enum constant Right¶
- proc gemm(A: [?Adom], B: [?Bdom], ref C: [?Cdom], alpha, beta, opA: Op = Op.N, opB: Op = Op.N, order: Order = Order.Row) where Adom.rank == 2 && Bdom.rank == 2 && Cdom.rank == 2¶
Wrapper for the GEMM routines:
C := alpha * op(A) * op(B) + beta * C
- proc symm(A: [?Adom], B: [?Bdom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, side: Side = Side.Left, order: Order = Order.Row) where Adom.rank == 2 && Bdom.rank == 2 && Cdom.rank == 2¶
Wrapper for the SYMM routines:
C := alpha * A * B + beta * C
or:
C := alpha * B * A + beta * C
where
A
is a symmetric matrix.
- proc hemm(A: [?Adom], B: [?Bdom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, side: Side = Side.Left, order: Order = Order.Row) where Adom.rank == 2 && Bdom.rank == 2 && Cdom.rank == 2¶
Wrapper for the HEMM routines:
C := alpha * A * B + beta * C
or:
C := alpha * B * A + beta * C
where
A
is an hermitian matrix.
- proc syrk(A: [?Adom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row) where Adom.rank == 2 && Cdom.rank == 2¶
Wrapper for the SYRK routines:
C := alpha * A * A**T + beta * C
or:
C := alpha * A**T * A + beta * C
where
C
is a symmetric matrix, andA**T
is the transpose ofA
.
- proc herk(A: [?Adom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row) where Adom.rank == 2 && Cdom.rank == 2¶
Wrapper for the HERK routines:
C := alpha * A * A**H + beta * C
or:
C := alpha * A**H * A + beta * C
where
C
is an hermitian matrix, andA**H
is the conjugate transpose ofA
.
- proc syr2k(A: [?Adom], B: [?Bdom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row) where Adom.rank == 2 && Bdom.rank == 2 && Cdom.rank == 2¶
Wrapper for the SYR2K routines:
C := alpha * A * B**T + alpha * B * A**T + beta * C
or:
C := alpha * A**T * B + alpha * B**T * A + beta * C
where
C
is a symmetric matrix, andA**T
andB**T
are the transpose ofA
andB
, respectively.
- proc her2k(A: [?Adom], B: [?Bdom], ref C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row) where Adom.rank == 2 && Bdom.rank == 2 && Cdom.rank == 2¶
Wrapper for the HER2K routines:
C := alpha * A * B**H + conjg(alpha) * B * A**H + beta * C
or:
C := alpha * A**H * B + conjg(alpha) * B**H * A + beta * C
where
C
is an hermitian matrix,B**H
is the conjugate transpose ofB
,conjg(alpha)
is the complex conjugate ofalpha
.
- proc trmm(A: [?Adom] ?eltType, ref B: [?Bdom] eltType, alpha, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, side: Side = Side.Left, diag: Diag = Diag.NonUnit, order: Order = Order.Row) throws where Adom.rank == 2 && Bdom.rank == 2¶
Wrapper for the TRMM routines:
B := alpha * op(A) * B
or:
B := alpha * B * op(A)
where
A
is a triangular matrix.- Throws:
IllegalArgumentError – When B is a non-square array.
- proc trsm(A: [?Adom], ref B: [?Bdom], alpha, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, side: Side = Side.Left, diag: Diag = Diag.NonUnit, order: Order = Order.Row) throws where Adom.rank == 2 && Bdom.rank == 2¶
Wrapper for the TRSM routines:
op(A) * X = alpha * B
or:
X * op(A) = alpha * B
where
A
is a triangular matrix.- Throws:
IllegalArgumentError – When B is a non-square array.
- proc gemv(A: [?Adom] ?eltType, x: [?xdom] eltType, ref y: [?ydom] eltType, alpha, beta, opA: Op = Op.N, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1) where Adom.rank == 2 && xdom.rank == 1 && ydom.rank == 1¶
Wrapper for the GEMV routines:
y := alpha*op(A)*x + beta*y,
where
A
is anm``x``n
matrix.
- proc ger(ref A: [?Adom] ?eltType, X: [?Xdom] eltType, Y: [?Ydom] eltType, alpha, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1) where Adom.rank == 2 && Xdom.rank == 1 && Ydom.rank == 1¶
Wrapper for GER routines:
A := alpha*x*y'+ A
- proc gerc(ref A: [?Adom] ?eltType, X: [?Xdom] eltType, Y: [?Ydom] eltType, alpha: eltType, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1) where Adom.rank == 2 && Xdom.rank == 1 && Ydom.rank == 1¶
Wrapper for GERC routines:
A := alpha*x*conjg(y') + A
- proc geru(ref A: [?Adom] ?eltType, X: [?Xdom] eltType, Y: [?Ydom] eltType, alpha: eltType, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1) where Adom.rank == 2 && Xdom.rank == 1 && Ydom.rank == 1¶
Wrapper for the GERU routines:
A := alpha*x*y' + A
- proc hemv(A: [?Adom] ?eltType, X: [?vDom] eltType, ref Y: [vDom] eltType, alpha: eltType, beta: eltType, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1) throws where Adom.rank == 2 && vDom.rank == 1¶
Wrapper for the HEMV routines:
y := alpha*A*x + beta*y
- Throws:
IllegalArgumentError – When A is a non-square array.
- proc her(ref A: [?Adom] ?eltType, X: [?vDom] eltType, alpha, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1) throws where Adom.rank == 2 && vDom.rank == 1¶
Wrapper for the HER routines:
A := alpha*x*conjg(x') + A
- Throws:
IllegalArgumentError – When A is a non-square array.
- proc her2(ref A: [?Adom] ?eltType, X: [?vDom] eltType, Y: [vDom] eltType, alpha: eltType, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1) throws where Adom.rank == 2 && vDom.rank == 1¶
Wrapper for HER2 routines:
A := alpha *x*conjg(y') + conjg(alpha)*y *conjg(x') + A
- Throws:
IllegalArgumentError – When A is a non-square array.
- proc symv(A: [?Adom] ?eltType, X: [?vDom] eltType, ref Y: [vDom] eltType, alpha, beta, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1) throws where Adom.rank == 2 && vDom.rank == 1¶
Wrapper for the SYMV routines:
y := alpha*A*x + beta*y
- Throws:
IllegalArgumentError – When A is a non-square array.
- proc syr(ref A: [?Adom] ?eltType, X: [?vDom] eltType, alpha, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1) throws where Adom.rank == 2 && vDom.rank == 1¶
Wrapper for SYR routines:
A := alpha*x*x' + A
- Throws:
IllegalArgumentError – When A is a non-square array.
- proc syr2(ref A: [?Adom] ?eltType, X: [?vDom] eltType, Y: [vDom] eltType, alpha, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1) throws where Adom.rank == 2 && vDom.rank == 1¶
Wrapper for SYR2 routines:
A := alpha*x*y'+ alpha*y*x' + A
- Throws:
IllegalArgumentError – When A is a non-square array.
- proc trmv(A: [?Adom] ?eltType, ref X: [?vDom] eltType, trans: Op = Op.N, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, diag: Diag = Diag.NonUnit, incx: c_int = 1) throws where Adom.rank == 2 && vDom.rank == 1¶
Wrapper for TRMV routines:
x := op(A)*x
- Throws:
IllegalArgumentError – When A is a non-square array.
- proc trsv(A: [?Adom] ?eltType, ref B: [?vDom] eltType, trans: Op = Op.N, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, diag: Diag = Diag.NonUnit, incx: c_int = 1) throws where Adom.rank == 2 && vDom.rank == 1¶
Wrapper for the TRSV routines:
A*op(x) = b
- Throws:
IllegalArgumentError – When A is a non-square array.
- proc rotg(ref a: ?eltType, ref b: eltType, ref c: eltType, ref s: eltType)¶
Wrapper for the ROTG routines
Construct Givens plane rotation of point
p
defined by Cartesian coordinates(a, b)
:| c s | |a| |r| | | . | | = | | |-s c | |b| |0|
Input:
a
: x-coordinate of pointp
b
: y-coordinate of pointp
Output:
a
: stores length vector (r
) of inputs(a, b)
b
: storesz
parameter that is defined belowc
: stores value ofc
element defined aboves
: stores value ofs
element defined above
The
z
parameter (stored inb
) is defined such that:if |a| > |b| then z = s; else if c != 0 then z = 1/c; else z = 1
- proc rotmg(ref d1: ?eltType, ref d2: eltType, ref b1: eltType, b2: eltType, ref P: [] eltType) throws¶
Wrapper for the ROTMG routines
Generate Givens rotation of points:
|b1| |b1*sqrt(d1)| | | = H.| | |0 | |b2*sqrt(d2)|
Input:
d1
: Scaling factor forb1
(x-axis)d2
: Scaling factor forb2
(y-axis)b1
: x-coordinate of input vectorb2
: y-coordinate of input vector
Output:
d1
: Provides the first element the diagonal matrixd2
: Provides the second element the diagonal matrixb1
: Provides theb1
rotated (rotated x coordination) of the vectorP
: Parameter array of 5 elements, detailed below:|P[1] P[3]| if P[0] == -1 then H = | | |P[2] P[4]| |1.0 P[3]| if P[0] == 0 then H = | | |P[2] 1.0| |P[1] 1.0| if P[0] == 1 then H = | | |-1.0 P[4]| |1 0| if P[0] == -2 then H = | | |0 1|
- Throws:
IllegalArgumentError – When P does not consist of exactly five elements.
- proc rot(ref X: [?D] ?eltType, ref Y: [D] eltType, c: eltType, s: eltType, incY: c_int = 1, incX: c_int = 1) where D.rank == 1¶
Wrapper for the ROT routines
Replaces the value elements of two vectors
X
andY
using the equation:X[i] = c*X[i] + s*X[i] Y[i] = c*Y[i] - s*X[i]
Input:
X
: Input vectorY
: Input vectorc
: Scalar parameters
: Scalar parameterincY
: Defines the increment for the vectorY
incX
: Defines the increment for the vectorX
Output:
X
: Vector with updated elementsY
: Vector with updated elements
- proc rotm(ref X: [?D] ?eltType, ref Y: [D] eltType, ref P: [] eltType, incY: c_int = 1, incX: c_int = 1) throws where D.rank == 1¶
Wrapper for the ROTM routines
Executes the modified Givens rotations with element wise substitution:
|X[i]| |X[i]| | | = H.| | |Y[i]| |Y[i]|
Input:
X
: Input vectorY
: Input vectorincY
: Defines the increment for the vectorY
incX
: Defines the increment for the vectorX
P
: Parameter array of 5 elements, detailed below:|P[1] P[3]| if P[0] == -1 then H = | | |P[2] P[4]| |1.0 P[3]| if P[0] == 0 then H = | | |P[2] 1.0| |P[1] 1.0| if P[0] == 1 then H = | | |-1.0 P[4]| |1 0| if P[0] == -2 then H = | | |0 1|
Output:
X
: Vector with updated elementsY
: Vector with updated elements
- Throws:
IllegalArgumentError – When P does not consist of exactly five elements.
- proc swap(ref X: [?D] ?eltType, ref Y: [D] eltType, incY: c_int = 1, incX: c_int = 1) where D.rank == 1¶
Wrapper for the SWAP routines
Exchanges elements of two vectors.
Input:
X
: Input vectorY
: Input vectorincY
: Defines the increment for the vectorY
incX
: Defines the increment for the vectorX
Output:
X
: Contains inputY
elementsY
: Contains inputX
elements
- proc scal(ref X: [?D] ?eltType, alpha: eltType, incX: c_int = 1) where D.rank == 1¶
Wrapper for the SCAL routines
Calculates the product of a vector
X
with scalar alpha:X' = alpha*X
Input:
X
: Input vectoralpha
: Scalar parameter
Output:
X
: Vector updated by the equation:X[i] = alpha*X[i]
- proc copy(X: [?D] ?eltType, ref Y: [D] eltType, incY: c_int = 1, incX: c_int = 1) where D.rank == 1¶
Wrapper for the COPY routines
Copies one vector (
X
the source) to another (Y
the destination):Y = X
Input:
X
: Input vectorY
: Input vectorincX
: Defines the increment for the vectorX
incY
: Defines the increment for the vectorY
Output:
Y
: Contains the values copied fromX
vector
- proc axpy(X: [?D] ?eltType, ref Y: [D] eltType, alpha: eltType, incY: c_int = 1, incX: c_int = 1) where D.rank == 1¶
Wrapper for the AXPY routines
Computes the vector-scalar product of apha and
X
and adds the result toY
:Y = alpha*X + Y
Input:
X
: Input vectoralpha
: Scalar parameterincX
: Defines the increment for the vectorX
incY
: Defines the increment for the vectorY
Output:
Y
: Vector updated by the equation:Y[i] = alpha*X[i]+Y[i]
- proc dot(X: [?xD] ?eltType, Y: [?yD] eltType, incY: c_int = 1, incX: c_int = 1) : eltType where xD.rank == 1 && yD.rank == 1¶
Wrapper for DOT routines
Returns the dot product of two vectors:
X*Y
- Arguments:
X – Input vector
Y – Input vector
incX – Defines the increment for the vector
X
incY – Defines the increment for the vector
Y
- Returns:
Scalar value of dot product
- proc dotu(X: [?D] ?eltType, Y: [D] eltType, incY: c_int = 1, incX: c_int = 1) where D.rank == 1¶
Wrapper for DOTU routines (
DOTU_SUB
)Obtains the dot product of two complex vectors:
X*Y
- Arguments:
X – Input vector
Y – Input vector
incX – Defines the increment for the vector
X
incY – Defines the increment for the vector
Y
- Returns:
The complex dot product
- proc dotc(X: [?D] ?eltType, Y: [D] eltType, incY: c_int = 1, incX: c_int = 1) where D.rank == 1¶
Wrapper for DOTC routines (
DOTC_SUB
)Obtains the dot product of conjugated X vector with Y vector:
conj(X)*Y
- Arguments:
X – Conjugated input vector
Y – Input vector
incX – Defines the increment for the vector
X
incY – Defines the increment for the vector
Y
- Returns:
The complex dot product
- proc dsdot(X: [?D] real(32), Y: [D] real(32), incY: c_int = 1, incX: c_int = 1) : real(64) where D.rank == 1¶
Wrapper for SDOT routines (
DSDOT
variant)Returns the dot product of two
real(32)
vectors as areal(64)
, usingreal(64)
precision internally:X*Y
- Arguments:
X – Input vector
Y – Input vector
incX – Defines the increment for the vector
X
incY – Defines the increment for the vector
Y
- Returns:
Scalar value of dot product
- proc sdsdot(X: [?D] real(32), Y: [D] real(32), incY: c_int = 1, incX: c_int = 1) : real(32) where D.rank == 1¶
Wrapper for SDOT routines (
SDSDOT
variant)Returns the dot product of two
real(32)
vectors as areal(32)
, usingreal(64)
precision internally:X*Y
- Arguments:
X – Input vector
Y – Input vector
incX – Defines the increment for the vector
X
incY – Defines the increment for the vector
Y
- Returns:
Scalar value of dot product
- proc nrm2(X: [?D] ?eltType, incX: c_int = 1) where D.rank == 1¶
Wrapper for NRM2 routines
Returns the Euclidean norm of vector
X
:||X||
- Arguments:
X – Input vector
incX – Defines the increment for the vector
X
- Returns:
The 2-norm of
X
vector
- proc asum(X: [?D] ?eltType, incX: c_int = 1) where D.rank == 1¶
Wrapper for the ASUM routines
Returns the sum of the magnitude values of X elements:
|Re X[1]| + |Im X[1]| + |Re X[2]| + |Im X[2]|+ ... + |Re X[N]| + |Im X[N]|.
- Arguments:
X – Input vector
incX – Defines the increment for the vector
X
- Returns:
The 1-norm of
X
vector