BLAS¶
Usage
use BLAS;
Submodules
Support for Basic Linear Algebra Subprograms (BLAS) kernel routines.
BLAS (Basic Linear Algebra Subprograms) are the de facto standard routines for low-level matrix, vector, and scalar operations. While netlib provides the official reference version of BLAS, this documentation refers to the MKL BLAS documentation, due to interface similarities.
This module is intended to work with non-distributed dense rectangular arrays.
Compiling with BLAS¶
In order to compile a Chapel program that uses this module, the
BLAS and C_BLAS (C wrappers to BLAS) libraries must be installed on the system.
The paths to both the cblas.h
header file and BLAS library
must be passed to the -I
and -L
compiler arguments. The library name,
typically blas
, must be passed to the -l
argument as well.
The compilation command should look something like this:
chpl -I$PATH_TO_CBLAS_DIR \
-L$PATH_TO_BLAS_LIBS -lblas source.chpl
- BLAS Implementations:
There is a wide range of BLAS implementations available. This module was built and tested with netlib's C_BLAS, but many other implementations are compatible as well. Using a version of BLAS optimized for the user's system will yield the best performance.
There are a few known caveats with some popular implementations:
- ATLAS
- The compilation command above likely requires the additional flag:
-latlas
- The compilation command above likely requires the additional flag:
- MKL
- Compile with
isBLAS_MKL
if using MKL BLAS.
- Compile with
- OpenBLAS
- The header files that are included with OpenBLAS differ from the reference
C_BLAS prototypes for complex arguments by using
float*
anddouble*
pointers, instead ofvoid*
pointers. Using this will likely result in warnings about incompatible pointer types. These may be ignored.
- The header files that are included with OpenBLAS differ from the reference
C_BLAS prototypes for complex arguments by using
- ATLAS
- Cray Systems:
- No compiler flags should be necessary when compiling BLAS programs on
Crays. The CrayBLAS implementation is made available through Cray's libsci,
which comes installed on all Cray systems. This is typically loaded by
default, but can be manually loaded with
module load cray-libsci
as well. Chapel programs compiled on Crays utilize thecc
wrapper as the backend compiler, which implicitly links against the libsci library. Therefore, no additional steps are required of the user.
Chapel BLAS API¶
This module provides higher-level wrappers around the BLAS functions. These provide reasonable default values for many less commonly used arguments and determine the appropriate functions to call, based on the array element types, as well as the array dimensions. The other functionality is identical to the corresponding BLAS functions.
The names of these routines are identical to the corresponding BLAS functions,
except that the type prefix is dropped. For instance, gemm
is the
wrapper for the [sdcz]gemm
routines.
The native BLAS interface can still be accessed by calling routines from the
C_BLAS
submodule.
The ldA
argument is omitted from the Chapel BLAS API. Chapel determines the
dimensions of the matrices from the arrays that are passed in, even when one is
passing in a sub-array such that the array elements are not contiguously stored
in memory.
Warning
The CHPL_LOCALE_MODEL=numa
configuration is currently not supported by
this module.
-
config param
isBLAS_MKL
= false¶ Tells the BLAS module to look for
mkl_cblas.h
instead ofcblas.h
. Set this to true if you are using the Intel MKL BLAS implementation.
-
proc
isBLASType
(type t) param: bool¶ Return true if type is supported by BLAS
-
enum
Order
{ Row = 101: c_int, Col }¶ Define row or column order
-
enum
Op
{ N = 111: c_int, T, H }¶ Operation of matrix : none, transpose, or adjoint
-
enum
Uplo
{ Upper = 121: c_int, Lower }¶ Storage for symmetric matrices
-
enum
Diag
{ NonUnit = 131: c_int, Unit }¶ Assume a unit or non-unit diagonal
-
enum
Side
{ Left = 141: c_int, Right }¶ Operate on the left or right side
-
proc
gemm
(A: [?Adom], B: [?Bdom], C: [?Cdom], alpha, beta, opA: Op = Op.N, opB: Op = Op.N, order: Order = Order.Row)¶ Wrapper for the GEMM routines:
C := alpha * op(A) * op(B) + beta * C
-
proc
symm
(A: [?Adom], B: [?Bdom], C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, side: Side = Side.Left, order: Order = Order.Row)¶ Wrapper for the SYMM routines:
C := alpha * A * B + beta * C
or:
C := alpha * B * A + beta * C
where
A
is a symmetric matrix.
-
proc
hemm
(A: [?Adom], B: [?Bdom], C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, side: Side = Side.Left, order: Order = Order.Row)¶ Wrapper for the HEMM routines:
C := alpha * A * B + beta * C
or:
C := alpha * B * A + beta * C
where
A
is an hermitian matrix.
-
proc
syrk
(A: [?Adom], C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row)¶ Wrapper for the SYRK routines:
C := alpha * A * A**T + beta * C
or:
C := alpha * A**T * A + beta * C
where
C
is a symmetric matrix, andA**T
is the transpose ofA
.
-
proc
herk
(A: [?Adom], C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row)¶ Wrapper for the HERK routines:
C := alpha * A * A**H + beta * C
or:
C := alpha * A**H * A + beta * C
where
C
is an hermitian matrix, andA**H
is the conjugate transpose ofA
.
-
proc
syr2k
(A: [?Adom], B: [?Bdom], C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row)¶ Wrapper for the SYR2K routines:
C := alpha * A * B**T + alpha * B * A**T + beta * C
or:
C := alpha * A**T * B + alpha * B**T * A + beta * C
where
C
is a symmetric matrix, andA**T
andB**T
are the transpose ofA
andB
, respectively.
-
proc
her2k
(A: [?Adom], B: [?Bdom], C: [?Cdom], alpha, beta, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, order: Order = Order.Row)¶ Wrapper for the HER2K routines:
C := alpha * A * B**H + conjg(alpha) * B * A**H + beta * C
or:
C := alpha * A**H * B + conjg(alpha) * B**H * A + beta * C
where
C
is an hermitian matrix,B**H
is the conjugate transpose ofB
,conjg(alpha)
is the complex conjugate ofalpha
.
-
proc
trmm
(A: [?Adom] ?eltType, B: [?Bdom] eltType, alpha, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, side: Side = Side.Left, diag: Diag = Diag.NonUnit, order: Order = Order.Row)¶ Wrapper for the TRMM routines:
B := alpha * op(A) * B
or:
B := alpha * B * op(A)
where
A
is a triangular matrix.
-
proc
trsm
(A: [?Adom], B: [?Bdom], alpha, uplo: Uplo = Uplo.Upper, trans: Op = Op.N, side: Side = Side.Left, diag: Diag = Diag.NonUnit, order: Order = Order.Row)¶ Wrapper for the TRSM routines:
op(A) * X = alpha * B
or:
X * op(A) = alpha * B
where
A
is a triangular matrix.
-
proc
gemv
(A: [?Adom] ?eltType, x: [?xdom] eltType, y: [?ydom] eltType, alpha, beta, opA: Op = Op.N, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1)¶ Wrapper for the GEMV routines:
y := alpha*op(A)*x + beta*y,
where
A
is anm``x``n
matrix.
-
proc
ger
(A: [?Adom] ?eltType, X: [?Xdom] eltType, Y: [?Ydom] eltType, alpha, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1)¶ Wrapper for GER routines:
A := alpha*x*y'+ A
-
proc
gerc
(A: [?Adom] ?eltType, X: [?Xdom] eltType, Y: [?Ydom] eltType, ref alpha: eltType, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1)¶ Wrapper for GERC routines:
A := alpha*x*conjg(y') + A
-
proc
geru
(A: [?Adom] ?eltType, X: [?Xdom] eltType, Y: [?Ydom] eltType, ref alpha: eltType, order: Order = Order.Row, incx: c_int = 1, incy: c_int = 1)¶ Wrapper for the GERU routines:
A := alpha*x*y' + A
-
proc
hemv
(A: [?Adom] ?eltType, X: [?vDom] eltType, Y: [vDom] eltType, ref alpha: eltType, ref beta: eltType, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1)¶ Wrapper for the HEMV routines:
y := alpha*A*x + beta*y
-
proc
her
(A: [?Adom] ?eltType, X: [?vDom] eltType, alpha, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1)¶ Wrapper for the HER routines:
A := alpha*x*conjg(x') + A
-
proc
her2
(A: [?Adom] ?eltType, X: [?vDom] eltType, Y: [vDom] eltType, ref alpha: eltType, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1)¶ Wrapper for HER2 routines:
A := alpha *x*conjg(y') + conjg(alpha)*y *conjg(x') + A
-
proc
symv
(A: [?Adom] ?eltType, X: [?vDom] eltType, Y: [vDom] eltType, alpha, beta, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1)¶ Wrapper for the SYMV routines:
y := alpha*A*x + beta*y
-
proc
syr
(A: [?Adom] ?eltType, X: [?vDom] eltType, alpha, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1)¶ Wrapper for SYR routines:
A := alpha*x*x' + A
-
proc
syr2
(A: [?Adom] ?eltType, X: [?vDom] eltType, Y: [vDom] eltType, alpha, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, incx: c_int = 1, incy: c_int = 1)¶ Wrapper for SYR2 routines:
A := alpha*x*y'+ alpha*y*x' + A
-
proc
trmv
(A: [?Adom] ?eltType, X: [?vDom] eltType, trans: Op = Op.N, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, diag: Diag = Diag.NonUnit, incx: c_int = 1)¶ Wrapper for TRMV routines:
x := op(A)*x
-
proc
trsv
(A: [?Adom] ?eltType, B: [?vDom] eltType, trans: Op = Op.N, order: Order = Order.Row, uplo: Uplo = Uplo.Upper, diag: Diag = Diag.NonUnit, incx: c_int = 1)¶ Wrapper for the TRSV routines:
A*op(x) = b
-
proc
rotg
(ref a: ?eltType, ref b: eltType, ref c: eltType, ref s: eltType)¶ Wrapper for the ROTG routines
Construct Givens plane rotation of point
p
defined by Cartesian coordinates(a, b)
:| c s | |a| |r| | | . | | = | | |-s c | |b| |0|
Input:
a
: x-coordinate of pointp
b
: y-coordinate of pointp
Output:
a
: stores length vector (r
) of inputs(a, b)
b
: storesz
parameter that is defined belowc
: stores value ofc
element defined aboves
: stores value ofs
element defined above
The
z
parameter (stored inb
) is defined such that:if |a| > |b| then z = s; else if c != 0 then z = 1/c; else z = 1
-
proc
rotmg
(ref d1: ?eltType, ref d2: eltType, ref b1: eltType, b2: eltType, P: [] eltType)¶ Wrapper for the ROTMG routines
Generate Givens rotation of points:
|b1| |b1*sqrt(d1)| | | = H.| | |0 | |b2*sqrt(d2)|
Input:
d1
: Scaling factor forb1
(x-axis)d2
: Scaling factor forb2
(y-axis)b1
: x-coordinate of input vectorb2
: y-coordinate of input vector
Output:
d1
: Provides the first element the diagonal matrixd2
: Provides the second element the diagonal matrixb1
: Provides theb1
rotated (rotated x coordination) of the vectorP
: Parameter array of 5 elements, detailed below:|P[1] P[3]| if P[0] == -1 then H = | | |P[2] P[4]| |1.0 P[3]| if P[0] == 0 then H = | | |P[2] 1.0| |P[1] 1.0| if P[0] == 1 then H = | | |-1.0 P[4]| |1 0| if P[0] == -2 then H = | | |0 1|
-
proc
rot
(X: [?D] ?eltType, Y: [D] eltType, c: eltType, s: eltType, incY: c_int = 1, incX: c_int = 1)¶ Wrapper for the ROT routines
Replaces the value elements of two vectors
X
andY
using the equation:X[i] = c*X[i] + s*X[i] Y[i] = c*Y[i] - s*X[i]
Input:
X
: Input vectorY
: Input vectorc
: Scalar parameters
: Scalar parameterincY
: Defines the increment for the vectorY
incX
: Defines the increment for the vectorX
Output:
X
: Vector with updated elementsY
: Vector with updated elements
-
proc
rotm
(X: [?D] ?eltType, Y: [D] eltType, P: [] eltType, incY: c_int = 1, incX: c_int = 1)¶ Wrapper for the ROTM routines
Executes the modified Givens rotations with element wise substitution:
|X[i]| |X[i]| | | = H.| | |Y[i]| |Y[i]|
Input:
X
: Input vectorY
: Input vectorincY
: Defines the increment for the vectorY
incX
: Defines the increment for the vectorX
P
: Parameter array of 5 elements, detailed below:|P[1] P[3]| if P[0] == -1 then H = | | |P[2] P[4]| |1.0 P[3]| if P[0] == 0 then H = | | |P[2] 1.0| |P[1] 1.0| if P[0] == 1 then H = | | |-1.0 P[4]| |1 0| if P[0] == -2 then H = | | |0 1|
Output:
X
: Vector with updated elementsY
: Vector with updated elements
-
proc
swap
(X: [?D] ?eltType, Y: [D] eltType, incY: c_int = 1, incX: c_int = 1)¶ Wrapper for the SWAP routines
Exchanges elements of two vectors.
Input:
X
: Input vectorY
: Input vectorincY
: Defines the increment for the vectorY
incX
: Defines the increment for the vectorX
Output:
X
: Contains inputY
elementsY
: Contains inputX
elements
-
proc
scal
(X: [?D] ?eltType, ref alpha: eltType, incX: c_int = 1)¶ Wrapper for the SCAL routines
Calculates the product of a vector
X
with scalar alpha:X' = alpha*X
Input:
X
: Input vectoralpha
: Scalar parameter
Output:
X
: Vector updated by the equation:X[i] = alpha*X[i]
-
proc
copy
(X: [?D] ?eltType, Y: [D] eltType, incY: c_int = 1, incX: c_int = 1)¶ Wrapper for the COPY routines
Copies one vector (
X
the source) to another (Y
the destination):Y = X
Input:
X
: Input vectorY
: Input vectorincX
: Defines the increment for the vectorX
incY
: Defines the increment for the vectorY
Output:
Y
: Contains the values copied fromX
vector
-
proc
axpy
(X: [?D] ?eltType, Y: [D] eltType, ref alpha: eltType, incY: c_int = 1, incX: c_int = 1)¶ Wrapper for the AXPY routines
Computes the vector-scalar product of apha and
X
and adds the result toY
:Y = alpha*X + Y
Input:
X
: Input vectoralpha
: Scalar parameterincX
: Defines the increment for the vectorX
incY
: Defines the increment for the vectorY
Output:
Y
: Vector updated by the equation:Y[i] = alpha*X[i]+Y[i]
-
proc
dot
(X: [?xD] ?eltType, Y: [?yD] eltType, incY: c_int = 1, incX: c_int = 1): eltType¶ Wrapper for DOT routines
Returns the dot product of two vectors:
X*Y
Input:
X
: Input vectorY
: Input vectorincX
: Defines the increment for the vectorX
incY
: Defines the increment for the vectorY
Returns: Scalar value of dot product
-
proc
dotu
(X: [?D] ?eltType, Y: [D] eltType, incY: c_int = 1, incX: c_int = 1)¶ Wrapper for DOTU routines (
DOTU_SUB
)Obtains the dot product of two complex vectors:
X*Y
Input:
X
: Input vectorY
: Input vectorincX
: Defines the increment for the vectorX
incY
: Defines the increment for the vectorY
Returns: The complex dot product
-
proc
dotc
(X: [?D] ?eltType, Y: [D] eltType, incY: c_int = 1, incX: c_int = 1)¶ Wrapper for DOTC routines (
DOTC_SUB
)Obtains the dot product of conjugated X vector with Y vector:
conj(X)*Y
Input:
X
: Conjugated input vectorY
: Input vectorincX
: Defines the increment for the vectorX
incY
: Defines the increment for the vectorY
Returns: The complex dot product
-
proc
dsdot
(X: [?D] real(32), Y: [D] real(32), incY: c_int = 1, incX: c_int = 1): real(64)¶ Wrapper for SDOT routines (
DSDOT
variant)Returns the dot product of two
real(32)
vectors as areal(64)
, usingreal(64)
precision internally:X*Y
Input:
X
: Input vectorY
: Input vectorincX
: Defines the increment for the vectorX
incY
: Defines the increment for the vectorY
Returns: Scalar value of dot product
-
proc
sdsdot
(X: [?D] real(32), Y: [D] real(32), incY: c_int = 1, incX: c_int = 1): real(32)¶ Wrapper for SDOT routines (
SDSDOT
variant)Returns the dot product of two
real(32)
vectors as areal(32)
, usingreal(64)
precision internally:X*Y
Input:
X
: Input vectorY
: Input vectorincX
: Defines the increment for the vectorX
incY
: Defines the increment for the vectorY
Returns: Scalar value of dot product
-
proc
nrm2
(X: [?D] ?eltType, incX: c_int = 1)¶ Wrapper for NRM2 routines
Returns the Euclidean norm of vector
X
:||X||
Input:
X
: Input vectorincX
: Defines the increment for the vectorX
Returns: The 2-norm of X
vector
-
proc
asum
(X: [?D] ?eltType, incX: c_int = 1)¶ Wrapper for the ASUM routines
Returns the sum of the magnitude values of X elements:
|Re X[1]| + |Im X[1]| + |Re X[2]| + |Im X[2]|+ ... + |Re X[N]| + |Im X[N]|.
Input:
X
: Input vectorincX
: Defines the increment for the vectorX
Returns: The 1-norm of X
vector