.. default-domain:: chpl

.. _primers-FFTWlib:

FFTW
====

`View FFTWlib.chpl on GitHub <https://github.com/chapel-lang/chapel/blob/main/test/release/examples/primers/FFTWlib.chpl>`_

Example usage of the :mod:`FFTW` module in Chapel. This particular file
demonstrates the single-threaded version of the code. In order to initialize
FFTW for multithreaded support, see the :mod:`FFTW` documentation.

If FFTW is in your standard include/library paths, compile this code
using:

.. code-block:: text

  chpl testFFTW.chpl

Otherwise, use the following (where ``$FFTW_DIR`` points to your
FFTW installation):

.. code-block:: text

  chpl testFFTW.chpl -I$FFTW_DIR/include -L$FFTW_DIR/lib

The :mod:`FFTW` module uses the FFTW3 API and currently just implements the
basic, double-precision interface.  We will assume that the reader
is familiar with using FFTW; more details are at http://www.fftw.org.

The code computes a series of 1D, 2D and 3D transforms, exercising
the complex<->complex and real<->complex transforms (both in- and
out-of-place). The output of the code should be a series of small
numbers ( ``<= 10^-13`` ); see ``testFFTW.good`` for example values, though
it is possible that your values may differ in practice.

The input data for these tests is in ``arr{1,2,3}d.dat``. The format of
these files is documented below.

.. code-block:: chapel

    use FFTW;

Config to print out the error values.  Note that values may differ between
platforms, so turn off for testing.

.. code-block:: chapel

    config const printErrors = true;

If we don't print out error values, we'll print a success/failure
message.  This is the epsilon value against which success should
be measured.

.. code-block:: chapel

    config const epsilon = 10e-13;

Performs the tests and cleanup.

.. code-block:: chapel

    proc main() {
      testAllDims();
      cleanup();
    }

A helper function that invokes the test for each rank.

.. code-block:: chapel

    proc testAllDims() {
      for param d in 1..3 {
        writeln(d, "D");
        runtest(d, "arr"+d:string+"d.dat");
      }
    }

This is the main test code, parametrized by the number of dimensions,
``ndim``.  ``fn`` is the filename of the file that contains the
test data.

.. code-block:: chapel

    proc runtest(param ndim : int, fn : string) {
      var dims : ndim*int(32);

We define a number of different domains below, corresponding to
complex input/output arrays, real (input)->complex (output)
out-of-place arrays, and real<->complex in-place arrays.

The domains are as follows:

  * ``D``: for complex<->complex transforms. This domain is also used
    for the real array in a ``real->complex`` out-of-place transform.

  * ``cD`` : for the complex array in a real<->complex
    out-of-place transform.

  * ``rD`` : for the real array in a real<->complex in-place transform.
    This includes the padding needed by the in-place transform.
    ``D`` is a sub-domain of this, and can be used to extract the real
    array, without padding.

  * ``reD, imD`` : Utility domains that access the real/complex parts
    of the complex array in a real<->complex in-place transform.

.. code-block:: chapel

      var D : domain(ndim);
      var rD,cD,reD,imD : domain(ndim,int,strideKind.any);

Read in the arrays from the file below. ``A`` and ``B`` are the
arrays that will be used in the FFTW calls, while ``goodA`` and
``goodB`` store the true values.

Note that ``goodA`` is all real, allowing us to reuse the same
array for the real<->complex tests.

The file format is as follows (all little-endian):

  - ``[ndim] int32`` : dimensions of the array

  - ``[narr] real64`` : the real components of ``A`` (imaginary components
    are all zero). ``narr = N_1*N_2*..*N_ndim`` (product of dimensions)

  - ``[narr] complex128`` : the components of ``B``
    (stored as real-imaginary pairs).

  The arrays are all stored in row-major order, as in C.

.. code-block:: chapel

      var A,B,goodA,goodB : [D] complex(128);
      {
        use IO;
        var f = open(fn,ioMode.r).reader(deserializer=new binaryDeserializer(endianness.little));
    
        // Read in dimensions
        for d in dims {
          f.read(d);
        }

Set the domain ``D``, handling the different rank cases.

.. code-block:: chapel

        select ndim {
          when 1 do D = 0.. #dims(0);
          when 2 do D = {0.. #dims(0), 0.. #dims(1)};
          when 3 do D = {0.. #dims(0), 0.. #dims(1), 0.. #dims(2)};
        }
    
        // Read in the arrays
        for val in goodA {
          f.read(val.re);
          val.im = 0;
        }
        for val in goodB {
          f.read(val);
        }
        f.close();
        writeln("Data read...");
      }

Now set the remaining domains.

Refer to the FFTW documentation on the storage order for the
in-place transforms (Sec 2.4 and 4.3.4).

.. code-block:: chapel

      select ndim {
        when 1 {
          var ldim = dims(0)/2 + 1;
          //
          // Domains for real FFT
          //
          rD = 0.. #(2*ldim); // Padding to do in-place transforms
          cD = 0.. #ldim;
          //
          // Define domains to extract the real and imaginary parts for
          // in-place transforms
          //
          reD = rD[0..(2*ldim-1) by 2];
          imD = rD[1..(2*ldim-1) by 2];
        }
        when 2 {
          //
          // Domains for real FFT
          //
          var ldim = dims(1)/2+1;
          rD = {0.. #dims(0),0.. #(2*ldim)}; // Padding to do in-place transforms
          cD = {0.. #dims(0),0.. #ldim};
          //
          // Define domains to extract the real and imaginary parts for
          // in-place transforms
          //
          reD = rD[..,0..(2*ldim-1) by 2];
          imD = rD[..,1..(2*ldim-1) by 2];
        }
        when 3 {
          //
          // Domains for real FFT
          //
          var ldim = dims(2)/2+1;
          rD = {0.. #dims(0),0.. #dims(1),0.. #(2*ldim)}; // Padding to do in-place transforms
          cD = {0.. #dims(0),0.. #dims(1),0.. #ldim};
          //
          // Define domains to extract the real and imaginary parts for
          // in-place transforms
          //
          reD = rD[..,..,0..(2*ldim-1) by 2];
          imD = rD[..,..,1..(2*ldim-1) by 2];
        }
      }

FFTW does not normalize inverse transforms, so just compute the
normalization constant.

.. code-block:: chapel

      var norm = * reduce dims;

We start the FFT tests below. The structure is the same :

  - Define plans for forward and reverse transforms.
  - Execute forward transform ``A -> B``.
  - Compare with ``goodB``.
  - Execute reverse transform ``B -> A`` and normalize.
  - Compare with ``goodA``.
  - Cleanup plans.


.. primers-fftw-complex-complex-out

complex<->complex out-of-place transform
----------------------------------------

Unlike the basic FFTW interface, we do not have specific 1D/2D/3D
planner routines.  For the complex <-> complex case, the
dimensions of the array are inferred automatically.

.. code-block:: chapel

      var fwd = plan_dft(A, B, FFTW_FORWARD, FFTW_ESTIMATE);
      var rev = plan_dft(B, A, FFTW_BACKWARD, FFTW_ESTIMATE);
    
      // Test forward and reverse transform
      A = goodA;
      execute(fwd);
      printcmp(B,goodB);
      execute(rev);
      A /= norm;
      printcmp(A,goodA);
      destroy_plan(fwd);
      destroy_plan(rev);

.. primers-fftw-complex-complex-in

complex <-> complex in-place transform
--------------------------------------

This is the same calling sequence as above, but using the
in-place versions of the routine.

.. code-block:: chapel

      fwd = plan_dft(A, FFTW_FORWARD, FFTW_ESTIMATE);
      rev = plan_dft(A, FFTW_BACKWARD, FFTW_ESTIMATE);
    
      // Test forward and reverse transform
      A = goodA;
      execute(fwd);
      printcmp(A,goodB);
      execute(rev);
      A /= norm; // FFTW does an unnormalized transform
      printcmp(A,goodA);
      destroy_plan(fwd);
      destroy_plan(rev);

.. primers-fftw-real-complex-out

real <-> complex out-of-place transform
---------------------------------------

As with FFTW, these use ``r2c`` and ``c2r`` suffixes to define the
direction of the transform.

``plan_dft_r2c`` and ``plan_dft_c2r`` are overloaded; for the
out-of-place transforms, they infer the dimensions from the
sizes of the arrays passed in.

.. code-block:: chapel

      var rA : [D] real(64); // No padding for an out-of-place transform
      var cB : [cD] complex(128);
      fwd = plan_dft_r2c(rA,cB,FFTW_ESTIMATE);
      rev = plan_dft_c2r(cB,rA,FFTW_ESTIMATE);
      rA[D] = goodA.re;
      execute(fwd);
      printcmp(cB,goodB[cD]);
      execute(rev);
      rA /= norm;
      printcmp(rA[D],goodA.re);
      destroy_plan(fwd);
      destroy_plan(rev);

.. primers-fftw-real-complex-in

real <-> complex in-place transform
-----------------------------------

In this case, the first argument to the planning routines is the
domain of the *real* array WITHOUT padding (in both ``r2c`` and
``c2r`` cases). This breaks the ambiguity of whether the leading
dimension of the real array is even or odd.

This design decision was motivated by the fact that the user has
likely already defined a domain to extract the unpadded real
array from the fully padded array (or that such a domain is
intrinsically useful).

For both the ``r2c`` and ``c2r`` transforms, a real array is passed in.

.. code-block:: chapel

      var rA2 : [rD] real(64);
      fwd = plan_dft_r2c(D,rA2,FFTW_ESTIMATE);
      rev = plan_dft_c2r(D,rA2,FFTW_ESTIMATE);
      rA2[D] = goodA.re;
      execute(fwd);
      printcmp(rA2[reD],goodB[cD].re); // Check the real and complex parts separately.
      printcmp(rA2[imD],goodB[cD].im);
      execute(rev);
      rA2 /= norm;
      printcmp(rA2[D],goodA.re);
      destroy_plan(fwd);
      destroy_plan(rev);

This is another real <-> complex in-place transform, except we pass
in a complex array instead of a real array. This can get a little
ugly, so we just reverse engineer the previous case.

Note that we reuse the ``rA2`` and ``cB`` arrays, since they're the
correct sizes.

.. code-block:: chapel

      fwd = plan_dft_r2c(D,cB,FFTW_ESTIMATE);
      rev = plan_dft_c2r(D,cB,FFTW_ESTIMATE);
    
      // Zero out rA2 to ensure that anything extraneous in the padding
      // doesn't get passed in.
      rA2 = 0.0;
      rA2[D] = goodA.re;
      cB.re = rA2[reD]; // Fill the complex array
      cB.im = rA2[imD];
      execute(fwd);
      printcmp(cB.re,goodB[cD].re); // Check the real and complex parts separately.
      printcmp(cB.im,goodB[cD].im);
      execute(rev);
      cB /= norm;
    
      // Pull everything back out to the real array for simplicity
      rA2 = 0.0;
      rA2[reD] = cB.re;
      rA2[imD] = cB.im;
      printcmp(rA2[D],goodA.re);
      destroy_plan(fwd);
      destroy_plan(rev);
    }

Utility function to print the maximum absolute deviation between
values computed by this code, and "truth". The true values are
computed using Mathematica v10.

.. code-block:: chapel

    proc printcmp(x, y) {
      var err = max reduce abs(x-y);
      if (printErrors) then
        writeln(err);
      else {
        if err < epsilon then
          writeln("SUCCESS: error below threshold");
        else
          writeln("FAILURE: error (", err, ") exceeds epsilon (", epsilon, ")");
      }
    }