Module ``index_map_type`` ========================= The elements of a Fortran array are accessed by an *index*, which is an integer in a contiguous range that usually starts at 1. To describe the distribution of array elements across one or more parallel processes, it suffices to describe how its index set is distributed across, or mapped to, processes; the type of the array data itself is irrelevant. As its name suggests, the ``index_map_type`` module defines the ``index_map`` derived type which describes the mapping of a contiguous index set to one or more parallel processes. The mapping allows for overlap between processes, and provides array data gather and scatter procedures that are associated with that overlap. The derived type also provides procedures for distributing and collating array data based on the index set, and procedures for localizing serial indirect indexing arrays. Concepts -------- A global index set is a 1-based contiguous range of integers, {1, 2, ..., N}. Given *P* processes, an index is assigned to a unique process according to a block partition of the index set: The first *N_1* indices are assigned to process 1, the next *N_2* indices to process 2, and so forth. The sum of the *block sizes* *N_1 + N_2 + ... + N_P = N*, and a block size of 0 is allowed. An index so assigned to a process is said to be *owned* by that process. In addition to these, a process may include indices owned by other processes. The owned indices are said to be *on-process*, and these additional indices, if any, are said to be *off-process*. It is important to note that a global index is owned by a *unique* process; this is in contrast to other schemes (e.g. Trilinos/Tpetra) where a global index may be owned by multiple processes. **Process-local index sets.** For the purposes of process-local computation, the collection of all global indices known to a process are mapped to a 1-based, contiguous, *local index set* as follows. The contiguous block of on-process global indices are assigned, in order, consecutive local index numbers starting at 1, and the off-process indices are assigned, in order, consecutive local index numbers immediately following these. The ``index_map`` Derived Type ------------------------------ An ``index_map`` type variable is a distributed parallel object. Except for the type-bound function ``global_index``, all type-bound procedures must be called collectively by all processes. There are two different implementations of the derived type: one using MPI and the other using Fortran coarrays (CAF). With just a couple exceptions, both have exactly the same interface. The distribute and collate procedures involve a single distinguished process called the *root* process. By default this is MPI rank 0 (or CAF image 1). Although there seems little reason to choose a different rank (or image), one can, by giving a different value for the optional ``root`` argument to the ``init`` procedure. The next sections describe the public components and type-bound procedures of the type. Initialization ^^^^^^^^^^^^^^ A variable .. code-block:: fortran type(index_map) :: imap must be initialized before being used by calling its ``init`` method. There are several possible versions. All are collective procedures that must be called by all processes. Most ``init`` versions take an MPI communicator ``comm`` as an optional argument. If not specified, ``MPI_COMM_WORLD`` is used. This communicator is used in all internal MPI calls. The ``comm`` argument is omitted in the CAF implementation. 1. ``call imap%init(bsize [,comm] [,root])`` Each process gives its own block size using the scalar integer ``bsize``. A block size of 0 is allowed. The resulting ``imap`` includes no off-process indices. These can be added later using the ``add_offp_index`` method. 2. ``call imap%init(bsizes [,comm] [,root])`` The root process gives the block sizes for all processes using the rank-1 integer array ``bsizes`` whose size equals the number of processes. A block size of 0 is allowed. ``bsize`` is only accessed on the root process; other processes may pass a 0-sized array. The resulting ``imap`` includes no off-process indices. These can be added later using the ``add_offp_index`` method. 3. ``call imap%init(bsize, offp_index [,comm] [,root])`` In this extension of version 1, each process also gives a list of its off-process global indices using the rank-1 integer array ``offp_index``. This will be a 0-sized array if the process has no off-process indices. The global indices given in ``offp_index`` must belong to the global index set as defined by the collective values of ``bsize``, and must not be on-process indices for the process. 4. ``call imap%init(bsizes, offp_counts, offp_indices [,comm] [,root])`` In this extension of version 2, the root process also gives lists of off-process global indices to include for all process using the rank-1 integer arrays ``offp_counts`` and ``offp_indices``. The size of ``offp_counts`` must equal the number of processes. ``offp_counts(n)`` is the number of off-process indices for the nth process; a value of 0 is allowed. The off-process indices are stored in the ``off-process-indices`` array in order of process, so that its size equals the sum of the elements of ``offp_counts``. The global off-process indices are subject to the same requirements as for version 3. The subroutine arguments are only accessed on the root process; other processes may pass 0-sized arrays. 5. ``call imap%init(domain, g_count)`` This is a specialized version associated with ragged indexing arrays. ``domain`` is an ``index_map`` type variable and ``g_count`` is a rank-1 integer array. Both are ``intent(in)``. ``g_count`` is only referenced on the root process of ``domain``, where its size must be at least ``domain%global_size``; other processes can pass a 0-sized array. The global index set for ``imap`` will be derived from the ``g_count`` array: starting with 1, the first consecutive ``g_count(1)`` integers derive from ``g_count(1)``, the next consecutive ``g_count(2)`` integers derive from ``g_count(2)``, and so forth. The size of this global index set is thus the sum of the first ``domain%global_size`` elements of ``g_count``. ``imap`` is initialized with the same MPI communicator as ``domain`` and with the sum of the on-process elements of ``g_count`` (were it distributed according to ``domain``) as the block size for a process. If ``domain`` includes off-process indices, these give rise to off-process indices in ``imap``: the global indices that derive from off-process ``g_count`` elements on this process are added as off-process indices to ``imap``. Data Components ^^^^^^^^^^^^^^^ An initialized ``index_map`` variable has the following public data components. These are **read-only** and must **never** be modified by client code. It will be possible to have the compiler enforce this in the future by use of the ``protected`` attribute being introduced in the upcoming F202X standard. * ``%onp_size``: the number of on-process indices for this process * ``%offp_size``: the number of off-process indices for this process * ``%local_size``: the size of the local index set for this process. This is the sum of ``%onp_size`` and ``%offp_size`` * ``%global_size``: The size of the global index set. * ``%first_gid``: the global index of the first on-process index for this process * ``%last_gid``: the global index of the last on-process index for this process * ``%comm``: the communicator associated with ``imap`` in the MPI implementation * ``%offp_index``: the rank-1 integer array of off-process global indices included on this process Miscellaneous Procedures ^^^^^^^^^^^^^^^^^^^^^^^^ .. code-block:: fortran type(index_map) :: imap imap%global_index(n) Returns the global index corresponding to local index ``n`` with respect to ``imap``. This function is not collective and can be called from any single process. This is also an elemental function. .. code-block:: fortran type(index_map) :: imap call imap%add_offp_index(offp_index) ``offp_index`` is a rank-1 array of global off-process indices with respect to ``imap`` to include on this process. This may be a 0-sized array. It is an error if the given indices are not contained in the ``imap`` global index set, or if any of the indices are on-process with respect to ``imap``. This is a collective procedure. It is not currently possible to add new off-process indices to an ``index_map`` object that already includes some. Thus it is an error to call this procedure if ``imap`` already includes off-process indices. Distribute and Collate Subroutines ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ These type-bound generic subroutines distribute data from the root process to all processes and the reverse operation of collating data from all processes onto the root process. These are collective procedures that must be called from all processes. Both arguments must have the same type, kind parameters, and rank. Currently there are specific procedures for ``real`` and ``complex`` arguments of ``real32`` and ``real64`` kinds, ``integer`` arguments of ``int32`` and ``int64`` kinds, and default ``logical`` arguments, and array ranks up to 3. .. code-block:: fortran type(index_map) :: imap call imap%distribute(src, dest) This distributes the elements of a global array ``src`` on the ``imap`` root process to the local array ``dest`` on all processes according to ``imap``. Both arguments must have the same rank, and must have the same extent in all but the last dimension. In the multi-rank case ``imap`` describes the indexing of the last dimension, and the "elements" of ``src`` being distributed to corresponding "elements" of ``dest`` are rank-(*n-1*) sections ``src(:,...,j)`` for global ``imap`` indices ``j``. ``src`` is only referenced on the ``imap`` root process, where its extent in the last dimension must be at least ``imap%global_size``; other processes can pass a 0-sized array. The extent of ``dest`` in its last dimension must be at least ``imap%onp_size``. It is ``intent(inout)`` so that the value of any trailing elements is unchanged. This subroutine wraps ``MPI_Scatterv``. If ``imap`` includes off-process indices it often desired that the distributed array ``dest`` also include elements for these indices. To accomplish this the ``dest`` array needs to be properly sized (extent in its last dimension at least ``imap%local_size``) and then simply make a subsequent call to gather these values: ``call imap%gather_offp(dest)``. .. code-block:: fortran type(index_map) :: imap call imap%collate(src, dest) This is the reverse of ``distribute``. This collates the on-process elements of the local ``src`` array on each process into the global array ``dest`` on the ``imap`` root process according to ``imap``. Both arguments must have the same rank, and must have the same extent in all but the last dimension. In the multi-rank case ``imap`` describes the indexing of the last dimension, and the "elements" of ``src`` that are being collated into corresponding "elements" of ``dest`` are rank-(*n-1*) sections ``src(:,...,j)`` for global ``imap`` indices ``j``. The extent of ``src`` in its last dimension must be at least ``imap%onp_size``; any trailing elements are ignored. The ``dest`` array is only referenced on the ``imap`` root process, where its extent in the last dimension must be at least ``imap%global_size``; other processes can pass a 0-sized array. It is ``intent(inout)`` so that the value of any trailing elements is unchanged. This subroutine wraps ``MPI_Gatherv``. Off-Process Gather Subroutines ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Each global index is owned by exactly one process (where it is on-process) but may exist on other processes as an off-process index. Applied to an array distributed according to an ``index_map`` object, the generic ``gather_offp`` subroutines replace the value of each off-process element with the value of its corresponding on-process element. These are collective procedures that must be called from all process. The arguments must have the same rank, type, and kind on all processes. Currently there are specific procedures for rank-1 arrays only. For ``scatter_offp_sum`` these support ``real`` and ``complex`` kinds ``real32`` and ``real64``, and ``integer`` kinds ``int32`` and ``int64``. For ``scatter_offp_min`` and ``scatter_offp_max`` they support ``real`` kinds ``real32`` and ``real64``, and ``integer`` kinds ``int32`` and ``int64``. For ``scatter_offp_or`` and ``scatter_offp_and`` they support default ``logical``. There are two forms of the generic ``gather_offp``. The most-used form is .. code-block:: fortran type(index_map) :: imap call imap%gather_offp(local_data) ``local_data`` is a rank-*n* array distributed according to ``imap``. Its extent in all but the last dimension, if any, must be the same on all processes, and its extent in the last dimension must be at least ``imap%local_size``. In the multi-rank case ``imap`` describes the indexing of the last dimension of ``local_data``, and the "elements" that are being gathered are rank-(*n-1*) sections ``local_data(:,...,j)`` for local ``imap`` indices ``j``. The second form simply splits the on-process and off-process array elements into separate array arguments: .. code-block:: fortran type(index_map) :: imap call imap%gather_offp(onp_data, offp_data) This is useful in less-common cases where the on and off-process elements are not stored contiguously in memory. In this form ``onp_data`` is ``intent(in)`` with extent in the last dimension at least ``imap%onp_size``, while ``offp_data`` is ``intent(inout)``, with extent in the last dimension at least ``imap%offp_size``. Off-Process Scatter-Reduce Subroutines ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ These are in some sense the reverse of ``gather_offp``. However since an on-process global index may exist on multiple other processes as an off-process index, the operation of copying an off-process array element to its corresponding on-process array element is an ill-defined operation. Instead, the off-process elements are reduced with the on-process element using an associative, commutative binary operation. These are collective procedures that must be called from all process. The arguments must have the same rank, type, and kind on all processes. Currently there are specific procedures for rank-1, 2, and 3 arrays of types: ``real`` and ``complex`` kinds ``real32`` and ``real64``; ``integer`` kinds ``int32`` and ``int64``; and default ``logical``. There are two forms of the generic ``scatter_offp_`` subroutines. The most-used form is .. code-block:: fortran type(index_map) :: imap call imap%scatter_offp_(local_data) where ```` may be ``sum``, ``min``, or ``max`` when ``local_data`` is of integer, real, or complex type, and ``or`` or ``and`` when the array is of logical type. ``local_data`` is a rank-*n* array distributed according to ``imap``. Its extent in all but the last dimension, if any, must be the same on all processes, and its extent in the last dimension must be at least ``imap%local_size``. In the multi-rank case ``imap`` describes the indexing of the last dimension of ``local_data``, and the "elements" that are being scattered are rank-(*n-1*) sections ``local_data(:,...,j)`` for local ``imap`` indices ``j``. In this case the reduction operation is applied to corresponding section elements. The second form simply splits the on-process and off-process array elements into separate array arguments: .. code-block:: fortran type(index_map) :: imap call imap%scatter_offp_(onp_data, offp_data) This is useful in less-common cases where the on and off-process elements are not stored contiguously in memory. In this form ``onp_data`` is ``intent(inout)`` with extent in the last dimension at least ``imap%onp_size``, while ``offp_data`` is ``intent(in)``, with extent in the last dimension at least ``imap%offp_size``. Localization of Indirect Indexing Arrays ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Indirect indexing arrays are a necessary feature of any algorithm that deals with unstructured meshes. These are integer arrays that map one index set into another; for example, a rank-2 array that gives the node indices that are the vertices of each cell. When these refer to global indices and perhaps also exist only as a global indexing array on one process, as is usually the case, These need to "localized" for doing process-local computation on distributed data; that is, they need to be distributed, if necessary, according to a distributed mapping of their domain index set, and have their values converted to local indices according to a distributed mapping of their range index set. This procedure results in process-local indirect indexing arrays. It is usually the case that a distributed indirect indexing array will reference global indices that are not local to the process. In order to obtain closure for the local indexing array, these off-process references need to be included in the local index set. This is one of the key aspects of the localization procedure, and one of the main methods by which off-process indices are added to ``index_map`` objects. The generic ``localize_index_array`` subroutine has the following forms: .. code-block:: fortran type(index_map) :: range call range%localize_index_array(index [,stat]) ``index`` is a distributed indirect indexing array that takes values in the ``range`` global index set. The array may be rank-1 or rank-2. Each process "localizes" its ``intent(inout)`` ``index`` array by replacing its global index values with their corresponding local index values with respect to ``range``. When off-process global index values are encountered, which have no automatic corresponding local index as do on-process global indices, they are added as additional off-process indices to ``range`` if not already included as such. Thus ``range`` may be modified by this subroutine. An ``index`` value of 0 is treated specially and not modified by this subroutine. Otherwise all ``index`` values must belong to the interval ``[1, range%global_size]``. It is currently not possible to add new off-process indices to an ``index_map`` object that already includes some. Thus if ``range`` includes off-process indices before calling this subroutine and it is found that additional off-process indices need to be added in order to satisfy the values of ``index``, this will result in an error. If the optional integer ``stat`` argument is present, it will return 0 on success and a nonzero value for this unsupported case; otherwise the program will abort with an error message. .. code-block:: fortran type(index_map) :: domain call domain%localize_index_array(g_index, range, l_index [,stat]) ``range`` is an ``index_map`` type object and ``g_index`` is a serial indirect indexing array indexed by global indices from the ``domain`` global index set and taking values in the ``range`` global index set. The array may be rank-1 or rank-2. In the latter case it is the last dimension that is indexed by ``domain`` and the section ``g_index(:,j)`` is an array of ``range`` global indices associated with ``domain`` global index ``j``. ``g_index`` is only referenced on the root process of ``domain``, where its extent in the last dimension must be at least ``domain%global_size`` (any additional elements are ignored); on other processes a 0-sized array can be passed. ``l_index`` is an allocatable ``intent(out)`` array. Its rank must be the same as ``g_index``. In the first step each process allocates ``l_index`` to its proper shape: the same extents as ``g_index`` in all but the last dimension, and extent ``domain%local_size`` in the last dimension. ``g_index`` is then distributed to ``l_index`` according to ``domain``, including any off-process elements that ``domain`` may include. The last step is to localize ``l_array`` by calling the previous form of ``localize_index_array``. As a result, this subroutine may modify ``range``, and the same limitation on pre-existing off-process indices noted for the previous form applies here. NB: It is possible for an indirect indexing array to have the same global index set for both its domain and range. In such a case ``domain`` and ``range`` are aliased arguments, one ``intent(in)`` and the other ``intent(inout)``. This should not be a source of error, however, because in the first step ``range`` is not referenced, and in the last step ``domain`` is not referenced. .. code-block:: fortran type(index_map) :: domain call domain%localize_index_array(g_count, g_index, range, l_count, l_index [,stat]) This version handles the case of a ragged indirect indexing array that is represented by a pair of rank-1 integer arrays. In the pair (``g_count``, ``g_index``) ``g_count`` is indexed by the ``domain`` global index set and ``g_index`` takes values in the ``range`` global index set. The first ``g_count(1)`` elements of ``g_index`` are associated with ``domain`` global index 1, the next ``g_count(2)`` elements with global index 2, and so forth. If all values of ``g_count`` were equal to ``n``, say, then the indexing array could be more easily represented by a single rank-2 indexing array whose first dimension had extent equal to ``n``. A similar description hold for the output localized pair (``l_count``, ``l_index``) which is indexed by the ``domain`` local index set and takes values in the ``range`` local index set. Apart from the difference in representation of the indirect indexing arrays, this subroutine operates in exactly the same way as the preceding form.