Module index_map_type
The elements of a Fortran array are accessed by an index, which is an
integer in a contiguous range that usually starts at 1. To describe the
distribution of array elements across one or more parallel processes, it
suffices to describe how its index set is distributed across, or mapped to,
processes; the type of the array data itself is irrelevant. As its name
suggests, the index_map_type module defines the index_map derived
type which describes the mapping of a contiguous index set to one or more
parallel processes. The mapping allows for overlap between processes, and
provides array data gather and scatter procedures that are associated with
that overlap. The derived type also provides procedures for distributing
and collating array data based on the index set, and procedures for
localizing serial indirect indexing arrays.
Concepts
A global index set is a 1-based contiguous range of integers, {1, 2, …, N}. Given P processes, each index is assigned to a unique process according to a block partition of the index set: the first N_1 indices are assigned to process 1, the next N_2 indices to process 2, and so forth. The block sizes sum to the total, N_1 + N_2 + … + N_P = N, and a block size of 0 is allowed. An index assigned to a process in this way is said to be owned by that process. In addition to its owned indices, a process may include indices owned by other processes. The owned indices are said to be on-process, and these additional indices, if any, are said to be off-process. Note that a global index is owned by a unique process; this is in contrast to other schemes (e.g., Trilinos/Tpetra) where a global index may be owned by multiple processes.
Process-local index sets.
For the purposes of process-local computation, the collection of all global indices known to a process is mapped to a 1-based, contiguous, local index set as follows. The on-process global indices are assigned, in order, consecutive local index numbers starting at 1, and the off-process indices are assigned, in order, the consecutive local index numbers immediately following these.
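The block partition and local numbering just described can be sketched with a small Python model. This is purely illustrative: the function names `partition` and `global_index` and the list-based representation are inventions of this sketch, though `global_index` mirrors the semantics of the type-bound function of the same name described later.

```python
def partition(bsizes):
    """Return (first_gid, last_gid) for each process, given the block
    sizes of the global index partition."""
    firsts, lasts, start = [], [], 1
    for n in bsizes:
        firsts.append(start)
        lasts.append(start + n - 1)  # a 0-sized block gives last < first
        start += n
    return firsts, lasts

def global_index(first_gid, onp_size, offp_index, n):
    """Map a 1-based local index n to its global index: on-process
    indices come first, followed by the off-process indices."""
    if n <= onp_size:
        return first_gid + n - 1
    return offp_index[n - onp_size - 1]

# Global index set {1,...,10} over 3 processes with blocks 4, 3, 3.
firsts, lasts = partition([4, 3, 3])
assert firsts == [1, 5, 8] and lasts == [4, 7, 10]
# Process 2 owns {5,6,7} and also holds off-process indices 4 and 8;
# its local index set is {1,...,5}.
assert global_index(5, 3, [4, 8], 2) == 6  # local 2 -> global 6
assert global_index(5, 3, [4, 8], 4) == 4  # first off-process index
```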
The index_map Derived Type
An index_map type variable is a distributed parallel object. Except for
the type-bound function global_index, all type-bound procedures must be
called collectively by all processes.
There are two implementations of the derived type: one using MPI and the other using Fortran coarrays (CAF). With a couple of exceptions, both have exactly the same interface.
The distribute and collate procedures involve a single distinguished process
called the root process. By default this is MPI rank 0 (or CAF image 1).
Although there seems little reason to choose a different rank (or image), one
can, by giving a different value for the optional root argument to the
init procedure.
The next sections describe the public components and type-bound procedures of the type.
Initialization
A variable
type(index_map) :: imap
must be initialized before being used by calling its init method. There
are several possible versions. All are collective procedures that must be
called by all processes. Most init versions take an MPI communicator
comm as an optional argument. If not specified, MPI_COMM_WORLD is
used. This communicator is used in all internal MPI calls. The comm
argument is omitted in the CAF implementation.
1. call imap%init(bsize [,comm] [,root])
   Each process gives its own block size using the scalar integer bsize. A block size of 0 is allowed. The resulting imap includes no off-process indices; these can be added later using the add_offp_index method.

2. call imap%init(bsizes [,comm] [,root])
   The root process gives the block sizes for all processes using the rank-1 integer array bsizes, whose size equals the number of processes. A block size of 0 is allowed. bsizes is only accessed on the root process; other processes may pass a 0-sized array. The resulting imap includes no off-process indices; these can be added later using the add_offp_index method.

3. call imap%init(bsize, offp_index [,comm] [,root])
   In this extension of version 1, each process also gives a list of its off-process global indices using the rank-1 integer array offp_index. This will be a 0-sized array if the process has no off-process indices. The global indices given in offp_index must belong to the global index set defined by the collective values of bsize, and must not be on-process indices for the process.

4. call imap%init(bsizes, offp_counts, offp_indices [,comm] [,root])
   In this extension of version 2, the root process also gives the lists of off-process global indices to include for all processes using the rank-1 integer arrays offp_counts and offp_indices. The size of offp_counts must equal the number of processes, and offp_counts(n) is the number of off-process indices for the nth process; a value of 0 is allowed. The off-process indices are stored in the offp_indices array in process order, so that its size equals the sum of the elements of offp_counts. The global off-process indices are subject to the same requirements as in version 3. These arguments are only accessed on the root process; other processes may pass 0-sized arrays.

5. call imap%init(domain, g_count)
   This is a specialized version associated with ragged indexing arrays. domain is an index_map type variable and g_count is a rank-1 integer array; both are intent(in). g_count is only referenced on the root process of domain, where its size must be at least domain%global_size; other processes can pass a 0-sized array. The global index set for imap is derived from the g_count array: starting at 1, the first g_count(1) consecutive integers derive from domain global index 1, the next g_count(2) consecutive integers from domain global index 2, and so forth. The size of this global index set is thus the sum of the first domain%global_size elements of g_count. imap is initialized with the same MPI communicator as domain, and the block size for a process is the sum of the g_count elements corresponding to its on-process domain indices (were g_count distributed according to domain). If domain includes off-process indices, these give rise to off-process indices in imap: the global indices that derive from off-process g_count elements on a process are added as off-process indices to imap.
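The block-size derivation used by the (domain, g_count) version can be sketched with a small Python model. This is illustrative only; the function name derived_block_sizes is an invention of this sketch, not part of the module.

```python
def derived_block_sizes(domain_bsizes, g_count):
    """Block size of the derived index set for each process: the sum of
    g_count over that process's on-process domain indices."""
    out, start = [], 0
    for n in domain_bsizes:
        out.append(sum(g_count[start:start + n]))
        start += n
    return out

# domain has 6 global indices split 3/3 across two processes; g_count
# gives a ragged shape with 9 derived indices in total.
assert derived_block_sizes([3, 3], [2, 1, 3, 0, 2, 1]) == [6, 3]
```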
Data Components
An initialized index_map variable has the following public data
components. These are read-only and must never be modified by client
code. It will become possible for the compiler to enforce this through the
protected attribute being introduced in the upcoming F202X standard.
- %onp_size: the number of on-process indices for this process
- %offp_size: the number of off-process indices for this process
- %local_size: the size of the local index set for this process; this is the sum of %onp_size and %offp_size
- %global_size: the size of the global index set
- %first_gid: the global index of the first on-process index for this process
- %last_gid: the global index of the last on-process index for this process
- %comm: the communicator associated with imap in the MPI implementation
- %offp_index: the rank-1 integer array of off-process global indices included on this process
Miscellaneous Procedures
type(index_map) :: imap
imap%global_index(n)
Returns the global index corresponding to local index n with respect to
imap. This function is not collective and may be called by any single
process; it is an elemental function.
type(index_map) :: imap
call imap%add_offp_index(offp_index)
offp_index is a rank-1 array of global off-process indices with respect
to imap to include on this process. This may be a 0-sized array. It is
an error if the given indices are not contained in the imap global index
set, or if any of the indices are on-process with respect to imap. This
is a collective procedure. It is not currently possible to add new
off-process indices to an index_map object that already includes some.
Thus it is an error to call this procedure if imap already includes
off-process indices.
Distribute and Collate Subroutines
These type-bound generic subroutines distribute data from the root process
to all processes, and perform the reverse operation of collating data from
all processes onto the root process. These are collective procedures that
must be called from all processes. Both arguments must have the same type,
kind parameters, and rank. Currently there are specific procedures for
real and complex arguments of real32 and real64 kinds, integer
arguments of int32 and int64 kinds, and default logical
arguments, with array ranks up to 3.
type(index_map) :: imap
call imap%distribute(src, dest)
This distributes the elements of a global array src on the imap root
process to the local array dest on all processes according to imap.
Both arguments must have the same rank, and must have the same extent in all
but the last dimension. In the multi-rank case imap describes the
indexing of the last dimension, and the “elements” of src being
distributed to corresponding “elements” of dest are rank-(n-1) sections
src(:,...,j) for global imap indices j. src is only
referenced on the imap root process, where its extent in the last
dimension must be at least imap%global_size; other processes can pass a
0-sized array. The extent of dest in its last dimension must be at least
imap%onp_size. It is intent(inout) so that the value of any trailing
elements is unchanged. This subroutine wraps MPI_Scatterv.
If imap includes off-process indices it is often desired that the
distributed array dest also include elements for these indices. To
accomplish this, size dest appropriately (extent in its last dimension
at least imap%local_size) and then make a subsequent call to gather
these values: call imap%gather_offp(dest).
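The essential semantics of distribute (apart from the MPI_Scatterv mechanics) can be sketched with a short Python model. This is illustrative only; the list-of-lists representation of per-process data is an invention of this sketch.

```python
def distribute(bsizes, src):
    """Model of distribute for rank-1 data: slice the root's global
    array into per-process on-process blocks, in process order."""
    out, start = [], 0
    for n in bsizes:
        out.append(src[start:start + n])
        start += n
    return out

# Blocks 4, 3, 3 over the global array (1,...,10).
assert distribute([4, 3, 3], list(range(1, 11))) == \
    [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]
# A process with block size 0 receives no elements.
assert distribute([0, 2], [1, 2]) == [[], [1, 2]]
```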
type(index_map) :: imap
call imap%collate(src, dest)
This is the reverse of distribute. This collates the on-process elements
of the local src array on each process into the global array dest on
the imap root process according to imap. Both arguments must have the
same rank, and must have the same extent in all but the last dimension. In
the multi-rank case imap describes the indexing of the last dimension,
and the “elements” of src that are being collated into corresponding
“elements” of dest are rank-(n-1) sections src(:,...,j) for global
imap indices j. The extent of src in its last dimension must be
at least imap%onp_size; any trailing elements are ignored. The dest
array is only referenced on the imap root process, where its extent in
the last dimension must be at least imap%global_size; other processes can
pass a 0-sized array. It is intent(inout) so that the value of any
trailing elements is unchanged. This subroutine wraps MPI_Gatherv.
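As with distribute, the essential semantics of collate (apart from the MPI_Gatherv mechanics) can be sketched in a short Python model; the representation is an invention of this sketch.

```python
def collate(onp_blocks):
    """Model of collate for rank-1 data: concatenate each process's
    on-process elements, in process order, into the root's global array."""
    dest = []
    for block in onp_blocks:
        dest.extend(block)
    return dest

# The reverse of the distribute example: blocks 4, 3, 3 reassemble
# the global array (1,...,10).
assert collate([[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]) == list(range(1, 11))
```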
Off-Process Gather Subroutines
Each global index is owned by exactly one process (where it is on-process)
but may exist on other processes as an off-process index. Applied to an
array distributed according to an index_map object, the generic
gather_offp subroutines replace the value of each off-process element
with the value of its corresponding on-process element.
These are collective procedures that must be called from all processes. The
arguments must have the same rank, type, and kind on all processes.
Currently there are specific procedures for rank-1, 2, and 3 arrays of
real and complex kinds real32 and real64, integer kinds
int32 and int64, and default logical.
There are two forms of the generic gather_offp. The most-used form is
type(index_map) :: imap
call imap%gather_offp(local_data)
local_data is a rank-n array distributed according to imap. Its
extent in all but the last dimension, if any, must be the same on all
processes, and its extent in the last dimension must be at least
imap%local_size. In the multi-rank case imap describes the indexing
of the last dimension of local_data, and the “elements” that are being
gathered are rank-(n-1) sections local_data(:,...,j) for local
imap indices j.
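Assuming the semantics above, the first form can be sketched with a Python model for rank-1 data. This is illustrative only: the list-of-lists representation, with each process's on-process values followed by its off-process values, and the names firsts and offp are inventions of this sketch.

```python
def gather_offp(firsts, offp, local_data):
    """Model of gather_offp: overwrite each process's off-process
    elements with the owning process's on-process values."""
    # Assemble the global view from the on-process values.
    glob = {}
    for p, (first, data) in enumerate(zip(firsts, local_data)):
        onp = len(data) - len(offp[p])
        for i in range(onp):
            glob[first + i] = data[i]
    # Replace each off-process element with the owner's value.
    for p, gids in enumerate(offp):
        onp = len(local_data[p]) - len(gids)
        for k, gid in enumerate(gids):
            local_data[p][onp + k] = glob[gid]
    return local_data

# Two processes, blocks 2/2; process 1 holds off-process index 3,
# process 2 holds off-process index 2 (stale values 0 before the call).
assert gather_offp([1, 3], [[3], [2]], [[10, 20, 0], [30, 40, 0]]) == \
    [[10, 20, 30], [30, 40, 20]]
```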
The second form simply splits the on-process and off-process array elements into separate array arguments:
type(index_map) :: imap
call imap%gather_offp(onp_data, offp_data)
This is useful in less-common cases where the on and off-process elements are
not stored contiguously in memory. In this form onp_data is
intent(in) with extent in the last dimension at least imap%onp_size,
while offp_data is intent(inout), with extent in the last dimension
at least imap%offp_size.
Off-Process Scatter-Reduce Subroutines
These are in some sense the reverse of gather_offp. However, since an
on-process global index may exist on multiple other processes as an
off-process index, copying an off-process array element to its
corresponding on-process array element is ill-defined. Instead, the
off-process elements are reduced into the on-process element using an
associative, commutative binary operation.
These are collective procedures that must be called from all processes. The
arguments must have the same rank, type, and kind on all processes.
For scatter_offp_sum there are specific procedures for real and
complex kinds real32 and real64, and integer kinds int32
and int64. For scatter_offp_min and scatter_offp_max they
support real kinds real32 and real64, and integer kinds
int32 and int64. For scatter_offp_or and scatter_offp_and
they support default logical.
There are two forms of the generic scatter_offp_<op> subroutines. The
most-used form is
type(index_map) :: imap
call imap%scatter_offp_<op>(local_data)
where <op> may be sum when local_data is of integer, real, or
complex type, min or max when it is of integer or real type, and
or or and when it is of logical type. local_data is a rank-n
array distributed according to
imap. Its extent in all but the last dimension, if any, must be the same
on all processes, and its extent in the last dimension must be at least
imap%local_size. In the multi-rank case imap describes the indexing
of the last dimension of local_data, and the “elements” that are being
scattered are rank-(n-1) sections local_data(:,...,j) for local
imap indices j. In this case the reduction operation is applied to
corresponding section elements.
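Assuming the semantics above, the sum reduction can be sketched with a Python model for rank-1 data, using the same illustrative representation as before (per-process lists with on-process values followed by off-process values; the names firsts and offp are inventions of this sketch).

```python
def scatter_offp_sum(firsts, offp, local_data):
    """Model of scatter_offp_sum: add each off-process contribution
    into the owning process's on-process element."""
    # Accumulate the off-process contributions by global index.
    contrib = {}
    for p, gids in enumerate(offp):
        onp = len(local_data[p]) - len(gids)
        for k, gid in enumerate(gids):
            contrib[gid] = contrib.get(gid, 0) + local_data[p][onp + k]
    # Add the contributions into the owner's on-process element.
    for p, (first, data) in enumerate(zip(firsts, local_data)):
        onp = len(data) - len(offp[p])
        for i in range(onp):
            data[i] += contrib.get(first + i, 0)
    return local_data

# Two processes, blocks 2/2; process 1 contributes 1 to global index 3,
# process 2 contributes 2 to global index 2.
assert scatter_offp_sum([1, 3], [[3], [2]], [[10, 20, 1], [30, 40, 2]]) == \
    [[10, 22, 1], [31, 40, 2]]
```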
The second form simply splits the on-process and off-process array elements into separate array arguments:
type(index_map) :: imap
call imap%scatter_offp_<op>(onp_data, offp_data)
This is useful in less-common cases where the on and off-process elements are
not stored contiguously in memory. In this form onp_data is
intent(inout) with extent in the last dimension at least
imap%onp_size, while offp_data is intent(in), with extent in the
last dimension at least imap%offp_size.
Localization of Indirect Indexing Arrays
Indirect indexing arrays are a necessary feature of any algorithm that deals
with unstructured meshes. These are integer arrays that map one index set
into another; for example, a rank-2 array that gives the node indices that
are the vertices of each cell. When these refer to global indices and
perhaps also exist only as a global indexing array on one process, as is
usually the case, they need to be “localized” for process-local
computation on distributed data; that is, they need to be distributed, if
necessary, according to a distributed mapping of their domain index set, and
have their values converted to local indices according to a distributed
mapping of their range index set. This procedure results in process-local
indirect indexing arrays. It is usually the case that a distributed indirect
indexing array will reference global indices that are not local to the
process. In order to obtain closure for the local indexing array, these
off-process references need to be included in the local index set. This is
one of the key aspects of the localization procedure, and one of the main
methods by which off-process indices are added to index_map objects.
The generic localize_index_array subroutine has the following forms:
type(index_map) :: range
call range%localize_index_array(index [,stat])
index is a distributed indirect indexing array that takes values in the
range global index set. The array may be rank-1 or rank-2. Each process
“localizes” its intent(inout) index array by replacing its global
index values with their corresponding local index values with respect to
range. When off-process global index values are encountered, which have
no automatic corresponding local index as do on-process global indices, they
are added as additional off-process indices to range if not already
included as such. Thus range may be modified by this subroutine. An
index value of 0 is treated specially and not modified by this
subroutine. Otherwise all index values must belong to the interval
[1, range%global_size]. It is currently not possible to add new
off-process indices to an index_map object that already includes some.
Thus if range includes off-process indices before calling this
subroutine and it is found that additional off-process indices need to be
added in order to satisfy the values of index, this will result in an
error. If the optional integer stat argument is present, it will return
0 on success and a nonzero value for this unsupported case; otherwise the
program will abort with an error message.
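The first form can be sketched with a Python model for a single process's rank-1 index array. This is illustrative only; for simplicity the sketch assigns new off-process indices local numbers in order of first appearance, which need not match the ordering the module actually uses.

```python
def localize_index_array(first, onp_size, offp_index, index):
    """Model of localize_index_array: replace global index values with
    local ones, adding new off-process indices as needed. Zeros are
    left unchanged."""
    local = []
    for gid in index:
        if gid == 0:
            local.append(0)                    # 0 is treated specially
        elif first <= gid < first + onp_size:
            local.append(gid - first + 1)      # on-process index
        else:
            if gid not in offp_index:
                offp_index.append(gid)         # extend the index map
            local.append(onp_size + offp_index.index(gid) + 1)
    return local, offp_index

# Process owns global {5,6,7} (first_gid = 5); the index array also
# references off-process global indices 2 and 9.
loc, offp = localize_index_array(5, 3, [], [6, 2, 9, 2, 0])
assert loc == [2, 4, 5, 4, 0] and offp == [2, 9]
```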
type(index_map) :: domain
call domain%localize_index_array(g_index, range, l_index [,stat])
range is an index_map type object and g_index is a serial
indirect indexing array indexed by global indices from the domain global
index set and taking values in the range global index set. The array may
be rank-1 or rank-2. In the latter case it is the last dimension that is
indexed by domain and the section g_index(:,j) is an array of
range global indices associated with domain global index j.
g_index is only referenced on the root process of domain, where its
extent in the last dimension must be at least domain%global_size (any
additional elements are ignored); on other processes a 0-sized array can be
passed. l_index is an allocatable intent(out) array. Its rank must
be the same as g_index. In the first step each process allocates
l_index to its proper shape: the same extents as g_index in all but
the last dimension, and extent domain%local_size in the last dimension.
g_index is then distributed to l_index according to domain,
including any off-process elements that domain may include. The last step
is to localize l_index by calling the previous form of
localize_index_array. As a result, this subroutine may modify range,
and the same limitation on pre-existing off-process indices noted for the
previous form applies here.
NB: It is possible for an indirect indexing array to have the same global
index set for both its domain and range. In such a case domain and
range are aliased arguments, one intent(in) and the other
intent(inout). This should not be a source of error, however, because in
the first step range is not referenced, and in the last step domain
is not referenced.
type(index_map) :: domain
call domain%localize_index_array(g_count, g_index, range, l_count, l_index [,stat])
This version handles the case of a ragged indirect indexing array that is
represented by a pair of rank-1 integer arrays. In the pair
(g_count, g_index) g_count is indexed by the domain global
index set and g_index takes values in the range global index set.
The first g_count(1) elements of g_index are associated with
domain global index 1, the next g_count(2) elements with global
index 2, and so forth. If all values of g_count were equal to n,
say, then the indexing array could be more easily represented by a single
rank-2 indexing array whose first dimension had extent equal to n. A
similar description holds for the output localized pair (l_count,
l_index) which is indexed by the domain local index set and takes
values in the range local index set. Apart from the difference in
representation of the indirect indexing arrays, this subroutine operates in
exactly the same way as the preceding form.
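The ragged (count, index) representation just described can be sketched with a short Python model showing which elements of the index array belong to each domain index. This is illustrative only; the function name ragged_slices is an invention of this sketch.

```python
def ragged_slices(g_count, g_index):
    """Model of the ragged pair: the elements of g_index associated
    with each domain index, in order."""
    out, start = [], 0
    for n in g_count:
        out.append(g_index[start:start + n])
        start += n
    return out

# Domain index 1 has 2 entries, index 2 has none, index 3 has 3.
assert ragged_slices([2, 0, 3], [7, 8, 1, 2, 3]) == [[7, 8], [], [1, 2, 3]]
```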