Module index_map_type¶
The elements of a Fortran array are accessed by an index, which is an
integer in a contiguous range that usually starts at 1. To describe the
distribution of array elements across one or more parallel processes, it
suffices to describe how its index set is distributed across, or mapped to,
processes; the type of the array data itself is irrelevant. As its name
suggests, the index_map_type module defines the index_map derived
type which describes the mapping of a contiguous index set to one or more
parallel processes. The mapping allows for overlap between processes, and
provides array data gather and scatter procedures that are associated with
that overlap. The derived type also provides procedures for distributing
and collating array data based on the index set, and procedures for
localizing serial indirect indexing arrays.
Concepts¶
A global index set is a 1-based contiguous range of integers, \(\{1, 2, ..., N\}\). Given \(P\) processes, an index is assigned to a unique process according to a block partition of the index set: The first \(N_1\) indices are assigned to process 1, the next \(N_2\) indices to process 2, and so forth. The sum of the block sizes \(N_1 + N_2 + \cdots + N_P = N\), and a block size of 0 is allowed. An index so assigned to a process is said to be owned by that process. In addition to these, a process may include indices owned by other processes. The owned indices are said to be on-process, and these additional indices, if any, are said to be off-process. It is important to note that a global index is owned by a unique process; this is in contrast to other schemes (e.g. Trilinos/Tpetra) where a global index may be owned by multiple processes.
Process-local index sets.
For the purposes of process-local computation, the collection of all global indices known to a process are mapped to a 1-based, contiguous, local index set as follows. The contiguous block of on-process global indices are assigned, in order, consecutive local index numbers starting at 1, and the off-process indices are assigned, in order, consecutive local index numbers immediately following these.
The index_map Derived Type¶
An index_map type variable is a distributed parallel object. Except for
the type-bound function global_index, all type-bound procedures must be
called collectively by all processes.
There are two different implementations of the derived type: one using MPI and the other using Fortran coarrays (CAF). With just a couple exceptions, both have exactly the same interface.
The distribute and collate procedures involve a single distinguished process
called the root process. By default this is MPI rank 0 (or CAF image 1).
Although there seems little reason to choose a different rank (or image), one
can, by giving a different value for the optional root argument to the
init procedure.
The next sections describe the public components and type-bound procedures of the type.
Initialization¶
A variable
type(index_map) :: imap
must be initialized before being used by calling its init method. There
are several possible versions. All are collective procedures that must be
called by all processes. Most init versions take an MPI type(MPI_Comm)
communicator comm as an optional argument. If not specified,
MPI_COMM_WORLD is used. In the MPI implementation, imap stores and
owns a duplicate of this communicator, and that duplicate is used in all
internal MPI calls. The original communicator remains owned by the caller. The
comm argument is omitted in the CAF implementation. Calling init for
an already initialized imap first frees resources owned by the previous
map state.
call imap%init(bsize [,comm] [,root])Each process gives its own block size using the scalar integer
bsize. A block size of 0 is allowed. The resultingimapincludes no off-process indices. These can be added later using theadd_offp_indexmethod.call imap%init(bsizes [,comm] [,root])The root process gives the block sizes for all processes using the rank-1 integer array
bsizeswhose size equals the number of processes. A block size of 0 is allowed.bsizesis only accessed on the root process; other processes may pass a 0-sized array. The resultingimapincludes no off-process indices. These can be added later using theadd_offp_indexmethod.call imap%init(bsize, offp_index [,comm] [,root])In this extension of version 1, each process also gives a list of its off-process global indices using the rank-1 integer array
offp_index. This will be a 0-sized array if the process has no off-process indices. The global indices given inoffp_indexmust belong to the global index set as defined by the collective values ofbsize, and must not be on-process indices for the process.call imap%init(bsizes, offp_counts, offp_indices [,comm] [,root])In this extension of version 2, the root process also gives lists of off-process global indices to include for all processes using the rank-1 integer arrays
offp_countsandoffp_indices. The size ofoffp_countsmust equal the number of processes.offp_counts(n)is the number of off-process indices for the nth process; a value of 0 is allowed. The off-process indices are stored in theoffp_indicesarray in order of process, so that its size equals the sum of the elements ofoffp_counts. The global off-process indices are subject to the same requirements as for version 3. The subroutine arguments are only accessed on the root process; other processes may pass 0-sized arrays.call imap%init(domain, g_count)This is a specialized version associated with ragged indexing arrays.
domainis anindex_maptype variable andg_countis a rank-1 integer array. Both areintent(in).g_countis only referenced on the root process ofdomain, where its size must be at leastdomain%global_size; other processes can pass a 0-sized array. The global index set forimapwill be derived from theg_countarray: starting with 1, the first consecutiveg_count(1)integers derive from domain global index 1, the next consecutiveg_count(2)integers derive from domain global index 2, and so forth. The size of this global index set is thus the sum of the firstdomain%global_sizeelements ofg_count. In the MPI implementation,imapis initialized usingdomain’s communicator and stores its own duplicate; it does not sharedomain’s stored communicator. The root process or image is inherited fromdomain. The block size for a process is the sum of the on-process elements ofg_count(were it distributed according todomain). Ifdomainincludes off-process indices, these give rise to off-process indices inimap: the global indices that derive from off-processg_countelements on this process are added as off-process indices toimap.
Data Components¶
An initialized index_map variable has the following public data
components. These are read-only and must never be modified by client
code.
%onp_size: the number of on-process indices for this process%offp_size: the number of off-process indices for this process%local_size: the size of the local index set for this process. This is the sum of%onp_sizeand%offp_size%global_size: the size of the global index set%first_gid: the global index of the first on-process index for this process%last_gid: the global index of the last on-process index for this process%offp_index: the rank-1 integer array of off-process global indices included on this process
Miscellaneous Procedures¶
type(index_map) :: imap
imap%global_index(n)
Returns the global index corresponding to local index n with respect to
imap. This function is not collective and can be called from any single
process. This is also an elemental function.
type(index_map) :: imap
call imap%add_offp_index(offp_index)
offp_index is a rank-1 array of global off-process indices with respect
to imap to include on this process. This may be a 0-sized array. It is
an error if the given indices are not contained in the imap global index
set, or if any of the indices are on-process with respect to imap. This
is a collective procedure. It is not currently possible to add new
off-process indices to an index_map object that already includes some.
Thus it is an error to call this procedure if imap already includes
off-process indices.
type(index_map) :: imap
call imap%free()
Frees resources owned by imap and resets it to its default uninitialized
state. This may be called multiple times. In the MPI implementation this frees
the duplicated base communicator and any derived graph communicators created
for off-process gather/scatter operations. It does not free the caller’s
original communicator. This is a collective procedure when imap has been
initialized.
Distribute and Collate Subroutines¶
These type-bound generic subroutines distribute data from the root process
to all processes and the reverse operation of collating data from all
processes onto the root process. These are collective procedures that must
be called from all processes. Both arguments must have the same type, kind
parameters, and rank. Currently there are specific procedures for real
and complex arguments of real32 and real64 kinds, integer
arguments of int8, int32 and int64 kinds, and default logical
arguments, and array ranks up to 4. For multi-rank data arrays, the last
dimension is the distributed index dimension. The extents in all preceding
dimensions define the shape of the data associated with one index. These
leading extents must be consistent across the data arrays that actually
participate in the data transfer. This is a caller contract and is not
checked at run time.
type(index_map) :: imap
call imap%distribute(src, dest)
This distributes the elements of a global array src on the imap root
process to the local array dest on all processes according to imap.
Both arguments must have the same rank. In the multi-rank case imap
describes the indexing of the last dimension, and the “elements” of src
being distributed to corresponding “elements” of dest are rank-(n-1)
sections src(:,...,j) for global imap indices j. The leading
extents of src on the imap root process must equal the leading
extents of dest on every process. src is only referenced on the
imap root process, where its extent in the last dimension must be at
least imap%global_size; other processes can pass a 0-sized array. The
extent of dest in its last dimension must be at least imap%onp_size.
It is intent(inout) so that the value of any trailing elements is
unchanged. The MPI implementation wraps MPI_Scatterv.
If imap includes off-process indices it is often desired that the
distributed array dest also include elements for these indices. To
accomplish this the dest array needs to be properly sized (extent in its
last dimension at least imap%local_size) and then simply make a
subsequent call to gather these values: call imap%gather_offp(dest).
type(index_map) :: imap
call imap%collate(src, dest)
This is the reverse of distribute. This collates the on-process elements
of the local src array on each process into the global array dest on
the imap root process according to imap. Both arguments must have the
same rank. In the multi-rank case imap describes the indexing of the
last dimension, and the “elements” of src that are being collated into
corresponding “elements” of dest are rank-(n-1) sections
src(:,...,j) for global imap indices j. The leading extents of
src on every process must equal the leading extents of dest on the
imap root process. The extent of src in its last dimension must be
at least imap%onp_size; any trailing elements are ignored. The dest
array is only referenced on the imap root process, where its extent in
the last dimension must be at least imap%global_size; other processes can
pass a 0-sized array. It is intent(inout) so that the value of any
trailing elements is unchanged. The MPI implementation wraps MPI_Gatherv.
Off-Process Gather Subroutines¶
Each global index is owned by exactly one process (where it is on-process)
but may exist on other processes as an off-process index. Applied to an
array distributed according to an index_map object, the generic
gather_offp subroutines replace the value of each off-process element
with the value of its corresponding on-process element.
These are collective procedures that must be called from all processes. The
arguments must have the same rank, type, and kind on all processes.
Currently there are specific procedures for rank-1, 2, 3, and 4 arrays of
types: real and complex kinds real32 and real64;
integer kinds int8, int32 and int64; and default logical.
There are two forms of the generic gather_offp. The most-used form is
type(index_map) :: imap
call imap%gather_offp(local_data)
local_data is a rank-n array distributed according to imap. Its
extent in all but the last dimension, if any, must be the same on all
processes; this is a caller contract and is not checked at run time. Its
extent in the last dimension must be at least imap%local_size. In the
multi-rank case imap describes the indexing of the last dimension of
local_data, and the “elements” that are being gathered are rank-(n-1)
sections local_data(:,...,j) for local imap indices j.
The second form simply splits the on-process and off-process array elements into separate array arguments:
type(index_map) :: imap
call imap%gather_offp(onp_data, offp_data)
This is useful in less-common cases where the on and off-process elements are
not stored contiguously in memory. In this form onp_data is
intent(in) with extent in the last dimension at least imap%onp_size,
while offp_data is intent(inout), with extent in the last dimension
at least imap%offp_size. For multi-rank arguments, onp_data and
offp_data must have the same leading extents, and those leading extents
must be the same on all processes. This is a caller contract and is not
checked at run time.
Off-Process Scatter-Reduce Subroutines¶
These are in some sense the reverse of gather_offp. However since an
on-process global index may exist on multiple other processes as an
off-process index, the operation of copying an off-process array element to
its corresponding on-process array element is an ill-defined operation.
Instead, the off-process elements are reduced with the on-process element
using an associative, commutative binary operation.
These are collective procedures that must be called from all processes. The
arguments must have the same rank, type, and kind on all processes.
Currently there are specific procedures for rank-1, 2, 3, and 4 arrays. For
scatter_offp_sum these support real and complex kinds
real32 and real64, and integer kinds int32 and int64.
For scatter_offp_min and scatter_offp_max they support real
kinds real32 and real64, and integer kinds int32 and
int64. For scatter_offp_or and scatter_offp_and they support
default logical.
There are two forms of the generic scatter_offp_<op> subroutines. The
most-used form is
type(index_map) :: imap
call imap%scatter_offp_<op>(local_data)
where <op> may be sum, min, or max when local_data is of
integer, real, or complex type, and or or and when the array is of
logical type. local_data is a rank-n array distributed according to
imap. Its extent in all but the last dimension, if any, must be the same
on all processes; this is a caller contract and is not checked at run time.
Its extent in the last dimension must be at least imap%local_size. In
the multi-rank case imap describes the indexing of the last dimension of
local_data, and the “elements” that are being scattered are rank-(n-1)
sections local_data(:,...,j) for local imap indices j. In this
case the reduction operation is applied to corresponding section elements.
The second form simply splits the on-process and off-process array elements into separate array arguments:
type(index_map) :: imap
call imap%scatter_offp_<op>(onp_data, offp_data)
This is useful in less-common cases where the on and off-process elements are
not stored contiguously in memory. In this form onp_data is
intent(inout) with extent in the last dimension at least
imap%onp_size, while offp_data is intent(in), with extent in the
last dimension at least imap%offp_size. For multi-rank arguments,
onp_data and offp_data must have the same leading extents, and those
leading extents must be the same on all processes. This is a caller contract
and is not checked at run time.
Localization of Indirect Indexing Arrays¶
Indirect indexing arrays are a necessary feature of any algorithm that deals
with unstructured meshes. These are integer arrays that map one index set
into another; for example, a rank-2 array that gives the node indices that
are the vertices of each cell. When these refer to global indices and
perhaps also exist only as a global indexing array on one process, these
need to be localized for doing process-local computation on distributed
data; that is, they need to be distributed, if
necessary, according to a distributed mapping of their domain index set, and
have their values converted to local indices according to a distributed
mapping of their range index set. This procedure results in process-local
indirect indexing arrays. It is usually the case that a distributed indirect
indexing array will reference global indices that are not local to the
process. In order to obtain closure for the local indexing array, these
off-process references need to be included in the local index set. This is
one of the key aspects of the localization procedure, and one of the main
methods by which off-process indices are added to index_map objects.
The generic localize_index_array subroutine has the following forms:
In the MPI implementation, the forms involving both domain and range
require these two maps to use congruent communicators.
type(index_map) :: range
call range%localize_index_array(index)
index is a distributed indirect indexing array that takes values in the
range global index set. The array may be rank-1 or rank-2. Each process
“localizes” its intent(inout) index array by replacing its global
index values with their corresponding local index values with respect to
range. When off-process global index values are encountered, which have
no automatic corresponding local index as do on-process global indices, they
are added as additional off-process indices to range if not already
included as such. Thus range may be modified by this subroutine. An
index value of 0 is treated specially and not modified by this
subroutine. Otherwise all index values must belong to the interval
[1, range%global_size]. It is currently not possible to add new
off-process indices to an index_map object that already includes some.
Thus if range includes off-process indices before calling this
subroutine and it is found that additional off-process indices need to be
added in order to satisfy the values of index, this will result in an
unrecoverable error.
type(index_map) :: domain
call domain%localize_index_array(g_index, range, l_index)
range is an index_map type object and g_index is a serial
indirect indexing array indexed by global indices from the domain global
index set and taking values in the range global index set. The array may
be rank-1 or rank-2. In the latter case it is the last dimension that is
indexed by domain and the section g_index(:,j) is an array of
range global indices associated with domain global index j.
g_index is only referenced on the root process of domain, where its
extent in the last dimension must be at least domain%global_size (any
additional elements are ignored); on other processes a 0-sized array can be
passed. For a rank-2 g_index, a non-root 0-sized array must still have
the same first-dimension extent as g_index on the root process, because
that extent is used to allocate l_index and to distribute array sections.
l_index is an allocatable intent(out) array. Its rank must be the
same as g_index. In the first step each process allocates l_index to
its proper shape: the same extents as g_index in all but the last
dimension, and extent domain%local_size in the last dimension.
g_index is then distributed to l_index according to domain,
including any off-process elements that domain may include. The last step
is to localize l_index by calling the previous form of
localize_index_array. As a result, this subroutine may modify range,
and the same limitation on pre-existing off-process indices noted for the
previous form applies here.
NB: It is possible for an indirect indexing array to have the same global
index set for both its domain and range. In such a case domain and
range are aliased arguments, one intent(in) and the other
intent(inout). This should not be a source of error, however, because in
the first step range is not referenced, and in the last step domain
is not referenced.
type(index_map) :: domain
call domain%localize_index_array(g_count, g_index, range, l_count, l_index)
This version handles the case of a ragged indirect indexing array that is
represented by a pair of rank-1 integer arrays. In the pair
(g_count, g_index) g_count is indexed by the domain global
index set and g_index takes values in the range global index set.
The first g_count(1) elements of g_index are associated with
domain global index 1, the next g_count(2) elements with global
index 2, and so forth. If all values of g_count were equal to n,
say, then the indexing array could be more easily represented by a single
rank-2 indexing array whose first dimension had extent equal to n. A
similar description holds for the output localized pair (l_count,
l_index) which is indexed by the domain local index set and takes
values in the range local index set. Apart from the difference in
representation of the indirect indexing arrays, this subroutine operates in
exactly the same way as the preceding form.
Examples¶
The following examples are short fragments that illustrate common uses of the
index_map API. More substantial examples are provided in the repository’s
example directory.
Root I/O With Distributed Computation¶
This example shows the basic workflow for data that is read as a serial array on the root process, distributed for process-local computation, and then collated back to the root process for serial output.
use iso_fortran_env, only: real64
use index_map_type
type(index_map) :: imap
real(real64), allocatable :: global_input(:), global_output(:)
real(real64), allocatable :: local_input(:), local_output(:)
integer :: bsize, j
bsize = 10
call imap%init(bsize)
if (my_rank == 0) then
! Read global_input from a serial file here.
allocate(global_input(imap%global_size))
global_input = [(real(j, real64), j = 1, imap%global_size)]
allocate(global_output(imap%global_size))
else
allocate(global_input(0), global_output(0))
end if
allocate(local_input(imap%onp_size), local_output(imap%onp_size))
call imap%distribute(global_input, local_input)
local_output = 2.0_real64 * local_input
call imap%collate(local_output, global_output)
if (my_rank == 0) then
! Write global_output to a serial file here.
end if
The root process supplies the global input array and receives the global
output array. Each process owns bsize entries, so local_input and
local_output have extent imap%onp_size. No off-process indices are
included in this example.
Circular Forward Difference¶
This example starts with data that is already distributed. Each process owns
bsize consecutive entries of a periodic one-dimensional array and includes
one off-process entry: the next global entry after its owned block, wrapping
around at the end of the global index set. The off-process entry is gathered
before computing the forward difference.
use iso_fortran_env, only: real64
use index_map_type
type(index_map) :: imap
real(real64), allocatable :: u(:), du(:), cdu(:)
integer :: bsize, next_gid, j
bsize = 10
next_gid = 1 + modulo((my_rank + 1)*bsize, nproc*bsize)
call imap%init(bsize, [next_gid])
allocate(u(imap%local_size), du(imap%local_size), cdu(imap%local_size))
! Fill the on-process part of u from already-distributed data.
do j = 1, imap%onp_size
u(j) = real(imap%global_index(j), real64)
end do
call imap%gather_offp(u)
du = 0.0_real64
do j = 1, imap%onp_size
du(j) = u(j+1) - u(j)
end do
! Compute a centered difference to illustrate scatter_offp_sum.
cdu = 0.0_real64
cdu(1) = du(1)
do j = 2, imap%local_size
cdu(j) = du(j) + du(j-1)
end do
call imap%scatter_offp_sum(cdu)
The off-process value occupies u(imap%onp_size+1). Thus the same loop
computes the forward difference for every on-process index, including the
last owned index on each process and the periodic wraparound on the last
process. The cdu array illustrates a scatter-reduce pattern: the last
loop computes a centered difference from neighboring forward differences.
One of those contributions is for the off-process entry, and
scatter_offp_sum adds it back to the process that owns the corresponding
global index.
Localizing an Indirect Indexing Array¶
This example computes a centered difference using an indirect indexing array,
similar to the neighbor arrays used with unstructured meshes. Each process
constructs the global indices of the left and right neighbors for its owned
entries. The localization step converts those global indices to local indices
and adds any required off-process entries to range.
use iso_fortran_env, only: real64
use index_map_type
type(index_map) :: range
integer, allocatable :: index(:,:)
real(real64), allocatable :: u(:), dudx(:)
integer :: bsize, gid, left_gid, right_gid, n, j
bsize = 10
call range%init(bsize)
n = range%global_size
allocate(index(2, range%onp_size))
do j = 1, range%onp_size
gid = range%global_index(j)
left_gid = 1 + modulo(gid - 2, n)
right_gid = 1 + modulo(gid, n)
index(1,j) = left_gid
index(2,j) = right_gid
end do
call range%localize_index_array(index)
allocate(u(range%local_size), dudx(range%onp_size))
! Fill the on-process part of u from already-distributed data.
do j = 1, range%onp_size
u(j) = real(range%global_index(j), real64)
end do
call range%gather_offp(u)
do j = 1, range%onp_size
dudx(j) = 0.5_real64 * (u(index(2,j)) - u(index(1,j)))
end do
Before localization, index contains global indices in the range index
set. After localization, its entries are local indices into u. Any
neighbor value that is not owned by the process is included in the off-process
part of range and filled by gather_offp.
If the indirect indexing array is instead constructed as a global array on
the root process, use the second localization form. The following fragment
constructs the same periodic left/right neighbor array on the root process and
distributes/localizes it with respect to domain and range:
type(index_map) :: domain, range
integer, allocatable :: g_index(:,:), l_index(:,:)
integer :: bsize, gid, n
bsize = 10
call domain%init(bsize)
call range%init(bsize)
n = range%global_size
if (my_rank == 0) then
allocate(g_index(2, domain%global_size))
do gid = 1, domain%global_size
g_index(1,gid) = 1 + modulo(gid - 2, n)
g_index(2,gid) = 1 + modulo(gid, n)
end do
else
allocate(g_index(2, 0))
end if
call domain%localize_index_array(g_index, range, l_index)