-
Notifications
You must be signed in to change notification settings - Fork 27
Fortran API Reference
FTI Datatypes
FTI Constants
FTI_Init
FTI_InitType
FTI_Protect
FTI_Checkpoint
FTI_GetStoredSize
FTI_Realloc
FTI_Status
FTI_Recover
FTI_Snapshot
FTI_Finalize
FTI datatypes are used in the C-API function FTI_Protect
. With the count
parameter and the datatype, FTI is able to determine the size of the allocated memory region at ptr
.
The FTI Fortran interface defines a template of FTI_Protect
for all intrinsic data types. Hence the datatype definitions are not necessary here and are not avalable for the Fortran interface.
FTI_BUFS
: 256
FTI_DONE
: 1
FTI_SCES
: 0
FTI_NSCS
: -1
FTI_NREC
: -2
- Reads configuration file.
- Creates checkpoint directories.
- Detects topology of the system.
- Regenerates data upon recovery.
DEFINITION
subroutine FTI_Init ( config_file, global_comm, err )
ARGUMENTS
Variable | What for? | |
---|---|---|
character config_file |
IN | Path to the config file |
integer global_comm |
IN/OUT | MPI communicator used for the execution |
integer err |
OUT | Token for FTI error code. |
ERROR HANDLING
ierr | Reason |
---|---|
FTI_SCES |
Success |
FTI_NSCS |
No Success |
FTI_NREC |
FTI could not recover ckpt files |
DESCRIPTION
FTI_Init
initializes the FTI context. It must be called before any other FTI
function and after MPI_Init
. The MPI communicator passed, must be declared as integer, target
.
EXAMPLE
integer, target :: rank, nbProcs, err, FTI_comm_world
call MPI_Init(err)
FTI_comm_world = MPI_COMM_WORLD
call FTI_Init('config.fti', FTI_comm_world, err) ! modifies FTI_comm_world
call MPI_Comm_size(FTI_comm_world, nbProcs, err)
call MPI_Comm_rank(FTI_comm_world, rank, err)
- Initializes a data type.
DEFINITION
subroutine FTI_InitType ( type_F, size_F, err )
ARGUMENTS
Variable | What for? | |
---|---|---|
type(FTI_type) type_F |
IN | The data type to be initialized |
integer size_F |
IN | The size of the data type to be initialized |
integer err |
OUT | Token for FTI error code. |
ERROR HANDLING
err | Reason |
---|---|
FTI_SCES |
Success |
FTI_NSCS |
No Success |
DESCRIPTION
FTI_InitType initializes a FTI data-type. A data-type which is not Fortran intrinsic, must be defined using this function in order to protect variables of that type with FTI_Protect.
EXAMPLE
!...
type polar
real :: radius
real :: phi
end type
type(FTI_Type) :: FTI_Polar
type(polar), target :: choord
type(polar), pointer :: choord_ptr
type(c_ptr) :: choord_c_ptr
choord_ptr => choord
choord_c_ptr = c_loc(choord)
! ...
call FTI_InitType(FTI_Polar, int(sizeof(choord),4), ierr)
! ...
- Stores metadata concerning the variable to protect.
In the Fortran interface, FTI_Protect
comes with two different function headers.
One may be used for intrinsic Fortran types and the other must be used for derived data-types.
DEFINITION
subroutine FTI_Protect ( id, data, err ) !> For intrinsic data-types
subroutine FTI_Protect ( id, data_ptr, count_F, type_F, err ) !> For derived data-types
ARGUMENTS (intrinsic types)
Variable | What for? | |
---|---|---|
integer id |
IN | Unique ID of the variable to protect |
Fortran type, pointer data |
IN | Pointer to memory address of variable |
integer err |
OUT | Token for FTI error code. |
ARGUMENTS (derived types)
Variable | What for? | |
---|---|---|
integer id |
IN | Unique ID of the variable to protect |
type(c_ptr) data_ptr |
IN | Pointer to memory address of variable |
integer count_F |
IN | Number of elements. |
tape(FTI_Type) type_F |
IN | FTI_Type of Derived data-type. |
integer err |
OUT | Token for FTI error code. |
ERROR HANDLING
err | Reason |
---|---|
FTI_SCES |
Success |
FTI_NSCS |
Number of protected variables is > FTI_BUFS
|
DESCRIPTION
FTI_Protect
is used to add data fields to the list of protected
variables. Data, protected by this function will be stored during a call to FTI_Checkpoint or FTI_Snapshot and restored during a call to FTI_Recover.
If the dimension of a protected variable changes during the execution, a subsequent call to FTI_Protect
will update the meta-data whithin FTI in order to store the correct size during a successive call to FTI_Checkpoint or FTI_Snapshot.
EXAMPLE
For Fortran intrinsic data-types:
! ...
integer, target :: nbProcs, iter, row, col, err, FTI_comm_world
integer, pointer :: ptriter
real(8), pointer :: g(:,:)
call MPI_Init(err)
FTI_comm_world = MPI_COMM_WORLD
call FTI_Init('config.fti', FTI_comm_world, err) ! modifies FTI_comm_world
call MPI_Comm_size(FTI_comm_world, nbProcs, err)
row = sqrt((MEM_MB * 1024.0 * 512.0 * nbProcs)/8)
col = (row / nbProcs)+3
allocate( g(row, col) )
allocate( h(row, col) )
! INIT DATA ...
ptriter => iter
call FTI_Protect(0, ptriter, err)
call FTI_Protect(2, g, err)
! ...
For derived data-types
! ...
use iso_c_binding
type polar
real :: radius
real :: phi
end type
type(FTI_Type) :: FTI_Polar
integer, parameter :: N=128*1024*25 !> 25 MB / Process
integer, parameter :: N1 = 128
integer, parameter :: N2 = 1024
integer, parameter :: N3 = 25
integer, target :: FTI_COMM_WORLD
integer :: ierr, status
type(polar), dimension(:,:,:), pointer :: arr
type(c_ptr) :: arr_c_ptr
allocate(arr(N1,N2,N3))
shape = (/ N1, N2, N3 /)
arr_c_ptr = c_loc( arr( &
lbound(arr,1), &
lbound(arr,2), &
lbound(arr,3)))
!> INITIALIZE MPI AND FTI
call MPI_Init(ierr)
FTI_COMM_WORLD = MPI_COMM_WORLD
call FTI_Init('config.fti', FTI_COMM_WORLD, ierr)
call FTI_InitType(FTI_Polar, int(2*sizeof(1.0),4), ierr)
!> PROTECT DATA AND ITS SHAPE
call FTI_Protect(0, arr_c_ptr, size(arr), FTI_Polar, ierr)
! ...
- Writes values of protected runtime variables to a checkpoint file of requested level.
DEFINITION
subroutine FTI_Checkpoint ( id_F, level, err )
ARGUMENTS
Variable | What for? | |
---|---|---|
integer id_F |
IN | Unique checkpoint ID |
integer level |
IN | Checkpoint level (1=L1, 2=L2, 3=L3, 4=L4) |
integer err |
OUT | Token for FTI error code. |
ERROR HANDLING
err | Reason |
---|---|
FTI_DONE |
Success |
FTI_NSCS |
Failure |
DESCRIPTION
FTI_Checkpoint
is used to store the current values of protected variables into a
checkpoint of safety level level
(see Multilevel-Checkpointing for descritions of the particular levels).
EXAMPLE
The handling is identical to the C case, except that in Fortran it is a subroutine and not a function, hence:
! ...
!> LEVEL 2 CHECKPOINT, ID = 1
call FTI_Checkpoint(1, 2, err)
! ...
- Delivers the variable size in Bytes of a protected variable. The returned size is consistent to the FTI state, i.e. it might differ to the current variable size in the execution.
DEFINITION
subroutine FTI_GetStoredSize ( id_F, size_F )
ARGUMENTS
Variable | What for? | |
---|---|---|
integer id_F |
IN | Unique variable ID |
integer size_F |
OUT | Size of protected variable |
ERROR HANDLING
size | Reason |
---|---|
> 0 |
Success |
0 |
No size saved |
DESCRIPTION
FTI_GetStoredSize
returns the size of a protected variable with id
from the FTI metadata. The result may differ from the size of the variable known to the application at that moment. If the function is called on a restart, it returns the size stored in the metadata file. Called during the execution, it returns the value stored in the FTI runtime metadata, i.e. the size of the variable at the moment of the last checkpoint.
The function is needed to manually reallocate memory for protected variables with variable size on a recovery. Another possibility for the reallocation of memory is provided by FTI_Realloc.
EXAMPLE
! ...
integer, parameter :: N1=128*1024*25 !> 25 MB / Process
integer, parameter :: N2=128*1024*50 !> 50 MB / Process
integer :: varSizeMeta
integer, target :: FTI_COMM_WORLD
integer :: ierr, status
real(dp), dimension(:), pointer :: arr
real(dp), dimension(:), pointer :: tmp
allocate(arr(N1))
!> INITIALIZE MPI AND FTI
call MPI_Init(ierr)
FTI_COMM_WORLD = MPI_COMM_WORLD
call FTI_Init('config.fti', FTI_COMM_WORLD, ierr)
!> PROTECT DATA AND ITS SHAPE
call FTI_Protect(0, arr, ierr)
call FTI_Status(status)
!> EXECUTE ON RESTART
if ( status .eq. 1 ) then
!> REALLOCATE TO SIZE AT CHECKPOINT
call FTI_GetStoredSize(0, varSizeMeta)
if(varSizeMeta .ne. sizeof(arr)) then
deallocate(arr)
allocate(arr(varSizeMeta))
call FTI_Protect(0, arr, ierr) ! necessary to pass new address
end if
call MPI_Barrier(FTI_COMM_WORLD, ierr)
call FTI_recover(ierr)
! ...
end if
! ...
!> FIRST CHECKPOINT
call FTI_Checkpoint(1, 1, ierr)
! ...
!> CHANGE ARRAY DIMENSION
allocate(tmp(N2))
tmp(1:N11) = arr
deallocate(arr)
arr => tmp
!> TELL FTI ABOUT THE NEW DIMENSION
call FTI_Protect(0, arr, ierr)
! ...
!> SECOND CHECKPOINT
call FTI_Checkpoint(2,1, ierr)
! ...
- Provides the reallocation of memory on FTI API side for protected variables upon a restart.
DEFINITION
subroutine FTI_Realloc ( id, data, err ) !> For intrinsic data-types
subroutine FTI_Realloc ( id, data_ptr, err ) !> For derived data-types
ARGUMENTS (intrinsic types)
Variable | What for? | |
---|---|---|
integer id |
IN | Unique ID of the variable to protect |
Fortran type, pointer data |
IN/OUT | Pointer to memory address of variable |
integer err |
OUT | Token for FTI error code. |
ARGUMENTS (derived types)
Variable | What for? | |
---|---|---|
integer id |
IN | Unique ID of the variable to protect |
type(c_ptr) data_ptr |
IN/OUT | Pointer to memory address of variable |
integer err |
OUT | Token for FTI error code. |
ERROR HANDLING
err | Reason |
---|---|
FTI_SCES |
Success |
FTI_NSCS |
No success |
DESCRIPTION
FTI_Realloc
is called for protected variables with dynamic size on recovery. It reallocates sufficient memory to store the checkpoint data to the pointed memory address. It must be called before FTI_Recover to prevent segmentation faults. If the reallocation must/is wanted to be done within the application, FTI provides the function FTI_GetStoredSize to request the variable size of the
checkpoint to recover.
EXAMPLE
For intrinsic data-types:
! ...
integer, parameter :: N1=128*1024*25 !> 25 MB / Process
integer, parameter :: N2=128*1024*50 !> 50 MB / Process
integer, parameter :: N11 = 128
integer, parameter :: N12 = 1024
integer, parameter :: N13 = 25
integer, parameter :: N21 = 128
integer, parameter :: N22 = 1024
integer, parameter :: N23 = 50
integer, target :: FTI_COMM_WORLD
integer :: ierr, status
real(dp), dimension(:,:,:), pointer :: arr
type(c_ptr) :: arr_c_ptr
real(dp), dimension(:,:,:), pointer :: tmp
integer, dimension(:), pointer :: shape
allocate(arr(N11,N12,N13))
allocate(shape(3))
!> INITIALIZE MPI AND FTI
call MPI_Init(ierr)
FTI_COMM_WORLD = MPI_COMM_WORLD
call FTI_Init('config.fti', FTI_COMM_WORLD, ierr)
!> PROTECT DATA AND ITS SHAPE
call FTI_Protect(0, arr, ierr)
call FTI_Protect(1, shape, ierr)
call FTI_Status(status)
!> EXECUTE ON RESTART
if ( status .eq. 1 ) then
!> REALLOCATE TO SIZE AT CHECKPOINT
arr_c_ptr = c_loc(arr(1,1,1))
call FTI_Realloc(0, arr_c_ptr, ierr)
call FTI_recover(ierr)
!> RESHAPE ARRAY
call c_f_pointer(arr_c_ptr, arr, shape)
call FTI_Realloc(0, arr, ierr)
! ...
end if
! ...
!> FIRST CHECKPOINT
call FTI_Checkpoint(1, 1, ierr)
! ...
!> CHANGE ARRAY DIMENSION
!> AND STORE IN SHAPE ARRAY
shape = [N21,N22,N23]
allocate(tmp(N21,N22,N23))
tmp(1:N11,1:N12,1:N13) = arr
deallocate(arr)
arr => tmp
!> TELL FTI ABOUT THE NEW DIMENSION
call FTI_Protect(0, arr, ierr)
! ...
!> SECOND CHECKPOINT
call FTI_Checkpoint(2,1, ierr)
! ...
For derived data-types:
! ...
use iso_c_binding
! ...
type polar
real :: radius
real :: phi
end type
type(FTI_Type) :: FTI_Polar
integer, parameter :: N1=128*102*25 !> 25 MB / Process
integer, parameter :: N2=128*102*50 !> 50 MB / Process
integer, parameter :: N11 = 128
integer, parameter :: N12 = 102
integer, parameter :: N13 = 25
integer, parameter :: N21 = 128
integer, parameter :: N22 = 102
integer, parameter :: N23 = 50
integer, target :: FTI_COMM_WORLD
integer :: ierr, status
type(polar), dimension(:,:,:), pointer :: arr
type(c_ptr) :: arr_c_ptr
type(polar), dimension(:,:,:), pointer :: tmp
integer, dimension(:), pointer :: shape
allocate(arr(N11,N12,N13))
allocate(shape(3))
!> INITIALIZE C POINTER
arr_c_ptr = c_loc( arr( &
lbound(arr,1), &
lbound(arr,2), &
lbound(arr,3)))
! ...
!> PROTECT DATA AND ITS SHAPE
call FTI_Protect(0, arr_c_ptr, size(arr), FTI_Polar, ierr)
call FTI_Protect(1, shape, ierr)
call FTI_Status(status)
!> EXECUTE ON RESTART
if ( status .eq. 1 ) then
!> REALLOCATE TO SIZE AT CHECKPOINT
call FTI_Realloc(0, arr_c_ptr, ierr)
call FTI_recover(ierr)
!> RESHAPE ARRAY
call c_f_pointer(arr_c_ptr, arr, shape)
! ...
end if
! ...
!> FIRST CHECKPOINT
call FTI_Checkpoint(1, 1, ierr)
! ...
!> CHANGE ARRAY DIMENSION
!> AND STORE IN SHAPE ARRAY
shape = [N21,N22,N23]
allocate(tmp(N21,N22,N23))
tmp(1:N11,1:N12,1:N13) = arr
deallocate(arr)
arr => tmp
! ...
!> UPDATE C POINTER BEFORE CALL TO 'FTI_Protect'
arr_c_ptr = c_loc( arr( &
lbound(arr,1), &
lbound(arr,2), &
lbound(arr,3)))
!> TELL FTI ABOUT THE NEW DIMENSION
call FTI_Protect(0, arr_c_ptr, size(arr), FTI_Polar, ierr)
! ...
!> SECOND CHECKPOINT
call FTI_Checkpoint(2,1, ierr)
! ...
- Returns the current status of the recovery flag.
DEFINITION
subroutine FTI_Status ( status )
ARGUMENTS
Variable | What for? | |
---|---|---|
integer status |
OUT | Token for status flag. |
OUTPUT
Value | Reason |
---|---|
0 |
No checkpoints taken yet or recovered successfully |
1 |
At least one checkpoint is taken. If execution fails, the next start will be a restart |
2 |
The execution is a restart from checkpoint level L4 and keep_last_checkpoint was enabled during the last execution |
DESCRIPTION
FTI_Status
returns the current status of the recovery flag.
EXAMPLE
call FTI_Status(status)
!> EXECUTE ON RESTART
if ( status .eq. 1 ) then
! ...
call FTI_recover(ierr)
! ...
end if
- Recovers the data of the protected variables from the checkpoint file.
DEFINITION
subroutine FTI_Recover ( err )
ARGUMENTS
Variable | What for? | |
---|---|---|
integer err |
OUT | Token for FTI error code. |
ERROR HANDLING
Value | Reason |
---|---|
FTI_SCES |
Success |
FTI_NSCS |
Failure |
DESCRIPTION
FTI_Recover
loads the data from the checkpoint file to the protected variables. It only recovers variables which are protected by a preceeding call to FTI_Protect. If a variable changes its size during execution, the proper amount of memory has to be allocated for that variable before the call to FTI_Recover
. FTI provides the API functions
FTI_GetStoredSize and
FTI_Realloc for this case.
EXAMPLE
see example of FTI_Status.
- Invokes the recovery of protected variables on a restart.
- Writes multilevel checkpoints regarding their requested frequencies during execution.
DEFINITION
subroutine FTI_Snapshot ( err )
ARGUMENTS
Variable | What for? | |
---|---|---|
integer err |
OUT | Token for FTI error code. |
ERROR HANDLING
Value | Reason |
---|---|
FTI_SCES |
Successfull call (without checkpointing) or if recovery successful |
FTI_NSCS |
Failure of FTI_Checkpoint
|
FTI_DONE |
Success of FTI_Checkpoint
|
FTI_NREC |
Failure on recovery |
DESCRIPTION
On a restart, FTI_Snapshot
loads the data from the checkpoint file to the protected variables. During execution it performs checkpoints according to the checkpoint frequencies for the various safety levels. The frequencies may be set in the configuration file (see e.g.: ckpt_L1).
FTI_Snapshot
can only take care of variables which are protected by a preceding call to FTI_Protect.
EXAMPLE
! ...
ptriter => iter
call FTI_Protect(0, ptriter, err)
call FTI_Protect(2, g, err)
call FTI_Protect(1, h, err)
do iter = 1, ITER_TIMES
call FTI_Snapshot(err)
call doWork(nbProcs, rank, g, h, localerror)
! ...
enddo
if ( rank == 0 ) then
print '("Execution finished in ",F9.0," seconds.")', MPI_Wtime() - wtime
endif
! ...
- Frees the allocated memory.
- Communicates the end of the execution to dedicated threads.
- Cleans checkpoints and metadata.
DEFINITION
subroutine FTI_Finalize ( err )
ARGUMENTS
Variable | What for? | |
---|---|---|
integer err |
OUT | Token for FTI error code. |
ERROR HANDLING
Value | Reason |
---|---|
FTI_SCES |
For application process |
exit(0) |
For FTI process (only if head == 1) |
DESCRIPTION
FTI_Finalize
notifies the FTI processes that the execution is over, frees
FTI internal data structures and it performs a clean up of the checkpoint folders at a normal execution. If the setting keep_last_ckpt
is set, it flushes local checkpoint files (if present) to the PFS. If the setting head
is set to 1, it will also terminate the FTI processes. It should be called before MPI_Finalize()
.
EXAMPLE
! ...
deallocate(h)
deallocate(g)
call FTI_Finalize(err)
call MPI_Finalize(err)
! ...