HPC resources differ significantly in their software stacks. For example, different resources (and the applications running on them) can use different compilers (gcc or icc) and MPI libraries (OpenMPI, MVAPICH, or a commercial variant). Furthermore, the same application can be compiled in a number of ways, and the resulting builds can differ greatly in how they are executed. Therefore, besides the general application kernel configuration, each application kernel needs to be configured separately for each resource.
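The overall strategy for deploying an application kernel to a resource is as follows: install the application or locate an existing installation, generate the initial configuration file, edit the configuration file, generate the batch job script and run it manually (optional), perform a validation run, and schedule regular execution.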
First, each step is described in general terms, as it applies to all application kernels; deployment details for individual example application kernels follow.
AKRR comes with two flavors of application kernels: one is based on real-world applications and the other on benchmarks. Real-world applications are often already installed system-wide on a resource for regular users; in that case, one purpose of the application kernel is to monitor the performance of a standard, frequently used application on that resource. Benchmarks, however, are rarely installed system-wide and thus need to be installed first.
The initial configuration file is generated with:
akrr app add -r <resource_name> -a <appkernel_name>
This command generates an initial configuration file and places it under _$AKRR_HOME/etc/resource/
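When several application kernels are deployed to the same resource, the generation step can be repeated in a loop. The following is a minimal sketch: the function name `add_appkernels` and the resource and application kernel names in the example call are hypothetical, and it assumes that `akrr` returns a non-zero exit status on failure. Only the `akrr app add -r ... -a ...` invocation is taken from this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: generate initial configuration files for several
# application kernels on one resource.
# Assumption: akrr exits with a non-zero status when generation fails.
add_appkernels() {
    local resource="$1"
    shift
    local appkernel
    for appkernel in "$@"; do
        echo "Generating initial configuration: ${appkernel} on ${resource}"
        # Stop at the first failure so the problem can be inspected.
        akrr app add -r "${resource}" -a "${appkernel}" || return 1
    done
}

# Example call (all names are placeholders):
# add_appkernels my_cluster appkernel_a appkernel_b
```

Each generated file still has to be edited by hand afterwards.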
The generated configuration file is fairly generic. In it, you need to specify the proper execution environment and how to execute this particular application kernel on this particular machine/resource.
The purpose of this step is to ensure that the configuration leads to a correct (and workable) batch job script. First, the batch job script is generated:
# only print batch job script
akrr task new --dry-run --gen-batch-job-only -r <resource_name> -a <appkernel_name> -n <number_of_nodes>
# generate batch job script and copy it to resource (without running it)
akrr task new --gen-batch-job-only -r <resource_name> -a <appkernel_name> -n <number_of_nodes>
Then this script is submitted manually or executed in an interactive session (which improves the turn-around in case of errors). If the script fails to execute, the issues can be fixed first in the script itself, and the corresponding changes can then be made in the configuration file.
This step is somewhat optional because it is very similar to the next step. However, working in an interactive session often improves the turn-around time because there is no need to wait in the queue for each iteration.
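To review the generated script before copying it to the resource, the dry-run output can be saved to a local file. The sketch below is illustrative only: `preview_batch_script` and the default file name are hypothetical, and it assumes that the dry-run command writes the script to standard output (possibly mixed with log messages) and returns a non-zero exit status on failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save the dry-run output to a local file for review.
# Assumption: the dry-run command prints the generated batch job script to
# standard output, possibly together with log messages.
preview_batch_script() {
    local resource="$1" appkernel="$2" nodes="$3"
    local out="${4:-batch_job_preview.txt}"
    akrr task new --dry-run --gen-batch-job-only \
        -r "${resource}" -a "${appkernel}" -n "${nodes}" > "${out}" || return 1
    # An empty file means nothing was generated; treat it as a failure.
    if [ ! -s "${out}" ]; then
        echo "No output captured in ${out}" >&2
        return 1
    fi
    echo "Dry-run output saved to ${out}"
}

# Example call (all names are placeholders):
# preview_batch_script my_cluster appkernel_a 2 preview.txt
```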
This step validates the application kernel installation on the resource:
akrr app validate -r <resource_name> -a <appkernel_name> -n <number_of_nodes>
It executes the application kernel and analyzes its results. If validation fails, the problems need to be fixed and another round of validation performed.
Finally, if validation was successful, the application kernel can be submitted for regular execution on that resource:
akrr task new -r <resource_name> -a <appkernel_name> -n <list of nodes counts> -p <periodicity> \
-s <first submit date-time>
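The last two steps can be chained so that regular execution is scheduled only after a successful validation. This is a minimal sketch rather than part of AKRR: `validate_then_schedule` is a hypothetical name, and it assumes that `akrr app validate` returns a non-zero exit status when validation fails. The node counts, periodicity, and first submit date-time are passed through unchanged, in whatever format `akrr task new` expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: schedule regular execution only if validation passes.
# Assumption: `akrr app validate` exits with a non-zero status on failure.
validate_then_schedule() {
    local resource="$1" appkernel="$2" test_nodes="$3"
    local node_counts="$4" periodicity="$5" first_submit="$6"
    if ! akrr app validate -r "${resource}" -a "${appkernel}" -n "${test_nodes}"; then
        echo "Validation failed; not scheduling ${appkernel} on ${resource}" >&2
        return 1
    fi
    # Values are forwarded verbatim; their formats are defined by akrr.
    akrr task new -r "${resource}" -a "${appkernel}" -n "${node_counts}" \
        -p "${periodicity}" -s "${first_submit}"
}
```

Keeping the check and the submission in one function avoids scheduling an application kernel whose configuration has not yet passed validation.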
Next: NAMD Deployment