AKRR: Deployment of GAMESS-US Application Kernel on a Resource

GAMESS is an ab initio computational chemistry software package developed by Professor Mark Gordon’s research group at Iowa State University.

The GAMESS application kernel can be used to monitor the performance of a system-wide GAMESS installation.

GAMESS is also linked against a linear algebra library and can use the file system for cache storage, so the application kernel can also serve as a probe for monitoring the performance of those components.

If nobody uses GAMESS at your site, you can skip this application kernel. However, unlike benchmarks (HPCC, IOR, IMB), which each stress a particular subsystem, the GAMESS application kernel works more like an integrated test, so the effect of a single degraded subsystem on it can be much milder.

For convenience during application kernel deployment, let’s define the APPKER and RESOURCE environment variables, which hold the application kernel name and the HPC resource name:

export RESOURCE=<resource_name>
export APPKER=gamess

Install Application or Identify where it is Installed

GAMESS is very often installed system-wide. One purpose of application kernels is to monitor the performance of applications as they are used by regular users. If GAMESS is not installed, you can either install it on your resource or opt not to use this application kernel.

The majority of HPC resources use some kind of module system. Use it to check whether GAMESS is already installed:

On resource

module avail gamess
------------------------- /util/academic/modulefiles/Core ------------------------------------
   gamess/5dec2014R1-ddi    gamess/5dec2014R1    gamess/11Nov2017R3    gamess/18Aug2016R1 (D)

We’ll use gamess/11Nov2017R3. Running GAMESS is somewhat different from running other applications: we need to find the location of the binary and its version. The output of module show should contain some of this information:

module show gamess/11Nov2017R3
whatis(" GAMESS")
prepend_path("PATH","/util/academic/gamess/11Nov2017R3/impi/gamess")
load("intel/18.1")
load("intel-mpi/2018.1")
load("mkl/2018.1")

By listing the directory prepended to PATH:

ls /util/academic/gamess/11Nov2017R3/impi/gamess

we find that the executable is gamess.01.x, that it lives in /util/academic/gamess/11Nov2017R3/impi/gamess, and that the version number is 01 (the characters between the dots in gamess.01.x).
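The version string can also be derived from the file name instead of being read by eye. The following is a minimal sketch, assuming the executable follows the gamess.<VERNO>.x naming pattern seen above; the helper name gamess_verno is hypothetical and not part of AKRR or GAMESS.

```shell
# Hypothetical helper (not part of AKRR or GAMESS): print the VERNO part
# of a GAMESS executable name, assuming the form gamess.<VERNO>.x.
gamess_verno() (
    name=$(basename "$1")   # strip the directory part
    name=${name#gamess.}    # drop the leading "gamess."
    name=${name%.x}         # drop the trailing ".x"
    printf '%s\n' "$name"
)

# Example with the path found above
gamess_verno /util/academic/gamess/11Nov2017R3/impi/gamess/gamess.01.x
```

For the path above this prints 01, the value later used for VERNO.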

GAMESS is started with the rungms script, which almost always needs to be modified, usually to set the scratch locations. GAMESS stores its final results there and will not run again if they are still present, which is a problem because AKRR reruns the same job repeatedly.

Copy rungms to the application kernel directory on the remote resource:

# AKRR_APPKER_DIR should be set in .bashrc; it is placed there during initial deployment on the resource
cd $AKRR_APPKER_DIR/execs
mkdir gamess
cd gamess
cp /util/academic/gamess/11Nov2017R3/impi/gamess/rungms ./

Set SCR and USERSCR to the current working directory in your copy of rungms (both are set near the beginning of the script):

set SCR=`pwd`
set USERSCR=`pwd`

AKRR starts GAMESS from a temporary directory, which is deleted after the results are downloaded, so each run starts with empty scratch directories.
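The edit to rungms can also be scripted. The sketch below assumes that your copy of rungms assigns each variable on a single line beginning with "set SCR=" or "set USERSCR=" (optionally indented). Site-specific rungms scripts vary, so review the result by hand. The helper name patch_rungms_scratch is hypothetical.

```shell
# Hypothetical helper: point SCR and USERSCR at the current directory
# in a copy of rungms. Assumes one "set SCR=..." / "set USERSCR=..."
# assignment per line; any leading indentation is preserved.
patch_rungms_scratch() {
    rungms_file=$1
    # Keep the unmodified script next to the edited one
    cp "$rungms_file" "$rungms_file.orig"
    # Backticks stay literal inside single quotes, so rungms (a csh script)
    # evaluates `pwd` at run time, not now
    sed -e 's|^\([[:space:]]*\)set SCR=.*|\1set SCR=`pwd`|' \
        -e 's|^\([[:space:]]*\)set USERSCR=.*|\1set USERSCR=`pwd`|' \
        "$rungms_file.orig" > "$rungms_file"
}

# Usage, on the remote resource:
#   patch_rungms_scratch "$AKRR_APPKER_DIR/execs/gamess/rungms"
```

Writing back with a redirect keeps the permissions of the existing rungms copy, and the .orig file makes it easy to diff the change.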

Generate Initial Configuration File

Generate the initial configuration file:

On AKRR server

akrr app add -a $APPKER -r $RESOURCE

Sample output:

[INFO] Generating application kernel configuration for gamess on ub-hpc
[INFO] Application kernel configuration for gamess on ub-hpc is in: 
        /home/akrruser/akrr/etc/resources/ub-hpc/gamess.app.conf

Edit Configuration File

Below is a listing of an example configuration file located at ~/akrr/etc/resources/$RESOURCE/gamess.app.conf:

~/akrr/etc/resources/$RESOURCE/gamess.app.conf

appkernel_run_env_template = """
# Load application enviroment
module load gamess
module list

# set executable location
VERNO=01
EXE=$GAMESS_DIR/gamess.$VERNO.x

# set how to run app kernel
RUN_APPKERNEL="$AKRR_APPKER_DIR/execs/gamess/rungms $INPUT $VERNO $AKRR_CORES"
"""

It contains only one parameter, appkernel_run_env_template, which needs to be edited:

1) The first part is “Load application environment”. Here you need to set up the proper environment. For example:

# Load application environment
module load gamess/11Nov2017R3
module list

2) The second part is “set executable location”. The absolute path to the GAMESS executable should be placed in the EXE variable (the application signature will be calculated for that executable), and the version in VERNO. Use the values found earlier. For example:

# set executable location
VERNO=01
EXE=/util/academic/gamess/11Nov2017R3/impi/gamess/gamess.$VERNO.x

3) The third part is “set how to run app kernel”. It sets RUN_APPKERNEL, which specifies how to execute GAMESS:

# set how to run app kernel
RUN_APPKERNEL="$AKRR_APPKER_DIR/execs/gamess/rungms $INPUT $VERNO $AKRR_NODES $AKRR_CORES_PER_NODE"

Most likely you don’t need to modify anything here; just ensure that rungms refers to your copy, which was modified earlier.
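Before generating a batch job, it can help to confirm on the remote resource that both paths used in the configuration point at executable files. This is a minimal sketch; the function name check_gamess_conf is hypothetical, and the paths in the usage comment are the example values from this page.

```shell
# Hypothetical pre-flight check for the paths used in gamess.app.conf.
check_gamess_conf() {
    conf_exe=$1      # value of EXE with $VERNO already substituted
    conf_rungms=$2   # rungms copy referenced by RUN_APPKERNEL
    conf_status=0
    if [ ! -f "$conf_exe" ] || [ ! -x "$conf_exe" ]; then
        echo "EXE is not an executable file: $conf_exe" >&2
        conf_status=1
    fi
    if [ ! -f "$conf_rungms" ] || [ ! -x "$conf_rungms" ]; then
        echo "rungms is not an executable file: $conf_rungms" >&2
        conf_status=1
    fi
    return $conf_status
}

# Usage, on the remote resource:
#   check_gamess_conf /util/academic/gamess/11Nov2017R3/impi/gamess/gamess.01.x \
#       "$AKRR_APPKER_DIR/execs/gamess/rungms"
```

Both paths are checked before returning, so a single call reports every problem found.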

Generate Batch Job Script and Execute it Manually (Optional) 

The purpose of this step is to ensure that the configuration leads to a correct, workable batch job script. First, the batch job script is generated with “akrr task new --gen-batch-job-only”. Then the script is executed in an interactive session (this improves turnaround in case of errors). If the script fails to execute, the issues can be fixed in the script itself first and then merged into the configuration file.

This step is optional because it is very similar to the next one. However, working in an interactive session improves turnaround time because there is no need to wait in the queue for each iteration.

First generate the script to standard output and examine it:

akrr task new --dry-run --gen-batch-job-only -n 2 -r $RESOURCE -a $APPKER

Portion of “akrr task new --dry-run --gen-batch-job-only -n 2 -r $RESOURCE -a $APPKER” output showing the generated batch script

#!/bin/bash
#SBATCH --partition=general-compute 
#SBATCH --qos=general-compute
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --time=00:20:00
#SBATCH --output=/projects/ccrstaff/general/nikolays/huey/akrr_data/gamess/2019.04.29.20.03.58.405597/stdout
#SBATCH --error=/projects/ccrstaff/general/nikolays/huey/akrr_data/gamess/2019.04.29.20.03.58.405597/stderr
#SBATCH --constraint="CPU-L5520"
#SBATCH --exclusive


#Common commands
export AKRR_NODES=2
export AKRR_CORES=16
export AKRR_CORES_PER_NODE=8
export AKRR_NETWORK_SCRATCH="/projects/ccrstaff/general/nikolays/huey/tmp"
export AKRR_LOCAL_SCRATCH="/tmp"
export AKRR_TASK_WORKDIR="/projects/ccrstaff/general/nikolays/huey/akrr_data/gamess/2019.04.29.20.03.58.405597"
export AKRR_APPKER_DIR="/projects/ccrstaff/general/nikolays/huey/appker"
export AKRR_AKRR_DIR="/projects/ccrstaff/general/nikolays/huey/akrr_data"

export AKRR_APPKER_NAME="gamess"
export AKRR_RESOURCE_NAME="ub-hpc"
export AKRR_TIMESTAMP="2019.04.29.20.03.58.405597"
export AKRR_APP_STDOUT_FILE="$AKRR_TASK_WORKDIR/appstdout"

export AKRR_APPKERNEL_INPUT="/projects/ccrstaff/general/nikolays/huey/appker/inputs/gamess/c8h10-cct-mp2.inp"
export AKRR_APPKERNEL_EXECUTABLE="/projects/ccrstaff/general/nikolays/huey/appker/execs"

source "$AKRR_APPKER_DIR/execs/bin/akrr_util.bash"

#Populate list of nodes per MPI process
export AKRR_NODELIST=`srun -l --ntasks-per-node=$AKRR_CORES_PER_NODE -n $AKRR_CORES hostname -s|sort -n| awk '{printf "%s ",$2}' `

export PATH="$AKRR_APPKER_DIR/execs/bin:$PATH"

cd "$AKRR_TASK_WORKDIR"

#run common tests
akrr_perform_common_tests

#Write some info to gen.info, JSON-Like file
akrr_write_to_gen_info "start_time" "`date`"
akrr_write_to_gen_info "node_list" "$AKRR_NODELIST"


#create working dir
export AKRR_TMP_WORKDIR=`mktemp -d /projects/ccrstaff/general/nikolays/huey/tmp/gamess.XXXXXXXXX`
echo "Temporary working directory: $AKRR_TMP_WORKDIR"
cd $AKRR_TMP_WORKDIR

#Copy inputs
cp /projects/ccrstaff/general/nikolays/huey/appker/inputs/gamess/c8h10-cct-mp2.inp ./
INPUT=$(echo /projects/ccrstaff/general/nikolays/huey/appker/inputs/gamess/c8h10-cct-mp2.inp | xargs basename )



#Load application enviroment
module load gamess/11Nov2017R3
module list

#set executable location
VERNO=01
EXE=/util/academic/gamess/11Nov2017R3/impi/gamess/gamess.$VERNO.x

#set how to run app kernel
RUN_APPKERNEL="$AKRR_APPKER_DIR/execs/gamess/rungms $INPUT $VERNO $AKRR_NODES $AKRR_CORES_PER_NODE"


#Generate AppKer signature
appsigcheck.sh $EXE $AKRR_TASK_WORKDIR/.. > $AKRR_APP_STDOUT_FILE


ATTEMPTS_TO_LAUNCH=0
while ! grep -q "EXECUTION OF GAMESS TERMINATED NORMALLY" $AKRR_APP_STDOUT_FILE
do
    echo "Attempt to launch GAMESS: $ATTEMPTS_TO_LAUNCH" >> $AKRR_APP_STDOUT_FILE 2>&1
    echo "Attempt to launch GAMESS: $ATTEMPTS_TO_LAUNCH"
    rm -rf *
    mkdir scr
    mkdir supout
    cp /projects/ccrstaff/general/nikolays/huey/appker/inputs/gamess/c8h10-cct-mp2.inp ./
    $RUN_APPKERNEL >> $AKRR_APP_STDOUT_FILE 2>&1
    
    if [ "$ATTEMPTS_TO_LAUNCH" -ge 6 ]; then
        break
    fi
    
    ((ATTEMPTS_TO_LAUNCH++))
done
akrr_write_to_gen_info "attemptsToLaunch" "$ATTEMPTS_TO_LAUNCH"
echo "Total attempt to launch GAMESS is $ATTEMPTS_TO_LAUNCH"




#clean-up
cd $AKRR_TASK_WORKDIR
if [ "${AKRR_DEBUG=no}" = "no" ]
then
        echo "Deleting temporary files"
        rm -rf $AKRR_TMP_WORKDIR
else
        echo "Copying temporary files"
        cp -r $AKRR_TMP_WORKDIR workdir
        rm -rf $AKRR_TMP_WORKDIR
fi



akrr_write_to_gen_info "end_time" "`date`"
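
When examining the generated script, the main thing to confirm is that the settings from gamess.app.conf made it into the script. The following sketch assumes the dry-run output was saved to a file; the function name check_dry_run and the file name in the usage comment are hypothetical, and the function is not an AKRR command.

```shell
# Hypothetical check (not an AKRR command): confirm that saved dry-run
# output contains the settings entered in gamess.app.conf.
check_dry_run() {
    dry_run_file=$1   # file holding the saved dry-run output
    module_name=$2    # e.g. gamess/11Nov2017R3
    exe_value=$3      # text expected after EXE=, passed unexpanded
    for expected in "module load $module_name" "EXE=$exe_value" "RUN_APPKERNEL="; do
        # -F matches the text literally, so "$" and "." are not special
        if ! grep -qF -- "$expected" "$dry_run_file"; then
            echo "not found in $dry_run_file: $expected" >&2
            return 1
        fi
    done
}

# Usage sketch, on the AKRR server:
#   akrr task new --dry-run --gen-batch-job-only -n 2 -r $RESOURCE -a $APPKER > gamess_dry_run.txt
#   check_dry_run gamess_dry_run.txt gamess/11Nov2017R3 \
#       '/util/academic/gamess/11Nov2017R3/impi/gamess/gamess.$VERNO.x'
```

The EXE value is passed in single quotes so that $VERNO is compared as literal text, which is how it appears in the generated script.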

Next, generate the script on the resource:

akrr task new --gen-batch-job-only -n 2 -r $RESOURCE -a $APPKER
[INFO] Creating task directory: /home/akrruser/akrr/log/data/ub-hpc/gamess/2019.04.29.20.04.39.697226
[INFO] Creating task directories: 
        /home/akrruser/akrr/log/data/ub-hpc/gamess/2019.04.29.20.04.39.697226/jobfiles
        /home/akrruser/akrr/log/data/ub-hpc/gamess/2019.04.29.20.04.39.697226/proc
[INFO] Creating batch job script and submitting it to remote machine
[INFO] Directory huey:/projects/ccrstaff/general/nikolays/huey/akrr_data/gamess does not exists, will try to create it
[INFO] Directory huey:/projects/ccrstaff/general/nikolays/huey/akrr_data/gamess/2019.04.29.20.04.39.697226 does not exists, will try to create it
[INFO] auto_walltime_limit is on, trying to estimate walltime limit...
[INFO] There are only 0 previous run, need at least 5 for walltime limit autoset
[INFO] Local copy of batch job script is /home/akrruser/akrr/log/data/ub-hpc/gamess/2019.04.29.20.04.39.697226/jobfiles/gamess.job

[INFO] Application kernel working directory on ub-hpc is /projects/ccrstaff/general/nikolays/huey/akrr_data/gamess/2019.04.29.20.04.39.697226
[INFO] Batch job script location on ub-hpc is /projects/ccrstaff/general/nikolays/huey/akrr_data/gamess/2019.04.29.20.04.39.697226/gamess.job

The output contains the working directory for this task on the remote resource. On the remote resource, change to that directory and start an interactive session (request the same number of nodes; in the example above the script was generated for 2 nodes).

On remote resource

#get to working directory
cd /projects/ccrstaff/general/nikolays/huey/akrr_data/gamess/2019.04.29.20.04.39.697226
#check gamess.job is there
ls
#start interactive session
salloc --nodes=2 --ntasks-per-node=8 --time=01:00:00 --exclusive --constraint="CPU-L5520"
#wait till you get access to interactive session
#run gamess application kernel
bash gamess.job

# or submit as normal batch script
sbatch gamess.job

#examine output
cat appstdout

Examine the appstdout file, which contains the application kernel output (appstdout sample). If it looks OK, you can move on to the next step.
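
A quick way to judge whether appstdout looks OK is to search for the same completion marker that the generated batch script checks in its retry loop. This is a minimal sketch; the function name gamess_finished_ok is hypothetical.

```shell
# Hypothetical helper: succeed only if the GAMESS completion marker is present.
# The marker is the string the generated batch script itself greps for.
gamess_finished_ok() {
    if grep -q "EXECUTION OF GAMESS TERMINATED NORMALLY" "$1"; then
        echo "GAMESS terminated normally"
    else
        echo "completion marker not found in $1" >&2
        return 1
    fi
}

# Usage, from the task working directory on the remote resource:
#   gamess_finished_ok appstdout
```

The marker only shows that GAMESS ran to completion; it is still worth reading the file for warnings and timings.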

Perform Validation Run

In this step the validation utility (akrr app validate) is used to validate the application kernel installation on a particular resource. It executes the application kernel and analyzes its results. If it fails, the problems need to be fixed and another round of validation should be performed.

akrr app validate -n 2 -r $RESOURCE -a $APPKER 

See validation output sample

Schedule Regular Execution of Application Kernel

Now this application kernel can be submitted for regular execution:

Perform a test run on all node counts

#Perform a test run on all node counts
akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8

#Start daily execution from today on nodes 1,2,4,8 and distribute execution time between 1:00 and 5:00
akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 -t0 "01:00" -t1 "05:00" -p 1

# Run on all node counts 20 times (default number of runs to establish baseline)
akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 --n-runs 20

See Scheduling and Rescheduling Application Kernels and Setup Walltime Limit for more details.
