
Running GEOS on Prism's GH200

Basics

  • Go to a GH node
    • ssh into prism
    • ssh into gpulogin1
    • salloc with salloc -p grace --nodes=1 --gpus-per-node=1
  • Use the preset modules at `/explore/nobackup/people/fgdeconi/work/modules/module` (see the session sketch after the warning below)

⚠️ Always salloc onto a GH node - login nodes are x86 but the GH nodes are ARM64 ⚠️
⚠️ Load the modules after ssh'ing into the GH box! ⚠️
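A minimal end-to-end session sketch (hostnames and salloc options are taken from the list above; the module path is shown truncated in the list, so it is kept as a placeholder here, assuming it is a modulefiles directory added via module use):

ssh prism                                        # login node (x86)
ssh gpulogin1
salloc -p grace --nodes=1 --gpus-per-node=1      # lands you on an ARM64 GH node
# Only now, on the GH node, make the preset modules available
module use <preset-modules-path-from-the-list-above>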

Building GEOS

Module dsl/build_geos loads all required modules for building GEOS:

  • Compilers are gcc/g++/gfortran
  • The NVIDIA suite is not loaded on purpose; otherwise mpicc/mpic++/mpif90 would default to NVIDIA's compilers, which are not capable of compiling GEOS
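A quick sanity check of the build environment (a sketch only; the GEOS build itself follows the project's usual workflow and is not reproduced here):

module load dsl/build_geos     # pulls in gcc/g++/gfortran and the rest of the build stack
which gcc g++ gfortran         # confirm the GNU compilers are the ones on PATH
mpicc --version                # the MPI wrapper should report GCC, not nvc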

Running GEOS

Once GEOS is built, running GEOS + NDSL requires some fiddling with the compilers.

First, we load & install the stack with proper NVIDIA support:

  • load nvidia/12.8: your default compilers are now the NVHPC suite (nvc/nvc++/nvcc). Those are faulty for C/C++, so we will need to enforce GNU later
  • load/create a conda environment with Python 3.9. Once the stack is installed, re-install MPI4Py with the following:
# Rebuild mpi4py so it links against the correct MPI
CC=nvc CFLAGS="-noswitcherror" pip install --force --no-cache-dir --no-binary=mpi4py mpi4py
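
To confirm the rebuild linked against the intended MPI, a quick check (the exact output depends on your environment):

python -c "import mpi4py; print(mpi4py.get_config())"   # shows the mpicc/MPI mpi4py was built with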

We can then run GEOS with the following env modifications:

  • In your gcm_run.j (or before running), export the following to force the compilers & fix the UCX interface

For GPU backends

#CSH-style
setenv CC gcc # restore C compiler to GNU
setenv CXX g++ # restore C++ compiler to GNU
setenv CUDA_HOST_CXX g++ # force nvcc host compiler to GNU (GT4Py specific)
setenv CUDA_HOME $NVHPC_ROOT/cuda # help GT4Py find the nvcc binary
setenv UCX_NET_DEVICES mlx5_2:1

For CPU backends, use the same settings but comment out the CUDA_HOST_CXX line.
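If you export these outside of gcm_run.j in a bash shell, the equivalent settings would be (same values as the csh block above; a sketch assuming a bash session with nvidia/12.8 loaded):

export CC=gcc                        # restore C compiler to GNU
export CXX=g++                       # restore C++ compiler to GNU
export CUDA_HOST_CXX=g++             # GPU backends only; omit for CPU backends
export CUDA_HOME=$NVHPC_ROOT/cuda    # help GT4Py find the nvcc binary
export UCX_NET_DEVICES=mlx5_2:1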

To run nsys with low overhead, you can prefix the mpirun call with:

# --output:           unique output file name per host and PID
# --trace:            trace only the cuda and nvtx events
# --cuda-event-trace: deactivated to reduce overhead
# --sample=none:      no CPU sampling (less overhead)
# --cpuctxsw=none:    no process-switch tracking (less overhead)
# --stats=true:       optional, bigger files but prints some stats
nsys profile --output report_%h_%p.nsys-rep \
             --trace="cuda,nvtx" \
             --cuda-event-trace=false \
             --sample=none \
             --cpuctxsw=none \
             --stats=true \
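
For example, a fully assembled command might look like this (the mpirun arguments and executable name are placeholders, not taken from the GEOS run scripts):

nsys profile --output report_%h_%p.nsys-rep --trace="cuda,nvtx" \
             --cuda-event-trace=false --sample=none --cpuctxsw=none \
             mpirun -np 6 ./GEOSgcm.x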

Common issues

  • I hit a @GLIBCXX_3.4.32 cannot load library error: your GCC module is the x86 one; reload it with `module reload gcc/14.2.0`

  • I hit an --ccbin unknown option error: you have CUDA_HOST_CXX set in your environment, and GT4Py is misdirecting the GPU host-compiler flag onto GCC. Unset the variable.