Running GEOS on Prism's GH200
Basics
- Go to a GH node:
  - `ssh` into `prism`
  - `ssh` into `gpulogin1`
  - allocate a node with `salloc -p grace --nodes=1 --gpus-per-node=1`
- Use the preset modules at `/explore/nobackup/people/fgdeconi/work/modules/module`
⚠️ Always `salloc` onto a GH node: login nodes are x86, but GH nodes are ARM64. ⚠️
⚠️ Load modules only after you ssh into the GH node! ⚠️
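Because modules loaded on the wrong architecture are the root of several problems below, a small guard before `module load` can help. A minimal sketch, assuming a bash shell; `require_arm64` is a hypothetical helper name. The interactive steps are shown as comments. `$SLURM_JOB_NODELIST` is the standard Slurm variable naming the allocated node, but whether you need to ssh to it or `salloc` already puts you there depends on the site's Slurm configuration (assumption).

```bash
#!/usr/bin/env bash
# Interactive steps from this page, shown as comments:
#   ssh prism
#   ssh gpulogin1
#   salloc -p grace --nodes=1 --gpus-per-node=1
#   ssh "$SLURM_JOB_NODELIST"   # assumption: single-node allocation, salloc left you on the login node

# Guard: succeed only on an ARM64 (aarch64) machine such as a GH200 node.
# An optional argument overrides `uname -m` (handy for testing).
require_arm64() {
  local arch="${1:-$(uname -m)}"
  if [ "$arch" != "aarch64" ]; then
    echo "error: on '$arch' (login nodes are x86); salloc/ssh onto a GH node first" >&2
    return 1
  fi
}

# Example: only load the build modules once on the GH node.
#   require_arm64 && module load dsl/build_geos
```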
Building GEOS
Module `dsl/build_geos` loads all the modules required to build GEOS:
- Compilers are `gcc`/`g++`/`gfortran`
- The `nvidia` suite is deliberately not loaded; otherwise the MPI wrappers (`mpicc/c++/f90`) would default to NVIDIA's compilers, which cannot compile GEOS
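To confirm the MPI wrappers really forward to GNU before starting a long build, their `--version` banner can be inspected. A sketch under assumptions: the wrappers forward `--version` to the underlying compiler (as Open MPI and MPICH wrappers typically do), GNU banners contain `GCC)` (e.g. `gcc (GCC) 14.2.0`), and the wrapper names `mpicc`, `mpicxx`, `mpif90` are assumed defaults. `banner_is_gnu` and `check_mpi_wrappers` are hypothetical names.

```bash
# Succeeds if the compiler banner read on stdin is from GNU
# (matches "(GCC)" as well as vendor tags such as "(Spack GCC)").
banner_is_gnu() {
  grep -qF 'GCC)'
}

# Check that each MPI wrapper (default: mpicc mpicxx mpif90) forwards to GNU.
# Returns non-zero if any wrapper reports a non-GNU compiler or fails to run.
check_mpi_wrappers() {
  local w status=0
  local wrappers=("$@")
  if [ "${#wrappers[@]}" -eq 0 ]; then
    wrappers=(mpicc mpicxx mpif90)
  fi
  for w in "${wrappers[@]}"; do
    if "$w" --version 2>/dev/null | banner_is_gnu; then
      echo "ok: $w wraps a GNU compiler"
    else
      echo "warning: $w does not wrap GNU (is the nvidia module loaded?)" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run `check_mpi_wrappers` right after `module load dsl/build_geos`; a warning usually means an `nvidia` module slipped into the environment.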
Running GEOS
Once GEOS is built, running GEOS + NDSL requires some juggling of compilers.
First, load and install the stack with proper NVIDIA support:
- Load `nvidia/12.8`: your default compilers are now the NVHPC suite (`nvc`/`nvc++`/`nvcc`). These are faulty for C/C++, so GNU will have to be enforced at run time.
- Load or create a conda environment with Python 3.9.
- Once the stack is installed, reinstall mpi4py with the following:
```bash
# Rebuild mpi4py from source so that it links against the correct MPI
CC=nvc CFLAGS="-noswitcherror" pip install --force-reinstall --no-cache-dir --no-binary=mpi4py mpi4py
```
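Two quick checks can confirm this step worked: that the active Python is the 3.9 environment, and which MPI library the rebuilt mpi4py reports. A sketch, assuming bash and that it runs inside the allocation; `require_python39` and `show_mpi4py_mpi` are hypothetical names, and the conda environment name is made up. `MPI.Get_library_version()` is mpi4py's wrapper for `MPI_Get_library_version`.

```bash
# Creating the environment (environment name is hypothetical):
#   conda create -n geos-ndsl python=3.9
#   conda activate geos-ndsl

# Succeed only if the given "major.minor" (default: the active python's) is 3.9.
require_python39() {
  local ver="${1:-$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')}"
  if [ "$ver" != "3.9" ]; then
    echo "error: python $ver is active; expected the 3.9 conda environment" >&2
    return 1
  fi
}

# Print the version string of the MPI library mpi4py is linked against;
# after the rebuild it should name the MPI provided by the loaded modules.
show_mpi4py_mpi() {
  python -c 'from mpi4py import MPI; print(MPI.Get_library_version())'
}

# Example:
#   require_python39 && show_mpi4py_mpi
```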
GEOS can then be run with the following environment modifications:
- In your `gcm_run.j` (or in your shell before running), set the following to force the GNU compilers and fix the UCX network interface.
For GPU backends:
```csh
# csh-style
setenv CC gcc                      # restore C compiler to GNU
setenv CXX g++                     # restore C++ compiler to GNU
setenv CUDA_HOST_CXX g++           # force nvcc host compiler to GNU (GT4Py specific)
setenv CUDA_HOME $NVHPC_ROOT/cuda  # help GT4Py find the nvcc binary
setenv UCX_NET_DEVICES mlx5_2:1    # pin UCX to device mlx5_2, port 1
```
For CPU backends, use the same settings but comment out `CUDA_HOST_CXX`.
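For launches from a bash session rather than the csh `gcm_run.j`, the same settings can be wrapped in one function that also handles the GPU/CPU difference. A sketch only: `geos_ndsl_env` is a hypothetical name, and it assumes `NVHPC_ROOT` has been set by loading `nvidia/12.8`, which the csh lines above also rely on.

```bash
# Bash counterpart of the csh settings above.
# Usage: geos_ndsl_env gpu   or   geos_ndsl_env cpu
geos_ndsl_env() {
  local backend="${1:-gpu}"
  if [ -z "${NVHPC_ROOT:-}" ]; then
    echo "error: NVHPC_ROOT is empty; load nvidia/12.8 first" >&2
    return 1
  fi
  export CC=gcc                        # restore C compiler to GNU
  export CXX=g++                       # restore C++ compiler to GNU
  export CUDA_HOME="$NVHPC_ROOT/cuda"  # help GT4Py find the nvcc binary
  export UCX_NET_DEVICES=mlx5_2:1      # pin UCX to device mlx5_2, port 1
  case "$backend" in
    gpu) export CUDA_HOST_CXX=g++ ;;   # nvcc host compiler -> GNU (GT4Py specific)
    cpu) unset CUDA_HOST_CXX ;;        # left unset for CPU backends
    *)   echo "error: backend must be 'gpu' or 'cpu'" >&2; return 1 ;;
  esac
}
```

Calling `geos_ndsl_env cpu` after a GPU session also clears a leftover `CUDA_HOST_CXX`, which avoids the `--ccbin` issue listed under Common issues.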
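To run `nsys` with low overhead, prefix the `mpirun` call with the following. The options are described in comments first because a comment cannot follow a line-continuation `\`; the final `\` continues onto your `mpirun` command.
```bash
# --output ...           output file name, unique per host (%h) and process (%p)
# --trace="cuda,nvtx"    trace only CUDA and NVTX events
# --cuda-event-trace=false  deactivated to reduce overhead
# --sample=none          no CPU sampling (less overhead)
# --cpuctxsw=none        no context-switch tracking (less overhead)
# --stats=true           optional: bigger files, but prints some stats
nsys profile --output report_%h_%p.nsys-rep \
    --trace="cuda,nvtx" \
    --cuda-event-trace=false \
    --sample=none \
    --cpuctxsw=none \
    --stats=true \
```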
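When the launch is scripted in bash, a bash array is one way to keep these options in one place and make the optional `--stats=true` switchable. A minimal sketch; `build_nsys_prefix`, `NSYS_PREFIX` and the `nostats` argument are hypothetical names, and the MPI launch line in the comment is a placeholder.

```bash
# Fill the global array NSYS_PREFIX with the low-overhead nsys options.
# Pass "nostats" to drop the optional --stats=true (smaller report files).
build_nsys_prefix() {
  NSYS_PREFIX=(nsys profile
    --output "report_%h_%p.nsys-rep"
    --trace=cuda,nvtx
    --cuda-event-trace=false
    --sample=none
    --cpuctxsw=none)
  if [ "${1:-}" != "nostats" ]; then
    NSYS_PREFIX+=(--stats=true)
  fi
}

# Usage (placeholder launch line):
#   build_nsys_prefix
#   "${NSYS_PREFIX[@]}" mpirun <your usual arguments>
```

Expanding `"${NSYS_PREFIX[@]}"` keeps each option a separate word, so `report_%h_%p.nsys-rep` reaches nsys unchanged for it to substitute the host name and PID.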
Common issues
- I hit a `@GLIBCXX_3.4.32 cannot load library` error: your GCC module is the x86 one; reload it with `module reload gcc/14.2.0`.
- I hit a `--ccbin unknown options` error: you have `CUDA_HOST_CXX` set in your environment, and GT4Py is misdirecting the GPU host-linker flag onto GCC. Unset the variable.
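A rough first check for both issues, assuming bash; `gcc_is_arm64` and `warn_cuda_host_cxx` are hypothetical helper names. `gcc -dumpmachine` prints the compiler's target triple (e.g. `aarch64-...` on ARM64). If the x86 module's `gcc` is active on a GH node it will either report an `x86_64` triple or fail to run, and both cases are flagged. This does not inspect which `libstdc++` is picked up at run time, so treat it as a first check only.

```bash
# Succeeds only if the target triple (default: `gcc -dumpmachine`) is ARM64.
gcc_is_arm64() {
  local triple="${1:-$(gcc -dumpmachine 2>/dev/null)}"
  case "$triple" in
    aarch64*) return 0 ;;
    *) echo "gcc targets '$triple': reload the ARM64 gcc module on the GH node" >&2
       return 1 ;;
  esac
}

# Fails (with a hint) when CUDA_HOST_CXX is set, the cause of the
# "--ccbin unknown options" error.
warn_cuda_host_cxx() {
  if [ -n "${CUDA_HOST_CXX+x}" ]; then
    echo "CUDA_HOST_CXX=$CUDA_HOST_CXX is set; unset it if you hit --ccbin errors" >&2
    return 1
  fi
}
```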
GH200 Timings on C180 Problem
Timing breakdown (in seconds) of the GEOS run region executing Fortran-only code. Leading dashes give the nesting depth; inclusive time counts child timers, exclusive time does not.
```
Name                        Cycles   Incl. (s)  Incl. %   Excl. (s)  Excl. %
--Run                            1   30131.367    99.87       1.082     0.00
----EXTDATA                   1011       0.050     0.00       0.050     0.00
----GCM                       2022   29574.395    98.02       3.537     0.01
------AGCM                    2022   29561.707    97.98     193.294     0.64
--------SUPERDYNAMICS         3033    8878.824    29.43      11.887     0.04
----------DYN                 3033    8866.937    29.39    8866.937    29.39
--------PHYSICS               2022   20489.447    67.91     176.392     0.58
----------GWD                 2022     331.073     1.10     331.073     1.10
----------MOIST               2022    7782.298    25.79    7782.298    25.79
----------TURBULENCE          3033    1054.626     3.50    1054.626     3.50
----------CHEMISTRY           3033    1505.943     4.99       2.289     0.01
------------CHEMENV           3033     124.704     0.41     124.704     0.41
------------HEMCO             3033     188.383     0.62     188.383     0.62
------------PCHEM             2022     424.569     1.41     424.569     1.41
------------ACHEM             2022     106.343     0.35     106.343     0.35
------------GOCART            3033      38.127     0.13      38.127     0.13
------------GOCART2G          3033      29.287     0.10       0.536     0.00
--------------DU.data         2022       7.089     0.02       7.089     0.02
--------------SS.data         2022       7.081     0.02       7.081     0.02
--------------CA.oc.data      2022       2.933     0.01       2.933     0.01
--------------CA.bc.data      2022       2.915     0.01       2.915     0.01
--------------CA.br.data      2022       2.914     0.01       2.914     0.01
--------------SU.data         2022       1.510     0.01       1.510     0.01
--------------NI.data         2022       4.309     0.01       4.309     0.01
------------TR                2022     592.240     1.96     592.240     1.96
----------SURFACE             3033     738.635     2.45      83.314     0.28
------------SALTWATER         3033      37.756     0.13       3.160     0.01
--------------SEAICETHERMO    3033      11.346     0.04      11.346     0.04
--------------OPENWATER       3033      23.251     0.08      23.251     0.08
------------LAKE              3033       5.131     0.02       5.131     0.02
------------LANDICE           3033       1.075     0.00       1.075     0.00
------------LAND              3033     611.358     2.03       0.337     0.00
--------------VEGDYN          2022       5.277     0.02       5.277     0.02
--------------CATCH           3033     605.743     2.01     605.743     2.01
----------RADIATION           2022    8900.481    29.50      28.077     0.09
------------SOLAR             2022    4970.295    16.47    4970.295    16.47
------------IRRAD             2022    3407.428    11.29    3407.428    11.29
------------SATSIM            2022     494.682     1.64     494.682     1.64
--------ORBIT                 2022       0.142     0.00       0.142     0.00
------AIAU                    1011       0.057     0.00       0.057     0.00
------ADFI                    2022       0.247     0.00       0.247     0.00
------OGCM                    2022       8.847     0.03       3.462     0.01
--------ORAD                  2022       0.782     0.00       0.782     0.00
--------SEAICE                2022       1.151     0.00       0.390     0.00
----------DATASEAICE          2022       0.761     0.00       0.761     0.00
--------OCEAN                 2022       3.451     0.01       0.862     0.00
----------DATASEA             2022       2.589     0.01       2.589     0.01
----HIST                      2022     555.841     1.84     555.841     1.84
```
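To rank timers by their own (exclusive) cost in a report like this one, a small awk filter is enough. A sketch, assuming each data line starts with dashes and has the six whitespace-separated fields shown above (name, cycles, inclusive s, inclusive %, exclusive s, exclusive %); `top_exclusive` is a hypothetical name.

```bash
# Rank timers by exclusive time. Reads report lines on stdin, keeps only lines
# that start with dashes and have exactly six fields:
#   NAME  CYCLES  INCL_S  INCL_PCT  EXCL_S  EXCL_PCT
# and prints "EXCL_S EXCL_PCT NAME" for the N largest (default 5).
top_exclusive() {
  local n="${1:-5}"
  awk 'NF == 6 && $1 ~ /^-+/ {
         name = $1
         sub(/^-+/, "", name)   # strip the nesting dashes
         print $5, $6, name
       }' |
    LC_ALL=C sort -k1,1nr |
    head -n "$n"
}
```

Fed the table above, `top_exclusive 3` lists DYN, MOIST and SOLAR, in that order.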