Skip to content

NDSL powered GEOS: CPU & GPU validation & benchmark

TL;DR

Bug

A one-liner of bench & validation results

Table of results

Science Code Numerical Validation(C24) Scientific Validation(C180) CPU Benchmark (C180) GPU Benchmark (C180) GPU Benchmark (C720)
Dynamics (Pace) ❌: details
Dynamics (GEOS v11.4.2)
Moist - Microphysics
Moist - Shallow Convection

C24 resolutions have 72 levels. All other resolutions have 137 levels.

Methodology

Bug

Expand

Validation

Numerical validation:

Numerical validation is done comparing a single time step between Fortran and NDSL on the CPU. It was the base of the porting.

  • Both codes run with -O0, e.g., with compiler optimization turned off.
  • A multi-modal metric is used for measuring difference that combines absolute and relative differences, and ULP measurement.
  • Differences are expected (compilers, different codes) and reasonable thresholding will be used.

Scientific validation:

Scientific validation is done by comparing the results of 7 days of GEOS runs.

See more details and discussion in the overview

Benchmark

Benchmark is done both on CPU and GPU at C180 L137. To showcase the difference in device bandwidth, we also run GPU on C720 L137.

Benchmark are done online in GEOS but measure several performance which are all interconnected

Hardware

NCCS's Discover A100 partition, referred as "Discover", per node:

  • 4x A100 GPUs – 40 GB (released 2021)
  • 1x EPYC 7402 – 96 cores (released 2020)
  • Dual HDR Infiniband 2x200 Gbps

NCCS's PRISM GH partition, referred as "Prism GH", per node:

  • GH200: 1x H100 (96 GB HBME3) + 1 Grace (72 cores @ 2GHz- 480 GB LPPDR5) on the same die (released 2023)
  • Dual HDR Infiniband 2x100 Gbps

Software stack

Bug

TODO

Data

The "numerical validation" data (a.k.a translate test) are available on NCCS datashare:

Overviews and earlier results