HPC metrics

In collaboration with the NCCS, we list below the metrics that should be evaluated for each benchmark.

Time to solution

Although we are not aiming for production-ready code by the end of the project, we will still track the "job-level" turnaround time and document both improvements and any slowdowns unrelated to numerics that result from the technology swap.

Energy

Light software sampling to document the amplitude of the TPU’s chip: imprecise, but can easily be run with little overhead.
Hardware monitoring on selected runs for precise measurement: precise, but requires close cooperation with NCCS system administrators and IT.
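
As an illustration of the light software sampling approach, the sketch below polls a power reading at a fixed interval and integrates the samples into an energy estimate with the trapezoidal rule. It is a minimal sketch, not the project's tooling: `read_power_w` is a hypothetical callable that the user must supply (for example a wrapper around whatever power counter the node exposes), and the names `sample_power` and `energy_joules` are assumptions made for this example.

```python
import time
from typing import Callable, List, Sequence, Tuple


def sample_power(
    read_power_w: Callable[[], float],
    duration_s: float,
    interval_s: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Poll a user-supplied power reader (hypothetical) at a fixed interval.

    Returns two lists: timestamps in seconds since the first sample,
    and the matching power readings in watts.
    """
    timestamps: List[float] = []
    readings: List[float] = []
    start = time.monotonic()
    while True:
        now = time.monotonic() - start
        timestamps.append(now)
        readings.append(read_power_w())
        if now >= duration_s:
            break
        time.sleep(interval_s)
    return timestamps, readings


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples over time with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power readings must have the same length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total
```

The coarser the sampling interval, the lower the overhead and the less accurate the estimate, which matches the trade-off noted above: short power spikes between two samples are simply missed.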

Node-to-node

Compare CPU nodes with GPU nodes
Minimize the hardware generation gap between the compared nodes for a valid comparison

Node usage

Chip usage: measure in % of theoretical throughput rather than FLOP/s
Chip idle time: important for hybrid work
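
The two node usage metrics above can be sketched as simple ratios. This is an illustrative sketch under stated assumptions: the achieved and peak throughput values, as well as the list of busy intervals, are presumed to come from a profiler or vendor datasheet that is not specified here, and the function names `percent_of_peak` and `idle_percent` are hypothetical.

```python
from typing import Iterable, Tuple


def percent_of_peak(achieved_flops: float, peak_flops: float) -> float:
    """Express achieved throughput as a percentage of the theoretical peak."""
    if peak_flops <= 0:
        raise ValueError("peak_flops must be positive")
    return 100.0 * achieved_flops / peak_flops


def idle_percent(busy_intervals: Iterable[Tuple[float, float]], wall_time_s: float) -> float:
    """Percentage of wall time during which the chip was not busy.

    Intervals are (start, end) pairs in seconds; overlapping intervals
    (e.g. concurrent kernels) are merged so they are not counted twice.
    """
    if wall_time_s <= 0:
        raise ValueError("wall_time_s must be positive")
    busy = 0.0
    current_start = current_end = None
    for start, end in sorted(busy_intervals):
        if current_end is None or start > current_end:
            # Close the previous merged interval and open a new one.
            if current_end is not None:
                busy += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the current merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        busy += current_end - current_start
    return 100.0 * (wall_time_s - busy) / wall_time_s
```

Reporting a percentage of peak rather than raw FLOP/s keeps the numbers comparable across chips with very different theoretical throughput, and the idle percentage makes visible how long an accelerator waits on the host in hybrid CPU/GPU runs.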

Minimal hardware requirements

For development
For scientific runs