Backend overview
NDSL has multiple backends, each serving different purposes:
- Debug backend: easy-to-read Python code. A playground for new DSL features. Really, really slow.
- Numpy backend: "fast" Python-based backend. A playground for new science code.
- DaCe backends: performance backends for running at scale in production. They can also optimize code across stencil boundaries (see Orchestration).
Note: "DaCe backends" is plural because the DaCe backend can generate code targeting either CPUs or GPUs.
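The sketch below shows why the debug backend is easy to read but slow, while the Numpy backend is faster. It implements the same 2D five-point Laplacian twice, in each style. Both functions are hand-written illustrations of the coding style only. They are not the code either backend actually generates, and the function names and the choice of stencil are assumptions.

```python
import numpy as np


def laplacian_loops(field: np.ndarray) -> np.ndarray:
    """Debug-style (hypothetical): one Python loop iteration per grid point. Readable, but slow."""
    out = np.zeros_like(field)
    nx, ny = field.shape
    # Skip the one-cell halo so every neighbor access stays in bounds.
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            out[i, j] = (
                field[i + 1, j] + field[i - 1, j]
                + field[i, j + 1] + field[i, j - 1]
                - 4.0 * field[i, j]
            )
    return out


def laplacian_numpy(field: np.ndarray) -> np.ndarray:
    """NumPy-style (hypothetical): whole-array slices, so the loops run inside compiled NumPy code."""
    out = np.zeros_like(field)
    # Each shifted slice lines up a neighbor with every interior point at once.
    out[1:-1, 1:-1] = (
        field[2:, 1:-1] + field[:-2, 1:-1]
        + field[1:-1, 2:] + field[1:-1, :-2]
        - 4.0 * field[1:-1, 1:-1]
    )
    return out


# Both styles compute the same result; only readability and speed differ.
field = np.random.default_rng(0).random((64, 64))
assert np.allclose(laplacian_loops(field), laplacian_numpy(field))
```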
---
title: GT4Py IR overview
---
flowchart LR
stencil["`
Stencil code
(NDSL)
`"]
defir["Definition IR"]
gtir["GT script IR"]
oir["`
Optimization IR
(per stencil)
`"]
debug["Debug backend"]
numpy["Numpy backend"]
dace["DaCe backends"]
stencil --> defir --> gtir --> oir
oir --> debug
oir --> numpy
oir --> dace
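The sketch below models the pipeline in the diagram as a chain of lowering steps. The point it illustrates is that the frontend and all IRs are shared, and only the final per-stencil OIR step differs by backend. The `IR` class, the pass functions and the string outputs are hypothetical stand-ins, not GT4Py's API. The backend keys are illustrative, chosen so that the DaCe entry splits into a CPU and a GPU variant as the note above describes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class IR:
    """Hypothetical stand-in for one intermediate representation."""
    level: str
    payload: str


def _expect(ir: IR, level: str) -> None:
    # Each pass only accepts the IR level directly above it.
    if ir.level != level:
        raise ValueError(f"expected {level}, got {ir.level}")


def to_definition_ir(source: str) -> IR:
    return IR("defir", source)


def defir_to_gtir(ir: IR) -> IR:
    _expect(ir, "defir")
    return IR("gtir", ir.payload)


def gtir_to_oir(ir: IR) -> IR:
    _expect(ir, "gtir")
    return IR("oir", ir.payload)


# Every backend consumes the same per-stencil OIR (keys are illustrative).
BACKENDS: dict[str, Callable[[IR], str]] = {
    "debug": lambda ir: f"python-loops({ir.payload})",
    "numpy": lambda ir: f"numpy-slices({ir.payload})",
    "dace:cpu": lambda ir: f"sdfg({ir.payload})->cpu",
    "dace:gpu": lambda ir: f"sdfg({ir.payload})->gpu",
}


def compile_stencil(source: str, backend: str) -> str:
    """Run the shared lowering chain, then dispatch to the chosen backend."""
    oir = gtir_to_oir(defir_to_gtir(to_definition_ir(source)))
    _expect(oir, "oir")
    return BACKENDS[backend](oir)
```

Adding a backend in this model means adding one entry that consumes OIR. Nothing upstream of OIR changes.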
DaCe backend details
The DaCe backend workflow cleanly separates scheduling choices (i.e. macro-level optimizations such as loop order or kernel merges) from DaCe's dataflow optimizations.
---
title: DaCe backend workflow
---
flowchart LR
oir["`
OIR
(GT4Py)
`"]
stree["`
Schedule Tree
(DaCe)
`"]
sdfg["`SDFG`"]
cpu["`
CPU
codegen
`"]
gpu["`
GPU
codegen
`"]
oir --> stree --> sdfg
sdfg --> cpu
sdfg --> gpu
All scheduling choices happen in the schedule tree. Once the schedule is fixed, we lower it to a Stateful DataFlow multiGraph (SDFG), DaCe's representation for data-driven optimizations. Code generation for the different targets (CPU and GPU) starts from the SDFG.
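The sketch below makes this split concrete. It models a schedule tree as a flat list of parallel-loop nodes and applies one scheduling choice, kernel merging, before anything is lowered to an SDFG. The `Map` class and the `merge_adjacent_maps` function are hypothetical and do not reflect DaCe's actual schedule tree classes.

```python
from dataclasses import dataclass, field


@dataclass
class Map:
    """Hypothetical schedule-tree node: a parallel loop over `domain` running `body` statements."""
    domain: tuple[int, int, int]
    body: list[str] = field(default_factory=list)


def merge_adjacent_maps(schedule: list[Map]) -> list[Map]:
    """Merge consecutive maps over the same domain into a single kernel.

    A real scheduler must also prove each merge is legal, for example that no
    statement reads a neighbor value written by the earlier map. This sketch
    assumes every merge is legal.
    """
    merged: list[Map] = []
    for node in schedule:
        if merged and merged[-1].domain == node.domain:
            # Same iteration space: append the statements to the previous kernel.
            merged[-1].body.extend(node.body)
        else:
            # New iteration space: start a new kernel (copy so the input is not mutated).
            merged.append(Map(node.domain, list(node.body)))
    return merged


schedule = [
    Map((10, 10, 5), ["a = b + 1"]),
    Map((10, 10, 5), ["c = a * 2"]),
    Map((10, 10, 1), ["d = c"]),
]
merged = merge_adjacent_maps(schedule)
```

Because the decision is made on the tree, the SDFG built afterwards only ever sees the merged kernel. Dataflow optimizations on the SDFG then operate on a fixed schedule.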
Schedule tree
- current state of the feature
- choice of representation
- choice of DaCe version to work against (for the first version)
- choice of integration point (for the first version)
Orchestration
DaCe backends can either optimize each stencil individually or perform whole-program optimization through orchestration. Orchestration is a system that brings GT4Py stencils and regular Python code together into a single program that DaCe can optimize as a whole. This enables the most potent wide-context optimizations, such as optimizing across stencil boundaries. See Details and Limitations.
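The example below shows why a wide context helps, using two pointwise stencils called one after the other from Python. It is plain NumPy, not NDSL or DaCe code, and the stencil names are hypothetical. The fused version is written by hand. It illustrates the kind of transformation, stencil fusion with elimination of the intermediate array, that a whole-program optimizer could apply but a per-stencil optimizer cannot.

```python
import numpy as np


def stencil_scale(a: np.ndarray, out: np.ndarray) -> None:
    """Hypothetical stencil 1: out = 2 * a."""
    out[:] = 2.0 * a


def stencil_shift(a: np.ndarray, out: np.ndarray) -> None:
    """Hypothetical stencil 2: out = a + 1."""
    out[:] = a + 1.0


def per_stencil(a: np.ndarray) -> np.ndarray:
    # Each stencil is optimized in isolation, so the intermediate `tmp`
    # must be fully written to memory before the second stencil reads it.
    tmp = np.empty_like(a)
    out = np.empty_like(a)
    stencil_scale(a, tmp)
    stencil_shift(tmp, out)
    return out


def fused(a: np.ndarray) -> np.ndarray:
    # With both calls visible, the two stencils can be fused into one pass:
    # each point is scaled and shifted together, and no `tmp` array exists.
    out = np.empty_like(a)
    for idx in np.ndindex(a.shape):
        out[idx] = 2.0 * a[idx] + 1.0
    return out
```

The Python loop keeps the fusion explicit. Compiled fused code would run the same single pass without the interpreter overhead.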
Repositories
Backend work is split across multiple repositories, each with its own quirks:
- NDSL - Pulls everything together and exposes "the DSL". Orchestration code, in particular, lives in this repository.
- GT4Py - Defines the frontend, holds all the intermediate representations (IRs), and dispatches to the different backends.
- DaCe - Data-driven optimization framework used in GT4Py as the performance backend. It is also the whole-program optimizer driving orchestration.
NDSL is then used in the following repositories: