Backend overview
NDSL has multiple backends, each serving different purposes:
- Debug backend: easy to read python code. Playground for new DSL features. Really, really slow.
 - Numpy backend: "fast" python-based backend. Playground for new science code.
 - DaCe backends: performance backends for running at scale in production. DaCe backends have the ability to do full program optimization (e.g. analyze and optimize even code in between stencils), see orchestration.
 
Note: DaCe backends is plural because that backend can either generate code that targets CPUs or GPUs.
---
title: GT4Py IR overview
---
flowchart LR
    stencil["`
        Stencil code
        (NDSL)
    `"]
    defir["Definition IR"]
    gtir["GT script IR"]
    oir["`
        Optimization IR
        (per stencil)
    `"]
    debug["Debug backend"]
    numpy["Numpy backend"]
    dace["DaCe backends"]
    stencil --> defir --> gtir --> oir
    oir --> debug
    oir --> numpy
    oir --> dace
DaCe backend details
The DaCe backend workflow cleanly separates scheduling choices (i.e. macro-level optimizations like loop order, kernel merges, ...) from DaCe dataflow optimization.
---
title: DaCe backend workflow
---
flowchart LR
    oir["`
        OIR
        (GT4Py)
    `"]
    stree["`
        Schedule Tree
        (DaCe)
    `"]
    sdfg["`SDFG`"]
    cpu["`
        CPU
        codegen
    `"]
    gpu["`
        GPU
        codegen
    `"]
    oir --> stree --> sdfg
    sdfg --> cpu
    sdfg --> gpu
All scheduling choices happen in the schedule tree. After the schedule is fixed, we create a Stateful Dataflow multiGraphs (SDFG), DaCe's representation for data-driven optimizations. Code generation (to multiple targets) follows from the SDFG.
Schedule tree
- current state of the feature
 - choice of representation
 - choice of DaCe version to work against (for the first version)
 - choice of integration point (for the first version)
 
Orchestration
DaCe backends can either optimize per-stencil or do full program optimization through orchestration. Orchestration is a system that bring GT4Py stencils and regular python code together. This opens up the potential for full program optimization. Orchestration enables the most potent wide-context optimizations. Details and Limitations.
Repositories
Backend work is split into multiple repositories, each having their quirks to work with. In particular,
- NDSL - Pulls everything together and exposes "the DSL". In particular, orchestration code lives in this repository.
 - GT4Py - The component that defines the frontend, has all the intermediate representations (IR) and dispatches to multiple backends.
 - DaCe - Data-driven optimization framework used in GT4Py as performance backend. Full program optimizer driving orchestration.
 
NDSL is then used in the following repositories: