Current state of the schedule tree feature
This is a quick overview of the current state of the schedule tree feature.
Working branch
Work is being done in the branch `romanc/stree-to-sdfg` on Roman's fork. It branches off from `v1/maintenance` and includes Tal's work from the branch `stree-to-sdfg`. For a quick overview of the changes, look at https://github.com/spcl/dace/compare/v1/maintenance...romanc:romanc/stree-to-sdfg
Schedule tree to SDFG
At a high level, schedule tree to SDFG conversion has the following steps:
- Set up a new SDFG and initialize its descriptor repository from the schedule tree.
- Insert artificial state boundary nodes into the schedule tree.
- Run a visitor over the schedule tree that translates every node into the new SDFG (see class `StreeToSDFG`).
- Propagate memlets through the newly created SDFG.
- Run `simplify()` on the newly created SDFG (optional).
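Step 3 can be pictured as a visitor that dispatches on the node type. The sketch below is a hypothetical, self-contained toy, not the actual `StreeToSDFG` code: the `Toy*` classes and the list of strings standing in for the SDFG are made up for illustration. It dispatches on the class name in the style of Python's `ast.NodeVisitor` and raises `NotImplementedError` for node types without a handler, similar in effect to the unimplemented visitors in `StreeToSDFG`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyTasklet:
    """Toy leaf node (hypothetical, not a DaCe class)."""
    name: str


@dataclass
class ToyMap:
    """Toy scope node with children (hypothetical)."""
    children: List[object] = field(default_factory=list)


@dataclass
class ToyNView:
    """Toy node type that the visitor does not support yet (hypothetical)."""
    target: str


class ToyTreeToGraph:
    """Dispatch on the node's class name, like Python's ast.NodeVisitor."""

    def __init__(self) -> None:
        self.emitted: List[str] = []  # stands in for the SDFG under construction

    def visit(self, node: object) -> None:
        handler = getattr(self, f"visit_{type(node).__name__}", self.generic_visit)
        handler(node)

    def generic_visit(self, node: object) -> None:
        # Unsupported node types fail loudly instead of being skipped silently.
        raise NotImplementedError(f"no visitor for {type(node).__name__}")

    def visit_ToyTasklet(self, node: ToyTasklet) -> None:
        self.emitted.append(f"tasklet {node.name}")

    def visit_ToyMap(self, node: ToyMap) -> None:
        self.emitted.append("map entry")
        for child in node.children:
            self.visit(child)
        self.emitted.append("map exit")


v = ToyTreeToGraph()
v.visit(ToyMap([ToyTasklet("a"), ToyTasklet("b")]))
print(v.emitted)  # ['map entry', 'tasklet a', 'tasklet b', 'map exit']
```

In this toy, visiting `ToyNView("x")` raises `NotImplementedError: no visitor for ToyNView`, so a missing handler stops the translation immediately rather than producing a partial result.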
Hacks and shortcuts
- `StreeToSDFG` has many visitors that raise a `NotImplementedError`. I've implemented these visitors on an as-needed basis.
- I've added additional state boundaries around nested SDFGs (needed for state changes, e.g. `IfScope`, inside `MapNodes`) to force correct execution order.
- I've added additional state boundaries after inter-state assignments to ensure that the symbols are defined before they are accessed. As far as I understand, that shouldn't be necessary. However, I've had SDFGs (todo: which ones?) with unused assignments at the end of the main visitor.
- I've written tests for some features as a way of developing the main visitor. For simple schedule trees, I've already added checks on the resulting SDFG, but pretty quickly I ended up validating by "looking at the resulting SDFG".
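The workaround of adding a state boundary after each inter-state assignment can be modeled with a toy node list. Everything here is hypothetical and simplified: `Assign`, `Compute`, `StateBoundary`, `insert_boundaries_after_assigns()` and `to_states()` are not DaCe names. Assignments are collected onto the edge that leaves the current state, as interstate edge assignments are in an SDFG, so a boundary directly after an assignment makes the symbol available to every node that follows.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Assign:
    """Toy inter-state assignment `symbol = value` (hypothetical)."""
    symbol: str
    value: str


@dataclass
class Compute:
    """Toy dataflow node standing in for tasklets, maps, ... (hypothetical)."""
    name: str


@dataclass
class StateBoundary:
    """Artificial marker: the next node starts a new state."""


Node = Union[Assign, Compute, StateBoundary]


def insert_boundaries_after_assigns(nodes: List[Node]) -> List[Node]:
    """Place a state boundary right after every assignment."""
    result: List[Node] = []
    for node in nodes:
        result.append(node)
        if isinstance(node, Assign):
            result.append(StateBoundary())
    return result


def to_states(nodes: List[Node]) -> Tuple[List[List[str]], List[Dict[str, str]]]:
    """Group nodes into states; edges[i] holds the assignments on the
    transition from states[i] to states[i + 1]."""
    states: List[List[str]] = [[]]
    edges: List[Dict[str, str]] = []
    pending: Dict[str, str] = {}
    for node in nodes:
        if isinstance(node, StateBoundary):
            edges.append(pending)  # flush assignments onto the outgoing edge
            pending = {}
            states.append([])
        elif isinstance(node, Assign):
            pending[node.symbol] = node.value
        else:
            states[-1].append(node.name)
    # Assignments still pending here never reach an edge and are lost.
    return states, edges


nodes = [Compute("init"), Assign("n", "10"), Compute("body")]
print(to_states(insert_boundaries_after_assigns(nodes)))
# ([['init'], ['body']], [{'n': '10'}])
```

Without the inserted boundaries, `to_states(nodes)` returns `([['init', 'body']], [])`: the assignment to `n` never reaches an edge, so `body` would read an undefined symbol.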
Current issues
- Running a roundtrip for `tmp_UpdateDzD.sdfgz` (smaller) and `tmp_D_SW.sdfgz` (bigger), I end up with a duplicated `dt`, which is found in the symbols as well as in the arrays (as a scalar). Since we copy both when building the descriptor repository in step 1 (assuming the SDFG was de-duplicated), the resulting SDFG fails validation after the roundtrip.
- Running a roundtrip for `tmp_D_SW.sdfgz`, node validation fails for `g_self__column_namelist__d_con`.
- Missing implementation of `NView` for `tmp_Fillz.sdfgz` (small) and `tmp_Ray_Fast.sdfgz` (bigger).
- Performance issue with `_insert_memory_dependency_state_boundaries()`. Known sources are:
  - `MemletDict` checks for subset coverage, and `subset.covers(other_subset)` is slow. While a cache is in place, this remains the number one hotspot according to `py-spy`.
  - `node.input_memlets()` is number two on the `py-spy` list.
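One way to catch the duplicated `dt` before a roundtrip is to intersect the symbol names with the data container names. The helper below is a hypothetical sketch, not an existing DaCe function; it only assumes that both inputs are mappings keyed by name, as `SDFG.symbols` and `SDFG.arrays` are. It checks a single (top-level) SDFG and does not descend into nested SDFGs.

```python
from typing import List, Mapping


def find_name_clashes(symbols: Mapping[str, object],
                      arrays: Mapping[str, object]) -> List[str]:
    """Return the names registered both as a symbol and as a data container.

    Copying both mappings verbatim into a new descriptor repository would
    define each of these names twice.
    """
    return sorted(set(symbols) & set(arrays))


# Toy data modeled on the `dt` case: `dt` is both a symbol and a scalar.
symbols = {"dt": "float64", "nx": "int32"}
arrays = {"dt": "Scalar(float64)", "u": "Array(float64, [nx])"}
print(find_name_clashes(symbols, arrays))  # ['dt']
```

With a loaded SDFG, the call would be `find_name_clashes(sdfg.symbols, sdfg.arrays)`.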
When working on the performance issue, the following script is handy:
```python
import sys
import time

from dace import SDFG
import dace.sdfg.analysis.schedule_tree.sdfg_to_tree as s2t
import dace.sdfg.analysis.schedule_tree.tree_to_sdfg as t2s

if __name__ == '__main__':
    if len(sys.argv) != 2:
        sys.exit(f"usage: {sys.argv[0]} <path/to/file.sdfgz>")

    # perf_counter() is a monotonic clock meant for measuring durations.
    s = time.perf_counter()
    sdfg = SDFG.from_file(sys.argv[1])
    print(f"Loaded SDFG in {(time.perf_counter() - s):.3f} seconds.")

    s = time.perf_counter()
    stree = s2t.as_schedule_tree(sdfg, in_place=True)
    print(f"Created schedule tree in {(time.perf_counter() - s):.3f} seconds.")

    # after WAW, before label, etc.
    s = time.perf_counter()
    stree = t2s.insert_state_boundaries_to_tree(stree)
    print(f"Inserted state boundaries in {(time.perf_counter() - s):.3f} seconds.")
```
Pass the path of an SDFG file as the only argument, e.g. one of the SDFGs from this GitHub issue.
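The cache mentioned for `subset.covers(other_subset)` amounts to memoizing a pure coverage check. The sketch below is a hypothetical, integer-only model: DaCe's subsets are symbolic, and the names `Box`, `covers` and `covers_cached` are made up for illustration. Because the result depends only on the two arguments, `functools.lru_cache` can reuse it whenever the same pair is queried again.

```python
from functools import lru_cache
from typing import Tuple

# One inclusive (start, stop) pair per dimension, e.g. ((0, 9), (0, 4)).
# Boxes are assumed to be non-empty (start <= stop in every dimension).
Box = Tuple[Tuple[int, int], ...]


def covers(outer: Box, inner: Box) -> bool:
    """Return True if every index in `inner` also lies in `outer`."""
    if len(outer) != len(inner):
        return False
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))


@lru_cache(maxsize=None)
def covers_cached(outer: Box, inner: Box) -> bool:
    """Memoized wrapper; both arguments must be hashable (tuples are)."""
    return covers(outer, inner)


print(covers_cached(((0, 9), (0, 4)), ((2, 3), (0, 4))))  # True
print(covers_cached(((0, 9), (0, 4)), ((2, 3), (0, 4))))  # True, served from the cache
print(covers_cached.cache_info().hits)  # 1
```

A cache like this only pays off when identical pairs recur; every distinct pair still pays for a full comparison.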
State of translate tests
Test | Roundtrip SDFG validating | Translate test passing |
---|---|---|
XPPM | yes | yes |
DelnFluxNoSG | yes | yes |
DelnFlux | yes | yes |
FvTp2d | yes | yes |
FxAdv | yes | yes |
Fillz | no (`NView`) | - |
Ray_Fast | no (`NView`) | - |
D_SW | yes (without `ConstantPropagation`) | yes |
UpdateDzD | yes | yes |
SDFGs for roundtrip validation can be downloaded from this GitHub issue.