The compiler currently supports two heuristics for flattening the graph into a schedule.
This heuristic uses the branch-and-bound paradigm to search for schedules. With the default timeout of 0, the first schedule found is returned. This allows quick generation of reproducible schedules: running again with the same input YAML should produce the same schedule. For a longer, more extensive search, use a larger timeout value. By default, the search spawns one worker for each core in the system. Use the stmcompiler tool flags to change the number of workers.
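The timeout behavior described above can be illustrated with a minimal, single-threaded sketch. This is not the compiler's implementation: the function names (`makespan`, `branch_and_bound`), the input dictionaries, and the four-runnable example graph are all hypothetical, and the parallel workers are not modeled. It only shows how a branch-and-bound search can return the first schedule found when the timeout is 0 and keep improving when given more time.

```python
import time

def makespan(order, duration, engine, deps):
    """Finish time when each runnable starts as early as possible, in the given order."""
    finish, engine_free = {}, {}
    for r in order:
        start = max([engine_free.get(engine[r], 0)] + [finish[d] for d in deps[r]])
        finish[r] = start + duration[r]
        engine_free[engine[r]] = finish[r]
    return max(finish.values(), default=0)

def branch_and_bound(duration, engine, deps, timeout=0.0):
    """Depth-first search over dependency-respecting orders (hypothetical sketch)."""
    deadline = time.monotonic() + timeout
    best = {"order": None, "cost": float("inf")}

    def search(order, placed):
        # Once a schedule exists and the time budget is spent, stop exploring.
        if best["order"] is not None and time.monotonic() >= deadline:
            return
        if len(order) == len(duration):
            cost = makespan(order, duration, engine, deps)
            if cost < best["cost"]:
                best["order"], best["cost"] = list(order), cost
            return
        # Bound: appending runnables never shortens the partial makespan,
        # so a branch that already matches the best cost cannot improve on it.
        if makespan(order, duration, engine, deps) >= best["cost"]:
            return
        # Branch: try ready runnables in sorted order so results are reproducible.
        for r in sorted(duration):
            if r not in placed and all(d in placed for d in deps[r]):
                order.append(r)
                placed.add(r)
                search(order, placed)
                order.pop()
                placed.remove(r)

    search([], set())
    return best["order"], best["cost"]

# Hypothetical example graph: B depends on A, D depends on C.
duration = {"A": 2, "B": 1, "C": 1, "D": 3}
engine = {"A": "cpu", "B": "gpu", "C": "cpu", "D": "gpu"}
deps = {"A": [], "B": ["A"], "C": [], "D": ["C"]}

first_order, first_cost = branch_and_bound(duration, engine, deps, timeout=0.0)
better_order, better_cost = branch_and_bound(duration, engine, deps, timeout=1.0)
print(first_order, first_cost)
print(better_order, better_cost)
```

In this toy graph the timeout of 0 yields the first complete order the search reaches, while the one-second budget lets the search exhaust the small space and find a shorter schedule. Trying candidates in a fixed sorted order is what makes the first-found result repeatable across runs in this sketch.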
This heuristic is designed for isolated profiling of the workload. It creates a critical-path-dominant sequence and schedules that sequence serially, inserting fences between consecutive runnables. This ensures that only one runnable is active at any point in time on any hardware engine. Note that profiling results from this heuristic will not uncover runnable-to-runnable performance interference in the system.
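As a rough illustration of the serial scheme, the sketch below orders runnables by the length of the longest dependency chain that starts at each one, then places them back to back so that no two overlap. The interpretation of "critical-path dominant" as this priority rule is an assumption, as are the function name `serial_profile_schedule`, the input dictionaries, and the example graph. The fence is modeled simply as "the next runnable starts when the previous one ends", and the graph is assumed to be acyclic.

```python
from functools import lru_cache

def serial_profile_schedule(duration, deps):
    """Return [(runnable, start, end)] with exactly one runnable active at a time."""
    # Build successor lists from the dependency lists.
    succs = {r: [] for r in duration}
    for r, ds in deps.items():
        for d in ds:
            succs[d].append(r)

    @lru_cache(maxsize=None)
    def tail(r):
        # Length of the longest chain starting at r (its remaining critical path).
        return duration[r] + max((tail(s) for s in succs[r]), default=0)

    placed, sequence = set(), []
    while len(sequence) < len(duration):
        ready = [r for r in duration
                 if r not in placed and all(d in placed for d in deps[r])]
        # Longest remaining chain first; ties broken by name for determinism.
        nxt = min(ready, key=lambda r: (-tail(r), r))
        sequence.append(nxt)
        placed.add(nxt)

    # Serialize: each runnable starts only after the previous one has finished.
    timeline, t = [], 0
    for r in sequence:
        timeline.append((r, t, t + duration[r]))
        t += duration[r]
    return timeline

# Hypothetical example graph: B depends on A, D depends on C.
duration = {"A": 2, "B": 1, "C": 1, "D": 3}
deps = {"A": [], "B": ["A"], "C": [], "D": ["C"]}

timeline = serial_profile_schedule(duration, deps)
print(timeline)
```

Because every runnable executes alone, the measured durations reflect isolated cost only, which is why a schedule of this shape cannot reveal interference between runnables that would otherwise run concurrently.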