The application requirements are captured by a compute graph, where each node specifies an atomic task on a single hardware engine. This section walks you through the graph specification as captured by our YAML schema. Note that YAML treats everything as a list or a dictionary. Field ordering does not matter; the compiler accepts the fields in any order as long as the expected nesting hierarchy is satisfied. Note that IDs cannot contain a period (.) symbol, as that symbol is used internally by the Framework.
This field specifies the input specification version number. This is required to ensure that incompatible features are disabled/flagged. Ideally it should match the version of the provided compiler package. This field is mandatory.
Version: 3.0.0 # Input specification version - currently 3.0.0
The graph ID is the second top-level entry in the YAML (first being the Version). The graph definition is nested under the graph ID.
Version: 3.0.0 # Input specification version - currently 3.0.0
SimpleGraph:
  <..Graph Description..>
The identifier is a numeric ID for the graph that is used for tracking schedules across schedule switches. It is a required parameter under the graph ID.
SimpleGraph:
  Identifier: 101 # A unique integer ID specified by the user
Resources that are used system-wide are modeled under the global resources section. Hardware resources like CPUs, GPUs, or PVAs should go under this section. Any system-wide virtual scheduling mutexes can also be listed here. The compiler models each resource as a timeline on which only one runnable can execute at any time. A runnable can, however, use more than one resource. There are some limitations on the types of resources which can be simultaneously used by a runnable, which are covered in this section. Generally, resources are specified in the following format. Certain types of resources have additional features which are described in the respective sections. Global resources are nested under a Resources section under the graph ID. To define a resource, a resource type needs to be specified. Resource instances are grouped under the appropriate resource type. YAML supports two ways of specifying lists, which are shown in the example below.
SimpleGraph:
  Resources:
    Resource_Type0: [Rsrc_Type0_Instance0, Rsrc_Type0_InstanceN]
    Resource_Type1:
      - Rsrc_Type1_Instance0
      - Rsrc_Type1_Instance1
Hardware resources specified in the CPU Resource Type and Hardware Accelerator Resource Types sections are known resource types for the compiler and require specialized steps to schedule runnables on these resources. Other resource types are considered as scheduling mutexes, and they do not have any naming restrictions.
To specify CPUs in the system, the resource type should be set to CPU and the resource instances should be named as CPUX, where X is a valid CPU number.
SimpleGraph:
  Resources:
    CPU: [CPU0, CPU1, CPU2]
The following hardware accelerators can be used to offload computation from the CPUs: GPUs and PVAs (VPUs). Supported device IDs are GPUX and VPUX, where X is a valid instance number. When specifying the device instances, an optional limit can be given to restrict the number of streams/queues mapped to that device instance. To specify a stream/queue limit on an instance, append ": Y" to the instance ID, where Y is the limit. In the following example, instance GPU0 allows unlimited CUDA Streams, whereas GPU1 allows only 8 Streams.
SimpleGraph:
  Resources:
    GPU:
      - GPU0    # Unlimited Streams
      - GPU1: 8 # 8 Streams
Any resource type not known by the compiler is modeled as a scheduling mutex. There are no naming conventions associated with either the resource type or the resource ID for a scheduling mutex. Interfering runnables can specify a scheduling mutex as a resource requirement to prevent the compiler from scheduling them concurrently.
SimpleGraph:
  Resources:
    # Can be used to mutually exclude memory-intensive tasks
    MEMORY_BUS: [MEMORY_BUS0]
    # Scheduling mutexes
    MUTEX: [SCHED_MUTEX0, SCHED_MUTEX1]
A hyperepoch is a resource partition that runs a fixed configuration of epochs that share the resources in that partition. It is periodic in nature, and it respawns the contained epochs at the specified period. This relationship between the hyperepoch and its member epochs will be covered in the Epochs section. To define a hyperepoch, the required fields are Resources, Period and Epochs. In certain configurations, some fields can be omitted as specified in the respective sections. Hyperepochs are specified in a list under the ‘Hyperepochs’ keyword inside the Graph specification as shown below. Hyperepoch0 is the ID of the hyperepoch that is defined in the following graph.
SimpleGraph:
  Hyperepochs:
    - Hyperepoch0:      # Hyperepoch ID
        Period: 100ms
        Resources:
          - MEMORY_BUS0
          - GPU0
          - CPU0
          - CPU1
The period of a hyperepoch specifies the rate at which the contained epochs are spawned. This field can be omitted if the hyperepoch has only one epoch; in that case the periodicity of the hyperepoch is equal to that of the contained epoch. The Period field is nested under the hyperepoch's ID.
Each hyperepoch is associated with a mutually exclusive set of resources. Resources are mapped to hyperepochs by specifying the resource IDs in a list under the Resources heading inside the hyperepoch specification, as shown in the example above. There, the resources MEMORY_BUS0, GPU0, CPU0, and CPU1 are mapped to the hyperepoch Hyperepoch0. If there is only one hyperepoch in the system, this resource mapping can be omitted and the hyperepoch is assumed to have access to all the resources in the system.
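As a minimal sketch of both omission rules above (the graph, hyperepoch, and epoch IDs here are hypothetical), the graph below contains a single hyperepoch with a single epoch; the hyperepoch's Resources mapping is omitted, so it owns all global resources, and its Period is omitted, so it is inferred from the contained epoch:

Version: 3.0.0
MinimalGraph:            # Hypothetical graph ID
  Identifier: 7
  Resources:
    CPU: [CPU0, CPU1]
  Hyperepochs:
    - SoloHyperepoch:    # Only hyperepoch in the system:
                         # Resources omitted -> gets all global resources;
                         # Period omitted -> inferred from the single epoch below.
        Epochs:
          - SoloEpoch:
              Period: 50ms # Hyperepoch period becomes 50ms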
Epochs are time bases that determine the rate at which their constituent runnables spawn, confined to the boundaries of the hyperepoch. Each epoch is a member of a hyperepoch and has two attributes associated with it: Period and Frames. To specify epochs, list epoch IDs under the Epochs heading in a hyperepoch as shown below.
Hyperepochs:
  - Hyperepoch0:
      Period: 100ms
      Epochs:
        - Epoch0:
            Period: 10ms
            Frames: 8
        - Epoch1:
            Period: 100ms
        - Epoch2:
            Period: 33ms
            Frames: 3
The epoch's Period specifies the rate at which a frame of runnables is spawned, up to the number of frames specified, within the hyperepoch's period. If not specified, the number of frames defaults to 1. In the example above, Epoch0 spawns 8 frames in 100ms, with each frame spawned 10ms apart. Epoch1 spawns once per hyperepoch, as the number of frames defaults to 1. Epoch2 spawns three times, 33ms apart. If periodicity is not required at the epoch level, the Period can be omitted; the Frames value then specifies how many times the epoch's set of runnables is spawned, and those frames are fit inside the hyperepoch's period. The following example shows how a system can use hyperepochs to define different frequency domains.
Version: 3.0.0
Drive:                                # Graph ID
  Resources:                          # Global Resources
    CPU: [CPU0, CPU1, CPU2]
    GPU: [GPU0]
  Hyperepochs:
    - Perception:                     # Hyperepoch ID
        Period: 100ms                 # Hyperepoch period
        Resources: [CPU1, CPU2, GPU0] # Resource mapping
        Epochs:
          - Camera:                   # Epoch ID
              Period: 33.33ms
              Frames: 3
          - Radar:                    # Epoch ID
              Period: 100ms
              Frames: 1
    - Control:                        # Hyperepoch ID; Hyperepoch
        Resources: [CPU0]             # period inferred from epoch.
        Epochs:
          - VDC:                      # Epoch ID
              Period: 10ms            # Epoch frames = 1 (default)
This configuration is visualized in the following figure. Note that Camera and Radar frames are synchronized with each other at the hyperepoch boundary; VDC frames are not aligned with either the Camera or Radar frames, as they belong to a separate hyperepoch with a different time base.
Hyperepochs and epochs define the timing boundaries for tasks (runnables); clients define the data boundaries. A client is an operating system process that contains software resources (like CUDA streams) and runnables. Clients are specified in the graph specification under the Clients header. Each client specifies its contained software resources (if any), the epochs present in that client, and the runnables associated with each epoch. A typical client is specified as follows:
Version: 3.0.0
Drive:                                 # Graph ID
  Clients:
    - Client0:                         # Client ID
        Resources:                     # Client0's internal resources
          # Resource Definition
        Epochs:                        # Epochs present in this client
          - Perception.Camera:         # Epoch Global ID - <HyperepochID.EpochID>
              Runnables:               # Runnables present in Perception.Camera
                - ReadCamera:          # Runnable ID (unique inside a client)
                    # Runnable specification...
                - RunnableN:
                    # Runnable specification...
          - Perception.Radar:
              Runnables:               # Runnables present in Perception.Radar
                - ProcessRadar:
                    # Runnable specification...
Clients can specify resources that are visible only to runnables local to that client; these resources cannot be accessed by runnables in other clients. (Global resources, by contrast, are visible to all runnables.) Process-specific resources like CUDA streams and PVA streams are examples of client resources that cannot be shared across different clients. Internal scheduling mutexes can also be modeled here. These resources are specified in a format like that of Global Resources.
Clients:
  - Client1:
      Resources:
        ResourceType0:
          - ResourceType0_Instance0
          - ResourceType0_InstanceN
        ResourceTypeN:
          - ResourceTypeN_Instance0
          - ResourceTypeN_InstanceN
The following software resources map to corresponding hardware engines: CUDA_STREAM maps to a GPU instance, and PVA_STREAM maps to a VPU instance.
The hardware engine mapping is conveyed to the compiler when specifying the resource instances, as shown in the example below. The hardware resource instances referenced here must be declared under the corresponding hardware resource type in the Global Resources section. Note that the compiler will throw an error if the limits on the mapped resource (as specified in the Hardware Accelerator Resource Types section) are violated.
Clients:
  - Client0:
      Resources:
        CUDA_STREAM:
          - CUDA_STREAM0: GPU0   # CUDA_STREAM0 mapped to GPU0
          - CUDA_STREAM1: GPU0   # CUDA_STREAM1 mapped to GPU0
        PVA_STREAM:              # A client can have one unique stream per VPU
          - PVA_STREAM0: VPU0    # PVA_STREAM0 mapped to VPU0
Resource types other than those specified in the Global Resources section above are treated as local scheduling mutexes. These cannot be mapped to a hardware resource.
Clients:
  - Client0:
      Resources:
        LOCAL_SCHED_MUTEX:
          - LOCAL_SCHED_MUTEX0
        LOCAL_RESOURCE_MUTEX:
          - RESOURCE_MUTEX0
Here's a visual representation of the data, timing, and resource boundaries in STM:
Tasks in the system that execute on hardware engines are known as runnables. Ideally, each runnable should use only a single hardware engine; synchronous runnables (runnables that use multiple hardware engines simultaneously) are currently not supported. Runnables require resources for execution and can depend on other runnables. Care must be taken to ensure that dependencies do not introduce a cycle in the graph. Depending on the type of engine used, the compiler classifies each runnable into one of the following three classes: normal (CPU) runnables, submitter runnables that submit work to a hardware accelerator from a CPU, and submittee runnables that represent the submitted work executing on the accelerator.
For each runnable, the following parameters can be set (each appears in the example below): WCET, StartTime, Resources, Dependencies, Submits, and Priority.
Runnables are referenced globally (for example, in Dependencies and Submits) using an ID of the form ClientID.RunnableID. Dependencies can only be specified for runnables belonging to the same epoch.

Clients:
  - Client0:
      Resources:
        CUDA_STREAM:
          - CUDA_STREAM0: GPU0
      Epochs:
        - Perception.Camera:                    # Camera epoch in the Perception hyperepoch
            Runnables:
              - ReadCamera:                     # Normal runnable
                  WCET: 10us
                  Resources:
                    - CPU                       # This runnable runs on a CPU
                  Priority: 2
              - PreProcessImage:                # Submitter runnable
                  WCET: 20ms
                  StartTime: 1ms                # Starts 1ms after the camera epoch
                  Resources:                    # GPU submitter needs CPU0 and a stream
                    - CPU0
                    - CUDA_STREAM
                  Dependencies: [Client0.ReadCamera]   # Depends on ReadCamera
                  Submits: Client0.PreProcessGPUWork   # Mentions submittee
                  Priority: 2                   # Has a relative priority of 2
              - PreProcessGPUWork:              # Submittee runnable
                  WCET: 5000ns
                  Resources: [GPU]
                  Dependencies:
                    - Client0.PreProcessImage   # Optional for submittees
        - Perception.Radar:                     # Radar epoch in the Perception hyperepoch
            Runnables:
              - ProcessRadar:
                  # Runnable specification...
STM can round-robin between multiple runnables within an execution slot. This mode of operation is useful when performance constraints prevent executing all the runnables in an epoch, and it also lets users reduce the frequency at which any particular runnable runs. When runnables are round-robinned against each other, STM uses the union of their dependencies and schedules a slot that satisfies all of them. The time allocated to the round-robin slot is the largest WCET specified among the round-robinned runnables. Round-robin groups are specified in the Epochs section of the corresponding hyperepoch using the AliasGroup construct.
For each AliasGroup, the Steps parameter specifies the IDs of the runnables in the desired round-robin sequence.
Round-robinned runnables are subject to the following constraints:
The following example shows a use case where two cameras are round-robinned against each other. In the even frames of the Camera epoch, Client0.PreProcessCamera1 and Client0.ProcessCamera1GPUWork run; in the odd frames, Client0.PreProcessCamera2 and Client0.ProcessCamera2GPUWork run. Their parent and child dependencies are handled automatically: ReadCameras1And2 and PostProcessCameras run in every frame, and all of these runnables wait on the correct dependencies.
Version: 3.0.0
Drive:                                           # Graph ID
  Identifier: 101
  Resources:                                     # Global Resources
    CPU: [CPU0, CPU1, CPU2]
    GPU: [GPU0]
  Hyperepochs:
    - Perception:                                # Hyperepoch ID
        Period: 100ms                            # Hyperepoch period
        Resources: [CPU0, GPU0, Client0.CUDA_STREAM0]  # Resource mapping
        Epochs:
          - Camera:                              # Epoch ID
              AliasGroups:                       # Define Round Robin Groups
                - PreProcessRoundRobinGroup:     # AliasGroup's ID
                    Steps:                       # This group round-robins between
                      - Client0.PreProcessCamera1    # Client0.PreProcessCamera1 and
                      - Client0.PreProcessCamera2    # Client0.PreProcessCamera2 in alternate frames.
                - ProcessGPUWorkRoundRobinGroup: # AliasGroup's ID
                    Steps:                       # This group round-robins between
                      - Client0.ProcessCamera1GPUWork  # Client0.ProcessCamera1GPUWork and
                      - Client0.ProcessCamera2GPUWork  # Client0.ProcessCamera2GPUWork in alternate frames.
              Period: 14ms
              Frames: 2
  Clients:
    - Client0:
        Resources:
          CUDA_STREAM: [CUDA_STREAM0: GPU0]
        Epochs:
          - Perception.Camera:                   # Camera epoch in the Perception hyperepoch
              Runnables:
                - ReadCameras1And2:
                    WCET: 3ms
                    Resources:
                      - CPU                      # This runnable runs on a CPU
                - PreProcessCamera1:             # Submitter runnable
                    WCET: 3ms
                    Resources:                   # GPU submitter needs CPU and a stream
                      - CPU
                      - CUDA_STREAM
                    Dependencies: [Client0.ReadCameras1And2]  # Depends on ReadCameras1And2
                    Submits: Client0.ProcessCamera1GPUWork    # Mentions submittee
                - ProcessCamera1GPUWork:         # Submittee runnable
                    WCET: 4ms
                    Resources: [GPU]
                    Dependencies:
                      - Client0.PreProcessCamera1    # Optional for submittees
                - PreProcessCamera2:             # Submitter runnable
                    WCET: 3ms
                    Resources:                   # GPU submitter needs CPU and a stream
                      - CPU
                      - CUDA_STREAM
                    Dependencies: [Client0.ReadCameras1And2]  # Depends on ReadCameras1And2
                    Submits: Client0.ProcessCamera2GPUWork    # Mentions submittee
                - ProcessCamera2GPUWork:         # Submittee runnable
                    WCET: 4ms
                    Resources: [GPU]
                    Dependencies:
                      - Client0.PreProcessCamera2    # Optional for submittees
                - PostProcessCameras:
                    WCET: 3ms
                    Resources: [CPU]
                    Dependencies:
                      - Client0.ProcessCamera1GPUWork
                      - Client0.ProcessCamera2GPUWork
An illustration of the execution sequence for this example in steady state is shown below:
STM allows users to switch schedules at runtime through two mechanisms. For one of these mechanisms, the hyperepoch composition, in terms of the number of hyperepochs and their mapped hardware resources, must be the same across all the DAGs that participate in that operation. This restriction does not apply to the other mechanism.
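As a hedged sketch of that composition restriction (the graph, hyperepoch, epoch, and identifier names below are hypothetical, and in practice each schedule would be its own specification), the two DAGs shown back to back here would satisfy it: each has a single hyperepoch mapping the same hardware resources, even though the epochs inside differ.

# Schedule A (separate specification)
Version: 3.0.0
DriveDay:                            # Hypothetical graph ID
  Identifier: 201
  Resources:
    CPU: [CPU0, CPU1]
    GPU: [GPU0]
  Hyperepochs:
    - Perception:                    # One hyperepoch mapping CPU0, CPU1, GPU0
        Period: 100ms
        Resources: [CPU0, CPU1, GPU0]
        Epochs:
          - Camera:
              Period: 33ms
              Frames: 3

# Schedule B (separate specification)
Version: 3.0.0
DriveNight:                          # Hypothetical graph ID
  Identifier: 202
  Resources:
    CPU: [CPU0, CPU1]
    GPU: [GPU0]
  Hyperepochs:
    - Perception:                    # Same hyperepoch count and resource mapping
        Period: 100ms                # as Schedule A; epochs and runnables may differ.
        Resources: [CPU0, CPU1, GPU0]
        Epochs:
          - Lidar:
              Period: 50ms
              Frames: 2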