The application requirements are captured by a compute graph, where each node in the compute graph specifies an atomic task on a single hardware engine. This section walks you through the graph specification as captured by our YAML schema. Note that YAML treats everything as a list or a dictionary. The ordering of fields does not matter; the compiler accepts fields in any order, as long as the expected nesting hierarchy is satisfied. Note also that IDs cannot contain a period (.) symbol, as that symbol is used internally by the Framework.
This field specifies the input specification version number, which is required to ensure that incompatible features are disabled or flagged. Ideally, it should match the version of the provided compiler package. This field is mandatory.
Version: 2.0.0 # Input specification version - currently 2.0.0
The graph ID is the second top-level entry in the YAML (first being the Version). The graph definition is nested under the graph ID.
Version: 2.0.0 # Input specification version - currently 2.0.0
SimpleGraph:
    <..Graph Description..>
Resources that are used system-wide are modeled under the global resources section. Hardware resources like CPUs or GPUs should go under this section, and any system-wide virtual scheduling mutexes can also be listed here. The compiler models each resource as a timeline on which only one runnable can execute at any time. A runnable can, however, use more than one resource; there are some limitations on the types of resources that can be simultaneously used by a runnable, which are covered in this section. Generally, resources are specified in the following format. Certain types of resources have additional features, which are described in their respective sections. Global resources are nested under a Resources section under the graph ID. To define a resource, a resource type needs to be specified, and resource instances are grouped under the appropriate resource type. YAML supports two ways of specifying lists, both of which are shown in the example below.
SimpleGraph:
    Resources:
        Resource_Type0: [Rsrc_Type0_Instance0, Rsrc_Type0_InstanceN]
        Resource_Type1:
            - Rsrc_Type1_Instance0
            - Rsrc_Type1_Instance1
The CPU, GPU, and VPU resource types are known to the compiler, which takes specialized scheduling steps for runnables scheduled on those resources. Other resource types are treated as scheduling mutexes and carry no naming restrictions.
To specify CPUs in the system, the resource type should be set to CPU and the resource instances should be named as CPUX, where X is a valid CPU number.
SimpleGraph:
    Resources:
        CPU: [CPU0, CPU1, CPU2]
Graphics Processing Units (GPUs) and Vector Processing Units (VPUs) are special hardware accelerators that can be used to offload computation from the CPUs. Work is submitted to these engines through CUDA Streams and PVA Streams, respectively. Supported device IDs are GPUX and VPUX, where X is a valid instance number. When specifying the device instances, an optional limit can be set to bound the number of streams/queues mapped to that device instance. To specify a Stream/Queue limit on an instance, append ": Y" to the instance ID, where Y is the limit. In the following example, instance GPU0 allows unlimited CUDA Streams, whereas GPU1 allows only 8 Streams.
SimpleGraph:
    Resources:
        GPU:
            - GPU0    # Unlimited Streams
            - GPU1: 8 # 8 Streams
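VPUs are declared analogously. The following sketch is our illustration, assuming a VPU0 device instance exists on the platform and that the same instance/limit syntax applies:

SimpleGraph:
    Resources:
        VPU:
            - VPU0: 1 # At most 1 PVA Stream mapped to VPU0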
Any resource type not known by the compiler is modeled as a scheduling mutex. There are no naming conventions associated with either the resource type or the resource ID for a scheduling mutex. Interfering runnables can specify a scheduling mutex as a resource requirement to prevent the compiler from scheduling them concurrently.
SimpleGraph:
    Resources:
        # Can be used to mutually exclude memory-intensive tasks
        MEMORY_BUS: [MEMORY_BUS0]
        # Scheduling mutexes
        MUTEX: [SCHED_MUTEX0, SCHED_MUTEX1]
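For example, two memory-intensive runnables could each claim MEMORY_BUS0 in their resource requirements. The fragment below is a hypothetical excerpt of the runnable specification covered later in this section; the runnable IDs are illustrative:

Runnables:
    - CopyBuffersA:
        Resources: [CPU, MEMORY_BUS0] # Both runnables claim MEMORY_BUS0,
    - CopyBuffersB:                   # so the compiler never schedules
        Resources: [CPU, MEMORY_BUS0] # them concurrently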
A hyperepoch is a resource partition that runs a fixed configuration of epochs, which share the resources in that partition. It is periodic in nature and respawns the contained epochs at the specified period. The relationship between a hyperepoch and its member epochs is covered in the Epochs section. To define a hyperepoch, the required fields are Resources, Period, and Epochs; in certain configurations, some fields can be omitted, as specified in the respective sections. Hyperepochs are specified in a list under the ‘Hyperepochs’ keyword inside the graph specification, as shown below. Hyperepoch0 is the ID of the hyperepoch defined in the following graph.
SimpleGraph:
    Hyperepochs:
        - Hyperepoch0: # Hyperepoch ID
            Period: 100ms
            Resources:
                - MEMORY_BUS0
                - GPU0
                - CPU0
                - CPU1
The period for a hyperepoch specifies the rate at which the contained epochs are spawned. This field can be omitted if the hyperepoch has only one epoch, in which case the periodicity of the hyperepoch is equal to that of the contained epoch. The Period field is nested under the hyperepoch's ID.
Each hyperepoch is associated with a mutually exclusive set of resources. Resources are mapped to hyperepochs by specifying the resource IDs in a list under the Resources heading inside the hyperepoch specification, as shown in the example above. There, the resources MEMORY_BUS0, GPU0, CPU0, and CPU1 are mapped to the hyperepoch Hyperepoch0. If there is only one hyperepoch in the system, this resource mapping can be omitted, and the hyperepoch is assumed to have access to all the resources in the system.
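Combining the two omission rules above, a system with a single hyperepoch containing a single epoch could be specified as compactly as the following sketch (the IDs here are illustrative):

SimpleGraph:
    Hyperepochs:
        - SoleHyperepoch:    # Only hyperepoch: Resources omitted,
            Epochs:          # implicitly owns all global resources
                - SoleEpoch:     # Only epoch: hyperepoch Period
                    Period: 10ms # inferred from this epoch's period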
Epochs are the time bases at which their constituent runnables are spawned, confined to the boundaries of the hyperepoch. Each epoch is a member of a hyperepoch and has two attributes associated with it: Period and Frames. To specify epochs, list epoch IDs under the Epochs heading in a hyperepoch, as shown below.
Hyperepochs:
    - Hyperepoch0:
        Period: 100ms
        Epochs:
            - Epoch0:
                Period: 10ms
                Frames: 8
            - Epoch1:
                Period: 100ms
            - Epoch2:
                Period: 33ms
                Frames: 3
The epoch's period specifies the rate at which a frame of runnables is spawned, up to the specified number of frames, within the hyperepoch's period. If not specified, the number of frames defaults to 1. In the example above, Epoch0 spawns 8 frames in 100ms, each frame 10ms apart. Epoch1 spawns once per hyperepoch, as the number of frames defaults to 1. Epoch2 spawns three times, 33ms apart. If periodicity is not required at the epoch level, the period can be omitted; the number of frames then specifies the number of times the epoch's set of runnables is spawned. This can also be used to determine how many frames fit inside the hyperepoch's period.
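As an illustrative sketch (the IDs here are hypothetical), an epoch without a period could be specified as:

Hyperepochs:
    - Hyperepoch0:
        Period: 100ms
        Epochs:
            - BatchEpoch: # Period omitted: not periodic
                Frames: 4 # Spawn the runnable set 4 times within 100ms

The following example shows how a system can use hyperepochs to define different frequency domains.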
Version: 2.0.0
Drive: # Graph ID
    Resources: # Global Resources
        CPU: [CPU0, CPU1, CPU2]
        GPU: [GPU0]
    Hyperepochs:
        - Perception: # Hyperepoch ID
            Period: 100ms # Hyperepoch period
            Resources: [CPU1, CPU2, GPU0] # Resource mapping
            Epochs:
                - Camera: # Epoch ID
                    Period: 33.33ms
                    Frames: 3
                - Radar: # Epoch ID
                    Period: 100ms
                    Frames: 1
        - Control: # Hyperepoch ID; Hyperepoch
            Resources: [CPU0] # period inferred from epoch.
            Epochs:
                - VDC: # Epoch ID
                    Period: 10ms # Epoch frames = 1 (default)
This configuration is visualized in the following figure. Note that Camera and Radar frames are synchronized with each other at the hyperepoch boundary; VDC frames are not aligned with either the Camera or Radar frames, as they are in a separate hyperepoch with a different time base.
Hyperepochs and epochs define the timing boundaries for tasks (runnables); clients define the data boundaries. A client is an operating system process that contains software resources (like CUDA streams) and runnables. Clients are specified in the graph specification section under the Clients header. Each client specifies its contained software resources (if any), lists the epochs present in that client, and associates runnables with each epoch. A typical client is specified as follows:
Version: 2.0.0
Drive: # Graph ID
    Clients:
        - Client0: # Client ID
            Resources: # Client0's internal resources
                # Resource Definition
            Epochs: # Epochs present in this client
                - Perception.Camera: # Epoch Global ID - <HyperepochID.EpochID>
                    Runnables: # Runnables present in Perception.Camera
                        - ReadCamera: # Runnable ID (Unique inside a client)
                            # Runnable specification...
                        - RunnableN:
                            # Runnable specification...
                - Perception.Radar:
                    Runnables: # Runnables present in Perception.Radar
                        - ProcessRadar:
                            # Runnable specification...
Clients can specify resources that are visible only to runnables local to that client; these resources cannot be accessed by runnables in other clients. Global resources, in contrast, are visible to all runnables. Process-specific resources like CUDA streams and PVA streams are examples of client resources that cannot be shared across different clients. Internal scheduling mutexes can also be modeled here. These resources are specified in a format like that of Global Resources.
Clients:
    - Client1:
        Resources:
            ResourceType0:
                - ResourceType0_Instance0
                - ResourceType0_InstanceN
            ResourceTypeN:
                - ResourceTypeN_Instance0
                - ResourceTypeN_InstanceN
CUDA streams and PVA streams are client-specific software resources that are mapped to corresponding hardware engines (GPU and VPU, respectively). To specify these resources, the resource type should be set to CUDA_STREAM or PVA_STREAM, respectively. The hardware engine mapping is conveyed to the compiler when specifying the resource instances, as shown in the example below. The mapped hardware resource instances must be declared under the corresponding hardware resource type in the Global Resources section. Note that the compiler will throw an error if the limits on the mapped resource (as specified in the GPU/VPU Global Resources section) are violated.
Clients:
    - Client0:
        Resources:
            CUDA_STREAM:
                - CUDA_STREAM0: GPU0 # CUDA_STREAM0 mapped to GPU0
                - CUDA_STREAM1: GPU0 # CUDA_STREAM1 mapped to GPU0
            PVA_STREAM: # A client can have one unique stream per VPU
                - PVA_STREAM0: VPU0 # PVA_STREAM0 mapped to VPU0
Resource types other than those specified in the Global Resources section above are treated as local scheduling mutexes. These cannot be mapped to a hardware resource.
Clients:
    - Client0:
        Resources:
            LOCAL_SCHED_MUTEX:
                - LOCAL_SCHED_MUTEX0
            LOCAL_RESOURCE_MUTEX:
                - RESOURCE_MUTEX0
Here's a visual representation of the data, timing, and resource boundaries in STM:
Tasks in the system executed on hardware engines are known as runnables. Ideally, each runnable should use only a single hardware engine; synchronous runnables (runnables that use multiple hardware engines simultaneously) are currently not supported. Runnables require resources for execution and can be dependent on other runnables. Care must be taken to ensure that dependencies do not introduce a cycle in the graph. Depending on the type of engine used, the compiler classifies each runnable into one of the three following classes: normal (CPU) runnables, submitter runnables (which run on a CPU and submit work to an accelerator), and submittee runnables (the submitted work that executes on the accelerator itself).
For each runnable, parameters such as WCET (worst-case execution time), StartTime, Deadline, Resources, Dependencies, and Submits can be set, as shown in the example below:
Clients:
    - Client0:
        Resources:
            CUDA_STREAM:
                - CUDA_STREAM0: GPU0
        Epochs:
            - Perception.Camera: # Camera epoch in Perception Hyperepoch
                Runnables:
                    - ReadCamera: # Normal runnable
                        WCET: 10us
                        Resources:
                            - CPU # This runnable runs on a CPU
                    - PreProcessImage: # Submitter runnable
                        WCET: 20ms
                        StartTime: 1ms # Starts 1ms after the camera epoch
                        Resources: # GPU Submitter needs CPU0 and a stream
                            - CPU0
                            - CUDA_STREAM
                        Dependencies: [Client0.ReadCamera] # Depends on ReadCamera
                        Submits: Client0.PreProcessGPUWork # Mentions submittee
                    - PreProcessGPUWork: # Submittee runnable
                        WCET: 5000ns
                        Deadline: 30ms # Hint to schedule this before 30ms
                        Resources: [GPU]
                        Dependencies:
                            - Client0.PreProcessImage # Optional for submittees
                        # Note: Inter-epoch dependencies are currently not
                        # supported. Inter-client dependencies are supported.
            - Perception.Radar: # Radar epoch in Perception Hyperepoch
                Runnables:
                    - ProcessRadar:
                        # Runnable specification...
The runnables inside an epoch can be round-robinned, such that different runnables execute in the same slot in different frames.
Version: 2.0.0
Drive: # Graph ID
    Resources: # Global Resources
        CPU: [CPU0, CPU1, CPU2]
        GPU: [GPU0]
    Hyperepochs:
        - Perception: # Hyperepoch ID
            Period: 100ms # Hyperepoch period
            Resources: [CPU1, CPU2, GPU0] # Resource mapping
            Epochs:
                - Camera: # Epoch ID
                    AliasGroups:
                        - parent_round_robin:
                            Steps: [client0.n2, client0.n4]
                        - child_round_robin:
                            Steps: [client0.n3, client0.n5]
                    Period: 33.33ms
                    Frames: 3
                - Radar: # Epoch ID
                    Period: 100ms
                    Frames: 1
        - Control: # Hyperepoch ID; Hyperepoch
            Resources: [CPU0] # period inferred from epoch.
            Epochs:
                - VDC: # Epoch ID
                    Period: 10ms # Epoch frames = 1 (default)
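To make the round-robin behavior concrete, the following sketch (our illustration, assuming frame indices start at 0 and that each alias group cycles through its Steps list one step per frame) shows which steps the Camera epoch's three frames would execute:

# Camera epoch: Frames: 3 per 100ms hyperepoch period
# Frame 0 (even): parent_round_robin -> client0.n2, child_round_robin -> client0.n3
# Frame 1 (odd):  parent_round_robin -> client0.n4, child_round_robin -> client0.n5
# Frame 2 (even): client0.n2 and client0.n3 execute again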
In the above example, n2 and n3 will execute in even frames, while n4 and n5 will execute in odd frames. The following conditions must hold for steps:
This section covered the skeleton for timing specification in the graph. In the following sections, we will cover the specification of tasks that adhere to these timing specs.