    System Task Manager SDK Reference 5.10

    Compute Graph And Constraints

    The application requirements are captured by a compute graph, where each node in the compute graph specifies an atomic task on a single hardware engine. This section walks you through the graph specification as captured by our YAML schema. Note that YAML treats everything as a list or a dictionary. The ordering of fields does not matter; the compiler accepts the fields in any order as long as the expected nesting hierarchy is satisfied. Note that IDs cannot contain a period (.), as that symbol is used internally by the Framework.
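
    Before the individual fields are described, the following minimal sketch shows the overall nesting hierarchy that this section builds up. All IDs here (SimpleGraph, Client0, Epoch0, and so on) are placeholders, and every field shown is covered in detail in the sections below.

    Version: 2.0.0                  # Input specification version
    SimpleGraph:                    # Graph ID
        Resources:                  # Global resources
            CPU: [CPU0]
        Hyperepochs:                # Resource partitions with a time base
            - Hyperepoch0:
                Period: 100ms
                Epochs:
                    - Epoch0:
                        Period: 10ms
        Clients:                    # OS processes containing runnables
            - Client0:
                Epochs:
                    - Hyperepoch0.Epoch0:
                        Runnables:
                            - Runnable0:
                                WCET: 10us
                                Resources: [CPU]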

    STM Version

    This field specifies the input specification version number. This is required to ensure that incompatible features are disabled/flagged. Ideally it should match the version of the provided compiler package. This field is mandatory.

    Version: 2.0.0 # Input specification version - currently 2.0.0
    

    Graph ID

    The graph ID is the second top-level entry in the YAML (the first being the Version). The graph definition is nested under the graph ID.

    Version: 2.0.0 # Input specification version - currently 2.0.0
    SimpleGraph: <..Graph Description..>
    

    Global Resources

    Resources that are used system-wide are modeled under the global resources section. Hardware resources like CPUs or GPUs should go under this section. Any system-wide virtual scheduling mutexes can also be listed here. The compiler models each resource as a timeline on which only one runnable can execute at any time. A runnable can, however, use more than one resource. There are some limitations on the types of resources that can be simultaneously used by a runnable, which are covered in this section.

    Resources are generally specified in the following format; certain types of resources have additional features, which are described in their respective sections. Global resources are nested under a Resources section under the graph ID. To define a resource, a resource type needs to be specified, and resource instances are grouped under the appropriate resource type. YAML supports two ways of specifying lists, both of which are shown in the example below.

    SimpleGraph:
        Resources:
            Resource_Type0: [Rsrc_Type0_Instance0, Rsrc_Type0_InstanceN]
            Resource_Type1:
                - Rsrc_Type1_Instance0
                - Rsrc_Type1_Instance1
    

    The CPU, GPU and VPU resource types are known to the compiler, and it takes specialized scheduling steps for runnables scheduled on those resources. Other resource types are treated as scheduling mutexes, and they do not have any naming restrictions.

    CPU Resource Type

    To specify CPUs in the system, the resource type should be set to CPU and the resource instances should be named as CPUX, where X is a valid CPU number.

    SimpleGraph:
        Resources:
            CPU: [CPU0, CPU1, CPU2]
    

    GPU and VPU Resource Type

    Graphics Processing Units (GPUs) and Vector Processing Units (VPUs) are special hardware accelerators that can be used to offload computation from the CPUs. Work is submitted to these engines through CUDA Streams and PVA Streams respectively. Supported device IDs are GPUX and VPUX, where X is a valid instance number. When specifying the device instances, an optional limit can be given to bound the number of streams/queues mapped to that device instance. To specify a Stream/Queue limit on an instance, append :Y to the instance ID, where Y is the limit. In the following example, instance GPU0 allows unlimited CUDA Streams, whereas GPU1 allows only 8 Streams.

    SimpleGraph:
        Resources:
            GPU:
                - GPU0      # Unlimited Streams
                - GPU1: 8   # 8 Streams
    

    Scheduling Mutex Resource Type

    Any resource type not known by the compiler is modeled as a scheduling mutex. There are no naming conventions associated with either the resource type or the resource ID for a scheduling mutex. Interfering runnables can specify a scheduling mutex as a resource requirement to prevent the compiler from scheduling them concurrently.

    SimpleGraph:
        Resources:
            # Can be used to mutually exclude memory-intensive tasks
            MEMORY_BUS: [MEMORY_BUS0]
            # Scheduling mutexes
            MUTEX: [SCHED_MUTEX0, SCHED_MUTEX1]
    

    Hyperepochs

    A hyperepoch is a resource partition that runs a fixed configuration of epochs that share the resources in that partition. It is periodic in nature, and it respawns the contained epochs at the specified period. This relationship between the hyperepoch and its member epochs will be covered in the Epochs section. To define a hyperepoch, the required fields are Resources, Period and Epochs. In certain configurations, some fields can be omitted as specified in the respective sections. Hyperepochs are specified in a list under the ‘Hyperepochs’ keyword inside the Graph specification as shown below. Hyperepoch0 is the ID of the hyperepoch that is defined in the following graph.

    SimpleGraph:
        Hyperepochs:
            - Hyperepoch0:      # Hyperepoch ID
                Period: 100ms
                Resources:
                    - MEMORY_BUS0
                    - GPU0
                    - CPU0
                    - CPU1
    

    Period

    The period for a hyperepoch specifies the rate at which the contained epochs are spawned. This field can be omitted if the hyperepoch has only one epoch; the periodicity of the hyperepoch is then equal to that of the contained epoch. The Period field is nested under the hyperepoch's ID.
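
    For example, in the following minimal sketch (with hypothetical IDs), the hyperepoch omits its Period, which is then inferred from its single contained epoch:

    Hyperepochs:
        - SingleEpochHE:        # Period omitted; the hyperepoch's period
            Epochs:             # is equal to that of its only epoch
                - OnlyEpoch:
                    Period: 10ms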

    Resources

    Each hyperepoch is associated with a mutually exclusive set of resources. Resources are mapped to hyperepochs by specifying the resource IDs in a list under the Resources heading inside the hyperepoch specification, as shown in the example above. There, the resources MEMORY_BUS0, GPU0, CPU0 and CPU1 are mapped to the hyperepoch Hyperepoch0. If there is only one hyperepoch in the system, this resource mapping can be omitted, and the hyperepoch is assumed to have access to all the resources in the system.
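
    As a minimal sketch with hypothetical IDs, a system with a single hyperepoch can omit the mapping entirely; the hyperepoch is then assumed to own every global resource:

    SimpleGraph:
        Resources:
            CPU: [CPU0, CPU1]
            GPU: [GPU0]
        Hyperepochs:
            - OnlyHyperepoch:   # Resources omitted; assumed to have
                Period: 100ms   # access to CPU0, CPU1 and GPU0
                Epochs:
                    - Epoch0:
                        Period: 10ms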

    Epochs

    An epoch is a time base that determines the rate at which its constituent runnables are spawned, confined to the boundaries of the hyperepoch. Each epoch is a member of a hyperepoch and has two attributes associated with it: Period and Frames. To specify epochs, list the epoch IDs under the Epochs heading in a hyperepoch, as shown below.

    Hyperepochs:
    - Hyperepoch0:
        Period: 100ms
        Epochs:
        - Epoch0:
            Period: 10ms
            Frames: 8
        - Epoch1:
            Period: 100ms
        - Epoch2:
            Period: 33ms
            Frames: 3
    

    Frames and Period for Epochs

    The period of an epoch specifies the rate at which a frame of runnables is spawned, up to the specified number of frames, within the hyperepoch's period. If not specified, the number of frames defaults to 1. In the example given above, Epoch0 spawns 8 frames in 100ms, each frame 10ms apart. Epoch1 spawns once in the hyperepoch, as the number of frames defaults to 1. Epoch2 spawns thrice, 33ms apart. If periodicity is not required at the epoch level, the period can be omitted; the number of frames then specifies how many times the epoch's set of runnables is spawned within the hyperepoch's period. The following example shows how a system can use hyperepochs to define different frequency domains.

    Version: 2.0.0
    Drive:                                      # Graph ID
        Resources:                              # Global Resources
            CPU: [CPU0, CPU1, CPU2]
            GPU: [GPU0]
        Hyperepochs:
            - Perception:                       # Hyperepoch ID
                Period: 100ms                   # Hyperepoch period
                Resources: [CPU1, CPU2, GPU0]   # Resource mapping
                Epochs:
                    - Camera:                   # Epoch ID
                        Period: 33.33ms
                        Frames: 3
                    - Radar:                    # Epoch ID
                        Period: 100ms
                        Frames: 1
            - Control:                          # Hyperepoch ID; hyperepoch
                Resources: [CPU0]               # period inferred from epoch.
                Epochs:
                    - VDC:                      # Epoch ID
                        Period: 10ms            # Epoch frames = 1 (default)
    

    This configuration is visualized in the following figure. Note that the Camera and Radar frames are synchronized with each other at the hyperepoch boundary; the VDC frames are not aligned with either the Camera or Radar frames, as they are in a separate hyperepoch with a different time base.

    Visualization of the Hyperepoch configuration

    Clients

    Hyperepochs and epochs define the timing boundaries for tasks (runnables). Clients define the data boundaries. A client is an operating system process that contains software resources (like CUDA streams) and runnables. Clients are specified in the graph specification section under the Clients header. Each client specifies contained software resources (if any). Clients also list the epochs contained in that client and runnables associated with each epoch. In general, a typical client would be specified as follows:

    Version: 2.0.0
    Drive:                                  # Graph ID
        Clients:
            - Client0:                      # Client ID
                Resources:                  # Client0's internal resources
                    # Resource Definition
                Epochs:                     # Epochs present in this client
                    - Perception.Camera:    # Epoch Global ID - <HyperepochID.EpochID>
                        Runnables:          # Runnables present in Perception.Camera
                            - ReadCamera:   # Runnable ID (Unique inside a client)
                                # Runnable specification...
                            - RunnableN:
                                # Runnable specification...
                    - Perception.Radar:
                        Runnables:          # Runnables present in Perception.Radar
                            - ProcessRadar:
                                # Runnable specification...
    

    Client Resources

    Clients can specify resources that are visible only to their own runnables; these resources cannot be accessed by runnables in other clients, whereas global resources are visible to all runnables. Process-specific resources like CUDA streams and PVA streams are examples of client resources that cannot be shared across different clients. Client-internal scheduling mutexes can also be modeled here. These resources are specified in a format similar to that of Global Resources.

    Clients:
        - Client1:
            Resources:
                ResourceType0:
                    - ResourceType0_Instance0
                    - ResourceType0_InstanceN
                ResourceTypeN:
                    - ResourceTypeN_Instance0
                    - ResourceTypeN_InstanceN
    

    Resource Type: CUDA Stream, PVA Stream

    CUDA streams and PVA streams are client-specific software resources that are mapped to corresponding hardware engines (GPU and VPU respectively). To specify these resources, the resource type should be set to CUDA_STREAM or PVA_STREAM respectively. The hardware engine mapping is conveyed to the compiler when specifying the resource instances, as shown in the example below. The mapped hardware resource instances must be defined under the corresponding hardware resource type in the Global Resources section. Note that the compiler will throw an error if the limits on the mapped resource (as specified in the GPU and VPU Resource Type section) are violated.

    Clients:
        - Client0:
            Resources:
                CUDA_STREAM:
                    - CUDA_STREAM0: GPU0    # CUDA_STREAM0 mapped to GPU0
                    - CUDA_STREAM1: GPU0    # CUDA_STREAM1 mapped to GPU0
                PVA_STREAM:                 # A client can have one unique stream per VPU
                    - PVA_STREAM0: VPU0     # PVA_STREAM0 mapped to VPU0
    

    Resource Type: Local Scheduling Mutex

    Resource types other than those specified in the Global Resources section above are treated as local scheduling mutexes. These cannot be mapped to a hardware resource.

    Clients:
        - Client0:
            Resources:
                LOCAL_SCHED_MUTEX:
                    - LOCAL_SCHED_MUTEX0
                LOCAL_RESOURCE_MUTEX:
                    - RESOURCE_MUTEX0
    

    Here's a visual representation of the data, timing and resource boundaries in STM:

    Boundaries in NVIDIA System Task Manager

    Runnables

    Tasks in the system executed on hardware engines are known as runnables. Ideally, each runnable should use only a single hardware engine; synchronous runnables (runnables that use multiple hardware engines simultaneously) are currently not supported. Runnables require resources for execution and can depend on other runnables. Care must be taken to ensure that dependencies do not introduce a cycle in the graph. Depending on the type of engine used, the compiler classifies each runnable into one of the three following classes:

    1. Submitter: A runnable that runs on a CPU and submits another runnable that runs on a different hardware resource. E.g., a CPU task that launches a CUDA kernel on the GPU.
    2. Submittee: A runnable that is submitted by a submitter to be run on a particular hardware resource. Following the example above, the CUDA kernel launched on the GPU is termed a Submittee runnable.
    3. Runnable: Any task that cannot be classified as a Submitter or a Submittee.

    For each runnable, the following parameters can be set:

    1. WCET (Worst Case Execution Time): The framework assumes that runnables have a bounded runtime. This runtime is captured by the WCET parameter. [Required Parameter]
    2. StartTime: The start time of a runnable can be offset by this value from the beginning of its epoch.
    3. Deadline: Suggested deadline to the compiler. The compiler will attempt to schedule this runnable before this deadline.
    4. Resources: List of resources that a runnable needs. Currently, a runnable can request only one hardware resource. However, there are no limitations on the number of software resources that a runnable can ask for. The requirement can be specified as a resource type (e.g., GPU) if the developer does not care for a specific instance of a resource, or as a resource instance (e.g., GPU0). [Required Parameter]
    5. Dependencies: List of runnables that need to be completed before this runnable can be scheduled. These runnables are specified as ClientID.RunnableID.
    6. Submits: This field is present in the Submitter runnable's specification and specifies the Submittee runnable's ID. In a Submitter-Submittee pair, the Submits field should be populated; it is not necessary to add the Submitter runnable to the Submittee's dependencies list.
      Clients:
          - Client0:
              Resources:
                  CUDA_STREAM:
                      - CUDA_STREAM0: GPU0
              Epochs:
                  - Perception.Camera:          # Camera epoch in Perception Hyperepoch
                      Runnables:
                          - ReadCamera:         # Normal runnable
                              WCET: 10us
                              Resources:
                                  - CPU         # This runnable runs on a CPU
                          - PreProcessImage:    # Submitter runnable
                              WCET: 20ms
                              StartTime: 1ms    # Starts 1ms after the camera epoch
                              Resources:        # GPU Submitter needs CPU0 and a stream
                                  - CPU0
                                  - CUDA_STREAM
                              Dependencies: [Client0.ReadCamera] # Depends on ReadCamera
                              Submits: Client0.PreProcessGPUWork # Mentions submittee
                          - PreProcessGPUWork:  # Submittee runnable
                              WCET: 5000ns
                              Deadline: 30ms    # Hint to schedule this before 30ms
                              Resources: [GPU]
                              Dependencies:
                                  - Client0.PreProcessImage # Optional for submittees
                              # Note: Inter-epoch dependencies are currently not
                              # supported. Inter-client dependencies are supported.
                  - Perception.Radar:           # Radar epoch in Perception Hyperepoch
                      Runnables:
                          - ProcessRadar:
                              # Runnable specification...
      

    Round Robining

    The runnables inside an epoch can be round robinned, such that different runnables execute in the same slot in different frames.

    Version: 2.0.0
    Drive:                                      # Graph ID
        Resources:                              # Global Resources
            CPU: [CPU0, CPU1, CPU2]
            GPU: [GPU0]
        Hyperepochs:
            - Perception:                       # Hyperepoch ID
                Period: 100ms                   # Hyperepoch period
                Resources: [CPU1, CPU2, GPU0]   # Resource mapping
                Epochs:
                    - Camera:                   # Epoch ID
                        AliasGroups:
                            - parent_round_robin:
                                Steps: [client0.n2, client0.n4]
                            - child_round_robin:
                                Steps: [client0.n3, client0.n5]
                        Period: 33.33ms
                        Frames: 3
                    - Radar:                    # Epoch ID
                        Period: 100ms
                        Frames: 1
            - Control:                          # Hyperepoch ID; hyperepoch
                Resources: [CPU0]               # period inferred from epoch.
                Epochs:
                    - VDC:                      # Epoch ID
                        Period: 10ms            # Epoch frames = 1 (default)
    

    In the above example, n2 and n3 will execute in even frames, while n4 and n5 will execute in odd frames. The following conditions must hold for the steps:

    1. The runnables must belong to the same client.
    2. The runnables must belong to the same epoch.
    3. The runnables must have the same resources.
    4. Any other round robinned runnable that is a direct parent or child must have the same step length, as illustrated in the sketch after this list.
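
    To illustrate condition 4, here is a minimal, hypothetical sketch of the runnable side of the example above. Assuming n3 depends on n2 and n5 depends on n4, the parent_round_robin and child_round_robin alias groups must have the same step length (two steps each); the WCETs and resources shown are placeholders.

    Clients:
        - client0:
            Epochs:
                - Perception.Camera:
                    Runnables:
                        - n2:                   # Step 0 of parent_round_robin
                            WCET: 1ms
                            Resources: [CPU]
                        - n3:                   # Step 0 of child_round_robin
                            WCET: 1ms
                            Resources: [CPU]
                            # n3 is a direct child of n2; both alias
                            # groups have a step length of 2
                            Dependencies: [client0.n2]
                        - n4:                   # Step 1 of parent_round_robin
                            WCET: 1ms
                            Resources: [CPU]
                        - n5:                   # Step 1 of child_round_robin
                            WCET: 1ms
                            Resources: [CPU]
                            Dependencies: [client0.n4]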

    This section covered the skeleton for timing specification in the graph. In the following sections, we will cover the specification of tasks that adhere to these timing specs.
