In software development, testing is crucial for ensuring the quality and reliability of the final product. However, creating test plans and specifications can be time-consuming and labor-intensive, especially when managing multiple requirements and diverse test types in complex systems. Many of these tasks are traditionally performed manually by test engineers.
To streamline this process, the DriveOS team at NVIDIA developed Hephaestus (HEPH), an internal generative AI framework for automatic test generation. HEPH automates the design and implementation of various tests, including integration and unit tests. It uses large language models (LLMs) for input analysis and code generation, significantly reducing the time spent on creating test cases. By generating context-aware tests based on input documentation, code samples, and feedback loops, HEPH makes testing faster and more efficient.
This post provides an overview of how an agent framework was built to generate various types of software tests. It covers how an LLM agent is used to ensure document traceability and create executable tests based on software requirements. It also outlines ideas for improving the agent’s test-generation capabilities.
An agentic framework for automatic test case creation
Development, security, and QA teams often face the labor-intensive task of manual test case creation during the software development process.
The test creation process involves time-intensive steps, including retrieving requirement information, tracing those requirements to relevant documentation, and ultimately generating tests based on the aligned requirements and documentation.
To simplify this workflow, the DriveOS team designed HEPH, a custom, internal framework for automating the entire testing process, significantly reducing the time spent on creating test specifications and implementations. HEPH uses LLMs to analyze input documentation and code samples, generating tests that are tailored to the provided requirements.
Value of test automation
HEPH uses an LLM agent for every step in the test generation process—from document traceability to code generation. This enables the automation of the entire testing workflow and saves engineering teams many hours.
- Time savings: HEPH dramatically accelerates the test creation process. In trials with multiple pilot teams at NVIDIA, teams reported saving up to 10 weeks of development time.
- Context-aware test generation: HEPH uses project documentation and interface specifications to generate test specifications and implementations. Each test is compiled, executed, and verified for correctness. Test coverage data is fed back into the model to further refine test generation.
- Multi-format support and modularity: HEPH supports various input formats, including PDF, RST, RSTI, and HTML, and integrates with internal tools like Confluence and JIRA.
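The coverage feedback described above amounts to a bounded loop: generate tests for whatever is still uncovered, run them, and repeat. The following C sketch illustrates only that control flow and is not HEPH's implementation. `generate_and_run_fn`, `coverage_feedback_loop`, and the toy `cover_first_uncovered` step are hypothetical names, and each "item" stands for whatever the coverage report tracks (for example, a branch or a verification step).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook: generate tests for the uncovered items, compile and run
 * them, and set covered[i] = true for each item the new tests reach.
 * Returns how many items were newly covered. */
typedef size_t (*generate_and_run_fn)(bool *covered, size_t n, void *ctx);

static size_t count_uncovered(const bool *covered, size_t n) {
    size_t missing = 0;
    for (size_t i = 0; i < n; i++) {
        if (!covered[i]) {
            missing++;
        }
    }
    return missing;
}

/* Repeats generation until every item is covered, an iteration makes no
 * progress, or max_iters is reached. Returns the number of iterations run. */
static size_t coverage_feedback_loop(bool *covered, size_t n, size_t max_iters,
                                     generate_and_run_fn step, void *ctx) {
    assert(covered != NULL && step != NULL);
    size_t iters = 0;
    while (iters < max_iters && count_uncovered(covered, n) > 0) {
        size_t gained = step(covered, n, ctx);
        iters++;
        if (gained == 0) {
            break; /* no new coverage: stop instead of regenerating forever */
        }
    }
    return iters;
}

/* Toy stand-in for the LLM generation step: each round covers the first
 * uncovered item. */
static size_t cover_first_uncovered(bool *covered, size_t n, void *ctx) {
    (void)ctx;
    for (size_t i = 0; i < n; i++) {
        if (!covered[i]) {
            covered[i] = true;
            return 1;
        }
    }
    return 0;
}
```

Stopping as soon as an iteration adds no coverage is a design choice: it keeps the loop from spending LLM calls on items the generator apparently cannot reach.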
How does HEPH work?
HEPH takes in software requirements, software architecture documents (SWADs), interface control documents (ICDs), and test examples as input. The output is a set of test specifications and implementations for the given requirements (Figure 1).

The test generation process includes the following main steps:
- Data preparation: Input documents like SWADs and ICDs are indexed and stored in an embedding database, which is later used to query relevant information.
- Requirements extraction: Requirement details are retrieved from the requirement storage system (for example, Jama). If the input requirements lack the necessary details for test generation, HEPH automatically connects to the storage service, locates the requirement, and downloads the missing information.
- Data traceability: HEPH searches the embedding database to trace information related to the input requirements. The output is a mapped connection between the requirement and the relevant SWAD and ICD fragments.
- Test specification generation: Based on the verification steps from requirements and the identified SWAD and ICD fragments (traceability), HEPH generates both positive and negative test specifications to cover all aspects of the requirement.
- Test implementation generation: HEPH uses the ICD fragments (traceability) and the generated test specifications to create the tests in C/C++. The ICD provides context such as function names, data types, enumerations, and return codes, which the LLM uses during code generation. This step results in executable tests.
- Test execution: The generated tests are compiled and executed, and coverage data is collected. The HEPH agent analyzes the coverage results and repeats test specification and implementation generation for the missing cases.
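The data traceability step is a nearest-neighbor search over the embedding database built during data preparation. The post doesn't name the embedding model or database, so the following C sketch assumes precomputed, L2-normalized embeddings (for which cosine similarity reduces to a dot product) and performs a brute-force top-k scan. `indexed_fragment_t`, `top_k_fragments`, and the 4-dimensional `DIM` are illustrative assumptions; a production vector database typically uses an approximate index instead of scanning every fragment.

```c
#include <assert.h>
#include <stddef.h>

#define DIM 4 /* toy size; real embedding models produce much larger vectors */

/* One indexed document fragment with its precomputed, L2-normalized embedding. */
typedef struct {
    const char *fragment_id; /* e.g., a SWAD unit requirement or an ICD function */
    double vec[DIM];
} indexed_fragment_t;

/* For unit-length vectors, cosine similarity equals the dot product. */
static double dot(const double *a, const double *b) {
    double sum = 0.0;
    for (size_t i = 0; i < DIM; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

static int already_chosen(const size_t *chosen, size_t count, size_t idx) {
    for (size_t j = 0; j < count; j++) {
        if (chosen[j] == idx) {
            return 1;
        }
    }
    return 0;
}

/* Writes the indices of the k fragments most similar to query into out_idx,
 * most similar first. Brute force: fine for a sketch, slow for large indexes. */
static void top_k_fragments(const double *query, const indexed_fragment_t *db,
                            size_t n, size_t k, size_t *out_idx) {
    assert(k <= n);
    for (size_t r = 0; r < k; r++) {
        size_t best_idx = 0;
        double best_score = -2.0; /* below the minimum cosine similarity of -1 */
        for (size_t i = 0; i < n; i++) {
            if (already_chosen(out_idx, r, i)) {
                continue;
            }
            double score = dot(query, db[i].vec);
            if (score > best_score) {
                best_score = score;
                best_idx = i;
            }
        }
        out_idx[r] = best_idx;
    }
}
```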
A real-world scenario
Here’s a real-world example that shows how HEPH works.
NVIDIA DriveOS uses the QNX operating system. To demonstrate the capabilities of HEPH, I use one of the QNX BSP drivers as an example, together with a requirement for the thermal functionality in the QNX BSP. Given the input requirements and driver documentation, HEPH extracts requirement information from Jama, traces it to the corresponding documentation fragments, and generates test specifications and implementations.
- Requirement extraction
- SWAD data traceability
- ICD data traceability
- Test specification generation
- Test implementation generation
Requirement extraction
HEPH extracts details from requirements storage (Jama) using a requirement identifier. The resulting requirement is formatted as JSON:
```json
{
    "requirement_id": "DOSBSP60-REQ-3874",
    "name": "Get Temperature for Thermal Zones in the SOC_THERM Domain",
    "description": "When a request to retrieve the temperature of a thermal zone in the SOC_THERM domain is made via TMON_GUEST_LIF#GetTempSocTherm, QNX_BSP shall retrieve and return two consecutive temperature values for the thermal zone via TEGRA_CTRL_LIF#GetTemp, preserving the temperature resolution, along with two corresponding timestamps via DRIVE_OS::QNX_System::OS_LIBC_LIF#ReadClockCycles."
}
```
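The requirement description names its interfaces in an `Interface#Method` notation, the same notation the SWAD uses (for example, `Thermal_API#GetTempSocTherm`). The following C sketch extracts those references from the description; it is a hypothetical keyword pre-filter, not part of HEPH as described. It also shows why exact matching is not enough on its own: the requirement names `TMON_GUEST_LIF` and `TEGRA_CTRL_LIF`, while the SWAD fragments use `Thermal_API` and `BPMP_Comm_API`, which is one reason a semantic search over embeddings is useful for tracing. `extract_interface_refs` and `MAX_REF_LEN` are illustrative names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_REF_LEN 128

/* Copies up to max_refs whitespace-delimited tokens that contain '#' (the
 * Interface#Method notation) from text into refs, dropping trailing ",.;:)".
 * Returns the number of references found. */
static size_t extract_interface_refs(const char *text, char refs[][MAX_REF_LEN],
                                     size_t max_refs) {
    assert(text != NULL && refs != NULL);
    size_t count = 0;
    const char *p = text;
    while (*p != '\0' && count < max_refs) {
        while (*p != '\0' && isspace((unsigned char)*p)) {
            p++; /* skip whitespace between tokens */
        }
        const char *start = p;
        while (*p != '\0' && !isspace((unsigned char)*p)) {
            p++; /* advance to the end of the token */
        }
        size_t len = (size_t)(p - start);
        while (len > 0 && strchr(",.;:)", start[len - 1]) != NULL) {
            len--; /* drop trailing punctuation */
        }
        if (len > 0 && len < MAX_REF_LEN && memchr(start, '#', len) != NULL) {
            memcpy(refs[count], start, len);
            refs[count][len] = '\0';
            count++;
        }
    }
    return count;
}
```

Applied to the description above, the sketch yields `TMON_GUEST_LIF#GetTempSocTherm`, `TEGRA_CTRL_LIF#GetTemp`, and `DRIVE_OS::QNX_System::OS_LIBC_LIF#ReadClockCycles`.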
SWAD data traceability
HEPH searches through all architecture documentation (SWAD) to locate information relevant to the requirement. The output is a collection of architectural details, including unit requirements, methods, error handling, and verification criteria.
```text
### Extracted Information for Requirement: DOSBSP60-REQ-3874

#### Unit Requirements and Functional Blocks
- **UREQ_NvThermmon_Library_0601**:
    If the thermal zone bound to **Handle** is in the SOC_THERM domain, Thermal_API#GetTempSocTherm shall retrieve the bound zone ID, pass it to BPMP_Comm_API#GetZoneTemp to get the zone temperature, populate **TempVal1** with the zone temperature, use libc_api#QnxLibcReadClockCycles to populate **Timestamp1**, call BPMP_Comm_API#GetZoneTemp to get the zone temperature again, populate **TempVal2** with the zone temperature, use libc_api#QnxLibcReadClockCycles to populate **Timestamp2**, and return success as per Thermal_API ICD.
- **UREQ_NvThermmon_Library_0602**:
    If BPMP_Comm_API#GetZoneTemp fails, Thermal_API#GetTempSocTherm shall return error as per Thermal_API ICD and perform no other action.

#### Methods and Functions
- **Functional Block FUNC_NvThermmon_01**:
  FUNC_NvThermmon_01
    Thermal_API provides the Thermal_API#GetTempSmTmon and Thermal_API#GetTempSocTherm interface to query the temperature of the thermal zone bound to **Handle** and return the result in **zone_temp**.

#### Error Handling
- **UREQ_NvThermmon_Resmgr_1201**:
    If NvThermmon_Resmgr encounters a SW diagnostic error, the NvThermmon_Resmgr returns error as per Thermal_devctl_API ICD for all current and future Thermal_devctl_API requests and performs no other action.
- **UREQ_NvThermmon_Resmgr_1202**:
    If libc_api#QnxLibcIoctl is called on ``/dev/<ZoneName>`` and the command is not Thermal_devctl_API#GetTemp or Thermal_devctl_API#SetAlert, NvThermmon_Resmgr shall return error as per Thermal_devctl_API ICD.

#### Verification Criteria (Verification Environment: AV+Q Safety prod-debug)
Verification Environment: AV+Q Safety prod-debug
Pre-Condition: N/A
Constraints: N/A
Verification Steps:
- Run interface tests on API
- Pass invalid thermal zones and verify the API fails
- Pass valid thermal zones from a process with required privileges and permissions and verify that the temperature can be read.
- Inject error such that I2C_API#SendReceive returns error and verify that Thermal_API#GetTempSmTmon fails as expected.
- Inject error such that BPMP_Comm_API#GetZoneTemp returns error and verify that Thermal_API#GetTempSocTherm fails as expected.
```
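UREQ_NvThermmon_Library_0601 and UREQ_NvThermmon_Library_0602 spell out the exact call sequence that the generated tests ultimately exercise. The following C sketch restates that sequence as code so it can be compared against the tests; it is not the driver's source. The dependencies are passed in as function pointers (the hypothetical `get_zone_temp_fn` and `read_clock_fn`) standing in for BPMP_Comm_API#GetZoneTemp and libc_api#QnxLibcReadClockCycles, and `fake_get_zone_temp`, `fake_read_clock`, and `fake_fail_on_call` are illustrative test doubles for the "inject error" verification steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for BPMP_Comm_API#GetZoneTemp (0 on success)
 * and libc_api#QnxLibcReadClockCycles. */
typedef int (*get_zone_temp_fn)(uint32_t zone_id, int32_t *temp_mdegc);
typedef uint64_t (*read_clock_fn)(void);

typedef struct {
    int32_t temp_1, temp_2; /* TempVal1 and TempVal2 */
    uint64_t ts_1, ts_2;    /* Timestamp1 and Timestamp2 */
} temp_pair_t;

/* Temperature, timestamp, temperature, timestamp, as in UREQ_..._0601.
 * *out is written only after both readings succeed, so a failure leaves the
 * caller's data untouched (one reading of "perform no other action" in
 * UREQ_..._0602). Returns 0 on success, -1 on error. */
static int get_temp_soc_therm(uint32_t zone_id, get_zone_temp_fn get_temp,
                              read_clock_fn read_clock, temp_pair_t *out) {
    assert(get_temp != NULL && read_clock != NULL);
    if (out == NULL) {
        return -1;
    }
    temp_pair_t r;
    if (get_temp(zone_id, &r.temp_1) != 0) {
        return -1;
    }
    r.ts_1 = read_clock();
    if (get_temp(zone_id, &r.temp_2) != 0) {
        return -1;
    }
    r.ts_2 = read_clock();
    *out = r;
    return 0;
}

/* Test doubles: fail on the Nth call (0-based) when fake_fail_on_call >= 0. */
static int fake_fail_on_call = -1;
static int fake_calls = 0;
static uint64_t fake_ticks = 0;

static int fake_get_zone_temp(uint32_t zone_id, int32_t *temp_mdegc) {
    (void)zone_id;
    if (fake_calls++ == fake_fail_on_call) {
        return -1;
    }
    *temp_mdegc = 45000 + 125 * fake_calls;
    return 0;
}

static uint64_t fake_read_clock(void) {
    fake_ticks += 1000;
    return fake_ticks;
}
```

Injecting the dependencies this way lets a test force the second BPMP call to fail, which is exactly the scenario UREQ_NvThermmon_Library_0602 constrains.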
ICD data traceability
After searching SWADs, HEPH looks for information in the ICDs. The result includes details on methods, functions, data structures, error codes, and function dependencies.
```text
### Extracted Information from ICDs

#### Methods and Functions:
1. **NvThermmonOpen**
    **Description**: Opens a connection to the nvthermmon driver for the thermal sensor/zone user is interested in.
    **Parameters**:
    - `const char * const ZoneName`: Thermal sensor/zone name
    - `NvThermmonHandle * const Handle`: User handle for further API calls
    **Returns**:
    - `NV_THERMMON_ERR_CODE_NO_ERROR`: Success
    - `NV_THERMMON_ERR_CODE_INVALID_PARAM`: Invalid parameters
    - `NV_THERMMON_ERR_CODE_DT_FAILED`: Device tree failure
    - `NV_THERMMON_ERR_CODE_INVALID_PERM`: Permission error
    - `NV_THERMMON_ERR_CODE_INVALID_STATE`: Invalid system state
    - `NV_THERMMON_ERR_CODE_ERROR`: BPMP Comm error
    **Precondition**: None
    **Usage Considerations**:
    - Allowed context for the API call
    - Sync, not re-entrant
2. **NvThermmonGetZoneTemp**
    **Description**: Read temperature from selected thermal zone. Populates memory pointed to by `temp` with temperature of thermal zone in millidegrees Celsius and returns status.
    **Parameters**:
    - `const NvThermmonHandle Handle`: Handle for execution context bound in `NvThermmonOpen`
    - `int32_t * const TempVal`: Pointer to save temperature value in millidegrees Celsius.
    **Returns**:
    - `NvThermmonErrCode`: Error Code (e.g., `NV_THERMMON_ERR_CODE_NO_ERROR`, `NV_THERMMON_ERR_CODE_INVALID_PARAM`)

#### Data Structures and Fields:
1. **zone_temp_t Structure**
    **Fields**:
    - `int32_t temp_1`: First temperature reading
    - `int32_t temp_2`: Second temperature reading
    - `uint64_t ts_1`: Timestamp for first temp reading
    - `uint64_t ts_2`: Timestamp for second temp reading

#### Enumerations:
1. **NvThermmonErrCode**
    **Error Codes**:
    - `NV_THERMMON_ERR_CODE_NO_ERROR`
    - `NV_THERMMON_ERR_CODE_INVALID_PARAM`
    - `NV_THERMMON_ERR_CODE_DT_FAILED`
    - `NV_THERMMON_ERR_CODE_INVALID_PERM`
    - `NV_THERMMON_ERR_CODE_INVALID_STATE`
    - `NV_THERMMON_ERR_CODE_ERROR`

### Integration of Information for Testing:
1. **NvThermmonOpen()**:
    **Precondition**: Ensure that `NvThermmonOpen` returns success for the thermal zone.
    **Dependencies**:
    - Handle for execution context
2. **NvThermmonGetZoneTemp()**:
    **Dependencies**:
    - `zone_temp_t` structure to store temperature readings and timestamps.
    - Error codes to handle different edge cases and validation steps.
```
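The real `NvThermmonAPI.h` header is not shown in the post. To compile and run generated tests off-target, one option is a mock built from the ICD fragment above; the following C sketch is such a mock, not NVIDIA's header. The numeric enum values, the `int32_t` handle type, the zone table (`"GPU-therm"` in particular), the fake readings, and the `mock_inject_bpmp_error` knob are all assumptions. Note also that the ICD fragment lists an `int32_t *` parameter for `NvThermmonGetZoneTemp`, while its "Integration" section and the generated tests use `zone_temp_t`; this mock follows the generated tests, and the real signature should be checked against the actual header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Error codes listed in the ICD fragment; the numeric values are assumed. */
typedef enum {
    NV_THERMMON_ERR_CODE_NO_ERROR = 0,
    NV_THERMMON_ERR_CODE_INVALID_PARAM,
    NV_THERMMON_ERR_CODE_DT_FAILED,
    NV_THERMMON_ERR_CODE_INVALID_PERM,
    NV_THERMMON_ERR_CODE_INVALID_STATE,
    NV_THERMMON_ERR_CODE_ERROR
} NvThermmonErrCode;

/* Hypothetical handle type: the ICD fragment does not define it. */
typedef int32_t NvThermmonHandle;

/* Field layout taken from the ICD fragment. */
typedef struct {
    int32_t temp_1; /* first reading, millidegrees Celsius */
    int32_t temp_2; /* second reading, millidegrees Celsius */
    uint64_t ts_1;  /* timestamp of first reading */
    uint64_t ts_2;  /* timestamp of second reading */
} zone_temp_t;

/* Mock state: a hypothetical zone table, a fake clock, and a fault knob. */
static const char *const mock_zones[] = {"CPU-therm", "GPU-therm"};
static const size_t mock_zone_count = sizeof mock_zones / sizeof mock_zones[0];
static uint64_t mock_clock = 0;
int mock_inject_bpmp_error = 0; /* nonzero: simulate a BPMP communication failure */

NvThermmonErrCode NvThermmonOpen(const char *const ZoneName, NvThermmonHandle *const Handle) {
    if (ZoneName == NULL || Handle == NULL) {
        return NV_THERMMON_ERR_CODE_INVALID_PARAM;
    }
    for (size_t i = 0; i < mock_zone_count; i++) {
        if (strcmp(ZoneName, mock_zones[i]) == 0) {
            *Handle = (NvThermmonHandle)i;
            return NV_THERMMON_ERR_CODE_NO_ERROR;
        }
    }
    return NV_THERMMON_ERR_CODE_INVALID_PARAM; /* unknown zone name */
}

/* Takes a zone_temp_t, matching the generated tests (see the note above). */
NvThermmonErrCode NvThermmonGetZoneTemp(const NvThermmonHandle Handle, zone_temp_t *const TempVal) {
    if (TempVal == NULL || Handle < 0 || (size_t)Handle >= mock_zone_count) {
        return NV_THERMMON_ERR_CODE_INVALID_PARAM;
    }
    if (mock_inject_bpmp_error) {
        return NV_THERMMON_ERR_CODE_ERROR; /* *TempVal is left untouched */
    }
    mock_clock += 100;
    TempVal->temp_1 = 45000 + (int32_t)(mock_clock % 1000);
    TempVal->ts_1 = mock_clock;
    mock_clock += 100;
    TempVal->temp_2 = 45000 + (int32_t)(mock_clock % 1000);
    TempVal->ts_2 = mock_clock;
    return NV_THERMMON_ERR_CODE_NO_ERROR;
}
```

With a mock like this, negative cases from the SWAD verification steps (an invalid zone name, or a BPMP error injected through `mock_inject_bpmp_error`) can be exercised without target hardware.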
Test specification generation
Based on the verification steps from requirements and the traced SWAD and ICD fragments, HEPH generates test specifications. The following code blocks are examples of generated test cases.
```text
###### Test Case ID: TC_DOSBSP60-REQ-3874_001
Test Case Name: Verify successful temperature query for a supported thermal zone
Test Type: Positive
Test Design: Ensure the temperature for a thermal zone in the SOC_THERM domain can be queried successfully.
Pre-condition: Thermal zone is supported and NvThermmonOpen() returns success for the thermal zone.
Test Procedure:
1. Open a connection to the thermal zone using NvThermmonOpen().
2. Query the temperature for the thermal zone using NvThermmonGetZoneTemp().
3. Retrieve two consecutive temperature values.
Expected Result: The API should return success with two temperature values that differ by no more than the delta determined by the thermal characteristics of the board.

###### Test Case ID: TC_DOSBSP60-REQ-3874_002
Test Case Name: Verify accompanying timestamp for temperature queries
Test Type: Positive
Test Design: Ensure that two timestamp values are returned with temperature queries.
Pre-condition: Thermal zone is supported and NvThermmonOpen() returns success for the thermal zone.
Test Procedure:
1. Open a connection to the thermal zone using NvThermmonOpen().
2. Query the temperature for the thermal zone using NvThermmonGetZoneTemp().
3. Retrieve two consecutive temperature values along with two corresponding timestamps.
Expected Result: The API should return success with two timestamp values that do not exceed T microseconds between them.

###### Test Case ID: TC_DOSBSP60-REQ-3874_003
Test Case Name: Verify temperature variation over successive calls
Test Type: Positive
Test Design: Ensure that the temperature for the thermal zone varies over successive queries.
Pre-condition: Thermal zone is supported and NvThermmonOpen() returns success for the thermal zone.
Test Procedure:
1. Open a connection to the thermal zone using NvThermmonOpen().
2. Query the temperature for the thermal zone using NvThermmonGetZoneTemp() multiple times.
3. Compare temperature values over successive calls.
Expected Result: The temperature values should show a difference of >0 and should vary with the resolution supported by the temperature sensor.
```
Test implementation generation
Based on the test specifications and ICD details, HEPH generates the tests in C.
```c
#include <stdio.h>
#include <stdint.h>
#include "NvThermmonAPI.h"

// Test Case TC_DOSBSP60-REQ-3874_001
// Verify successful temperature query for a supported thermal zone
void TC_DOSBSP60_REQ_3874_001(void) {
    NvThermmonHandle handle;
    zone_temp_t zoneTemp;

    // Step 1: Open a connection to the thermal zone using NvThermmonOpen()
    NvThermmonErrCode status = NvThermmonOpen("CPU-therm", &handle);
    if (status != NV_THERMMON_ERR_CODE_NO_ERROR) {
        printf("TC_DOSBSP60-REQ-3874_001: NvThermmonOpen failed\n");
        return;
    }

    // Step 2: Query the temperature for the thermal zone using NvThermmonGetZoneTemp()
    status = NvThermmonGetZoneTemp(handle, &zoneTemp);
    if (status != NV_THERMMON_ERR_CODE_NO_ERROR) {
        printf("TC_DOSBSP60-REQ-3874_001: NvThermmonGetZoneTemp failed\n");
        return;
    }

    // Step 3: Retrieve two consecutive temperature values.
    // Compare the absolute difference so that a large drop fails as well as a large rise.
    int32_t delta = zoneTemp.temp_2 - zoneTemp.temp_1;
    if (delta < 0) {
        delta = -delta;
    }
    if (delta > 100) { // Assume 100 is the delta determined by the thermal characteristics of the board
        printf("TC_DOSBSP60-REQ-3874_001: Temperature delta is too large\n");
    } else {
        printf("TC_DOSBSP60-REQ-3874_001: Passed\n");
    }
}

// Test Case TC_DOSBSP60-REQ-3874_002
// Verify accompanying timestamp for temperature queries
void TC_DOSBSP60_REQ_3874_002(void) {
    NvThermmonHandle handle;
    zone_temp_t zoneTemp;

    // Step 1: Open a connection to the thermal zone using NvThermmonOpen()
    NvThermmonErrCode status = NvThermmonOpen("CPU-therm", &handle);
    if (status != NV_THERMMON_ERR_CODE_NO_ERROR) {
        printf("TC_DOSBSP60-REQ-3874_002: NvThermmonOpen failed\n");
        return;
    }

    // Step 2: Query the temperature for the thermal zone using NvThermmonGetZoneTemp()
    status = NvThermmonGetZoneTemp(handle, &zoneTemp);
    if (status != NV_THERMMON_ERR_CODE_NO_ERROR) {
        printf("TC_DOSBSP60-REQ-3874_002: NvThermmonGetZoneTemp failed\n");
        return;
    }

    // Step 3: Retrieve two consecutive temperature values along with two corresponding timestamps.
    // The second timestamp must not precede the first; checking this explicitly also
    // avoids unsigned wraparound in the subtraction below.
    if (zoneTemp.ts_2 < zoneTemp.ts_1) {
        printf("TC_DOSBSP60-REQ-3874_002: Timestamps are out of order\n");
        return;
    }
    uint64_t deltaTs = zoneTemp.ts_2 - zoneTemp.ts_1;
    // Assume 1000 microseconds is the maximum allowed delta. If ts_1 and ts_2 are raw
    // ReadClockCycles counts rather than microseconds, convert them before comparing.
    if (deltaTs > 1000) {
        printf("TC_DOSBSP60-REQ-3874_002: Timestamp delta is too large\n");
    } else {
        printf("TC_DOSBSP60-REQ-3874_002: Passed\n");
    }
}

// Test Case TC_DOSBSP60-REQ-3874_003
// Verify temperature variation over successive calls
void TC_DOSBSP60_REQ_3874_003(void) {
    NvThermmonHandle handle;
    zone_temp_t zoneTemp1, zoneTemp2;

    // Step 1: Open a connection to the thermal zone using NvThermmonOpen()
    NvThermmonErrCode status = NvThermmonOpen("CPU-therm", &handle);
    if (status != NV_THERMMON_ERR_CODE_NO_ERROR) {
        printf("TC_DOSBSP60-REQ-3874_003: NvThermmonOpen failed\n");
        return;
    }

    // Step 2: Query the temperature for the thermal zone using NvThermmonGetZoneTemp() multiple times
    status = NvThermmonGetZoneTemp(handle, &zoneTemp1);
    if (status != NV_THERMMON_ERR_CODE_NO_ERROR) {
        printf("TC_DOSBSP60-REQ-3874_003: NvThermmonGetZoneTemp (1st call) failed\n");
        return;
    }
    status = NvThermmonGetZoneTemp(handle, &zoneTemp2);
    if (status != NV_THERMMON_ERR_CODE_NO_ERROR) {
        printf("TC_DOSBSP60-REQ-3874_003: NvThermmonGetZoneTemp (2nd call) failed\n");
        return;
    }

    // Step 3: Compare temperature values over successive calls.
    // The specification expects the values to vary, so any nonzero difference
    // passes; a drop in temperature counts as variation too.
    int32_t temperatureDifference = zoneTemp2.temp_1 - zoneTemp1.temp_1;
    if (temperatureDifference == 0) {
        printf("TC_DOSBSP60-REQ-3874_003: Temperature did not vary as expected\n");
    } else {
        printf("TC_DOSBSP60-REQ-3874_003: Passed\n");
    }
}
```
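The generated tests report results only through `printf`, so the test execution step would have to parse console output to tell a pass from a failure. A common alternative, sketched here in C, is to have each test return a status and let a small runner count failures. `test_fn`, `test_case_t`, `run_tests`, and the placeholder tests are hypothetical names, not part of HEPH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical convention: a test returns 0 on pass and nonzero on failure. */
typedef int (*test_fn)(void);

typedef struct {
    const char *id; /* e.g., "TC_DOSBSP60-REQ-3874_001" */
    test_fn fn;
} test_case_t;

/* Runs every test, prints one result line per test, and returns the number
 * of failures. */
static size_t run_tests(const test_case_t *tests, size_t n) {
    assert(tests != NULL || n == 0);
    size_t failures = 0;
    for (size_t i = 0; i < n; i++) {
        int rc = tests[i].fn();
        printf("%s: %s\n", tests[i].id, rc == 0 ? "PASSED" : "FAILED");
        if (rc != 0) {
            failures++;
        }
    }
    return failures;
}

/* Placeholder tests standing in for generated ones. */
static int always_passes(void) { return 0; }
static int always_fails(void) { return 1; }
```

A `main` could then return nonzero whenever `run_tests` reports a failure, so a CI job or a coverage-driven regeneration loop can detect failures from the process exit status alone.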
Future enhancements
Two areas are worth focusing on when designing a test generation framework:
- Supporting different test workflows
- Integrating real-time human feedback
Supporting different test workflows
HEPH is designed to support most test-generation use cases for software teams. Still, there are instances when teams require a custom test framework or an unsupported test creation workflow.
To address these challenges, potential future improvements to HEPH could include a modular design that enables software teams to define custom modules for non-standard workflows. For example, teams might benefit from the ability to generate tests directly from input code rather than from documentation, or to extend the LLM prompts. This modular approach could help tackle the complexities of custom and non-standard test generation scenarios.
Integrating real-time human feedback
While the latest LLMs perform well in understanding the development context and generating high-quality code, there are cases where generated test sets may require improvement.
Possible future enhancements to HEPH could include the introduction of an interactive mode alongside the current automatic mode. In this interactive mode, you’d interact with the HEPH agent at each step of the test generation process, reviewing results, providing feedback, and refining outputs before proceeding. For instance, if the generated test specifications lacked details about register mapping, you could add this context, and HEPH would regenerate more accurate test specifications.
Start HEPHing your automatic test generation
Hephaestus (HEPH) automates test generation in software development by using LLMs to create comprehensive and context-aware tests. This automation reduces manual effort, accelerates development, and improves the quality and reliability of the final product.
Potential enhancements include modularity, with custom modules to support non-standard testing workflows, and an interactive mode that lets you provide feedback and refine the test generation process for higher accuracy and relevance.
Build your AI agent application
For more information about using NVIDIA generative AI technologies and tools to create your own AI agents and applications, see ai.nvidia.com or try out NVIDIA NIM APIs.
If you’re new to this area, explore our beginner-friendly series, Building Your First LLM Agent Application.
Acknowledgments
Thanks to the DriveOS QNX BSP team for piloting HEPH.