Building AI Agents to Automate Software Test Case Creation

In software development, testing is crucial for ensuring the quality and reliability of the final product. However, creating test plans and specifications can be time-consuming and labor-intensive, especially when managing multiple requirements and diverse test types in complex systems. Many of these tasks are traditionally performed manually by test engineers.

This post is part of the NVIDIA Chat Labs series, which shares insights and best practices developed from the internal generative AI projects that we create to help others navigate AI adoption.

To streamline this process, the DriveOS team at NVIDIA developed Hephaestus (HEPH), an internal generative AI framework for automatic test generation. HEPH automates design and implementation for various tests, including integration and unit tests. It uses large language models (LLMs) for input analysis and code generation, significantly reducing the time spent on creating test cases. By generating context-aware tests based on input documentation, code samples, and feedback loops, HEPH makes testing faster and more efficient.

This post provides an overview of how an agent framework was built to generate various types of software tests. It covers how an LLM agent is used to ensure document traceability and create executable tests based on software requirements. You also get ideas for improving the agent’s test-generation capabilities.

An agentic framework for automatic test case creation

Development, security, and QA teams often face the labor-intensive task of manual test case creation during the software development process.

The test creation process involves time-intensive steps, including retrieving requirement information, tracing those requirements to relevant documentation, and ultimately generating tests based on the aligned requirements and documentation.

To simplify this workflow, the DriveOS team designed HEPH, a custom, internal framework for automating the entire testing process, significantly reducing the time spent on creating test specifications and implementations. HEPH uses LLMs to analyze input documentation and code samples, generating tests that are tailored to the provided requirements.

Value of test automation

HEPH uses an LLM agent for every step in the test generation process, from document traceability to code generation. This automates the entire testing workflow and saves engineering teams many hours.

Time savings: HEPH dramatically accelerates the test creation process. In trials with multiple pilot teams at NVIDIA, teams reported saving up to 10 weeks of development time.
Context-aware test generation: HEPH uses project documentation and interface specifications to generate test specifications and implementations. Each test is compiled, executed, and verified for correctness. Test coverage data is fed back into the model to further refine test generation.
Multi-format support and modularity: HEPH supports various input formats, including PDF, RST, RSTI, and HTML, and integrates with internal tools like Confluence and JIRA.

How does HEPH work?

HEPH takes in software requirements, software architecture documents (SWADs), interface control documents (ICDs), and test examples as input. The output is a set of test specifications and implementations for the given requirements (Figure 1).

Figure 1. HEPH technical architecture. Inputs for analysis pass through a feedback loop, resulting in relevant outputs for test generation.

The test generation process includes the following main steps:

Data preparation: Input documents like SWADs and ICDs are indexed and stored in an embedding database, which is later used to query relevant information.
Requirements extraction: Requirement details are retrieved from the requirement storage system (for example, Jama). If the input requirements lack the necessary details for test generation, HEPH automatically connects to the storage service, locates the requirement, and downloads the missing information.
Data traceability: HEPH searches the embedding database to trace information related to the input requirements. The output is a mapped connection between the requirement and the relevant SWAD and ICD fragments.
Test specification generation: Based on the verification steps from requirements and the identified SWAD and ICD fragments (traceability), HEPH generates both positive and negative test specifications to cover all aspects of the requirement.
Test implementation generation: HEPH uses the ICD fragments (traceability) and the generated test specifications to create the tests in C/C++. The ICD provides context such as function names, data types, enumerations, and return codes, which the LLM uses during code generation. This step results in executable tests.
Test execution: The generated tests are compiled and executed, and coverage data is collected. The HEPH agent analyzes the coverage results and repeats the generation of test specifications and implementations for the missing cases.
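The coverage-driven repetition in the last step can be pictured as a bounded loop. The following is a minimal sketch under stated assumptions, not HEPH's actual implementation: the callback type, the `run_feedback_loop` and `fake_step` names, the percentage-based coverage metric, and the iteration cap are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: generates tests for the still-uncovered cases,
 * runs them, and returns the resulting coverage as a percentage (0-100). */
typedef int (*generate_and_run_fn)(int iteration, void *ctx);

typedef struct {
    int iterations;    /* number of generation rounds performed */
    int coverage_pct;  /* coverage measured after the last round */
} loop_result_t;

/* Repeat generation until coverage reaches target_pct or max_iters rounds
 * have run, whichever comes first. */
loop_result_t run_feedback_loop(generate_and_run_fn step, void *ctx,
                                int target_pct, int max_iters)
{
    assert(step != NULL);
    loop_result_t result = {0, 0};
    while (result.iterations < max_iters && result.coverage_pct < target_pct) {
        result.coverage_pct = step(result.iterations, ctx);
        result.iterations++;
    }
    return result;
}

/* Stand-in for the real generate/compile/execute step: every round adds a
 * fixed number of percentage points, capped at 100. */
int fake_step(int iteration, void *ctx)
{
    int gain = *(const int *)ctx;
    int coverage = gain * (iteration + 1);
    return coverage > 100 ? 100 : coverage;
}
```

The iteration cap matters in a sketch like this because some branches may be unreachable from generated tests, in which case a coverage target alone would never terminate the loop.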

A real-world scenario

Here’s a real-world example that shows how HEPH works.

NVIDIA DriveOS uses the QNX operating system. To demonstrate the capabilities of HEPH, this example uses one of the QNX BSP drivers and a requirement for the thermal functionality in QNX BSP. Given the input requirements and driver documentation, HEPH extracts requirement information from Jama, traces it to the corresponding documentation fragments, and generates test specifications and implementations, in the following steps:

Requirement extraction
SWAD data traceability
ICD data traceability
Test specification generation
Test implementation generation

Requirement extraction

HEPH extracts details from requirements storage (Jama) using a requirement identifier. The resulting requirement is formatted as JSON:

{
    "requirement_id": "DOSBSP60-REQ-3874",
    "name": "Get Temperature for Thermal Zones in the SOC_THERM Domain",
    "description": "When a request to retrieve the temperature of a thermal zone in the SOC_THERM domain is made via TMON_GUEST_LIF#GetTempSocTherm, QNX_BSP shall retrieve and return two consecutive temperature values for the thermal zone via TEGRA_CTRL_LIF#GetTemp, preserving the temperature resolution, along with two corresponding timestamps via DRIVE_OS::QNX_System::OS_LIBC_LIF#ReadClockCycles."
}
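As described in the requirements extraction step, a requirement that arrives without enough detail is looked up in the storage system. A minimal sketch of that completeness check follows. The `requirement_t` struct mirrors the three JSON fields shown above, while the function names and the rule for what counts as "missing" are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the fields of the JSON record shown above. */
typedef struct {
    const char *requirement_id;
    const char *name;
    const char *description;
} requirement_t;

/* A field counts as blank when it is absent or empty. */
static bool is_blank(const char *s)
{
    return s == NULL || s[0] == '\0';
}

/* Returns true when the record carries an identifier but lacks the details
 * needed for test generation, so they would have to be fetched from the
 * requirement store. Without an identifier there is nothing to look up. */
bool requirement_needs_fetch(const requirement_t *req)
{
    assert(req != NULL);
    if (is_blank(req->requirement_id)) {
        return false;
    }
    return is_blank(req->name) || is_blank(req->description);
}
```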

SWAD data traceability

HEPH searches through all architecture documentation (SWAD) to locate information relevant to the requirement. The output is a collection of architectural details, including unit requirements, methods, error handling, and verification criteria.

### Extracted Information for Requirement: DOSBSP60-REQ-3874
 
#### Unit Requirements and Functional Blocks
- **UREQ_NvThermmon_Library_0601**:
    If the thermal zone bound to **Handle** is in the SOC_THERM domain, Thermal_API#GetTempSocTherm shall retrieve the bound zone ID, pass it to BPMP_Comm_API#GetZoneTemp to get the zone temperature, populate **TempVal1** with the zone temperature, use libc_api#QnxLibcReadClockCycles to populate **Timestamp1**, call BPMP_Comm_API#GetZoneTemp to get the zone temperature again, populate **TempVal2** with the zone temperature, use libc_api#QnxLibcReadClockCycles to populate **Timestamp2**, and return success as per Thermal_API ICD.

- **UREQ_NvThermmon_Library_0602**:
    If BPMP_Comm_API#GetZoneTemp fails, Thermal_API#GetTempSocTherm shall return error as per Thermal_API ICD and perform no other action.
 
#### Methods and Functions
- **Functional Block FUNC_NvThermmon_01**:
    FUNC_NvThermmon_01
    Thermal_API provides the Thermal_API#GetTempSmTmon and Thermal_API#GetTempSocTherm interface to query the temperature of the thermal zone bound to **Handle** and return the result in **zone_temp**.
 
#### Error Handling
- **UREQ_NvThermmon_Resmgr_1201**:
    If NvThermmon_Resmgr encounters a SW diagnostic error, the NvThermmon_Resmgr returns error as per Thermal_devctl_API ICD for all current and future Thermal_devctl_API requests and performs no other action.

- **UREQ_NvThermmon_Resmgr_1202**:
    If libc_api#QnxLibcIoctl is called on ``/dev/<ZoneName>`` and the command is not Thermal_devctl_API#GetTemp or Thermal_devctl_API#SetAlert, NvThermmon_Resmgr shall return error as per Thermal_devctl_API ICD.
 
#### Verification Criteria (Verification Environment: AV+Q Safety prod-debug)
Verification Environment: AV+Q Safety prod-debug
Pre-Condition: N/A
Constraints: N/A
Verification Steps:
- Run interface tests on API
- Pass invalid thermal zones and verify the API fails
- Pass valid thermal zones from a process with required privileges and permissions and verify that the temperature can be read.
- Inject error such that I2C_API#SendReceive returns error and verify that Thermal_API#GetTempSmTmon fails as expected.
- Inject error such that BPMP_Comm_API#GetZoneTemp returns error and verify that Thermal_API#GetTempSocTherm fails as expected.
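The post does not describe how HEPH queries its embedding database, so the following is only a toy sketch of the general idea: rank document fragments by vector similarity to the requirement. It assumes every embedding is already scaled to unit length, in which case the dot product equals cosine similarity. The `fragment_t` type, the four-element dimension, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define EMBED_DIM 4  /* toy dimension; real embeddings are far larger */

typedef struct {
    const char *fragment_id;     /* hypothetical label for a SWAD/ICD fragment */
    float embedding[EMBED_DIM];  /* assumed to be unit length */
} fragment_t;

/* Dot product of two vectors; for unit-length inputs this is their
 * cosine similarity. */
static float dot_product(const float *a, const float *b)
{
    float sum = 0.0f;
    for (size_t i = 0; i < EMBED_DIM; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Returns the index of the fragment most similar to the query embedding,
 * or -1 when there are no fragments to search. */
int find_best_fragment(const fragment_t *frags, size_t count,
                       const float *query)
{
    assert(query != NULL);
    if (frags == NULL || count == 0) {
        return -1;
    }
    size_t best = 0;
    float best_score = dot_product(frags[0].embedding, query);
    for (size_t i = 1; i < count; i++) {
        float score = dot_product(frags[i].embedding, query);
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return (int)best;
}
```

A production retrieval step would typically return several top-ranked fragments rather than one, since a single requirement maps to multiple unit requirements and verification criteria, as the extracted output above shows.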

ICD data traceability

After searching SWADs, HEPH looks for information in the ICDs. The result includes details on methods, functions, data structures, error codes, and function dependencies.

### Extracted Information from ICDs
 
#### Methods and Functions:
 
1. **NvThermmonOpen**
    **Description**: Opens a connection to the nvthermmon driver for the thermal sensor/zone user is interested in.

    **Parameters**:
    - `const char *const ZoneName`: Thermal sensor/zone name
    - `NvThermmonHandle *const Handle`: User handle for further API calls

    **Returns**:
    - `NV_THERMMON_ERR_CODE_NO_ERROR`: Success
    - `NV_THERMMON_ERR_CODE_INVALID_PARAM`: Invalid parameters
    - `NV_THERMMON_ERR_CODE_DT_FAILED`: Device tree failure
    - `NV_THERMMON_ERR_CODE_INVALID_PERM`: Permission error
    - `NV_THERMMON_ERR_CODE_INVALID_STATE`: Invalid system state
    - `NV_THERMMON_ERR_CODE_ERROR`: BPMP Comm error

    **Precondition**: None

    **Usage Considerations**:
    - Allowed context for the API call
    - Sync, not re-entrant
2. **NvThermmonGetZoneTemp**
    **Description**: Read temperature from selected thermal zone. Populates memory pointed to by `temp` with temperature of thermal zone in millidegrees Celsius and returns status.

    **Parameters**:
    - `const NvThermmonHandle Handle`: Handle for execution context bound in `NvThermmonOpen`
    - `int32_t * const TempVal`: Pointer to save temperature value in millidegrees Celsius.

    **Returns**:
    - `NvThermmonErrCode`: Error Code (e.g., `NV_THERMMON_ERR_CODE_NO_ERROR`, `NV_THERMMON_ERR_CODE_INVALID_PARAM`)
 
#### Data Structures and Fields:
 
1. **zone_temp_t Structure**
    **Fields**:
    - `int32_t temp_1`: First temperature reading
    - `int32_t temp_2`: Second temperature reading
    - `uint64_t ts_1`: Timestamp for first temp reading
    - `uint64_t ts_2`: Timestamp for second temp reading
 
#### Enumerations:
1. **NvThermmonErrCode**

    **Error Codes**:
    - `NV_THERMMON_ERR_CODE_NO_ERROR`
    - `NV_THERMMON_ERR_CODE_INVALID_PARAM`
    - `NV_THERMMON_ERR_CODE_DT_FAILED`
    - `NV_THERMMON_ERR_CODE_INVALID_PERM`
    - `NV_THERMMON_ERR_CODE_INVALID_STATE`
    - `NV_THERMMON_ERR_CODE_ERROR`
 
### Integration of Information for Testing:
 
1. **NvThermmonOpen()**:
    **Precondition**: Ensure that `NvThermmonOpen` returns success for the thermal zone.
    **Dependencies**:
    - Handle for execution context

2. **NvThermmonGetZoneTemp()**:
    **Dependencies**:
    - `zone_temp_t` structure to store temperature readings and timestamps.
    - Error codes to handle different edge cases and validation steps.

Test specification generation

Based on the verification steps from requirements and the traced SWAD and ICD fragments, HEPH generates test specifications. The following code blocks are examples of generated test cases.

######
Test Case ID: TC_DOSBSP60-REQ-3874_001
Test Case Name: Verify successful temperature query for a supported thermal zone
Test Type: Positive
Test Design: Ensure the temperature for a thermal zone in the SOC_THERM domain can be queried successfully.
Pre-condition: Thermal zone is supported and NvThermmonOpen() returns success for the thermal zone.
Test Procedure:
1. Open a connection to the thermal zone using NvThermmonOpen().
2. Query the temperature for the thermal zone using NvThermmonGetZoneTemp().
3. Retrieve two consecutive temperature values.
Expected Result: The API should return success with two temperature values that differ by no more than the delta determined by the thermal characteristics of the board.
 
######
Test Case ID: TC_DOSBSP60-REQ-3874_002
Test Case Name: Verify accompanying timestamp for temperature queries
Test Type: Positive
Test Design: Ensure that two timestamp values are returned with temperature queries.
Pre-condition: Thermal zone is supported and NvThermmonOpen() returns success for the thermal zone.
Test Procedure:
1. Open a connection to the thermal zone using NvThermmonOpen().
2. Query the temperature for the thermal zone using NvThermmonGetZoneTemp().
3. Retrieve two consecutive temperature values along with two corresponding timestamps.
Expected Result: The API should return success with two timestamp values that do not exceed T microseconds between them.
 
######
Test Case ID: TC_DOSBSP60-REQ-3874_003
Test Case Name: Verify temperature variation over successive calls
Test Type: Positive
Test Design: Ensure that the temperature for the thermal zone varies over successive queries.
Pre-condition: Thermal zone is supported and NvThermmonOpen() returns success for the thermal zone.
Test Procedure:
1. Open a connection to the thermal zone using NvThermmonOpen().
2. Query the temperature for the thermal zone using NvThermmonGetZoneTemp() multiple times.
3. Compare temperature values over successive calls.
Expected Result: The temperature values should show a difference of >0 and should vary with the resolution supported by the temperature sensor.
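The test case IDs above follow a visible pattern: `TC_`, the requirement ID, and a three-digit sequence number. The generated C functions in the next section use the same ID with hyphens replaced by underscores, because hyphens are not valid in C identifiers. The sketch below reproduces that pattern. It is inferred from the examples in this post rather than taken from HEPH, and both function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Writes "TC_<requirement_id>_<NNN>" into out.
 * Returns 0 on success, or -1 if the result did not fit in out_size bytes. */
int format_test_case_id(char *out, size_t out_size,
                        const char *requirement_id, unsigned index)
{
    assert(out != NULL && requirement_id != NULL);
    int n = snprintf(out, out_size, "TC_%s_%03u", requirement_id, index);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Rewrites the ID in place so it can serve as a C function name: every
 * character other than a letter, digit, or underscore becomes '_'. */
void to_c_identifier(char *id)
{
    assert(id != NULL);
    for (char *p = id; *p != '\0'; p++) {
        if (!isalnum((unsigned char)*p) && *p != '_') {
            *p = '_';
        }
    }
}
```

Keeping the requirement ID inside each test name preserves traceability from a failing test back to the requirement it verifies.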

Test implementation generation

Based on the test specifications and ICD details, HEPH generates the tests in C.

#include <stdio.h>
#include <stdint.h>
#include "NvThermmonAPI.h"
 
// Test Case TC_DOSBSP60-REQ-3874_001
// Verify successful temperature query for a supported thermal zone
void TC_DOSBSP60_REQ_3874_001(void) {
    NvThermmonHandle handle;
    zone_temp_t zoneTemp;

    // Step 1: Open a connection to the thermal zone using NvThermmonOpen()
    NvThermmonErrCode status = NvThermmonOpen("CPU-therm", &handle);
    if (status != NV_THERMMON_ERR_CODE_NO_ERROR) {
        printf("TC_DOSBSP60-REQ-3874_001: NvThermmonOpen failed\n");
        return;
    }

    // Step 2: Query the temperature for the thermal zone using NvThermmonGetZoneTemp()
    status = NvThermmonGetZoneTemp(handle, &zoneTemp);
    if (status != NV_THERMMON_ERR_CODE_NO_ERROR) {
        printf("TC_DOSBSP60-REQ-3874_001: NvThermmonGetZoneTemp failed\n");
        return;
    }

    // Step 3: Retrieve two consecutive temperature values
    int32_t delta = zoneTemp.temp_2 - zoneTemp.temp_1;
    if (delta > 100) { // Assume 100 is the delta determined by the thermal characteristics of the board
        printf("TC_DOSBSP60-REQ-3874_001: Temperature delta is too large\n");
    } else {
        printf("TC_DOSBSP60-REQ-3874_001: Passed\n");
    }
}
 
// Test Case TC_DOSBSP60-REQ-3874_002
// Verify accompanying timestamp for temperature queries
void TC_DOSBSP60_REQ_3874_002(void) {
    NvThermmonHandle handle;
    zone_temp_t zoneTemp;

    // Step 1: Open a connection to the thermal zone using NvThermmonOpen()
    NvThermmonErrCode status = NvThermmonOpen("CPU-therm", &handle);
    if (status != NV_THERMMON_ERR_CODE_NO_ERROR) {
        printf("TC_DOSBSP60-REQ-3874_002: NvThermmonOpen failed\n");
        return;
    }

    // Step 2: Query the temperature for the thermal zone using NvThermmonGetZoneTemp()
    status = NvThermmonGetZoneTemp(handle, &zoneTemp);
    if (status != NV_THERMMON_ERR_CODE_NO_ERROR) {
        printf("TC_DOSBSP60-REQ-3874_002: NvThermmonGetZoneTemp failed\n");
        return;
    }

    // Step 3: Retrieve two consecutive temperature values along with two corresponding timestamps
    uint64_t deltaTs = zoneTemp.ts_2 - zoneTemp.ts_1;
    if (deltaTs > 1000) { // Assume 1000 microseconds is the maximum allowed delta
        printf("TC_DOSBSP60-REQ-3874_002: Timestamp delta is too large\n");
    } else {
        printf("TC_DOSBSP60-REQ-3874_002: Passed\n");
    }
}
 
// Test Case TC_DOSBSP60-REQ-3874_003
// Verify temperature variation over successive calls
void TC_DOSBSP60_REQ_3874_003(void) {
    NvThermmonHandle handle;
    zone_temp_t zoneTemp1, zoneTemp2;

    // Step 1: Open a connection to the thermal zone using NvThermmonOpen()
    NvThermmonErrCode status = NvThermmonOpen("CPU-therm", &handle);
    if (status != NV_THERMMON_ERR_CODE_NO_ERROR) {
        printf("TC_DOSBSP60-REQ-3874_003: NvThermmonOpen failed\n");
        return;
    }

    // Step 2: Query the temperature for the thermal zone using NvThermmonGetZoneTemp() multiple times
    status = NvThermmonGetZoneTemp(handle, &zoneTemp1);
    if (status != NV_THERMMON_ERR_CODE_NO_ERROR) {
        printf("TC_DOSBSP60-REQ-3874_003: NvThermmonGetZoneTemp (1st call) failed\n");
        return;
    }

    status = NvThermmonGetZoneTemp(handle, &zoneTemp2);
    if (status != NV_THERMMON_ERR_CODE_NO_ERROR) {
        printf("TC_DOSBSP60-REQ-3874_003: NvThermmonGetZoneTemp (2nd call) failed\n");
        return;
    }

    // Step 3: Compare temperature values over successive calls
    int32_t temperatureDifference = zoneTemp2.temp_1 - zoneTemp1.temp_1;
    if (temperatureDifference <= 0) {
        printf("TC_DOSBSP60-REQ-3874_003: Temperature did not vary as expected\n");
    } else {
        printf("TC_DOSBSP60-REQ-3874_003: Passed\n");
    }
}
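Generated tests like these still benefit from review. For example, the first test compares a signed delta against the threshold, so a large drop in temperature between the two readings would pass. The sketch below shows how a reviewer might tighten the two checks into reusable helpers that return a result instead of printing. It is an illustration, not HEPH output: `zone_temp_t` is redefined locally from the ICD excerpt so the block compiles without `NvThermmonAPI.h`, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the fields listed in the ICD excerpt, so this sketch
 * compiles on its own. */
typedef struct {
    int32_t temp_1;
    int32_t temp_2;
    uint64_t ts_1;
    uint64_t ts_2;
} zone_temp_t;

/* True when the two readings differ by at most max_delta in either
 * direction. The subtraction is done in 64 bits to avoid overflow. */
bool temp_delta_within(const zone_temp_t *z, int32_t max_delta)
{
    assert(z != NULL && max_delta >= 0);
    int64_t delta = (int64_t)z->temp_2 - (int64_t)z->temp_1;
    if (delta < 0) {
        delta = -delta;
    }
    return delta <= max_delta;
}

/* True when the second timestamp does not precede the first and the gap
 * between them is at most max_gap (in the same unit as the timestamps). */
bool timestamps_within(const zone_temp_t *z, uint64_t max_gap)
{
    assert(z != NULL);
    if (z->ts_2 < z->ts_1) {
        return false;
    }
    return (z->ts_2 - z->ts_1) <= max_gap;
}
```

Returning a boolean also lets a test runner count failures, which the `printf`-only tests above cannot report to a caller.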

Future enhancements

There are a few core things to focus on when designing a test generation framework:

Supporting different test workflows
Integrating real-time human feedback

Supporting different test workflows

HEPH is designed to support most test-generation use cases for software teams. Still, there are instances when teams require a custom test framework or an unsupported test creation workflow.

To address these challenges, potential future improvements to HEPH could include a modular design, enabling software teams to define custom modules for non-standard workflows. For example, teams might benefit from the ability to generate tests directly from the input code rather than from documentation, or to extend the LLM prompts. This modular approach could help tackle the complexities of custom and non-standard test generation scenarios.

Integrating real-time human feedback

While the latest LLMs perform well in understanding the development context and generating high-quality code, there are cases where generated test sets may require improvement.

Possible future enhancements to HEPH could include the introduction of an interactive mode alongside the current automatic mode. In this interactive mode, you’d interact with the HEPH agent at each step of the test generation process, reviewing results, providing feedback, and refining outputs before proceeding. For instance, if the generated test specifications lacked details about register mapping, you could add this context, and HEPH would regenerate more accurate test specifications.

Start HEPHing your automatic test generation

Hephaestus (HEPH) automates test generation in software development by using LLMs to create comprehensive and context-aware tests. This automation reduces manual effort, accelerates development, and improves the quality and reliability of the final product.

Possible future enhancements include modularity, allowing for custom modules to support non-standard testing workflows, and an interactive mode enabling you to provide feedback and refine the test generation process for higher accuracy and relevance.

Build your AI agent application

For more information about using NVIDIA generative AI technologies and tools to create your own AI agents and applications, see ai.nvidia.com or try out NVIDIA NIM APIs.

If you’re new to this area, explore our beginner-friendly series, Building Your First LLM Agent Application.

Acknowledgments

Thanks to the DriveOS QNX BSP team for piloting HEPH.
