What is it about?

Software developers rely on unit tests to make sure their code works correctly, but writing these tests by hand takes time and effort. With the rise of AI systems like ChatGPT, there is growing interest in having AI generate unit tests automatically. In this study, we systematically evaluate how well ChatGPT generates unit tests for real-world software projects, examining whether the generated tests compile, run successfully, and correctly check program behavior. We find that while ChatGPT can produce readable, well-structured tests, many of them fail to compile or assert the wrong behavior.

To address these limitations, we propose ChatTester, an automated improvement framework. Instead of accepting the first test ChatGPT produces, ChatTester iteratively refines it based on feedback from compilation and execution results, which significantly improves the validity and correctness of the generated tests. Our work highlights both the promise and the current limitations of AI-driven test generation, and offers practical insights for making AI-assisted software development more reliable.
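
The core idea is an iterative validate-and-repair loop. As a rough illustration only, the loop might look like the sketch below; the helper names (generate, validate, repair) and the retry budget are hypothetical placeholders for this summary, not ChatTester's actual interface:

```python
from typing import Callable, Optional, Tuple

# Hypothetical collaborators (placeholders, not the paper's real API):
#   generate(focal_method)  -> str        : ask the LLM for an initial test
#   validate(test_code)     -> (bool, str): compile and run, return status + errors
#   repair(test_code, errs) -> str        : ask the LLM to fix the test given errors

def refine_until_valid(
    focal_method: str,
    generate: Callable[[str], str],
    validate: Callable[[str], Tuple[bool, str]],
    repair: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Draft a unit test for `focal_method`, then iteratively feed
    compilation/execution errors back to the model until it passes."""
    test_code = generate(focal_method)
    for _ in range(max_rounds):
        ok, errors = validate(test_code)
        if ok:
            return test_code   # test compiles and its assertions pass
        test_code = repair(test_code, errors)
    return None                # still broken after the retry budget
```

The point of this structure is that the model sees the concrete compiler or assertion error on each round, rather than being asked to regenerate the test blindly.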

Read the Original

This page is a summary of: Evaluating and Improving ChatGPT for Unit Test Generation, Proceedings of the ACM on Software Engineering, July 2024, ACM (Association for Computing Machinery). DOI: 10.1145/3660783.
