ChatGPT Incorrectness Detection in Software Reviews

Minaoar Hossain Tanzil; Junaed Younus Khan; Gias Uddin

doi:10.1145/3597503.3639194

What is it about?

As many as 20% of ChatGPT responses can be incorrect. In this study, we first surveyed software industry professionals to understand how they use ChatGPT and how they deal with its reliability issues. Informed by the survey and by criminal psychology, we then developed a tool named ChatGPT Incorrectness Detector (CID). In an evaluation on software reviews, CID detected incorrect responses with 75% accuracy.


Why is it important?

This is the first approach that can detect incorrect ChatGPT responses through the regular chat or API interfaces. As the use of ChatGPT grows, the technique can be used by regular users and by researchers to improve the trustworthiness of ChatGPT.

Perspectives

Researchers can extend this work to improve ChatGPT or other large language models (such as Google Bard). Users will also be able to use a similar technique.
Minaoar Hossain Tanzil

This page is a summary of: ChatGPT Incorrectness Detection in Software Reviews, April 2024, ACM (Association for Computing Machinery),
DOI: 10.1145/3597503.3639194.

Resources

Presentation
ChatGPT Incorrectness Detection in Software Reviews
Presented at ICSE 2024

Contributors

The following have contributed to this page

Minaoar Hossain Tanzil
