Replication Package: Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs
Replication package of "Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs", Ziyu Li and Donghwan Shin, to appear in the Proceedings of the 3rd International Conference on AI Engineering - Software Engineering for AI (CAIN 2024).
In this paper, we propose a novel method to systematically assess the code understanding performance of LLMs, particularly focusing on subtle differences between code and its descriptions, by introducing code mutations to existing code generation datasets. Code mutations are small changes that alter the semantics of the original code, creating a mismatch with the natural language description. We apply different types of code mutations, such as operator replacement and statement deletion, to generate inconsistent code-description pairs. We then use these pairs to test the ability of LLMs to correctly detect the inconsistencies.
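To make the idea of a code mutation concrete, the following is a minimal sketch of one operator-replacement mutation, written with Python's standard `ast` module. It is not the tooling used in the paper. The class `RelationalOperatorReplacer`, the function `mutate`, and the toy function `is_below` are hypothetical names introduced only for illustration. It assumes Python 3.9 or later, because it uses `ast.unparse`.

```python
import ast


class RelationalOperatorReplacer(ast.NodeTransformer):
    """Illustrative mutation: replace one `<` comparison with `<=`."""

    def __init__(self):
        self.mutated = False  # apply at most one mutation per program

    def visit_Compare(self, node):
        self.generic_visit(node)  # visit nested expressions first
        if not self.mutated:
            for i, op in enumerate(node.ops):
                if isinstance(op, ast.Lt):
                    node.ops[i] = ast.LtE()
                    self.mutated = True
                    break
        return node


def mutate(source: str) -> str:
    """Return the source text of a mutant of `source`."""
    tree = ast.parse(source)
    mutant = RelationalOperatorReplacer().visit(tree)
    ast.fix_missing_locations(mutant)
    return ast.unparse(mutant)


# Hypothetical code-description pair: the docstring plays the role of the
# natural language description.
ORIGINAL = '''
def is_below(x, limit):
    """Return True if x is strictly below limit."""
    return x < limit
'''

if __name__ == "__main__":
    print(mutate(ORIGINAL))
```

After the mutation, the function returns `True` when `x` equals `limit`, so it no longer matches the description "strictly below". A pair like this is the kind of inconsistent input an LLM would be asked to flag.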
We propose a new LLM testing method, called Mutation-based Consistency Testing (MCT), and conduct a case study on two popular LLMs, GPT-3.5 and GPT-4, using the state-of-the-art code generation benchmark HumanEval-X, which covers six programming languages (Python, C++, Java, Go, JavaScript, and Rust). We compare the performance of the LLMs across different types of code mutations and programming languages and analyze the results. We find that the LLMs show significant variation in their code understanding performance and that their strengths and weaknesses depend on the mutation type and the language.
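A comparison across mutation types and languages can be sketched as a simple aggregation. The example below assumes each test outcome is stored as a `(language, mutation_type, was_detected)` tuple. That record layout, the function name `detection_rates`, and the sample records are assumptions made for illustration and do not reflect the file formats or the results in this package.

```python
from collections import defaultdict


def detection_rates(records):
    """Fraction of mutants flagged as inconsistent, per (language, mutation type)."""
    totals = defaultdict(int)
    detected = defaultdict(int)
    for language, mutation_type, was_detected in records:
        key = (language, mutation_type)
        totals[key] += 1
        detected[key] += int(was_detected)
    return {key: detected[key] / totals[key] for key in totals}


# Made-up placeholder records for illustration only; NOT results from the paper.
EXAMPLE_RECORDS = [
    ("Python", "operator replacement", True),
    ("Python", "operator replacement", False),
    ("Java", "statement deletion", True),
]

if __name__ == "__main__":
    for key, rate in sorted(detection_rates(EXAMPLE_RECORDS).items()):
        print(key, rate)
```

Grouping by the `(language, mutation_type)` pair keeps both dimensions of the comparison visible, so a weakness specific to one mutation type in one language is not averaged away.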
Ethics
- The files contain no personal data and nothing that requires ethical approval
Policy
- The data complies with the institution's and funders' policies on access and sharing
Sharing and access restrictions
- The uploaded data can be shared openly
Data description
- The file formats are open or commonly used
Methodology, headings and units
- Headings and units are explained in the files
Responsibility
- The depositor is responsible for the content and sharing of the attached files