Replication Package: Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs
Replication package for "Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs" by Ziyu Li and Donghwan Shin, to appear in the Proceedings of the 3rd International Conference on AI Engineering - Software Engineering for AI (CAIN 2024).
In this paper, we propose a novel method to systematically assess the code understanding performance of LLMs, particularly focusing on subtle differences between code and its descriptions, by introducing code mutations to existing code generation datasets. Code mutations are small changes that alter the semantics of the original code, creating a mismatch with the natural language description. We apply different types of code mutations, such as operator replacement and statement deletion, to generate inconsistent code-description pairs. We then use these pairs to test the ability of LLMs to correctly detect the inconsistencies.
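As a rough illustration of the idea (not the package's actual implementation), the sketch below applies an operator-replacement mutation using Python's standard `ast` module; the class and function names are hypothetical. Flipping a single `+` to `-` changes the code's semantics so that it no longer matches its natural language description, yielding an inconsistent code-description pair.

```python
import ast

class OperatorReplacement(ast.NodeTransformer):
    """Illustrative mutation operator: replace the first binary `+` with `-`."""

    def __init__(self):
        self.mutated = False

    def visit_BinOp(self, node):
        self.generic_visit(node)
        # Mutate only the first addition found, so the change stays small.
        if not self.mutated and isinstance(node.op, ast.Add):
            node.op = ast.Sub()
            self.mutated = True
        return node

original = '''
def total_price(price, tax):
    """Return the price plus the tax."""
    return price + tax
'''

tree = OperatorReplacement().visit(ast.parse(original))
mutant = ast.unparse(tree)  # ast.unparse requires Python 3.9+
print(mutant)  # now `return price - tax`, inconsistent with the docstring
```

The mutant together with the unchanged description forms an inconsistent pair; an LLM with genuine code understanding should flag the mismatch when asked whether the code and its description agree.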
Ethics and consent
- The data contains no personal information or anything that requires ethical approval
- The data complies with the institution's and funders' policies on access and sharing
Sharing and access restrictions
- The uploaded data can be shared openly
- The file formats are open or commonly used
Methodology, headings and units
- Column headings and units are explained in the data files