The University of Sheffield

Replication Package: Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs

software
posted on 2024-02-01, 23:59, authored by Donghwan Shin, Ziyu Li

Replication package of "Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs", Ziyu Li and Donghwan Shin, to appear in the Proceedings of the 3rd International Conference on AI Engineering - Software Engineering for AI (CAIN 2024).

In this paper, we propose a novel method to systematically assess the code understanding performance of LLMs, particularly focusing on subtle differences between code and its descriptions, by introducing code mutations to existing code generation datasets. Code mutations are small changes that alter the semantics of the original code, creating a mismatch with the natural language description. We apply different types of code mutations, such as operator replacement and statement deletion, to generate inconsistent code-description pairs. We then use these pairs to test the ability of LLMs to correctly detect the inconsistencies.
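As an illustration of the idea above, the following is a minimal sketch of one operator-replacement mutation, written with Python's standard `ast` module (Python 3.9+ for `ast.unparse`). It is not taken from the replication package: the `RelationalOperatorSwap` class, the `mutate` helper, and the `below_threshold` example with its description are all hypothetical names chosen for this sketch. It replaces the first `<` with `<=`, so the mutated code no longer matches the description.

```python
import ast


class RelationalOperatorSwap(ast.NodeTransformer):
    """Hypothetical mutator: replace the first `<` encountered with `<=`."""

    def __init__(self):
        self.mutated = False

    def visit_Compare(self, node):
        # Visit nested expressions first, then mutate at most one operator.
        self.generic_visit(node)
        if not self.mutated:
            for i, op in enumerate(node.ops):
                if isinstance(op, ast.Lt):
                    node.ops[i] = ast.LtE()
                    self.mutated = True
                    break
        return node


def mutate(source):
    """Return mutated source code, or None if no `<` operator was found."""
    tree = ast.parse(source)
    mutator = RelationalOperatorSwap()
    tree = mutator.visit(tree)
    if not mutator.mutated:
        return None
    return ast.unparse(ast.fix_missing_locations(tree))


# Illustrative code-description pair (an assumption, not from the dataset).
DESCRIPTION = "Return True if any number in the list is below the threshold."
ORIGINAL = '''
def below_threshold(numbers, threshold):
    for n in numbers:
        if n < threshold:
            return True
    return False
'''

# The mutant uses `<=`, so it is inconsistent with DESCRIPTION.
mutant_source = mutate(ORIGINAL)
print(mutant_source)
```

Pairing `DESCRIPTION` with `mutant_source` gives one inconsistent code-description pair of the kind described above; an LLM under test would be expected to flag the mismatch. A statement-deletion mutant could be sketched in the same style by removing a node from a function body instead of swapping an operator.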

We propose a new LLM testing method, called Mutation-based Consistency Testing (MCT), and conduct a case study on two popular LLMs, GPT-3.5 and GPT-4, using the state-of-the-art code generation benchmark HumanEval-X, which covers six programming languages (Python, C++, Java, Go, JavaScript, and Rust). We compare the performance of the LLMs across different types of code mutations and programming languages and analyze the results. We find that the LLMs show significant variation in their code understanding performance and that their strengths and weaknesses differ depending on the mutation type and language.

Ethics

  • There is no personal data, nor any data that requires ethical approval

Policy

  • The data complies with the institution's and funders' policies on access and sharing

Sharing and access restrictions

  • The uploaded data can be shared openly

Data description

  • The file formats are open or commonly used

Methodology, headings and units

  • Headings and units are explained in the files

Responsibility

  • The depositor is responsible for the content and sharing of the attached files

    School of Computer Science
