<p dir="ltr">We introduce a <b>multilingual parallel corpus</b> specifically curated to evaluate the <b>figurative quality</b> of machine translation (MT) outputs. The dataset consists of English sentences containing metaphors, along with their corresponding <b>human post-edited translations</b> in multiple target languages, including <b>Chinese</b> and <b>Italian</b>.</p><p dir="ltr">The corpus was constructed through a two-stage process:</p><ol><li><b>Initial Translation</b>: Sentences containing figurative expressions were translated using standard MT systems (e.g., LLM-based or neural MT systems).</li><li><b>Human Post-editing</b>: These translations were then <b>post-edited by native speakers or professional translators</b> with linguistics training to improve fluency, semantic accuracy, and especially fidelity to the figurative meaning.</li></ol><p dir="ltr">Each example in the dataset includes:</p><ul><li>The <b>original English sentence</b> with the figurative expression.</li><li>The <b>raw machine translation</b> in the target language.</li><li>The <b>post-edited version</b> of the translation.</li><li><b>Annotations</b> based on four human evaluation metrics:<ul><li><b>Quality</b>: Fluency, intelligibility, fidelity, and overall quality of the translation.</li><li><b>Metaphorical Equivalence</b>: How well the metaphorical meaning is preserved.</li><li><b>Emotion</b>: Whether the emotional tone or affect is maintained.</li><li><b>Authenticity</b>: Whether the translation sounds natural and idiomatic in the target language.</li></ul></li></ul><p dir="ltr">Each sample is <b>triple-annotated</b>, and disagreements are resolved through review by a professional translator to keep the final annotations reliable.</p><p dir="ltr">This dataset supports research in:</p><ul><li>Evaluating figurative language in MT.</li><li>Improving translation systems’ handling of metaphors.</li><li>Developing automatic metrics that align better with human judgements of figurative quality.</li></ul>
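<p dir="ltr">As an illustration of the per-example structure described above, the following minimal Python sketch models one entry and flags the metrics on which the three annotators disagree, which are the cases that would go to professional translator review. The field names, language codes, integer score scale, and the example sentence are all assumptions made for illustration. They are not the released file schema and are not taken from the corpus; the files themselves explain the actual headings and units.</p>

```python
from dataclasses import dataclass
from typing import Dict, List

# The four human evaluation metrics named in the description.
METRICS = ("quality", "metaphorical_equivalence", "emotion", "authenticity")


@dataclass
class Sample:
    """One corpus entry. Field names are hypothetical, not the released schema."""
    source_en: str     # original English sentence with the figurative expression
    target_lang: str   # assumed language code, e.g. "zh" or "it"
    raw_mt: str        # raw machine translation
    post_edited: str   # human post-edited translation
    # One dict per annotator, mapping metric name -> score.
    # The integer scale is an assumption; the description does not state it.
    annotations: List[Dict[str, int]]


def metrics_needing_review(sample: Sample) -> List[str]:
    """Return the metrics on which the three annotators did not all agree."""
    if len(sample.annotations) != 3:
        raise ValueError("expected exactly three annotations per sample")
    disputed = []
    for metric in METRICS:
        scores = {ann[metric] for ann in sample.annotations}
        if len(scores) > 1:
            disputed.append(metric)
    return disputed


# Invented example for illustration only; not drawn from the corpus.
example = Sample(
    source_en="Time is a thief.",
    target_lang="it",
    raw_mt="Il tempo è un ladro.",
    post_edited="Il tempo è un ladro.",
    annotations=[
        {"quality": 5, "metaphorical_equivalence": 5, "emotion": 4, "authenticity": 5},
        {"quality": 5, "metaphorical_equivalence": 5, "emotion": 4, "authenticity": 5},
        {"quality": 5, "metaphorical_equivalence": 5, "emotion": 3, "authenticity": 5},
    ],
)

print(metrics_needing_review(example))  # ['emotion']
```

<p dir="ltr">Exact-match comparison is only one possible disagreement criterion; a tolerance of one scale point, for instance, would flag fewer samples.</p>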
Funding
Shun Wang
Methodology, headings and units
Headings and units are explained in the files
Policy
The data complies with the institution's and funders' policies on access and sharing
Sharing and access restrictions
The uploaded data can be shared openly
Data description
The file formats are open or commonly used
Responsibility
The depositor is responsible for the content and sharing of the attached files
Ethics
The dataset contains no personal data and no data that requires ethical approval