The University of Sheffield

Noun Compound Synonym Substitution in Books – NCSSB datasets

posted on 2024-02-26, 15:35 authored by Thomas Pickard, Aline Villavicencio, Agne Knietaite, Adam Allsebrook, Anton Minkov, Adam Tomaszewski, Norbert Slinko, Richard Johnson

The Noun Compound Synonym Substitution in Books (NCSSB) datasets contain in-context instances of potentially idiomatic English noun compounds. Each instance was obtained by taking a synonym that occurs in a public domain book from the Project Gutenberg corpus and replacing it with the corresponding idiom.



  • There is no personal data or any data that requires ethical approval


  • The data complies with the institution and funders' policies on access and sharing

Sharing and access restrictions

  • The uploaded data can be shared openly

Data description

  • The file formats are open or commonly used

Methodology, headings and units

  • There is a file including methodology, headings and units, such as a readme.txt