The University of Sheffield
7 files

Noun Compound Synonym Substitution in Books – NCSSB datasets

posted on 2024-02-26, 15:35 authored by Thomas Pickard, Aline Villavicencio, Agne Knietaite, Adam Allsebrook, Anton Minkov, Adam Tomaszewski, Norbert Slinko, Richard Johnson

The Noun Compound Synonym Substitution in Books (NCSSB) datasets contain in-context instances of potentially idiomatic English noun compounds, obtained by substituting idioms for synonyms occurring in public domain books forming part of the Project Gutenberg corpus.



  • There is no personal data, nor any data that requires ethical approval


  • The data complies with the institution's and funders' policies on access and sharing

Sharing and access restrictions

  • The uploaded data can be shared openly

Data description

  • The file formats are open or commonly used

Methodology, headings and units

  • There is a file including methodology, headings and units, such as a readme.txt