The University of Sheffield
7 files

  • bronze_filtered.csv (433.61 MB)
  • gold.csv (919.74 kB)
  • bronze_unfiltered.csv (472.4 MB)
  • silver_1.csv (6.77 MB)
  • silver_5.csv (32.99 MB)
  • silver_10.csv (64.92 MB)
  • README.txt (2.67 kB)

Noun Compound Synonym Substitution in Books – NCSSB datasets

dataset
posted on 2024-02-26, 15:35, authored by Thomas Pickard, Aline Villavicencio, Agne Knietaite, Adam Allsebrook, Anton Minkov, Adam Tomaszewski, Norbert Slinko, Richard Johnson

The Noun Compound Synonym Substitution in Books (NCSSB) datasets contain in-context instances of potentially idiomatic English noun compounds. The instances were obtained by substituting idioms for synonyms that occur in public domain books from the Project Gutenberg corpus.

Ethics

  • The data contains no personal data and nothing that requires ethical approval

Policy

  • The data complies with the institution's and funders' policies on access and sharing

Sharing and access restrictions

  • The uploaded data can be shared openly

Data description

  • The file formats are open or commonly used

Methodology, headings and units

  • There is a file including methodology, headings and units, such as a readme.txt