Noun Compound Synonym Substitution in Books – NCSSB datasets

dataset

posted on 2024-02-26, 15:35 authored by Thomas Pickard, Aline Villavicencio, Agne Knietaite, Adam Allsebrook, Anton Minkov, Adam Tomaszewski, Norbert Slinko, Richard Johnson

The Noun Compound Synonym Substitution in Books (NCSSB) datasets contain in-context instances of potentially idiomatic English noun compounds. The instances were obtained by replacing synonyms that occur in public domain books from the Project Gutenberg corpus with the corresponding idioms.
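
As an illustration of the substitution idea described above, here is a minimal Python sketch. It is not the authors' pipeline: the function name `substitute_synonyms`, the mapping `SYNONYM_TO_COMPOUND`, and the example pairs are hypothetical and are not taken from the NCSSB data. It assumes a simple whole-word, case-insensitive match, which may differ from how the datasets were actually built.

```python
import re

# Hypothetical synonym -> noun compound pairs, for illustration only;
# these are assumptions and are not drawn from the NCSSB datasets.
SYNONYM_TO_COMPOUND = {
    "bureaucracy": "red tape",
    "idler": "couch potato",
}


def substitute_synonyms(sentence, mapping):
    """Return a list of (new_sentence, compound) pairs, one per synonym found.

    Each synonym is matched as a whole word, ignoring case, and every
    occurrence in the sentence is replaced by its noun compound.
    """
    results = []
    for synonym, compound in mapping.items():
        pattern = re.compile(r"\b" + re.escape(synonym) + r"\b", re.IGNORECASE)
        if pattern.search(sentence):
            # A function replacement avoids any backslash processing
            # in the compound string.
            new_sentence = pattern.sub(lambda match: compound, sentence)
            results.append((new_sentence, compound))
    return results


examples = substitute_synonyms(
    "He was tired of the endless bureaucracy.", SYNONYM_TO_COMPOUND
)
```

A sentence that contains none of the listed synonyms yields an empty list, so only sentences with a usable substitution site would produce an instance under this sketch.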

Ethics

There is no personal data, nor any data that requires ethical approval

Policy

The data complies with the institution's and funders' policies on access and sharing

Sharing and access restrictions

The uploaded data can be shared openly

Data description

The file formats are open or commonly used

Methodology, headings and units

There is a file including methodology, headings and units, such as a readme.txt

Keywords

MWEs, multi-word units (MWUs), multi-word expression, Multi-word expressions, multi-word expressions, Idioms, Noun Compounds, Idiomatic expressions, idiomatic sentences

Licence

CC BY 4.0
