The University of Sheffield
6 files

  • all_recordings.csv (2.46 MB), dataset
  • transcribed_recordings.csv (5.25 MB), dataset
  • preprocessed_recordings.zip (1.8 GB), archive
  • transcribed_recordings.zip (1.47 GB), archive
  • raw_recordings.zip (6.54 GB), archive
  • readme.txt (1.58 kB), text

SNuC: The Sheffield Numbers Spoken Language Corpus

Version 3 2022-06-22, 09:36
Version 2 2022-06-22, 08:18
Version 1 2022-04-29, 14:08
Dataset posted on 2022-06-22, 09:36, authored by Emma Barker, Jonathan Barker, Robert Gaizauskas, Ning Ma, Monica Paramita

SNuC is the first published corpus of spoken alphanumeric identifiers of the sort typically used as serial and part numbers in the manufacturing sector. The dataset contains recordings and transcriptions of over 50 native British English speakers, speaking over 13,000 multi-character alphanumeric sequences and totalling almost 20 hours of recorded speech.


Ethical approval to use human participants to gather the spoken data was sought and obtained via the University of Sheffield's Research Ethics Review procedures (application 031449).


Please refer to the following paper for more information about this dataset: 

Barker, E., Barker, J., Gaizauskas, R., Ma, N., Paramita, M. L. 2022. SNuC: The Sheffield Numbers Spoken Language Corpus. In: Proceedings of LREC 2022 (forthcoming).

Funding

University of Sheffield Impact, Innovation and Knowledge Exchange (IIKE) Fund

Research England-funded PitchIn project

Ethics

  • The project has ethical approval and the number is included in the description field

Policy

  • The data complies with the institution and funders' policies on access and sharing

Sharing and access restrictions

  • The uploaded data can be shared openly

Data description

  • The file formats are open or commonly used

Methodology, headings and units

  • There is a file including methodology, headings and units, such as a readme.txt

    Information School
