File(s) not publicly available

ShefCE: A Cantonese-English bilingual speech corpus

dataset

posted on 2017-03-10, 14:06 authored by Wai Man Ng, Alvin C.M. Kwan, Tan Lee, Thomas Hain

ShefCE is a Cantonese-English bilingual parallel speech corpus recorded by L2 English learners in Hong Kong. Thirty-one undergraduate and postgraduate students in Hong Kong, aged 20-30, were recruited to record a 25-hour speech corpus (12 hours in Cantonese and 13 hours in English). Details can be found in [1].

The corpus is available free of charge for academic research, teaching and non-commercial use. A data request form has to be signed and submitted to the University of Sheffield to use the data. Please find the details and the data request form at http://mini.dcs.shef.ac.uk/resources/shefce, and cite [1] when using the data.

[1] Raymond W. M. Ng, Alvin C.M. Kwan, Tan Lee and Thomas Hain, "ShefCE: A Cantonese-English Bilingual Speech Corpus for Pronunciation Assessment", in Proc. The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.

Funding

IIKE Fund@Sheffield, Google

Ethics

The project has ethical approval, and the approval number is included in the description field

Policy

The data complies with the institution and funders' policies on access and sharing

Sharing and access restrictions

The data requires access restrictions, which are explained in the description field; files are not attached

Data description

The file formats are open or commonly used

Methodology, headings and units

Headings and units are explained in the files

Keywords

Cantonese, bilingualism, English, data sets, Pronunciation changes, Language learning, Natural Language Processing, Chinese Languages, English Language, English as a Second Language

Licence

CC BY-NC-ND 4.0
