File(s) not publicly available
ShefCE: A Cantonese-English bilingual speech corpus
ShefCE is a Cantonese-English bilingual parallel speech corpus recorded by L2 English learners in Hong Kong. Thirty-one undergraduate and postgraduate students aged 20-30 were recruited in Hong Kong and recorded 25 hours of speech (12 hours in Cantonese and 13 hours in English). Details can be found in [1].
The corpus is available free of charge for academic research, teaching and non-commercial use. A data request form has to be signed and submitted to the University of Sheffield to use the data. Please find the details and the data request form at http://mini.dcs.shef.ac.uk/resources/shefce, and cite [1] when using the data.
[1] Raymond W. M. Ng, Alvin C. M. Kwan, Tan Lee and Thomas Hain, "ShefCE: A Cantonese-English Bilingual Speech Corpus for Pronunciation Assessment", in Proc. the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
Funding
IIKE Fund@Sheffield, Google
Ethics
- The project has ethical approval, and the approval number is included in the description field
Policy
- The data complies with the institution's and funders' policies on access and sharing
Sharing and access restrictions
- The data requires access restrictions, which are explained in the description field; files are not attached
Data description
- The file formats are open or commonly used
Methodology, headings and units
- Headings and units are explained in the files