The University of Sheffield

File(s) not publicly available

ShefCE: A Cantonese-English bilingual speech corpus

posted on 2017-03-10, 14:06 authored by Wai Man Ng, Alvin C.M. Kwan, Tan Lee, Thomas Hain

ShefCE is a Cantonese-English bilingual parallel speech corpus recorded by L2 English learners in Hong Kong. Thirty-one undergraduate and postgraduate students in Hong Kong, aged 20-30, were recruited and recorded a 25-hour speech corpus (12 hours in Cantonese and 13 hours in English). Details can be found in [1].

The corpus is available free of charge for academic research, teaching and non-commercial use. A data request form has to be signed and submitted to the University of Sheffield to use the data. Please find the details and the data request form at, and cite [1] when using the data.

[1] Raymond W. M. Ng, Alvin C.M. Kwan, Tan Lee and Thomas Hain, "ShefCE: A Cantonese-English Bilingual Speech Corpus for Pronunciation Assessment", in Proc. the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.


IIKE Fund@Sheffield, Google



  • The project has ethical approval, and the approval number is included in the description field


  • The data complies with the institution's and funders' policies on access and sharing

Sharing and access restrictions

  • The data requires access restrictions, which are explained in the description field; files are not attached

Data description

  • The file formats are open or commonly used

Methodology, headings and units

  • Headings and units are explained in the files