This online repository contains the speech recognition model sets and the recording transcripts used in the phoneme/syllable recognition experiments reported in Ng et al. (2017).
Speech recognition model sets
The speech recognition model sets are available in this repository as a tarball named model.tar.gz.
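As a minimal sketch of how the tarball could be inspected and unpacked, the following uses Python's standard `tarfile` module. The function names `list_archive` and `extract_archive` are hypothetical helpers introduced here for illustration, and the internal layout of model.tar.gz is not assumed.

```python
import tarfile
from pathlib import Path


def list_archive(archive_path):
    """Return the member names of a gzip-compressed tarball."""
    with tarfile.open(archive_path, mode="r:gz") as tar:
        return tar.getnames()


def extract_archive(archive_path, dest_dir):
    """Extract a gzip-compressed tarball into dest_dir and return that path.

    Members that would resolve outside dest_dir are rejected before
    anything is written to disk.
    """
    dest = Path(dest_dir).resolve()
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    return dest
```

For example, `list_archive("model.tar.gz")` would show the contents without unpacking, and `extract_archive("model.tar.gz", "models")` would unpack into a local directory (the directory name `models` is an arbitrary choice).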
The models were trained on Cantonese and English data. For each language, two model sets were trained: one under the background setting and one under the mixed-condition setting. All models are hybrid DNN-HMM models whose feed-forward neural networks have 6 hidden layers with 2048 neurons per layer. Details can be found in Ng et al. (2017). The Cantonese models include a bigram syllable language model, and the English models include a bigram phoneme language model. All model sets are provided in the Kaldi format.
1. The background-cantonese model was trained on read Cantonese speech from CUSENT (68 speakers, 19.4 hours).
2. The background-english model was trained on read English speech from WSJ-SI84 (83 speakers, 15.2 hours).
3. The mixed-condition-cantonese model was trained on the background-cantonese data plus the ShefCE Cantonese training data (25 speakers, 9.7 hours).
4. The mixed-condition-english model was trained on the background-english data plus the ShefCE English training data (25 speakers, 2.3 hours).
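The amount of training data behind each model set follows from the figures above by simple addition. The sketch below tabulates them; the dictionary keys and the helper `total_hours` are illustrative names, not identifiers used in the archives.

```python
# Speaker counts and durations as listed above.
TRAINING_SETS = {
    "CUSENT": {"speakers": 68, "hours": 19.4},
    "WSJ-SI84": {"speakers": 83, "hours": 15.2},
    "ShefCE Cantonese train": {"speakers": 25, "hours": 9.7},
    "ShefCE English train": {"speakers": 25, "hours": 2.3},
}

# Which data sets each model set was trained on.
MODEL_DATA = {
    "background-cantonese": ["CUSENT"],
    "background-english": ["WSJ-SI84"],
    "mixed-condition-cantonese": ["CUSENT", "ShefCE Cantonese train"],
    "mixed-condition-english": ["WSJ-SI84", "ShefCE English train"],
}


def total_hours(model_name):
    """Sum the training hours for one model set, rounded to 0.1 hour."""
    hours = sum(TRAINING_SETS[name]["hours"] for name in MODEL_DATA[model_name])
    return round(hours, 1)


for model_name in MODEL_DATA:
    print(f"{model_name}: {total_hours(model_name)} hours")
```

Under this tally the mixed-condition Cantonese model sees 29.1 hours and the mixed-condition English model 17.5 hours, so ShefCE data makes up roughly a third of the former but only about 13% of the latter.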
Recording transcripts
The recording transcripts are available in this repository as a tarball named stms.tar.gz. These transcripts cover the ShefCE portion of the training data and the ShefCE test data.
Four files can be found in the stms.tar.gz archive.
- ShefCE_RC.train.v*.stm contains the transcripts for the ShefCE training set (Cantonese)
- ShefCE_RE.train.v*.stm contains the transcripts for the ShefCE training set (English)
- ShefCE_RC.test.v*.stm contains the transcripts for the ShefCE test set (Cantonese)
- ShefCE_RE.test.v*.stm contains the transcripts for the ShefCE test set (English)
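The sketch below shows one way such files might be read. It assumes they follow the common NIST STM segment layout (recording, channel, speaker, start time, end time, an optional `<...>` label field, then the transcript, with `;;` marking comment lines); this layout is an assumption and has not been checked against the files in stms.tar.gz. The class `StmSegment` and function `parse_stm_lines` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class StmSegment:
    recording: str
    channel: str
    speaker: str
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    labels: str   # optional "<...>" field, empty string if absent
    transcript: str


def parse_stm_lines(lines: Iterable[str]) -> Iterator[StmSegment]:
    """Yield one StmSegment per non-comment, non-empty STM line."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";;"):
            continue  # skip blank lines and STM comments
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"malformed STM line: {raw!r}")
        recording, channel, speaker, start, end = fields[:5]
        rest = fields[5:]
        labels = ""
        # Assumed: the optional label field is a single <...> token.
        if rest and rest[0].startswith("<") and rest[0].endswith(">"):
            labels = rest.pop(0)
        yield StmSegment(recording, channel, speaker,
                         float(start), float(end), labels, " ".join(rest))
```

A file could then be read with `list(parse_stm_lines(open(path, encoding="utf-8")))`, where `path` is one of the four files listed above; the `v*` part of each name presumably stands for a version suffix, so a glob pattern such as `ShefCE_RC.train.v*.stm` may help locate the file.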
Please cite the following paper when using the ShefCE data, models or transcripts.
Raymond W. M. Ng, Alvin C. M. Kwan, Tan Lee and Thomas Hain, "ShefCE: A Cantonese-English Bilingual Speech Corpus for Pronunciation Assessment", in Proc. the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.