Interspeech 2016 - Experiment results for Sheffield Wargame Corpora (SWC1, SWC2, SWC3)

dataset

posted on 2016-06-15, 08:59 authored by Yulan Liu, Thomas Hain, Madina Hasan

The files in this dataset are the results generated for the Interspeech 2016 paper "The Sheffield Wargames Corpus - Day Two and Day Three" (DOI: 10.21437/Interspeech.2016-98). The paper describes a corpus of natural English speech recorded in a natural environment with multi-media and multi-microphone setups, reports baseline speech recognition performance based on standalone training and adaptation, and releases a Kaldi recipe for standalone training.

The files in the zip file are of three types:

- .ctm, which contains the output of the automatic speech recognition system; the columns include segment information as well as the recognized transcript.

- .ctm.filt.sys, which contains the scoring of the automatic speech recognition system, including the overall word error rate and the overall numbers of insertions, deletions and substitutions.

- .ctm.filt.lur, which provides a more detailed decomposition of the word error rate across multiple genres.
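
As an illustration of how a .ctm file might be read programmatically, the following is a minimal Python sketch. It assumes the conventional NIST CTM column order (recording, channel, start time, duration, word, optional confidence), which this description does not spell out; the sample lines and the names `CtmEntry` and `parse_ctm` are hypothetical and not taken from the dataset.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class CtmEntry:
    recording: str
    channel: str
    start: float       # start time in seconds
    duration: float    # duration in seconds
    word: str
    confidence: Optional[float] = None


def parse_ctm(lines: Iterable[str]) -> List[CtmEntry]:
    """Parse CTM lines, assuming the conventional NIST column order."""
    entries = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and ";;" comment lines.
        if not line or line.startswith(";;"):
            continue
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"expected at least 5 columns, got: {line!r}")
        confidence = float(fields[5]) if len(fields) > 5 else None
        entries.append(
            CtmEntry(fields[0], fields[1], float(fields[2]),
                     float(fields[3]), fields[4], confidence)
        )
    return entries


# Hypothetical sample lines, not copied from the dataset.
sample = [
    ";; a comment line",
    "session01 1 0.50 0.25 hello 0.98",
    "session01 1 0.75 0.30 world",
]
entries = parse_ctm(sample)
```

To read a real file, the same function can be given an open file handle, for example `parse_ctm(open(path, encoding="utf-8"))`, where `path` points to one of the .ctm files in the zip file.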

The three file types are repeated for all the results described in Table 4 and Table 5 of the paper.
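
The word error rate reported in the scoring files relates to the substitution, deletion and insertion counts through the standard definition WER = (S + D + I) / N, where N is the number of words in the reference. The Python sketch below applies that definition to counts entered by hand; it does not parse the .ctm.filt.sys layout, and the function name `word_error_rate` and the example counts are hypothetical.

```python
def word_error_rate(substitutions: int, deletions: int,
                    insertions: int, reference_words: int) -> float:
    """Return the word error rate in percent: 100 * (S + D + I) / N."""
    if reference_words <= 0:
        raise ValueError("reference_words must be positive")
    errors = substitutions + deletions + insertions
    return 100.0 * errors / reference_words


# Hypothetical counts, not taken from the paper or the dataset.
wer = word_error_rate(substitutions=10, deletions=5,
                      insertions=5, reference_words=100)
print(f"WER = {wer:.1f}%")
```

Because insertions count as errors but do not add to N, the word error rate can exceed 100% for very poor recognition output.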

The naming convention of the files (also explained in the paper) is as follows:

"ihm" refers to "Individual Headset Microphone".

"sdm" refers to "Single Distant Microphone".

"mdm8" refers to "Multiple Distant Microphone - 8 channels".

"LDA" refers to "Linear Discriminant Analysis".

"MLLT" refers to "Maximum Likelihood Linear Transform".

"SAT" refers to "Speaker Adaptive Training".

"MMI" refers to "Maximum Mutual Information".

"DNN" refers to "Deep Neural Network".

"sMBR" refers to "state-level Minimum Bayes Risk".

"fMLLR" refers to "feature-level Maximum Likelihood Linear Regression".

"o4" refers to "a maximum of 4 overlapping speakers in scoring".
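
The abbreviations above can be collected into a lookup table to annotate file names automatically. The Python sketch below is one possible way to do this; it assumes that the abbreviations appear in file names as tokens separated by non-alphanumeric characters, which is not confirmed by this description, and the file name in the example is hypothetical.

```python
import re
from typing import List, Tuple

# Abbreviations as listed in the naming convention above.
ABBREVIATIONS = {
    "ihm": "Individual Headset Microphone",
    "sdm": "Single Distant Microphone",
    "mdm8": "Multiple Distant Microphone - 8 channels",
    "LDA": "Linear Discriminant Analysis",
    "MLLT": "Maximum Likelihood Linear Transform",
    "SAT": "Speaker Adaptive Training",
    "MMI": "Maximum Mutual Information",
    "DNN": "Deep Neural Network",
    "sMBR": "state-level Minimum Bayes Risk",
    "fMLLR": "feature-level Maximum Likelihood Linear Regression",
    "o4": "maximally 4 overlapping speakers in scoring",
}


def expand_tokens(file_name: str) -> List[Tuple[str, str]]:
    """Return (token, expansion) pairs for known abbreviations in a name.

    Assumption: tokens are delimited by non-alphanumeric characters.
    Unknown tokens (including file extensions) are ignored.
    """
    tokens = re.split(r"[^A-Za-z0-9]+", file_name)
    return [(t, ABBREVIATIONS[t]) for t in tokens if t in ABBREVIATIONS]


# Hypothetical file name, for illustration only.
for token, meaning in expand_tokens("sdm_LDA_MLLT_SAT_o4.ctm"):
    print(f"{token}: {meaning}")
```

The lookup is case-sensitive on purpose, since the convention mixes lower-case and upper-case abbreviations such as "sdm" and "SAT".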

All three file types are standard outputs that are recognized by the automatic speech recognition community and can be opened using any text editor.

Funding

EP/I031022/1

Ethics

There is no personal data or any data that requires ethical approval

Policy

The data complies with the institution's and funders' policies on access and sharing

Sharing and access restrictions

The data can be shared openly

Data description

The file formats are open or commonly used

Methodology, headings and units

There is a readme.txt file describing the methodology, headings and units

Keywords

Speech recognition; speech recognition outcomes; speech recognition performance; speech recognition in noise; Interspeech 2016; Computer-Human Interaction

Licence

CC BY 4.0