The University of Sheffield

Interspeech 2016 - Experiment results for Sheffield Wargame Corpora (SWC1, SWC2, SWC3)

Download (13.33 MB)
dataset
posted on 2016-06-15, 08:59 authored by Yulan Liu, Thomas Hain, Madina Hasan

The files in this dataset are the results generated for the Interspeech 2016 paper "The Sheffield Wargames Corpus - Day Two and Day Three" (DOI: 10.21437/Interspeech.2016-98). The paper describes a corpus of natural English speech recorded in a natural environment with multiple media and multiple microphones, reports baseline speech recognition performance based on standalone training and adaptation, and releases a Kaldi recipe for standalone training.


The files in the zip file are of three types:

- .ctm, which contains the output of the automatic speech recognition system; its columns include segment information as well as the recognized transcript.

- .ctm.filt.sys, which contains the scoring of the automatic speech recognition system, including the overall word error rate and the overall numbers of insertions, deletions and substitutions.

- .ctm.filt.lur, which provides a more detailed breakdown of the word error rate across multiple genres.

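As an illustration of how the .ctm files might be read, the following is a minimal Python sketch. It assumes the files follow the standard NIST CTM layout (recording, channel, start time, duration, word, optional confidence, with ";;" marking comment lines); that layout is an assumption and is not confirmed by this record, so check the readme.txt first. The names `CtmEntry`, `parse_ctm` and `word_error_rate` are hypothetical helpers, and the last one only shows the usual word error rate arithmetic from substitution, deletion and insertion counts.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class CtmEntry:
    """One recognized word, assuming the standard NIST CTM column order."""
    recording: str
    channel: str
    start: float       # start time in seconds
    duration: float    # duration in seconds
    word: str
    confidence: Optional[float] = None


def parse_ctm(lines: Iterable[str]) -> List[CtmEntry]:
    """Parse CTM lines, skipping blank lines and ';;' comment lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;"):
            continue
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"expected at least 5 columns, got: {line!r}")
        # The sixth column (confidence) is optional in the assumed layout.
        confidence = float(fields[5]) if len(fields) > 5 else None
        entries.append(
            CtmEntry(fields[0], fields[1], float(fields[2]),
                     float(fields[3]), fields[4], confidence)
        )
    return entries


def word_error_rate(substitutions: int, deletions: int,
                    insertions: int, reference_words: int) -> float:
    """Word error rate in percent: 100 * (S + D + I) / N."""
    if reference_words <= 0:
        raise ValueError("reference_words must be positive")
    return 100.0 * (substitutions + deletions + insertions) / reference_words
```

A file could then be loaded with `with open(path) as f: entries = parse_ctm(f)`, where `path` points to one of the .ctm files in the zip archive.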

The three file types are repeated for all the results described in Table 4 and Table 5 of the paper.


The following describes the naming convention of the files (also explained in the paper):


"ihm" refers to "Individual Headset Microphone".

"sdm" refers to "Single Distant Microphone".

"mdm8" refers to "Multiple Distant Microphone - 8 channels".

"LDA" refers to "Linear Discriminant Analysis".

"MLLT" refers to "Maximum Likelihood Linear Transform".

"SAT" refers to "Speaker Adaptive Training".

"MMI" refers to "Maximum Mutual Information".

"DNN" refers to "Deep Neural Network".

"sMBR" refers to "state-level Minimum Bayes Risk".

"fMLLR" refers to "feature-level Maximum Likelihood Linear Regression".

"o4" refers to "a maximum of 4 overlapping speakers in scoring".

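The abbreviations above can be looked up programmatically. The sketch below is one possible way to do so in Python: it splits a file name on non-alphanumeric characters and reports any token that matches the list above. The exact file name pattern used in the archive is not stated here, so the separator handling and the example name in the usage note are assumptions, and `describe_name` is a hypothetical helper.

```python
import re
from typing import Dict

# Abbreviations from the naming convention above, keyed in lower case
# so that matching is case-insensitive.
ABBREVIATIONS = {
    "ihm": "Individual Headset Microphone",
    "sdm": "Single Distant Microphone",
    "mdm8": "Multiple Distant Microphone - 8 channels",
    "lda": "Linear Discriminant Analysis",
    "mllt": "Maximum Likelihood Linear Transform",
    "sat": "Speaker Adaptive Training",
    "mmi": "Maximum Mutual Information",
    "dnn": "Deep Neural Network",
    "smbr": "state-level Minimum Bayes Risk",
    "fmllr": "feature-level Maximum Likelihood Linear Regression",
    "o4": "maximally 4 overlapping speakers in scoring",
}


def describe_name(filename: str) -> Dict[str, str]:
    """Map each known abbreviation found in a file name to its meaning.

    Assumes abbreviations are separated by non-alphanumeric characters
    such as '_', '-' or '.'; this is a guess about the naming pattern.
    """
    found = {}
    for token in re.split(r"[^A-Za-z0-9]+", filename):
        meaning = ABBREVIATIONS.get(token.lower())
        if meaning is not None:
            found[token] = meaning
    return found
```

For a hypothetical name such as `ihm_LDA_MLLT.ctm`, `describe_name` would return entries for "ihm", "LDA" and "MLLT" and ignore the "ctm" extension.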


All three file types are standard outputs that are recognized by the automatic speech recognition community and can be opened using any text editor.

Funding

EP/I031022/1

Ethics

  • There is no personal data or any that requires ethical approval

Policy

  • The data complies with the institution and funders' policies on access and sharing

Sharing and access restrictions

  • The data can be shared openly

Data description

  • The file formats are open or commonly used

Methodology, headings and units

  • There is a readme.txt file describing the methodology, headings and units

    School of Computer Science
