The compressed folder contains the experiment output described in Table 1 and Table 2 in the paper "Groupwise learning for ASR k-best list reranking in spoken language translation" DOI: 10.1109/ICASSP.2016.7472853.
The top level contains two files. "E12.filelist" lists the 1124 segments used in the experiments. Each segment is named following the convention (TALK)_(SPEAKER)_(STARTTIME in 10ms)_(ENDTIME in 10ms). The segments are identical to those from the IWSLT 2012 evaluation ( "" contains the reference French translations of these 1124 sentences.
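As an illustration of the naming convention above, the following Python sketch splits a segment name into its four fields and converts the time stamps from 10 ms units to seconds. It is not part of the deposited data. The function name `parse_segment_name`, the `Segment` type and the example segment name are hypothetical, and the sketch assumes that the start and end times are plain integers and that any extra underscores belong to the TALK field.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    talk: str
    speaker: str
    start_s: float  # start time in seconds
    end_s: float    # end time in seconds


def parse_segment_name(name: str) -> Segment:
    """Split a name of the form TALK_SPEAKER_STARTTIME_ENDTIME.

    Assumption: STARTTIME and ENDTIME are integer counts of 10 ms units,
    and any additional underscores are part of the TALK field.
    """
    # Split from the right so that only the last three underscores count.
    talk, speaker, start, end = name.strip().rsplit("_", 3)
    return Segment(talk, speaker, int(start) / 100, int(end) / 100)


# Hypothetical segment name, made up for illustration only.
seg = parse_segment_name("talk01_spk1_0001234_0001890")
print(seg.talk, seg.speaker, seg.start_s, seg.end_s)
```

Applying the same function to each line of "E12.filelist" would give the talk, speaker and duration of every segment, provided the file holds one segment name per line (also an assumption).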
Experiment results are organised into subfolders. The results from which Table 1 was generated can be found in the folders TABLE_1/Setting_[A or B or C]. For each setting, 8 regression experiments (predicsvcpermute*) and 14 classification experiments (predicsvrpermute*) were conducted. The translation outputs are recorded in the files prefixed "rerank.merged.*" and the translation scores are recorded in the files prefixed "best-results.fullset.*".
The folders TABLE_2/Groupwise+LDA/Setting_[A or B or C] contain the translation outputs "rerank.merged.*" and translation scores "best-results.fullset.*" for the Groupwise+LDA results described in Table 2 in the paper. For Settings A, B and C, the corresponding optimal LDA dimensions are 3, 5 and 4 respectively. These are also reflected in the filenames in the uploaded data.
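The folder layout described above can be walked with a short Python sketch such as the one below. It is only an illustration: the helper name `collect_results` is hypothetical, the archive is assumed to be unpacked in the current directory, and the search is recursive because the exact nesting of the result files inside each Setting folder is not specified here.

```python
from pathlib import Path


def collect_results(root: Path, table_dir: str) -> dict:
    """Map each setting (A, B, C) to its translation output and score files."""
    results = {}
    for setting in "ABC":
        folder = root / table_dir / f"Setting_{setting}"
        if not folder.is_dir():
            # Missing folders yield empty lists instead of an error.
            results[setting] = {"outputs": [], "scores": []}
            continue
        results[setting] = {
            "outputs": sorted(folder.rglob("rerank.merged.*")),
            "scores": sorted(folder.rglob("best-results.fullset.*")),
        }
    return results


# Assumption: the compressed folder was unpacked into the current directory.
for table_dir in ("TABLE_1", "TABLE_2/Groupwise+LDA"):
    for setting, files in collect_results(Path("."), table_dir).items():
        print(table_dir, setting, len(files["outputs"]), len(files["scores"]))
```

The counts printed per setting give a quick check that the archive was unpacked completely before any scores are read.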
EPSRC (EP/I031022/1) and Google ("WFST based Integration of ASR and MT in Spoken Language Translation")
The dataset contains no personal data and no data that requires ethical approval.
The data complies with the institution and funders' policies on access and sharing