<p>This dataset and these model checkpoints are needed to reproduce the results of the EAMT 2022 paper "Controlling Extra-Textual Information About Dialogue Participants: A Case Study of English-to-Polish Neural Machine Translation" (Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, pages 121–130), https://aclanthology.org/2022.eamt-1.15.</p>
<p><br></p>
<p>The data (data.zip) is derived from two corpora: OpenSubtitles18 and Europarl.</p>
<p><br></p>
<p>OpenSubtitles18:</p>
<p><br></p>
<p>[Lison, P. and Tiedemann, J. (2016). OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016).](https://aclanthology.org/L16-1147/)</p>
<p>The corpus can be found on the [OPUS website](https://opus.nlpl.eu/OpenSubtitles-v2018.php). The data was originally sourced from [OpenSubtitles.org](http://www.opensubtitles.org/).</p>
<p><br></p>
<p>Europarl:</p>
<p><br></p>
<p>[Koehn, P. (2005). Europarl: A Parallel Corpus for Statistical Machine Translation. Conference Proceedings: The Tenth Machine Translation Summit, 79–86.](https://aclanthology.org/2005.mtsummit-papers.11/)</p>
<p><br></p>
<p>The data was originally sourced from [statmt.org](https://www.statmt.org/europarl/).</p>
<p><br></p>
<p>Direct links:</p>
<p>Europarl: https://www.statmt.org/europarl/v7/pl-en.tgz</p>
<p>OpenSubtitles:</p>
<p>- English XML files: http://opus.nlpl.eu/download.php?f=OpenSubtitles/v2018/xml/en.zip</p>
<p>- Polish XML files: http://opus.nlpl.eu/download.php?f=OpenSubtitles/v2018/xml/pl.zip</p>
<p>- English-to-Polish alignment file: http://opus.nlpl.eu/download.php?f=OpenSubtitles/v2018/xml/en-pl.xml.gz</p>
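<p>As an illustration only, the raw corpora listed above could be fetched with a short Python script like the one below. This is a hypothetical sketch and is not part of the released code. The function name fetch_corpora, the local filenames and the skip-if-present behaviour are assumptions, and the OPUS URLs may have moved since publication. The preprocessing actually used to build data.zip is documented in the project's GitHub repository.</p>

```python
"""Hypothetical helper (not part of the official release): fetch the raw
Europarl and OpenSubtitles archives listed on this page."""
import os
import urllib.request
from typing import List, Tuple

# URLs copied from the "Direct links" list; local filenames are assumptions.
SOURCES = {
    "pl-en.tgz": "https://www.statmt.org/europarl/v7/pl-en.tgz",
    "en.zip": "http://opus.nlpl.eu/download.php?f=OpenSubtitles/v2018/xml/en.zip",
    "pl.zip": "http://opus.nlpl.eu/download.php?f=OpenSubtitles/v2018/xml/pl.zip",
    "en-pl.xml.gz": "http://opus.nlpl.eu/download.php?f=OpenSubtitles/v2018/xml/en-pl.xml.gz",
}


def fetch_corpora(dest_dir: str, dry_run: bool = False) -> List[Tuple[str, str]]:
    """Download each source archive into dest_dir.

    Files that already exist locally are skipped, so large archives are not
    re-fetched. Returns the (url, local_path) pairs that were downloaded, or
    that would be downloaded when dry_run is True (nothing is written then).
    """
    if not dry_run:
        os.makedirs(dest_dir, exist_ok=True)
    planned = []
    for filename, url in SOURCES.items():
        path = os.path.join(dest_dir, filename)
        if os.path.exists(path):
            continue  # already present; skip the download
        planned.append((url, path))
        if not dry_run:
            urllib.request.urlretrieve(url, path)
    return planned
```

<p>For example, fetch_corpora("raw_data", dry_run=True) only lists the planned downloads, while fetch_corpora("raw_data") actually retrieves the archives, which can be large.</p>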
<p><br></p>
<p>The models (checkpoints.zip) were trained with PyTorch:</p>
<p>Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Köpf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., … Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32 (NeurIPS 2019).</p>
<p><br></p>
<p>Full documentation on how to use these resources is provided in the GitHub repository, which also links back to this ORDA page:</p>
<p>https://github.com/st-vincent1/grammatical_agreement_eamt</p>
<p>Funding:</p>
<p>- UKRI Centre for Doctoral Training in Speech and Language Technologies and their Applications</p>
<p>- Engineering and Physical Sciences Research Council</p>