Traduction et Langues
Volume 21, Numéro 1, Pages 77-98
2022-08-31

Post-édition de TA neuronale à la DGT et qualité des textes finaux : étude de cas / Neural Machine Translation Post-editing in DGT and Final Text Quality: A Case Study

Author: Loïc de Faria Pires.

Abstract

This article presents the results of a case study carried out in collaboration with the European Commission’s Directorate-General for Translation (DGT). The study analyses the quality of content post-edited from Neural Machine Translation (NMT) proposals (eTranslation NMT engine) by translators with varying levels of translation experience. Two types of participants were recruited: “Blue Book” interns (i.e. recently graduated translators taking part in a five-month paid internship at DGT) and in-house translators. For this analysis, we used an evaluation grid created by the French researchers Toudic et al. (2014), which contains nine error categories as well as four types of effects that guide raters when they assign severity penalties to errors. The reliability of this tool was verified through an interrater agreement score: 583 revision marks were compared by two investigators in terms of 1) severity penalty, 2) category and 3) raw MT responsibility. As far as methodology is concerned, for each source text, an NMT proposal from the eTranslation engine was post-edited by a DGT translator (10 participants: 7 in-house translators and 3 “Blue Book” interns) and revised by a DGT colleague. This procedure follows the typical DGT workflow: texts are usually first translated by a translator and then systematically revised by a colleague from the same (or, occasionally, a different) translation unit. The quality of the post-edited (PE) texts was thus evaluated through the revision marks introduced into them. Each revision mark was categorised and assigned a penalty score ranging from 1 (minor) to 5 (critical), according to the perceived distortion of the original message and of the intention that the source text is supposed to convey.
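The interrater check described above can be illustrated with a simple percent-agreement computation. This is a hedged sketch only: the abstract does not specify which agreement metric was used, and the function name and sample ratings below are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Share of items on which two raters gave the same label, as a percentage."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a) * 100

# Hypothetical severity penalties (1 = minor ... 5 = critical) assigned by
# two investigators to the same six revision marks:
investigator_1 = [1, 3, 5, 2, 2, 4]
investigator_2 = [1, 3, 4, 2, 2, 4]
print(round(percent_agreement(investigator_1, investigator_2), 1))  # 83.3
```

In the study itself, agreement was checked along three dimensions (severity penalty, category, raw MT responsibility), so such a computation would be run once per dimension over the 583 compared revision marks.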
Severity penalties were then normalised on a 100-word basis so that results could be compared across participants and texts: a total penalty score was computed for each text and then divided accordingly to obtain a per-100-word penalty score. These normalised scores enabled us to compare the perceived quality of the texts provided by our participants. Although our results cannot be generalised, since this is a case study for which no significance score could be computed (not enough data), several conclusions were reached: overall PE text quality is higher among participants with high experience levels (senior translators) than among junior translators; participants with lower experience levels produce PE texts containing more fidelity and terminology problems than their more experienced counterparts; and professional experience does not seem to influence the proportion of errors directly caused by NMT proposals. Several organisational constraints limited the scope of our study. First, the modest number of participants did not allow for statistically significant results. A larger study could therefore be carried out with more volunteers in order to reach more generalisable conclusions. Secondly, each participant provided us with an uneven number of texts and post-edited words. This is due to the very nature of our study, in which translators supplied texts drawn from their daily translation tasks, which limits the quantity of data collected but increases ecological validity. Furthermore, the authentic context in which this study was implemented did not enable us to collect process data: further studies could include such data, which would yield more representative results and offer insight into translators’ cognitive processes when post-editing.
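The normalisation step described above can be sketched as follows. This is a minimal illustration; the function name and the sample figures are hypothetical, not taken from the study.

```python
def normalise_penalty(total_penalty: float, word_count: int) -> float:
    """Scale a text's total severity penalty to a 100-word basis."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return total_penalty / word_count * 100

# Example: a post-edited text of 450 words whose revision marks carry
# severity penalties (1 = minor ... 5 = critical) summing to 18.
penalties = [1, 2, 5, 3, 1, 2, 4]   # hypothetical severity scores
total = sum(penalties)              # 18
print(normalise_penalty(total, 450))  # 4.0
```

Because the score is expressed per 100 words, a 450-word text and a 2,000-word text can be compared directly, which is what makes the cross-participant comparison in the study possible.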
In this context, eye-tracking data could be collected, and methods such as questionnaires and think-aloud protocols could be implemented to link process data to the quality scores obtained in our study. Finally, studying additional language pairs would be relevant, since NMT quality tends to vary across them.

Keywords

Institutional Translation; Neural Machine Translation; Post-Editing; Product; Quality