Conference paper, Year: 2024

Exploring Inline Lexicon Injection for Cross-Domain Transfer in Neural Machine Translation

Abstract

Domain transfer remains a challenge in machine translation (MT), particularly concerning rare or unseen words. Amongst the strategies proposed to address the issue, one of the simplest and most promising in terms of generalisation capacity is coupling the MT system with external resources such as bilingual lexicons and appending inline annotations within source sentences. This method has been shown to work well for controlled language settings, but its usability for general language (and ambiguous) MT is less certain. In this article we explore this question further, testing the strategy in a multi-domain transfer setting for German-to-English MT, using the mT5 language model fine-tuned on parallel data. We analyse the MT outputs and design evaluation strategies to understand the behaviour of such models. Our analysis using distractor annotations suggests that although improvements are not systematic according to automatic metrics, the model does learn to select appropriate translation candidates and ignore irrelevant ones, thereby exhibiting more than a systematic copying behaviour. However, we also find that the method is less successful in a higher-resource setting with a larger lexicon, suggesting that it is not a magic solution, especially when the baseline model is already exposed to a wide range of vocabulary.
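
The inline annotation idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `<lex>` and `</lex>` markers, the `|` separator, the toy `LEXICON` entries, and the `inject_annotations` helper are all hypothetical. The sketch only shows how candidate translations from a bilingual lexicon, optionally mixed with irrelevant distractor candidates, might be placed after the matching source word before the sentence is passed to a translation model.

```python
# Hypothetical toy German-to-English lexicon; the entries are illustrative only.
LEXICON = {
    "schloss": ["castle", "lock"],
    "bank": ["bank", "bench"],
}


def inject_annotations(sentence, lexicon, distractors=()):
    """Insert candidate translations inline after each source word found in the lexicon.

    The annotation format (<lex> a | b </lex>) is an assumption for this sketch,
    not the format used by the authors.
    """
    out = []
    for token in sentence.split():
        out.append(token)
        # Normalise the token for lookup: drop surrounding punctuation, lowercase.
        key = token.strip(".,;:!?").lower()
        candidates = list(lexicon.get(key, []))
        if not candidates:
            continue
        # Distractors are irrelevant candidates appended to the genuine ones,
        # so one can check whether a model ignores them instead of copying.
        candidates.extend(distractors)
        out.append("<lex> " + " | ".join(candidates) + " </lex>")
    return " ".join(out)


annotated = inject_annotations("Das Schloss ist alt.", LEXICON)
with_distractor = inject_annotations("Das Schloss ist alt.", LEXICON, distractors=["window"])
```

In a setup like the one the abstract describes, the annotated string would replace the plain source sentence as model input, both when fine-tuning and at inference time. Whether the model then picks "castle" over "lock" or "window" is what an evaluation with distractors would probe.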
Main file
DadaMT.pdf (340.47 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04591889, version 1 (29-05-2024)

Identifiers

  • HAL Id: hal-04591889, version 1

Cite

Jesujoba O Alabi, Rachel Bawden. Exploring Inline Lexicon Injection for Cross-Domain Transfer in Neural Machine Translation. KEMT 2024 - First International Workshop on Knowledge-Enhanced Machine Translation, Jun 2024, Sheffield, United Kingdom. ⟨hal-04591889⟩