Training data for language pair adaptation
The training data is the content used to adapt a language pair. Language Weaver Edge uses this data to create a new adapted language pair that offers customized translations for a specific domain.
Consider the following points when preparing the training data for the language pair adaptation:
- The data is stored in a translation memory (TM) in *.tmx format.
- You can upload a single *.tmx file or a *.zip file containing multiple TMs. The maximum file size for either the *.tmx or the *.zip files is 3 GB.
- Each translation unit (TU) consists of a single sentence or a meaningful unit, but not incomplete phrases, multiple sentences, or paragraphs.
- There must be a minimum of 1,000 TUs, but RWS strongly recommends a minimum of 30,000 TUs.
- The maximum number of TUs that can be used for training is 30 million.
- Terminology is consistent across the TUs in the TMs.
- Segments are correctly aligned: the target text is a proper translation of the source text.
- Segments are representative of the content you will process with machine translation: similar terminology, similar style, similar domain.
- Factual content is more likely to be translated well, while flowery expressions or idioms may not perform as well.
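
The size and TU-count limits listed above can be checked locally before you upload a file. The following is a minimal sketch in Python, not part of Language Weaver Edge: the function names are hypothetical, it assumes that 3 GB means 3 × 1024³ bytes, and it assumes a standard TMX file in which every translation unit is a `<tu>` element. It covers only the measurable limits; alignment, terminology consistency, and domain fit still need human review.

```python
import os
import xml.etree.ElementTree as ET

# Limits taken from the list above. Treating 3 GB as binary gigabytes is an assumption.
MAX_FILE_BYTES = 3 * 1024**3
MIN_TUS = 1_000
RECOMMENDED_TUS = 30_000
MAX_TUS = 30_000_000


def count_translation_units(tmx_path):
    """Count <tu> elements by streaming the file instead of loading it whole."""
    count = 0
    for _event, elem in ET.iterparse(tmx_path, events=("end",)):
        # Drop any namespace prefix so "{ns}tu" and "tu" are both matched.
        if elem.tag.rsplit("}", 1)[-1] == "tu":
            count += 1
            elem.clear()  # release the segment text that was just parsed
    return count


def check_training_data(tmx_path):
    """Return (tu_count, problems) for one *.tmx file (hypothetical helper)."""
    problems = []
    if os.path.getsize(tmx_path) > MAX_FILE_BYTES:
        problems.append("File is larger than the 3 GB upload limit.")

    tu_count = count_translation_units(tmx_path)
    if tu_count < MIN_TUS:
        problems.append(f"Only {tu_count:,} TUs; at least {MIN_TUS:,} are required.")
    elif tu_count < RECOMMENDED_TUS:
        problems.append(
            f"{tu_count:,} TUs meets the minimum, but {RECOMMENDED_TUS:,} or more are recommended."
        )
    elif tu_count > MAX_TUS:
        problems.append(f"{tu_count:,} TUs exceeds the {MAX_TUS:,} TU training limit.")
    return tu_count, problems
```

For example, `check_training_data("my_memory.tmx")` (a placeholder file name) returns the TU count together with a list of messages. An empty list means that the file is within the size and TU-count limits.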