Data preparation for language pair adaptation
Valid *.tmx files are required to adapt language pairs.
General checks for language pair adaptation
Although not enforced within the application, RWS recommends that you perform some generic checks on the translation units (TUs) used to adapt a language pair.
- A valid *.tmx file format
- UTF-8 encoding
- Correct *.tmx language tags
- Clean data
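As an illustration only, the first three checks can be approximated with a short script before you upload a file. The sketch below is not part of Language Weaver Edge and does not reproduce any validation the product performs. The function name `check_tmx` is hypothetical, and the strict, case-insensitive comparison of language tags (for example, `en-US` is not treated as equal to `en`) is an assumption that you may need to relax for your data.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def check_tmx(data: bytes, source_lang: str, target_lang: str) -> list:
    """Return a list of problems found in TMX content (hypothetical helper).

    An empty list means that none of the checks below detected an issue.
    """
    # UTF-8 check: the raw bytes must decode cleanly.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]

    # Format check: the content must be well-formed XML with a <tmx> root.
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != "tmx":
        problems.append(f"root element is <{root.tag}>, expected <tmx>")

    # Language tag check (assumption: every TU carries exactly the two
    # expected tags, compared case-insensitively).
    expected = {source_lang.lower(), target_lang.lower()}
    for index, tu in enumerate(root.iter("tu"), start=1):
        # Older TMX versions use a plain "lang" attribute instead of xml:lang.
        langs = {
            (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            for tuv in tu.iter("tuv")
        }
        if langs != expected:
            problems.append(
                f"TU {index}: language tags {sorted(langs)} "
                f"do not match {sorted(expected)}"
            )
    return problems
```

To check a file on disk, read it in binary mode and pass the bytes, for example `check_tmx(open("memory.tmx", "rb").read(), "en-US", "fr-FR")`, where the file name and language tags are placeholders.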
Data cleaning for language pair adaptation
Data cleaning refers to preparing the input material so that it is compatible with the Language Weaver Edge trainer. Cleaning aims to optimize the quality of the resulting model; it does not improve the data from a linguistic point of view.
Good-quality translation memories (TMs) are the starting point for language pair adaptation. Cleaning will not, however, improve the consistency of your data if the TMs include, for example, different translations for the same source or different styles.
Cleaning targets the following kinds of problematic content:
- TUs in languages different from the language pair that you'd like to adapt.
- TUs containing only non-semantic text, like symbols or punctuation marks.
- Misaligned TUs, i.e., translations that don't correspond to the source text.
- Corrupted data, normally caused by an incorrect encoding.
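Some of the problems listed above can be approximated with simple heuristics, shown here as a minimal sketch rather than as the cleaning that Language Weaver Edge actually performs. All function names are hypothetical. The sketch assumes that a segment is "semantic" if it contains at least one letter or digit, that the Unicode replacement character (U+FFFD) signals an encoding problem, and that a large source/target length ratio hints at misalignment. The length ratio is a crude proxy that suits language pairs with similar text lengths and may need tuning or removal for others. Detecting TUs in the wrong language requires a language identification tool and is not covered.

```python
import unicodedata


def has_semantic_text(text: str) -> bool:
    """True if the text contains at least one letter or digit (assumption)."""
    return any(unicodedata.category(ch)[0] in ("L", "N") for ch in text)


def looks_corrupted(text: str) -> bool:
    """True if the text contains U+FFFD, a common sign of a bad decoding step."""
    return "\ufffd" in text


def length_ratio_ok(source: str, target: str, max_ratio: float = 3.0) -> bool:
    """Crude misalignment heuristic: reject pairs with very different lengths."""
    shorter, longer = sorted((len(source.strip()), len(target.strip())))
    return shorter > 0 and longer / shorter <= max_ratio


def clean_pairs(pairs, max_ratio: float = 3.0):
    """Keep only the (source, target) pairs that pass all three heuristics."""
    kept = []
    for source, target in pairs:
        # Drop TUs whose source or target is only symbols or punctuation.
        if not (has_semantic_text(source) and has_semantic_text(target)):
            continue
        # Drop TUs that show signs of encoding corruption.
        if looks_corrupted(source) or looks_corrupted(target):
            continue
        # Drop TUs whose lengths suggest that the translation is misaligned.
        if not length_ratio_ok(source, target, max_ratio):
            continue
        kept.append((source, target))
    return kept
```

The `max_ratio` default of 3.0 is an arbitrary illustrative value, not a recommendation from RWS. Review a sample of the rejected pairs before discarding them, because each heuristic can produce false positives.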