Test data for language pair adaptation
The test data corresponds to the content used to evaluate the quality of a language pair after adaptation.
Consider the following points when preparing the test data for the language pair adaptation:
- The data is stored in a translation memory (TM) in *.tmx format.
- You can upload only one *.tmx file.
- Each translation unit (TU) consists of a single sentence or a meaningful unit, but not incomplete phrases, multiple sentences, or paragraphs.
- There must be a minimum of 500 TUs that are not included in the training data set.
- If no test data is provided, Language Weaver Edge will automatically extract a set of 500 TUs that will be used to evaluate the quality of the adapted language pair.
- The selected test data is representative of the content that will usually be processed with the language pair resulting from the adaptation.
- The data consists of correct, grammatical sentences that are different in terms of style, terminology, and average length.
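Before uploading, it can help to check the TU count and the overlap with the training data locally. The following Python snippet is a minimal sketch and is not part of Language Weaver Edge. It assumes that both files follow the standard TMX layout (`<tu>` elements containing `<tuv>`/`<seg>` children) and that "not included in the training data set" can be approximated by an exact match on segment text. The function names `read_tus` and `check_test_data` are hypothetical.

```python
import xml.etree.ElementTree as ET

MIN_TEST_TUS = 500  # minimum number of test TUs stated in the list above


def read_tus(tmx_source):
    """Return one tuple of segment texts per <tu> element in a TMX document.

    tmx_source is a file path or an open file object.
    """
    root = ET.parse(tmx_source).getroot()
    units = []
    for tu in root.iter("tu"):
        # itertext() also collects text that surrounds inline tags inside <seg>
        segs = tuple("".join(seg.itertext()).strip() for seg in tu.iter("seg"))
        units.append(segs)
    return units


def check_test_data(test_tus, training_tus, minimum=MIN_TEST_TUS):
    """Count test TUs that do not appear in the training data.

    Assumption: two TUs are the same if all their segment texts match exactly.
    """
    training = set(training_tus)
    unseen = [tu for tu in test_tus if tu not in training]
    return {
        "total": len(test_tus),
        "unseen": len(unseen),
        "enough": len(unseen) >= minimum,
    }
```

For example, `check_test_data(read_tus("test.tmx"), read_tus("training.tmx"))` returns the total number of TUs in the test file, how many of them are absent from the training file, and whether that number reaches the minimum. The file names here are placeholders. The sketch does not assess sentence quality or representativeness, which still need a manual review.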