
Test data for language pair adaptation

The test data is the content used to evaluate the quality of a language pair after adaptation.

Consider the following points when preparing the test data for the language pair adaptation:
  • The data is stored in a translation memory (TM) in *.tmx format.
  • You can upload only one *.tmx file.
  • Each translation unit (TU) contains a single sentence or another self-contained, meaningful unit. Do not use incomplete phrases, multiple sentences, or paragraphs.
  • The test data must contain at least 500 TUs that are not included in the training data set.
  • If no test data is provided, Language Weaver Edge will automatically extract a set of 500 TUs that will be used to evaluate the quality of the adapted language pair.
  • The selected test data is representative of the content that will usually be processed with the language pair resulting from the adaptation.
  • The data consists of correct, grammatical sentences that vary in style, terminology, and average length.
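
Two of the points above (the 500 TU minimum and no overlap with the training data) can be checked locally before upload. The following is a minimal sketch, not part of Language Weaver Edge: it assumes both files follow the standard TMX layout (`<tu>` elements containing `<tuv>` / `<seg>` children), and the function names `extract_tus`, `read_tus`, and `check_test_data` are hypothetical helpers introduced for illustration. It treats two TUs as identical only when all of their segment texts match exactly, which may differ from how the product compares them.

```python
import xml.etree.ElementTree as ET

MIN_TEST_TUS = 500  # minimum number of TUs stated in the requirements above


def extract_tus(root):
    """Return one tuple of segment texts per <tu> element under root.

    Inline markup inside <seg> is flattened to its text content.
    """
    units = []
    for tu in root.iter("tu"):
        segs = tuple("".join(seg.itertext()).strip() for seg in tu.iter("seg"))
        units.append(segs)
    return units


def read_tus(tmx_path):
    """Parse a TMX file from disk and return its TUs."""
    return extract_tus(ET.parse(tmx_path).getroot())


def check_test_data(test_units, training_units, minimum=MIN_TEST_TUS):
    """Count distinct test TUs that do not also appear in the training data."""
    training = set(training_units)
    unique = [tu for tu in set(test_units) if tu not in training]
    return {
        "total": len(test_units),
        "not_in_training": len(unique),
        "meets_minimum": len(unique) >= minimum,
    }
```

A typical call would be `check_test_data(read_tus("test.tmx"), read_tus("training.tmx"))`, where both file names are placeholders. The sketch does not assess whether each TU is a single sentence or whether the content is representative; those points still need manual review.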