
Understanding repetitions

Some content may be reused within an asset and across a collection of assets. Translating duplicated content should not cost as much as translating the original content. Arguably, it should not incur a translation cost at all, provided the translation of the original instance is readily available and appropriate for the new context. However, the cost issue is beyond the scope of translation memory (TM) technology. Instead, TM technology makes such cost discussions possible by supporting and exposing the concept of repetitions.

The repetition calculation process identifies and counts repetitions among segments that cannot be fully leveraged by the TM. This means that neither ICE matches nor 100% matches are included in the repetition counts. The idea is that ICE matches do not need to be reviewed at all and represent no translation cost. 100% matches often do not incur a translation cost either, depending on the vendor. At most, the customer may decide to have these 100% (non-ICE) matches reviewed, in which case the customer would be charged some minimal amount.

The motivation behind repetition counting is to reduce translation cost for the customer by preventing them from being charged full price for identical segments that must be translated. The first occurrence of a duplicated segment is considered the original or repeated segment, and it is scoped as normal. This means that the word count for the repeated segment is attributed to the fuzzy match bucket representing the best TM fuzzy match for the segment, and thus incurs a translation cost relative to the translation effort. Subsequent occurrences of the segment are referred to as repetition segments. The word count for repetition segments is placed in the scoping bucket for repetitions.

For example, if the five-word segment "Oh what a beautiful morning" occurred five times in an asset or set of assets, the first occurrence would be scoped normally, and the four additional occurrences (collectively containing 20 words) would be placed in the repetition bucket, provided that they cannot be fully leveraged by the TM.
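The scoping rules described above can be illustrated with a minimal Python sketch. It is not the product's implementation. The `Segment` class, the `scope` function, the bucket labels (`"ICE"`, `"100"`, `"no_match"`, `"repetition"`), and the whitespace-based word count are all hypothetical names and simplifications chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    tm_match: str  # hypothetical label for the best TM match, e.g. "ICE", "100", "no_match"


# Assumption: these labels mark segments that the TM can fully leverage.
FULLY_LEVERAGED = {"ICE", "100"}


def scope(segments):
    """Attribute word counts to scoping buckets, including a repetition bucket."""
    buckets = Counter()
    seen = set()  # texts of segments already scoped as the original occurrence
    for seg in segments:
        words = len(seg.text.split())  # simplification: whitespace-delimited words
        if seg.tm_match in FULLY_LEVERAGED:
            # ICE and 100% matches are never counted as repetitions.
            buckets[seg.tm_match] += words
        elif seg.text in seen:
            # Subsequent occurrence: goes to the repetition bucket.
            buckets["repetition"] += words
        else:
            # First occurrence: scoped normally under its best TM match.
            seen.add(seg.text)
            buckets[seg.tm_match] += words
    return dict(buckets)


example = [Segment("Oh what a beautiful morning", "no_match")] * 5
print(scope(example))  # {'no_match': 5, 'repetition': 20}
```

In this sketch the first occurrence contributes 5 words to its normal bucket and the remaining four occurrences contribute 20 words to the repetition bucket, matching the example in the text. Had the same segments been labeled as 100% matches, all 25 words would stay in the 100% bucket and the repetition bucket would remain empty.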