Repetition counting scheme
A segment is considered a repetition if the same segment has already occurred during the scoping process, either in the same asset or in another asset. TRADOS compares segments exactly: a segment counts as a repetition only if it matches completely, including the number and placement of placeholders (the contents of the placeholders do not have to match).
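The exact comparison described above can be sketched in Java as follows. This is a minimal illustration, not TRADOS code. It assumes that placeholders appear inline as markup tags such as <b> or <ph id="1"/>, and the class ExactRepetitionCounter and its methods are hypothetical. Each placeholder is replaced by the same token, so its position and count still affect the comparison while its contents do not.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of exact, TRADOS-style repetition counting.
 * Assumption: placeholders are inline tags such as <b> or <ph id="1"/>.
 */
class ExactRepetitionCounter {
    // Assumed placeholder syntax: anything between '<' and '>'.
    private static final Pattern PLACEHOLDER = Pattern.compile("<[^<>]*>");

    /** Replaces each placeholder with one fixed token, so only count and position matter. */
    static String matchKey(String segment) {
        return PLACEHOLDER.matcher(segment).replaceAll("<ph>");
    }

    /** Counts segments whose key was already seen in the same asset or an earlier one. */
    static int countRepetitions(List<List<String>> assets) {
        Set<String> seen = new HashSet<>();
        int repetitions = 0;
        for (List<String> asset : assets) {
            for (String segment : asset) {
                // Set.add returns false when the key is already present.
                if (!seen.add(matchKey(segment))) {
                    repetitions++;
                }
            }
        }
        return repetitions;
    }

    public static void main(String[] args) {
        List<List<String>> assets = List.of(
                List.of("Click <ph id=\"1\"/> to save.", "Welcome."),
                List.of("Click <ph id=\"7\"/> to save.", "<b>Welcome.</b>"));
        // The "Click" pair differs only in placeholder contents: a repetition.
        // The "Welcome." pair differs in the number of placeholders: not a repetition.
        System.out.println(countRepetitions(assets)); // 1
    }
}
```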
In WorldServer, a segment is considered a repetition if a previously translated match can be used to generate a 100% match for it. A segment can therefore be counted as a repetition even if its text is not identical to that of the previously translated match. For example, if two segments differ only in their leading and trailing placeholders, the second segment is counted as a repetition, because the placeholder differences are repaired automatically to produce a 100% match.
Consider the two segments “This is a test. This is another test.” and “<b>This is a test. This is another test.</b>”. The second segment is a bold version of the first. WorldServer correctly detects this repetition; TRADOS does not.
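The following Java sketch contrasts the two rules on the segments above. It is not WorldServer's implementation: it approximates only the leading and trailing placeholder case, assumes placeholders are inline tags such as <b> and </b>, and uses a hypothetical class BoundaryPlaceholderMatcher. Placeholders in the middle of a segment are still compared by position, as in the exact rule.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: exact comparison versus a relaxed comparison that ignores
 * leading and trailing placeholders. Placeholders are assumed to be inline tags.
 */
class BoundaryPlaceholderMatcher {
    // Assumed placeholder syntax: a tag such as <b>, </b> or <ph id="1"/>.
    private static final Pattern TAG = Pattern.compile("<[^<>]*>");
    // One or more tags at the very start, or at the very end, of a segment.
    private static final Pattern LEADING_TAGS = Pattern.compile("^(?:<[^<>]*>)+");
    private static final Pattern TRAILING_TAGS = Pattern.compile("(?:<[^<>]*>)+$");

    /** Replaces every tag with one token, so only tag count and position matter. */
    private static String normalizeTags(String segment) {
        return TAG.matcher(segment).replaceAll("<ph>");
    }

    /** Removes the tags that open and close the segment. */
    static String stripBoundaryTags(String segment) {
        String withoutLeading = LEADING_TAGS.matcher(segment).replaceFirst("");
        return TRAILING_TAGS.matcher(withoutLeading).replaceFirst("");
    }

    /** Exact rule: every placeholder must be present in the same place. */
    static boolean exactRepetition(String previous, String current) {
        return normalizeTags(previous).equals(normalizeTags(current));
    }

    /** Relaxed rule: boundary placeholders are treated as repairable, so they are dropped. */
    static boolean relaxedRepetition(String previous, String current) {
        return normalizeTags(stripBoundaryTags(previous))
                .equals(normalizeTags(stripBoundaryTags(current)));
    }

    public static void main(String[] args) {
        String first = "This is a test. This is another test.";
        String second = "<b>This is a test. This is another test.</b>";
        System.out.println(exactRepetition(first, second));   // false
        System.out.println(relaxedRepetition(first, second)); // true
    }
}
```

Because only the outermost tags are stripped, "Save <b>now</b> please" and "Save now please" are still not repetitions under the relaxed rule in this sketch.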
WorldServer also tolerates whitespace differences. Whitespace is ignored during 100% match generation, so segments that differ only in whitespace are counted as repetitions. WorldServer uses the Java definition of whitespace; see Character.isWhitespace() for details.
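A minimal Java sketch of whitespace-insensitive comparison based on Character.isWhitespace() follows. It assumes that "ignored" means every whitespace character is dropped before comparing (an implementation could instead collapse runs of whitespace), and the class WhitespaceInsensitiveMatcher is hypothetical. Note that Character.isWhitespace() returns false for the no-break spaces U+00A0, U+2007 and U+202F, so in this sketch segments that differ only in a no-break space are not treated as repetitions.

```java
/**
 * Hypothetical sketch of whitespace-tolerant comparison using Java's whitespace definition.
 * Assumption: every whitespace character is removed, not merely collapsed.
 */
class WhitespaceInsensitiveMatcher {

    /** Removes every code point for which Character.isWhitespace returns true. */
    static String stripWhitespace(String segment) {
        return segment.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    /** Two segments are repetitions if they are equal once whitespace is removed. */
    static boolean isRepetition(String previous, String current) {
        return stripWhitespace(previous).equals(stripWhitespace(current));
    }

    public static void main(String[] args) {
        // Doubled spaces, tabs and line breaks all count as Java whitespace.
        System.out.println(isRepetition("Press  OK\tto continue.", "Press OK\nto continue.")); // true
        // A no-break space (U+00A0) is not whitespace for Character.isWhitespace.
        System.out.println(isRepetition("Press OK", "Press\u00A0OK")); // false
    }
}
```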
Repetition counts are strongly affected by segmentation rules: the smaller the segments, the greater the chance that a segment repeats. By default, TRADOS breaks sentences at the colon (:) character. This produces smaller segments and higher repetition counts, but can also produce linguistically incorrect segmentation. By default, WorldServer does not break English sentences at the colon, but it can be configured to do so if a customer requires it.
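The effect of a colon break on repetition counts can be illustrated with a toy Java segmenter. The boundary rule used here (break after '.', '!' or '?', and optionally ':', when followed by whitespace) is an assumption and far simpler than real TRADOS or WorldServer segmentation rules; the class ColonSegmentationDemo is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical toy segmenter showing how a colon break changes repetition counts.
 * Assumption: a segment ends at '.', '!' or '?' (optionally ':') followed by whitespace.
 */
class ColonSegmentationDemo {

    /** Splits text after sentence-ending punctuation, optionally also after ':'. */
    static List<String> segment(String text, boolean breakOnColon) {
        String boundary = breakOnColon ? "(?<=[.!?:])\\s+" : "(?<=[.!?])\\s+";
        return Arrays.asList(text.trim().split(boundary));
    }

    /** Counts segments that already occurred earlier in the list. */
    static int countRepetitions(List<String> segments) {
        Set<String> seen = new HashSet<>();
        int repetitions = 0;
        for (String s : segments) {
            if (!seen.add(s)) {
                repetitions++;
            }
        }
        return repetitions;
    }

    public static void main(String[] args) {
        String text = "Warning: Save your work. Warning: Close all files.";
        List<String> plain = segment(text, false); // 2 segments, 0 repetitions
        List<String> colon = segment(text, true);  // 4 segments, 1 repetition
        System.out.println(plain + " -> " + countRepetitions(plain));
        System.out.println(colon + " -> " + countRepetitions(colon));
    }
}
```

With the colon break, "Warning:" becomes a segment of its own and repeats, raising the count from 0 to 1, at the cost of a fragment that is not a complete sentence.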