Recognized Token Example

The following input document is translated in Studio, with default settings, against an empty translation memory (TM):
<html><body>
<p>Today is November 2, 2010 and the sun is shining.</p>
<p>Today is 11/2/2010 and the sun is shining.</p>
<p>The index rose by 123.3 points, or 3.8% after the announcement.</p>
<p>It's less than 5km to London, but 160.8 mi to Sheffield.</p>
<p>After 3:30 pm I'm getting tired.</p>
</body></html>

After translation in Studio, the document looks like this:

[Screenshot of the translated document in Studio]

Notes:

  • All placeables (indicated by a blue bracket under the placeable) have been automatically localized by Studio. Each one was inserted by selecting it from the segment-specific drop-down list (Ctrl-Comma).
  • The second segment is in fact an exact match for the first segment. The source of the first segment uses a long date pattern and the second uses a short date pattern, but for auto-substitution purposes only the token type (date) is compared, not the specific pattern (long date vs. short date).
  • The TM resulting from translating the above document will contain four translation units (TUs): one shared by segments 1 and 2 (which are “equal after auto-substitution”), and one for each of the other three segments.
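
The "equal after auto-substitution" rule can be illustrated with a minimal Python sketch. It is not Studio's implementation: the regular expressions, the `normalize` function, and the `{DATE}`-style placeholder notation are assumptions made for this example, and Studio's real token recognizers are locale-aware and much more thorough. The sketch replaces every recognized token with a placeholder for its type and uses the result as the TM key, so segments 1 and 2 collapse into a single TU.

```python
import re

# Illustrative patterns only (assumption): they cover just the token
# shapes that occur in the sample document above.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
NUM = r"\d+(?:,\d+)*(?:\.\d+)?"
TOKEN_RE = re.compile(
    rf"(?P<DATE>(?:{MONTHS}) \d{{1,2}}, \d{{4}}|\d{{1,2}}/\d{{1,2}}/\d{{4}})"
    rf"|(?P<TIME>\d{{1,2}}:\d{{2}}(?: ?[AaPp][Mm])?)"
    rf"|(?P<MEASUREMENT>{NUM} ?(?:km|mi)\b)"
    rf"|(?P<NUMBER>{NUM})"
)


def normalize(segment):
    """Replace each recognized token with a placeholder for its type."""
    return TOKEN_RE.sub(lambda m: "{" + m.lastgroup + "}", segment)


DOCUMENT_1 = [
    "Today is November 2, 2010 and the sun is shining.",
    "Today is 11/2/2010 and the sun is shining.",
    "The index rose by 123.3 points, or 3.8% after the announcement.",
    "It's less than 5km to London, but 160.8 mi to Sheffield.",
    "After 3:30 pm I'm getting tired.",
]

# A toy TM keyed by the placeholder form: only the token type is part of
# the key, so long and short date patterns map to the same entry.
tm = {}
for segment in DOCUMENT_1:
    tm.setdefault(normalize(segment), segment)

print(normalize(DOCUMENT_1[0]))  # Today is {DATE} and the sun is shining.
print(len(tm))                   # 4
```

Five segments produce four keys because the long and the short date both reduce to `{DATE}`.
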
Now, the following document is opened in Studio. The placeables differ textually, but the other text is the same:
<html><body>
<p>Today is October 11, 2010 and the sun is shining.</p>
<p>Today is 10/11/2010 and the sun is shining.</p>
<p>The index rose by 23.45 points, or 1.3% after the announcement.</p>
<p>It's less than 4.3km to London, but 1,6000.99 mi to Sheffield.</p>
<p>After 1:12 AM I'm getting tired.</p>
</body></html>

When this document is pre-translated in Studio, you get the following result, without any manual input:

[Screenshot of the pre-translated document in Studio]

Notes:

  • The text of all recognized tokens (numbers, dates, times, measurements) differs, but all tokens are correctly inserted into the translation in their auto-localized form.
  • All matches are context matches (CM), similar to the ICE match type in WorldServer. That is, the CM property "survives" repairs (which has pros and cons).
  • In Studio, a 100% match that was repaired to become an exact match is not treated as having originally been a fuzzy match. The match candidates are found by exact search, not by fuzzy search, because the index stores only "placeholders" for the token types and does not depend on the actual textual form of a token.
  • The Studio TM stores and processes only one "version" of two translation units that are equal apart from auto-substitutable recognized tokens. That is, even after all 5 TUs in the newly translated document are confirmed, the TM still contains only the 4 TUs generated by translating the original document.
  • The text context hashes used to determine the CM property are independent of recognized tokens. This means that after the second document is translated, the text context hashes attached to the 4 TUs in the TM are neither changed nor extended.
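
The exact-search and single-"version" behaviour described in the notes can be sketched in Python as follows. Everything here is hypothetical and serves only as illustration: the `ToyTM` class, the `{NUMBER}` placeholder notation, the German target sentence, and the separator swap in `localize_number` are assumptions, not Studio's API or its localization logic. The sketch handles number tokens only and assumes the tokens appear in the same order in source and target.

```python
import re

# Simplified number recognizer (assumption): digits with optional
# thousands groups and an optional decimal part.
NUMBER_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def to_key(source):
    """Placeholder form of a segment; this is what the toy index stores."""
    return NUMBER_RE.sub("{NUMBER}", source)


def localize_number(text):
    """Hypothetical en-US to de-DE conversion: swap the two separators."""
    return text.translate(str.maketrans(",.", ".,"))


class ToyTM:
    def __init__(self):
        self.units = {}  # placeholder key -> target template

    def confirm(self, source, target_template):
        # A segment that differs only in its numbers has the same key,
        # so confirming it updates the existing unit instead of adding one.
        self.units[to_key(source)] = target_template

    def lookup(self, source):
        # Exact dictionary lookup on the placeholder key, no fuzzy scoring.
        template = self.units.get(to_key(source))
        if template is None:
            return None
        numbers = iter(localize_number(n) for n in NUMBER_RE.findall(source))
        # Fill the placeholders in order with the localized source numbers.
        return re.sub(r"\{NUMBER\}", lambda _: next(numbers), template)


tm = ToyTM()
template = "Der Index stieg nach der Ankündigung um {NUMBER} Punkte bzw. {NUMBER} %."
tm.confirm("The index rose by 123.3 points, or 3.8% after the announcement.", template)

new_source = "The index rose by 23.45 points, or 1.3% after the announcement."
result = tm.lookup(new_source)
print(result)  # Der Index stieg nach der Ankündigung um 23,45 Punkte bzw. 1,3 %.

tm.confirm(new_source, template)
print(len(tm.units))  # 1
```

Because the key never contains the textual form of a token, the lookup for the new segment is an exact hit, and confirming the new segment leaves the number of stored units unchanged.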