Recognized Tokens
A recognized token is a short piece of text, in one of a defined set of formats, that a translation memory (TM) treats as a single word. The following types of text can be recognized as tokens:
- Dates
- Times
- Numbers (in numerals)
- Measurements
- Acronyms and URLs
- Variables
- Inline tags
For example, if dates are enabled as recognized tokens in a TM, the TM recognizes Monday 1 January, 1900 as one word.
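To make the effect on word counts concrete, here is a minimal Python sketch. It is not how any TM is implemented: the `DATE_PATTERN` regular expression and the `count_words` function are hypothetical and cover only the one date format used in the example above.

```python
import re

# Hypothetical pattern for a single date format, e.g. "Monday 1 January, 1900".
# A real TM recognizes many more formats; this is only an illustration.
DATE_PATTERN = re.compile(
    r"(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday) "
    r"\d{1,2} "
    r"(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December), "
    r"\d{4}"
)


def count_words(text, dates_enabled):
    """Count whitespace-separated words, treating each recognized date as one word."""
    if dates_enabled:
        # Collapse every recognized date into a single placeholder word.
        text = DATE_PATTERN.sub("DATE", text)
    return len(text.split())


sentence = "The contract starts on Monday 1 January, 1900 at the latest."
words_with_dates = count_words(sentence, dates_enabled=True)      # date counts once
words_without_dates = count_words(sentence, dates_enabled=False)  # date counts as 4 words
print(words_with_dates, words_without_dates)
```

With date recognition switched on, the four whitespace-separated parts of the date collapse into one word, so the sentence is counted as shorter.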
Where to enable or disable the recognition of tokens
You enable or disable the recognition of tokens by editing the general properties of a TM.
These settings affect how the TM analyzes the source text into tokens. The recognized tokens settings specify text patterns that the TM recognizes as tokens, and also specify the type of token.
For some types of recognized token, you can provide a rule that the TM uses to translate occurrences of the token in the source text. These rules are auto-localization rules and are described below.
How the TM uses the recognition of tokens setting
When a translation unit (TU) is added to a TM, the TM scans the TU for recognized tokens. For example, if dates are enabled, it analyzes the TU for dates. If recognition of a token type is not enabled for the TM, the TM does not recognize tokens of that type. If you later enable recognition of a token type (for example, dates), the TM does not immediately analyze existing TUs for that type; you need to re-index the TM. When you re-index the TM, it scans existing TUs for all currently recognized tokens.
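The following Python sketch illustrates why re-indexing is needed. It is a toy model under stated assumptions: the `ToyTM` class, its `add_unit` and `reindex` methods, and the two patterns in `RECOGNIZERS` are all hypothetical and do not correspond to any real TM API.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, deliberately simple recognizers for illustration only.
RECOGNIZERS = {
    "date": re.compile(r"\b\d{1,2} (?:January|February|March) \d{4}\b"),
    "number": re.compile(r"\b\d+(?:\.\d+)?\b"),
}


@dataclass
class ToyTM:
    enabled: set = field(default_factory=set)   # token types currently enabled
    units: list = field(default_factory=list)   # source text of each TU
    index: list = field(default_factory=list)   # token types found in each TU

    def _scan(self, text):
        # Only token types that are enabled at scan time are detected.
        return sorted(k for k in self.enabled if RECOGNIZERS[k].search(text))

    def add_unit(self, text):
        # A TU is scanned once, at the moment it is added.
        self.units.append(text)
        self.index.append(self._scan(text))

    def reindex(self):
        # Re-scan every existing TU with the currently enabled token types.
        self.index = [self._scan(text) for text in self.units]


tm = ToyTM(enabled={"number"})
tm.add_unit("Delivered on 3 March 2020")

tm.enabled.add("date")        # enabling dates later does not touch existing TUs
stale_index = list(tm.index)

tm.reindex()                  # only now are existing TUs scanned for dates
fresh_index = list(tm.index)
print(stale_index, fresh_index)
```

In this model the stored index reflects the settings at the time each TU was added, which is why a settings change only takes effect for existing TUs after `reindex()` runs.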
When new text is presented for translation, the TM first segments the text and then checks for recognized tokens. In other words, it checks for recognized tokens only within a segment.
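The order of operations (segment first, then look for tokens inside each segment) can be sketched in Python as follows. The segmentation rule `SEGMENT_BREAK`, the `TIME_PATTERN` format, and the `recognize` function are assumptions made for this example, not the rules a real TM applies.

```python
import re

# Hypothetical segmentation rule: break after sentence-ending punctuation.
SEGMENT_BREAK = re.compile(r"(?<=[.!?])\s+")
# Hypothetical time format, e.g. "09:30".
TIME_PATTERN = re.compile(r"\b\d{1,2}:\d{2}\b")


def recognize(text):
    """Split text into segments, then find time tokens inside each segment."""
    results = []
    for segment in SEGMENT_BREAK.split(text.strip()):
        # Token recognition never looks across a segment boundary.
        tokens = TIME_PATTERN.findall(segment)
        results.append((segment, tokens))
    return results


text = "The shop opens at 09:30. It closes at 17:00 on weekdays."
segments = recognize(text)
for segment, tokens in segments:
    print(repr(segment), tokens)
```

Because each segment is scanned on its own, a pattern can only match text that lies entirely inside one segment.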
How recognized tokens are handled in translation
- Inline tags
- Acronyms
- URLs
- Variables
- Dates
- Times
- Numbers (in numerals)
- Measurements
Where to change automatic localization rules for recognized tokens
The automatic localization rules are specified in the project settings.
Where to change settings related to recognized tokens
When a TM analyzes source text, it uses the recognized tokens settings. If the TM recognizes a token within a matched segment, it translates the whole segment.
If the token is not part of a matching segment, the TM provides a suggested translation for the token alone (for example, it provides a date in the target language).
You can specify these settings under the generic language resources settings, and also directly for an individual TM, under TM settings. Any settings that are specified directly for a TM override the generic settings.
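The override behavior can be pictured as a simple merge in which TM-level values win. The Python sketch below is only an illustration: the setting names in `GENERIC_SETTINGS` and the `effective_settings` function are hypothetical and are not taken from any actual settings file or API.

```python
# Hypothetical generic settings: which token types are recognized by default.
GENERIC_SETTINGS = {
    "dates": True,
    "times": True,
    "numbers": True,
    "measurements": False,
}


def effective_settings(generic, tm_overrides):
    """Return the settings a TM would use: generic values, replaced by any TM-level values."""
    merged = dict(generic)        # start from the generic settings
    merged.update(tm_overrides)   # settings specified directly for the TM take precedence
    return merged


# This TM disables date recognition but leaves everything else at the generic value.
tm_overrides = {"dates": False}
settings = effective_settings(GENERIC_SETTINGS, tm_overrides)
print(settings)
```

Only the keys set directly for the TM change; every other setting falls back to the generic value.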