Segmentation
Studio processes files for translation by breaking text into segments. A segment can be a paragraph or a sentence. Punctuation marks are used to identify where each segment ends. This process is called segmentation, and it determines how the text is displayed in the Editor when a document is opened.
When a file is opened in the Studio Editor, it goes through three levels of segmentation:
- Structure-based segmentation
- Rules-based segmentation
- Inline-tags-based segmentation
Structure-based segmentation
This first level of segmentation splits the input file on defined structure elements. Which elements are used depends on the file type and on the user-defined settings (e.g. XML parser structure rules).
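As a rough illustration of splitting on structure elements, the sketch below extracts paragraph units from a small XML input. It is not Studio's parser: the element names in STRUCTURE_ELEMENTS and the function paragraph_units are hypothetical and stand in for whatever structure rules the file type defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure rules: element names treated as paragraph boundaries.
# In Studio these would come from the file type settings, not from code.
STRUCTURE_ELEMENTS = {"title", "p"}


def paragraph_units(xml_text: str) -> list[str]:
    """Return the text of each structure element, in document order."""
    root = ET.fromstring(xml_text)
    units = []
    for elem in root.iter():
        if elem.tag in STRUCTURE_ELEMENTS:
            # Join all text inside the element, including text in child tags.
            text = "".join(elem.itertext()).strip()
            if text:
                units.append(text)
    return units


sample = "<doc><title>Intro</title><p>First one. Second one.</p><meta>skip</meta></doc>"
units = paragraph_units(sample)
```

Elements that are not listed as structure elements (here, meta) produce no paragraph unit. The sketch does not handle structure elements nested inside each other, which would be reported twice.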
Rules-based segmentation
After the document is split into paragraph units, another round of segmentation is performed, based on translation memory segmentation rules. The user can define characters or RegEx patterns that act as sentence splitters. Rules-based segmentation applies to all file types and is based on the default translation memory.
The segmentation rules can be changed from the Translation Memory settings.
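To make the idea of a RegEx sentence splitter concrete, here is a minimal sketch. The pattern SPLIT_RULE and the function split_sentences are assumptions made for illustration; they are not Studio's default rules, which are configured in the translation memory.

```python
import re

# Hypothetical rule: a segment ends after '.', '!' or '?' when followed by
# whitespace and an uppercase letter. The whitespace itself is dropped.
SPLIT_RULE = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(paragraph: str) -> list[str]:
    """Split one paragraph unit into sentence segments using SPLIT_RULE."""
    return [part for part in SPLIT_RULE.split(paragraph) if part]


segments = split_sentences("Open the file. Click Save! Is it done?")
```

A rule this simple also splits after an abbreviation such as "e.g." when a capitalized word follows it, so a practical rule set usually needs exceptions in addition to the splitting pattern.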
Inline-tags-based segmentation
The third and final level of segmentation uses segmentation hints to determine the final form of each segment. A segmentation hint defines the required behavior of a tag placeholder or tag pair that appears on a segment boundary (leading or trailing). During this phase, Studio decides which content in each segment (text, tags and placeholders) is translatable and which is non-editable. The following segmentation hints are used:
- Include
- Exclude
- Include with Text
- May Exclude