Configuring settings for the Legacy embedded content processor

The Embedded content (Legacy) page is available for file types that still use the legacy embedded content processor. This is a generic processor that does not differentiate between types of embedded content. As a result, you cannot specify custom extraction for a particular type of embedded content.

About this task

Legacy embedded content is available for the following file types: Microsoft Excel, Java Resources, XML: Any XML, and new (Legacy Embedded Content) file types.

Procedure

  1. Decide for which projects you want to configure file type settings:
    • For the active project, go to the Projects view, and on the Home tab, select Project Settings.
    • For all future projects, go to File > Options.
  2. Expand the File Types tree and select the relevant file type: Microsoft Excel, Java Resources, XML: Any XML, or a new (Legacy Embedded Content) file type.
  3. On the Embedded content page of your file type, select the Enable embedded content processing checkbox.
  4. Choose Document structure > Add... to create extraction rules based on document structure information. Make sure that the document structure information you specify here is covered by a parser rule on the Parser page of your file type. SDL Trados Studio can only extract embedded content that is recognized by the file type parser.
  5. Add tag definition rules to specify how to treat the embedded content defined in the Document structure information box.
    Tag Type

    Placeholder

    Converts embedded content to standalone (placeholder) tags.

    Tag Pair

    Identifies tag pairs (a start tag and an end tag) in the embedded content.

    Start Tag Expression (Placeholder)

    This is a regular expression that identifies embedded content, and converts each occurrence to a placeholder tag. For example, to convert all HTML <br> (line break) tags to placeholder tags, enter <br.*?>
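The placeholder expression above can be tried out in a short Python sketch. Python's re module is used here only for illustration, and the helper name find_placeholders is hypothetical; Studio's own regular expression engine may differ in details.

```python
import re

# Expression from the text: "<br", then as few characters as possible, then ">".
BR_TAG = re.compile(r"<br.*?>")

def find_placeholders(text):
    """Return every substring that the placeholder expression matches."""
    return [m.group(0) for m in BR_TAG.finditer(text)]

sample = 'Line one<br>Line two<br />Line three<br class="gap">End'
print(find_placeholders(sample))
# ['<br>', '<br />', '<br class="gap">']
```

Note that this expression also matches any other tag whose name begins with "br" (for example <brand>), so a stricter expression may be preferable if such tags occur in your content.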

    Start Tag Expression and End Tag Expression (Tag Pair)

    These are regular expressions that identify embedded content by start and end tags. The start and end tags may enclose some content or none.

    The processor tries to match the tag pair before it tries to match each tag expression on its own. That is, it looks for any section of text that starts with the Start Tag expression and ends with the End Tag expression before it tries to match individual start and end tags.

    For example, to identify all HTML <tr>...</tr> (table row) tag pairs, enter:
    • Start Tag: <tr.*?>
    • End Tag: </tr>
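The matching order described above (complete pairs first, then individual tags) can be sketched as follows. This is an illustration in Python under the assumption that leftover text is scanned for single tags after pairs are found; the function name classify is hypothetical and this is not Studio's actual implementation.

```python
import re

START = r"<tr.*?>"   # Start Tag expression from the example
END = r"</tr>"       # End Tag expression from the example

# A pair is a start tag, the shortest possible content, and an end tag.
PAIR = re.compile(f"({START})(.*?)({END})", re.DOTALL)
SINGLE = re.compile(f"{START}|{END}")

def classify(text):
    """Return (kind, matched_text) tuples: complete pairs first, then leftover tags."""
    results = []
    leftovers = []
    pos = 0
    # First pass: look for complete start...end sections.
    for m in PAIR.finditer(text):
        leftovers.append(text[pos:m.start()])
        results.append(("pair", m.group(0)))
        pos = m.end()
    leftovers.append(text[pos:])
    # Second pass: match individual start or end tags in what remains.
    for chunk in leftovers:
        for m in SINGLE.finditer(chunk):
            results.append(("single", m.group(0)))
    return results

print(classify("<tr><td>A</td></tr> stray </tr>"))
# [('pair', '<tr><td>A</td></tr>'), ('single', '</tr>')]
```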
    Ignore case

    When this box is checked, the letter case of your defined tags is ignored when embedded content is identified.
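As a rough illustration of the effect of this option, the Python sketch below runs the same expression with and without case sensitivity. The helper match_tags is hypothetical, and Python's re.IGNORECASE flag stands in for the checkbox.

```python
import re

def match_tags(expression, text, ignore_case):
    """Return all matches of the tag expression, optionally ignoring letter case."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.findall(expression, text, flags)

sample = "one<br>two<BR>three<Br/>"
print(match_tags(r"<br.*?>", sample, ignore_case=False))  # ['<br>']
print(match_tags(r"<br.*?>", sample, ignore_case=True))   # ['<br>', '<BR>', '<Br/>']
```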

    Translate

    Not translatable means that the content between the tag pairs is displayed to the translator as locked content.

    Text within tag pairs can be translatable or non-translatable. Placeholder tags are Not translatable.

    Formatting

    You can edit how the embedded content will be displayed in the Editor view.

  6. Under Advanced Settings, specify how tags are displayed.
    Inside text the tag acts as a word end

    This option changes the behavior of cursor placement in the Editor window.

    When selected, the editor treats the tag as a word for the purposes of navigation. For example, in the editor, pressing Ctrl+Left Arrow will move the cursor to the beginning of the tag and Ctrl+Right Arrow will move the cursor to the end of the tag.

    Text lines can be wrapped after the tag

    Selecting this option indicates that a line break after this tag does not indicate the end of a segment. For example:

    Gather ye rosebuds while ye may,<br>

    Old Time is still a-flying: <br>

    And this same flower that smiles to-day <br>

    To-morrow will be dying.

    Tags represent formatting only and can be hidden in the editor

    When this option is selected, text is formatted correctly and the standard formatting tags (for example, bold, italic, and font type) are not displayed.

    Selecting this option does not mean that the tag is always hidden; the user can change the editor settings to force the tag to be displayed.

    Tags represents the text

    Placeholder (standalone) tags only.

    A tag can have a text equivalent. For example, the entity tag &quot; has the text equivalent ".
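The idea of a text equivalent can be shown with Python's standard html module, which maps HTML entities to the characters they stand for. The function name text_equivalent is hypothetical and is only meant to mirror the wording above.

```python
import html

def text_equivalent(entity_tag):
    """Return the plain-text character(s) that an HTML entity stands for."""
    return html.unescape(entity_tag)

print(text_equivalent("&quot;"))  # "
print(text_equivalent("&amp;"))   # &
```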

    Segmentation Hint
    A segmentation hint is a property of a tag that helps the software to segment the file better when converting the file to a translatable format: whether to position the tag within a segment or outside of the segment, or to force a segmentation break. Choose one of the following options.
    Include

    If selected, the tag is displayed in the editor, even if it has no associated text. You would rarely select this option.
    Include with text

    If selected, when the tag has associated text, the tag is displayed in the editor.

    Example: the tag specifies a footnote marker. Where this is the case, the translator needs the ability to move the marker to another word in the same sentence, so the tag should be included with the text.

    Exclude

    If selected, the software will, where possible, use the tag or tag pair to segment the text. For example, if <p>...</p> or <br> tags are marked Exclude, then if an XML document includes embedded HTML code, the software will use the HTML tags <p>...</p> and <br> to segment the document. This segmentation is in addition to the segmentation that is already applied to the embedding XML code.
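The effect of marking <p>...</p> and <br> as Exclude can be approximated with the Python sketch below, which splits embedded HTML wherever one of those tags occurs. The function name and the exact expressions are assumptions made for illustration; Studio applies its own segmentation rules in addition to these breaks.

```python
import re

# Tags treated as Exclude: they sit outside segments and mark segment breaks.
EXCLUDED_TAGS = re.compile(r"</?p\b[^>]*>|<br\b[^>]*>")

def split_on_excluded_tags(embedded_html):
    """Split embedded HTML into text chunks at every excluded tag."""
    parts = EXCLUDED_TAGS.split(embedded_html)
    # Drop empty chunks left between adjacent tags.
    return [p.strip() for p in parts if p.strip()]

sample = "<p>First paragraph.</p><p>Second line<br>Third line</p>"
print(split_on_excluded_tags(sample))
# ['First paragraph.', 'Second line', 'Third line']
```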

    May exclude, Undefined

    These two are effectively the same. The editor determines whether the tag is part of the text.