Adding or editing an HTML parser rule

Use the parser rule settings available on the Add Rule, Edit Rule and Copy Rule pages to define the properties of the HTML parser rules. These settings help SDL Trados Studio better sort translatable from non-translatable text and correctly display the content extracted from the HTML documents.

About rules, attributes and conditions

Each element in an HTML document can have specific attributes, which give additional information about the element. For example, the <a> element indicates a hyperlink. An "href" attribute applied to the <a> element adds a web address to the hyperlink. Similarly, a "title" attribute adds a tooltip which is visible when you hover the mouse over the hyperlink. The content between <a> (the opening tag) and </a> (the closing tag) defines the title of the hyperlink.

The image below shows an example of an <a> element and its components:

Usually, you want Studio to extract the title of the hyperlink and its tooltip so that you can translate them. In contrast, you do not normally want Studio to let you edit the address of a hyperlink, because altering the address may break the link. These conditions are the default settings for the <a> element on the Parser Rules page. However, there may be situations when you want to translate the address, for example when the link in your translation should point to a localized version of the website that has a different address. Conversely, you may sometimes not want to translate the tooltip of a hyperlink, for example when the tooltip shows a numeric value.
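The default behavior described above can be illustrated with a minimal Python sketch built on the standard html.parser module. It is not Studio's implementation: the TRANSLATABLE_ATTRS table, the LinkExtractor class and the extract_link_units function are hypothetical names chosen for illustration. The sketch marks the link text and the title attribute as editable and the href attribute as locked.

```python
from html.parser import HTMLParser

# Hypothetical defaults for this sketch: attributes of <a> whose values are editable.
TRANSLATABLE_ATTRS = {"a": {"title"}}


class LinkExtractor(HTMLParser):
    """Collects (kind, text, editable) tuples for <a> elements."""

    def __init__(self):
        super().__init__()
        self.units = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        self._in_link = True
        for name, value in attrs:
            if value is None:  # attribute without a value, nothing to extract
                continue
            editable = name in TRANSLATABLE_ATTRS["a"]
            self.units.append((name, value, editable))

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        # Text between <a> and </a> is always editable in this sketch.
        if self._in_link and data.strip():
            self.units.append(("text", data.strip(), True))


def extract_link_units(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.units


sample = '<a href="https://example.com/en" title="Go to the home page">Home</a>'
for kind, text, editable in extract_link_units(sample):
    print(kind, repr(text), "editable" if editable else "locked")
```

Adding "href" to the set in TRANSLATABLE_ATTRS, or removing "title" from it, corresponds to the two customization scenarios mentioned above.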

For such situations, when you do not want Studio to apply the default settings or when you want to teach Studio to deal with elements not mentioned on the parser rules list, customize the existing parser rules.

You modify a rule by editing its attributes and conditions. The next time that you open an HTML document, Studio will no longer apply the default settings for the HTML element with the customized parser rule.

The HTML Add/Edit Rule page

Click Add..., Edit... or Copy on the Parser page to open the Add Rule, Edit Rule or Copy Rule page, where you can configure the properties of the HTML rules.

Rule section

Option | Description
Name

The name of the element for which you are modifying the parser rule.

For example, the rule which affects the <a> elements of HTML documents is named a.

Conditions

The conditions which define the extraction settings.

Specify the conditions under which Studio should extract the content inside the selected element.

For example, you might modify the a rule so that Studio will extract the content from an a element only if the a element is placed inside a text paragraph written in English. To do this, create a condition that will check the language of the paragraphs and the location of the a element inside the structure of the HTML documents.

  1. Select the a rule from the Parser rules list and click Edit....
  2. On the Edit Rule page, click Edit next to the Conditions box.
  3. Select the <a> tag from the Element Context box and click Add Element….
  4. Enter p in the Element name field and click OK to close the Select Element page. Studio will now only look for <a> elements which are located under a <p> element.
  5. Click Add Attribute... to add an attribute condition for the <p> element.
  6. Enter language="en" in the Attribute field and enter true in the has value field.
  7. Click OK to add the language attribute to the paragraph element.
  8. Click OK again to close the Element Conditions page. Studio will now extract any content inside <a> elements only if they are located inside a paragraph written in English.
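The effect of the condition built in the steps above can be sketched in Python with the standard html.parser module. This is an illustration only, not how Studio evaluates conditions. The ConditionalLinkExtractor class and the extract_conditional function are hypothetical, and the sketch assumes that "located under" means that the <p> element can be any ancestor of the <a> element.

```python
from html.parser import HTMLParser


class ConditionalLinkExtractor(HTMLParser):
    """Extracts text inside <a> only when an ancestor <p> has language="en"."""

    def __init__(self):
        super().__init__()
        self.extracted = []
        self._stack = []  # open elements as (tag, attribute dict), outermost first

    def handle_starttag(self, tag, attrs):
        self._stack.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Close the innermost matching element and anything left open inside it.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def _condition_met(self):
        tags = [tag for tag, _ in self._stack]
        if "a" not in tags:
            return False
        innermost_a = len(tags) - 1 - tags[::-1].index("a")
        # Assumption: any <p> ancestor of the <a> element satisfies the condition.
        return any(
            tag == "p" and attrs.get("language") == "en"
            for tag, attrs in self._stack[:innermost_a]
        )

    def handle_data(self, data):
        if data.strip() and self._condition_met():
            self.extracted.append(data.strip())


def extract_conditional(html):
    parser = ConditionalLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.extracted
```

With this sketch, a link inside <p language="en"> is extracted, while a link inside <p language="de"> or a link outside any paragraph is skipped.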

Attributes section

Option | Description
Attributes

The localization settings which determine whether the attributes of an element will be editable after extraction.

Specify which of the attributes that could define the selected HTML element should be extracted as editable text in Studio and which attributes should be extracted as non-editable text.

For example, for situations where you do not want to translate the tooltip of a hyperlink, change the Translate property of the title attribute inside the a rule:

  1. Select the a rule from the Parser rules list and click Edit....
  2. Select the title attribute from the Attributes list and click Edit. This displays the Edit Attribute window for the title attribute.
  3. Make sure that the Translate attribute checkbox is unchecked and click OK.
  4. Click OK on the Edit Rule page to save your changes.

The next time that you open an HTML document, Studio will extract the tooltips from hyperlinks but will not allow you to edit them.
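As a rough model of what the steps above change, the following Python sketch treats a rule as a list of attributes, each with a translate flag. The ParserRule class is a hypothetical stand-in, not Studio's data model, and the sketch assumes that attributes not listed in the rule are not extracted at all.

```python
from dataclasses import dataclass, field


@dataclass
class ParserRule:
    """Hypothetical stand-in for a parser rule; not Studio's data model."""
    element: str
    # Maps attribute name -> True if its extracted value is editable.
    attributes: dict = field(default_factory=dict)

    def extract(self, attrs):
        """Return (name, value, editable) for each attribute the rule lists."""
        return [
            (name, value, self.attributes[name])
            for name, value in attrs.items()
            if name in self.attributes
        ]


a_rule = ParserRule("a", {"href": False, "title": True})
a_rule.attributes["title"] = False  # same effect as clearing "Translate attribute"

link_attrs = {"href": "https://example.com", "title": "42", "class": "ext"}
print(a_rule.extract(link_attrs))
```

The title value is still returned, which mirrors the behavior described above: the tooltip is extracted but is flagged as not editable.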

Properties section

Option | Description
Translate

The localization setting which determines whether the content of the selected element will be editable after extraction.

Specify whether Studio should allow you to translate, in the Editor, the content extracted from the selected element.

You can set the Translate property to one of the following options:
  • Always translatable - You can edit the content extracted from the HTML element.
  • Translatable (but not in protected content) - You can edit the content extracted from the HTML element unless the HTML element has a protected content value inherited from its parent.
  • Not Translatable - Studio extracts and displays the content of the HTML element but does not allow you to edit it.
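The three options can be summarized as a small decision function. The Python sketch below is an interpretation of the descriptions above, not Studio's code. The Translate enum and the is_editable function are hypothetical names, and parent_protected stands for a protected content value inherited from the parent element.

```python
from enum import Enum


class Translate(Enum):
    ALWAYS = "Always translatable"
    UNLESS_PROTECTED = "Translatable (but not in protected content)"
    NEVER = "Not Translatable"


def is_editable(setting, parent_protected=False):
    """Return True if extracted content may be edited under the given setting."""
    if setting is Translate.ALWAYS:
        return True
    if setting is Translate.NEVER:
        return False
    # UNLESS_PROTECTED: editable only when the parent is not protected.
    return not parent_protected
```

Only the middle option depends on the parent element. The other two options give the same answer regardless of context.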
Whitespace

The setting which defines how Studio deals with any extra whitespace characters it finds in the translatable content extracted from the selected HTML element.

Specify if you want Studio to keep or remove extra whitespace. To edit the settings for the whitespace in non-translatable content and in element attributes, use the Whitespace in tags option on the global Whitespace page.

Set the Whitespace property to one of the following:
  • Inherit from parent - The content extracted from the HTML element uses the same whitespace setting as its parent.
  • Always preserve - Studio always keeps whitespace as it is.
  • Normalize unless xml:space='preserve' - Studio replaces whitespace with a single space unless the element includes xml:space='preserve'.
  • Always normalize - Studio always replaces whitespace with a single space and ignores any xml:space='preserve' attribute.
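The whitespace options can be sketched in Python as follows. This is an illustrative reading of the list above, not Studio's implementation. The Whitespace enum and the apply_whitespace function are hypothetical, Always preserve is taken to mean that whitespace is kept unchanged, and normalization is assumed to collapse each run of whitespace characters into one space without trimming the ends.

```python
import re
from enum import Enum


class Whitespace(Enum):
    INHERIT = "Inherit from parent"
    PRESERVE = "Always preserve"
    NORMALIZE_UNLESS_PRESERVE = "Normalize unless xml:space='preserve'"
    NORMALIZE = "Always normalize"


def apply_whitespace(text, setting, parent_setting=Whitespace.PRESERVE, xml_space=None):
    """Apply a whitespace setting to extracted text.

    parent_setting is the already resolved setting of the parent element;
    xml_space is the value of the element's xml:space attribute, if any.
    """
    if setting is Whitespace.INHERIT:
        if parent_setting is Whitespace.INHERIT:
            raise ValueError("parent_setting must be resolved before inheriting")
        setting = parent_setting
    if setting is Whitespace.PRESERVE:
        return text
    if setting is Whitespace.NORMALIZE_UNLESS_PRESERVE and xml_space == "preserve":
        return text
    # Assumption: every run of whitespace characters becomes a single space.
    return re.sub(r"\s+", " ", text)
```

For example, apply_whitespace("a  \n b", Whitespace.NORMALIZE) returns "a b", while the same call with Whitespace.PRESERVE returns the text unchanged.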
Tag Type

The setting which controls how the HTML elements will be displayed in the Editor.

HTML elements are extracted and shown in the Editor as tags. The translatable content inside the elements is displayed as editable text.

Tags can be displayed as:

  • Inline - Inline tags show formatting information, and the translatable content extracted from the HTML element is available for editing.
  • Structure - Structure tags usually contain information about the structure of the HTML document. Only translatable attributes inside structure elements are displayed in the Editor.
Segmentation Hint (applicable to inline tags)

Segmentation hints help Studio better segment the HTML document when converting it to a translatable format. Segmentation hints determine whether Studio will position the element within a segment or outside of the segment, or whether it will force a segmentation break.

Set the Segmentation Hint to one of the following:
  • Include with text - The tag is displayed with the HTML content when it has leading or preceding text. Example: for tags that specify a footnote marker, you will need to attach the marker to another word in the same sentence. Therefore, the tag should be included as part of the text.
  • Include - The tag will be displayed in the segment, even if it has no associated text.
  • May exclude, Undefined - The Editor determines whether the tag is part of the text.
  • Exclude - Studio will, where possible, use the tag or tag pair to segment the text. For example, if the <p>...</p> or <br> tags are marked Exclude and an HTML document includes embedded HTML code, the <p>...</p> and <br> tags will be used to segment the document. This segmentation is applied in addition to the segmentation already applied to the embedding HTML code.
Formatting

The settings which define how the content extracted by the parser will look in the Editor.

Click Edit and select one of the following options for each of the six available styles:
  • Inherit - Applies the style that is specified for the parent, if there is such a setting.
  • Activate - Applies that style to the text.
  • Deactivate - Does not apply that style to the text.

The Sample box shows a preview of how the text extracted by the rule will look in the Studio Editor.

Structure Information Properties section

(applicable to structure elements)

Structure information allows you to add context information to structure elements. You can then view this information in the Document Structure column and in the Document Structure tree, which are available in the Editor view.

Click Add or Edit to define the following settings for a structure element:

Type of element | Settings | Description

Standard

Offers a list of standard HTML structure elements with predefined context information. Choose Custom if you want to create your own element and customize its context information.

Custom

For custom elements you can specify the following properties:

Purpose

Choose Match if you want Studio to store the document structure information as additional information in the translation memory. Studio will then use this information during context matching when doing a lookup in the translation memory.

Choose Information if you do not want to store document structure information when confirming the segments that include the content of your HTML element.

Document explorer

Select what information is displayed in the Document Structure tree in the Editor view. You can choose to display only the name of the element, the entire content of the element or no information at all.

Name

Specify a name for the element. By default, Studio also uses this name for the Code and Identifier fields, but you can edit them if you want to use different names instead.

Code

Code is the abbreviation of an HTML element. Codes show where the segment text appears in the HTML document.

Studio displays the code of a structure element in the Document Structure column of the Editor. For example, a Heading element is displayed as in the Editor view. When translating, this information can be useful, as you may need to translate heading elements differently than paragraph elements.

If you selected Custom in the Standard field, you can give a custom code for the structure information of your element.

Identifier

A unique identifier which Studio can then use for other tasks like processing embedded content.

Description

Specify a description for your element. The description is displayed in the Additional Information column of the Document Structure Information dialog box.

Color

Specify the background color for displaying the element in the Document Structure column and in the Document Structure Information dialog box.

Formatting

Specify the font, size, color and style for displaying the content of the element in the Editor view. You can choose to inherit the formatting from the element's parent or to activate/deactivate a certain style.