Adding or editing an HTML parser rule
Use the parser rule settings available on the Add Rule, Edit Rule and Copy Rule pages to define the properties of HTML parser rules. These settings help SDL Trados Studio distinguish translatable from non-translatable text and correctly display the content extracted from HTML documents.
About rules, attributes and conditions
Each element in an HTML document can have specific attributes, which give additional information about the element. For example, the <a> element indicates a hyperlink. An "href" attribute applied to the <a> element adds a web address to the hyperlink. Similarly, a "title" attribute adds a tooltip which is visible when you hover the mouse over the hyperlink. The content between <a> (the opening tag) and </a> (the closing tag) defines the title of the hyperlink.
The image below shows an example of an <a> element and its components:
Usually, you would want Studio to extract the title of the hyperlink and its tooltip and allow you to translate them. In contrast, you would not normally want Studio to allow you to edit the address of a hyperlink, because altering this address may break the link. These conditions are specified as default settings for the <a> element on the Parser Rules page. However, there may be situations when you want to translate the address of the hyperlink, because the link in your translation should point to the localized version of the website, which has a different address. Also, sometimes you may not want to translate the tooltip of a hyperlink, for example when the tooltip shows a numeric value.
In such situations, when you do not want Studio to apply the default settings, or when you want Studio to handle elements that are not on the parser rules list, customize the existing parser rules or add new ones.
You modify a rule by editing its attributes and conditions. The next time you open an HTML document, Studio applies your customized rule instead of the default settings for that HTML element.
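To make the default behavior for hyperlinks more concrete, the following Python sketch models a parser rule for the <a> element using the standard library's `html.parser` module. It is only an illustration of the idea, not how Studio is implemented: the `A_RULE` dictionary, its keys and the `LinkExtractor` class are hypothetical names. The rule extracts the link title and the `title` tooltip as translatable text and keeps the `href` address locked.

```python
from html.parser import HTMLParser

# Hypothetical model of the default <a> rule; these names are illustrative
# and are not part of SDL Trados Studio.
A_RULE = {
    "translate_content": True,        # link title (element content) is editable
    "translatable_attrs": {"title"},  # tooltip is editable
    "locked_attrs": {"href"},         # web address is extracted but locked
}


class LinkExtractor(HTMLParser):
    """Sort the pieces of <a> elements into translatable and locked text."""

    def __init__(self, rule):
        super().__init__()
        self.rule = rule
        self.in_link = False
        self.translatable = []  # text the translator may edit
        self.locked = []        # text shown but not editable

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        self.in_link = True
        for name, value in attrs:
            if value is None:
                continue  # attribute without a value, nothing to extract
            if name in self.rule["translatable_attrs"]:
                self.translatable.append(value)
            elif name in self.rule["locked_attrs"]:
                self.locked.append(value)

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Only text between <a> and </a> is treated as the link title.
        if self.in_link and data.strip():
            if self.rule["translate_content"]:
                self.translatable.append(data.strip())
            else:
                self.locked.append(data.strip())


parser = LinkExtractor(A_RULE)
parser.feed('<a href="https://www.example.com" title="Visit our site">Example</a>')
print(parser.translatable)  # ['Visit our site', 'Example']
print(parser.locked)        # ['https://www.example.com']
```

In this model, customizing the rule amounts to moving attribute names between the two sets, for example putting `href` into `translatable_attrs` when the links in your translation must point to a localized website.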
The HTML Add/Edit Rule page
Click Add..., Edit... or Copy on the Parser page to open the Add Rule, Edit Rule or Copy Rule page, where you can configure the properties of the HTML rules.
Rule section
| Option | Description |
|---|---|
| Name | The name of the element for which you are modifying the parser rule. For example, the name of the rule which affects the |
| Conditions | The conditions which define the extraction settings. Specify under which conditions Studio should extract the content inside the selected element. For example, you might modify the |
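Conditions restrict when a rule applies. The Python sketch below assumes the simplest kind of condition, "the element carries a given attribute with a given value", and applies a rule only if every condition holds. The `Condition` and `Rule` classes and the `<meta name="description">` example are hypothetical illustrations; they do not reproduce Studio's actual condition editor or its default rules.

```python
from dataclasses import dataclass, field


@dataclass
class Condition:
    """Hypothetical condition: the element must have attribute == value."""
    attribute: str
    value: str

    def matches(self, attrs: dict) -> bool:
        return attrs.get(self.attribute) == self.value


@dataclass
class Rule:
    """Hypothetical rule: extract an element only if all conditions hold."""
    name: str
    conditions: list = field(default_factory=list)

    def applies_to(self, tag: str, attrs: dict) -> bool:
        # The tag must match and every condition must be satisfied.
        return tag == self.name and all(c.matches(attrs) for c in self.conditions)


# Extract <meta> only when it carries name="description".
rule = Rule("meta", [Condition("name", "description")])
print(rule.applies_to("meta", {"name": "description", "content": "About us"}))  # True
print(rule.applies_to("meta", {"name": "viewport", "content": "width=device-width"}))  # False
```

Because `all()` of an empty list is `True`, a rule without conditions in this model applies to every occurrence of its element.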
Attributes section
| Option | Description |
|---|---|
| Attributes | The localization setting which determines whether the attributes of an element will be editable after extraction. Specify which of the attributes that can define the selected HTML element should be extracted as editable text in Studio and which should be extracted as non-editable text. For example, for situations where you do not want to translate the tooltip of a hyperlink, change the Translate property of the
The next time that you open an HTML document, Studio will extract the tooltips from hyperlinks but will not allow you to edit them.
|
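The distinction between editable and non-editable attributes can be sketched in Python as a per-attribute setting. In this hypothetical model, an attribute listed as not translatable is still extracted (so the tooltip remains visible) but is flagged as read-only, while an attribute missing from the settings is not extracted at all. `AttrSetting` and `AttributeExtractor` are illustrative names, not Studio APIs; the numeric tooltip mirrors the example in the text.

```python
from enum import Enum
from html.parser import HTMLParser


class AttrSetting(Enum):
    """Hypothetical per-attribute localization setting."""
    TRANSLATABLE = "translatable"          # extracted as editable text
    NOT_TRANSLATABLE = "not translatable"  # extracted as read-only text


class AttributeExtractor(HTMLParser):
    def __init__(self, element, settings):
        super().__init__()
        self.element = element
        self.settings = settings
        self.segments = []  # (attribute name, value, editable?)

    def handle_starttag(self, tag, attrs):
        if tag != self.element:
            return
        for name, value in attrs:
            setting = self.settings.get(name)
            if setting is None or value is None:
                continue  # attribute not covered by the rule: not extracted
            self.segments.append((name, value, setting is AttrSetting.TRANSLATABLE))


# Customized rule: the tooltip (title) of <a> is extracted but not editable.
custom = {"href": AttrSetting.NOT_TRANSLATABLE, "title": AttrSetting.NOT_TRANSLATABLE}
extractor = AttributeExtractor("a", custom)
extractor.feed('<a href="/docs" title="42">Docs</a>')
print(extractor.segments)  # [('href', '/docs', False), ('title', '42', False)]
```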
Properties section
| Option | Description |
|---|---|
| Translate | The localization setting which determines whether the content of the selected element will be editable after extraction. Specify whether Studio should allow you to translate the content extracted from the selected element when you work in the Editor.
You can set the Translate property to one of the following options:
|
| Whitespace | The setting which defines how Studio deals with extra whitespace characters that it finds in the translatable content extracted from the selected HTML element. Specify whether Studio should keep or remove extra whitespace. To edit the whitespace settings for non-translatable content and element attributes, use the Whitespace in tags option on the global Whitespace page.
Set the Whitespace property to one of the following:
|
| Tag Type | The setting which controls how HTML elements are displayed in the Editor. HTML elements are extracted and shown in the Editor as tags, and the translatable content inside the elements is displayed as editable text. Tags can be displayed as:
|
| Segmentation Hint (applicable to inline tags) | Segmentation hints help Studio segment the HTML document better when converting it to a translatable format. They determine whether Studio positions the element within a segment or outside of it, or whether it forces a segmentation break.
Set the Segmentation Hint to one of the following:
|
| Formatting | The settings which define how the content extracted by the parser looks in the Editor:
Click Edit and select one of the following options for each of the six available styles:
The Sample box shows a preview of how the text extracted by the rule will look in the Studio Editor. |
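The Whitespace property is the most mechanical of these settings, so a small Python sketch can show what "keep" versus "remove extra whitespace" might mean for extracted text. The two option names and the `apply_whitespace` function are hypothetical; the sketch assumes that removing extra whitespace means collapsing runs of HTML whitespace (space, tab, line feed, carriage return, form feed) into one space and trimming the ends. Non-breaking spaces are deliberately left alone, because HTML does not collapse them.

```python
import re
from enum import Enum

# HTML's ASCII whitespace; \s would also match non-breaking spaces.
HTML_WS = " \t\n\r\f"


class WhitespaceSetting(Enum):
    """Hypothetical values for the Whitespace property."""
    KEEP = "keep"            # leave the text exactly as found
    REMOVE_EXTRA = "remove"  # collapse runs of whitespace


def apply_whitespace(text: str, setting: WhitespaceSetting) -> str:
    if setting is WhitespaceSetting.KEEP:
        return text
    # Collapse each run of HTML whitespace into a single space, then trim.
    return re.sub(r"[ \t\n\r\f]+", " ", text).strip(HTML_WS)


raw = "  Click\n\t  here to   continue  "
print(repr(apply_whitespace(raw, WhitespaceSetting.REMOVE_EXTRA)))  # 'Click here to continue'
```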
Structure Information Properties section
(applicable to structure elements)
Structure information allows you to add context information to structure elements. You can then view this information in the Document Structure column and in the Document Structure tree, both of which are available in the Editor view.
Click Add or Edit to define the following settings for a structure element:
| Type of element | Setting | Description |
|---|---|---|
| Standard | | Offers a list of standard HTML structure elements with predefined context information. Choose Custom if you want to create your own element and customize its context information. |
| Custom | | For custom elements you can specify the following properties: |
| | Purpose | Choose Match if you want Studio to store the document structure information as additional information in the translation memory. Studio then uses this information during context matching when it looks up segments in the translation memory. Choose Information if you do not want to store document structure information when you confirm segments that include the content of your HTML element. |
| | Document explorer | Select what information is displayed in the Document Structure tree in the Editor view. You can choose to display only the name of the element, the entire content of the element, or no information at all. |
| | Name | Specify a name for the element. By default, Studio also uses this name for the Code and Identifier fields, but you can edit them if you want to use different names. |
| | Code | Code is the abbreviation of an HTML element. Codes show where the segment text appears in the HTML document. Studio displays the code of a structure element in the Document Structure column of the Editor. For example, a Heading element is displayed as If you selected Custom in the Standard field, you can give a custom code for the structure information of your element. |
| | Identifier | A unique identifier which Studio can then use for other tasks, such as processing embedded content. |
| | Description | Specify a description for your element. The description is displayed in the Additional Information column of the Document Structure Information dialog box. |
| | Color | Specify the background color for displaying the element in the Document Structure column and in the Document Structure Information dialog box. |
| | Formatting | Specify the font, size, color and style for displaying the content of the element in the Editor view. You can choose to inherit the formatting from the element's parent or to activate or deactivate a specific style. |
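The difference between the Match and Information purposes can be illustrated with a toy translation memory in Python. This is a hypothetical sketch, not a description of how Studio stores translation units or computes context matches: it only models whether a structure code is saved when a segment is confirmed, and whether that stored code agrees with the code of the segment being looked up. `ToyTranslationMemory`, `StructureInfo` and the codes "HEAD" and "PARA" are made up for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    MATCH = "match"              # store structure info in the TM
    INFORMATION = "information"  # display only, do not store


@dataclass(frozen=True)
class StructureInfo:
    code: str
    purpose: Purpose


class ToyTranslationMemory:
    """Toy TM keyed by source text; not Studio's real storage format."""

    def __init__(self):
        self.units = {}  # source -> (target, stored structure code or None)

    def confirm(self, source, target, info):
        # Only structure information with the Match purpose is stored.
        code = info.code if info.purpose is Purpose.MATCH else None
        self.units[source] = (target, code)

    def lookup(self, source, info):
        """Return (target, structure codes agree?) or None if no match."""
        unit = self.units.get(source)
        if unit is None:
            return None
        target, stored_code = unit
        return target, stored_code is not None and stored_code == info.code


# "HEAD" and "PARA" are made-up codes for a heading and a paragraph.
tm = ToyTranslationMemory()
tm.confirm("Welcome", "Bienvenue", StructureInfo("HEAD", Purpose.MATCH))
print(tm.lookup("Welcome", StructureInfo("HEAD", Purpose.MATCH)))  # ('Bienvenue', True)
print(tm.lookup("Welcome", StructureInfo("PARA", Purpose.MATCH)))  # ('Bienvenue', False)
```

With the Information purpose, nothing is stored at confirmation time, so in this model the same text can still be found but never counts as a structure-aware match.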