Edit/Add/Copy Rule page
Use the parser rule settings available on the Add Rule, Edit Rule and Copy Rule pages to define the properties of the HTML parser rules. These settings help WorldServer separate translatable from non-translatable text more reliably and correctly display the content extracted from HTML documents.
- About Rules, Attributes and Conditions
- Each element in an HTML document can have specific attributes. Attributes give additional information about the HTML element. For example, the <a> element indicates a hyperlink. An "href" attribute applied to the <a> element adds a web address to the hyperlink. Similarly, a "title" attribute adds a ToolTip that is visible when you hover the mouse over the hyperlink. The content between <a> (the opening tag) and </a> (the closing tag) defines the title of the hyperlink.
- Usually, you want WorldServer to extract the title of the hyperlink and its ToolTip so that you can translate them. In contrast, you normally do not want WorldServer to let you edit the address of a hyperlink, because altering this address may break the link. These conditions are specified as default settings for the <a> element on the Parser Rules page. However, there may be situations when you want to translate the address of the hyperlink, because the link in your translation should point to the localized version of the website, which has a different address. Likewise, you may sometimes not want to translate the ToolTip of a hyperlink, for example when the ToolTip shows a numeric value.
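The default behavior described above for the <a> element can be illustrated with a small Python sketch. It is not WorldServer's implementation: the `RULES` dictionary, the `LinkExtractor` class and the `extract` function are hypothetical names chosen for this example, and the only library used is Python's standard `html.parser`. The sketch treats the link text and the `title` attribute as translatable and the `href` attribute as non-translatable (locked).

```python
from html.parser import HTMLParser

# Hypothetical rule table (an assumption for this sketch, not WorldServer's format):
# for each element name, which attributes are translatable (True) or locked (False).
RULES = {"a": {"title": True, "href": False}}


class LinkExtractor(HTMLParser):
    """Collects translatable and locked strings from elements that have a rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.translatable = []
        self.locked = []
        self._in_rule_element = False

    def handle_starttag(self, tag, attrs):
        rule = self.rules.get(tag)
        if rule is None:
            return
        self._in_rule_element = True
        for name, value in attrs:
            if name not in rule or value is None:
                continue
            # Route the attribute value according to its Translate flag.
            (self.translatable if rule[name] else self.locked).append(value)

    def handle_endtag(self, tag):
        if tag in self.rules:
            self._in_rule_element = False

    def handle_data(self, data):
        # The content between the opening and closing tag is the link title.
        if self._in_rule_element and data.strip():
            self.translatable.append(data.strip())


def extract(html, rules=RULES):
    """Returns (translatable strings, locked strings) for the given HTML."""
    parser = LinkExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.translatable, parser.locked


if __name__ == "__main__":
    sample = '<p>See <a href="https://example.com/en" title="Home page">our site</a>.</p>'
    print(extract(sample))
    # (['Home page', 'our site'], ['https://example.com/en'])
```

Passing a different rule table, for example `{"a": {"title": False, "href": True}}`, models the two exceptions mentioned above: the address becomes translatable and the ToolTip becomes locked.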
| Rule section | Description |
|---|---|
| Name | The name of the element whose parser rule you are modifying. For example, the rule that affects the <a> elements of HTML documents is named a. |
| Conditions | The conditions that define the extraction settings. Specify the conditions under which WorldServer should extract the content inside the selected element. For example, you might modify the a rule so that WorldServer extracts the content from an a element only if the a element is placed inside a text paragraph written in English. To do this, create a condition that checks the language of the paragraphs and the location of the a element inside the structure of the HTML documents. |
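The Conditions example above (extract an a element only when it sits inside an English paragraph) can be sketched in Python as follows. This is an illustration under stated assumptions, not the way WorldServer evaluates conditions: it assumes the paragraph language is declared with a `lang` attribute on the `<p>` element, and `ConditionalLinkExtractor`, `is_english` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser


def is_english(attrs):
    """True if the element's lang attribute is 'en' or a regional variant such as 'en-US'."""
    lang = (attrs.get("lang") or "").lower()
    return lang == "en" or lang.startswith("en-")


class ConditionalLinkExtractor(HTMLParser):
    """Extracts <a> text only when the link is nested inside an English <p>."""

    def __init__(self):
        super().__init__()
        self.stack = []       # currently open elements as (tag, attrs dict)
        self.extracted = []

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this tolerates unclosed elements.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def _condition_holds(self):
        tags = [tag for tag, _ in self.stack]
        if "a" not in tags:
            return False
        link_pos = tags.index("a")
        # Check the location (an ancestor <p>) and the language of that paragraph.
        return any(
            tag == "p" and is_english(attrs)
            for tag, attrs in self.stack[:link_pos]
        )

    def handle_data(self, data):
        text = data.strip()
        if text and self._condition_holds():
            self.extracted.append(text)


def extract_links(html):
    parser = ConditionalLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.extracted


if __name__ == "__main__":
    doc = (
        '<p lang="en">Read <a href="/en">the guide</a></p>'
        '<p lang="de">Lies <a href="/de">die Anleitung</a></p>'
    )
    print(extract_links(doc))  # ['the guide']
```

The stack of open elements supplies both halves of the condition: the position of the a element in the document structure and the language of the enclosing paragraph.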
- Attributes section
| Option | Description |
|---|---|
| Attributes | The localization setting that determines whether the attributes of an element become editable after extraction. Specify which of the attributes that can define the selected HTML element should be extracted as editable text in WorldServer and which should be extracted as non-editable text. For example, if you do not want to translate the ToolTip of a hyperlink, change the Translate property of the title attribute inside the a rule. The next time that you open an HTML document, WorldServer will extract the ToolTips from hyperlinks but will not allow you to edit them. |
- Properties section
| Option | Description |
|---|---|
| Translate | The localization setting that determines whether the content of the selected element becomes editable after extraction. Specify whether WorldServer should allow you to translate the content extracted from the selected element in the Editor. You can set the Translate property to one of the following options: |
| Whitespace | The setting that defines how WorldServer deals with any extra whitespace characters it finds in the translatable content extracted from the selected HTML element. Specify whether you want WorldServer to keep or remove extra whitespace. To edit the settings for the whitespace in non-translatable content and in element attributes, use the Whitespace in tags option on the global whitespace characters. Set the Whitespace property to one of the following: |
| Tag Type | The settings that control how the HTML elements are displayed in the Editor. HTML elements are extracted and shown in the Editor as tags. The translatable content inside the elements is displayed as editable text. Tags can be displayed as: |
| Segmentation Hint (applicable to inline tags) | Segmentation hints help WorldServer segment the HTML document better when converting it to a translatable format. A segmentation hint determines whether WorldServer positions the element within a segment, positions it outside of the segment, or forces a segmentation break. Set the Segmentation Hint to one of the following: |
| Formatting | The settings that define how the content extracted by the parser looks in the Editor. Click Edit and select one of the following options for each of the available styles: The Sample box shows a preview of how the text extracted by the rule looks in the WorldServer Editor. |
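As a rough illustration of the Whitespace property in the table above, the following Python sketch shows the difference between keeping and removing extra whitespace in extracted content. The source does not define exactly what WorldServer treats as "extra" whitespace, so this sketch assumes that removing it means collapsing each run of whitespace into a single space and trimming both ends. The function name `apply_whitespace_setting` and its `keep` parameter are hypothetical.

```python
import re


def apply_whitespace_setting(text, keep):
    """Returns the text unchanged when keep is True; otherwise normalizes whitespace.

    Assumption for this sketch: "removing extra whitespace" means collapsing
    every run of spaces, tabs and line breaks into one space and trimming the ends.
    """
    if keep:
        return text
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    raw = "  Click   here\n    to continue  "
    print(repr(apply_whitespace_setting(raw, keep=True)))
    # '  Click   here\n    to continue  '
    print(repr(apply_whitespace_setting(raw, keep=False)))
    # 'Click here to continue'
```

Keeping whitespace matters for elements whose layout is significant, while removing it usually gives translators cleaner segments.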
- Structure Information Properties section (applicable to structure elements)
| Type of element | Setting | Description |
|---|---|---|
| Standard | | Offers a list of standard HTML structure elements with predefined context information. Choose Custom if you want to create your own element and customize its context information. |
| Custom | | For custom elements, you can specify the following properties: |
| | Purpose | |
| | Document Explorer | Select what information is displayed in the Document Structure tree in the Editor view. You can choose to display only the name of the element, the entire content of the element or no information at all. |
| | Name | Specify a name for the element. By default, WorldServer also uses this name for the Code and Identifier fields, but you can edit them if you want to use different names instead. |
| | Description | Specify a description for your element. The description is displayed in the Additional Information column of the Document Structure Information dialog box. |
| | Color | Specify the background color for displaying the element in the Document Structure column and in the Document Structure Information dialog box. |
| | Formatting | Specify the font, size, color and style for displaying the content of the element in the Editor view. You can choose to inherit the formatting from the element's parent or to activate or deactivate a certain style. |