Improved TMX import quality
- The source and target do not have the same number of embedded tags or placeholders. The import process attempts to repair the entry by appending placeholders to the end of whichever segment is missing them. (This is necessary because WorldServer requires the source and target to contain the same number of placeholders.) While this makes the entry importable, it does not guarantee that the added placeholders are in the proper place. As a result, the entry should either be suppressed or prevented from yielding a 100% match, since the placeholder repair process is not currently capable of identifying the correct place to insert the added placeholders.
- The source and target contain multiple substituted tags. TM technologies like TRADOS allow the source and target to contain different markup. However, TRADOS-generated TMX files do not provide information that would allow the tags in the source to be definitively aligned to the substituted tags in the target. When only one singleton tag and/or one paired tag is substituted, WorldServer can determine the proper alignment. However, when more than one singleton tag or more than one paired tag is substituted, the alignment is no longer guaranteed.
- The source and/or target contains duplicated tags. If the duplicated tags are identical, alignment itself is not an issue, since the target is generated identically regardless of which instance is ordered first. However, the impact goes beyond the alignment process: the source could later change in the asset, leading to uncertain results. There is no clear way to know which repeated tag in the source should be mapped to which instance of the same repeated tag in the target. Similarly, if one of the repeated tags was replaced by a different tag in the target, it is not possible to know which source instance should be mapped to the substituted tag.
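The three scenarios above can be sketched as a simple classifier. This is a minimal Python illustration, not WorldServer's implementation: the function names, the `(name, kind)` tag representation, and the reason labels are all assumptions made for the example.

```python
from collections import Counter

def substituted(source, target, kind):
    """Count tags of one kind ("single" or "paired") that differ between the sides."""
    src = Counter(name for name, k in source if k == kind)
    tgt = Counter(name for name, k in target if k == kind)
    # Names left over on both sides are treated as substitutions of each other.
    return min(sum((src - tgt).values()), sum((tgt - src).values()))

def non_guaranteed_reasons(source, target):
    """Return the reasons a tag alignment cannot be guaranteed (empty if none).

    source/target: lists of (tag_name, kind) tuples in segment order.
    This representation is hypothetical, chosen only for the sketch.
    """
    reasons = []
    # Scenario 1: the two sides carry a different number of tags or placeholders.
    if len(source) != len(target):
        reasons.append("count_mismatch")
    # Scenario 2: more than one singleton, or more than one paired tag, substituted.
    if substituted(source, target, "single") > 1 or substituted(source, target, "paired") > 1:
        reasons.append("multiple_substitutions")
    # Scenario 3: a tag name occurs more than once on either side.
    for side in (source, target):
        if any(n > 1 for n in Counter(name for name, _ in side).values()):
            reasons.append("duplicated_tags")
            break
    return reasons
```

For instance, a source with one paired `b` and one singleton `br` whose target uses `strong` and `hr` instead yields no reasons, because only one tag of each kind was substituted. Substituting two paired tags yields `multiple_substitutions`.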
The alignment of tags should not be considered guaranteed unless the tags in the source can be mapped to the tags in the target in a definitive way. In the past, entries like those above were allowed to reach the TM, potentially lowering its overall quality. In WorldServer 7.5.1 Service Pack 2, SDL introduced the notion of non-guaranteed TMX entries and implemented an enhancement that prevents any TMX entry deemed non-guaranteed from generating a 100% match. Such entries can still produce high fuzzy matches. When TMX files are imported in current versions of WorldServer, the alignment process prepends a series of special characters to the source segment text of non-guaranteed entries. As a note to the translator, it also prepends a string to the target segment text, notifying the translator that the placeholder order should be checked.
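The effect of prepending characters to the stored source can be illustrated with a generic string-similarity measure. This is a minimal sketch, assuming a hypothetical marker string and using Python's `difflib` ratio as a stand-in for fuzzy scoring. WorldServer's actual marker characters and scoring algorithm are not described here.

```python
from difflib import SequenceMatcher

# Hypothetical marker; the real special characters are not given in this document.
NON_GUARANTEED_MARKER = "#NG# "

def mark_source(source_text):
    """Prepend the marker so the stored source no longer equals the original text."""
    return NON_GUARANTEED_MARKER + source_text

def similarity(stored_source, new_source):
    """Generic similarity in [0, 1]; a stand-in, not WorldServer's scoring."""
    return SequenceMatcher(None, stored_source, new_source).ratio()

new_segment = "Click the Save button to store your changes."
stored = mark_source(new_segment)

exact_match = stored == new_segment       # False: an exact match is no longer possible
score = similarity(stored, new_segment)   # below 1.0, but still high
```

Because the stored source always differs from any incoming segment by at least the marker, the entry cannot match exactly, yet the remaining text still scores as a close fuzzy match.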
The duplicated-tag scenario is fairly common, and the duplication of certain tags does not create any risk. However, you should decide which tags are to be considered unsafe and treated as non-guaranteed in this scenario. By default, every duplicated tag results in a non-guaranteed TMX entry unless the tag is specifically identified as safe. To have certain tags ignored, list them in the optional ignoredTags.properties file, for example:
b
th td strong br em
You will need to create this file. Note that the examples above are among the tags that SDL suggests ignoring.
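The effect of the ignored-tag list on the duplicated-tag check can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name is hypothetical, the tag names are hard-coded from the examples above rather than read from ignoredTags.properties (whose exact syntax is not shown here), and WorldServer's own logic may differ.

```python
from collections import Counter

# Tag names copied from the examples above; hard-coded for the sketch
# instead of being parsed from ignoredTags.properties.
IGNORED_TAGS = {"b", "th", "td", "strong", "br", "em"}

def unsafe_duplicates(tags, ignored=IGNORED_TAGS):
    """Return the duplicated tag names that are not on the ignored list.

    tags: list of tag names found in one segment, in order.
    A non-empty result would make the entry non-guaranteed in this sketch.
    """
    counts = Counter(tags)
    return sorted(name for name, n in counts.items() if n > 1 and name not in ignored)
```

With this list, a segment containing two `br` tags and two `a` tags reports only `a` as an unsafe duplicate, while a segment whose only duplicates are `td` tags reports none.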