Using the TMX migration utility
WorldServer provides a TMX migration utility in the form of a Java class named Trados2Idiom. Although named for the TRADOS product, it can also be used to migrate TMX files produced by other third-party tools. If the files are in a TMX format preceding 1.4b, they must be adjusted slightly so they can be properly imported into WorldServer. Often, this requires preprocessing the TMX files to address aspects of the data that are tool dependent.
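Before migrating, it can help to check which TMX version a file declares. The following is a minimal sketch and not part of WorldServer. It assumes the file is well-formed XML whose root `<tmx>` element carries a `version` attribute, and the helper names `tmx_version` and `needs_adjustment` are hypothetical. Because files in any 1.4 revision typically declare plain `1.4`, the check can only flag versions lower than 1.4. It cannot tell the 1.4 revisions apart.

```python
import re
import xml.etree.ElementTree as ET


def tmx_version(xml_data):
    """Return the version attribute declared on the root <tmx> element.

    xml_data may be str or bytes. Returns None if the attribute is absent.
    """
    root = ET.fromstring(xml_data)
    if root.tag != "tmx":
        raise ValueError("root element is not <tmx>")
    return root.get("version")


def needs_adjustment(version):
    """Return True if the declared version is missing, unparseable, or below 1.4."""
    if version is None:
        return True
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return True
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (1, 4)


# Hypothetical, heavily trimmed TMX document used only for illustration.
sample = '<tmx version="1.1"><header/><body/></tmx>'
declared = tmx_version(sample)
print(declared, needs_adjustment(declared))
```

For a file on disk, read it in binary mode and pass the bytes to `tmx_version` so that the parser can honor the encoding named in the XML declaration.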
To run the Trados2Idiom utility, enter the following command on the WorldServer host:
java com.idiominc.ws.autoalignment.Trados2Idiom <-guaranteedEntriesOnly> <-showStatsSegments> <-statsFile file> <TRADOS TMX file> <error file> <WorldServer-friendly TMX file>
- guaranteedEntriesOnly – (optional) If this flag is provided, entries that are not considered guaranteed will not be exported to the newly generated WorldServer TMX file. Not providing this flag will lead to all valid TM entries being exported to the new WorldServer-friendly TMX. Not all entries marked as being non-guaranteed are necessarily aligned incorrectly. Rather, the heuristics employed simply cannot guarantee the correctness. Suppressing non-guaranteed entries may mean filtering out what may amount to good TM entries. However, it also means keeping out what may be bad entries. SDL encourages customers to apply this flag. While it will affect the leverage of old data, it is the safest approach. Nonetheless, customers are also encouraged to assess the impact of this flag, and make their own informed decision.
- showStatsSegments – (optional) This option tells the statistics engine to capture segment data for each of the counters so that it can be exported to the stats file. This allows you to identify the segments that fall into each scenario (for example, which segments had tags swapped or which are marked as guaranteed).
- statsFile – (optional) Path and file name of the file used to collect statistics data.
The statistics include the number of entries exported, the counts of entries with swapped and substituted tags, the count of entries with placeholder repairs, and the number of non-guaranteed entries.
For example, here is a sample from a stats file generated with the showStatsSegments option:
Total entries with swapped tags only: 6
Total entries with tag substitutions (with possible tag swaps): 6
Total non-guaranteed entries : 6
Total PH range issue entries : 0
Total PH repaired entries : 1
Total invalid or empty TMX entries : 0
Total exported entries : 6 of 6
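The summary counters in the sample above can be read into a dictionary for reporting or comparison across runs. This is a minimal sketch, not a WorldServer API. It assumes the stats file lists one counter per line in the `label : value` form shown in the sample, and the function name `parse_stats` is hypothetical. Lines that do not match that form, such as any segment listings, are skipped.

```python
import re

# Sample summary lines copied from the stats file excerpt above.
SAMPLE = """\
Total entries with swapped tags only: 6
Total entries with tag substitutions (with possible tag swaps): 6
Total non-guaranteed entries : 6
Total PH range issue entries : 0
Total PH repaired entries : 1
Total invalid or empty TMX entries : 0
Total exported entries : 6 of 6
"""

# Label, a colon, a count, and optionally "of <total>" (exported line only).
COUNTER_LINE = re.compile(r"^(Total .+?)\s*:\s*(\d+)(?:\s+of\s+(\d+))?\s*$")


def parse_stats(text):
    """Map each counter label to an int, or to (count, total) for 'N of M' lines."""
    stats = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line.strip())
        if match is None:
            continue  # not a summary counter line
        label, count, total = match.group(1), int(match.group(2)), match.group(3)
        stats[label] = (count, int(total)) if total is not None else count
    return stats


stats = parse_stats(SAMPLE)
for label, value in stats.items():
    print(label, "->", value)
```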
Definitions
- swapped tags: entries having tags that have been shifted around during the translation
- substituted tags: entries having tags that have been exchanged for another tag
- non-guaranteed: entries having tags that cannot be definitively aligned by the current implementation. This includes segments requiring placeholder repairs, segments with more than one singleton or paired tag substitution, or segments with tags having different attribute values. Use the guaranteedEntriesOnly option described above to suppress these from being exported.
- PH repaired: entries that required placeholder repairing between the source and target segments.
- PH range issue: this is an external tracking stat for engineering only.
- Invalid or empty: entries in the TMX that are either empty or are missing a segment (source or target). Wordless entries are considered empty.
- Exported: total number of entries exported. This is generally the total number of TMX entries minus the invalid entries count, and optionally minus the non-guaranteed entries count if the guaranteedEntriesOnly option is provided.
- alignment issues: the number of tags does not match between the source and target segments.
- tag mismatches: tags differ either by name or attribute value.
Example
java com.idiominc.ws.autoalignment.Trados2Idiom -guaranteedEntriesOnly -showStatsSegments -statsFile stats_amv.tmx.log amv.tmx err_amv.log > ws_amv.tmx
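When the utility is run from a script, the example invocation above can be assembled as an argument list so that the optional flags are added only when needed. This is a sketch under assumptions: the function name `build_command` and its parameter names are hypothetical, the generated TMX is taken from standard output as in the example, and the Java classpath is assumed to be configured already on the WorldServer host.

```python
import shlex


def build_command(tmx_in, error_file, guaranteed_only=False,
                  show_stats_segments=False, stats_file=None):
    """Return the argument list for a Trados2Idiom run.

    Options come first, then the input TMX file and the error file,
    matching the order used in the documented example.
    """
    cmd = ["java", "com.idiominc.ws.autoalignment.Trados2Idiom"]
    if guaranteed_only:
        cmd.append("-guaranteedEntriesOnly")
    if show_stats_segments:
        cmd.append("-showStatsSegments")
    if stats_file:
        cmd += ["-statsFile", stats_file]
    cmd += [tmx_in, error_file]
    return cmd


cmd = build_command(
    "amv.tmx",
    "err_amv.log",
    guaranteed_only=True,
    show_stats_segments=True,
    stats_file="stats_amv.tmx.log",
)
# Show the equivalent shell command, with output redirected as in the example.
print(shlex.join(cmd) + " > ws_amv.tmx")
```

To execute the command rather than print it, the list can be passed to `subprocess.run` with `stdout` set to an open file handle for the new TMX file, which avoids shell quoting issues with file names that contain spaces.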