Overview of Auto Split/Merge

The primary purpose of the auto split/merge feature is to help reconstruct segments that the user has manually altered during the translation process. Once a document has been translated in WorldServer, the user can reasonably expect to fully leverage the same content later against the TM in which the translations were stored. This expectation should hold even if the user manually splits or merges segments during translation. The filters cannot reconstruct altered segments because such segments, by definition, are inconsistent with the filters' segmentation strategy.

Although the auto split/merge feature was created to address segmentation differences that result from user-defined segments, it can also handle more general segmentation differences, such as those caused by variations in segmentation between different filters and the TMs that already exist.

The core concept of the auto split/merge feature is that WorldServer tries to adjust the segmentation of previously split or merged segments based on the TM matches it finds during the TM leverage process. During this process, TM match candidates are found and processed in the order in which they are retrieved. SP/ICE, exact, and 100% match candidates are not guaranteed for a segment, and the retrieved matches may not even be high-scoring. If no SP/ICE, exact, or 100% match is found, the system subjects the match candidates to the auto split/merge process, which applies the following sub-processes in order:
  1. standard auto-merge - Checks whether the match candidate contains the current asset segment followed by the next asset segment. This process is extremely sensitive to formatting (placeholder) differences.
  2. hyper-merge - Same as the standard auto-merge, except that it ignores formatting differences.
  3. auto-split - Checks whether the current match candidate represents the beginning text of the current asset segment. This process is extremely sensitive to formatting (placeholder) differences.
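The order of the three sub-processes can be sketched as follows. This is a minimal illustration under stated assumptions, not WorldServer's implementation: inline formatting is assumed to appear as hypothetical placeholders such as `{1}` and `{/1}`, "contains" is simplified to "equals the two segments joined by a space", and all function names are made up for the example.

```python
import re
from typing import Optional

# Hypothetical placeholder syntax for inline formatting, e.g. "{1}bold{/1}".
# WorldServer's real placeholder representation is not described in this section.
PLACEHOLDER = re.compile(r"\{/?\d+\}")


def normalize(text: str) -> str:
    """Collapse runs of whitespace so that joined segments compare cleanly."""
    return " ".join(text.split())


def strip_formatting(text: str) -> str:
    """Remove the (hypothetical) placeholders, then renormalize whitespace."""
    return normalize(PLACEHOLDER.sub("", text))


def standard_auto_merge(candidate: str, current: str, nxt: Optional[str]) -> bool:
    # Placeholders are compared as-is, so any formatting difference is a miss.
    if nxt is None:
        return False
    return normalize(candidate) == normalize(current + " " + nxt)


def hyper_merge(candidate: str, current: str, nxt: Optional[str]) -> bool:
    # Same test as standard_auto_merge, but placeholders are ignored.
    if nxt is None:
        return False
    return strip_formatting(candidate) == strip_formatting(current + " " + nxt)


def auto_split(candidate: str, current: str) -> bool:
    # The candidate must be a proper prefix of the current segment that ends
    # on a word boundary (assumption), with placeholders compared as-is.
    cand, cur = normalize(candidate), normalize(current)
    return len(cur) > len(cand) and cur.startswith(cand) and cur[len(cand)] == " "


def classify(candidate: str, current: str, nxt: Optional[str]) -> Optional[str]:
    """Try the sub-processes in the documented order; return the first that applies."""
    if standard_auto_merge(candidate, current, nxt):
        return "standard auto-merge"
    if hyper_merge(candidate, current, nxt):
        return "hyper-merge"
    if auto_split(candidate, current):
        return "auto-split"
    return None
```

For example, a candidate `"Open the {1}file{/1}. Save it."` fails the standard auto-merge against the plain segments `"Open the file."` and `"Save it."` because of the placeholders, but passes the hyper-merge, which mirrors the sensitivity difference described in the list.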

The auto-split process assumes that, if a more complete match cannot be found, it is preferable to accept a 100% (or better) match for part of the segment than a lower overall score for the complete original segment. The merge process follows similar reasoning: it is preferable to accept a higher collective score for a larger segment than lesser scores for multiple smaller segments. The remainder of this section describes each component of this technology in detail.
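The preference rules in the previous paragraph can be expressed as two small decision functions. This is a hypothetical sketch: the function names, the use of plain numeric scores, and the exact comparisons (the merged score must beat every separate score; the partial score must be at least 100 while the whole-segment score is below 100) are assumptions made for illustration, not WorldServer's documented logic.

```python
from typing import Sequence


def prefer_merge(merged_score: float, separate_scores: Sequence[float]) -> bool:
    # Hypothetical rule: accept the merged match only if its single score
    # beats every score the smaller segments would receive on their own.
    return len(separate_scores) > 0 and all(merged_score > s for s in separate_scores)


def prefer_split(partial_score: float, whole_score: float) -> bool:
    # Hypothetical rule: a 100%-or-better match for the leading part of the
    # segment wins when no 100%-or-better match exists for the whole segment.
    return partial_score >= 100 and whole_score < 100
```

With these rules, a 100% match for two merged segments is preferred over separate 85% and 90% matches, and a 100% match for the first part of a segment is preferred over a 75% match for the whole segment.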