When to Use Auto-Split/Merge
The auto-split/merge feature is optional within the WorldServer environment. By default, it is partially enabled. The merge processes within the feature are generally considered more reliable than the auto-split process. When an appropriate merge candidate is found, both asset segments involved are guaranteed to leverage higher against the applied TM than they would on their own, although there is no guarantee that the larger merged segment will produce a 100% or better match. For auto-splits, the only certainty is that the first part of the segment will produce a 100% match; the new segment created from the second part of the original segment may have no match at all in the applied TM.
Auto-split/merge is generally applied during two lookup processes: ICE lookup and fuzzy lookup. If translators are expected to manually split or merge segments during translation, the ICE matches obtained when the same content is re-leveraged depend on whether the split/merge options for the ICE lookup are enabled. For this reason, the split/merge options for the ICE lookup process are enabled by default. The hyper-merge option does not apply to the ICE lookup process. For the fuzzy lookup process, both merge options are enabled by default, and the auto-split option is disabled by default.
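The defaults described above can be summarized in a small table. The sketch below is illustrative only: the key names (ice_lookup, fuzzy_lookup, auto_split, merge, hyper_merge) are hypothetical labels, not actual WorldServer setting names, and it assumes that "both merge options" refers to merge and hyper-merge.

```python
# Hypothetical labels for illustration; these are NOT WorldServer setting names.
# True = enabled by default, False = disabled by default, None = not applicable.
DEFAULTS = {
    "ice_lookup":   {"auto_split": True,  "merge": True, "hyper_merge": None},
    "fuzzy_lookup": {"auto_split": False, "merge": True, "hyper_merge": True},
}


def enabled_options(lookup):
    """Return the sorted option names that are enabled by default for a lookup."""
    return sorted(name for name, state in DEFAULTS[lookup].items() if state is True)


if __name__ == "__main__":
    for lookup in DEFAULTS:
        print(lookup, "->", ", ".join(enabled_options(lookup)))
```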
In general, the split/merge technology can be used effectively when significant segmentation differences exist between the asset and the TM. For instance, consider using the split technology when the TM mostly contains sentence-level entries and the asset file type produces paragraph-level segments. The split process may then break up the asset segments to produce better matches against the TM entries.
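As a rough illustration of the split idea, the sketch below breaks a paragraph-level segment at the first sentence boundary whose leading part exactly matches a TM entry. It makes several simplifying assumptions that are not taken from WorldServer: the TM is a plain set of source strings, a sentence boundary is a period followed by a space, and try_auto_split is a hypothetical helper name.

```python
def try_auto_split(segment, tm_sources):
    """Split `segment` at the first sentence boundary whose leading part is an
    exact TM match. Returns (head, tail), or None if no such split exists."""
    pos = segment.find(". ")
    while pos != -1:
        head = segment[: pos + 1]   # leading part, including the period
        tail = segment[pos + 2:]    # remainder, skipping the separating space
        if head in tm_sources and tail:
            return head, tail
        pos = segment.find(". ", pos + 1)
    return None


if __name__ == "__main__":
    tm = {"Click Save.", "Close the window."}
    paragraph = "Click Save. The file is written to disk."
    print(try_auto_split(paragraph, tm))
```

In this example the first part is an exact match, while the second part has no entry in the sample TM at all, which mirrors the caveat about auto-splits described earlier.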
Similarly, the merge process can be used effectively when the asset segments are produced at the sentence level but the TM entries were created at the paragraph level. The merge process can progressively merge successive asset segments to match against a larger TM entry.
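Progressive merging can be sketched in a similar way. The example below joins successive sentence-level segments until the joined text exactly matches a TM entry. The single-space join, the exact-match test, the max_merge limit, and the try_merge helper name are all assumptions made for illustration; they do not describe how WorldServer actually scores or limits merges.

```python
def try_merge(segments, start, tm_sources, max_merge=3):
    """Progressively join segments[start], segments[start + 1], ... and return
    (merged_text, count) for the first join that exactly matches a TM entry.
    Returns None if no join of 2..max_merge segments matches."""
    merged = segments[start]
    for count in range(2, max_merge + 1):
        end = start + count
        if end > len(segments):
            break  # ran out of successive segments to merge
        merged = merged + " " + segments[end - 1]
        if merged in tm_sources:
            return merged, count
    return None


if __name__ == "__main__":
    asset = ["Click Save.", "The file is written to disk.", "Close the window."]
    tm = {"Click Save. The file is written to disk."}
    print(try_merge(asset, 0, tm))
```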
Both technologies can be used in tandem to leverage against a TM that contains both sentence-level and paragraph-level entries. It is worth restating that this technology is employed only when a qualifying match cannot be found for the original segment.
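The fallback condition in the last sentence can be expressed as a simple gate. In the sketch below, similarity is approximated with Python's difflib.SequenceMatcher and a "qualifying match" is assumed to be any score at or above a threshold of 0.75; both the scoring method and the threshold are stand-ins chosen for illustration, not WorldServer's actual matching logic.

```python
from difflib import SequenceMatcher


def best_score(segment, tm_sources):
    """Highest similarity (0.0 to 1.0) between `segment` and any TM source."""
    return max(
        (SequenceMatcher(None, segment, src).ratio() for src in tm_sources),
        default=0.0,
    )


def should_try_split_merge(segment, tm_sources, threshold=0.75):
    """True only when the unmodified segment has no qualifying match.
    The threshold is an assumed value, not a WorldServer default."""
    return best_score(segment, tm_sources) < threshold


if __name__ == "__main__":
    tm = ["Click Save.", "Close the window."]
    for seg in ["Click Save.", "Click Save. The file is written to disk."]:
        print(repr(seg), "-> try split/merge:", should_try_split_merge(seg, tm))
```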