The scoring algorithm calculates a percentage effort score based on the cheapest path for transforming one segment into another. The underlying process uses a string difference algorithm.
The basic idea is that the system assigns a cost to each step required to transform one segment into the other, based on the elements that make up the segment. The cost depends on the kind of transformation required. For example:
- Inserting a word may have an associated cost of 1 unit of work.
- Correcting the punctuation following a word may have a significantly smaller cost.
- Matching elements yield a zero-cost edit.
The cost of each edit is averaged out over the total number of edits to yield a final score that represents the transformation effort.
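The idea of a cheapest path whose step costs are averaged can be illustrated with a minimal Python sketch. It is not WorldServer's implementation: the function names `cheapest_path` and `average_effort` are hypothetical, segments are assumed to be split on whitespace, every element is assumed to have a weight of 1 unit, and keep steps are assumed to count toward the total number of edits.

```python
def cheapest_path(source, target, weight=1.0):
    """Return (total_cost, step_count) for the cheapest insert/delete/keep path.

    Assumption: every element has the same weight (1 unit by default).
    """
    m, n = len(source), len(target)
    # best[i][j] = (cost, steps) to transform source[:i] into target[:j]
    best = [[(0.0, 0)] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        best[i][0] = (i * weight, i)  # delete every remaining source element
    for j in range(1, n + 1):
        best[0][j] = (j * weight, j)  # insert every remaining target element
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            options = [
                (best[i - 1][j][0] + weight, best[i - 1][j][1] + 1),  # delete
                (best[i][j - 1][0] + weight, best[i][j - 1][1] + 1),  # insert
            ]
            if source[i - 1] == target[j - 1]:
                # keep: matching elements add a step at zero cost
                options.append((best[i - 1][j - 1][0], best[i - 1][j - 1][1] + 1))
            best[i][j] = min(options)  # lowest cost first, then fewest steps
    return best[m][n]


def average_effort(source_segment, target_segment):
    """Average the cost of the cheapest path over its number of steps."""
    cost, steps = cheapest_path(source_segment.split(), target_segment.split())
    return cost / steps if steps else 0.0
```

Under these assumptions, transforming "the quick brown fox" into "the quick red fox" takes three keeps, one delete, and one insert, so the total cost of 2 units is averaged over 5 steps.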
WorldServer translation memory currently supports the transforms described in the following table. There is no plan to make changes to this table.
| Transform | Default Cost | Configurable | Notes |
|---|---|---|---|
| delete element | element weight | No | Element weights are configurable, but a delete always costs the weight of the element. |
| insert element | element weight | No | Same as for delete element. |
| keep element | no cost | No | The elements match. The match is determined independently of surrounding (preceding or following) punctuation or whitespace. |
The following table describes the supported correction penalties:
| Correction Penalty | Default Cost | Configurable | Notes |
|---|---|---|---|
| Capitalization | .01 unit | Yes | Applies only to words. |
| Punctuation | .005 unit | Yes | Covers all non-alphanumeric data, not just punctuation. Penalty is exacted whenever a prefix or suffix differs between the compared elements. |
| Placeholder | outer: .01 unit; inner: .25 unit | Yes | Represents a minor penalty assessed when the two placeholders being compared have different sequence numbers. |
The maximum cost of an edit step is the weight of the element being transformed. Insert and delete transforms automatically result in a maximum cost. As a result, penalties apply only to keep transforms.
The penalties are weighted, not absolute. These penalties are applied to the transform step, and thus are averaged across all transform steps.
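As a rough illustration of how penalties might attach to a keep transform, the following Python sketch applies the default capitalization and punctuation penalties from the table above. The names `split_affixes` and `keep_cost` are hypothetical, and two points are assumptions rather than documented behavior: "weighted" is read here as "multiplied by the element weight", and the step cost is capped at the element weight. Placeholder penalties are left out.

```python
CAPITALIZATION_PENALTY = 0.01   # default from the penalty table
PUNCTUATION_PENALTY = 0.005     # default from the penalty table


def split_affixes(token):
    """Split a token into (prefix, core, suffix); affixes are non-alphanumeric."""
    start = 0
    while start < len(token) and not token[start].isalnum():
        start += 1
    end = len(token)
    while end > start and not token[end - 1].isalnum():
        end -= 1
    return token[:start], token[start:end], token[end:]


def keep_cost(source_token, target_token, weight=1.0):
    """Return the cost of a keep step, or None if the elements do not match.

    Assumption: penalties scale with the element weight and the result
    never exceeds that weight.
    """
    s_pre, s_core, s_suf = split_affixes(source_token)
    t_pre, t_core, t_suf = split_affixes(target_token)
    if s_core.lower() != t_core.lower():
        return None  # not a keep; an insert/delete pair would apply instead
    penalty = 0.0
    if s_core != t_core:
        penalty += CAPITALIZATION_PENALTY  # cores differ only by case
    if (s_pre, s_suf) != (t_pre, t_suf):
        penalty += PUNCTUATION_PENALTY     # a prefix or suffix differs
    return min(weight, weight * penalty)
```

For example, `keep_cost("Hello,", "hello.")` incurs both penalties, while `keep_cost("cat", "dog")` returns `None` because the elements cannot be kept.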