Sample cases

The following examples show how the configurable options can affect the scoring results.
Scoring Scenario 1
  Weights
    • word = 1 unit
    • number = 1 unit
    • outer placeholder = 0.25 units
    • inner placeholder = 0.5 units
  Penalties
    • capitalization = 0.2
    • punctuation (non-alphanumeric) = 0.1
    • non-matching placeholder sequence = 0.1
  Summary: This scenario has a large penalty for capitalization. In segments with few words, this penalty can strongly affect the final score; in very long segments where only a few words are penalized, the effect is smaller.
  Result: If capitalization is the only difference between the two segments being compared, the worst possible score is an 80.0% match, because capitalization exacts a 0.2 (20%) penalty per word.

Scoring Scenario 2
  Weights
    • word = 1 unit
    • number = 1 unit
    • outer placeholder = 0.05 units
    • inner placeholder = 0.025 units
  Penalties
    • capitalization = 0.05
    • punctuation (non-alphanumeric) = 0.01
    • non-matching placeholder sequence = 0.05
  Summary: This scenario imposes a capitalization penalty of only 5% per word.
  Result: If capitalization is the only difference between the two segments being compared, the worst possible score is a 95.0% match.
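The Result figures can be checked with simple arithmetic. The Python sketch below assumes, as those figures suggest, that every word is worth 1 unit and that each mis-capitalized word loses the capitalization penalty p from its unit; the function name `capitalization_only_score` is hypothetical. With n words, all mis-capitalized, the score is 100 × (n - n·p) / n = 100 × (1 - p), which gives 80.0% for p = 0.2 and 95.0% for p = 0.05. The same helper illustrates the Summary: one penalized word costs far more in a 4-word segment than in a 40-word segment.

```python
from fractions import Fraction


def capitalization_only_score(n_words, n_miscapitalized, penalty):
    """Hypothetical: % match when capitalization is the only difference.

    Assumes every token is a word worth 1 unit and each mis-capitalized
    word loses `penalty` units of credit.
    """
    penalty = Fraction(penalty)  # exact arithmetic, e.g. Fraction("0.2") == 1/5
    credit = n_words - n_miscapitalized * penalty
    return float(100 * credit / n_words)


for name, penalty in [("Scenario 1", "0.2"), ("Scenario 2", "0.05")]:
    worst = capitalization_only_score(4, 4, penalty)   # every word differs
    short = capitalization_only_score(4, 1, penalty)   # 1 of 4 words differs
    long_ = capitalization_only_score(40, 1, penalty)  # 1 of 40 words differs
    print(f"{name}: worst {worst}%, 1 of 4 words {short}%, 1 of 40 words {long_}%")

# Scenario 1: worst 80.0%, 1 of 4 words 95.0%, 1 of 40 words 99.5%
# Scenario 2: worst 95.0%, 1 of 4 words 98.75%, 1 of 40 words 99.875%
```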
The following table shows the scores produced under each of the two scenarios described above. All scores assume that no repairs have been performed.
Lookup segment             Hit segment                  Scenario 1 (%)  Scenario 2 (%)
"this is a segment"        "this is a segment"          100             100
"this is a segment"        "This Is A Segment"          80              95
"this is a segment"        "{1}this is a segment"       94.1            98.7
"this is a segment"        "this is a{3} segment"       88.9            99.3
"this is a segment"        "this is a segment."         97.5            99.7
"this is a segment"        "this 'is' a segment"        90              99
"this is a segment"        "this segment"               50              50
"this is a segment"        "Need more Calgon!"          0               0
"this is a segment"        "a segment merged together"  33.3            33.3
"this is a segment"        "this segment is a"          60              60
"{1}this is a{2} segment"  "this is a segment"          84.2            98.1
"{1}this is a{2} segment"  "This Is A Segment"          67.3            93.2
"{1}this is a{2} segment"  "{1}this is a segment"       89.4            99.3
"{1}this is a{2} segment"  "this is a{3} segment"       92.6            98.1
"{1}this is a{2} segment"  "this is a segment."         82.1            97.9
"{1}this is a{2} segment"  "this 'is' a segment"        75.7            97.1
"{1}this is a{2} segment"  "this segment"               42.1            49.0
"{1}this is a{2} segment"  "Need more Calgon!"          0               0
"{1}this is a{2} segment"  "a segment merged together"  29.6            32.9
"{1}this is a{2} segment"  "this segment is a"          52.1            59.1
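This page does not say how weights and penalties are combined into a percentage. The Python sketch below is one hypothetical model that fits most of the table; it is an illustration, not the product's documented algorithm. It assumes that tokens are aligned with a weighted longest common subsequence, that each aligned token earns its weight minus any penalties (never less than zero), that the score is the earned credit divided by the combined weight of both segments minus the aligned weight, and that results are truncated to one decimal place. All names (`Options`, `make_options`, `score`, `SCENARIO_1`, `SCENARIO_2`) are hypothetical, as are the rules that a placeholder at either end of a segment counts as outer and that punctuation is penalized once per token. Under these assumptions the sketch reproduces 35 of the 40 scores in the table. It does not reproduce the two "this 'is' a segment" rows, and for "this is a{3} segment" in Scenario 1 it gives 88.8 instead of 88.9.

```python
import math
import re
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Options:
    """Hypothetical option set: weights are credit units, penalties are deducted units."""
    word: Fraction
    number: Fraction
    outer_placeholder: Fraction
    inner_placeholder: Fraction
    capitalization: Fraction
    punctuation: Fraction
    placeholder_sequence: Fraction


def make_options(*values):
    # Exact fractions keep the final truncation free of floating-point error.
    return Options(*(Fraction(v) for v in values))


SCENARIO_1 = make_options("1", "1", "0.25", "0.5", "0.2", "0.1", "0.1")
SCENARIO_2 = make_options("1", "1", "0.05", "0.025", "0.05", "0.01", "0.05")


def tokenize(segment):
    """Split a segment into (kind, text) tokens: placeholders like {1}, numbers, words."""
    raw = re.findall(r"\{\d+\}|[^\s{}]+", segment)
    tokens = []
    for i, text in enumerate(raw):
        if re.fullmatch(r"\{\d+\}", text):
            # Assumption: a placeholder at either end of the segment is "outer".
            kind = "outer" if i in (0, len(raw) - 1) else "inner"
        elif text.isdigit():
            kind = "number"
        else:
            kind = "word"
        tokens.append((kind, text))
    return tokens


def weight(token, opt):
    return {"word": opt.word, "number": opt.number,
            "outer": opt.outer_placeholder, "inner": opt.inner_placeholder}[token[0]]


def alnum(text):
    return "".join(c for c in text if c.isalnum())


def non_alnum(text):
    return "".join(c for c in text if not c.isalnum())


def pair_credit(a, b, opt):
    """Credit for aligning token a with token b, or None if they cannot align."""
    if a[0] != b[0]:
        return None
    penalty = Fraction(0)
    if a[0] in ("outer", "inner"):
        if a[1] != b[1]:
            penalty += opt.placeholder_sequence
    else:
        if alnum(a[1]).lower() != alnum(b[1]).lower():
            return None
        if alnum(a[1]) != alnum(b[1]):
            penalty += opt.capitalization
        if non_alnum(a[1]) != non_alnum(b[1]):
            penalty += opt.punctuation
    # A penalty larger than the token's weight leaves zero credit, never negative.
    return max(Fraction(0), weight(a, opt) - penalty)


def score(lookup, hit, opt):
    """Percentage match, truncated to one decimal place."""
    a, b = tokenize(lookup), tokenize(hit)
    # best[i][j] = (aligned weight, earned credit) for a[:i] vs b[:j] (weighted LCS).
    best = [[(Fraction(0), Fraction(0))] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            candidates = [best[i - 1][j], best[i][j - 1]]
            credit = pair_credit(a[i - 1], b[j - 1], opt)
            if credit is not None:
                w, c = best[i - 1][j - 1]
                candidates.append((w + weight(a[i - 1], opt), c + credit))
            best[i][j] = max(candidates)  # prefer more aligned weight, then more credit
    aligned, credit = best[len(a)][len(b)]
    # Denominator: weight of the union of both segments' tokens.
    total = sum(weight(t, opt) for t in a + b) - aligned
    if total == 0:
        return 100.0
    return math.floor(1000 * credit / total) / 10


print(score("this is a segment", "This Is A Segment", SCENARIO_1))  # 80.0
print(score("{1}this is a{2} segment", "this is a{3} segment", SCENARIO_2))  # 98.1
```

In the second example the Scenario 2 placeholder penalty (0.05) exceeds the inner placeholder weight (0.025), so under this model the mismatched {2}/{3} pair earns no credit at all.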
Note that the drastic difference in scoring between Scenarios 1 and 2 illustrates the degree of control the configurable options provide. Companies whose segments vary in average length can tune the weights and penalties to emphasize specific differences between the lookup and hit segments.