Recognizing and capturing text in different scripts
When you use the mouse action, you can capture text from the screen even if that text is part of an image, such as a photograph. MultiTerm Widget performs optical character recognition (OCR) on the characters on the screen.
About this task
You can improve the recognition of characters written in other languages and scripts by downloading and installing language-specific training data.
The widget's factory settings are optimized to recognize English characters. Because English does not usually include accented characters (for example, á ä â), the software may not recognize them correctly without training.
Training data provides the widget software with a different set of reference characters. Google's tesseract-ocr page has suitable training data for many languages and scripts.
Procedure
- In your browser, go to the Tesseract documentation.
- Download the appropriate training package as described on that page. For example, for German, download deu.traineddata.
- Rename the file to generic.traineddata.
- Ensure that MultiTerm Widget is stopped.
- In Windows Explorer, navigate to the MultiTerm Widget installation folder. Usually this is %programfiles%\Trados\MultiTerm\MultiTerm18.
- Rename the existing generic.traineddata file (for example, to generic.traineddata.save), and copy the downloaded generic.traineddata file to this folder.
- Restart MultiTerm Widget.
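
The file-handling steps above (backing up the existing file and copying in the downloaded one) can also be scripted. The following Python sketch is illustrative only and is not part of MultiTerm Widget. The function name install_training_data and its parameters are hypothetical, and the sketch assumes the file names and the backup name used in the procedure. Stop MultiTerm Widget before running anything like this.

```python
import shutil
from pathlib import Path


def install_training_data(downloaded: Path, install_dir: Path) -> Path:
    """Back up the existing generic.traineddata and copy the new file in its place.

    Hypothetical helper: `downloaded` is the training file you fetched
    (for example deu.traineddata), `install_dir` is the widget installation folder.
    """
    target = install_dir / "generic.traineddata"
    backup = install_dir / "generic.traineddata.save"

    if not downloaded.is_file():
        raise FileNotFoundError(f"Training data not found: {downloaded}")

    # Keep the factory file so the change can be undone later.
    # An existing backup is left untouched so the factory file is never overwritten.
    if target.exists() and not backup.exists():
        target.rename(backup)

    # Copy the download under the name the procedure specifies.
    shutil.copyfile(downloaded, target)
    return target
```

For example, you might call install_training_data(Path("deu.traineddata"), install_dir), where install_dir points to the installation folder named in the procedure. Writing to a folder under %programfiles% may require administrator rights. To return to the factory data, rename generic.traineddata.save back to generic.traineddata.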