Samples

Document with Product Codes (creating a new language)

Situation description: too many errors in the recognition of product codes

Document with article codes

(NewLang.tif)

When reading a document, the program relies on special knowledge about the recognition language: the language dictionary (to check whether words are correct), its morphology model, and rules about text arrangement. If a document contains many "unnatural" structures, such as product codes, many recognition errors may occur. This happens because the program reads such structures character by character, without any additional information about their structure.

Solution: To improve the recognition of product codes, create a new recognition language for this document.

To create a new recognition language:

  1. Select the Language Editor item in the Tools menu.
  2. Click the New button. In the dialog that opens, select Create a New language based on existing one, and then select the source language for the new language.
  3. The Simple Language Properties dialog opens.

Set the following parameters for the new language (all of them can be set in the Simple Language Properties dialog):

  1. The name of the new language: Codes.
  2. The basic alphabet to be used by the new language. This parameter is set in the Alphabet field. If necessary, edit the alphabet by clicking the button.

    The new language should contain only the characters .0123456789BDFGLRW, that is, the period, the digits 0 to 9, and the letters B, D, F, G, L, R and W.

  3. The dictionary to be used by the program (both for recognition and for spell-checking).

    Here, create a dictionary based on a regular expression. To do this:

    • Select the Regular expression item in the Dictionary group and enter the following expression: DRG|(B[0-9][0-9]|22.5)|(L[0-9])|(F[0-9][0-9][0-9])|(W([0-9]+))

    To find out more about regular expressions, see the "Regular expression" section.
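To check which tokens the alphabet and the expression above would accept before recognizing the document, you can try them outside the program. The following is only a minimal Python sketch, not the program's own matching logic. It assumes the expression is interpreted with Python's re semantics (the program's regular-expression syntax may differ), and the names CODES_ALPHABET, CODES_PATTERN and is_valid_code are hypothetical. Note that, read this way, the "." in 22.5 matches any single character; write 22\.5 if only a literal period should be accepted.

```python
import re

# Alphabet of the "Codes" language, taken from the Alphabet step above.
CODES_ALPHABET = set(".0123456789BDFGLRW")

# Dictionary expression from the text, read with Python re semantics (an assumption).
# Here "." matches any character, so "22.5" also accepts e.g. "2205".
CODES_PATTERN = re.compile(r"DRG|(B[0-9][0-9]|22.5)|(L[0-9])|(F[0-9][0-9][0-9])|(W([0-9]+))")


def is_valid_code(token: str) -> bool:
    """Hypothetical helper: True if the token uses only the alphabet
    and the whole token matches the dictionary expression."""
    if not token or not set(token) <= CODES_ALPHABET:
        return False
    # fullmatch requires the entire token to match one of the alternatives.
    return CODES_PATTERN.fullmatch(token) is not None


if __name__ == "__main__":
    for candidate in ["DRG", "B12", "22.5", "L7", "F123", "W40", "W", "B1", "F12A", "DRGB"]:
        print(candidate, is_valid_code(candidate))
```

Tokens such as W (no digits) or DRGB (extra characters) are rejected because fullmatch requires the whole token to match one alternative, which mirrors the idea of a dictionary containing only complete product codes.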

Then set the recognition languages: select English for the whole document (from the languages list on the Standard toolbar) and the new Codes language for the column containing the product codes.

To set the newly created language for a particular table column:

  1. Select the column and right-click it, then select Properties on the shortcut menu.
  2. On the Block tab of the Properties dialog, select the required language from the Recognition languages list, and select Selected cells in the Apply to group.