Document with Product Codes (creating a new language)
Situation description: too many errors in recognition of product codes
When reading a document, the program uses special knowledge
about the recognition language. It uses the language dictionary (to check
the correctness of words), its morphology model and some rules about text
arrangement. If a document contains many "unnatural" structures,
such as product codes, numerous recognition errors may occur.
This happens because the program reads such structures letter by letter
and does not use any additional information about their structure.
Solution: To improve the recognition of product codes we need to
create a new recognition language for this document.
To create a new recognition language:
- Select the Language Editor item in the Tools menu.
- Click the New button and in the opened dialog select Create
a New language based on existing one, then select a source
language for the new one.
- The Simple Language Properties dialog will open.
Set the following parameters for a new language (all
parameters can be set in the Simple
Language Properties dialog):
- The new language name - Codes.
- The basic alphabet to be used by your new language. This parameter
is set in the Alphabet field. If necessary, edit the alphabet
by clicking the
The created language should contain only the following characters:
.0123456789BDFGLRW
- The dictionary to be used by the system (both for recognition
purposes and for checking the spelling).
Here, create a dictionary based on a regular expression. To do this:
- Select the Regular expression item in the Dictionary group
and enter the following expression: DRG|(B[0-9][0-9]|22.5)|(L[0-9])|(F[0-9][0-9][0-9])|(W([0-9]+))
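The alphabet and the expression above can be sanity-checked outside the program before they are entered in the dialog. The following is a minimal sketch in Python, not the program's own engine: it assumes the expression behaves the same way in Python's re module, it assumes the dot in 22.5 is meant as a literal dot (so it is escaped), and the names ALPHABET, CODE_PATTERN and is_valid_code as well as the sample values are hypothetical.

```python
import re

# Alphabet from the text above: the only characters the Codes language may use.
ALPHABET = set(".0123456789BDFGLRW")

# The dictionary expression from the text, transcribed for Python's re module.
# Assumption: the dot in "22.5" is a literal dot, so it is escaped here.
CODE_PATTERN = re.compile(
    r"DRG|(B[0-9][0-9]|22\.5)|(L[0-9])|(F[0-9][0-9][0-9])|(W([0-9]+))"
)


def is_valid_code(text: str) -> bool:
    """Return True if text uses only alphabet characters and matches the whole pattern."""
    return set(text) <= ALPHABET and CODE_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    # Hypothetical sample values, not taken from a real document.
    for sample in ["DRG", "B12", "22.5", "L7", "F123", "W2005", "B1", "X99"]:
        print(sample, is_valid_code(sample))
```

A check like this helps confirm that every code expected in the document is covered by the expression and that the alphabet contains no missing or extra characters.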
To find out more about regular expressions, see the
"Regular expression" section.
Then set the recognition
language. Select English for the whole document (select it from the
languages list on the Standard toolbar) and the new language -
Codes - for the column with the product codes.
To set the newly created language for a particular table column:
- Select the column and right-click it. Select the Properties item in
the displayed local menu.
- On the Block tab of the Properties dialog, select the necessary
language from the Recognition languages list, then select
Selected cells in the Apply to group.