Code Printouts (plain text formatted with spaces)
Situation description: this example has two features that strongly affect
recognition quality:
- left indents are not stored as spaces but as paragraph indent values, which
the TXT format cannot hold, so the indents are lost; in addition, some
lines are merged into one paragraph, and this paragraph is saved in the TXT format
as a single text line;
- programming language constructs are recognized with many errors.
(listing.tif)
Solution:
FineReader has a special option for recognizing such documents correctly:
Plain text formatted with spaces. It tells the program that the text is
laid out in a single column and set in a monospaced font of one size. In the recognized text,
left indents are represented as spaces, every line becomes a
separate paragraph, and the original paragraphs are separated by empty lines.
This preserves the original text layout when the document is saved in TXT format. To set this option:
- On the Recognition tab of the Options dialog (Tools>Options menu),
select Plain text formatted with spaces in the Document type group.
- Code printouts are recognized well only if a special recognition
language is set. To do this:
- Select Choose more languages in the language list on the Standard
toolbar, and in the Recognition language dialog that opens, select
C++.
Note: If a code printout also contains comments written in a natural language,
select two recognition languages so that the document is read correctly:
the programming language and the language of the comments.
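To illustrate why the Plain text formatted with spaces option keeps the layout in TXT output, the Python sketch below contrasts the two results described in this topic. It is only an illustration, not how FineReader works internally: the Line type and its indent field (measured in character widths) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    indent: int  # hypothetical: left indent in character widths


def to_txt_default(paragraph: list[Line]) -> str:
    # The indent exists only as a paragraph attribute, which TXT cannot store,
    # so only the text survives and all lines collapse into one text line.
    return " ".join(line.text for line in paragraph)


def to_txt_with_spaces(paragraphs: list[list[Line]]) -> str:
    # Each line is written on its own line with the indent rendered as spaces;
    # original paragraphs are separated by an empty line.
    blocks = []
    for para in paragraphs:
        blocks.append("\n".join(" " * line.indent + line.text for line in para))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    listing = [[Line("int main() {", 0), Line("return 0;", 4), Line("}", 0)]]
    print(to_txt_default(listing[0]))    # indentation and line breaks are lost
    print(to_txt_with_spaces(listing))   # indentation and line breaks are kept
```

With the default output the listing becomes the single line "int main() { return 0; }", while the spaces-based output reproduces the original three indented lines.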