What is an OCR System?

An Optical Character Recognition (OCR) system enables you to input printed documents into your computer automatically via a scanner.

FineReader is an omnifont optical text recognition system, so it can recognize text set in practically any font without prior training. FineReader achieves high recognition accuracy and low sensitivity to print defects because it uses a recognition technology based on the principles of Integral Purposeful Adaptable (IPA) perception.

The document input process can be divided into two stages:

  1. Scanning. During the first stage, the scanner acts as the computer's "eye": it captures an image of the page and transfers it to the computer. The acquired image is only a picture, a set of black, white, and colored dots that cannot be edited in a word processor.
  2. Recognition. During the second stage, FineReader performs OCR processing on the image.

Let’s take a closer look at the second stage.

FineReader OCR image processing involves analyzing the image transmitted by the scanner (layout analysis) and recognizing each character. Layout analysis (detecting the recognition areas, tables, pictures, lines, and individual characters) and character recognition are closely related: page layout analysis is more accurate when the application knows the nature of the text.

As mentioned previously, the image recognition process is based on the principles of IPA perception.

These three principles (integrity, purposefulness, and adaptability) determine the system's behavior. The system generates a hypothesis about a recognition object (a character, part of a character, or several characters glued together) and then accepts or rejects the hypothesis depending on whether the expected structural elements are present. These structural elements are computer equivalents of the character parts that are crucial to human perception (arcs, circles, dots, etc.). The application then adapts itself to the text according to the recognition accuracy it has attained. Purposeful searching and context information enable the system to recognize even torn and distorted characters, making it almost insensitive to print defects.
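The hypothesize-and-verify loop described above can be sketched in Python. This is a hypothetical illustration, not FineReader's IPA implementation: it assumes each character has already been cut out as a small binary bitmap (1 = ink), reduces the structural elements to two countable features (the number of separate ink parts, which captures the dot of an `i`, and the number of enclosed holes, which captures the circles of `o` and `8`), and uses a made-up prototype table. The names `structural_elements`, `recognize`, and `PROTOTYPES` are illustrative. The previous character serves as crude context for ordering hypotheses; adaptation is not modeled.

```python
# Hypothetical sketch: hypothesis generation and verification by structural elements.
# Not FineReader's algorithm; the features and prototypes are illustrative assumptions.

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # edge neighbours
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # edge and diagonal neighbours


def _components(glyph, value, neighbors):
    """Return the connected regions (sets of (row, col)) of pixels equal to `value`."""
    rows, cols = len(glyph), len(glyph[0])
    seen, regions = set(), []
    for r in range(rows):
        for c in range(cols):
            if glyph[r][c] != value or (r, c) in seen:
                continue
            region, stack = set(), [(r, c)]
            seen.add((r, c))
            while stack:  # iterative flood fill
                y, x = stack.pop()
                region.add((y, x))
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and glyph[ny][nx] == value and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            regions.append(region)
    return regions


def structural_elements(glyph):
    """Count separate ink parts (8-connected) and enclosed holes (4-connected paper
    regions that do not touch the glyph border)."""
    rows, cols = len(glyph), len(glyph[0])
    parts = len(_components(glyph, 1, N8))
    holes = sum(
        1 for region in _components(glyph, 0, N4)
        if not any(y in (0, rows - 1) or x in (0, cols - 1) for y, x in region)
    )
    return {"parts": parts, "holes": holes}


# Hypothetical prototype table: the structural elements each character must show.
PROTOTYPES = {
    "l": {"parts": 1, "holes": 0},
    "1": {"parts": 1, "holes": 0},
    "i": {"parts": 2, "holes": 0},  # stem plus dot
    "o": {"parts": 1, "holes": 1},  # one closed circle
    "8": {"parts": 1, "holes": 2},  # two closed circles
}


def recognize(glyph, previous=None):
    """Try character hypotheses in turn; accept the first whose expected structural
    elements match the glyph, or return None if every hypothesis is rejected."""
    found = structural_elements(glyph)
    # Crude context: after a digit, propose digits first; otherwise letters first.
    prefer_digits = previous is not None and previous.isdigit()
    hypotheses = sorted(PROTOTYPES, key=lambda ch: ch.isdigit() != prefer_digits)
    for ch in hypotheses:
        if PROTOTYPES[ch] == found:
            return ch  # hypothesis accepted
    return None


bar = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]  # a single vertical stroke
print(recognize(bar))                # l
print(recognize(bar, previous="8"))  # 1
```

Because `l` and `1` have the same structural elements in this toy table, only the context decides between them, which mirrors the role the text gives to context information.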

The final result is the recognized text displayed in the FineReader Text window, which you can edit and save in any of the supported formats.

