About
ILINX Capture Zonal Recognition, a key component of the ILINX ECM suite, is a zonal Optical Character Recognition (OCR) product. It enables ILINX Capture workflows to recognize areas in scanned documents, and then translate that recognized information into metadata attached to the document in your repository. These capabilities are designed to work with highly structured forms-based images such as scanned loan applications, web forms output, or PDF forms output.
This guide contains information on the installation and configuration processes for ILINX Capture Zonal Recognition. You can also find instructions for integrating this product’s functionality into an ILINX Capture workflow in the document titled ILINX Capture Technical Guide.
Assumption: This guide assumes that installation of the ILINX Capture product has taken place. For complete installation documentation, please see the ILINX Installation Guide.
ILINX Capture Zonal Recognition is powered by ABBYY FlexiCapture. ABBYY and FlexiCapture are registered trademarks of ABBYY.
Installing
To install ILINX Capture Zonal Recognition, perform the following steps:
Step 1: Unzip the zip file to a temporary working location.
Step 2: Run the .msi installer file.
Step 3: Accept the license agreement and follow the instructions in the installation wizard.
Step 4: Specify a different installation folder if desired. Otherwise, accept the default folder and complete the installation.
System requirements
For general information on supported and recommended hardware, operating systems, web browsers, databases, and so on for ILINX products, please see the document titled “ILINX Support Matrix”.
Configuration
The ZonalRecognition IXM enables optical character recognition and automates the process within the workflow.
Note: The ILINX Zonal Recognition server software is required to use this IXM. Additionally, a GetBatchInfo with the Load Files box checked must precede it in the workflow.
ILINX Capture Zonal Recognition, a key component of the ILINX ECM suite, enables ILINX Capture workflows to copy areas of scanned documents and translate the image snippets into document metadata. This allows you to automatically group inbound pages into one or more documents, extract information from the pages, and apply the extracted information as metadata to the documents. These capabilities are designed to work with highly structured forms-based images such as scanned loan applications, web forms output, or PDF forms output.
Capture workflow designers can add the Zonal Recognition processing step anywhere in the workflow process to perform these automated separation and extraction operations. The workflow IXM provides a graphical mapping tool that allows you, the designer, to identify the regions of the image that will be extracted and sent to the Zonal Recognition server for processing. You can also test the extraction mappings on one or more sample documents to see the results of your extraction definitions and ensure correct placement of the extraction zones.
The Zonal Recognition server is sold and deployed separately from the ILINX Capture server to allow you to scale and tune your deployment based on your optical character recognition (OCR) processing load.
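The separation and extraction behavior described above can be pictured with a short sketch. This is not the product's implementation or API. The names Page, Document, and group_pages are hypothetical, and the sketch assumes that a page starts a new document when the text recognized in its separation zone matches the expected text, and that the index value is taken from the first page of each document.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One scanned page with hypothetical per-zone OCR results."""
    separation_text: str  # text recognized in the separation zone
    index_text: str       # text recognized in the index zone


@dataclass
class Document:
    """A group of pages plus the metadata extracted for them."""
    pages: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def group_pages(pages, expected_text, index_field):
    """Start a new document whenever the separation zone matches expected_text."""
    documents = []
    for page in pages:
        is_separator = page.separation_text.strip() == expected_text
        if is_separator or not documents:
            # Assumption: the index zone is read from the first page of each document.
            documents.append(Document(metadata={index_field: page.index_text.strip()}))
        documents[-1].pages.append(page)
    return documents


# Three inbound pages: two cover sheets and one continuation page.
batch = [
    Page("LOAN APPLICATION", "A-100"),
    Page("", ""),
    Page("LOAN APPLICATION", "A-101"),
]
docs = group_pages(batch, "LOAN APPLICATION", "LoanNumber")
```

In this example the batch is split into two documents: the first holds two pages and the second holds one, each carrying its own LoanNumber value (a made-up index field name).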
Note: This feature is designed to work with highly structured content (content that is formatted precisely and consistently, such as scanned forms). This feature is not appropriate for unstructured or semi-structured content. Accommodations for unstructured and semi-structured content can be found in the independent ILINX Advanced Capture product.
You can apply zonal recognition to your capture process by configuring the ZonalRecognition IXM into a batch profile’s workflow in the Workflow Designer.
GetBatchInfo: Required
SetBatchInfo: Required
Steps
Step 1: Click the configure button to open the Zonal Recognition Configuration dialog.
Step 2: When the web service has successfully connected, click on your desired doc type in the left-hand panel.
Step 3: Right-click on Template file and select Add template file. This will open a new dialog box that displays your available files.
Step 4: Select a file to act as your template and confirm your selection. The selected file will appear in the display panel.
Step 5: Right-click on the template file in the display panel to access a set of options.
> Adding a separation zone will allow you to break up the document as you see fit.
> Adding an index zone allows you to designate an area of the document for data extraction.
> The remaining options adjust how the document appears in the display panel.
Step 6: After selecting one of the zone options, draw a box around the desired area of the document by clicking in the display panel and dragging the cursor.
Step 7: A dialog box will appear that displays options for configuring the recognition process.
> Image zone – A representation of the zone you drew on the document.
> Text type – A drop-down list from which you can pick how the text was made on the original document (e.g., hand printed, typewriter, etc.).
> Test – Click this button to have ZonalRecognition evaluate how well it recognizes the text.
> Recognition result – The text that ZonalRecognition recognized within the image zone.
> Lowest char confidence level (%) – The level of certainty that the text in the image zone was correctly recognized; this field displays the confidence percentage of the least certain character in the zone.
Note: Confidence level can be affected by a variety of factors, such as font type and scan quality.
> Expected text – A field where you can enter the text that you expected ZonalRecognition to recognize in the image zone. This field appears only if you are working with a separation zone.
> Copy result to index field – A drop-down list from which you can choose the doc level index field that you want to populate with the recognized text. This field appears only if you are working with an index zone.
When you are satisfied with the configuration of the dialog box, click OK.
Step 8: Your configured zone will now display in the appropriate section beneath the display panel. You can right-click on the zone entry to edit the zone, test it again, or delete it. When you are satisfied with your zones, click OK.
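To illustrate how a "lowest char confidence level" relates to the recognized text, the sketch below takes a list of per-character confidence values and reports the smallest one. The per-character data, the function names, and the 80% review threshold are assumptions made for illustration only; the dialog box itself shows just the single lowest value.

```python
def lowest_char_confidence(char_confidences):
    """Return the confidence (%) of the least certain character, or None if empty."""
    if not char_confidences:
        return None
    return min(conf for _, conf in char_confidences)


def needs_review(char_confidences, threshold=80):
    """Flag a zone when nothing was read or its weakest character is below threshold."""
    lowest = lowest_char_confidence(char_confidences)
    return lowest is None or lowest < threshold


# Hypothetical result for the text "A-100": one character was hard to read.
result = [("A", 98), ("-", 91), ("1", 99), ("0", 62), ("0", 97)]
lowest = lowest_char_confidence(result)
flagged = needs_review(result)
```

Because the value reflects the weakest character rather than an average, a single smudged character lowers the reported level even when the rest of the zone reads cleanly.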
Note: If you want to incorporate zonal recognition into your batch profile’s workflow, there are a few things to keep in mind. Please read the following details of the feature.
Document and/or Form Design
If you have control over the design of your document or form, try to create plenty of white space around the text you will be reading with OCR. This will allow you to create a larger box around the text, increasing the amount of document skewing the software can accommodate before losing accuracy and limiting the chance of picking up other surrounding text.
One caveat: if the document is badly skewed (10 degrees or more), the text may be read as two lines or read incorrectly (see example below).
Recommended Scanning Resolutions
Scanning at higher resolutions will greatly improve OCR; use 400 DPI or higher for best OCR accuracy.
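As a rough illustration of why resolution matters, the sketch below converts a nominal font size into pixels at a given scanning resolution, using the standard definition of 72 points per inch. The function name is hypothetical, and the nominal font size only approximates the height of the printed characters.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def text_height_pixels(font_size_pt, dpi):
    """Approximate pixel height of text of a nominal point size at a scan resolution."""
    return font_size_pt / POINTS_PER_INCH * dpi


# Nominal 10-point text at two scanning resolutions.
at_200 = round(text_height_pixels(10, 200), 1)
at_400 = round(text_height_pixels(10, 400), 1)
```

Doubling the resolution doubles the number of pixel rows available for each character, which gives the OCR engine more detail to work with.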
Kofax VRS (Virtual Rescan)
Kofax VRS is highly recommended because it is effective at removing artifacts and background noise from your scanned documents. Without it, artifacts and background noise can be interpreted as part of the text you are trying to capture, producing less than optimal results.
Always use caution when configuring background removal as you may remove or visually degrade text you want to keep.
Kofax VRS also helps alleviate skewing on scanned pages, increasing the accuracy of OCR.
Skewed Pages
Badly skewed pages will force the text you are trying to OCR outside the “read” area you defined by drawing a box around the text. Any text that falls partially or completely outside of that area will be either misread or not read at all. Skewing can also move the “read” area into adjacent text, which will reduce accuracy.
Drawing a box too tightly around the text you want to read will reduce the amount of skew the software can accommodate before losing accuracy.
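The relationship between box size and skew tolerance can be estimated with simple geometry. The sketch below is a simplified model, not the product's behavior: it assumes the line of text rotates about the center of the zone, whereas real page skew usually also shifts the text, so the actual tolerance is lower. All function names and measurements are hypothetical.

```python
import math


def rotated_extent(width, height, skew_degrees):
    """Bounding-box size of a width x height rectangle rotated by skew_degrees (0 to 90)."""
    theta = math.radians(abs(skew_degrees))
    rot_w = width * math.cos(theta) + height * math.sin(theta)
    rot_h = width * math.sin(theta) + height * math.cos(theta)
    return rot_w, rot_h


def fits_in_zone(text_w, text_h, margin, skew_degrees):
    """True if the skewed text still fits in a zone drawn with the given margin per side."""
    rot_w, rot_h = rotated_extent(text_w, text_h, skew_degrees)
    return rot_w <= text_w + 2 * margin and rot_h <= text_h + 2 * margin


# A 2.0 x 0.2 inch line of text skewed by 3 degrees.
tight_box = fits_in_zone(2.0, 0.2, margin=0.02, skew_degrees=3)
roomy_box = fits_in_zone(2.0, 0.2, margin=0.10, skew_degrees=3)
```

In this model the tight box fails at 3 degrees of skew while the roomier box still contains the text, which is why white space around the text and a generously drawn zone help.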
Upside-down or vertical text is not currently supported.