About
ILINX® Complex Contracts Extraction Module (CCEM) is a desktop application that can work with documents that are captured with ILINX Capture 9.0 and 9.1. It can be configured to provide user assistance with extracting and formatting data from document tables into a normalized format for copy or export. This functionality allows users to copy data from complex tables and reuse it with their business process rather than having to manually retype it all. To perform data extraction, CCEM requires that the documents are searchable, black and white PDFs. ILINX Format Converter can be utilized for this conversion.
ILINX Complex Contracts Extraction Module (CCEM) is installed locally on each user’s workstation. Users that utilize CCEM will be tracked for licensing. To facilitate the CCEM process, documents must be added into an ILINX Capture Workflow batch (one document per batch) and made available in one or more Capture Views. After opening CCEM, the user will run a view (Open Document) to return a list of available documents to work. They can open and view a document, extract data from it into Capture batch fields and tables, then export the data for use in another application or process. CCEM supports single-document batches only.
To help ensure a higher rate of extraction success, scanned documents should have high-quality printed text and be free of skew, speckles, or other distortion. Tables that are missing either horizontal or vertical lines may not extract correctly, and tables that span multiple pages must be extracted one page at a time.
Toolbar
COMMAND | DESCRIPTION | |
---|---|---|
Open Document | This opens the Assigned Views window, which allows the user to search for and return a list of documents to open. Please see the Assigned Views Window information below. | |
Close Document | This closes the document that is open. It prompts the user to Save, Close, or Cancel. | |
Save | This saves the document that is open. | |
Submit | This submits the document for processing once the indexing is complete. The document will continue to the next step in the workflow process defined by the administrator. | |
Export | This creates an export file of the extracted data. Files can be exported in xlsx or json format. The xlsx format creates a basic Excel spreadsheet, while the json format can be a custom format configured by the ILINX administrator. | |
Download | This prompts to download the document to the local machine. | |
Viewer Settings | This opens the Viewer Settings box where zoom settings, mouse wheel mode settings, and PDF viewing options can be selected. After the user makes their choices and clicks Apply, these settings become the default the next time they log in. There is also a button to access the administration settings. | |
Mouse Mode | The mouse can be changed to Area Selection or Pan mode. Area Selection provides rubber band selection of a table, while Pan allows standard text selection for copy and paste using Ctrl-C and Ctrl-V. | |
Page Navigation | This allows users to navigate through the document either page by page or to the end or beginning. The specific page number can be entered as well. | |
Zoom Out | This zooms out of the document. | |
Zoom | This provides several zoom options. | |
Zoom In | This zooms in on the document. | |
Document page rotation | This will rotate the page for viewing. Page rotations are not saved, and rotating a page from its original orientation can lead to inconsistent data extraction results. If multiple pages need to be rotated for extraction, the rotation should be performed outside of CCEM and the document re-submitted to Capture. Alternatively, completed document pages can be permanently rotated or annotated from ILINX Content Store if licensed for use. | |
About | This will open the About window for CCEM. Information here can be used when logging help tickets with ImageSource Support. |
Assigned Views Window
This window becomes available after clicking Open Document.
> Assigned Views – Available views for the users will be in the drop-down menu at the top of the window. Views are defined in ILINX Capture by an ILINX administrator and assigned by permissions.
Note: While logging in as an AD user, you may need to add two registry keys as noted in HTTP 400 Bad Request (Request Header too long) responses to HTTP requests.
> View search – This allows the user to specify the search criteria for a search. The fields available in the view search are determined by the selected Assigned view. The user can set the search Values and Operators for the provided fields. Once the needed values are entered, click Search.
> Reset all search fields – This button may be used to reset the search screen to default.
> Search Results – The search results will display below the View search box. Double-click the document, or highlight it and then click Open, to open a document to be indexed. The default order of the results is defined by the ILINX administrator in the view search. The user can temporarily change the order by clicking on a search result column header and selecting the up/down arrow. The next time the search is run, it will default back to the configured sort order.
Document Information Panel
The Document Information Panel is the screen that is opened below the toolbar. From left to right, it contains a vertical tool bar with searching functionality and document page thumbnails, a center display of the current document page, and on the right a panel of document index data. The document index data contains individual header fields on the top with one or more tabs of table data on the bottom. These data tables store the extracted document table data described under “Table Import Window”. Dragging the borders between the panels allows the user to customize the panel sizes to see the information as needed. Double-clicking a border closes the panel, which can be reopened by single-clicking the > symbol once it is closed.
Vertical Tool Bar
> Thumbnails – Once a document is selected thumbnails of the pages will display in the vertical panel. Users may click on a page to select it. On the bottom of the panel, there is a selector button to change the size of thumbnail page display.
> Areas of Interest – The next icon in the vertical tool bar is the Area of Interest tool. This utility pairs words and/or phrases or text patterns from the document with search criteria defined by the ILINX administrator. The feature marks where these phrases are in the document so that users may quickly navigate to these items within the document. Clicking on the selection in the Area of Interest panel will navigate the user to that area of the document. To ensure correct results with this tool, use the Format Converter IXM option to “Remove searchable PDF text and re-OCR.”
> Search – This function allows users to manually search within the document that has been selected. Users enter text into the search box, then select whether the case and/or the entire word must match what has been entered. They may choose to search in the current page or all pages. Once the search criteria are entered, pressing the Enter key or clicking the Search All button will run the search. Results will appear in the bottom portion of the vertical search menu. The Find next and Find previous buttons allow users to navigate through the results. Users may also click on a search result to navigate to that specific location in the document.
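The case, whole-word, and page-scope options described above can be pictured with a short Python sketch. This is only an illustration of how such options typically combine; the function name `find_matches` and its parameters are hypothetical and do not describe how CCEM implements its search.

```python
import re

def find_matches(pages, query, match_case=False, whole_word=False, current_page=None):
    """Return (page_number, start_offset) pairs for every hit of `query`.

    pages: list of page text strings (index 0 is page 1).
    current_page: 1-based page to restrict the search to, or None for all pages.
    """
    pattern = re.escape(query)              # treat the query as literal text
    if whole_word:
        pattern = r"\b" + pattern + r"\b"   # require word boundaries on both sides
    flags = 0 if match_case else re.IGNORECASE
    regex = re.compile(pattern, flags)

    hits = []
    for number, text in enumerate(pages, start=1):
        if current_page is not None and number != current_page:
            continue
        for match in regex.finditer(text):
            hits.append((number, match.start()))
    return hits

# Hypothetical page text, for illustration only.
pages = ["Term of Contract: 12 months", "The contractor agrees to the term."]
print(find_matches(pages, "term"))                    # all pages, any case
print(find_matches(pages, "term", match_case=True))   # lowercase "term" only
print(find_matches(pages, "tract", whole_word=True))  # no whole-word hits
```

In this sketch, turning on the whole-word option stops "tract" from matching inside "Contract" or "contractor", which is the usual reason to enable it when a short search term returns too many results.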
Document Viewer
This document viewer displays the document that has been selected and supports PDF documents only. Navigation in the document can be done by the thumbnails, AOI, or the search function on the vertical toolbar as well as by using the scroll bar in the display panel. This panel is where a user can rubber band select a table area on a page to be extracted into a selected index table.
> Rubber banding – To select data to be extracted, click with the left mouse button and drag the red rubber band around the table. After the rubber banding is complete, the Table Import window will open. Note that a selection can only be made on one page at a time, i.e., it cannot span pages. It is also important to select an area slightly larger than the data to be extracted, i.e., slightly above/below and left/right of the table rows, columns, or borders of interest. Selecting too little will exclude some table row or column data.
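The advice to select slightly more than the table itself, while staying on one page, amounts to simple rectangle arithmetic. The sketch below is illustrative only: the `Rect` type, the `pad_selection` helper, and the 10-point margin are assumptions made for the example, not CCEM settings. In CCEM the selection is drawn by hand with the mouse.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float

def pad_selection(table: Rect, page: Rect, margin: float) -> Rect:
    """Grow `table` by `margin` on every side, clamped to the page bounds."""
    return Rect(
        left=max(page.left, table.left - margin),
        top=max(page.top, table.top - margin),
        right=min(page.right, table.right + margin),
        bottom=min(page.bottom, table.bottom + margin),
    )

# Hypothetical coordinates in points, with the origin at the top-left corner.
page = Rect(0, 0, 612, 792)    # a US Letter page
table = Rect(72, 5, 540, 400)  # table borders as they appear on the page
selection = pad_selection(table, page, margin=10)
print(selection)
```

The clamping step mirrors the one-page rule: the padded area never extends past the page edge, so a table that sits close to the top of the page still yields a valid selection.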
Table Import Window
This window appears after rubber banding a table area of a document. Based on the table highlighted in the document, the predefined Target Table should be selected from the drop-down. Alternatively, the user can click on an index table tab prior to performing the selection and it will appear as the default Target Table.
> OCR Data – This displays the raw, original OCR data from the document table. The column drop-down selections in this table allow the user to specify what each column of data contains. That choice defines how the data will be parsed, split up, and formatted into the data columns of the Target Table, which is shown in the Preview table at the bottom of the Table Import window. As a column-type drop-down selection is made, the Preview table is automatically refreshed to show how the change affected the output data and whether it has been parsed and formatted into the columns correctly. If a column is not needed, deselect it by selecting the blank row from the drop-down. The drop-down selections for column types and their related data parsing logic are defined by the ILINX administrator.
> Preview Table Index – As described in OCR Data, this is the formatted output extracted from the OCR data. This is the data that will be filled into the Target Table. The values in this preview can be edited in this screen or later in the Index Panel target table. If the data in this table was selected and is populating as desired, click OK and it will be added to the Target Table. If the data was not selected properly, or if there are any other issues and this data should not be indexed, click Cancel. The user can then rubber band again to reselect and reopen this dialog.
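The relationship between the OCR Data column types and the Preview table can be sketched in Python. Everything named here is hypothetical: the column types ("Description", "Amount", "Date Range"), the parser functions, and the sample rows are invented for illustration, since the real column types and parsing logic are configured by the ILINX administrator.

```python
import re

def parse_text(value):
    return {"Description": value.strip()}

def parse_amount(value):
    # Keep digits and the decimal point only: "$1,250.00" -> "1250.00".
    return {"Amount": re.sub(r"[^0-9.]", "", value)}

def parse_date_range(value):
    # Split one OCR column into two target columns.
    start, _, end = value.partition("-")
    return {"Start Date": start.strip(), "End Date": end.strip()}

# Hypothetical column types, standing in for the administrator-defined list.
COLUMN_TYPES = {
    "Description": parse_text,
    "Amount": parse_amount,
    "Date Range": parse_date_range,
}

def build_preview(ocr_rows, selected_types):
    """Apply the selected column type to each OCR cell; a blank type skips the column."""
    preview = []
    for row in ocr_rows:
        target_row = {}
        for cell, type_name in zip(row, selected_types):
            if not type_name:  # blank drop-down selection: column not needed
                continue
            target_row.update(COLUMN_TYPES[type_name](cell))
        preview.append(target_row)
    return preview

ocr_rows = [
    ["1", "Maintenance ", "$1,250.00", "01/01/2024 - 12/31/2024"],
    ["2", "Support", "$300.50", "02/01/2024 - 02/29/2024"],
]
selected_types = ["", "Description", "Amount", "Date Range"]
preview = build_preview(ocr_rows, selected_types)
for row in preview:
    print(row)
```

Changing an entry in `selected_types` and rebuilding the preview corresponds to changing a drop-down and watching the Preview table refresh: a wrong column type shows up immediately as badly formatted output.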
Document Indexes Panel
The document indexes panel is horizontally divided into two areas. The top portion fills in the predefined document header properties that have been indexed. The bottom portion allows the user to navigate through each table of data that has been extracted from the document and will be available for export.
In the document header index window, the fields and tables are defined by project and can be customized based on the needs of the customer. Possible document fields may include Document ID, Dates, Document types, or other fields based on the business use. These are set by the administrator.
In the document table index panel, the user will see each table that has been extracted from the document. These are visible as tabs. Clicking on a tab will display the extracted data items in rows. Corrections and changes can be made to the indexed data in different ways. For some of the fields, arrows may provide drop-downs with different selections. The drop-down selections are defined by the administrator. The other choice is to right-click on the row that needs to be changed, which opens a menu.
Right Click Menu:
> Add Row Below will add a new row below the selected Row.
> Add Row Last will add a new row to the end of the table.
> Duplicate Below will duplicate the row selected and insert it below the selected row.
> Duplicate Last will duplicate the row selected and insert it at the end of the table.
> Delete will delete the highlighted row(s) after prompting the user to be certain it should be deleted.
> Clear Contents will clear the contents of the highlighted row(s).
> Update Field Values will open the Update indexes window allowing the user to check the columns that should be updated by checking that column name and then entering the data that should be applied to the highlighted row(s). Columns listed are determined by the table that is being worked on. Press Save & Close after the changes have been entered. Unchecked columns will not update.
> Move up will move the highlighted row(s) up in the table. The keyboard hot key Ctrl + Up Arrow can also be used.
> Move down will move the highlighted row(s) down in the table. The keyboard hot key Ctrl + Down Arrow can also be used.
Important Note: Cannot Undo – Please note that anything done using these edit options cannot be undone.
> Multiple Rows – All the above actions can be performed with one or multiple rows (Add, Duplicate, Delete, Update, Move). Standard table selections can be made:
− Block of Rows – Select the first row then while pressing the shift key select the last row or left click on the first row and drag the mouse down.
− Individual Rows – Select the first row then press / hold the Ctrl key while selecting other rows.
> Comments – At the bottom of the window is a Comments button. This allows users to add comments and notes as needed while processing the document. Subsequent users that open the document will be able to see existing comments and add more. Comments cannot be modified.
> Save Your Work – While working on indexing a document, users should utilize the save button frequently to not lose their work. Newly extracted, added, or edited index data is not stored until the document is saved.
> Exporting Extracted Data – Once all the information has been indexed, users may export the data into a file that can be saved locally (xlsx or json) or use copy/paste.
> Submitting the Document – Once completed with all data extraction / indexing and processing tasks the document can be submitted, i.e., completed with the current processing and routed to the next step in the workflow defined by the administrator.
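The right-click row operations described in this section behave like ordinary list edits. The Python sketch below models a table as a list of row dictionaries to show what each action does, including the multi-row Move up. The function names and sample data are hypothetical and are not part of CCEM.

```python
import copy

def add_row_below(rows, index, blank):
    rows.insert(index + 1, dict(blank))

def add_row_last(rows, blank):
    rows.append(dict(blank))

def duplicate_below(rows, index):
    # Deep copy so later edits to the copy do not change the original row.
    rows.insert(index + 1, copy.deepcopy(rows[index]))

def update_field_values(rows, selected, checked_values):
    """Apply only the checked columns to every selected row."""
    for i in selected:
        rows[i].update(checked_values)

def move_up(rows, selected):
    """Move each selected row up one position, keeping their relative order."""
    if not selected or min(selected) == 0:
        return  # already at the top
    for i in sorted(selected):
        rows[i - 1], rows[i] = rows[i], rows[i - 1]

def move_down(rows, selected):
    if not selected or max(selected) == len(rows) - 1:
        return  # already at the bottom
    for i in sorted(selected, reverse=True):
        rows[i], rows[i + 1] = rows[i + 1], rows[i]

rows = [
    {"Item": "A", "Qty": "1"},
    {"Item": "B", "Qty": "2"},
    {"Item": "C", "Qty": "3"},
]
duplicate_below(rows, 0)                        # A1, A1, B2, C3
update_field_values(rows, [1], {"Qty": "5"})    # only the checked column changes
move_up(rows, [2, 3])                           # B2 and C3 move above the copy
print([r["Item"] + r["Qty"] for r in rows])
```

Because none of these edits can be undone in CCEM, it helps to think through the expected result of a multi-row action before applying it.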
Troubleshooting
Document Viewing Messages
When opening a non-PDF document (xlsx, docx, etc.), CCEM will prompt you to Save it or Open it outside of CCEM in its native application. You cannot view or extract data from non-PDF documents within CCEM. You may still view and manually process the document outside of CCEM. You may also manually key data into the document index panel within CCEM and submit / complete the document batch from CCEM. Your administrator may also have configured an option for the system to convert the document to a PDF for extraction. This would most likely appear in your indexing field panel as a checkbox option labeled “Convert to PDF”. You would check this option and then submit the document. The workflow logic would route the document to be converted to a PDF with OCR text, then automatically return it to the work queue to be processed. Once reprocessed, it will appear again in the search results. The reprocessing may take a couple of minutes.
This message indicates that the document was not correctly converted by the system to a PDF with OCR text data. Please note the document and contact your ILINX administrator to review.
Other Errors
For any of the following please contact your ILINX administrator for review.
> Missing or incorrect Areas of Interest. Note that it may not be possible to consistently find certain types of data from all documents.
> Document table data not extracting into the OCR Data table correctly. Note that it may not be possible to extract data from all types of tables or documents and document quality and consistency will also impact data extraction.
> OCR Data not outputting correctly into the Preview table. Please review each OCR Data column and ensure that the correct data type drop-down is selected for the column. If a column is not needed, deselect it by selecting the blank row from the drop-down. If it is still not outputting correctly, the data may be inconsistent and require manual processing, or additional extraction logic may need to be added.
> Validation Errors during json export. Please note or screenshot the errors. Review the related index table data for any missing or incorrect values that need to be corrected prior to exporting.
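As an illustration of the kind of review suggested for validation errors, the Python sketch below scans table rows for missing or blank values in a set of required columns. The helper name, the column names, and the idea of a fixed list of required columns are all assumptions made for the example. The actual validation rules applied during json export are configured by the ILINX administrator and are not described here.

```python
def find_missing_values(table_rows, required_columns):
    """Return (row_number, column) pairs whose value is missing or blank."""
    problems = []
    for row_number, row in enumerate(table_rows, start=1):
        for column in required_columns:
            value = row.get(column)
            if value is None or not str(value).strip():
                problems.append((row_number, column))
    return problems

# Hypothetical index table data, for illustration only.
rows = [
    {"Description": "Maintenance", "Amount": "1250.00"},
    {"Description": "Support", "Amount": " "},
    {"Description": "", "Amount": "300.50"},
]
problems = find_missing_values(rows, ["Description", "Amount"])
for row_number, column in problems:
    print(f"Row {row_number}: '{column}' is empty")
```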