About
ILINX Complex Contracts Extraction Module (CCEM) is a desktop application that can work with documents that are captured with ILINX Capture. It can be configured to provide user assistance with extracting and formatting data from document tables into a normalized format for copy or export. This functionality allows users to copy data from complex tables and reuse it with their business process rather than having to manually retype it all. To perform data extraction, CCEM requires that the documents be searchable, black and white PDFs. ILINX Capture Format Converter can be utilized for this conversion.
ILINX Complex Contracts Extraction Module (CCEM) is installed locally on each user’s workstation. Users that utilize CCEM will be tracked for licensing. To facilitate the CCEM process, documents must be added to an ILINX Capture Workflow batch (one document per batch) and made available in one or more Capture Views. After opening CCEM, the user will run a view (Open Document) to return a list of available documents to work. They can open and view a document, extract data from the open document into Capture batch fields and tables, and then export the data for use in another application or process.
Note: CCEM only supports batches containing a single PDF file.
To help ensure a higher rate of extraction success, scanned documents should be of high-quality print text and without skew, speckles, or other distortion. Tables that are missing either horizontal or vertical lines may not extract correctly, and tables spanned across pages need to be extracted one page at a time.
Toolbar
| COMMAND | DESCRIPTION |
|---|---|
| Open Document | This opens the Assigned Views window, which allows the user to search for and return a list of documents to open. Please see the Assigned Views window information below. |
| Close Document | This closes the open document. It will prompt the user to Save, Close, or Cancel. |
| Save | This saves the open document. |
| Submit | This submits the document for processing once the indexing is complete. The document then continues to the next step in the workflow process, as defined by the Administrator. |
| Export | This exports the current values in all batch-level index fields to either an Excel document or a JSON file. The xlsx format creates a basic Excel spreadsheet, while the JSON format can be in a custom format configured by the ILINX Administrator. |
| Download | This prompts you to download the document to the local machine. |
| Viewer Settings | This opens the Viewer Settings box where zoom settings, mouse wheel mode settings, and PDF viewing options can be selected. After the user makes their choices and clicks Apply, these settings become the default the next time they log in. There is also a button to access the administration settings. |
| Mouse Mode | The mouse can be changed to Area Selection or Pan mode. Area Selection provides a rubber band selection of a table, while Pan allows standard text selection for copy and paste using Ctrl+C and Ctrl+V. |
| Page Navigation | This allows users to navigate through the document either page by page or to the end/beginning. A specific page number can be entered as well. |
| Zoom Out | This zooms out of the document. |
| Zoom | Zoom provides several options. |
| Zoom In | This zooms in on the document. |
| Document page rotation | This will rotate the page for viewing. Page rotations are not saved, and rotating a page from its original orientation can lead to inconsistent data extraction results. If multiple pages need to be rotated for extraction, the rotation should be performed outside of CCEM and the document re-submitted to Capture. Alternatively, completed document pages can be permanently rotated or annotated from the ILINX Content Store if licensed for use. |
| About | This will open the About window for CCEM. Information here can be used when logging help tickets with ImageSource Support. |
Assigned Views Window
This window becomes available after clicking Open Document.
> Assigned Views – Available views for the users will be in the drop-down menu at the top of the window. Views are defined in ILINX Capture by an ILINX Administrator and assigned by permissions.
Note: When logging in as an Active Directory (AD) user, you may need to add two registry keys as noted in the article “HTTP 400 Bad Request (Request Header too long) responses to HTTP requests.”
> View search – This allows the user to specify the search criteria for a search. The fields available in the view search are determined by the selected Assigned view. The user can set the search values and operators for the provided fields. Once the needed values are entered, click ‘Search.’
> Reset all search fields – This button is used to reset the search screen to default.
> Search Results – The search results will be displayed below the View search box. Double-click a document, or highlight it and then click ‘Open,’ to open the document to be indexed. The default order of the results is defined by the ILINX Administrator in the view search. The user can temporarily change the order by clicking on a search result column header and selecting the up/down arrow. The next time the search is run, it will default back to the configured sort order.
Document Information Panel
The Document Information Panel is the screen that is opened below the toolbar. From left to right, it contains a vertical toolbar with searching functionality and document page thumbnails, a center display of the current document page, and on the right a panel of document index data. The document index data contains all batch-level index fields, divided between table index fields and non-table index fields. The table index fields store the extracted document table data described under “Table Import Window.” Dragging the borders between the panels lets the user resize each panel to see the information as needed. Double-clicking a border closes the adjacent panel, which can be reopened by single-clicking the “>” symbol shown once it is closed.
Vertical Tool Bar
> Thumbnails – Once a document is selected, thumbnails of the pages will display in the vertical panel. Users may click on a page to select it. On the bottom of the panel, there is a selector button to change the size of the thumbnail page display.
> Areas of Interest – The next icon in the vertical toolbar is the Area of Interest tool. This utility pairs words and/or phrases or text patterns from the document with search criteria defined by the ILINX administrator. The feature marks where these phrases are in the document so that users may quickly navigate to these items within the document. Clicking on the selection in the Area of Interest panel will navigate the user to that area of the document. To ensure correct results with this tool, use the Format Converter IXM option to “Remove searchable PDF text and re-OCR.”
> Search – This function allows users to manually search within the document that has been selected. Users will enter text into the search box, then they may select if the case and/or entire word must match what has been entered in the search box. They may choose to search on the current page or all pages. Once the search criteria are entered, using the enter key or clicking the Search All button will run the search. Results will appear in the bottom portion of the vertical search menu. Find Next and Find Previous buttons will allow users to navigate through the results. Users may also click on the search results to navigate to that specific location in the document.
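The effect of the case, whole-word, and page-scope options can be illustrated with a short Python sketch. This is only an approximation for explanation purposes: the function name `find_matches` and its parameters are hypothetical, and it does not represent how CCEM performs its search internally.

```python
import re

def find_matches(pages, query, match_case=False, whole_word=False, current_page=None):
    """Return (page_number, start_offset) pairs for each hit.

    pages: list of page text strings (index 0 holds page 1).
    current_page: 1-based page number to search, or None to search all pages.
    All names here are hypothetical and used only for illustration.
    """
    pattern = re.escape(query)  # treat the query as literal text
    if whole_word:
        pattern = r"\b" + pattern + r"\b"  # require word boundaries on both sides
    flags = 0 if match_case else re.IGNORECASE
    regex = re.compile(pattern, flags)

    hits = []
    for number, text in enumerate(pages, start=1):
        if current_page is not None and number != current_page:
            continue  # "current page" scope skips every other page
        hits.extend((number, m.start()) for m in regex.finditer(text))
    return hits

pages = [
    "Contract term: 12 months. Terminal fee applies.",
    "The TERM may be extended.",
]
all_hits = find_matches(pages, "term")
whole_word_hits = find_matches(pages, "term", whole_word=True)
```

With the whole-word option on, the hit inside "Terminal" is dropped, which mirrors why that option helps narrow results for short search terms.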
Document Viewer
The document viewer displays the document that has been selected and supports PDF documents only. Navigation in the document can be done by the thumbnails, Areas of Interest (AOI), or the search function on the vertical toolbar, as well as by using the scroll bar in the display panel. This panel is where a user can rubber band select a table area on a page to be extracted into a selected index table.
> Rubber banding – To select data to be extracted, click with the left mouse button and then drag the red rubber band around the table. After the rubber banding is complete, the Table Import window will open. Note that a selection can only be made on one page at a time, i.e., it cannot span pages. It is also important to select an area slightly larger than the data the user wants to extract, i.e., slightly above/below and left/right of the table rows, columns, or borders the user wants to extract from. Selecting too little will exclude some table row or column data.
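The advice to select slightly more than the table itself can be pictured as padding a rectangle and keeping it within the page. The sketch below is illustrative only: `Rect`, `pad_selection`, the margin value, and the coordinate system (origin at the top-left, units in points) are all assumptions, not CCEM settings or behavior.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    """Axis-aligned rectangle; origin at top-left, so top < bottom (assumed)."""
    left: float
    top: float
    right: float
    bottom: float

def pad_selection(table: Rect, page: Rect, margin: float = 5.0) -> Rect:
    """Grow the table bounds by `margin` on every side, clamped to the page.

    A selection cannot span pages, so the padded area never leaves `page`.
    """
    return Rect(
        max(page.left, table.left - margin),
        max(page.top, table.top - margin),
        min(page.right, table.right + margin),
        min(page.bottom, table.bottom + margin),
    )

page = Rect(0, 0, 612, 792)          # hypothetical page size in points
table = Rect(100, 200, 500, 400)     # hypothetical table borders
selection = pad_selection(table, page)
```

A selection equal to `table` risks clipping the outer borders or the first and last rows, while the padded `selection` leaves room around them.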
Table Import Window
This window appears after rubber-banding a table area in the document. The data that was highlighted for extraction will be processed according to the table index field that was selected in the Document Indexes Panel. Alternatively, you can use the “Target table” drop-down to select a different table which will cause the data to be processed again and formatted to that table.
> OCR Data – This displays the raw, original OCR data from the document table. The column drop-down selections in this table allow the user to specify what each column of data contains, which defines how the data will be parsed, split up, and formatted into the data columns of the Preview Table displayed below. As a column-type drop-down selection is made, the Preview Table will be automatically refreshed showing how the change affected the output data and if it has been parsed and formatted into the columns correctly. If a column is not needed, deselect it by selecting the blank row from the drop-down. The drop-down selections for column types and their related data-parsing logic are defined by the ILINX Administrator.
> Preview Table Index – This section displays a preview of the OCR Data after it has been processed and before it is inserted into the target table. The values in this preview can be edited on this screen or later in the Index Panel Target Table. If the data was selected correctly and populates the preview as desired, click OK and it will be added to the Target Table. If the data was not selected properly, or if there are any other issues and this data should not be indexed, click ‘Cancel.’ The user can then rubber band again to reselect the area and reopen this dialog.
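The relationship between the OCR Data column selections and the Preview Table can be sketched as follows. This is a simplified illustration, not the actual parsing logic (which is defined by the ILINX Administrator): the column types "Description" and "Amount", the parser functions, and `build_preview` are all hypothetical.

```python
def parse_amount(raw):
    """Hypothetical parser: strip currency formatting and convert to a number."""
    return float(raw.replace("$", "").replace(",", ""))

# Hypothetical column types mapped to their parsing logic.
PARSERS = {"Description": str.strip, "Amount": parse_amount}

def build_preview(ocr_rows, column_types, parsers=PARSERS):
    """Turn raw OCR rows into preview rows.

    column_types has one entry per OCR column; an empty string stands for
    the blank drop-down row, meaning the column is deselected and skipped.
    """
    preview = []
    for row in ocr_rows:
        out = {}
        for raw, col_type in zip(row, column_types):
            if not col_type:
                continue  # deselected column is left out of the preview
            out[col_type] = parsers[col_type](raw)
        preview.append(out)
    return preview

ocr_rows = [[" Widget A ", "x1", "$1,200.50"], ["Widget B", "x2", "$75.00"]]
column_types = ["Description", "", "Amount"]  # middle column deselected
preview = build_preview(ocr_rows, column_types)
```

Changing an entry in `column_types` and calling `build_preview` again corresponds to the automatic refresh of the Preview Table after a drop-down change.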
Document Indexes Panel
The document indexes panel contains all of the batch-level index fields, divided into non-table index fields at the top and all table index fields tabulated at the bottom. Administrators can customize these fields by editing the document’s Batch Profile from ILINX Capture. The bottom portion allows the user to navigate through each table of data that has been extracted from the document and will be available for export.
In the document header index window, the fields and tables are defined by the project and can be customized based on the needs of the customer. Possible document fields may include Document ID, Dates, Document types, or other fields based on the business use. These are set by the administrator.
In the document table index panel, the user will see each table that has been extracted from the document. The tables are visible as tabs. Clicking on a tab will display the extracted data items in rows. Corrections and changes can be made to the indexed data in different ways. For some of the fields, arrows may provide drop-downs with different selections. The drop-down selections are defined by the administrator. The other choice is to right-click on the row that needs to be changed, which opens a menu.
Right Click Menu:
> Add Row Below will add a new row below the selected Row.
> Add Row Last will add a new row to the end of the table.
> Duplicate Below will duplicate the row selected and insert it below the selected row.
> Duplicate Last will duplicate the row selected and insert it at the end of the table.
> Delete will delete the highlighted row(s) after prompting the user to be certain it should be deleted.
> Clear Contents will clear the contents of the highlighted row(s).
> Update Field Values will open the Update indexes window allowing the user to check the columns that should be updated and then enter the data that should be applied to the highlighted row(s). Columns listed are determined by the table that is being worked on. Press Save & Close after the changes have been entered. Unchecked columns will not update.
> Move up will move the highlighted row(s) up in the table. The keyboard hotkey Ctrl + Up Arrow can also be used.
> Move down will move the highlighted row(s) down in the table. The keyboard hotkey Ctrl + Down Arrow can also be used.
Important Note: Cannot Undo – Please note that anything done using these edit options cannot be undone.
> Multiple Rows – All the above actions can be performed with one or multiple rows (Add, Duplicate, Delete, Update, Move). Standard table selections can be made:
− Block of Rows – Select the first row, then hold the Shift key and select the last row. Alternatively, left-click on the first row and drag the mouse down.
− Individual Rows – Select any row in the table, then press/hold the Ctrl key while selecting other rows.
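The row operations above behave like ordinary list edits. The following Python sketch models a table as a list of dictionaries to show the effect of a few of the menu items on one or several selected rows. The function names and the column names are hypothetical and only mirror the menu labels; this is not CCEM code.

```python
import copy

def add_row_below(rows, index, columns):
    """Insert an empty row directly below the row at `index`."""
    rows.insert(index + 1, {name: "" for name in columns})

def duplicate_last(rows, index):
    """Append a copy of the row at `index` to the end of the table."""
    rows.append(copy.deepcopy(rows[index]))

def move_up(rows, selected):
    """Move each selected row up one position; return the new row indexes."""
    selected = sorted(selected)
    if not selected or selected[0] == 0:
        return selected  # nothing selected, or already at the top
    for i in selected:
        rows[i - 1], rows[i] = rows[i], rows[i - 1]
    return [i - 1 for i in selected]

def update_field_values(rows, selected, checked_values):
    """Apply only the checked columns to the selected rows."""
    for i in selected:
        rows[i].update(checked_values)

rows = [{"Item": "A", "Status": ""}, {"Item": "B", "Status": ""}, {"Item": "C", "Status": ""}]
new_selection = move_up(rows, [1, 2])                        # B and C move above A
update_field_values(rows, new_selection, {"Status": "Reviewed"})
```

In this sketch, as in the application, the functions change the table in place and keep no history, so there is nothing to undo afterwards.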
Comments:
On the bottom of the window is the comments button. This allows users to add comments and notes as needed while processing the document. Comments cannot be modified after they have been added, and they will be visible to any other user who has access to the document.
Saving Your Progress:
> Save Your Work – While indexing a document, users should use the Save button frequently to avoid losing their work. Newly extracted, added, or edited index data is not stored until the document is saved.
> Exporting Extracted Data – After extracting the relevant information into the document index panel, users may export the data into an Excel document or JSON file on their local machine.
> Submitting the Document – After completing all steps of the extraction process, you can save your work and submit the document to the next step in the workflow process by clicking the Submit button.
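To give a sense of what a JSON export of the index data could contain, the sketch below serializes non-table fields and table rows with Python's standard `json` module. The layout is purely hypothetical: the real JSON format is configured by the ILINX Administrator, and the keys `fields` and `tables`, the table name `LineItems`, and the sample values are assumptions made for illustration.

```python
import json

def export_to_json(fields, tables):
    """Serialize batch-level index data to a JSON string (hypothetical layout).

    fields: non-table index fields, e.g. {"Document ID": "C-1001"}.
    tables: table index fields, keyed by table name, each a list of row dicts.
    """
    payload = {"fields": fields, "tables": tables}
    return json.dumps(payload, indent=2)

fields = {"Document ID": "C-1001", "Document Type": "Contract"}
tables = {
    "LineItems": [
        {"Description": "Widget A", "Amount": 1200.5},
        {"Description": "Widget B", "Amount": 75.0},
    ]
}
print(export_to_json(fields, tables))
```

A downstream process could then load such a file with `json.load` and read the table rows directly instead of retyping them.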
Troubleshooting
Document Viewing Messages
When you open a non-PDF document (xlsx, docx, etc.), CCEM will prompt you to Save it or Open it outside of CCEM in its native application. You cannot view or extract data from non-PDF documents within CCEM. You may still view and manually process the document outside of CCEM. You may also manually key data into the document index panel within CCEM and submit/complete the document batch from CCEM. Your Administrator may have also configured an option for you to have the system convert the document to a PDF for extraction. This would most likely appear in your indexing field panel as a checkbox option labeled “Convert to PDF”. You would check this option and then submit the document. The workflow logic would route the document to be converted to a PDF with OCR text and then automatically return it to the work queue to be processed. After allowing a few minutes for the reprocessing to complete, the document should reappear in the search results.
This message indicates that the document was not correctly converted by the system to a PDF with OCR text data. Please note the document and contact your ILINX administrator to review it.
Other Errors
For any of the following please contact your ILINX administrator for review.
> Missing or incorrect Areas of Interest. Note that it may not be possible to consistently find certain types of data from all documents.
> Document table data not extracting into the OCR Data table correctly. Note that it may not be possible to extract data from all types of tables or documents. Document quality and consistency will also impact data extraction, and additional extraction logic may need to be added.
> OCR Data not outputting correctly into the Preview table. Please review each OCR Data column and ensure that the correct data type drop-down is selected for the column. If a column is not needed, deselect it by selecting the blank row from the drop-down. If it is still not outputting correctly, the data may be inconsistent requiring manual processing or additional extraction logic may need to be added.
> Validation Errors during JSON export. Please note or screenshot the errors. Review the related index table data for any missing or incorrect values that need to be corrected before exporting.
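The kind of review described for validation errors, looking for missing values in the index table data, can be sketched in Python as below. This does not reproduce CCEM's validation rules, which are not described here; `find_missing_values`, the notion of required columns, and the sample data are assumptions for illustration.

```python
def find_missing_values(table_rows, required_columns):
    """Return (row_number, column) pairs where a required value is empty.

    Row numbers are 1-based to match how rows are counted on screen.
    Which columns are required is a hypothetical input, not a CCEM setting.
    """
    problems = []
    for row_number, row in enumerate(table_rows, start=1):
        for column in required_columns:
            value = row.get(column)
            if value is None or str(value).strip() == "":
                problems.append((row_number, column))
    return problems

rows = [
    {"Description": "Widget A", "Amount": "1200.50"},
    {"Description": "", "Amount": "75.00"},
    {"Description": "Widget C"},
]
problems = find_missing_values(rows, ["Description", "Amount"])
```

Each reported pair points to a cell to correct in the index table before running the export again.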