BeyondRecognition™ Image Based Document Classification

BeyondRecognition™ technology performs these primary functions:

Visual classification 

BeyondRecognition (“BR”) automatically classifies, or clusters, documents based on visual similarity. Documents that look similar to the human eye are clustered together, whether they were stored as native files (e.g., Word, Excel, or PDF) or as scanned paper copies. This step is independent of text recognition or conversion of the underlying documents. Visual classifications can be used to route documents for subsequent processing or to create classification-specific document coding or data extraction rules. They also let project managers know what types of documents a collection contains at the very beginning of a project.
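As a rough illustration of clustering by visual similarity without any text recognition, the sketch below groups tiny grayscale "pages" by comparing a simple average hash. This is not BR's actual method, which the text does not describe; the hash, the greedy grouping, the `max_distance` threshold, and the sample pages are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale pixel rows, values 0-255


def average_hash(img: Image) -> Tuple[int, ...]:
    """One bit per pixel: 1 if the pixel is brighter than the page mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)


def hamming(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Number of positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_by_visual_similarity(images: Dict[str, Image],
                                 max_distance: int = 2) -> List[List[str]]:
    """Greedy grouping: a page joins the first cluster whose representative
    hash is within max_distance bits, otherwise it starts a new cluster."""
    clusters: List[Tuple[Tuple[int, ...], List[str]]] = []
    for name, img in images.items():
        h = average_hash(img)
        for representative, members in clusters:
            if hamming(representative, h) <= max_distance:
                members.append(name)
                break
        else:
            clusters.append((h, [name]))
    return [members for _, members in clusters]


# Hypothetical 4x4 pages: two noisy copies of one layout, plus a different one.
PAGES = {
    "invoice_a": [[0, 0, 0, 0], [255, 255, 255, 255],
                  [255, 255, 255, 255], [0, 0, 0, 0]],
    "invoice_b": [[10, 5, 0, 0], [250, 255, 240, 255],
                  [255, 255, 255, 200], [0, 0, 20, 0]],
    "letter": [[0, 255, 0, 255], [0, 255, 0, 255],
               [0, 255, 0, 255], [0, 255, 0, 255]],
}

CLUSTERS = cluster_by_visual_similarity(PAGES)
```

Here the two invoice pages land in one cluster and the letter in another, even though no text was ever read from any of them.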

Visual data extraction

With BR, extracting specific data elements from the classifications or clusters can be as easy as drawing a box around the BR representation of a document. Without this type of fielded data, users cannot perform field-limited searching, sort output in multiple ways, create formatted reports, or perform mathematical functions on specified data. With BR, end users can create their own data extraction rules; they are not held captive by third-party solutions engineers.
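The idea of a box-based extraction rule can be sketched as follows. This is a minimal sketch, assuming each glyph is stored with page coordinates and that a rule is simply a named rectangle; the `Glyph`, `Box`, `place`, and `extract_fields` names and the sample coordinates are hypothetical and are not BR's interface.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Glyph:
    char: str
    x: float
    y: float


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, g: Glyph) -> bool:
        return self.left <= g.x <= self.right and self.top <= g.y <= self.bottom


def place(text: str, x: float, y: float, step: float = 10) -> List[Glyph]:
    """Helper for the example: lay out text left to right from (x, y)."""
    return [Glyph(c, x + i * step, y) for i, c in enumerate(text)]


def extract_fields(glyphs: List[Glyph], rules: Dict[str, Box]) -> Dict[str, str]:
    """For each named box, join the glyphs inside it in reading order."""
    fields = {}
    for name, box in rules.items():
        inside = sorted((g for g in glyphs if box.contains(g)),
                        key=lambda g: (g.y, g.x))
        fields[name] = "".join(g.char for g in inside)
    return fields


# Hypothetical page: a label and a value on each of two lines.
PAGE = (place("No:", 10, 10) + place("INV-42", 100, 10)
        + place("Year:", 10, 30) + place("2024", 100, 30))

# Rules "drawn" once for the cluster and reusable on every document in it.
RULES = {"invoice_no": Box(90, 5, 200, 15), "year": Box(90, 25, 200, 35)}

FIELDS = extract_fields(PAGE, RULES)
```

Because the rule is tied to a position on the page rather than to the text itself, the same two boxes would apply to any other document in the same visual cluster.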

Text conversion

Using glyph cataloging, BR is able to provide textual representations of documents with initial word and character accuracy levels far exceeding those of legacy optical character recognition systems. The glyph cataloging system supports single-instance, cascading, persistent error correction, which raises word and character accuracy even further and makes brute-force linear editing a thing of the past. A single keystroke can correct ALL instances of a given glyph cluster, sometimes affecting literally millions of characters in hundreds of thousands or millions of documents. What could be better than faster and more accurate?
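One way to picture single-instance, cascading correction is a catalog that maps each glyph cluster to its text once, with documents storing only cluster references. The sketch below works under that assumption; the `GlyphCatalog` class, the cluster IDs, and the sample documents are hypothetical and do not describe BR's internal data model.

```python
from typing import Dict, List


class GlyphCatalog:
    """Maps a glyph cluster ID to the text it represents."""

    def __init__(self) -> None:
        self.text_for_cluster: Dict[int, str] = {}

    def assign(self, cluster_id: int, text: str) -> None:
        """Set or correct the text for one cluster. Every document that
        references the cluster picks up the change on its next render."""
        self.text_for_cluster[cluster_id] = text

    def render(self, glyph_ids: List[int]) -> str:
        """Produce text for a document stored as a list of cluster IDs."""
        return "".join(self.text_for_cluster.get(cid, "?") for cid in glyph_ids)


catalog = GlyphCatalog()
# Initial recognition: cluster 2 (a lowercase "l" shape) was misread as "1".
for cluster_id, text in {1: "c", 2: "1", 3: "a", 4: "i", 5: "m"}.items():
    catalog.assign(cluster_id, text)

# Documents hold references to clusters, not their own copies of the text.
DOCS = {"doc_1": [1, 2, 3, 4, 5], "doc_2": [5, 3, 4, 2]}

BEFORE = {name: catalog.render(ids) for name, ids in DOCS.items()}

catalog.assign(2, "l")  # one correction, applied to the cluster itself

AFTER = {name: catalog.render(ids) for name, ids in DOCS.items()}
```

The correction is made once, on the cluster, and both documents change from "c1aim" and "mai1" to "claim" and "mail" without either document being edited.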


Instances of the same form are automatically placed in the same cluster, where BR can identify the unchanging part of the cluster (the form) and the parts that change from form to form (the data). Without human intervention, BR will be able to create tags based on the form text that appears to the left of, or above, data elements and extract the form data to XML using those tags. Users will be able to modify the tags or map them to a master XML schema.
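The tagging step described above can be sketched as: for each data element, look for form text to its left on the same line, then above it in the same column, and use that text as the XML tag. This is a minimal sketch, assuming labels and data have already been separated and carry page coordinates; the `Item` type, the `tol` alignment tolerance, the tag-cleaning rule, and the sample form are assumptions, not BR's implementation.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Item:
    text: str
    x: float
    y: float


def nearest_label(item: Item, labels: List[Item], tol: float = 2) -> Optional[Item]:
    """Prefer the closest label to the left on the same line,
    then the closest label above in the same column."""
    left = [l for l in labels if abs(l.y - item.y) <= tol and l.x < item.x]
    if left:
        return max(left, key=lambda l: l.x)
    above = [l for l in labels if abs(l.x - item.x) <= tol and l.y < item.y]
    if above:
        return max(above, key=lambda l: l.y)
    return None


def to_tag(label_text: str) -> str:
    """Turn label text into an XML-safe tag name (assumed cleaning rule)."""
    tag = re.sub(r"\W+", "_", label_text.strip().lower()).strip("_")
    if not tag or not (tag[0].isalpha() or tag[0] == "_"):
        tag = "field_" + tag
    return tag


def form_to_xml(labels: List[Item], data: List[Item]) -> str:
    root = ET.Element("form")
    for item in data:
        label = nearest_label(item, labels)
        tag = to_tag(label.text) if label else "untagged"
        ET.SubElement(root, tag).text = item.text
    return ET.tostring(root, encoding="unicode")


# Hypothetical form: one value to the right of its label, one below its label.
LABELS = [Item("Name:", 0, 0), Item("Policy No.", 0, 20)]
DATA = [Item("Ada Lovelace", 50, 0), Item("P-1001", 0, 30)]

XML = form_to_xml(LABELS, DATA)
```

The generated tags (`name`, `policy_no`) are only a starting point; as the text notes, users would be able to rename them or map them to a master schema.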

Additional functionality

BR also provides additional functionality that, depending on the project or process, may be vitally important:

  1. Automated Logical Document Boundary Determinations (“LDBD”): As a byproduct of BR’s ability to visually classify documents, BR is able to determine where document boundaries should be for documents that were scanned without logical document boundaries, i.e., without beginning and ending document tags. By freeing scanner operators from having to insert slip sheets, BR’s LDBD functionality speeds scanning by minimizing prep time and avoiding the need to feed slip sheets through the scanner.
  2. Automatic cascading mass redaction: BR catalogs the coordinates of every glyph, and those coordinates enable BR to provide highly accurate redactions of words or patterns.
  3. Glyph search: BR can locate all instances of a given glyph, whether or not it has a textual representation.
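To illustrate how cataloged glyph coordinates can drive pattern-based redaction, the sketch below matches a regular expression against the text of a line and returns the rectangle covering the matched glyphs. It assumes each glyph maps to exactly one character and that the glyphs are given in reading order on a single line; the `Glyph` fields, `redaction_boxes`, and the sample data are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Glyph:
    char: str
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def redaction_boxes(glyphs: List[Glyph],
                    pattern: str) -> List[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) for every pattern match.

    Character index i in the joined text corresponds to glyphs[i],
    so a match span selects exactly the glyphs to cover.
    """
    text = "".join(g.char for g in glyphs)
    boxes = []
    for match in re.finditer(pattern, text):
        hit = glyphs[match.start():match.end()]
        if not hit:  # skip empty matches
            continue
        boxes.append((min(g.x for g in hit),
                      min(g.y for g in hit),
                      max(g.x + g.w for g in hit),
                      max(g.y + g.h for g in hit)))
    return boxes


# Hypothetical line of glyphs, each 10 units wide and 12 units tall.
LINE = [Glyph(c, i * 10, 0, 10, 12) for i, c in enumerate("SSN 123-45-6789")]

SSN_BOXES = redaction_boxes(LINE, r"\d{3}-\d{2}-\d{4}")
```

The label "SSN" stays visible because the returned rectangle starts at the first digit; only the matched number is covered.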

