OCR and Document Scanning

We get lots of requests for OCR capabilities from our clients and prospects. Often, though, people are not sure what OCR is or why they need it. The ‘why’ is the first question we need to answer if we are going to match you with the best solution for your needs.

OCR, or optical character recognition, is the process of transforming printed text into electronic data that can be used in a computer system. It is processor-intensive work: a computer program examines every pixel of an image, looking for patterns that it recognizes as letters or groups of letters.
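To make the idea of pattern matching concrete, here is a toy sketch in Python. It is not how any commercial OCR engine works. It assumes each character has already been cut out as a tiny 3x5 black-and-white grid, and the letter templates and function names are made up for illustration. The program compares a scanned character to each known letter, pixel by pixel, and picks the closest match.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Each hypothetical template is a 3x5 grid, "1" = dark pixel, "0" = light pixel.
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def matching_pixels(glyph, template):
    """Count the pixels that are identical in the glyph and the template."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the template letter that shares the most pixels with the glyph."""
    return max(TEMPLATES, key=lambda letter: matching_pixels(glyph, TEMPLATES[letter]))


# A "T" with one stray dark pixel, as a speck of dust on the scan might cause.
noisy_t = ["111", "010", "010", "110", "010"]
print(recognize(noisy_t))
```

Even this tiny version has to check every pixel against every known letter, which hints at why running OCR on a full page at scanner resolution takes real computing power.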

OCR technology can do amazing things. It can convert books into e-books, pull only the invoice numbers out of a stack of paper invoices, and convert a 100-year-old contract into an editable computer file. For the most part, our clients use OCR for two main functions:

  • Searchable PDFs - converting a scan into a searchable document
  • Convert to Word - converting a scan into an editable document

A PDF that has been OCR’d into a searchable document can be far more useful than a standard scan. The free Adobe Acrobat Reader can find every instance of a name in a 1,000-page PDF in seconds, saving hours of flipping through pages. It can also find every PDF on a hard drive that contains that name, or find the one invoice with a particular invoice number among thousands of PDFs, all without opening any of them. The time savings can be astounding!
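The reason this kind of search is so fast is that, once OCR has run, each scan carries ordinary text that a computer can look through directly. The short Python sketch below illustrates the idea. It is not how Acrobat works internally, and the file names, the sample text, and the find_documents helper are all invented for the example.

```python
# Hypothetical sample data: file name -> text that OCR extracted from the scan.
ocr_text = {
    "invoice_0001.pdf": "Invoice No. 10457\nBill to: Jane Smith",
    "invoice_0002.pdf": "Invoice No. 10458\nBill to: Acme Supply",
    "contract_1923.pdf": "Agreement between Jane Smith and Acme Supply",
}


def find_documents(term, documents):
    """Return the sorted names of documents whose text contains the term.

    The comparison ignores upper and lower case.
    """
    needle = term.lower()
    return sorted(
        name for name, text in documents.items() if needle in text.lower()
    )


# Every document that mentions a particular name.
print(find_documents("jane smith", ocr_text))

# The one invoice that carries a particular invoice number.
print(find_documents("10458", ocr_text))
```

A plain scan has no text to search, only a picture of the page, so a search like this would come back empty until OCR has been applied.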

The Convert to Word function can be a time saver as well. It eliminates the time spent retyping documents that exist only on paper.

