Cut your paperwork down to size with OCR

Why it's time to give optical character recognition software a second look

ReadIris 12 Pro

Even with the most meticulous filing system and the willpower to store physical documents in their proper places, paperwork can prove a chore that consumes far too much time and space. It's time to go digital.

There are several benefits to doing so. As long as you have an effective back-up routine, your electronic documents will be much safer than their easily damaged paper counterparts, and you can make them available in more than one location: at home and at the office, for example.

Even when you're armed with nothing more than a budget scanner, it's still relatively quick and simple to digitise documents and store them on your computer's hard drive.

If you can get your hands on a competent optical character recognition (OCR) program, however, you can digitise your paperwork far more effectively, and with much more versatile results.

The OCR advantage

At its simplest level, scanning a page of a document gives you a picture of the original. No matter how accurate this is, the approach has distinct limitations. A 20-page document would result in 20 separate image files, each of which would have to be named and viewed independently. Looking for a piece of information in a large document would be both frustrating and tedious.

The main advantage of an OCR program is that the words within the document are analysed and 'read' by the software before being stored as editable text.

Not only can you save multipage documents as single files, but you can also search for specific words or strings of text within the whole document just by using the simple search tools that are available in any word processor.
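To see why recognised text is so much more useful than a stack of page images, here is a minimal Python sketch of that kind of search. It assumes the OCR step has already run and handed back one string of text per page. The `find_in_pages` function and the sample letter are invented for illustration and don't come from any particular OCR package.

```python
def find_in_pages(pages, query):
    """Return (page_number, line) pairs for every line containing the query.

    `pages` is a list of strings, one per scanned page, as an OCR program
    might produce them (an assumption made for this sketch).
    """
    needle = query.casefold()  # case-insensitive matching
    hits = []
    for page_number, text in enumerate(pages, start=1):
        for line in text.splitlines():
            if needle in line.casefold():
                hits.append((page_number, line.strip()))
    return hits


# Hypothetical text recognised from a three-page scanned letter.
pages = [
    "Dear Ms Jones,\nThank you for your invoice dated 3 March.",
    "We have arranged payment by bank transfer.",
    "Please quote Invoice 1042 in any reply.\nYours sincerely",
]

for page_number, line in find_in_pages(pages, "invoice"):
    print(f"page {page_number}: {line}")
```

With plain image scans there is nothing for a search like this to work on. Once the text exists, even a few lines of code can point you to the right page.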

Going one step further, applying OCR technology to scanned paperwork gives you absolute freedom to edit and change text if you want to. The alternative would be to waste an inordinate amount of time retyping the whole document from scratch.

Digital to digital

The idea of being able to convert documents into editable files goes beyond scanning physical paper copies. The latest OCR programs make it just as easy to convert uneditable digital documents from file formats like PDF, TIFF and JPEG. You can simply open the file from within the OCR program and turn passages of text into, say, Microsoft Word or RTF files.

To maintain the integrity of the originals, good OCR programs attempt to read source documents 'intelligently', reproducing the precise placement of pictures, columns of text, captions and other page elements like footers and numbering. Text attributes such as font type and size can also be retained.

Even with the best will in the world, though, it's easy for complex page layouts to lose a little in the translation, so a degree of manual correction is often needed before saving the document.

The final consideration is the accuracy with which the OCR program can 'read' the page. To some extent, this will depend on the quality of the scanner that you use.

Even so, the programs themselves have been steadily improving, to the point where top OCR software developers like Abbyy and Iris are claiming accuracy enhancements of around 30 per cent in their latest programs compared with previous editions. Potentially, this means much less time correcting mistakes, and the best opportunity ever to cut your paperwork down to size.
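If you'd like a rough feel for accuracy on your own documents, one common approach is to compare the recognised text with a hand-checked copy of the same passage and count how many single-character corrections separate the two. The Python sketch below does this with a standard edit-distance calculation. It is only an illustration: the function names are made up for this example, and nothing here says this is how Abbyy or Iris arrive at their own figures.

```python
def edit_distance(a, b):
    """Count the single-character insertions, deletions and substitutions
    needed to turn string a into string b (Levenshtein distance)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, recognised):
    """Return the share of the reference text reproduced correctly,
    as a value between 0.0 and 1.0."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, recognised)
    return max(0.0, 1 - errors / len(reference))


# Hypothetical example: the OCR pass misread two characters.
reference = "cut your paperwork down to size"
recognised = "cut your paperw0rk down to s1ze"
print(f"corrections needed: {edit_distance(reference, recognised)}")
print(f"character accuracy: {character_accuracy(reference, recognised):.1%}")
```

Running the same test page through two programs, or through the same program with two scanner settings, gives you a like-for-like comparison of how much correcting each would leave you to do.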

-------------------------------------------------------------------------------------------------------

First published in PC Plus Issue 282
