OCR converts printed media into digital form using a scanner. Its key feature, however, is that it produces an editable text file rather than an image file. This means that sections of text can be changed after they have been printed, and old books can be stored in much less space.

OCR is a post-processing step carried out after a document is scanned. It starts off with an image and converts it to a text file. It looks through the image for anything it can recognise as a letter or number. It works by breaking letters and symbols down into their component parts and searching for those parts within the document. The exact process is outside the scope of this book.
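
Although the real process is beyond this book, a toy sketch can give a feel for the idea of matching shapes in an image against known letters. The Python example below is an illustration only and not how real OCR software works: it uses simple pixel-by-pixel template matching rather than breaking letters into component parts. The tiny letter shapes in `TEMPLATES`, the function names and the `scanned` data are all made up for this example.

```python
# Toy templates: each letter is a 3x5 grid of pixels ('#' = ink, '.' = blank).
# These shapes are invented for illustration only.
TEMPLATES = {
    "I": ["###",
          ".#.",
          ".#.",
          ".#.",
          "###"],
    "L": ["#..",
          "#..",
          "#..",
          "#..",
          "###"],
    "T": ["###",
          ".#.",
          ".#.",
          ".#.",
          ".#."],
}


def similarity(cell, template):
    """Count the pixels that agree between a scanned cell and a template."""
    return sum(
        a == b
        for cell_row, template_row in zip(cell, template)
        for a, b in zip(cell_row, template_row)
    )


def recognise_cell(cell):
    """Return the letter whose template agrees with the cell on the most pixels."""
    return max(TEMPLATES, key=lambda letter: similarity(cell, TEMPLATES[letter]))


def recognise_line(image, width=3, gap=1):
    """Split a one-line image into fixed-width cells and recognise each one."""
    letters = []
    for start in range(0, len(image[0]), width + gap):
        cell = [row[start:start + width] for row in image]
        letters.append(recognise_cell(cell))
    return "".join(letters)


# A pretend scan of four letters; the second letter has one stray pixel of "dirt".
scanned = [
    "###.###.#...#..",
    ".#...#..#...#..",
    ".#..##..#...#..",
    ".#...#..#...#..",
    ".#..###.###.###",
]

print(recognise_line(scanned))  # prints TILL
```

Even with one stray pixel, the second letter is still closest to the `I` template, so it is read correctly. With more dirt on the page the closest template could easily be the wrong one.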

This task is very easy for a human but very complicated for a computer, which is why the computer will sometimes get it wrong. The person using OCR must therefore proofread the result before it is used, in case there are any mistakes.
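
As a rough illustration of how software might help with proofreading, the Python sketch below flags words in OCR output that do not appear in a word list. It is a sketch under assumptions, not a feature of any particular OCR program: the `KNOWN_WORDS` list, the `flag_suspect_words` function and the sample text are all hypothetical.

```python
# A tiny, made-up word list; a real checker would use a full dictionary.
KNOWN_WORDS = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}


def flag_suspect_words(text, known_words=KNOWN_WORDS):
    """Return the words in the text that are not in the word list."""
    suspects = []
    for raw_word in text.split():
        # Ignore surrounding punctuation and capital letters when checking.
        word = raw_word.strip(".,;:!?\"'").lower()
        if word and word not in known_words:
            suspects.append(word)
    return suspects


# Pretend OCR output in which two letters were misread as digits.
ocr_output = "The qu1ck brown fox jumps over the lazy d0g."

print(flag_suspect_words(ocr_output))  # prints ['qu1ck', 'd0g']
```

A check like this only points out words that look wrong. It would miss a mistake that happens to produce another real word, such as "form" read in place of "from", so a person still needs to proofread the result.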