My workmate has been doing a lot of data extraction from scanned images recently. I know you specified that it shouldn't be a scan, but I don't think there is much difference, is there? The software he used was not part of a scanner suite; it just analysed the image files themselves.
I'll ask my workmate tomorrow. He has had reasonable success with his OCR extraction, pulling text and numbers out of printed data books. The software he used needs some training to recognise particular fonts correctly. Once it had been trained to know that a 'w' is not 'VV', etc., it worked reliably.
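
I don't know how his software handles this internally, so the following is only a rough sketch of a different approach: cleaning up the 'VV' for 'w' confusion after OCR has run, instead of training the recogniser. The function name `fix_confusions` and the rule that 'VV' is only a misread when it touches a lowercase letter are my own assumptions, not anything taken from his tool.

```python
import re

# Assumption: a 'VV' that sits next to a lowercase letter is a misread 'w'.
# All-caps strings such as part numbers are deliberately left alone.
_VV_AS_W = re.compile(r"(?<=[a-z])VV|VV(?=[a-z])")

def fix_confusions(text: str) -> str:
    """Replace 'VV' with 'w' where the surrounding letters are lowercase."""
    return _VV_AS_W.sub("w", text)

if __name__ == "__main__":
    print(fix_confusions("The softVVare VVorked reliably"))
    print(fix_confusions("MODEL VV2"))
```

Note that a word-initial 'VV' comes back as a lowercase 'w', so capitalisation at the start of a sentence would still need fixing by hand.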