Amiga.org

Coffee House => Coffee House Boards => CH / General => Topic started by: motorollin on August 04, 2008, 02:27:58 PM

Title: OCR from image
Post by: motorollin on August 04, 2008, 02:27:58 PM
Does anyone know of any software which can do OCR from an image rather than a scan? I tried this one (http://www.verypdf.com/tif2pdf/tif2pdf.htm#Image-To-PDF-OCR) but it just crashes when I try to run it.

--
moto
Title: Re: OCR from image
Post by: bloodline on August 04, 2008, 02:42:20 PM
Quote

motorollin wrote:
Does anyone know of any software which can do OCR from an image rather than a scan? I tried this one (http://www.verypdf.com/tif2pdf/tif2pdf.htm#Image-To-PDF-OCR) but it just crashes when I try to run it.

--
moto


Email it here and I'll wap it through my Adobe OCR.
Title: Re: OCR from image
Post by: Oliver on August 04, 2008, 02:49:06 PM
My workmate has been doing a lot of data extraction from scanned images recently. I know you specified that it shouldn't be a scan, but I don't think there is much difference, is there? The software analyses the image data. The software he used was not part of a scanner suite, and just operated on the image files.

I'll ask my workmate tomorrow. He has had reasonable success with his OCR extraction. The software he used requires some training to recognise particular fonts correctly. He was doing text and numerical extraction from printed data books. Once the software was trained to know that a 'w' is not 'VV', etc, it worked reliably.
Title: Re: OCR from image
Post by: motorollin on August 04, 2008, 02:55:44 PM
@bloodline
Uploading to my web space. I'll email you a link. Thanks!

@Oliver
The problem is whether or not the software will allow you to select a file to OCR or just do it from a scan. Let's see what Adobe OCR comes up with.

Cheers guys!
Title: Re: OCR from image
Post by: motorollin on August 04, 2008, 02:58:43 PM
Hmm, I've got Acrobat Pro on my Mac and I have just found the "Recognise Text Using OCR" function. But it says my document is too low dpi. Ugh, I'll have to redo all the source images :-x
Title: Re: OCR from image
Post by: motorollin on August 04, 2008, 03:07:09 PM
Just re-created the PDF at a higher resolution and it allowed me to do the OCR, but what came out was garbage. Bloodline, is that what you were planning to do?
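
For anyone following along, here is a rough Python sketch of what the "too low dpi" complaint amounts to: dpi is just pixel count divided by the physical page size. The function names and the 300 dpi target are assumptions for illustration (a common rule of thumb, not Acrobat's documented threshold).

```python
def effective_dpi(pixels_wide: int, pixels_high: int,
                  inches_wide: float, inches_high: float) -> float:
    """Return the lower of the horizontal and vertical dots per inch."""
    if inches_wide <= 0 or inches_high <= 0:
        raise ValueError("page size must be positive")
    return min(pixels_wide / inches_wide, pixels_high / inches_high)


def scale_needed(current_dpi: float, target_dpi: float = 300.0) -> float:
    """Return the resampling factor to reach target_dpi (1.0 if already there).

    target_dpi=300 is an assumed rule of thumb, not a value from any OCR tool.
    """
    if current_dpi <= 0:
        raise ValueError("current dpi must be positive")
    return max(1.0, target_dpi / current_dpi)


if __name__ == "__main__":
    # Hypothetical example: an 800x1100 pixel image of an 8x11 inch page.
    dpi = effective_dpi(800, 1100, 8.0, 11.0)
    print(dpi, scale_needed(dpi))
```

Bear in mind that resampling an existing image to a higher dpi only gets it past the check. It adds no detail that the source did not already have, so the OCR result may not improve.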
Title: Re: OCR from image
Post by: bloodline on August 04, 2008, 03:25:18 PM
Quote

motorollin wrote:
Just re-created the PDF at a higher resolution and it allowed me to do the OCR, but what came out was garbage. Bloodline, is that what you were planning to do?


Yeah, I guess... we use Adobe OCR here at work for various things... I was just gonna throw it through that and see what comes out...
Title: Re: OCR from image
Post by: Oliver on August 04, 2008, 04:09:33 PM
Quote
motorollin wrote:
The problem is whether or not the software will allow you to select a file to OCR or just do it from a scan.


My workmate's software will operate on a file.

btw, does this have to be a freeware operation?
Title: Re: OCR from image
Post by: Oliver on August 05, 2008, 02:04:03 AM
Quote
From my workmate:
I found the best one to be:

ABBYY FineReader Professional v9.0

the second best is:

OmniPage Pro 15 Office (although there may be a more recent version than this by now)

I think the recognition algorithms are fairly comparable in accuracy but I found the ABBYY interface to be a bit more friendly.
Title: Re: OCR from image
Post by: motorollin on August 06, 2008, 09:34:50 PM
@bloodline
Have just sent you a link to the files. If you have time, perhaps you could see what your software makes of it. Thanks!

@Oliver
I'll have a look at those pieces of software. Thanks for checking with your mate.