Author Topic: OCR from image  (Read 2637 times)

Offline motorollin (Topic starter)

  • Hero Member
  • *****
  • Join Date: Nov 2005
  • Posts: 8669
OCR from image
« on: August 04, 2008, 02:27:58 PM »
Does anyone know of any software which can do OCR from an image rather than a scan? I tried this one but it just crashes when I try to run it.

--
moto
Code:
10  IT'S THE FINAL COUNTDOWN
20  FOR C = 1 TO 2
30     DA-NA-NAAAA-NAAAA DA-NA-NA-NA-NAAAA
40     DA-NA-NAAAA-NAAAA DA-NA-NA-NA-NA-NA-NAAAAA
50  NEXT C
60  NA-NA-NAAAA
70  NA-NA NA-NA-NA-NA-NAAAA NAAA-NAAAAAAAAAAA
80  GOTO 10
 

Offline bloodline

  • Master Sock Abuser
  • Hero Member
  • *****
  • Join Date: Mar 2002
  • Posts: 12113
    • http://www.troubled-mind.com
Re: OCR from image
« Reply #1 on: August 04, 2008, 02:42:20 PM »
Quote

motorollin wrote:
Does anyone know of any software which can do OCR from an image rather than a scan? I tried this one but it just crashes when I try to run it.

--
moto


Email it here and I'll wap it through my Adobe OCR.

Offline Oliver

  • Hero Member
  • *****
  • Join Date: Sep 2005
  • Posts: 803
Re: OCR from image
« Reply #2 on: August 04, 2008, 02:49:06 PM »
My workmate has been doing a lot of data extraction from scanned images recently. I know you specified that it shouldn't be a scan, but I don't think there is much difference, is there? The software analyses the image data. The software he used was not part of a scanner suite, and just operated on the image files.

I'll ask my workmate tomorrow. He has had reasonable success with his OCR extraction. The software he used requires some training to recognise particular fonts correctly. He was doing text and numerical extraction from printed data books. Once the software was trained to know that a 'w' is not 'VV', etc, it worked reliably.
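
To illustrate the 'w' versus 'VV' confusion mentioned above: the training described happens inside the OCR package itself, but a similar clean-up can be approximated afterwards on the extracted text. The following is only a rough Python sketch of that swapped-in idea, not how any particular OCR product works. The `CONFUSIONS` table and the `fix_confusions` name are hypothetical, made up for this example.

```python
import re

# Hypothetical confusion table: what the OCR produced -> what was printed.
# Only the 'VV' -> 'w' pair comes from the post above; extend as needed.
CONFUSIONS = {"VV": "w"}


def fix_confusions(text: str) -> str:
    """Replace known OCR confusions, but only inside lowercase words."""
    for wrong, right in CONFUSIONS.items():
        escaped = re.escape(wrong)
        # Match the confused pair only when it touches a lowercase letter,
        # so a standalone "VV" (e.g. a part code) is left alone.
        pattern = rf"(?<=[a-z]){escaped}|{escaped}(?=[a-z])"
        text = re.sub(pattern, right, text)
    return text


if __name__ == "__main__":
    print(fix_confusions("VVorkmate knoVVs"))
```

A lookup like this is much cruder than training the recogniser on the actual font, so treat it as a fallback for cleaning up output rather than a replacement for the training step.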
Good good study, day day up!
 

Offline motorollin (Topic starter)

Re: OCR from image
« Reply #3 on: August 04, 2008, 02:55:44 PM »
@bloodline
Uploading to my web space. I'll email you a link. Thanks!

@Oliver
The problem is whether or not the software will allow you to select a file to OCR or just do it from a scan. Let's see what Adobe OCR comes up with.

Cheers guys!

Offline motorollin (Topic starter)

Re: OCR from image
« Reply #4 on: August 04, 2008, 02:58:43 PM »
Hmm, I've got Acrobat Pro on my Mac and I have just found the "Recognise Text Using OCR" function. But it says my document's dpi is too low. Ugh, I'll have to redo all the source images :-x
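
For anyone hitting the same "too low dpi" message, the arithmetic behind it is simple: dpi is the pixel count along one edge divided by the printed length of that edge in inches. Below is a minimal Python sketch of that calculation. The function names, the 8.5 inch page width and the 300 dpi target are assumptions for illustration only; 300 dpi is a commonly suggested figure for OCR, not a threshold quoted by Acrobat in this thread.

```python
def effective_dpi(pixels: int, inches: float) -> float:
    """Dots per inch of an image edge printed at the given physical length."""
    if inches <= 0:
        raise ValueError("inches must be positive")
    return pixels / inches


def pixels_needed(inches: float, target_dpi: int) -> int:
    """Pixel count along an edge needed to reach target_dpi at that length."""
    if inches <= 0 or target_dpi <= 0:
        raise ValueError("inches and target_dpi must be positive")
    return round(inches * target_dpi)


if __name__ == "__main__":
    page_width_in = 8.5   # assumed page width (US Letter)
    image_width_px = 612  # assumed width of a screen-resolution page image
    print(effective_dpi(image_width_px, page_width_in))  # 72.0
    print(pixels_needed(page_width_in, 300))             # 2550
```

Note that enlarging an existing low-resolution image adds pixels but no detail, so re-exporting from the original source at a higher resolution is usually the better route when that is possible.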

Offline motorollin (Topic starter)

Re: OCR from image
« Reply #5 on: August 04, 2008, 03:07:09 PM »
Just re-created the PDF at a higher resolution and it allowed me to do the OCR, but what came out was garbage. Bloodline, is that what you were planning to do?

Offline bloodline

Re: OCR from image
« Reply #6 on: August 04, 2008, 03:25:18 PM »
Quote

motorollin wrote:
Just re-created the PDF at a higher resolution and it allowed me to do the OCR, but what came out was garbage. Bloodline, is that what you were planning to do?


Yeah, I guess... we use Adobe OCR here at work for various things... I was just gonna throw it through that and see what comes out...

Offline Oliver

Re: OCR from image
« Reply #7 on: August 04, 2008, 04:09:33 PM »
Quote
motorollin wrote:
The problem is whether or not the software will allow you to select a file to OCR or just do it from a scan.


My workmate's software will operate on a file.

btw, does this have to be a freeware operation?

Offline Oliver

Re: OCR from image
« Reply #8 on: August 05, 2008, 02:04:03 AM »
Quote
From my workmate:
I found the best one to be:

ABBYY FineReader Professional v9.0

the second best is:

OmniPage Pro 15 Office (although there may be a more recent version than this by now)

I think the recognition algorithms are fairly comparable in accuracy but I found the ABBYY interface to be a bit more friendly.

Offline motorollin (Topic starter)

Re: OCR from image
« Reply #9 on: August 06, 2008, 09:34:50 PM »
@bloodline
Have just sent you a link to the files. If you have time, perhaps you could see what your software makes of it. Thanks!

@Oliver
I'll have a look at those pieces of software. Thanks for checking with your mate.