In the evenings this week I’ve been scanning some stuff that I can’t really discuss, but I have been very, very impressed with the character recognition quality of ABBYY FineReader 9. It is very nearly perfect, and such an improvement on previous OCR software.
The only thing that I wish it could handle is English translations with embedded accented characters: strange names like `Abu and words like šeikh (=sheikh).
Adobe’s built-in OCR engine gets a lot of (most of?) the accented characters. At least, you can do a search without diacritics and find the words. Spotlight (my external brain) can also find the accented characters within the PDF!
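For anyone wanting the same "search without diacritics" behaviour on their own OCR output, here is a rough sketch in Python. It is not how Acrobat or Spotlight do it (I have no idea what they do internally); the function names are made up for illustration. It simply decomposes each character and throws away the combining marks before comparing.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (hypothetical helper)."""
    # NFD splits a letter such as "š" into "s" plus a combining caron.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_ignoring_diacritics(needle: str, haystack: str) -> bool:
    """True if needle occurs in haystack, ignoring accents and case."""
    return strip_diacritics(needle).casefold() in strip_diacritics(haystack).casefold()
```

Searching for "seikh" would then match "šeikh". It would not match the spelling "sheikh", since that is a different transliteration rather than a dropped accent, and letters that have no decomposition (ø, for instance) pass through unchanged.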
Adobe’s OCR has the added advantage that it reduces one’s scans in size. I think it is actually just running PDF Optimizer on them.
The OCR engine seems to have major memory leaks in it, however. This makes Acrobat 8 and 9 for Mac quite prone to crashing and burning, which really sucks when one is nearly done with a 40-minute OCR job, speaking from this morning’s personal experience.
Perhaps I need to have a play with this. I’m doing a fair bit with accented stuff at the moment.
Mind you, I was deeply impressed again last night with the sheer quality of ABBYY FR9. I started editing a text that I had scanned with FR8, and it was weary stuff. It didn’t *look* much if at all inferior, but it was. I stopped, rescanned with FR9, and suddenly it was so much less work.