Experiments in Arabic OCR

A correspondent has suggested to me the possibility of using Optical Character Recognition (OCR) software to read a portion of al-Makin that was published in the Bulletin d’études orientales 15, back in the 1950s.  I admit that I was dubious, but I’ve spent a little time this evening looking into the matter.

I believe that Adobe Acrobat Pro XI may have a facility to OCR text in Arabic.  Certainly Acrobat Pro 9 does not; at least, my copy doesn’t seem to.  There is discussion at the Adobe forums here.

One product mentioned there was something called Novoverus.  This is supposedly used by the US government.  It comes as no surprise, therefore, that the company website omits any prices and will only deal with customers personally.  However, I did find a site offering it for sale, here, at a cool $1,299!

Fortunately a post on the Adobe forum noted that Abbyy Finereader Pro 11 (FR11) supports Arabic OCR.  This I have.  The user interface of this version of Finereader is buggy, however, and it caused me endless grief while scanning Theodoret’s commentary on Romans.  So I have mostly used an older version.

I’ve installed FR11 (version 10 is not good enough) and it does indeed have an Arabic option: “Arabic (Saudi Arabia)”.

I tried OCR’ing the text on a page of Erpenius.  I didn’t think the results were that great; but then it wasn’t a fair test on a 1625 font!  So I tried again on Cahen’s text.  The result is as follows:

[Image: fr11_arabic]

I don’t think that seems particularly impressive; but perhaps those who can actually read Arabic might comment.
