I’ve been working on placing Theodoret’s commentary on Romans on the web for a while. I OCR’d it in Abbyy Finereader 11, and I finished proofing the OCR in Finereader before Easter.
Today I tried exporting the text to HTML. It has rather a lot of italics in it, so imagine my fury when I discovered that exporting “formatted” text had lost all the italics! A bit of experimentation revealed that the same happened when saving “formatted” text as .RTF. Only saving “exact text” retained the italics. And you don’t want all the crud that comes with that.
I imagine that it’s just a bug; but it is a frustrating one. I really do not want to reitalicise some 100 pages.
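One possible workaround, which I have not tried against real Finereader output, would be to export as "exact text" after all and then filter the crud out of the resulting HTML, keeping only paragraphs and italics. The Python sketch below uses only the standard library. It assumes that the export marks italics either with `<i>`/`<em>` tags or with a `<span>` whose style attribute contains `font-style:italic`; that is a guess about the export format, not something I have checked. The names `ItalicsOnly` and `keep_italics` are made up for this illustration.

```python
from html import escape
from html.parser import HTMLParser

KEEP = {"p", "i", "em"}                        # tags copied through, minus attributes
SKIP = {"head", "style", "script", "title"}    # tags whose contents are dropped


class ItalicsOnly(HTMLParser):
    """Drop all markup except paragraphs and italics."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0    # > 0 while inside a SKIP element
        self.span_stack = []   # True for each open <span> that was italic

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag == "span":
            # Assumption: italics appear as style="font-style:italic".
            style = (dict(attrs).get("style") or "").replace(" ", "").lower()
            italic = "font-style:italic" in style
            self.span_stack.append(italic)
            if italic:
                self.out.append("<i>")
        elif tag in KEEP:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag == "span":
            if self.span_stack and self.span_stack.pop():
                self.out.append("</i>")
        elif tag in KEEP:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Character references were decoded by the parser; re-escape them.
            self.out.append(escape(data, quote=False))


def keep_italics(html_text):
    """Return html_text with everything but <p>, <i> and <em> stripped."""
    parser = ItalicsOnly()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Calling `keep_italics` on the text of an exported file would return the cleaned HTML as a string. Footnote markers and anything else expressed through other tags would be flattened to plain text, so this is only a starting point.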
Another annoyance was that Finereader now attempts to work out where footnotes are involved, and create its own numeration. In Word this is fine, as inserting and renumbering footnotes is trivial. In HTML, however, it simply creates work that has to be undone.
Finereader does excellent OCR. But I wish they would spend some time getting the product user-tested, really I do.
After some years of experiencing what you are going through, I have stopped upgrading software that I am satisfied with. Otherwise, we end up being testers for releases that have not been adequately regression tested. Who needs it?
Exactly.
For this exercise I have been using Finereader 11 for the first time … and I wish I had not. I wish I had stuck with FR10. Yet one can’t avoid upgrades, because the quality of optical character recognition does improve (unlike the user interface and ancillary functions, which merely change, at best).
I have also been using FrontPage 2003, which I installed so that I could handle Ibn Abi Usaibia and exotic characters. And I find … that in an important respect, I have to do more work. Not a deal-breaker, but annoying. Again, I had to upgrade in order to deal with Unicode.
I haven’t yet worked out how to get my text out of FR11 without having to reapply all the formatting, or being stuck with far too much formatting. Give me time…
Calibre seems to have a lot of different conversions that work well. I know it’s primarily for ebooks, but they’ve got a lot of formats available both for input and output, so you might take a look.
Thanks!