Following my last post, I’ve started to look at the PDFs of Bauer’s 1783-5 German translation of Bar Hebraeus’ History of the Dynasties.
It must be said that the Fraktur print is not pleasant to deal with. But it could be very much worse! I’ve seen much worse. Here’s the version from Google Books:
And here is the same page from the MDZ library:
I’ve tried running both through ABBYY FineReader 15 Pro. Curiously, the results are better, on the whole, from the higher resolution MDZ version. I had expected that the bleed-through from the reverse might cause problems – and it may yet! Even more oddly, the OCR on the “Plain Text” version of Google Books is better still.
But there is a problem with using Google Books in plain text mode. There is no way to start part way through the book. You will always be placed at the very start, and you can only navigate by clicking “Next page” or whatever it is. This is not good news if you have 100 pages to click through before you get to where you want to be.
The opening portion of these world chronicles is always a version of the biblical narrative about the creation, followed by material from the Old Testament, combined with apocryphal material. I may be alone here, but I have always found these parts of the narratives unreadable. When I translated Agapius, I started with the time of Jesus, part way through. I did the same with Eutychius. I only did the opening chapters at the end, after I had translated all the way from Jesus to the end of the book. I recall that it felt like wading through glue. I might have given up, except that I had already invested so much time in the project.
Starting in the time of Jesus immediately introduces us to familiar figures. On page 88 of volume 1, the “Sixth Dynasty” starts, with Alexander the Great. It ends on page 98 with Cleopatra. Each section starts with a familiar name, one of the Ptolemies in most cases.
On page 99, dynasty 7 begins, after an introduction, with Augustus. The dynasty ends on p.139 with Justinian. Each ruler gets a paragraph, often only a few sentences.
It’s all do-able, clearly. I’m not sure that I want to get into working on this book seriously, with the St Nicholas project still in mid-air. But it’s not hard work, which is something!
Hi Roger,
If you use the classic Google Books interface instead of the new one, the text appears in chunks of 5 pages, and there is a field at the top left that allows you to jump to any page. Or you can enter it directly in the url, e.g. to page 150:
https://books.google.co.uk/books?redir_esc=y&output=text&id=dBk-AAAAcAAJ&jtp=150
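If you would rather not edit the address by hand each time, the URL above can be assembled in a few lines of Python. This is only a sketch based on the one example URL given here: the function names are my own, and the idea that stepping `jtp` by 5 walks through consecutive chunks is an assumption drawn from the "chunks of 5 pages" behaviour, not something Google documents.

```python
from urllib.parse import urlencode


def plain_text_url(book_id, page, host="books.google.co.uk"):
    """Build a classic-interface plain-text URL that jumps to `page`.

    The parameter names (redir_esc, output, id, jtp) are copied from the
    example URL in this comment; nothing else is assumed about the service.
    """
    params = {"redir_esc": "y", "output": "text", "id": book_id, "jtp": page}
    return f"https://{host}/books?{urlencode(params)}"


def chunk_urls(book_id, first_page, last_page, step=5):
    """List URLs covering a page range, one per chunk.

    ASSUMPTION: each request returns about `step` pages, so stepping the
    jtp value by 5 gives roughly consecutive chunks.
    """
    return [plain_text_url(book_id, p) for p in range(first_page, last_page + 1, step)]


# Page 150 of the Bauer volume, as in the URL above.
print(plain_text_url("dBk-AAAAcAAJ", 150))

# The "Sixth Dynasty" runs from p.88 to p.98, so three chunks should cover it.
for url in chunk_urls("dBk-AAAAcAAJ", 88, 98):
    print(url)
```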
Or better still, you can download the book as an EPUB, which is basically all the same pages put together. Then you convert this to RTF or any other editable format. There are a lot of free tools for this, e.g. onlineconvertfree.com.
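If you prefer not to upload the file to an online converter, the text can also be pulled out locally. An EPUB is a ZIP archive of (X)HTML pages, so the Python standard library is enough for a rough extraction. What follows is a minimal sketch, not a proper EPUB reader: the function name is hypothetical, it assumes the page files sort into reading order by filename (a real reader would follow the spine in the package file), and it keeps every piece of text it finds, including page titles.

```python
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect the text content of an HTML page, discarding the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def epub_to_text(epub_path):
    """Return the plain text of every (X)HTML page in an EPUB.

    ASSUMPTION: sorting the archive entries by name gives the reading
    order. That is a shortcut; the authoritative order is the spine
    listed in the EPUB's package (.opf) file.
    """
    chunks = []
    with zipfile.ZipFile(epub_path) as zf:
        for name in sorted(zf.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextCollector()
                parser.feed(zf.read(name).decode("utf-8", errors="replace"))
                text = "".join(parser.parts).strip()
                if text:
                    chunks.append(text)
    return "\n\n".join(chunks)


# Example use (the filename is a placeholder for your own download):
# with open("bauer_vol1.txt", "w", encoding="utf-8") as out:
#     out.write(epub_to_text("bauer_vol1.epub"))
```

The resulting text file opens in any editor or word processor, which is all that is needed for working through the OCR a paragraph at a time.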
I’ve been using Google Books OCR a lot lately and I agree that the text tends to be better. I suspect the OCR is done on the original high quality color images, rather than the b/w 600 dpi images that Google uses for its PDFs.
Ah thank you! I had not spotted the empty page box at top left! Magic.
That’s a very interesting thought about the EPUB. I will give it a go!
Roger, have you used Rescribe? It is an OCR specifically trained on the sorts of typefaces you find in books from the early modern era. It’s free to use, I believe.
https://rescribe.xyz/
In terms of moving more easily within the uploads on Google Books, I’ll often download as an EPUB (the plain text is much better, and is the same as appears within the reader) and then convert it to a PDF using a free EPUB->PDF converter, if need be.
I cannot say that I have. I must try sometime!