Getting manuscript reproductions in the UK – important and useful court judgement?

Via Dr Bendor Grosvenor on Twitter, I learn of an interesting court case about “image fees”.  According to Dr. G, this is very good news for manuscript researchers, and historians in general, and also for those who want to download and post online images of out-of-copyright material.  Here’s his thread:

Those of us who’ve had to pay image fees will know the system relies on museums claiming copyright in their photos – irrespective of whether the art they’re photographing is itself in copyright. (In the UK, copyright lasts for 70 years after the death of the artist).  In other words, a painting by John Constable may be long out of copyright, but taking a photo of it creates a new copyright in that photo. By restricting the taking or sharing of other photos, museums force us to use their own photos for publication, and thus charge large sums.

Copyright is the glue which holds the system together, otherwise, we’d be able to either take a photo from the museum’s website, or use a photo someone else has already paid for. The ‘copyright licence’ we buy prevents us from sharing the image for wider re-use.

In the UK, this copyright claim has for long been contentious. For example, under the 2019 EU Copyright Directive (Article 14), it is not possible to claim copyright in a straightforward reproduction of a work of art which is itself out of copyright (older than 70 years).  The relevant bit of Art. 14: “when the term of protection of a work of visual art has expired, any material resulting from an act of reproduction of that work is not subject to copyright or related rights unless the material resulting from that act of reproduction is original in the sense that it is the author’s own intellectual creation.”

In other words, take a straightforward photo of the Constable painting = no new copyright in your photo. But pose something in front of it, add an extra cow in Photoshop = new copyright in your photo.

For many of us, that EU Directive looked like the end to image fees in the UK – but Brexit happened just before ratification was required in member states.

In the UK, museums and image libraries relied on the UK’s Copyright, Designs and Patents Act 1988, which appeared to give copyright to your photo of the Constable simply because of the effort you took in taking it. This was called the ‘sweat of the brow’ concept.  In other words, you did not need to demonstrate any creative effort, or add any personal touch, to claim your copyright. BUT, since 1988, various EU and UK judgements have eroded the ‘sweat of the brow’ concept.

But the situation was still not entirely clear, until now. In an Appeal Court judgement this November (THJ v Sheridan [2023] EWCA Civ 1354). Here’s the full judgement.

ewca_civ_2023_1354.pdf

(And here (to which I am indebted) is Prof. Eleonora Rosati’s (@eLAWnora) commentary on the judgement.)

https://ipkitten.blogspot.com/2023/11/originality-in-copyright-law-objective.html

Para 16 rules that, for copyright to pertain: ‘What is required is that the author was able to express their creative abilities in the production of the work by making free and creative choices so as to stamp the work created with their personal touch.’

So, taking a straightforward photo does not count, nor does getting the lighting right or other labour of a ‘technical’ kind.

What does this mean for the image fee system which strangles so much art historical scholarship, prevents the public learning about the art they own, and acts as a tax on knowledge? In the UK, it means it’s over.  In fact, because in THJ v Sheridan, the judges said the ‘skill and labour’ test has not been valid *since 2004*, it suggests that all those ‘image licences’ which have been sold relying on copyright have been invalid, and (I suspect?) mis-sold.

Those of us who’ve been campaigning against image fees have been arguing (with hard evidence) that the system doesn’t raise meaningful revenue for museums (and in many cases, costs them money).  But to little avail, as far as museums are concerned. They just carried on charging, insisting they had copyright, which encouraged publishers to insist we kept buying ‘licences’. And now we know that for historic, 2D artworks it’s basically been a scam.

What do we do now? I suppose museums can carry on restricting the availability of decent photos. That’s why Tate’s website only lets us see low-res photos (of the art we own).  But without the glue of copyright, the system must collapse, because there’s nothing to stop images being re-used.  So, if you’re able to take a tolerably good photo of a historic artwork from online for your publication, do so.  Don’t let publishers and journals bully you into buying ‘licences’. Don’t agree to label photos (C) when no copyright exists.  And if you’re a museum director or trustee, think hard about your museum mis-selling licences for the last two decades.

Note that this is clearly downstream of the EU ruling.  This now leaves the USA behind, at least until some public-spirited person clarifies the law there.

The actual court case was about whether a GUI could be copyrighted, so it isn’t really the same thing.  But the case is about “originality” in copyright, and this is what lies behind the claim of museums that a photograph is an “original work” and therefore in copyright. There is discussion of the case on these sites:

UK Court of Appeal rules on copyright in GUIs

Originality in copyright – a review of THJ v Sheridan

Let us hope that the judgement does indeed mean what Dr G. says that it does, and frees up public domain material for the use of us all.  I suspect the foot-dragging will be immense, tho.


More experiments with Amharic and technology

In my last post I found that it was possible to turn a PDF full of images of Amharic text into recognised electronic text using Google Drive, and then get some translation of the results into English using Google Translate.

There were some extremely interesting comments made on the post, which I have been reading.  I have also prepared a PDF of the whole text of the Life of Garima by Yohannes, and run that through the Google Drive process.

Where we started was in trying to read a passage of this text, in which – supposedly – God stopped the sun so that St Garima could copy the bible in one day.  The summary of the work  given by Rossini (instead of a proper translation, drat him), indicates that this was on lines 356-60 of his text, which turns out to be the last line of p.161 and the first three of p.162.  Here they are:

The output from the OCR is good, but you still have to compare the characters carefully.  Errors can often be picked up just by dumping the raw scan output into Google Translate, which shows things like numerals.

Here we have a character that is plainly wrong, and coming out as a numeral “4”.  It looks like an “o” with a hat and two dots under.  The two dots under are legs in another copy of Rossini.

I’m guessing that it’s a “ge” character, from looking at the Wikipedia article, but I can’t be sure. The script isn’t an alphabet, but a syllabary, based on syllables.  Each character is a consonant followed by a  vowel, which makes for a lot more characters.  There’s a table of the characters on the Wikipedia article, consonants down the left, vowels across the top.  I’ve not really looked at this.
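One way to check a doubtful character, rather than matching it by eye against a table, is to ask for its Unicode name. The sketch below uses Python's standard unicodedata module. It assumes that the character the OCR produced is code point U+136C; if it is, Unicode calls it ETHIOPIC DIGIT FOUR, so the "4" from Google Translate may be a reading of an Ethiopic numeral rather than an OCR error. That still needs checking against the printed page.

```python
import unicodedata

def describe(text):
    """Return (char, code point, Unicode name, numeric value) for each non-space character."""
    rows = []
    for ch in text:
        if ch.isspace():
            continue
        name = unicodedata.name(ch, "UNKNOWN")
        value = unicodedata.numeric(ch, None)  # None when the character is not a number
        rows.append((ch, f"U+{ord(ch):04X}", name, value))
    return rows

# Assumed input: U+136C (the doubtful character) followed by U+1361.
sample = "\u136c \u1361"
for row in describe(sample):
    print(row)
```

Pasting a whole line of OCR output into describe() lists every character with its name, which makes stray Roman punctuation easy to spot.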

The Google Translate output is also interesting because of the choice of “detected language” – Tigrinya, rather than Amharic.  If you force it to Amharic, you get a lot less meaning.

One awkward part of using Google Drive to do the OCR is that it doesn’t preserve the line breaks.  That makes comparing the lines more awkward.   So you have to manually do this:

፬ ፡ ወኮነ ፡ በአሐቲ ፡ ዕላት ፡ ወነሥአ ፡ መጽሐፈ ፡ ወቀለመ ፡ ወወጠነ፡
ይጽሐፍ ። ወተንሥአ ፡ ለጸሎት በሰርክ ። ወጸሐፉ ፡ ሎቱ : መላእክት ፡ ወንጌ ለ ፡
በ፬ ፡ ሰዓት ፡ ወትርጓሜሁ ። ወመላእክተ ፡ እግዚአብሔር ፡ ወትረ ፡ ይት ለአክዎ ፡
ወእግዚእነሂ ፡ ክርስቶስ ፡ ያንሶሱ ፡ ምስሌሁ ። ወተሰምዐ ፡ ዜናሁ :
ውስተ ፡ ኵሉ ፡ ሀገር ። ጸሎቱ ፡ ወበረከቱ ፡ የሀሉ ፡ ምስሌነ ።

The Wikipedia article mentioned earlier gave me a list of punctuation marks.  There are two sorts of punctuation visible in here.  The colon mark is actually word division, which means that some words above go over two lines.  I’ve chosen not to split words above.  The double colon mark “::” is the full stop.  Interestingly Google Translate gives different results if you remove the spaces!

Going through the electronic text, removing spaces, I notice that sometimes the word-separator isn’t detected by the OCR.  So I added that in.  Sometimes it put a Roman colon instead, so I replaced that.  Finally I split on sentence:

፬፡ወኮነ፡በአሐቲ፡ዕላት፡ወነሥአ፡መጽሐፈ፡ወቀለመ፡ወወጠነ፡ይጽሐፍ።
ወተንሥአ፡ለጸሎት፡በሰርክ።
ወጸሐፉ፡ሎቱ፡መላእክት፡ወንጌ ለ፡በ፬፡ሰዓት፡ወትርጓሜሁ።
ወመላእክተ፡እግዚአብሔር፡ወትረ፡ይትለአክዎ፡ወእግዚእነሂ፡ክርስቶስ፡ያንሶሱ፡ምስሌሁ።
ወተሰምዐ፡ዜናሁ፡ውስተ፡ኵሉ፡ሀገር።
ጸሎቱ፡ወበረከቱ፡የሀሉ፡ምስሌነ።
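The mechanical part of that clean-up can be sketched in Python. This is only an approximation of the manual steps: it replaces Roman colons with the Ethiopic word separator (U+1361), strips spaces and line breaks, and splits after each Ethiopic full stop (U+1362). It cannot restore a separator that the OCR dropped altogether, so those still have to be added by hand. The function name normalise is made up for this example.

```python
import re

WORDSPACE = "\u1361"  # Ethiopic word separator, looks like a colon
FULL_STOP = "\u1362"  # Ethiopic full stop, looks like a double colon

def normalise(ocr_text):
    """Tidy OCR output and return a list with one sentence per item."""
    text = ocr_text.replace(":", WORDSPACE)          # Roman colon -> Ethiopic separator
    text = re.sub(r"\s+", "", text)                  # drop spaces and line breaks
    text = re.sub(f"{WORDSPACE}+", WORDSPACE, text)  # collapse doubled separators
    # Keep each run of text together with the full stop that ends it.
    return re.findall(f"[^{FULL_STOP}]+{FULL_STOP}?", text)

# Tiny made-up sample: three letters, one Roman colon, two sentences.
raw = "\u1200 \u1361 \u1208 : \u1218 \u1362\n\u1208 \u1361 \u1200 \u1362"
print("\n".join(normalise(raw)))
```

Joining the result with line feeds gives one sentence per line, ready to paste into Google Translate.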

And run it again and I get this:

But this still is not good enough to do much with.  If we didn’t have an idea what the text said, this would not tell us.

All this fiddling about would certainly get you into contact with the language, and start you on a journey to learning it.  But it’s not a good enough translation for other purposes, although intriguing.

One suggestion that was made in the comments to the last article was that ChatGPT gave better results.  The output quoted was indeed produced, and was very smooth and seemed to be a series of liturgical prayers.  But… I don’t think that this is actually the content.  These AI tools are really only an improved version of the text prediction tools you get on messaging on a mobile phone.  So it was pumping out garbage.

Anyway I tried it on this passage, and it crashed GPT very effectively!  At the moment I can’t get any reply of any sort, not even to “hello”.

I don’t think that I will do more here.  Clearly the technology is almost, but not quite, good enough to be useful.


Is it possible to read editions of Amharic texts? An experiment

In my last post I mentioned how the Life of St Garima in Ethiopian was printed by Rossini, but without a translation.  In fact it has never been translated into any modern language, to my knowledge.  I don’t know any Ethiopian, and I doubt that I ever will.

But we live in an age of wonders, when it comes to unfamiliar languages.

So… is it possible to work with Ethiopian language editions, even if you know no Ethiopian?  What about Google Translate?  Ethiopian is in this heavy unfamiliar script.  Is there OCR for this?  If you can scan Rossini’s edition, can you pop it into Google Translate and get the English?

There are two sorts of Ethiopian out there, I know.  There is Ge’ez, or classical Ethiopian; and there is Amharic, the modern dialect.  Rossini printed his text from a 19th century manuscript.  So it seems likely that this is in Amharic.

A quick Google confirmed; Google Translate knows Amharic!  A bit of googling found me an Amharic news website online, here.  I’m using Chrome, so all I had to do was right-click anywhere and select “Translate to English” and the whole website was rendered into some sort of English.  And… it worked!!  Yay me!  It’s obviously not 100%, but it’s way better than 0%!

So what about OCR?  I was sad to see that Abbyy Finereader apparently doesn’t support Amharic.  That’s a blow.  It was developed originally to handle Cyrillic, so it certainly has the capability.  But it’s not offered.  Drat.

A bit of googling brought me to a dubious-looking website here, claiming to offer a selection of tools which could do Amharic OCR.  The prose felt a bit machine-generated, so I worried that it was bunk, or worse, a malicious site.  But the first option was… Google Drive.

I never knew this, but it seems that, if you upload a PDF containing an image of text, and then open it in Drive as a Google Docs document, it OCRs the content.

Well, I thought, let’s give it a try.  So I extracted the first page of Rossini’s edition, using Adobe Acrobat Pro 9 – no flashy latest-edition stuff going on here!  Here’s a pic:

Then I uploaded it, and opened as a Google document.  And … it just treated the Amharic as an image.  Dang!  But I noticed that it did indeed OCR the Italian at the top of the page!

This is supposed to work.  So I thought maybe I should work over the image a bit.  I imported the one-page PDF into Abbyy Finereader 15, and chopped off the Italian at the top, and the critical apparatus at the bottom.  I then used the image editor in Finereader to “whiten the background”.  This can be flaky, but this time it worked fine, and I got a pure white background.   And I got this:

(I’ve just seen the marginal notes, which I need to chop off as well, so I’ll have to go round the loop again)

I exported the image as a PNG, and I used Acrobat again to create a PDF from the image.  Then I uploaded the new PDF to Google Drive, and opened it as a Google Docs document.  And… it worked!  Sort of…

በስመ : አብ : ወወልድ ‘ ወመንፈስ ፡ ቅዱስ ፡ ፩ ፡ አምላከ ፡ ላዕሌሁ ፡ ተወ ከልኩ፡ ወቦቱ ፡ አመንኩ ፡ እስከ ፡ ላዓለመ ፡ ዓለም ፡ አሜን ።

ድርሳን ፡ ዘደረሰ ፡ ቅዱስ ፡ ዮሐንስ ፡ ኤጲስ ፡ ቆጶስ ፡ ዘአክሱም o ፡ በእንተ ዕበዩ ፡ ወክብሩ ፡ ለቅዱስ ፡ ይስሓቅ = ወይቤ ፤ ስምዑ ‘ ወልብዉ ፡ ኦአኀውየ 5 ፍቁራንየ ፡ ዘእነግረከሙ ። ርኢኩ ፡ ብእሲተ ፡ እንዘ ፡ ይዘብጥዋ ፡ ዕራቃ ወእንዘ ፡ ይሀርፉ ፡ ላዕሌሃ ፡ ወላዕለ ፡ እግዝእትነ ፡ ማርያም ፡ እንዘ ፡ ይብሉ በእንተ ፡ ወልዳ ፡ ክርስቶስ ፤ እምብእሲት ፡ ኪያሁ : ኢተወልደ ፣ ይብሉ ፡ እላ ፡ ኢየአምኑ ፡ በክርስቶስ = ወኮንኩ ፡ እንዘ ፡ እረውጽ ፡ ወአኀዝኩ እስዐም ፡ ታሕተ ፡ እገሪሃ ፡ ለይእቲ ፡ ብእሲት ፡ እንዘ ፡ ትብል ፤ እወ ▪በዝ ፡ አንቀጽ ፡ ወፅአ ፡ ንጉሠ ፡ ሰማያት ፡ ወምድር ። ወሶበ ፡ ትብል፡ ከሙዝ ፡ ወ

That’s… rather astonishing.  No idea what all that is, but it looks sort of right.  Let’s bear in mind that Rossini printed his edition in 1897.  This is not a modern typeface.  So this is rather good.

Next step was to paste it into Google Translate.  I set it to auto-detect the language, and pasted in the first bit.  And… it worked.  In fact it gave a really useful transcription into Roman letters as well, which makes it a LOT easier to manipulate the text.

OK, I’m cheating slightly.  The first time I uploaded, the translation ended at “Spirit”.  But this is a Google Translate bug – it sometimes omits the remainder of a sentence.  If you split the text with a line feed, you often get the rest.  And that’s what I did.  I worked out by experiment where I needed to be, and then I got the above.

I don’t quite believe the translation of the second sentence either.  I suspect I need to play with this a bit to work out what each word is.

I notice all those colons between every word.  It might help if I actually looked up the script online!

But I think you’ll agree that this is quite marvellous – I, who know absolutely nothing about the language, am getting something useful out!

Magic!


Why we should use Latin spellings of Greek names

A Twitter thread by @EzhmaarSul from June 11, 2023, made some interesting points about the use in English of spellings like “Nikaia” rather than “Nicaea”. Few will have seen it, and I’ve never seen another public discussion of the subject.  So let’s give it a bit more visibility.

It went as follows:

Something I really hate about modern amateur historians (and which will leak into the professional class as these amateurs achieve doctorates) is the mixing of Greek and Latin spellings of Greek names. I’ve fallen prey to the same because of the ubiquity of amateur historians.

It starts with people wanting to use phonetic spellings of closer to the original Greek.

This urge comes from a deranged, nerdish desire to “well actually” people through text. Not as malicious as BCE, but coming from an adjacent place in petty souls.

“It’s Nikaia, not Nicaea!”

Not only does this look ugly and wrong in modern English, which is based on Latin rules of spelling and grammar, but it betrays a certain philistinism.

Greeks don’t use our alphabet! You’re broadcasting to us, “I don’t know how to pronounce this unless I spell it wrong.”

This also screws up scholarship. We have centuries of scholarship referring to Alexius Comnenus and John Palaeologus. Then along comes some redditor-turned-PHD writing about “Alexius Komnenos” and “Ioannes Palaiologos.”

And they invariably f**k it up.

“Oh Theodoros is obviously Theodore, so I’ll call him Theodore Laskaris in my paper… but Ioannes is exotic! I’ll call him Ioannes even though everyone recognizes it’s the Greek version of ‘John.’”

Don’t get me started on “Constantine.”

Just stick with the Latin and Anglophone spellings, you buffoons.

I think the author has a point. It does look hideous.  It does create a barrier.  It makes Greek history look barbarous.

There is a definite tendency among elites to create barriers for others in order to advance themselves, to order others around while feeling smug.  How else did we end up with printed Latin texts where the useful modern separation of consonant and vowel, of “i”/“j” and “u”/“v”, was actually and deliberately abandoned?  So… I rather agree.


Archivo Pertzii??

Here’s a reference guaranteed to waste the time of a researcher.  It’s from the Bibliotheca Hagiographica Latina:

This is some miracle material associated with the abbey of Brauweiler.  But… what is “Archivo Pertzii”?? I did find out, but it was enough work that I thought I’d put up a blog post, in case I forget and need to google it again.

At the moment, a google search points you right back to the BHL, seemingly the only publication in all history to know of this source.

The italics on Archivo are the key; clearly it’s the abbreviated title of a journal.  I know that German publications often referred to the editor of a journal, especially if he was someone famous, so “Pertzii” is probably the editor.  But you may search for “Archivo” as long as you like.  It’s bad enough with Google.  Imagine the bafflement of a 20th century researcher without it!

I tried “Pertzius”, and kept reading results, and this gave me what I needed.  Apparently this is Georg Heinrich Pertz, whoever he might have been.

I found that he edited the last volume, volume 12, of the “Archiv der Gesellschaft für Ältere Deutsche Geschichtskunde zur Beförderung einer Gesammtausgabe der Quellenschriften deutscher Geschichten des Mittelalters”.  So the “Archivo” is just a Latin ablative of the real name “Archiv”.  The BHL, in translating it into Latin, managed to obscure the sense completely.

Just to make it better, Pertz only edited some volumes.  It wasn’t his “Archiv” anyway.

The journal is online at Digizeitschriften.de, which has a useful page for the whole serial here, but a rather awkward interface to download any of it.  I ended up downloading the 9 pages individually and combining them locally. Then I found there was a button for “PDFs for individual items”, which I struggled with and finally got the same chunk in one file.  Why you can’t just download the volume I can’t imagine.  But I think this is teething troubles.  The site otherwise seemed well organised.

There is another copy online, at digitale-sammlungen.de, here.  But I only found this after more effort.


A new use for the parallel Latin translations in the Patrologia Graeca

Now that we have a very effective Latin translation in Google Translate, it occurs to me that we can also use this to read a great deal of patristic Greek.  For as we all know, the Greek fathers were all translated into Latin at the Renaissance and after, and were nearly always printed with parallel Latin translation, right the way down to the 19th century.

The obvious example of this is Migne’s Patrologia Graeca, our standard reference collection of texts.  It’s never been worth transcribing the Latin side.  But maybe now it is, just as a reading aid for those of us without fluent Greek?

This isn’t a new situation, in a way.  Indeed the reason why all these Latin translations even exist at all, is that knowledge of Greek was always rarer than fluency in Latin.  The translations are not always reliable; but something is better than nothing.

On the other hand it won’t be all that easy to OCR the Latin of Migne…

An excerpt from PG volume 78, column 226, a letter of Isidore of Pelusium in the Migne edition.

The low quality of Migne’s printing is something that we have all struggled with.

But there are workarounds.  The last time that I needed to OCR the Latin of Migne, I went and found the edition that he was reprinting on Google Books.  This, needless to say, was far better printed, and created many fewer errors in Finereader 15.

So it is possible, and it’s worth bearing in mind if we need to work with a large patristic text for which no modern translation exists.  Spend some time creating an electronic text of the Latin translation, and push it through Google Translate!

Update (5 Aug 2023): Note that it is actually possible to copy the OCR’d text from Google Books itself, for both the Greek and Latin sides in the PG.  Go to the page in question.  Hit the cut-and-paste icon so it goes dark grey, then drag a rectangle over the area that you want to copy the text from. As you release the mouse, a dialogue will pop up, and the text is in the top box. It looks as if it’s monotonic for Greek. The results are quite respectable.


A new BHL-type database of Latin hagiographical texts and manuscripts at the IRHT?

It seems that something is going on at the IRHT (= L’Institut de recherche et d’histoire des textes).  For those who do not know, the IRHT is the French manuscripts people.  They do all sorts of very useful things.  But there is no announcement, nor much online.  It’s a new database, designed to allow you to look up a particular Latin hagiographical text, and find what manuscripts it appears in.

We already have something of the sort.  A couple of decades ago, the Bollandist “BHL” index (= Bibliotheca Hagiographica Latina) was turned into the BHLms database.  I use it intensely.  You look up a text by the BHL reference and it will tell you the libraries that have copies in manuscript, the manuscript reference number (= shelfmark), and the pages or folio numbers.  It’s great. But it is obviously old.
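To make the shape of that lookup concrete, here is a small Python sketch of an index keyed by BHL number. It is purely illustrative: the Witness class, its field names and the witnesses() function are invented for the example and say nothing about how BHLms is really built. The few entries are shelfmarks mentioned in another post on this blog, not an export from the database, and the list for BHL 6173 is deliberately partial.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Witness:
    library: str
    shelfmark: str
    folios: Optional[str] = None  # None when no folio range is noted here

# Hypothetical index: entries copied from this blog, not from BHLms itself.
INDEX = {
    "BHL 6173": [  # Austrian witnesses only
        Witness("Heiligenkreuz", "SB 14"),
        Witness("Melk", "SB C.12"),
    ],
    "BHL 6175": [
        Witness("Heiligenkreuz", "SB 14"),
        Witness("Melk", "SB C.12"),
        Witness("KBR (Belgium)", "07487-07491 (3182)", "170v-185v"),
    ],
}

def witnesses(bhl_number):
    """Return the manuscripts listed for a BHL number, or an empty list."""
    return INDEX.get(bhl_number, [])

for w in witnesses("BHL 6175"):
    print(w.library, w.shelfmark, w.folios or "folios not noted")
```

The point is only that the key is the BHL number, and each hit carries library, shelfmark and folios.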

Last night I saw a job advert.  It’s for 6 months, starting in September.  It’s actually in various places, but I saw it here.  Excerpts (Google translate) –

Job offer – Study engineer (M/F). Feeding a database devoted to Latin hagiographic manuscripts

As part of the “Latin Legends” project – the result of a partnership between the IRHT, the Société des Bollandistes (Brussels) and the University of Namur – the IRHT offers a 6-month fixed-term contract to contribute to the data load of a new database dedicated to Latin hagiographic manuscripts.

This database, still in the process of being run in, is intended to list all the manuscript witnesses (medieval and modern) of Latin hagiographic texts identified by the “Bibliotheca Hagiographica Latina” (“BHL”). Once published, it will work in conjunction with the other databases hosted by the IRHT (Medium, Pinakes, Biblissima).

Sounds interesting!

– he will contribute to integrating the data from the Bollandist files (approximately 9000 handwritten files, now digitized on the IRHT servers). If necessary, he will complete the information, and, in case of doubt, will verify its accuracy.  …

The recruited research engineer will work under the scientific direction of Cécile Lanéry-Ouvrard (IRHT Researcher), within the Latin section of the IRHT (Aubervilliers site). He will also be in contact with the other researchers and technicians involved in the “Latin Legendaries” project (in particular Cyril Masset, from the IT department of the IRHT, and Fernand Peloux, researcher at the CNRS in Toulouse). On occasion, he may be required to correspond with the Société des Bollandistes de Bruxelles, or with other researchers specializing in hagiography. 

Part of the work may be carried out remotely, with this restriction, however, that he has permanent access to the tools necessary for the proper performance of his activities (access to IRHT servers, access to bibliographic tools, etc.)

The list of skills is formidable, but this isn’t really an IT role, as one bit amusingly makes plain:

some skills or previous experience in the field of databases and their operation would be welcome, to facilitate exchanges with the IT department of the IRHT, responsible for maintaining the database.

So it basically requires manuscripts skills, rather than hard-core SQL.  I wonder what the database actually is.

A bit of googling reveals another worker on the project, Antoine Charrié-Benoist, who is listed as doing data load for the BHLms database, and gave a paper about the project.

It would be good to know more.  But clearly whatever database is planned will be immensely useful.  Excellent news.


Getting a manuscript offline from the Forschungsbibliothek Gotha

The Gotha collection of manuscripts is less well-known than it should be, except to specialists.  But anybody doing anything with English and Cornish and Welsh saints’ lives is aware of a semi-mythical manuscript in that collection, with the shelfmark “Gotha Forschungsbibliothek Membr. I 81”.  These lives are mainly accessed in an abbreviated recension made by John of Tynemouth and printed as “Nova Legenda Anglie”.  What makes the Gotha manuscript special is that it contains unabbreviated versions of some of this same material.

We live in a period of transition, where archives know that manuscript material ought to be accessible online.  But at the moment most archives have limited IT resources, both of infrastructure and people skills.  It’s important for extremely online people to remember this.  There may well be just one person at the other end.

A lot of Gotha manuscripts are online.  Unfortunately the website was clearly designed by a non-manuscript person (not at all uncommon, this!), and it makes it hard to find what is online.  You can’t search by shelfmark.  If they would just put up a single page with all the manuscripts on, listed by shelfmark, and with a link to each ms, that would solve it.

Last Tuesday, a mere 6 days ago, I decided to write to the library and ask.  From the list of contacts I selected a certain Dr Henrikje Carius, and enquired.  I didn’t get a reply, but the following day I had an email instead from Dr Monika Müller:

Memb. I 81 has been digitized, however, the digital copy has not yet been put online due to the lack of a sufficient catalogue entry. It is provided to put the digital copy online in a project planned for next year. In general, the Research library sells already existing digital scans which not are accessible online for 8 Euro. Please, inform me about how you would like to proceed.

Here we see evidence of a library that is in the transitional period; because it’s hard to see why you would do all the hard work of photography and then not put it on the web, just because of cataloguing.  That’s an old trap that librarians sometimes fall into, because cataloguing is never finished.  All the same this was a very helpful reply.  But clearly we were going to get a version of the old-fashioned labour-intensive manual process that used to happen.

I was wary of the 8 euro charge, trivial as it was.  Accounting for money takes loads of manual labour, more than such a charge would justify.  Anyway I agreed to it, mainly out of curiosity.  The next step was that I was sent a long form in PDF format which was an “estimate”, and asked to complete it.  But also:

My apologies, that I have overlooked one aspect: As the manuscript has 230 folios and therefore the scan 460 images, it takes a lot of time to upload the scan. The library charges fees for this service, i.e. 25 Euro for the scans of Memb. I 81.

I didn’t know it then, but the zip file in question was 10 GB, so it did take a while.  I don’t think I’ve ever been charged for this before, however.  On the other hand, it was not so long ago that a CD would be sent out by post.

The paperwork duly caused problems.  Thankfully this was emailed to me – once, this would have been by post.  That is a step forward.  Unfortunately I was away from home and reading the PDF form on a phone.  I could see no way to enter text.  Emails to and fro.  When I returned home, two days later, I found that the PDF was indeed read-only!   So I printed it off, hand-scribbled my agreement, and scanned it back in and sent it in.  I would guess that I should have been sent a Word .docx file instead.  All transitional stuff.  They need a form online that you can enter the data into.

Once I had emailed the PDF in, things moved swiftly.  Another document in PDF appeared, which luckily I did not have to do anything with.  Then I had to find out just how to send money.  International bank transfer was the sole option.  This is common in the EU, but rarely done outside.  Banks tend to charge 10 euros just for the trouble.  But I was fortunate: since the last time I did this, the banks have introduced ways to do it, and the money went over swiftly.  This morning I received a link to the download – the monster 10 GB file!  This I shall stash on 3 external drives.

Inside the zip were all the pages in TIFF format, each about 30 MB.  I was relieved to find that they were all excellent quality colour photographs.  I opened one in MS Paint and saved it as PNG, and the size dropped to 20 MB.  I then saved it as JPG and the size dropped to… 3 MB.  That’s about the size I would expect.

What I want, of course, is a PDF.  I have the tools to create it, and then I can add bookmarks for the various sections of the manuscript.  So the PDF needs to be a reasonable size.
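A quick back-of-envelope sum shows why the format matters. The sketch below multiplies the per-image sizes from the single test page by the rough count of 460 images. Both inputs are approximations, and a GB is taken as 1000 MB, so treat the totals as ballpark figures only.

```python
def total_size_gb(image_count, mb_per_image):
    """Rough total in GB, taking 1 GB as 1000 MB."""
    return image_count * mb_per_image / 1000

IMAGES = 460  # approximate image count quoted by the library

# Per-image sizes in MB, taken from one test page saved in each format.
for fmt, mb in [("TIFF", 30), ("PNG", 20), ("JPEG", 3)]:
    print(f"{fmt}: about {total_size_gb(IMAGES, mb):.1f} GB")
```

On those figures only the JPEG route gives a PDF of manageable size.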

There are about 460 images in the folder, so I’m not doing that conversion manually.  Instead I used ImageMagick.  Looking at my collection of installers, I’ve not done this since about 2011!  But it all worked fine.  I right-clicked on the folder and opened it in Terminal, and then ran:

mogrify -format jpeg *.tif

This ran extremely fast and, in less than a minute, it had merrily converted every .tif image into a brand new .jpeg file in the same directory.  Whatever the image conversion defaults were (some loss of quality, of course), the JPEG file size was 3 MB each time, and the images looked just as readable for my purposes.  I then fired up Adobe Acrobat Pro 9 – very elderly now, but still working – and combined all the .jpeg files (ignoring endleaves etc) into a PDF.  This itself is a mighty 1.18 GB, but it will serve my purposes very well.
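For anyone who would rather script the conversion than type it, the same mogrify call can be wrapped in a few lines of Python. This is a sketch under assumptions: the function names are made up, the explicit -quality setting of 85 is an arbitrary choice (the run above simply used ImageMagick's defaults), and on some installations the command has to be invoked as "magick mogrify" instead.

```python
import subprocess
from pathlib import Path

def mogrify_command(folder, quality=85):
    """Build the ImageMagick command that converts every .tif in folder to .jpeg."""
    tiffs = sorted(str(p) for p in Path(folder).glob("*.tif"))
    # -format writes new .jpeg files next to the originals; the TIFFs are kept.
    return ["mogrify", "-format", "jpeg", "-quality", str(quality), *tiffs]

def convert(folder, quality=85):
    """Run the conversion if any .tif files exist; return the command either way."""
    cmd = mogrify_command(folder, quality)
    if len(cmd) > 5:  # five fixed arguments, so anything more is a file name
        subprocess.run(cmd, check=True)
    return cmd
```

Calling convert() with the path of the unzipped folder does the whole batch in one go, and raises an error if ImageMagick reports a failure.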

The next step is to use an online set of contents, and create bookmarks.

Thank you, Dr Müller, and the Forschungsbibliothek staff, for what was a far more efficient process than in the past.


Searching for BHL 6173 (part 2)

In my last post, I started searching online for a manuscript copy of BHL 6173, a miracle story about St Nicholas, which has never been printed.  Two French manuscripts were supposed to contain a copy; neither did.  But two Austrian manuscripts were also listed by the Bollandists in their BHLms database:

  • Heiligenkreuz SB 14
  • Melk SB C.12

Both of these abbeys are in Austria, which has a union site for its manuscripts, manuscripta.at.  That is a good idea.  All the fully digitised manuscripts they have can be located here, and then you drill down.  So far, so good.

There are 93 fully digitised mss of Melk online!  That’s great news.  I find that “C 12” is the old shelfmark – the site in fact lists a concordance of Melk shelfmarks here, but it is useless unless you know which catalogue your source was working from – unlikely with an old reference.  But it’s a fine idea in principle.

In fact “Melk C 12” is now Melk 546, online here.  It’s a 15th century manuscript, so very late.  But we don’t care about that.

Unfortunately the manuscripta.at site has been changed since I last looked at it.  It was frankly rather clunky, but it was entirely usable.  It is now rather quicker to find the actual digitised manuscript.  But otherwise the changes are a disaster.  No researcher can work with this.  Negative changes include:

  • Downloads have been disabled, at least for the public, in an attempt to force you to use their online browser.
  • The browser menu has been set up so that Google Translate can’t translate the pop-up menus.  Non-German speakers are not welcome.
  • The menu options cannot even be copied, in case you tried to use Google Translate that way.
  • Clicking on “fol. 40 r” instead displays f.36r.
  • There’s no way to download the page that I want.  Links point to the wrong pages.

Somebody has really set out to make the researcher’s job impossible.  There are good, solid reasons why researchers hate librarians. Stuff like this, that makes your life harder, is the reason why.  This has cost me an hour of pain, and in reality the manuscript will now be omitted from my list of witnesses.

The only part of all this that is actually an improvement is that the “Scroll” option in the browser – which, weirdly, is horizontal – is quick.  You can skim through the pages.  On fol. 40r I do find “Quidam praepotens vir”.  Not that I can download the page, of course.

Luckily for me the amount of text that I want is small, and can be screen grabbed.  Here’s the text of BHL 6173.

It’s not hugely readable, to a layman.  I’ll try transcribing it another time.

Blessedly the manuscript also contains BHL 6175, which I am also looking for.  This is only found in the Melk and Heiligenkreuz manuscripts, plus one in Belgium, KBR 07487-07491 (3182), a 13th century manuscript, where it is somewhere between fol. 170v and 185v.  But that isn’t online.

What about the Heiligenkreuz 14 manuscript?  Sadly not.  Some of the Heiligenkreuz manuscripts are indeed online, but not this one.  [Update, March 21: Heiligenkreuz 14 is indeed now online].

That’s our four manuscripts, and we have a single hit, which luckily contains both unpublished texts.

But although the Bollandists with their BHL, and BHLms database, are the essential reference, they are not the sole source of all truth.  Google searches can reveal things unknown to the excellent fathers.

Doing so led me to a massive monograph online here at Persee.fr, by Sarah Staats, “Le catalogue médiéval de l’abbaye cistercienne de Clairmarais et les manuscrits conservés” (2016).  And on page 64, we learn of a 12th century manuscript, now Saint-Omer 701, which contains part of the Speculum Ecclesiae of Honorius Augustodunensis (who?).  This contains on fol. 121v-122r a “Sermo de sancto Nicolao” (BHL 6173 and 6175).  That manuscript is online and accessible through Mirador.  Here is part of the opening in question!

Which is a nice bonus.  I think we can get a text together using those two witnesses, don’t you?

Have a good weekend, everyone.

From my diary

For some months a copy of Charles W. Jones, Saint Nicholas of Myra, Bari, and Manhattan has sat next to my computer, pestering me to read it.  Today I gave up and fed it to the sheet-feed scanner.  It is no more; just a PDF, floating in the void.  Even as I write, Adobe Acrobat Pro is OCRing it.

I did try.  I really did.  But although the book is full of erudition, it is just so annoying to read.  This is entirely the fault of the author, for departing from normal standards of scholarly writing, and introducing a literary conceit.

Jones pretends that the legend of St Nicholas is like a person, and so his chapters bear annoying and pointless titles such as “Boyhood”, “Maturity”, and so forth.  This neatly conceals the content in a quite amazing way.

But there is worse.  Jones refers to the legend as “N”.  He then writes, in his text, how “N” does this, or that, displays this or that human quality.  It is utterly, utterly wearisome, at least to me, and again obstructs the reader as he tries to work out exactly what is being said.  Jones displays formidable erudition.  But he also displays a tendency to make literary digressions.  Need I add that his footnotes are all banished to an appendix?  And the numbering restarts with each subsection of each chapter?  And that the table of contents does not list those subsections?  To a busy man seeking specific information, such casualness is a burden.

I did try to read through it twice, but gave up.  The last time I did so, I came across a short section which he had translated, so he said, from the Mombritius edition of John the Deacon.  I put a couple of bookmarks in the book, one in the text and one at the back in the notes.  Today I compared that translation with my own text and translation of John.  It was no translation at all, but rather a paraphrase.  No doubt all his translations are the same.  At that point I snapped, and decided that a searchable PDF would be of infinitely more use.  It is gone.

A couple of days ago, a kind correspondent wrote enquiring about the Gotha manuscript I. 81, containing versions of English and Cornish saints’ lives.  This manuscript is described as containing a rather better text than that of John of Tynemouth.  I found a website run by the Gotha collection at Erfurt University.  I was delighted to find that a good solid number of manuscripts were online.  But the website is clearly a first generation effort, constructed by people who never consulted a manuscript in their lives.  It seems to be impossible to find out whether or not a given manuscript is online.

So I wrote and asked if this manuscript was online.  It is not, and not scheduled to go online for a year.  But the photographs already existed; and, for money – seemingly to cover their time – I could have a copy.  I have since been trying to get hold of these.  I get the impression that the library staff are genuinely trying to help.  But the process is much more clunky than it needs to be.  I will probably write something about this, simply as a historical record of what researchers could have to go through in order to access a manuscript, even as late as 2023.

But I am very tolerant of these baby steps by institutions.  The pace of change in their world is breathtaking.  They have limited resources, yet everyone expects everything all at once.  They all have to start somewhere.  Erfurt at least understand that they must move with the times, and are trying.  But the old habits of paperwork die hard!  Still, we have come so far since the days when I was pestering the British Library about these matters.  What I’ve been doing, from a mobile phone, over the last two days, would have been unthinkable a few years ago.
