The decretal “Consulenti tibi” (JK 293) and the canon of the bible

During the fourth century a change comes over the church, and indeed over the bishop of Rome.  By the end of the century the medieval papacy is coming into existence.  The accession of Pope Damasus was attended with rioting in the streets and in the churches of Rome, as supporters of the candidates sought to impose their man by means of violence; and the lifestyle of Damasus was such that, as St Jerome reports (To Pammachius, against John of Jerusalem, c. 8), the Urban Prefect, Praetextatus, used to say to him:

“Facite me Romanae urbis episcopum, et ego protinus Christianus.”

“Make me bishop of Rome and I will at once become a Christian!”

Various pieces of imperial legislation require the church courts to follow the practices of the secular courts when hearing appeals.  Likewise it is in this period that the church began its collections of canon law, and papal decretals, and the other apparatus of an institution.

My attention was drawn to the letter of Pope Innocent I to Exsuperius of Toulouse in 405, which contains a canon of scripture.  This turns out to be a papal decretal. I have never known anything about these.  Apparently D. Jasper’s paper “Papal letters and decretals from the beginning through the pontificate of Gregory the Great (to 604)”, pp. 7 ff in D. Jasper & H. Fuhrmann, Papal letters in the Early Middle Ages, CUA (2001) is the orientation to read. (Preview).  A decretal is a papal letter, containing a ruling in response to an appeal for such a ruling from a subordinate bishop.

There is a catalogue of decretals, begun by Ph. Jaffé, Regesta pontificum romanorum; the 2nd edition, co-edited with S. Lowenfeld (who did 882-1198 AD), J. Kaltenbrunner (everything up to 590 AD), and P. Ewald (590-882 AD), appeared at Leipzig in two volumes in 1885 (here) and 1888 (here).  This lists all the decretals and gives them a number, with a brief summary of content.

The letter of Innocent to Exsuperius is JK 293 (on p.45, PDF page 86), with the incipit “Consulenti tibi”.  Here’s the entry:

This summarises the content, which falls into several chapters.  The last, as an appendix, gives a list of the canon of scripture, “which books are received in the canon.”

The Latin text of the letter / decretal is in PL20, 495-502, where it is labelled as Letter 6 of Innocent I. Apparently a critical text was given by H. Wurm in 1939, in Hubert Wurm, Decretales selectae ex antiquissimis romanorum Pontificum epistulis decretalibus, in: Apollinaris 12 (1939), 40-93, but this I have not seen; in fact I can’t even find any information about the journal.

A couple of chunks of the decretal were translated by Denzinger, and are online here.  An English translation of chapter 7 is online here with the Latin.

(2). . . It has been asked, what must be observed with regard to those who after baptism have surrendered on every occasion to the pleasures of incontinence, and at the very end of their lives ask for penance and at the same time the reconciliation of communion. Concerning them the former rule was harder, the latter more favorable, because mercy intervened. For the previous custom held that penance should be granted, but that communion should be denied. For since in those times there were frequent persecutions, so that the ease with which communion was granted might not recall men become careless of reconciliation from their lapse, communion was justly denied, penance allowed, lest the whole be entirely refused; and the system of the time made remission more difficult. But after our Lord restored peace to his churches, when terror had now been removed, it was decided that communion be given to the departing, and on account of the mercy of God as a viaticum to those about to set forth, and that we may not seem to follow the harshness and the rigor of the Novatian heretic who refused mercy. Therefore with penance a last communion will be given, so that such men in their extremities may be freed from eternal ruin with the permission of our Savior.

(7) A brief addition shows what books really are received in the canon. These are the desiderata of which you wished to be informed verbally: of Moses five books, that is, of Genesis, of Exodus, of Leviticus, of Numbers, of Deuteronomy, and Joshua, of Judges one book, of Kings four books, and also Ruth, of the Prophets sixteen books, of Solomon five books, the Psalms. Likewise of the histories, Job one book, of Tobias one book, Esther one, Judith one, of the Machabees two, of Esdras two, Paralipomenon two books. Likewise of the New Testament: of the Gospels four books, of Paul the Apostle fourteen epistles, of John three [cf. n. 84, 92], epistles of Peter two, an epistle of Jude, an epistle of James, the Acts of the Apostles, the Apocalypse of John. Others, however, which were written by a certain Leucius under the name of Matthias or of James the Less, or under the name of Peter and John (or which were written by Nexocharis and Leonidas the philosophers under the name of Andrew), or under the name of Thomas, and if there are any others, you know that they ought not only to be repudiated, but also condemned.

Here’s the relevant bit of a (very poor) scan of the PL.

The last sentence is telling:

Data x kalendas Martias, Stilicone secundo et Anthemio viris clarissimis consulibus.

Given on the 10th day before the kalends of March, the noble Stilicho for the second time and Anthemius being consuls.

The sack of Rome by the Goths was a mere 5 years away.


BHL 5955b – the “Miracula in Monte S. Michaelis in Cornubia”

There is a very obscure medieval text, dated to 1262, which is referred to in a couple of modern works as the “Miracula in Monte S. Michaelis in Cornubia” – “The miracles at St Michael’s Mount in Cornwall”.  It is, apparently, listed in the 1986 supplement to the Bibliotheca Hagiographica Latina, “supplementum novum”, published by the Bollandists and still available on the website for no less than 130 euros.  The volume is itself not commonly held, and I have no access to it.  But I understand the author of the BHL supplement assigned the “Miracula” text the reference number of BHL 5955b.

This information I derive from Richard F. Johnson, Saint Michael the Archangel in Medieval English Legend, (2002), p.68, n. 91.  This is a comment on

… Mirk follows the Garganic myth with a rendering of an apparition of St. Michael to “another bishop at a place that is now called Michael’s Mount in Cornwall.”[90] Although there indeed is a tradition of an apparition by St. Michael in Cornwall,[91]…

The footnote is:

91.  The apparition in Cornwall is designated “Miracula in Monte S. Michaelis in Cornubia” (BHL 5955b). On this apparition and St. Michael’s Mount in Cornwall, see G. H. Doble, Miracles at St. Michael’s Mount in Cornwall in 1262 (St. Michael’s Mount, 1945) and J. R. Fletcher, Short History of St. Michael’s Mount (St. Michael’s Mount, 1951).

As printed this footnote can cause quite a bit of confusion.  It would be clearer in this form:

91.  The 13th century text recording healings by St Michael in Cornwall has been given the modern title “Miracula in Monte S. Michaelis in Cornubia” (BHL 5955b). For the text and translation see G. H. Doble, Miracles at St. Michael’s Mount in Cornwall in 1262 (St. Michael’s Mount, 1945).  On St Michael’s Mount see J. R. Fletcher, Short History of St. Michael’s Mount (St. Michael’s Mount, 1951).

For the “Miracula” text itself does NOT in fact record any apparition; instead it records the miraculous healing of three people who came into the church of St Michael.  St Michael does not appear to anyone, unlike the situation alluded to and referenced to Mirk’s Festial which reads (p.258):

He aperet also to another byschop at a place that ys callet now Mychaell yn the mownt yn Corneweyle, and bade hym go to a hullus top that ys fer, and theras he fonde a bull tent wyih theues, ther he bade make a chyrche yn the worschyp of hym.

The Doble item is merely a couple of sheets of paper, with no title page, nor indication of date.  The catalogues that I have seen date it to the 1930s; which is perhaps more likely than 1945.  Thankfully it is online here.  What the footnote does NOT make clear is that it is, in fact, the editio princeps of the “Miracula” text, together with an English translation of it.  In fact it contains nothing else of consequence.  (The Fletcher item is a small hardback, but I have no access to it.)

Let’s look further at the “Miracula” text.  From Doble we learn, by close reading, that he took the text from manuscript Avranches 159, folio 3r, at the foot of the second column.  This manuscript he says contains miscellaneous material, as well as its main text.  The “Miracula” is one such.

But we have an advantage over Canon Doble.  For we live in the age of digital manuscripts.

The surviving manuscripts of the great abbey of Mont S. Michel are now to be found at the public library – Bibliothèque Municipale – at Avranches, where the agents of the French Revolution deposited them.  Doubtless there were many losses.  But their modern heirs have placed the manuscripts online.  Our manuscript may be found here, and you can see the page images by clicking on the binding image at the bottom.

Avranches BM 159 is a 12th century manuscript of the Chronicon Eusebii, plus supplements.  But that work is preceded by three leaves of parchment in a different hand.  Folio 1r has an unreadable paragraph, at least to me; folio 1v starts talking about the books at the abbey of Bec; and fol. 2r, v and f.3r contain a catalogue of the books, giving their titles.  The red splodges seem to be intended to highlight such things as a change of author.  It is quite an impressive collection, for a 12th century abbey.  It is followed by a short paragraph, then more books; and then our text.

Our text is clearly visible on folio 3r.  It has no title, so “Miracula in Monte S. Michaelis in Cornubia” is a modern coinage, presumably by the Bollandist editor.  Doble does not give the work any Latin title; indeed the Latin title is probably just a translation of the English title of his pamphlet!  That this is indeed the same work can be seen by looking at the incipit (the starting words) and explicit (final words) of the text, as printed by Doble; as visible in the manuscript; and as given for BHL 5955b on the Bollandist website, which gives no other details:

Incipit: Nulli monasterio S. Michaelis in Cornubia accedenti
Desinit: …anno Domini MCCLXII, XIII kal. septembris.

So all these items are the same item.

Let’s look at folio 3r:

Avranches 159, fol. 3

Look at the right-hand column.  The top section is just the list of books.  Then there is a blank line, then a chunk of text, then another blank line.  Then a paragraph with red marks, which seems to be additional “libri”. And then, without any blank line, our text begins with a capital N beginning “Nulli…”.  The whole text is contained here, with abbreviations, and ends with “septembris”.

Here is the transcription by Canon Doble:

Nulli monasterio sancti michaelis in Cornubia accedenti vertatur in dubium quin quaedam mulier nomine Christina de partibus glastonie per sex fere annos occulorum luminibus orbata ad dictum monasterium orationis et peregrinationis causa cum maxima deuocione accedens ii ydus maii anno domini m cc lx ii ante magnam missam quadam die dominica in conspectu populi in maxima fide perseuerans intercessione beati archangeli michaelis clausorum recuperauit diuinitus lumen occulorum testibus presentibus quamplurimis religiosis & aliis. Eodem anno iii ydus Junii quedam mulier nomine matildis de parrochia lanescli que per duos dies & duas noctes sensum amiserat & loquelam a parentibus suis ducta ad illud monasterium die dominica statim cum intrasset ecclesiam precibus celestis milicie principis sensui & loquele fuit restituta. Ego vidi & interfui. erat tunc temporis prior illius loci Radulfus viel. Eodem anno quedam iuuencula nomine aalicia de partibus de herefort engales nata per septem annos elapsos occulorum luminibus orbata ad dictam ecclesiam orationis et peregrinationis causa cum maxima deuotione accedens iii i kal. Februarii ante solis ortum quadam die lune in maxima fide persuerans precibus beati michaelis archangeli clausorum recuperauit diuinitus lumen occulorum erant tunc temporis socii illius loci petrus de vallibus eng(elrannus) de baiocis mauricius taboeier quando illa iiii miracula in illa ecclesia acciderant quartum miraculum de quodam muto est in principio huius libri in vii folio anno domini mcclxii xiii kal septembris.

And his translation:

“Let no one going to the Monastery of St. Michael in Cornwall doubt that a certain woman, named Christina, of the neighbourhood of Glastonbury, who had been deprived of the sight of her eyes for about six years, coming with the greatest devotion to the said monastery for the sake of prayer and pilgrimage, on 14th May, 1262, before High Mass, on a certain Sunday, in the sight of the people, persevering in the greatest faith, by the intercession of the Blessed Archangel Michael, recovered miraculously (lit. divinely) the sight of her closed eyes. There were present as witnesses many monks and others.

In the same year, on the 11th June, a certain woman named Matilda, of the parish of Lanescli (Gulval), who for two days and two nights had lost consciousness and the power of speech, being brought by her parents to that monastery, on Sunday, immediately she had entered the church, by the prayers of the Captain of the Heavenly Chivalry, was restored to consciousness and power of speech. I saw it and was present. The Prior of that place then was Ralph Viel.

In the same year a certain girl named Alice, of the parts of Hereford, born in Wales, who for seven years past had been deprived of the sight of her eyes, coming with the greatest devotion to the said church for the sake of prayer and pilgrimage on the 29th of January, before the rising of the sun, on a certain Monday, persevering in the greatest faith, by the prayers of the Blessed Archangel Michael recovered miraculously the sight of her closed eyes. The socii of that place then were Peter De Vallibus, Engelran of Bayeux, Maurice Taboeier, when those four miracles happened in that church.

The fourth miracle, on a certain dumb man, is in the beginning of this book on page 7, in the year of Our Lord 1262, on the 20th August.

Mr Doble adds,

Unfortunately the page containing the record of the fourth miracle has disappeared.

These few leaves at the start of the manuscript evidently were part of a larger volume before being bound in as endleaves to Avranches BM 159.

I hope that anybody in search of “Miracula in Monte S. Michaelis in Cornubia” will find these notes useful.


New publication: Georgi Parpulov’s catalogue of NT catenas

A useful new open-access publication!  Georgi Parpulov has compiled a fresh catalogue of manuscripts containing the medieval chain-commentaries (“catenas”) on the Greek New Testament.  It’s being published by Gorgias Press, here, but a free PDF is available here.  Get it now while it’s hot!

From Gorgias Press:

The book is a synoptic catalogue of a large class of Greek manuscripts: it describes all pre-seventeenth century copies of the Greek New Testament in which the biblical text is accompanied by commentary. Manuscripts where this commentary consists of combined excerpts (catena) from the works of various authors are described in particular detail. Those that have similar content are grouped together, so that the potential relatives of any given manuscript can be easily identified. Several previously unknown types of catenae are distinguished and a number of previously unstudied codices are brought to light for the first time. To ensure its longer shelf-life, the volume systematically references on-line electronic databases (which are regularly updated). It will be of use to anyone interested in Byzantine book culture and in biblical exegesis.

I remember that Eusebius’ Gospel Problems and Solutions included fragments of the work quoted by Nicetas of Heraclea in the catena on Luke.  It was very hard to find source material.  I’ve written before on catenas, and they are a very neglected area.  This catalogue must be of very great value.  Thank you, Dr. P.!

Via: Elijah Hixson at ETC Blog here.


From my diary: the Tertullian Project cleanup

I’ve continued to work on cleaning up the old Tertullian Project website.  I’ve just counted how many HTML pages it includes – the answer is 8,147.  I have been a busy boy, it seems, over the last 24 years.  By chance I came across a page announcing the “tertullian.org” domain – that appeared in 1999, it seems.

Something that I have removed reluctantly is the “counter” that showed the number of hits on each page.  But it had long ceased to work, and the numbers were all wrong anyway.  I gather that such things are often a security threat these days, which I can understand.

I’ve added to every page a long and annoying “meta” tag specifying the “viewport”. The only reason for this is that you get marked down by search engines if it is not there, so everyone is adding it to their pages.  I imagine sooner or later someone will realise the waste involved and get rid of it again.

I’ve started to look at the broken links, of which there are many.  Internal links I can fix.  These must always have been wrong.  External links to now vanished websites are another matter.  One possible solution would be to link to the version of that site archived at Archive.org, but this would be a hugely time-consuming business.  Another would be to remove the link; but this also removes the opportunity for the user to go and find the content at Archive.org.  I suspect that I will have to ignore external breakages.
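For the internal links, a short script can at least produce the list of what is broken.  What follows is only a minimal sketch in plain Python, not the tool I actually used: the function name is made up for illustration, the regular expression is a rough stand-in for a proper HTML parser, and it assumes that internal links are written as relative paths.

```python
import os
import re
from urllib.parse import urlparse, unquote

# Matches href="..." or href='...'; a rough pattern, not a full HTML parser.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

def broken_internal_links(html_text, page_path):
    """Return the relative links in html_text whose target file does not exist.

    page_path is the path of the page the HTML came from; relative links
    are resolved against its directory.  (Hypothetical helper.)
    """
    broken = []
    base_dir = os.path.dirname(page_path)
    for link in HREF_RE.findall(html_text):
        parsed = urlparse(link)
        # Skip external links (http:, mailto:, ...) and pure #fragment links.
        if parsed.scheme or parsed.netloc or not parsed.path:
            continue
        # Skip site-absolute paths such as /index.htm; they depend on the web root.
        if parsed.path.startswith("/"):
            continue
        target = os.path.normpath(os.path.join(base_dir, unquote(parsed.path)))
        if not os.path.exists(target):
            broken.append(link)
    return broken
```

Called on each page in turn, this gives a list of links to fix by hand.  It deliberately says nothing about external links, which need a network check.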

It’s been a week and a half since I began.  The labour is immense, even with scripting tools.  I’ve always preferred to add content rather than worry about technical underpinnings.  I suppose a couple of weeks once in 25 years is not unreasonable.  Maybe I will revisit it again in 25 years.

I’ll continue working on internal links for a day or two, I think, and that will be that.  Whether it will produce better search results is another question.


Peeking through the arch of Constantine – another view of the Meta Sudans

Another photograph care of Roma Ieri Oggi depicts a US actress, Aloha Wanderwell, with husband, in front of the Arch of Constantine in 1928.  The angle is square on to the arch, unusually, so we can see the Meta Sudans particularly clearly through the arch. Nice!


From my diary

Over the last few days I have been working on the static HTML files of the Tertullian Project.  My objective is to improve its metrics in the search engine race, but I have found much else to do. I’ve enabled HTTPS, as seems trendy today (and you get marked down for not having it).

Most of the files were in ANSI format.  Many contained strange characters, a product of the very spotty support for anything but US 7-bit ASCII, even today.  Various meta tags have been removed; others will be added.

At the moment I am grepping the files for non-ASCII characters, whatever they happen to be, and fixing files.  I had not remembered that I have a letter by Lupus of Ferrières online, until my grep informed me that the e-grave in his name was corrupt.  Likewise an old favourite about a “Ramshackle Room on the Banks of the Cam” had mysteriously acquired corruption.

A kind correspondent let me know that, while doing this, I had disabled the Roman cult of Mithras pages.  This was mainly because a symlink had vanished; but I spent a stressful hour this morning before I discovered that a .htaccess file needed to be copied also.

A few days ago one of my backup hard drives started to make a squealing sound while idling.  I took this seriously; any odd noises from hard drives mean that failure is imminent.  What annoyed me was that the drive was only 3 months old; and bought to replace a drive which did the same only 6 months earlier.  So last week I bought a (third!) replacement from Amazon, which was defective on arrival, and went straight back.  I then bought a different drive from a shop locally.  But it is not a trivial process to back up in full the terabytes of data on my PC, and it is still running as I speak.

Ah well.  On with it!


An aerial view of the Colosseum, the Meta Sudans, and the base of the Colossus (1909-25)

Via the amazing Roma Ieri Oggi site, I learn of this interesting aerial photo of the Colosseum and, much more interestingly, the Meta Sudans and the base of the Colossus, the statue of Nero.  It was made between 1909 and 1925.

At the bottom left is the Arch of Constantine.  Above it is the Meta Sudans, the demolished Roman fountain.  And above that is a square pedestal, also ancient, which is the base on which once stood the massive statue of Nero known as the Colossus, from which the Colosseum took its name.  I believe the pedestal was also demolished by Mussolini when he created the Via dell’Impero at the top left of the picture.


When saints disagree: the angry parting of St Epiphanius and St John Chrysostom

John Chrysostom started his career as a popular preacher in Antioch in the late fourth century.  Then he was translated to Constantinople, to take up the role of Patriarch.  This was a highly political role, and whoever held it was the target of intrigue and machinations.  So it was with Chrysostom; and eventually his many enemies got him deposed and exiled, and he died while in exile.

This was not the end of his story.  Once his most bitter foes had passed from the scene, it was decided that Chrysostom was actually the victim here, and he was rehabilitated.  He went on to become the most important of the Greek fathers.  His works are preserved in an enormous number of handwritten copies.

The seedy methods of the intriguers are what they always are, except for one unusual point.  Theophilus, Patriarch of Alexandria, was Chrysostom’s enemy, as every Patriarch of Alexandria was a rival with every Patriarch of Constantinople.  He arranged for a “Synod of the Oak” at which Chrysostom was to be put on trial.  Further, he invited the famous Epiphanius of Salamis to attend.

Epiphanius was by this time an old man.  He is best known today from his catalogue of heresies, the Panarion.  This is invaluable as a guide to these groups, which are often today rather obscure.  But the impression given to many readers is of a rather coarse, not-too-intelligent man, prone to hasty judgements.  Epiphanius had already got involved in the Origenist disputes, which were then just getting underway.  That these were really a pretext for political infighting rather than any genuine doctrinal issue seems to have completely escaped him, as it did many.

So Theophilus got Epiphanius, the heresy hunter, to come to his synod at which he proposed to frame Chrysostom.  Epiphanius came to Constantinople spoiling for a fight.  Chrysostom, wisely, refused to be provoked.  The exact chronology of events is unclear, but it seems that Epiphanius did not in the end attend the synod.  Instead he left Constantinople by ship, intending to return to Cyprus.  We might speculate that the old man had finally realised that he was merely a pawn in someone else’s quarrel, and chose to leave rather than get further involved.

Both Sozomen (H.E. 8, 15:1-7) and Socrates (H.E. 6, 14:1-4) record that a story circulated about the two saints.  Here’s Socrates, in the old NPNF translation here:

Some say that when he was about to depart, he said to John, “I hope that you will not die a bishop”: to which John replied, “Expect not to arrive at your own country.” I cannot be sure that those who reported these things to me spoke the truth; but nevertheless the event was in the case of both as prophesied above. For Epiphanius did not reach Cyprus, having died on board the ship during his voyage; and John a short time afterwards was driven from his see, as we shall show in proceeding.

And here is Sozomen:

I have been informed by several persons that John predicted that Epiphanius would die at sea, and that this latter predicted the deposition of John. For it appears that when the dispute between them was at its height, Epiphanius said to John, “I hope you will not die a bishop,” and that John replied, “I hope you will never return to your bishopric.”

Both spoke truly.  Epiphanius died at sea, and never saw Cyprus again, while Chrysostom died in exile.

Both writers express some doubts about the story.  Subsequent hagiographers play down the dispute, as Young Richard Kim has recently discussed in a fascinating article, “An Iconic Odd Couple: The Hagiographic Rehabilitation of Epiphanius and John Chrysostom”, Church History 87 (2018), 981-1002.[1]

All the same, it is an amusing picture.

[1] doi:10.1017/S0009640718002354

Why “search engine optimisation” is an evil

We all want our words to be heard.  Our carefully crafted essays to be found.  That means that they must be visible in Google.  It is, indeed, for no other reason that I have devoted a couple of days of my life to doing some work on the old Tertullian Project files.

Increasingly it is only commercial sites that a Google search returns.  If you search for some out-of-copyright text, available for nothing online, you must first scroll past half-a-dozen adverts for people offering to sell you that, and then page through bookseller sites.  Google gains revenue if you are foolish enough to buy; but real people lose time and energy and money.

But once you start to look at the techniques needed for search engine optimisation, and the endless tweaks and nudges necessary, a conviction comes over you: that all this is evil.  For who has the time to do all this?

I’ve just seen some stuff telling me how I can improve my hits in this WordPress based blog, in respect of just one “problem”.  I’d have to install two plugins, activate them, and check whether or not they mess anything up.  Not too onerous; but impossible if you just sit at home with a text-editor.

When the WWW started, we were all equals.  We all created our HTML in a text editor like Notepad.  We all got traffic equally.   A corporation had no advantage over a man in a bedroom.  But now… not so.

The people who get the hits are not those who have something original and of value to offer.  They are those with the resources to do all the SEO tweaking necessary.  Effectively it privileges the corporation at the expense of the ordinary man or academic.  For the latter simply cannot keep up with all the effort needed.

I do not know the answer to all this.  But the web is a much different place to what it was.  We now have an effective monopoly in place, no different to the old Bell monopoly.  The events of January 2021 and the coordinated attack on Trump revealed that, for practical purposes, access to the web is controlled by a cartel – Google, Amazon, Facebook, Twitter – who can and do coordinate their control of the internet.

The answer must be the same as in the days of the old Bell monopoly.  It must be broken up.


Converting old HTML from ANSI to UTF-8 Unicode

This is a technical post, of interest to website authors who are programmers.  Read on at your peril!

The Tertullian Project website dates back to 1997, when I decided to create a few pages about Tertullian for the nascent world-wide web.  In those days Unicode was hardly thought of.  If you needed to be able to include accented characters, like àéü and so forth, you had to do so using “ANSI code pages”.  You may believe that you used “plain text”; but it is not very likely.

If you have elderly HTML pages, they are most likely using ANSI.  This causes phenomenal problems if you try to use Linux command line tools like grep and sed to make global changes.  You need to first convert them to Unicode before trying anything like that.

What was ANSI anyway?

But let’s have a history lesson.  What are we dealing with here?

In an ANSI text file, each byte is a single character.  The byte is in fact a number, from 0 to 255.  Our computers display each value as text on-screen.  In fact you don’t need 256 characters for the symbols that appear on a normal American English typewriter or keyboard.  All these can be fitted in the first 128 values (0-127).  To see what value “means” what character, look up the ASCII table.

The values from 128-255 are not defined in the ASCII table.  Different nations, even different companies used them for different things.  On an IBM these “extended ASCII codes” were used to draw boxes on screen!

The different sets of values were unhelpfully known as “code pages”.  So “code page” 437 was the original IBM PC set: ASCII in the lower half, with box-drawing and other symbols in the upper half.  The “code page” 1252 was “Western Europe”, and included just such accents as we need.  You can still see these “code pages” in a Windows console – just type “chcp” and it will tell you what the current code page is; “chcp 1252” will change it to 1252.  In fact Windows used 1252 fairly commonly, and that is likely to be the encoding used in your ANSI text files.  Note that nothing whatever in the file tells you what the author used.  You just have to know (but see below).
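To see why the code page matters, here is a tiny illustration in Python, whose codec names “cp437” and “cp1252” correspond to those two code pages.  It is only a demonstration of the principle: the same single byte comes out as a different character depending on which code page you assume.

```python
# One byte value, two different characters, depending on the code page assumed.
raw = b"\xfc"

as_1252 = raw.decode("cp1252")   # Windows "Western Europe"
as_437 = raw.decode("cp437")     # the old IBM PC / DOS code page

print(repr(as_1252))        # the u-umlaut
print(repr(as_437))         # some other symbol entirely
print(as_1252 == as_437)    # False
```

Nothing in the byte itself says which reading is the right one.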

So in an ANSI file, the “ü” character will be a single byte.

Then Unicode came along.  The encoding of Unicode that prevailed was UTF-8, because, for values of 0-127, it is identical to ASCII.  So we will ignore the other formats.

In a UTF-8 file, letters like the “ü” character are coded as TWO bytes, and other characters may take up to four.  This allows for over a million different characters to be encoded.  Most modern text files use UTF-8.  End of the history lesson.
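The difference between the two encodings is easy to see for yourself.  A small sketch in Python, assuming that the “ANSI” in question is Windows-1252:

```python
text = "ü"

ansi_bytes = text.encode("cp1252")   # the "ANSI" encoding
utf8_bytes = text.encode("utf-8")

print(len(ansi_bytes), ansi_bytes)   # 1 b'\xfc'
print(len(utf8_bytes), utf8_bytes)   # 2 b'\xc3\xbc'

# Plain ASCII text is byte-for-byte identical in both encodings.
print("Tertullian".encode("cp1252") == "Tertullian".encode("utf-8"))  # True

# Reading UTF-8 bytes as if they were ANSI gives the familiar two-character garbage.
print(utf8_bytes.decode("cp1252"))   # Ã¼
```

That last line is the corruption you see in a browser when a UTF-8 page is served with an ANSI charset declaration.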

What encoding are my HTML files using?

So how do you know what the encoding is?  Curiously enough, the best way to find out on a Windows box is to download and use the Notepad++ editor.  This simply displays it at the bottom right.  There is also a menu option, “Encoding”, which will indicate all the possibles, and … drumroll … allow you to alter them at a click.
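If you would rather check from a script, a rough guess is possible, because most ANSI files containing accented characters are not valid UTF-8.  This is only a sketch of that heuristic: the function name is invented, and the fallback to Windows-1252 is an assumption, not something the file can tell you.

```python
def guess_encoding(data: bytes) -> str:
    """Guess whether raw file bytes are ASCII, UTF-8 or (by assumption) Windows-1252."""
    try:
        data.decode("ascii")
        return "ascii"          # nothing above 127, so every encoding agrees
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the usual Windows "ANSI" code page.
        return "cp1252"


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, guess_encoding(f.read()))
```

A short ANSI file can, by bad luck, also be valid UTF-8, so treat the answer as a hint and look at the page in a browser.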

As I remarked earlier, the Linux command line tools like grep and sed simply won’t be serviceable.  The trouble is that these things are written by Americans who don’t really believe anywhere else exists.  Many of them don’t support unicode, even.  I was quite unable to find any that understood ANSI.  I found one tool, ugrep, which could locate the ANSI characters; but it did not understand code pages so could not display them!  After two days of futile pain, I concluded that you can’t even hope to use these until you get away from ANSI.

My attempts to do so produced webpages that displayed with lots of invalid characters!

How to convert multiple ANSI html files to UTF-8.

There is a way to efficiently convert your masses of ANSI files to UTF-8, and I owe my knowledge of it to this StackExchange article here.  You do it in Notepad++.  You can write a macro that will run the editor and just do it.  It runs very fast, it is very simple, and it works.

You install the “Python Script” plugin into Notepad++ that allows you to run a python script.  Then you create a script using Plugins | Python Script | New script.  Save it to the default directory – otherwise it won’t show up in the list when you need to run it.

Mine looked like this:

import os
import re
# Get the base directory
filePathSrc="d:\\roger\\website\\tertullian.old.wip"

# Get all the fully qualified file names under that directory
for root, dirs, files in os.walk(filePathSrc):

    # Loop over the files
    for fn in files:
    
      # Check last few characters of file name
      if fn[-5:] == '.html' or fn[-4:] == '.htm':
      
        # Open the file in notepad++
        notepad.open(root + "\\" + fn)
        
        # Comfort message
        console.write(root + "\\" + fn + "\r\n")
        
        # Use menu commands to convert to UTF-8
        notepad.runMenuCommand("Encoding", "Convert to UTF-8")
        
        # Do search and replace on strings
        # Charset
        editor.replace("charset=windows-1252", "charset=utf-8", re.IGNORECASE)
        editor.replace("charset=iso-8859-1", "charset=utf-8", re.IGNORECASE)
        editor.replace("charset=us-ascii", "charset=utf-8", re.IGNORECASE)
        editor.replace("charset=unicode", "charset=utf-8", re.IGNORECASE)
        editor.replace("http://www.tertullian", "https://www.tertullian", re.IGNORECASE)
        editor.replace('', '', re.IGNORECASE)

        # Save and close the file in Notepad++
        notepad.save()
        notepad.close()

Python marks blocks by indentation with spaces, instead of curly brackets, so the indentation is crucial.

Also turn on the console: Plugins | Python Script | Show Console.

Then run it: Plugins | Python Script | Scripts | your-script-name.

Of course you run it on a *copy* of your folder…

Then open some of the files in your browser and see what they look like.

And now … now … you can use the Linux command line tools if you like.  Because you’re using UTF-8 files, not ANSI, and, if they support unicode, they will find your characters.

Good luck!

Update: Further thoughts on encoding

I’ve been looking at the output.  Interestingly, this does not always work.  I’ve found files converted to UTF-8 where the text has become corrupt.  Doing it manually with Notepad++ works fine.  Not sure why this happens.
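An alternative, which I have not tested against the whole site, is to do the conversion in plain Python outside Notepad++.  The sketch below assumes that every file which is not already valid UTF-8 is Windows-1252; the function names are invented for illustration.  It leaves alone any file that is already UTF-8, so that running it twice cannot convert a file twice.  I do not claim that this is the cause of the corruption above.

```python
import os

def convert_to_utf8(path, fallback="cp1252"):
    """Rewrite one file as UTF-8.  Returns True if the file was changed."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return False          # already valid UTF-8 (or plain ASCII): leave alone
    except UnicodeDecodeError:
        pass
    # errors="replace" marks the few byte values undefined in cp1252 with U+FFFD.
    text = data.decode(fallback, errors="replace")
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
    return True

def convert_tree(base_dir):
    """Convert every .htm/.html file under base_dir; return the changed paths."""
    changed = []
    for root, dirs, files in os.walk(base_dir):
        for fn in files:
            if fn.lower().endswith((".htm", ".html")):
                path = os.path.join(root, fn)
                if convert_to_utf8(path):
                    changed.append(path)
    return changed
```

As before, run it on a copy of the folder.  Unlike the Notepad++ script, it does not touch the charset declarations inside the files.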

I’ve always felt that using non-ASCII characters is risky.  It’s better to convert the Unicode into HTML entities; using &uuml; rather than ü.  I’ve written a further script to do this, in much the same way as above.  The changes need to be case sensitive, of course.

I’ve now started to run a script in the base directory to add DOCTYPE and charset="utf-8" to all files that do not have them.  It’s unclear how to do the “if” test using Notepad++ and Python, so instead I have used a Bash script running in Git Bash, adapted from one sent in by a correspondent.  Here it is, in abbreviated form:

# This section
# 1) adds a DOCTYPE declaration to all .htm files
# 2) adds a charset meta tag to all .htm files before the title tag.

# Read all the file names using a find and store in an array
files=()
find . -name "*htm" -print0 >tmpfile
while IFS= read -r -d $'\0'; do
      #echo $REPLY - the default variable from the read
      files+=("$REPLY")
done <tmpfile
rm -f tmpfile

# Get a list of files
# Loop over them
for file in "${files[@]}"; do

    # Add DOCTYPE if not present
    if ! grep -qi "<!DOCTYPE" "$file"; then
        echo "$file - add doctype"
        sed -i 's|<html>|<!DOCTYPE html>\n<html>|I' "$file"
    fi

    # Add charset if not present
    if ! grep -q "meta charset" "$file"; then
        echo "$file - add charset"
        sed -i 's|<title>|<meta charset="utf-8" />\n<title>|I' "$file"
    fi

done
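For comparison, the same “if” test is straightforward in plain Python run outside Notepad++.  This is a sketch that roughly mirrors the Bash logic above rather than a tested replacement; the function name is made up, and it works on the text of one file that has already been read in.

```python
import re

def add_doctype_and_charset(html: str) -> str:
    """Add a DOCTYPE and a charset meta tag if absent (roughly the Bash logic)."""
    lowered = html.lower()
    if "<!doctype" not in lowered:
        # Insert before the first <html> tag, keeping the tag's original case.
        html = re.sub(r"<html>",
                      lambda m: "<!DOCTYPE html>\n" + m.group(0),
                      html, count=1, flags=re.IGNORECASE)
    if "meta charset" not in lowered:
        # Insert before the first <title> tag, keeping the tag's original case.
        html = re.sub(r"<title>",
                      lambda m: '<meta charset="utf-8" />\n' + m.group(0),
                      html, count=1, flags=re.IGNORECASE)
    return html
```

Like the Bash version, it only recognises a bare `<html>` tag without attributes, and a file that already has both items comes back unchanged.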

Find non-ASCII characters in all the files

Once you have converted to unicode, you then need to convert the non-ASCII characters into HTML entities.  This I chose to do on Windows in Git Bash.  You can find the duff characters in a file using this:

 grep --color='auto' -P -R '[^\x00-\x7F]' works/de_pudicitia.htm

Which gives you:

Of course this is one file.  To get a list of all htm files with characters outside the ASCII range, use this incantation in the base directory, and it will walk the directories (-R) and only show the file names (-l):

grep --color='auto' -P -R -n -l '[^\x00-\x7F]' | grep htm
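If grep is not to hand, the same list can be produced with a few lines of Python.  A minimal sketch, with an invented function name, which reads each file as raw bytes so that the encoding does not matter:

```python
import os

def files_with_non_ascii(base_dir, extensions=(".htm", ".html")):
    """Return the paths under base_dir whose contents include any byte above 0x7F."""
    found = []
    for root, dirs, files in os.walk(base_dir):
        for fn in files:
            if not fn.lower().endswith(extensions):
                continue
            path = os.path.join(root, fn)
            with open(path, "rb") as f:
                data = f.read()
            if any(b > 0x7F for b in data):
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    for path in files_with_non_ascii("."):
        print(path)
```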

Convert the non-ASCII characters into HTML entities

I used a python script in Notepad++, and this complete list of HTML entities.  So I had line after line of

editor.replace('Ë','&Euml;')
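The long list of replace lines could probably be avoided, since Python’s standard library already carries the HTML 4 entity table in html.entities.  The following is a sketch of that alternative in plain Python, not the script I ran: the function name is invented, and it assumes the text has already been decoded from UTF-8.  Characters with no named entity fall back to a numeric reference.

```python
from html.entities import codepoint2name

def to_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                              # ASCII is left untouched
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")  # named entity, e.g. &uuml;
        else:
            out.append("&#%d;" % cp)                    # numeric fallback
    return "".join(out)


print(to_entities("Ëü"))   # &Euml;&uuml;
```

Because it works on code points rather than on search strings, upper and lower case take care of themselves: Ë and ë are simply different characters.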

I shall add more notes here.  They may help me next time.
