Any glitches?

I’ve just upgraded WordPress, the software that this blog uses.  Do let me know if anything is now broken.

I can’t say that the process was seamless.  First I did an export and deactivated the plugins.  I ended up downloading the tar.gz.  Then I renamed the old version directory, expanded the tar.gz into the new wordpress directory, renamed that to weblog, copied wp-config-sample.php to wp-config.php, and edited it with the credentials from the old directory.  Then I copied across the theme (very important; otherwise I get a blank page).  I also copied across the plugin files.  Then I went to the roger-pearse.com/wp-admin link, activated the plugins, and then tried the main page.  The ‘upgrade’ process just did not work for me.
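For anyone repeating the manual route, the file shuffling above can be sketched in Python.  This is only a sketch under assumptions: the function name `manual_upgrade`, the `-old` suffix for the renamed directory and the `theme` parameter are all made up for illustration, and the credentials in wp-config.php still have to be edited by hand afterwards.

```python
import shutil
import tarfile
from pathlib import Path


def manual_upgrade(web_root, tarball, site_dir="weblog", theme=None):
    """Replay the manual steps: rename, unpack, rename, copy config/theme/plugins.

    Hypothetical helper; returns the path of the renamed old installation.
    """
    web_root = Path(web_root)
    live = web_root / site_dir
    old = web_root / (site_dir + "-old")   # assumed name for the old copy

    live.rename(old)                       # keep the old version to copy from

    # The WordPress tarball unpacks into a directory called "wordpress".
    with tarfile.open(tarball, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(web_root, **kwargs)
    (web_root / "wordpress").rename(live)

    # Start from the sample config; the database credentials from
    # old/wp-config.php must then be typed into it manually.
    shutil.copy(live / "wp-config-sample.php", live / "wp-config.php")

    # Copy the theme (without it the blog shows a blank page) and the plugins.
    if theme is not None:
        shutil.copytree(old / "wp-content" / "themes" / theme,
                        live / "wp-content" / "themes" / theme)
    shutil.copytree(old / "wp-content" / "plugins",
                    live / "wp-content" / "plugins", dirs_exist_ok=True)
    return old
```

Activating the plugins through wp-admin remains a manual step after this.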


Greek words in the first millennium

This post at Vitruvian Design is very timely to a man trying to write some Greek->English translation software.  I can’t comment on it from behind this firewall, so will comment here.

I am delighted to see someone else interested in getting a master list of Greek words and morphologies for the first thousand years.  I must look into this project that is referred to.  The problem, surely, will be patristic Greek; and the answer would be to turn G.W.H.Lampe’s Patristic Lexicon into an XML file, in the same way that Perseus have done for Liddell and Scott.  Someone would have to argue with Oxford, who own the copyright; but for non-commercial use, I expect a license could be negotiated.  Lampe is out of print anyway.

I think that I know why Liddell and Scott give weird accusatives as an extra entry.  The book is designed for manual use, and someone finding an odd word is liable to look it up in that form, rather than under a base form that they do not know.  But such things are unnecessary in a digital file, I agree.

Not all of the files mentioned in the post are known to me.  I know that an XML file of L&S exists in the Perseus Hopper, and also in the Diogenes download.  But I’m not clear where to find the “invaluable list” by Peter Heslin resulting from running the Perseus morphologiser over the TLG disk E.  A morphology file greek.morph.xml is part of the Perseus Hopper download.

The issue of mismatches between this and L&S is quite interesting.  I’d like to follow this more.

But one obvious omission is the New Testament.  The morphology list in MorphGNT is also available; and English meanings in the XML file of Strong’s dictionary.  These too need integrating into the project, I would suggest.

All this work is enormously valuable.  The project is also trying to establish something shockingly fundamental; a list of extant Greek literature!

I’m not sure how I feel about this.  I agree that the task should be undertaken (indeed it’s appallingly hard to find out these things, as I found out when I wanted a list of manuscript traditions), but it seems a digression from the main IT-related task.  They’ve decided to start with poets; again, a minority taste.  I can’t help feeling that this task should be spun off.

The post also introduces me to Epidoc, of which I know little, in the context of converting to and from unicode.  If some way to do this reliably exists, I want it!  More details here.  This is the ‘transcoder’.
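To make the conversion problem concrete, here is a minimal sketch of going from beta-code to unicode in Python.  It is not the Epidoc transcoder and makes no claim to be reliable: the function name `beta_to_unicode` is made up, and it only assumes the basic beta-code conventions (Latin letters for Greek ones, `*` before a capital, and `)` `(` `/` `\` `=` `|` `+` for breathings, accents, iota subscript and diaeresis).  Punctuation, elision marks and the many special beta-code escapes are left alone.

```python
import unicodedata

# Basic beta-code letter mapping (lowercase only; capitals use a leading '*').
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta-code diacritics mapped to Unicode combining characters.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}


def beta_to_unicode(beta):
    """Convert a beta-code string to precomposed (NFC) Unicode Greek."""
    text = beta.lower()
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "*":
            # Capital: any diacritics sit between '*' and the letter.
            j = i + 1
            marks = []
            while j < len(text) and text[j] in MARKS:
                marks.append(MARKS[text[j]])
                j += 1
            if j < len(text) and text[j] in LETTERS:
                out.append(LETTERS[text[j]].upper())
                out.extend(marks)
                i = j + 1
            else:
                out.append(ch)       # malformed input: pass the '*' through
                i += 1
        elif ch == "s" and (i + 1 == len(text) or text[i + 1] not in LETTERS):
            out.append("ς")          # sigma at the end of a word
            i += 1
        elif ch in LETTERS:
            out.append(LETTERS[ch])
            i += 1
        elif ch in MARKS:
            out.append(MARKS[ch])    # combining mark follows its letter
            i += 1
        else:
            out.append(ch)           # spaces, punctuation, anything unknown
            i += 1
    # Compose letter + combining marks into single precomposed characters.
    return unicodedata.normalize("NFC", "".join(out))
```

Calling `beta_to_unicode("lo/gos")` gives λόγος.  Going the other way (unicode back to beta-code) would start from `unicodedata.normalize("NFD", ...)` and reverse the two tables, which this sketch does not attempt.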

All in all, a super post!


Fixed width Greek unicode fonts

I’ve been trying to work with the latest version of Jim Tauber’s MorphGNT text file.  For those who don’t know it, it contains all the words in the Greek New Testament, one per line, each identified as noun/verb/plural/whatever, with the word itself as found in the text, plus the dictionary form of the word.  No English meaning; but that can be got from using the dictionary form to look up the meaning in the XML file of Strong’s dictionary.
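As an illustration of working with such a file, here is a minimal Python sketch that turns one line into a record.  The column layout is an assumption, not taken from the MorphGNT documentation: whitespace-separated fields, starting with a reference, a part-of-speech code and a parsing code, then the word as found in the text, with the dictionary form as the last field.  The `Token` type and `parse_morphgnt_line` are hypothetical names, and the sample line is made up to fit that layout.

```python
from collections import namedtuple

# Hypothetical record type for one word of the text.
Token = namedtuple("Token", "ref pos parsing form lemma")


def parse_morphgnt_line(line):
    """Split one whitespace-separated line into a Token.

    Assumed layout: reference, part of speech, parsing code, word as found
    in the text, (possibly further columns), dictionary form last.
    """
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("expected at least 5 columns, got %d" % len(fields))
    ref, pos, parsing, form = fields[:4]
    return Token(ref, pos, parsing, form, fields[-1])


# Made-up sample line in the assumed layout.
sample = "010101 N- ----NSF- Βίβλος βίβλος"
token = parse_morphgnt_line(sample)
print(token.form, "->", token.lemma)
```

Taking the lemma from the last column rather than a fixed position is a deliberate hedge: it keeps the sketch working if extra columns sit between the word and its dictionary form.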

The Greek used to be present in beta-code, but Jim has now converted to unicode.  That’s fine; except that you now need a font in which to work on it.  As with most text files, you want a fixed-width font.

I suspect Jim does his magic on Linux, where one is available.  But on Windows there is no such free font.  I understand, tho, that the new version of “Courier New” shipped in Vista will do the trick.

I came across this discussion in a typographic forum, where a Microsoft font-person lurks.  It lists some of the possible commercial fonts you could use.


G.W.H.Lampe’s “Patristic Lexicon” – could we get it electronically?

As we get XML versions of Liddell and Scott, etc, we inevitably start to wonder about other standard reference tools, such as Lampe.  A PDF of the raw page images doesn’t really do it, although that is better than carrying a book around.

Of course those as rich and privileged as myself have no problem here.  We just buy a dozen printed copies and place one in each of our homes, plus one in the back of the Rolls. Also, we can get our butler to carry it for us.  But this still leaves rather a lot of other people with a problem.  And… if we had it in electronic form, it would be possible to do interesting things with it.

I found this blog post from somewhere unpronounceable which asked the same question.  And I ask: how do we go about getting an XML version of a copyright text?  One that we can all use in our computer programs?

The book was published in 1961, comprises 1600+ pages, and is published by Oxford University Press who presumably own it.

Could Perseus negotiate some deal?  Could Logos?  How would one do this?

 


Linking electronic Greek words to their English meanings

Ancient Greek is tough for computers, and computer programmers, to work with.  Firstly it’s a dead language, secondly it’s a non-Roman script, and thirdly no-one knows Greek anyway (although a lot of people pretend).

What we need are tools on our computers.  These are appearing, but very slowly.  The problem is the non-availability of data.

Except that data does exist.  For some years the Perseus site has had a very nice electronic edition of Liddell and Scott, and a tool wherein you can put in any Greek word and it will spit out the meaning and the standardised form.  The latter is known as the ‘lemma’, presumably to keep people from understanding. 

Perseus have now made their data available in the Perseus Hopper, which can be downloaded for non-commercial purposes.  Liddell and Scott is in a big XML file. 

Peter Heslin of Durham University has grasped the implications.  Version 3.1 of his Diogenes tool includes this XML file, and another file containing all the possible forms of all the words in the Greek language, their lemma, the part of speech (noun, verb, etc), tense, mood, singular or plural (etc), and most importantly the line number of the full description in the XML file.  This means that you can look up any word, and get a full description; so long as it’s in L&S.  The code is in Perl, and is supplied.  Perl tends to be impenetrable, but this is a relatively well-written example.  So if you want to create your own dictionary program, here are the materials.

But what about post-classical Greek?  Well, there’s the New Testament.  A list of all the words, in order, with part of speech, lemma, etc, was created long ago by James Tauber as MorphGNT.  The site is down at the moment, but the 1 MB text file does exist.

Now this is fine, but useless.  It doesn’t contain the English meaning.  But… Ulrik Sandborg-Petersen has digitised Strong’s dictionary and created an XML file of it.  This contains the Greek lemma, for all words in the New Testament, plus the English meaning and other bits of info of no present concern.  You can see on his site what the data is, by tapping in his demo example.

MorphGNT also contains the lemma.  So this means that if we join the two together, we get all the possible forms of a word in MorphGNT, and the lemma for them; and the lemma plus the meaning in Strong’s.  Effectively, we now have a dictionary of NT Greek, forms, base form, and meanings.  All we have to do is program it.
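The join described above can be sketched in a few lines of Python.  Everything specific here is an assumption for illustration: the XML element names (`entry`, `lemma`, `gloss`) are not the real layout of the Strong’s file, the glosses are made-up samples rather than quotations from Strong’s, and the form/lemma pairs stand in for what would be read out of MorphGNT.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Made-up stand-in for the Strong's XML file (hypothetical element names).
STRONGS_SAMPLE = """<dictionary>
  <entry><lemma>λόγος</lemma><gloss>word, saying</gloss></entry>
  <entry><lemma>θεός</lemma><gloss>God, a god</gloss></entry>
</dictionary>"""

# Made-up stand-in for (form, lemma) pairs extracted from MorphGNT.
MORPHGNT_SAMPLE = [
    ("λόγος", "λόγος"),
    ("λόγου", "λόγος"),
    ("θεόν", "θεός"),
    ("ἦν", "εἰμί"),      # lemma deliberately absent from the sample XML
]


def nfc(text):
    """Normalise so that the same Greek word always compares equal."""
    return unicodedata.normalize("NFC", text)


def load_glosses(xml_text):
    """Map each lemma in the dictionary XML to its English meaning."""
    root = ET.fromstring(xml_text)
    return {nfc(e.findtext("lemma")): e.findtext("gloss")
            for e in root.iter("entry")}


def build_dictionary(form_lemma_pairs, glosses):
    """Join forms to meanings via the lemma: form -> (lemma, meaning or None)."""
    dictionary = {}
    for form, lemma in form_lemma_pairs:
        lemma = nfc(lemma)
        dictionary[nfc(form)] = (lemma, glosses.get(lemma))
    return dictionary


nt_dictionary = build_dictionary(MORPHGNT_SAMPLE, load_glosses(STRONGS_SAMPLE))
print(nt_dictionary[nfc("λόγου")])
```

Two caveats on the design.  A plain dict keeps only one lemma per form, so a form that can belong to two different lemmas would need a list of candidates instead.  And the unicode normalisation step matters in practice, because two files can encode the same accented letter differently and then fail to match.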

What about other, non-classical Greek literature?  Somewhere around is a Septuagint in electronic form, with lemmas.  This can be referenced either against the meaning in Strong’s, or that in Liddell and Scott.  How many words appear in neither?  I don’t know, but it would be interesting to find out.  Mostly names, I would guess.
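Once the lemmas are in memory, counting the words that appear in neither dictionary is a set difference.  A minimal Python sketch, with a hypothetical function name and made-up sample lemmas standing in for the real Septuagint, Strong’s and Liddell and Scott data:

```python
import unicodedata


def uncovered_lemmas(text_lemmas, *dictionaries):
    """Return the lemmas of a text that none of the dictionaries contains."""
    def nfc(word):
        return unicodedata.normalize("NFC", word)

    known = set()
    for dictionary in dictionaries:
        known.update(nfc(lemma) for lemma in dictionary)
    return sorted({nfc(lemma) for lemma in text_lemmas} - known)


# Made-up sample data: two tiny "dictionaries" and the lemmas of a text.
strongs = {"λόγος", "θεός"}
liddell_scott = {"θεός", "ἄνθρωπος"}
text = ["λόγος", "ἄνθρωπος", "Ἀβραάμ", "θεός", "Ἀβραάμ"]

missing = uncovered_lemmas(text, strongs, liddell_scott)
print(len(missing), missing)
```

With this sample data the only lemma left over is the name Ἀβραάμ, which is the kind of result the guess about names would predict; the real figures would need the real files.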

Every lemmatized Greek text can now be a source of data to this process of creating as large an electronic Greek dictionary as we like.  And, of course, we need more dictionaries of lemmas-plus-English-meaning.  What others could be done, I wonder? 

I’ve just looked for “lemmatized Greek text” in Google and, among many interesting results, I have found the Lexis site, which claims to be able to help produce lemmatized Greek texts.  It runs on Mac, and I haven’t tried it; but it works with the TLG.  Likewise Hypotyposeis talks about lemmatized searches in TLG.  I think Josephus must be available somewhere in lemmatized form; but where?

What I’m not finding is much Patristic Greek, tho.  What we need, clearly, is G.W.H.Lampe’s Patristic Greek Lexicon in XML.  This was published in 1961, so will be in copyright until all of us are dead.  But… couldn’t someone license an electronic version for non-commercial use?   It’s much too expensive for me to buy just at the moment (although a pirate PDF of the page images does exist, I see; apparently pp.1202-3 are missing, tho).

There is much that I don’t know still, tho.  Interesting to see that there is a blog called Coding Humanist.  Is there anyone out there interested in this stuff too?
