Earlier this evening I was working away on a translation of one of the medieval St Nicholas miracle stories, BHL 6177, the “Miracles at Angers”. I was running the Latin text through Google Translate, which was producing some very decent output, and then checking and correcting it. It did indeed need correcting, but it’s mostly right, and far faster than sitting there doing it all manually.
A little while later I had a play with Bard, Google’s AI product. (It would have been ChatGPT, but that demanded an alarming amount of personal information.) I asked it whether it could translate ancient Greek, and it said that it could, including whole documents, though it might not be perfect just yet and was still improving.
So I tried it out with a chunk from the Greek Anthology, which it rendered into English; but I realised afterwards that this probably just retrieved a pre-existing translation. Then I went to Ulrich Harsch’s useful site – which Google downvotes because he hasn’t implemented the latest https – and found a page on George of Pisidia here. This began:
Γεώργιος Πισίδης, διάκονος καὶ ποιητὴς ἐν Κωνσταντινουπόλει, ἐγένετο περὶ τὸ ἔτος 580, ἐτελεύτησε περὶ τὸ ἔτος 634, εὑρετὴς τῶν στίχων δωδεκασυλλάβων.
So I put in “translate the following ancient greek into English:” and then the Greek. The result was:
George of Pisidia, deacon and poet in Constantinople, was born around the year 580, died around the year 634, inventor of the twelve-syllable lines.
That’s very reasonable. I then asked it a second time whether it could translate ancient Greek. This time I got a rather different reply, and much more bullish:
Yes, I can translate ancient Greek. I have been trained on a massive dataset of text and code, including ancient Greek texts. I can translate ancient Greek into English, French, German, Spanish, Russian, Chinese, and many other languages. I can also translate ancient Greek into modern Greek.
I am still under development, but I am able to translate ancient Greek with a high degree of accuracy. I can translate simple sentences, as well as more complex passages. I can also translate ancient Greek poetry and literature.
If you have an ancient Greek text that you would like me to translate, please feel free to share it with me. I will do my best to provide you with an accurate and informative translation.
But …. I just pasted in a chunk from Arethas of Caesarea on Romans here – I’m pretty sure that there’s no English translation of that! – and it just echoed the text. So… in fact it’s not there yet. But it’s food for thought.
But how many of us even understand what these things are doing? A web search was clear enough – just collect URLs and make a list of words and topics.
But AI? Well, as I understand it, these “generative AI” products are basically a chatbot on the front of a “large language model” (LLM). An LLM is apparently a “next-word prediction engine”. So basically some code for pattern recognition on the front of a search engine, pre-loaded with a lot of text from the web to search. The AI image generators do much the same, apparently.
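To make that a bit more concrete – and this is purely a toy sketch of my own, nothing to do with how Bard or any real LLM is actually built – the “next-word prediction” idea can be shown in a few lines of Python: count which word tends to follow which in some sample text, then generate by repeatedly picking the most likely next word. The real systems do the same sort of thing with billions of parameters rather than a little word-count table, but the principle is “predict the next word”.

```python
# Toy illustration of "next-word prediction" (my own sketch, not how any
# real LLM is implemented): build a table of which word follows which,
# then generate text by always choosing the most frequent continuation.
from collections import Counter, defaultdict

sample_text = (
    "george of pisidia was a deacon and poet in constantinople "
    "george of pisidia was the inventor of the twelve syllable line"
)

# Count, for each word, which words were seen to follow it.
follows = defaultdict(Counter)
words = sample_text.split()
for current, nxt in zip(words, words[1:]):
    follows[current][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in the sample text."""
    return follows[word].most_common(1)[0][0]

# Generate a short continuation, one most-likely word at a time.
word = "george"
output = [word]
for _ in range(8):
    word = predict_next(word)
    output.append(word)

print(" ".join(output))
# prints something like: george of pisidia was a deacon and poet in
```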
The amount of hype and exaggeration around “AI” is staggering, as it is with every new fad, but it is not magic. In IT “it” is never magic. If you think “it” is magic, then you don’t understand “it”. Everything is ones and zeros and lumps of metal and silicon. The rest is attempts to sell stuff.
Now I don’t fully understand it. But it did set me wondering whether I am wasting my time. For it wouldn’t be the first time that technology has rendered my work useless.
When I came online originally, I was scanning existing English translations of ancient texts and putting these online. Bandwidth was low, and text-only pages were the only way to get stuff online. I did so for some years, until the technology rendered it pointless. Bandwidth became enormous, so file size didn’t matter. The PDF arrived, with exact images of the book pages. OCR improved, so the PDF was searchable. Google Books came along, with every book under the sun prior to 1923, all freely downloadable. I haven’t done any more since then. There’s no point. I don’t regret doing it, but … in a way it was wasted effort.
Since then I’ve concentrated on texts for which no translation exists. At one time I commissioned these. Now that I am retired, I sit here and make my own.
But again the technology is taking this away. Is there any point in an amateur like myself labouring over a Latin text, with my limited Latin, to produce an awkward translation if Google Translate can do it in an instant, and be pretty nearly “good enough”?
Prior to January 2022, the question was academic. Google Translate was rubbish for Latin. And then, suddenly, it wasn’t. In fact it could make sense of sentences better than I could. I can polish the result, and correct minor errors, and do something worthwhile; but basically it is doing the job. Furthermore, it is quite likely to improve further.
So I’m getting this feeling of déjà vu. Is there any point? I’m not sure, and I’m not going to stop right now, if only because I’m still enjoying it. But it is food for thought.
The world-wide web is a very different place from what it was. One horrible aspect of the new craze for “AI” is that, for the first time, the products are commercial. You have to pay to use them.
This is a novelty, and an unwelcome one. It marks a big shift from the free, open internet that we have had until now. Bye-bye to the internet to which I contributed, where everything was expected to be free. No longer. Worse yet, those who contributed freely find their work turned against them.
I saw this evening a report that StackOverflow, the computer programmers’ forum site, has lost 50% of its traffic. The bots hoovered up all the replies to technical questions, shared freely by ordinary people, out of the kindness of their hearts, and embedded them in new tools like GitHub Copilot. This, needless to say, is a commercial product. And it’s killing the original site.
Will the internet change, until we have to pay for everything, via a million subscriptions? It is beginning to look like it.
The new AI is also biased in various directions, probably for commercial reasons, certainly in line with horrible American politics, but also simply in selecting what some corporation wants us to see. That corporation wants us to see “important stuff”. They decide what is important.
For instance, if you type into Google search “Who is Roger Pearse”, you get some rubbish at the top about some “Roger Pearce”, selected by Google; but then you get stuff from my blog, and material by me. My name is not common, and I write on a specialised subject, and have done so for 24 years. In a fair and level internet, I would naturally appear.
The same query in Bard AI produces “I’m designed solely to process and generate text, so I’m unable to assist you with that.” Which is not too bad, except that, if I repeat this for public figures, like “Joe Biden”, I get an article back. A source is given, which is – of course – Wikipedia.
Indeed if I ask “Who was Petrus Crabbe”, a very obscure figure, it responds with text drawn mainly from the Wikipedia article. I myself wrote that article, in a moment of madness, so I know just what is on the web about him. Bard AI is using Wikipedia plus one other source linked from it. No doubt ChatGPT is doing the same. But I don’t think that ChatGPT intends to send any money to me in return for my generous efforts.
So AI is only returning “important people” – in this case defined as people for whom there is a Wikipedia article. I do not have an article about me on that toxic hell-site, nor do I wish to. Of course if you asked someone in the national television industry who Roger Pearse is, they would have no idea. But… the practical effect of the coding around AI is to reduce the information available to only “approved sources”, to only “important people”.
Yet originally the web was a levelling phenomenon. That was part of the charm. Anybody could start a website. Anyone could start a search engine. You rose or fell on merit.
And now? Well, what we see in AI is what someone in a major corporation chooses that we should see. Little people don’t matter.
I don’t see any reason immediately to change what I am doing. But it is, as I said, food for thought.
We are not sleep-walking into a future of nightmares: we are running into it with our eyes wide open. Technology cannot be dis-invented; and no law passed will prevent its use. Sin will ensure that we are more and more unfree.
The technology problems reflect the problems in US society today.
I read the title, and the snarky little voice in the back of my head said “Oh, totally a waste of time, FAR more effective to HAVE TO RE-INVENT EVERYTHING before we can start working with it in some way!”
That’s besides the fact that translation is very difficult: literal vs. figurative, and then there’s poetry and allusions and – goodness.
Translating is, in itself, a form of gaining deeper understanding and expanding on the work.
A few thoughts, as a 25-yo data scientist in the industry:
Firstly, I think it’s premature to negatively assess the prospects of free and open LLMs, for a few reasons:
1. Digital piracy is notoriously difficult to police. Meta had their super-popular LLaMA model leaked in its entirety (although perhaps this was deliberate). The methodologies to train these models are probably only about 1-2 years ahead of publicly available academic research.
2. There are already a number of successful and popular free and open LLMs – see the GPT4All project, which compiles a number of them, as well as HuggingFace. (A minimal sketch of running one locally follows this list.)
3. There are already services undercutting the big players: GitHub Copilot, for example, has competitors in TabNine and Codium.
4. The sentiment among the big tech companies themselves seems to be basically that the cat is out of the bag and that they won’t be able to monopolize this technology. For example, earlier this year there was this leaked Google memo (make of that what you will):
https://www.semianalysis.com/p/google-we-have-no-moat-and-neither
5. The people who understand this tech the best are “hackers” at heart. There is a strong vested interest in keeping things free and open. It will be a fight for sure, but I think there’s a lot of reason to have hope.
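To put some flesh on point 2: running one of these open models locally really is only a few lines of code. This is just a minimal sketch using the HuggingFace `transformers` library – the model name below is an arbitrary small, openly licensed example rather than a recommendation; substitute whatever GPT4All or the HuggingFace hub offers.

```python
# Minimal sketch of running a free/open LLM locally with HuggingFace
# transformers. The model name is an arbitrary example of a small, openly
# licensed chat model; swap in any other open model from the hub.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example only; any open model works
)

prompt = "Translate the following Latin into English: Gallia est omnis divisa in partes tres."
result = generator(prompt, max_new_tokens=60, do_sample=False)

print(result[0]["generated_text"])
```

No API key, no subscription, and the weights stay on your own machine – which is exactly why I don’t think the big players can simply fence this technology off.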
In my view, the biggest cause for concern is the potential for these big tech companies to convince legislators to give them a de facto monopoly via regulation. In practice, I don’t think this would be successful internationally, and in the long run it is unlikely to actually stop people using these models; they’re simply too useful.
In terms of StackOverflow, it’s worth noting that the site is currently in the middle of a major moderation strike, which has been ongoing for nearly two months, mainly over the issue of AI-generated content being allowed on the site:
https://meta.stackexchange.com/questions/389811/moderation-strike-stack-overflow-inc-cannot-consistently-ignore-mistreat-an
Also, it’s worth noting that all content posted there is released under a Creative Commons license.
I think that algorithmic translation of ancient texts is unlikely to be reliable enough to destroy the value in doing a translation (semi-)manually. I think you’re already using these tools effectively yourself (e.g. Google Translate) to speed up the process without prejudicing the final translation. It seems to me that there’s too much in the translation process that involves things outside of the text itself – databases, comparison of different manuscripts, even palaeography. Sure, 90% of the process might be sped up, but it’s that last 10% that requires a lot of judgement and case-by-case handling.
Just my two cents as a long-time reader of your blog, and as someone who admires your work and wants you to continue. I don’t think what you do is in vain at all.
When I started training as a translator a long time ago there were a few principles that experienced translators shared with us students. One was: “You can’t translate what you don’t understand.” If you couldn’t make sense of a sentence at the most basic grammatical level you couldn’t produce a translation, because you couldn’t guarantee your client that the translation was correct. This didn’t mean that you had to fully understand the text as an expert would (not that that doesn’t help); it meant that you could make sense of something like “Tunneling is a consequence of the wave nature of matter, where the quantum wave function describes the state of a particle or other physical system, and wave equations such as the Schrödinger equation describe their behavior” even if you weren’t a quantum physicist, and, using the necessary tools, translate it into another language.
Now, statistical translation does away with this completely. I’m not a programmer, but I do know that nowhere in the vast machinery at the back of LLMs is there any bit of code that models “understanding”, nor any prospect of doing that some day. The way it “improves” is by enlarging the databases, increasing the processing capacity and making the calculations more complex, but the parameter by which the quality of the output is measured is not “correct/incorrect” but “more/less likely”. The first requires understanding, the second only statistics. There’s nothing wrong with enhancing that of course, but expecting “likely” to somehow become “correct” (i.e., that the understanding necessary to pass judgment on the correctness of a translation just happens along the way) seems like feeding the machine apples in the hope that, if you put in enough apples, someday it will produce oranges. Hope is a wonderful thing, but I’m quite sure that’s not how you develop AIs.
This is proved by the fact that a statistical translation engine can never be assertive regarding the quality of a result the way a real translator can, because it doesn’t perform the elementary task of reading the original and the translation, getting the meaning of each, and seeing that they match. If you challenge a translation (or any other output) saying that yes, that is statistically the most likely answer, but this happens to be the exception, the AI can’t argue back, because it literally uses no arguments, that is, no reasoning based on an understanding of the text. But what you want from a translator is precisely the capacity to confidently argue and convince you that one translation is right and the other is wrong.
Another principle I was taught was that “95% of the text takes 5% of your time, and 5% of it the rest.” Meaning that progress is not linear: most of the time the translation goes like a breeze, but every few lines you get stuck in a word or phrase that needs research, consulting sources, and a lot of pondering. You can produce a first draft of a translation in a relatively short time, but solving all the little marks you left to check later will take much longer, at least if you want a certain degree of quality in your translation. Well, in my own experience what automatic translation does is equivalent to the 95% text / 5% time part, without the marks.
(I realize all of this refers to the production of professional translations, not to using AIs to help you understand a text written in a language you’re not fluent enough in.)
I apologize for the essay. What I wanted to say was: No, it’s not a waste of time to make translations of ancient texts. If anything it’s a good use of your time, in these crazy days more than ever.
Thank you @Diego. These are thoughtful and interesting points. The errors that I see are precisely of the kind that you suggest – subject and object get switched, for instance.
Hi @Nathan, I appreciate your take on this, as someone who clearly knows more about this than I do. Being retired I don’t keep up with tech news as much as I did.
I wasn’t aware of that issue with mods at StackOverflow, so that of course would distort things. Being free, StackOverflow ought to be able to see off a paid-for service, other things being equal. That is encouraging.
I don’t think that the big companies will be able to arrange a monopoly at the moment. There’s no obvious reason for any government to favour one over another.
It is good to hear of open-access initiatives!
I’m the same. I refuse to use ChatGPT because of the alarming amount of personal info it wants from you. I do laugh at the Bing chatbot, which runs each time you do a Bing search. It shows you what keywords it extracts from your query to search, then points to the top three sites (occasionally more) it uses to formulate an answer. Invariably Wikipedia ends up there. Why does one bother?
Language translation – that’s big business in Europe, I believe, so auto-translation would be a money-spinner, no? As for ancient languages like Greek, I wonder if, like some old lexicons, AI just picks up glosses from the Latin translations in PG (which sometimes don’t match the Greek).
Your work has not been a waste, Roger. You have helped so many people, especially back in the day when so little was available online. This means knowledge advanced faster than it otherwise would have, and perhaps it even created a level of interest in this material sufficient to warrant its digitization or preservation by other means.
I still have and use your Gospel Solutions book and I still think it was an astonishing and brilliant effort, more so now that I have been involved with a few books myself behind the scenes and know how hard it is to pull such a volume together.
I am very grateful for all the help you have given me over the years.
Thank you very much indeed, IG. Yes, I think we have to peer under the hood a bit and enquire just where this stuff is coming from. Wikipedia, largely.
I wonder if Bard still thinks, like Google Translate, that “homozygous” means “same-sex marriage”?
A lot of programs are 95 percent helpful, and then the last 5 percent stays terrible forever. Humans either live with terrible, or spend a lot of time learning to recognize and correct the program’s handling of edge cases and things not obvious to a computer.
Oh, and computer translation is not programmed to call attention to references. Which makes it not terribly useful for translating the Fathers. It is still a first pass.
Good points – thank you.
I’m finding that I really have to check every word with Google Translate, even though it is superficially fine.
As someone who is fluent in four languages (Greek, French, English, Portuguese) I can tell you that AI translations will not render translating a waste of time. A translation is a product of its time, and the example I would give is the Jules Verne translations into Greek. Jules Verne is not the best French author of his time, but he has had staying power. As a result you can actually track the evolution of accepted written Greek through his translations. The first translations were in a very archaic Katharevousa, eventually more of the demotic entered, and what you will buy today is in Demotic Greek. To give an analogy: if you were to put in an untranslated ancient text, you wouldn’t want Jacobean English to come out; you would want an actual modern vernacular.

I am pretty sure that over time there will be tweaks that are supposed to fix this sort of issue, but fundamentally computers do not actually understand what they are translating. This can very much be a problem with allegorical texts, where the analogies and passages can get lost.

Finally, several fundamental assumptions really do not carry over well between languages. In English, “government” is something fundamentally bad, to the point that saying something is “from the government” is used to attack a product or notion. In Greek or French, though, government is generally good and it is the private sector that is suspect. This is the sort of thing that it will take AI a long while to understand, and there will always be a place for people there. Comical Ali/Baghdad Bob sounded very funny/deranged in English translation, but in the original Arabic he sounded quite refined.
Thank you. That’s very interesting!