Carry your library in your pocket

Let’s face it, we all have too many scholarly books.  We can’t work without them, and we end up with piles of books, often read only once, and piles of photocopies.  When we’re on the road, we can’t access them.  And who has not realised, with a sinking feeling, that some most interesting observation is in that pile of data somewhere, but that we cannot quite recall where?

The answer is to convert our books into PDF files.  Easy to say, I know.  But technology has come on, and what would once have taken forever no longer does.

This afternoon I took three books, each of 200+ pages, and made PDF’s of them all.  It took about half an hour each.  How did I do it?

First, you need a modern scanner.  The old ones groaned slowly as they scanned each page.  The modern ones can do a scan in 5 seconds.  I was using a Plustek OpticBook 3600, and even that is not bang-up-to-date.  It’s far faster than my old one, though.  I controlled it from Abbyy Finereader 8, but really any bit of software would do.  I set the scanner to scan grey-scale, at 300 dpi (quite enough to be readable), and adjusted the page-size down from A4 to whatever the book size was, by trial and error.  I scanned an opening at a time, without splitting the pages.  I set the software to scan multiple pages, so that I didn’t have to hit a key each time (I really didn’t want to hit Ctrl-K 300+ times today!), and I set the interval that the software waits between scans to 5 seconds.  And then I went for it.

The result was a bunch of images of the twin pages.  These I saved as a PDF.  I then passed them through Finereader 9 (which has excellent OCR) to create a PDF with page images and text hidden under the images (because the text won’t be perfectly recognised by the software anyway).  This means that the PDF is now searchable, and that I can search a directory of files for keywords.
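
For anyone who wants to script the keyword search, here is a minimal sketch in Python.  It makes one loud assumption that is not part of the workflow above: that the OCR text of each book has also been saved as a plain-text file (ending in .txt) somewhere under the same directory as the PDF.  The function name search_library is made up for illustration.  It only shows the idea of searching rough OCR output; it is not a replacement for the search built into a PDF reader.

```python
from pathlib import Path


def search_library(root, keyword):
    """Return (path, line number, line) for every case-insensitive match.

    Assumption: each scanned book has a plain-text OCR export (*.txt)
    somewhere under `root`. The PDFs themselves are not opened.
    """
    needle = keyword.casefold()
    hits = []
    # Walk the whole directory tree in a stable (sorted) order.
    for path in sorted(Path(root).rglob("*.txt")):
        # OCR output is rough, so tolerate undecodable bytes.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.casefold():
                hits.append((path, number, line.strip()))
    return hits
```

Calling search_library("my_books", "Peristephanon") would list each text file and line that mentions the word, which is usually enough to tell you which PDF to open.  The directory name here is only an example.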

I didn’t proof any of the OCR, though; there was no time.  The idea is not to upload digital text, but merely to allow me a better chance of finding things.

I used Finereader, but probably other software would be better.  I noted that the PDF sizes varied alarmingly between 200 MB and 20 MB!  So I think Adobe Acrobat would be good for this, from what I have heard.
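
As a rough sanity check on those sizes, this short Python sketch works out how big the scans would be with no compression at all.  The inputs are assumptions for illustration: a full A4 scan area (8.27 by 11.69 inches, so an upper bound, since the scan area was trimmed down to the book), 8-bit grey-scale, 300 dpi, and about 100 openings for a 200-page book.  The function name raw_scan_bytes is invented for this example.

```python
def raw_scan_bytes(width_in, height_in, dpi=300, bits_per_pixel=8):
    """Uncompressed size in bytes of one scanned image."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# One A4-sized scan area in 8-bit grey-scale at 300 dpi (assumed inputs).
per_opening = raw_scan_bytes(8.27, 11.69)

# A 200-page book scanned an opening at a time is roughly 100 images.
book = 100 * per_opening

print(f"per opening: {per_opening / 1e6:.1f} MB, whole book: {book / 1e6:.0f} MB")
```

On these assumptions an uncompressed opening comes to about 8.7 MB, and a whole book to about 870 MB.  So even the 200 MB file is well under the raw size, and the tenfold spread between files presumably comes down to how hard the images were compressed when the PDF was saved, rather than to the scanning itself.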

The end result is that I have three searchable PDF’s which I can stick on a key-drive (flash drive), slip into my pocket and look at anywhere.  I can look at them at lunchtime at work, for instance.

Unscrupulous people might be tempted to borrow books from the library, scan those, and save themselves the purchase price.  Of course I can’t advocate that you break the law in this way; still less exchange them online, as I hear some people do.  But we need to be able to manage our own libraries this way, I think.  Paper books have their uses, but scholarly books need this feature, as do their users.  We need a change in approach from copyright holders to make it possible.

I admit that my sympathy for the copyright industry is not as high as it might be, since their sympathy for those who use their products seems non-existent.  Why else do we have laws that criminalise anyone who makes a personal copy of an out-of-print and unavailable book?  Why do we have laws that create copyright for a century, but print-runs of 200, other than to create a dog-in-the-manger?  Why else do they campaign to increase the scope and reach of copyright, year upon year, while making it impossible for scholars to access out-of-print and obscure texts and even obscure 1937 theses? (a sore point, this last one, as regular readers will know).  But really we need better law, and we need better products from textbook manufacturers.

In the mean time, I hope these notes will help people convert their libraries into a usable form.  The key thing to remember is that we are not trying to produce something perfect; just something usable, and produce it quickly.


More fun with a thesis

I’ve already blogged on how Boston College library demand that I get permission from a religious order before they will supply me, for research purposes, with a copy of a 1937 thesis written by a nun.

The nun belonged to the Sisters of Mercy, and the library have sent me a link to the order’s website.  So I duly wrote and asked permission.  I got back an email saying that they had no record of any such nun.  The library have sent me a PDF of the first couple of pages of the thesis, which says that she was a member of that order.  So I have forwarded it to the order.

What a pathetic paper-chase!  All over the supposed copyright status of a long forgotten thesis.  It highlights that our copyright laws are now actively working against the interests of scholarship.


Fun with PhD thesis access

Seventy-two years ago a nun submitted a PhD thesis to Boston College in the USA which contains an English translation of the Peristephanon of Prudentius.  The work was never published and is rare.  So I wrote to the college and asked for a copy.

My request was declined.  Apparently it might be in copyright.  Shock! Call the lawyers!  “Do you have permission to see this item, sir?” The librarian demands that I write to this now-deceased nun’s order and ask for permission, before she will make me a copy.  I’ve been chuckling about this all evening.

I mean… I have to ask the Pope (or his representative), before they will send a copy of a 72 year old thesis to a scholar to use for research purposes?  It’s pretty daft, isn’t it?  And if I can find someone with “authority” to allow me to look at this, I shall have to be careful how I ask, in case they wonder if I’m taking the mick.

Ah, libraries…

Of course it may be that the environment in which the library has to work is more risky than I think.  UK television depicts Americans as people who go around either suing each other or blowing each other’s heads off on a daily basis.  Obviously it must be true, since the TV programmes are mostly made in the USA.  If so, no wonder the library is a bit gun-shy.  No wonder they want to waste my time, and that of the recipient, just in case.

But I had not realised that gangs of nuns might be so much of a threat to them as that.  Rampaging gangs, equipped with semi-automatics and a hot-line to a law-firm; man, it’s dangerous out there in Boston.

I’ve written back and shifted the onus on them, by asking to whom I should write.  That will cost them something to find out, although not much.  Once this nonsense makes work for them, rather than just me, they may see sense.


Throw the photocopies away

I’m surrounded by photocopies; parts of books, articles, etc.  Filing cabinets, boxes of photocopier paper.  But really, they aren’t convenient.  I can’t carry them around with me.  I don’t look at them often.

Today I ordered a Fujitsu Scansnap S300 document reader.  It’s designed to take bunches of photocopies and turn them into PDF’s.  It’s not really a scanner, as I understand it — it has no TWAIN driver.  It’s portable, mobile, and can be powered from a USB port (although it works better from mains).

I think that I would be better off if my photocopies were in electronic form.  If I can turn the page images into PDF’s, then I can carry them around on a disk.  I can email them to myself, if I need to.  I can read them in the evenings in a hotel, access them at lunchtime in the office, and so on.  And I can get some floor-space back!

Once they’re in PDF form, I can run Abbyy Finereader 9 on them.  That will give a rough output, which will allow me to do electronic searches.  So I can have all the articles that I have, on a portable disk, and just search them when someone asks me a difficult question.

You know; do I really need to buy any more academic books?  After all, we don’t sit down and read them cover to cover, do we?  So… why have paper, if we can convert them to PDF easily and make them searchable in the process?


The machine that can print off a book for you in minutes

The Daily Mail has the story of a bookshop chain that are installing these machines here:

It promises to bring the world of literature to the ordinary book-buyer at the touch of a button.

In the time it takes to brew a cappuccino, this machine can print off any book that is not in stock from a vast computer database.

The innovation, launched by book chain Blackwell yesterday, removes the need to order a hard-to-find novel, or the wait to buy one that has sold out.


UK copyright law ‘abject failure’ for information access

What we can see online tends to depend on copyright laws.  These do vary.  How much they vary has been highlighted by a new report, which evaluated them for fitness for purpose. 

The UK law was a surprise failure, because of some of its unique ‘features’, because it has been allowed to become out of date, and because it has been too influenced by publishing industry lobbying.  Out-Law.com reports:

The UK was the only country to be given an overall ‘F’ score by the report. All the other countries were rated between A and D. “‘A’ to ‘D’ rates how well the country in question observes consumers’ interests in its national copyright law and enforcement practices. ‘F’ is assigned if the country abjectly fails to observe those interests,” said the report. 

“UK copyright law is substantially different from that of other countries,” said the report. “Copyright is treated as property right…and hence copyright owners have the right to decide whether and how the copyrighted work is used.”

“There are no fair use exceptions in UK law, only some limited permitted acts. There is no provision that may be termed “private copying” exception and UK copyright law does not distinguish between private or corporate copyright infringement.”

All of which makes authoring a website in the UK risky for those who live there, and thereby stifles initiative.  The report authors are partly government-funded.


Corpus Scriptorum Historiae Byzantinae

This collection of 50 volumes contains the Byzantine historical writers. Thanks to Google books these are online, and thanks to Les Cigales éloquentes we can access them. The editions are not always reliable; but they are sometimes all we have.

This list is copied from there:

Authors
Agathias
Dexippus, Eunapius, Petrus Patricius, Priscus, Malchus, Menander, Olympiodoros, Candide, Nonnos, Théophanee, also the panegyrics of Procopius and Priscianus
Ducae, Michaelis Ducae nepotis
Ioannis Cinnamus, Nicephore Bryennos
Ioannis Malalas
Leo Diaconus and various texts on the “Histories” of Nicephorus Phocas and Ioannes Tsimiscis
Nicetas Choniates
Theophylactus Simocatta, Genesius
Michael Glycas
Merobaudes et Corippus
Constantinus Manasses, Ioel, Georgius Acropolita
Zosimus
Ioannis Lydus
Paulus Silentiarus, Georgius Pisida, Nicephore Constantinopolitanus
Theophanus Continuatus, Ioannes Cameniata, Symeon Magister, Georgius Monachus
Georgius Cedrenus
Georgius Phrantzes, Ioannes Cananus, Ioannes Anagnostes
Codinus Curopalates
Ephraemius
Leo Grammaticus, Eustathios
Laonicus Chalcocondylas
Georgius Codinus
Historia politica et patriarchica constantinopoleos, Epirotica
Michael Attaliota
Constantin Porphyrogenete
Theophanis (with the Ecclesiastical History of Anastasius Bibliothecarius in volume 2)
Georgius Syncellus
Anne Comnene
Jean Cantacuzene
Chronicon Paschale
Georgius Pachymeres
Nicephorus Gregoras
Procopius
Zonaras

All of these are in Google books, apart from volume 3 of Zonaras, which is at Archive.org.


Give it away and sell more

An interesting post by Charles Jones at AWOL.  Apparently the Chicago Oriental Institute have found that, now that they give away online electronic copies of their obscure, specialist-only, publications, they are selling more of their print backlist.  Sales are up by 7%.

Not everyone would have predicted this, including me.  Some market research is needed to determine why.  But in the mean time, I can offer a wild guess at no charge.  Probably the increase is from people who simply never knew the publication existed, or that they needed it.

Interesting as another part of the march towards the new era.


Academic books are doomed

Ever wanted to consult a text or translation of an ancient author in a volume of the Sources Chrétiennes and then realised that the library is closed, or doesn’t have it?  Or to look up an author in the Clavis Patrum Graecorum?  It’s a pain, isn’t it?

I have here a volume of Isidore of Pelusium’s letters, and I’ve just had to walk down to the library and renew the loan this morning.  That was a pain.  And I’ll still have to return it, to lose access to it, in due course.  I can’t afford to buy a copy, not with the recession and all. 

But I have a scanner; why don’t I just copy the pages I want?  Hey, why don’t I just scan the whole thing and make a PDF which I can keep forever? (In my case, I actually just don’t have time; but work with me on this a bit, hmm?) 

Those thoughts must occur to an awful lot of people.  They must occur to every student.  They must occur even more to every post-graduate, or young PhD.  All of them have no money, and lots of need for the book, and they have the means to do something about it.

I’ve gradually become aware that people are making PDF’s of these in-copyright but unobtainable books.  More, that little networks exist whereby people swap them around.  We’re all aware that this happens with music, and how upset it makes the big recording companies.  But music mp3’s are a luxury.  Access to a complete collection of the Sources Chrétiennes, whenever you want, wherever you are?  That’s essential, for many people.

At the moment, the only people buying these books are the major libraries.  This is natural.  But the question is, why bother to buy them, why bother to have libraries other than as museums, when in fact the books are being pirated to PDF?  The only reason is so that those who don’t have the right contacts, who don’t know the right bootlegger, can still access the text.  Well, I myself am such a person.  But, recalling my own student days and illegal music swapping, I don’t suppose for a moment that people at college are relying on the libraries.  Most of them must be accumulating huge collections of books, reference books, articles, lexica, in PDF form.

If this is how people want their information, is there any point in taking a PDF, sending it to a publisher, having it typeset and printed, sending out copies to libraries, borrowing the paper copies, scanning it back in again, and OCR’ing it, and storing it on your hard disk?  Why do this?  Why not just sell the PDF?

It’s over.  The whole process of publishing an edition, translation, study — still more a handbook or patrology — is finished.  The whole business of having a library is finished too — why bother?  Just ask around, see if anyone has a PDF.

This must be how things are now.  Every year, this will get more so.  Why should it not?  It’s easy, convenient, and superior in almost every respect for the user.  Why pay to produce things that are inconvenient?

There are a couple of teething problems with this model of book circulation.  For instance, some books can’t be read onscreen.  You really do need a printed copy of (e.g.) Fabricius, as I remarked earlier this week, to master it.  The PDF’s that I have seen aren’t of good enough quality to send to a print-on-demand service.  But I imagine this is the next step.  People will make sure they scan b/w PDF’s at 400 dpi.  Give it a couple of years.

The next step must be to start supplying books in electronic-only form. One problem is that the editorial process of producing a book markedly enhances the quality of the content.  This is true for novels as well as textbooks — I have seen early drafts of books, prior to a professional editor working on them, and the difference is amazing.  If this is cut out of the loop, something must replace it; and so far there is nothing.  The mechanisms of modern publishing are not just an overhead; we all benefit from some of them.

Finally, authors need to publish books in order to get jobs.  A mechanism to replace this is needed, and dead-tree printing will continue until this is solved.  But the printers will find sales dropping, as occasional sales to scholars pretty much cease.  Probably this will make little difference, as they mainly sell to libraries.  But their clock will be ticking.  The financial viability of the old model is draining away.  Stupid publishers will try to pass laws to stop all this.  It won’t work, of course, because the incentive to pass around books in PDF is so enormous.  At most it might retard scholarship in some areas and some countries.

So I think that this chicken must be dead. It just hasn’t realised it yet. 
