Let’s face it, we all have too many scholarly books. We can’t work without them, and we end up with piles of books, often read only once, and piles of photocopies. When we’re on the road, we can’t access them. And who has not realised, with a sinking feeling, that some most interesting observation is in that pile of data somewhere, but that we cannot quite recall where?
The answer is to convert our books into PDF files. Easy to say, I know. But technology has come on, and what would once have taken forever no longer does.
This afternoon I took three books, each of 200+ pages, and made PDFs of them all. It took about half an hour each. How did I do it?
First, you need a modern scanner. The old ones groaned slowly as they scanned each page. The modern ones can do a scan in 5 seconds. I was using a Plustek OpticBook 3600, and even that is not bang up to date. It’s far faster than my old one, though. I controlled it from Abbyy Finereader 8, but really any bit of software would do. I set the scanner to scan grey-scale, at 300 dpi (quite enough to be readable), and adjusted the page-size down from A4 to whatever the book size was, by trial and error. I scanned an opening at a time, without splitting the pages. I set the software to scan multiple pages, so that I didn’t have to hit a key each time (I really didn’t want to hit Ctrl-K 300+ times today!), and I set the interval that the software waits between scans to 5 seconds. And then I went for it.
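To get a feel for how long the scanning itself takes, here is a rough back-of-envelope sketch. It assumes that the 5-second scan and the 5-second wait simply follow one another, and that each scan covers one opening (two pages); the function name and its defaults are only illustrative.

```python
import math


def estimate_scan_minutes(pages, scan_seconds=5.0, wait_seconds=5.0, pages_per_scan=2):
    """Rough scanning time for one book, in minutes.

    Assumption: each scan takes scan_seconds, and the software then waits
    wait_seconds (time to turn the page) before starting the next scan.
    """
    scans = math.ceil(pages / pages_per_scan)
    # The wait only happens between scans, not after the final one.
    total_seconds = scans * scan_seconds + max(scans - 1, 0) * wait_seconds
    return total_seconds / 60


if __name__ == "__main__":
    for pages in (200, 250, 300):
        minutes = estimate_scan_minutes(pages)
        print(f"{pages} pages: about {minutes:.0f} minutes of scanning")
```

On those assumptions a 200-page book needs 100 scans and a little under 17 minutes at the scanner, with the rest of the half hour presumably going on setting up, saving and OCR.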
The result was a bunch of images of the twin pages. These I saved as a PDF. I then passed them through Finereader 9 (which has excellent OCR) to create a PDF with page images and text hidden under the images (because the text won’t be perfectly recognised by the software anyway). This means that the PDF is now searchable, and that I can search a directory of files for keywords.
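Searching a directory of such files for a keyword can be done with any desktop search tool, but it can also be sketched in a few lines of Python. This is only a minimal sketch: it assumes the third-party pypdf package is installed, and the function names `pdf_page_texts` and `search_pdfs` are made up for the example.

```python
from pathlib import Path


def pdf_page_texts(path):
    """Yield the hidden text layer of each page of a PDF.

    Assumption: the third-party pypdf package is installed (pip install pypdf).
    """
    from pypdf import PdfReader

    for page in PdfReader(str(path)).pages:
        yield page.extract_text() or ""


def search_pdfs(directory, keyword, page_texts=pdf_page_texts):
    """Return (file name, PDF page number) pairs where keyword occurs, ignoring case."""
    needle = keyword.lower()
    hits = []
    for path in sorted(Path(directory).glob("*.pdf")):
        for number, text in enumerate(page_texts(path), start=1):
            if needle in text.lower():
                hits.append((path.name, number))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python search_pdfs.py DIRECTORY KEYWORD
    if len(sys.argv) == 3:
        for name, page in search_pdfs(sys.argv[1], sys.argv[2]):
            print(f"{name}: page {page}")
```

Two things to bear in mind. The page number reported is the page of the PDF, which for books scanned an opening at a time is the scan number rather than the printed page number. And because the OCR text is unproofed, a search will miss any word the software misread.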
I didn’t proof any of the OCR, though; there was no time. The idea is not to upload digital text, but merely to allow me a better chance of finding things.
I used Finereader, but probably other software would be better. I noted that the PDF sizes varied alarmingly, between 200 MB and 20 MB! So I think Adobe Acrobat would be good for this, from what I have heard.
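Some quick arithmetic suggests why the sizes can swing so widely. The sketch below works out the uncompressed size of a single grey-scale scan; it assumes 8 bits per pixel and uses a full A4 scan area (8.27 by 11.69 inches) purely as an illustration, since the real scan area was trimmed to the book.

```python
def raw_scan_megabytes(width_inches, height_inches, dpi=300, bits_per_pixel=8):
    """Uncompressed size of one scanned image, in megabytes (10**6 bytes).

    Assumption: 8 bits per pixel, i.e. 256 shades of grey.
    """
    width_px = round(width_inches * dpi)
    height_px = round(height_inches * dpi)
    return width_px * height_px * bits_per_pixel / 8 / 1_000_000


if __name__ == "__main__":
    per_scan = raw_scan_megabytes(8.27, 11.69)  # one A4-sized scan at 300 dpi
    scans = 100  # roughly a 200-page book, scanned an opening at a time
    print(f"One scan: about {per_scan:.1f} MB uncompressed")
    print(f"{scans} scans: about {per_scan * scans:.0f} MB uncompressed")
```

With several megabytes of raw data behind every scan, the size of the finished PDF depends almost entirely on how hard the software compresses the page images, so a tenfold difference between files is quite plausible if the compression settings differed.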
The end result is that I have three searchable PDFs which I can stick on a key-drive (flash drive), slip into my pocket and look at anywhere. I can look at them at lunchtime at work, for instance.
Unscrupulous people might be tempted to borrow books from the library, scan those, and save themselves the purchase price. Of course I can’t advocate that you break the law in this way; still less exchange them online, as I hear some people do. But we need to be able to manage our own libraries this way, I think. Paper books have their uses, but scholarly books need this feature, as do their users. We need a change in approach from copyright holders to make it possible.
I admit that my sympathy for the copyright industry is not as high as it might be, since their sympathy for those who use their products seems non-existent. Why else do we have laws that criminalise anyone who makes a personal copy of an out-of-print and unavailable book? Why do we have laws that create copyright for a century, but print-runs of 200, other than to create a dog-in-the-manger? Why else do they campaign to increase the scope and reach of copyright, year upon year, while making it impossible for scholars to access out-of-print and obscure texts, and even obscure theses from 1937? (A sore point, this last one, as regular readers will know.) But really we need better law, and we need better products from textbook manufacturers.
In the meantime, I hope these notes will help people convert their libraries into a usable form. The key thing to remember is that we are not trying to produce something perfect; just something usable, and to produce it quickly.