I now have 26 .htm files containing the output from the OCR process. My task is to go through each one, rejoin separated lines, make sure that paragraph breaks fall in the right places, and add page numbers. I’ve done the first two, some 60 pages. It will be slow.
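
Part of the line-rejoining step could probably be scripted. Here is a rough sketch in Python, not what I am actually doing by hand. It assumes the OCR text has already been pulled out of the .htm markup as plain text, that a blank line marks a real paragraph break, and that a hyphen at the end of a line means a word was split there. The function name `rejoin_lines` is made up for illustration.

```python
import re


def rejoin_lines(text: str) -> str:
    """Join hard-wrapped lines; blank lines are assumed to be paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        lines = [ln.strip() for ln in para.splitlines() if ln.strip()]
        out = ""
        for ln in lines:
            if out.endswith("-"):
                # Assumption: a trailing hyphen marks a word split across lines.
                out = out[:-1] + ln
            elif out:
                out += " " + ln
            else:
                out = ln
        joined.append(out)
    return "\n\n".join(joined)


sample = "The quick brown\nfox jum-\nped over.\n\nNext para\nhere."
print(rejoin_lines(sample))
```

The hyphen rule would also wrongly close up genuine compounds that happen to break at a line end, and OCR output rarely marks paragraphs this cleanly, so the result would still need checking by eye. Page numbers would still have to be added by hand.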