System aims to automate transcription of historic manuscripts

Valencia, Spain, May 6 (EFE).- Spain's Polytechnic University of Valencia is leading a consortium of European institutions to develop a new system of optical recognition that would allow for rapid machine-aided transcription of historic manuscripts.

The goal, tranScriptorium project coordinator Joan Andreu Sanchez told Efe, is to create a prototype that proves "how automatic and interactive techniques already in development could be used in a real environment" to transcribe documents dating from as early as the 15th century.

Such work is currently done manually.

The techniques tranScriptorium is exploring are distinct from the technology known as Optical Character Recognition, or OCR.

"For OCR you use segmentation techniques, that is, the characters are isolated and then recognized," Sanchez explained. "But manuscript writing is joined-up and there are no techniques to separate it automatically, so the recognition process cannot work character by character, rather as a totality of characters, words and lines."

The new program also has the capacity to learn from examples and can be applied to any language, he said.

In practice, tranScriptorium researchers manually transcribe roughly 50 pages of a given 1,000-page manuscript, and the software models use those pages as a basis "to provide reasonable results for the rest of the pages, which accelerates the work," Sanchez said.

Once perfected, the software will be made available for free, he said.