ARTICLE | doi:10.20944/preprints201905.0231.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics
Keywords: Optical Music Recognition; historical document analysis; Medieval manuscripts; neume notation; fully convolutional neural networks
Online: 20 May 2019 (08:45:34 CEST)
Even today, the automatic digitisation of scanned documents in general, and the automatic optical music recognition (OMR) of historical manuscripts in particular, remains an enormous challenge, since both handwritten musical symbols and text have to be identified. This paper focuses on the medieval square notation developed in the 11th and 12th centuries, which already comprises staff lines, staves, clefs, accidentals, and neumes, the latter being, roughly speaking, groups of connected single notes. The aim is to develop an algorithm that captures both the neumes and their pitches, i.e., the melody information needed to reconstruct the original writing. Our pipeline follows the standard OMR approach and comprises novel staff line and symbol detection algorithms based on deep Fully Convolutional Networks (FCNs), which perform pixel-wise predictions of either staff lines or symbols and their respective types. The staff line detection then combines the extracted lines into staves and yields an F1-score of over 99% for detecting both individual lines and complete staves. For the music symbol detection we choose a novel approach that skips the explicit identification of neumes and instead directly predicts note components (NCs) and their respective affiliation to a neume; the algorithm also detects clefs and accidentals. It recognises these symbols with an F1-score of over 96% if the symbol type is ignored, and predicts the true symbol sequence of a staff with a diplomatic symbol accuracy rate (dSAR) of about 87%. If only the NCs (without their connection to a neume), the clefs, and the accidentals are of interest, the algorithm reaches a harmonic symbol accuracy rate (hSAR) of approximately 90%.
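Sequence accuracy rates such as dSAR and hSAR are typically derived from the edit distance between the predicted and the ground-truth symbol sequence. The sketch below illustrates this idea; the function names, the symbol encoding, and the exact normalisation (by ground-truth length) are assumptions for illustration, not definitions taken from the paper.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance over two symbol sequences.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution
        prev = curr
    return prev[-1]

def symbol_accuracy_rate(predicted, ground_truth):
    # Accuracy = 1 - (edit distance / number of ground-truth symbols).
    if not ground_truth:
        return 1.0 if not predicted else 0.0
    return 1.0 - levenshtein(predicted, ground_truth) / len(ground_truth)

# Symbols encoded as (type, pitch) tuples; values are illustrative only.
gt   = [("clef", "c4"), ("nc", "a3"), ("nc", "b3"), ("nc", "c4")]
pred = [("clef", "c4"), ("nc", "a3"), ("nc", "c4")]  # one NC missed
print(symbol_accuracy_rate(pred, gt))  # → 0.75
```

In this scheme, a dSAR-style metric would compare symbols together with their neume affiliation, while an hSAR-style metric would drop the affiliation before comparing.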
ARTICLE | doi:10.20944/preprints201909.0101.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics
Keywords: Optical Character Recognition; Document Analysis; Historical Printings
Online: 9 September 2019 (12:08:16 CEST)
Optical Character Recognition (OCR) on historical printings is a challenging task, mainly due to the complexity of the layouts and the highly variable typography. Nevertheless, great progress has been made in historical OCR in the last few years, resulting in several powerful open-source tools for preprocessing, layout recognition and segmentation, character recognition, and post-processing. A common drawback of these tools is their limited usability for non-technical users such as humanities scholars, in particular when several tools have to be combined into a workflow. In this paper we present an open-source OCR software called OCR4all, which combines state-of-the-art OCR components and continuous model training into a comprehensive workflow. A comfortable GUI allows error corrections not only in the final output but already in the early stages, to minimise error propagation. Furthermore, extensive configuration capabilities are provided to set the degree of automation of the workflow and to adapt the carefully selected default parameters to specific printings, if necessary. Experiments showed that users with little or no experience were able to capture the text of even the earliest printed books with manageable effort and high quality, achieving excellent character error rates (CERs) below 0.5%. The fully automated application to 19th-century novels showed that OCR4all can considerably outperform the commercial state-of-the-art tool ABBYY FineReader on moderate layouts if suitably pretrained mixed OCR models are available. The architecture of OCR4all allows the easy integration (or substitution) of newly developed tools for its main components via standardised interfaces like PageXML, thus aiming at continually higher automation for historical printings.
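To illustrate the kind of standardised interface mentioned above, a minimal PageXML document pairs region and line coordinates on the page image with the recognised text. The fragment below is a sketch: the ids, coordinates, file names, and timestamps are invented for illustration, and real files produced by OCR tools carry considerably more detail.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Metadata>
    <Creator>OCR4all</Creator>
    <Created>2019-09-09T12:00:00</Created>
    <LastChange>2019-09-09T12:00:00</LastChange>
  </Metadata>
  <Page imageFilename="page_0001.png" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <Coords points="100,100 1900,100 1900,400 100,400"/>
      <TextLine id="r1_l1">
        <Coords points="100,100 1900,100 1900,180 100,180"/>
        <TextEquiv>
          <Unicode>Example recognised line of text</Unicode>
        </TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
```

Because each workflow step reads and writes this common format, an individual component (e.g. the segmentation or the recognition engine) can be swapped out without touching the rest of the pipeline.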