Automatic Text Recognition applied to Spanish Golden Age gothic script: creation of an HTR model based on 16th century Spanish Romances of Chivalry on the Transkribus platform

Authors

  • Stefano Bazzaco, Università di Verona

DOI:

https://doi.org/10.17979/janus.2020.0.09.10398

Keywords:

Automated Character Recognition, OCR, HTR, Spanish Romances of Chivalry, gothic script, Transkribus, READ Project

Abstract

This study examines the main aspects of the mass digitization of texts and the automated recognition of digitized images by means of OCR/HTR software. It then presents an HTR experiment on 16th-century Spanish Romances of Chivalry and delivers a model for transcribing these texts in a semi-automated and collaborative way.

Published

2020-10-29

Section

Articles