Talk on the Layout Analysis of Cultural Magazines
At the conference "Forschung mit Schriftquellen im digitalen Zeitalter" (Research with Written Sources in the Digital Age; 22-23 February 2016, TU Darmstadt), organized by the eCodicology project, there will be a talk on "Image and Text in Numbers: Layout Analysis of Hispanic Cultural Magazines in Modernity". Using the corpus of Revistas culturales 2.0, the talk presents what automated layout analysis can achieve with a tool developed by eCodicology (Swati Chandna et al.).
For further information, here is the abstract of the talk in English:
Nanette Rißler-Pipka (Universität Augsburg)
Image and Text in Numbers: Layout Analysis of Hispanic Cultural Magazines in Modernity
Thanks to the cooperation between the IAI (Iberoamerikanisches Institut, Preußischer Kulturbesitz, Berlin) and ISLA (Institut für Spanien-, Portugal und Lateinamerikastudien) at the University of Augsburg, we are able to work with the digital collection of Hispanic cultural magazines provided by the Ibero-American Institute (http://www.revistas-culturales.de/de/digitale_sammlungen). This means that the analysis is based solely on images, not on text data: so far no OCR text is available, because OCR of periodicals and magazines is particularly difficult to produce at a satisfactory quality. The layout analysis involves two sets of questions and two different methods. On the one hand, we observe how the layout changes across magazine titles, time periods and subgenres, by examining the page images manually and by reading selected magazines and the relevant literature. This corresponds to the traditional method, which was already in use before the magazines were digitized. On the other hand, we test whether the tool "CodiHub", developed by eCodicology (Swati Chandna, KIT), can also be used to analyze the layout of cultural magazines automatically. We will discuss the advantages and problems of using the tool in practice. The talk will therefore be a report on work in progress rather than a presentation of results that could already answer questions about the layout of historical magazines in literary and cultural studies. Nevertheless, we try to assess how useful automatic layout analysis can be for our corpus and what must be done to adapt the tool to this field.
The hypotheses established for some magazine titles on the basis of manual, hermeneutic observation are to be confirmed or refuted by the numbers that "CodiHub" produces. Since the corpus consists of magazine titles that are quite easy to distinguish and that represent different genres (art, theatre and literary magazines, or literary chronicles), and since the number of titles is manageable, we are confident that we can establish some hypotheses manually.
The corpus currently consists of 23 titles with a total of approx. 22,202 pages (magazines from Argentina, Peru, Ecuador, Puerto Rico and Cuba). It is a collection of historical magazines published around the turn of the century (1869-1931). Like all historical documents, they are threatened by the decay of their paper, and digitization preserves them, at least temporarily, in another form. With the help of "CodiHub" we generate metadata for each page that tells us more about the numerical relation between image and text. These tables of measurements can help us identify various correlations in the data.