CoNSSA: Corpus of Novels of the Spanish Silver Age

The corpus contains novels written by Spanish authors and published between 1880 and 1939. The original corpus comprises 358 prose texts in total; due to copyright restrictions, only 219 of them can currently be published. The corpus design draws on data from two authoritative histories of literature, and each text is annotated with several types of metadata. Further details on the corpus can be found below.

CoNSSA, Text+ and TextGrid Repository

This corpus was previously published on GitHub and Zenodo.

As part of the activities of the Text+ consortium within the German National Research Data Infrastructure (NFDI), this new version of the corpus is now available in TextGrid Repository.

In contrast to the version published on GitHub and Zenodo, this new version (2.0.0) also contains:

Improved modeling of works, editions and texts in the TEI Header, following the FRBR model
Data from further editions exported from the library catalog K10plus
Descriptions of each work using library classification systems such as the Regensburger Verbundklassifikation (RVK), the Basic Classification (or Basisklassifikation, BK), and the Göttinger Online-Klassifikation (GOK). In this way, we apply to research data the same classification systems that are used to describe primary and secondary literature in library catalogs
References for works and authors to Wikidata, VIAF, the authority files of the German-speaking area (GND), and the authority files of the Spanish National Library (BNE)

Why publish this corpus in TextGrid Repository if it was already available in GitHub and Zenodo?

Here are some reasons, in brief:

Persistent identifiers for each document
Repository awarded the CoreTrustSeal
Repository for XML TEI with specific functions (transformation to HTML or plain text, creation of Table of Contents for each text)
Search functions (see next section)
Filtering functions through metadata
Links to GND
Combination with further corpora published in TextGrid Repository
Download options (Shelf)
User-friendly analysis through tools such as Voyant Tools
Automatic annotation with tools from the CLARIN Switchboard
Options for manual annotation
Integration of the corpus in future developments
Further visibility and harvesting options through other portals (re3data, OpenAIRE, CLARIN Virtual Language Observatory)

Searching in TextGrid Repository

The following searches are possible in TextGrid Repository:

Search for words:
- Madrid
- dictador
Wildcard (*) and fuzzy (~) searches are also available:
- Españ*
- mujeres~, hombres~
Search for authors (with complete name, partial name or GND-ID):
Search for author's gender:
- work.subject.id.value: authorGender AND work.subject.value: female
- work.subject.id.value: authorGender AND work.subject.value: male
Search for year of first publication:
- published in: work.dateOfCreation.value:1900
- published after: work.dateOfCreation.value:>1900
- published before: work.dateOfCreation.value:<1900
- published between: work.dateOfCreation.value:>1900 work.dateOfCreation.value:<1910

These searches can be combined into fairly complex queries that draw on information about the author, the edition and the text. For example, the following query should find all texts written by women and published between 1890 and 1900 in which the root Españ appears:

work.subject.id.value: authorGender AND work.subject.value: female AND work.dateOfCreation.value:>1890 AND work.dateOfCreation.value:<1900 AND Españ*
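
The combined query above can also be assembled programmatically. The following is a minimal sketch in Python that only builds the query string from the field names shown in this section; the function build_query and its parameters are hypothetical helpers, not part of TextGrid Repository, and the sketch does not send the query anywhere.

```python
from typing import Optional


def build_query(
    gender: Optional[str] = None,
    after: Optional[int] = None,
    before: Optional[int] = None,
    term: Optional[str] = None,
) -> str:
    """Join the given filters with AND (hypothetical helper, string building only)."""
    parts = []
    if gender is not None:
        # Author gender is stored as a subject with the id "authorGender".
        parts.append("work.subject.id.value: authorGender")
        parts.append(f"work.subject.value: {gender}")
    if after is not None:
        # Strictly after the given year of first publication.
        parts.append(f"work.dateOfCreation.value:>{after}")
    if before is not None:
        # Strictly before the given year of first publication.
        parts.append(f"work.dateOfCreation.value:<{before}")
    if term is not None:
        # Free-text term, e.g. a wildcard such as "Españ*".
        parts.append(term)
    return " AND ".join(parts)


query = build_query(gender="female", after=1890, before=1900, term="Españ*")
print(query)
```

The resulting string can be pasted into the search field. Filters left as None are simply omitted, so the same helper also covers the simpler searches listed above.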

For further information about querying TextGrid Repository, consult the documentation.

Please note that all of these example searches filter the results by the data type text/xml, which is shown in the left-hand menu of the results page.

Description of the corpus

A full description of the corpus can be found online in chapters 3.1 and 3.2 of the following publication (Open Access):

Calvo Tello, José. 2021. The Novel in the Spanish Silver Age: A Digital Analysis of Genre Using Machine Learning. Digital Humanities Research 4. Bielefeld: transcript. https://www.transcript-verlag.de/978-3-8376-5925-2.

In addition, an article in Spanish about the main characteristics of the corpus is available online (Open Access):

Calvo Tello, José. 2021. ‘Corpus de novelas de la Edad de Plata, en XML-TEI’. Signa: Revista de la Asociación Española de Semiótica 30 (0): 83–107. https://doi.org/10.5944/signa.vol30.2021.29299.

History of the corpus

The corpus was composed as a part of the PhD of José Calvo Tello at the University of Würzburg (Germany). It was part of the project Computational Literary Genre Stylistics (CLiGS), led by Prof. Dr. Christof Schöch. The project was located at the Professorship of Prof. Dr. Fotis Jannidis.

The goal of the project was to analyze the Spanish novel and its subgenres (adventure, erotic, realistic novel, etc.) in the so-called Silver Age period (1880-1939).

Current version

Because of the changes described above, the corpus is now at version 2.0.0. Implementing the FRBR model in TEI moved much of the metadata to new locations in the TEI Header, so the XPaths used to extract this information had to be updated. Under Semantic Versioning, this incompatibility with the previous XPaths requires a new major version, so the corpus moves from major version 1 to 2.
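
The versioning rule applied here can be illustrated with a short sketch in Python. It follows the general Semantic Versioning convention (MAJOR.MINOR.PATCH); the function bump is a hypothetical helper, and the starting version "1.4.2" is purely illustrative, not the actual number of the earlier release.

```python
def bump(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        # Incompatible change (e.g. metadata moved, old XPaths no longer work).
        return f"{major + 1}.0.0"
    if change == "minor":
        # Backwards-compatible addition.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Backwards-compatible fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")


# Illustrative starting point only; any 1.x.y version becomes 2.0.0.
print(bump("1.4.2", "major"))
```

Any incompatible change resets the minor and patch numbers to zero, which is why the new release is numbered 2.0.0.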