CoNSSA: Corpus of Novels of the Spanish Silver Age

The corpus contains novels by Spanish authors published between 1880 and 1939. The original corpus comprises 358 prose texts in total; however, due to copyright restrictions, only 219 can currently be published. The corpus design is based on data from two authoritative histories of literature, and each text is annotated with several types of metadata. Further details on the corpus can be found below.

CoNSSA, Text+ and TextGrid Repository

This corpus was previously published on GitHub and Zenodo.

As part of the activities of the Text+ consortium within the German National Research Data Infrastructure (NFDI), this new version of the corpus is now available in TextGrid Repository.

In contrast to the version published on GitHub and Zenodo, this new version (2.0.0) also contains:

  1. Improved modeling of works, editions, and texts in the TEI Header, following the FRBR model
  2. Data on further editions, exported from the K10plus library catalog
  3. Each work is now described using library classification systems such as the Regensburger Verbundklassifikation (RVK), the Basic Classification (Basisklassifikation, BK), and the Göttinger Online-Klassifikation (GOK). In this way, research data is described with the same classification systems that library catalogs use for primary and secondary literature
  4. References for works and authors to Wikidata, VIAF, the authority files of the German-speaking area (GND), and the authority files of the Spanish National Library (BNE)
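As an illustration of how such authority references might be read programmatically, the sketch below parses a small TEI header with Python's standard `xml.etree.ElementTree` and groups an author's authority URIs by source. The header snippet, the use of a space-separated `@ref` attribute on `<author>`, the placeholder identifiers, and the URI prefixes checked are all assumptions made for this example; the actual encoding in CoNSSA 2.0.0 may differ.

```python
import xml.etree.ElementTree as ET

# TEI namespace used for all element lookups
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical header: authority URIs are stored in a space-separated @ref
# attribute on <author>. This is an assumption, not CoNSSA's actual encoding,
# and the identifiers are placeholders.
SAMPLE_HEADER = """
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt>
      <title>Example title</title>
      <author ref="http://www.wikidata.org/entity/Q0000 http://viaf.org/viaf/0000 https://d-nb.info/gnd/0000 http://datos.bne.es/resource/XX0000">Example Author</author>
    </titleStmt>
  </fileDesc>
</teiHeader>
"""

# URI prefixes assumed to identify each authority file
AUTHORITY_PREFIXES = {
    "wikidata": "http://www.wikidata.org/entity/",
    "viaf": "http://viaf.org/viaf/",
    "gnd": "https://d-nb.info/gnd/",
    "bne": "http://datos.bne.es/resource/",
}


def author_authorities(header_xml: str) -> dict[str, str]:
    """Map each authority source to the identifier found in the first author's @ref."""
    root = ET.fromstring(header_xml.strip())
    author = root.find(".//tei:author", TEI_NS)
    found: dict[str, str] = {}
    if author is None:
        return found
    for uri in author.get("ref", "").split():
        for source, prefix in AUTHORITY_PREFIXES.items():
            if uri.startswith(prefix):
                # Keep only the local identifier after the prefix
                found[source] = uri[len(prefix):]
    return found


print(author_authorities(SAMPLE_HEADER))
# {'wikidata': 'Q0000', 'viaf': '0000', 'gnd': '0000', 'bne': 'XX0000'}
```

Matching on URI prefixes keeps the sketch independent of attribute order; a real pipeline would adapt the XPath and the prefixes to the corpus's actual TEI Header.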

Why publish this corpus in TextGrid Repository if it was already available on GitHub and Zenodo?

Here are some of the reasons, in brief:

  1. Persistent identifiers for each document
  2. Repository certified with the CoreTrustSeal
  3. Repository for XML-TEI with specific functions (transformation to HTML or plain text, creation of a table of contents for each text)
  4. Search functions (see next section)
  5. Filtering by metadata
  6. Links to GND
  7. Combination with further corpora published in TextGrid Repository
  8. Download options (Shelf)
  9. User-friendly analysis through tools such as Voyant Tools
  10. Automatic annotation with tools from the CLARIN Switchboard
  11. Options for manual annotation
  12. Integration of the corpus in future developments
  13. Further visibility and harvesting options through other portals (re3data, OpenAIRE, CLARIN Virtual Language Observatory)

Searching in TextGrid Repository

The following searches are possible in TextGrid Repository:

These searches can be combined to construct fairly complex queries using information about the author, the edition, and the text. For example, the following query should find all texts written by women and published between 1890 and 1900 in which the root Españ appears:

For further information about querying TextGrid Repository, consult the documentation.

Please note that all these example searches filter the results by the data type text/xml, which is shown in the left-hand menu of the results page.
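The combined query described above (texts by women, published between 1890 and 1900, containing a word with the root Españ) is meant to be run in the TextGrid Repository interface. For readers who have downloaded the corpus, a similar filter can be approximated locally. The sketch below assumes each text has already been extracted into a record with the hypothetical fields `author_gender`, `year`, and `text`; these names and values are illustrative and are not CoNSSA's actual metadata fields.

```python
import re
from dataclasses import dataclass


@dataclass
class TextRecord:
    # Hypothetical, simplified record for one novel. The field names are
    # illustrative and are not taken from the CoNSSA TEI headers.
    title: str
    author_gender: str  # assumed values: "female" or "male"
    year: int           # publication year (assumed to be already extracted)
    text: str           # plain text of the novel


def combined_query(records, gender="female", start=1890, end=1900, root="Españ"):
    """Keep records by authors of the given gender, published in [start, end],
    whose text contains a word beginning with `root` (case-insensitive)."""
    # \b anchors the root at the start of a word; str patterns are Unicode-aware,
    # so "ñ" counts as a word character and "Ñ"/"ñ" match case-insensitively.
    pattern = re.compile(r"\b" + re.escape(root), re.IGNORECASE)
    return [
        r for r in records
        if r.author_gender == gender
        and start <= r.year <= end
        and pattern.search(r.text)
    ]


sample = [
    TextRecord("Novel A", "female", 1895, "la historia de España"),
    TextRecord("Novel B", "male", 1895, "los españoles"),
    TextRecord("Novel C", "female", 1910, "España"),
    TextRecord("Novel D", "female", 1899, "sin mención"),
    TextRecord("Novel E", "female", 1890, "los españoles"),
]
print([r.title for r in combined_query(sample)])  # ['Novel A', 'Novel E']
```

The year range is inclusive at both ends, and the root match is case-insensitive so that forms such as "españoles" are also found; whether the repository search treats case and word boundaries the same way is not stated here.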

Description of the corpus

A full description of the corpus can be found online in chapters 3.1 and 3.2 of the following publication (Open Access):

In addition, an article in Spanish about the main characteristics of the corpus is available online (Open Access):

History of the corpus

The corpus was compiled as part of José Calvo Tello's PhD at the University of Würzburg (Germany), within the project Computational Literary Genre Stylistics (CLiGS), led by Prof. Dr. Christof Schöch. The project was based at the professorship of Prof. Dr. Fotis Jannidis.

The goal of the project was to analyze the Spanish novel and its subgenres (adventure, erotic, realistic novel, etc.) in the so-called Silver Age period (1880-1939).

Current version

Because of the changes described above, the corpus is now at version 2.0.0. Implementing the FRBR model in TEI moved much of the metadata to new locations in the TEI Header, so the XPaths used to extract this information had to be updated. Following Semantic Versioning, this incompatibility with the previous XPaths requires a new major version, so the major version number changes from 1 to 2.
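The versioning rule applied here can be sketched in a few lines. The function below is a simplified illustration of the Semantic Versioning increment rules: an incompatible change bumps MAJOR and resets MINOR and PATCH, a backward-compatible addition bumps MINOR and resets PATCH, and a backward-compatible fix bumps PATCH. The change labels are hypothetical, pre-release and build metadata are ignored, and the input "1.0.0" in the usage line is an arbitrary example rather than the corpus's actual previous version.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a given kind of change.

    `change` is one of the illustrative labels "incompatible", "feature" or "fix".
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "incompatible":
        # e.g. metadata moved in the TEI Header, so old XPaths no longer work
        return f"{major + 1}.0.0"
    if change == "feature":
        # backward-compatible addition, e.g. new metadata that old XPaths ignore
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        # backward-compatible correction
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump_version("1.0.0", "incompatible"))  # 2.0.0
```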