Summary of Digital Humanities Research

«this state of affairs is now evolving rapidly, and the juxtaposition of undertakings that are, for the time being, rather fragmented as regards making medieval texts available in electronic form […] will undoubtedly lead to a rethinking of editing strategies.»

VIEILLARD, F. and GUYOTJEANNIN, O. (coord.) (2001) Conseils pour l'édition des textes médiévaux, Paris, École nationale des chartes, Groupe de recherches La civilisation de l'écrit au Moyen âge, 1 vol., p.10

«There can be no doubt that electronic publication offers immense opportunities for radical changes in the way historical records are presented to the user»

HARVEY, P.D.A. (2001) Editing historical records, London, British Library, p.4

The core of the DH research on the project centred on digital editing and digital publishing, in particular exploring how the editorial process has been affected by digital transformations, how we are able to capture digitally both textual sources and metatextual/paratextual entities (people, places, subjects, events …) and how these facilitate new forms of carrying out and publishing research.

Modelling the rolls

Modelling has been a mainstay of research in the digital humanities, and the practical and intellectual tensions between representation and object represented have driven much scholarship in the field. One of the central aims of the project was to create a model which could be used for a variety of purposes: representing information contained in the source texts, facilitating its structured analysis and display and allowing it to be published in both print and digital form, before being archived for potential future re-use.

The encoding model for editing the rolls, developed by Arianna Ciula, Elena Pierazzo and Eleonora Litta in collaboration with the historical team, aimed to capture the following information contained in the Gascon Rolls:

  • Physical structure of corpus: rolls, membranes
  • Logical structure of corpus: entries and sub-entries
    • entries recorded within a given regnal year but belonging to another
    • entries extending over two membranes
    • entries with a title spread over multiple entries
  • Structure of entries: entry ID, opener (date, marginalia), main entry, closer
  • Other structural information:
    • English and French regnal years
    • document typology
    • documents contained within an entry (inspeximus)
    • ordinances
    • agreements in a list
    • full/partial duplication
    • dating information

It is not our aim to explain the technical basis in great detail here, but this screenshot from an early presentation gives a sense of the approach: the text, marked up in XML according to the Text Encoding Initiative (TEI) guidelines, encodes different sections of the documents for later publication and analysis. (The colours are not present in the actual encoding; they are used here to aid understanding.)

Image created by project team, showing simplified view of TEI encoding

By encoding fragments of text in this way, we are able to process them in different ways, for example displaying the opener in a particular way, making it possible to search for text in particular fields, or constructing indices. This is a simplified view: the fully edited texts also contain ‘tags’ for many other features, including people and places. By applying TEI markup to names, such as <persName>John de Stonor</persName>, we are able to query complex relationships between people, places and the documents in which they are mentioned.
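As a minimal sketch of how tagged names can feed an index, the following Python reads a heavily simplified TEI fragment and lists the entries in which each person is tagged. The sample markup (including the use of <div type="entry"> and the n attribute) is an assumption made for illustration and is not the project's actual encoding model.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The TEI namespace; element names below are standard TEI, but the
# structure of the sample is a hypothetical simplification.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="entry" n="1">
      <p>Grant to <persName>John de Stonor</persName> in <placeName>Bordeaux</placeName>.</p>
    </div>
    <div type="entry" n="2">
      <p>Order to <persName>John de Stonor</persName>.</p>
    </div>
  </body></text>
</TEI>"""


def people_by_entry(tei_xml):
    """Map each tagged person name to the entry numbers where it occurs."""
    root = ET.fromstring(tei_xml)
    index = defaultdict(list)
    for entry in root.iterfind(".//tei:div[@type='entry']", TEI_NS):
        for name in entry.iterfind(".//tei:persName", TEI_NS):
            label = "".join(name.itertext()).strip()
            index[label].append(entry.get("n"))
    return dict(index)


print(people_by_entry(SAMPLE))
```

Matching on the text of the name alone would conflate different people who share a name, which is one reason an index of this kind needs entity identifiers rather than labels.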

Modelling entities in the rolls

The TEI guidelines provide a complex set of encoding scenarios for a wide variety of humanities research questions, but in practice we found it necessary to combine this approach with a second modelling framework which allowed us both to capture the different entities (people, places and subjects) contained in the rolls and to model their relationships: for example, bringing together multiple mentions of the same person (which may not always use the same label to describe that person, e.g. ‘Henry’, ‘The King’, ‘the son of …’) or making assertions about the familial relationships or roles of different people.

The Henry III Fine Rolls and Gascon Rolls projects took slightly different approaches to this, but the objective was the same: to capture information about both the rolls and the entities contained within them, which would feed people and place indices and allow for more complex analysis than would be possible in print publication. For this task the Gascon Rolls project used an open-source tool called EATS, developed by Jamie Norrish: https://github.com/ajenhl. One of the key reasons for using EATS was that we wished to make the two editing processes as streamlined as possible (ideally part of the same workflow), and to enable complex structured markup while hiding unnecessary complexity in order to allow historians to focus on editing.

In practice, this meant that researchers were able to edit one of the Gascon Rolls, and when they came across an entity, e.g. a person, they could click on a button within the XML editor (the project used the Oxygen XML editor), which would then fire up a call to the database of entities mentioned in the whole corpus. If the person already existed within the database, the editor could simply click on that name, allowing the connection to be made automatically, and if not, the editor could create a new record, without leaving the XML editor.
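The look-up-or-create step described above can be sketched as follows. This is a toy in-memory stand-in written for illustration: the class names, method names and ID format are hypothetical and do not reflect the EATS data model or its API.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str
    entity_type: str                      # e.g. "person" or "place"
    labels: set = field(default_factory=set)


class EntityRegistry:
    """Hypothetical stand-in for a shared database of entities."""

    def __init__(self):
        self._entities = {}
        self._next = 1

    def search(self, label):
        """Return entities that carry this label (case-insensitive)."""
        wanted = label.casefold()
        return [e for e in self._entities.values()
                if any(wanted == known.casefold() for known in e.labels)]

    def create(self, entity_type, label):
        entity = Entity(f"entity-{self._next:06d}", entity_type, {label})
        self._entities[entity.entity_id] = entity
        self._next += 1
        return entity

    def add_label(self, entity_id, label):
        """Record a further label for an existing entity."""
        self._entities[entity_id].labels.add(label)


def link_mention(registry, entity_type, label):
    """Return the ID to store in the markup: reuse a match or create one."""
    matches = [e for e in registry.search(label)
               if e.entity_type == entity_type]
    if matches:
        return matches[0].entity_id       # a real editor would let the user choose
    return registry.create(entity_type, label).entity_id


registry = EntityRegistry()
first = link_mention(registry, "person", "Henry")
registry.add_label(first, "The King")
second = link_mention(registry, "person", "the king")
print(first, second)
```

Because the markup stores only the identifier, several differently worded mentions can resolve to one record without changing the transcribed text.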

In terms of the final publication, one of the many advantages of this approach was that we were able to use the entity information to provide faceted browsing as part of the search function. If you type ‘Paris’ into the main search box and then type a letter into the ‘Person’ facet, you are given a list of people whose names begin with that letter, which you can then use to narrow your search. See: http://www.gasconrolls.org/en/search/#q=document_type%3Acalendar
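To illustrate how a facet of this kind can be requested from a Solr index, the sketch below builds a query URL using standard Solr faceting parameters (facet, facet.field, facet.prefix, facet.mincount). The core URL and the field name "person" are assumptions; the project's actual Solr schema is not described here.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, text, facet_field, prefix=""):
    """Build a Solr select URL that returns facet counts for one field.

    `base_url` and `facet_field` are hypothetical; only the parameter
    names are standard Solr.
    """
    params = [
        ("q", text),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.mincount", "1"),     # omit facet values with no matches
        ("wt", "json"),
    ]
    if prefix:
        # Restrict facet values to those starting with the typed letter(s).
        params.append(("facet.prefix", prefix))
    return f"{base_url}?{urlencode(params)}"


url = facet_query_url("http://localhost:8983/solr/rolls/select",
                      "Paris", "person", "J")
print(url)
```

Note that facet.prefix is matched against the indexed terms, so in practice the field would need to be indexed in a form (for example, an unanalysed string) that preserves whole names.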

Another outcome is that the entity pages show relationships between documents, people, places, and even between entities themselves. See, for example, the entry for London (city): http://www.gasconrolls.org/en/indexes/entity-000928.html.

Another wider advantage of using EATS as a platform is that data can easily be exported into other formats as desired, such as RDF/OWL and Topic Maps serialisations. The data about entities is referenced by XML editions of the Gascon Rolls source texts, but is in a sense independent, and so could easily be re-used by other projects without affecting Gascon Rolls project content.

Publishing

The project’s digital publication was largely based on the xMod platform designed by Paul Spence and Paul Vetch, and later re-developed by Miguel Vieira and Jamie Norrish as Kiln: http://kcl-ddh.github.io/kiln/. The multi-platform framework integrates numerous tools which provide core functionality for the site, and allows for complex publications to be developed using standards-compliant XML-encoded source content. In addition to providing most of the site functionality (calendar and transcription view, indices and search functionality), the publishing framework also produces RDF data (a core component of Linked Data, about which more later), and in theory facilitates PDF and EPUB generation (although this was not developed on this project).

From a broader historical perspective, the starting point for the project was to replicate the earlier print publications with search functionality as an additional requirement, but over time a transition in the site brought digital-first aspects of publishing to the fore. Given how far the publishing landscape has changed in some respects, if we were starting the project now, we would undoubtedly design some things differently, but the experimental visualisations, some of which appear on the live site, give some idea of how historical and digital humanities researchers on the project have attempted to re-envision the display of historical material of this kind.

Visualisations

The project explored various data visualisations using the datasets available in Sesame (an RDF store) and Solr (a search index), two technologies used by the project.

Digital humanities scholars such as Johanna Drucker (2011a and 2011b) have argued that we need a greater, and more critical, understanding of digital visualisation techniques, which have been in vogue in the last decade, and which have significantly altered the way that information is presented in the commercial and media worlds. In the wider humanities, there is still a deficit in terms of how to integrate these into scholarship and publishing. Having acknowledged that, there is nevertheless much potential in using digital visualisation to present information in new ways, and the project produced a number of experimental displays in attempts to harness the potential of structuring complex data in this way. We invite critical reflection on the results (most of which were initially produced by Emma Tonkin), and on how to integrate them into future historical criticism in ways which align with established research practices.

We would like to emphasise the experimental nature of these visualisations, some of which do not use the final state of the content. We believe that they stand as useful thought-pieces on how to present historical content of this nature, but advise caution in extracting lessons about the content itself.

Gascon Rolls Co-occurrence matrix

One experiment involved visualising character co-occurrence in the Gascon Rolls, displaying the density of co-occurrence according to three different criteria: by cluster, frequency or name. In the following screenshot, these are sorted by cluster; this identifies groups of entities that frequently appear together.
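A co-occurrence matrix of this kind can be derived from per-entry lists of entities. The sketch below uses invented sample data and covers only two of the three orderings (by name and by frequency); ordering by cluster would additionally require a community-detection step, which is not shown. It is an illustration and not the code behind the project's visualisation.

```python
from collections import Counter
from itertools import combinations

# Invented sample: each inner list holds the entities tagged in one entry.
ENTRIES = [
    ["Henry", "John"],
    ["Henry", "John", "Alice"],
    ["Henry", "William"],
]


def cooccurrence_counts(entries):
    """Count how often each unordered pair of entities shares an entry."""
    counts = Counter()
    for entities in entries:
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


def frequency_order(counts):
    """Entities sorted by total co-occurrences (descending), then by name."""
    totals = Counter()
    for (a, b), n in counts.items():
        totals[a] += n
        totals[b] += n
    return sorted(totals, key=lambda name: (-totals[name], name))


def matrix(counts, order):
    """Square matrix of pair counts, with rows and columns in `order`."""
    return [[0 if a == b else counts[tuple(sorted((a, b)))] for b in order]
            for a in order]


counts = cooccurrence_counts(ENTRIES)
by_frequency = frequency_order(counts)
by_name = sorted(by_frequency)
for row_name, row in zip(by_name, matrix(counts, by_name)):
    print(f"{row_name:8}", row)
```

Reordering rows and columns does not change the counts; it only changes which dense regions of the matrix become visible.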

Figure 1: A subset of entities in the Gascon Rolls, sorted by cluster

http://www.gasconrolls.org/en/d3-test2/

Temporal network of characters

Another experiment produced a ‘dramatis personae of the Gascon Rolls’, available on Vimeo: https://vimeo.com/106275332. This visualisation may give some indication of the relative importance and interrelationship of various characters in a subset of project content over a period of time.

Classically, social networks model the social structures that link individuals, such as trust or leadership relationships, co-occurrence at events or collaboration. These graphs usually aggregate data over a certain time window during which data collection occurs. However, social networks are dynamic. Links between individuals decay or are broken and others are formed. The use of time-varying graphs (temporal networks) allows us to explore evolution in the social network over time (Santoro et al. 2011).

The Gascon Rolls cover a significant time period (once complete, well over a hundred years). Changes that occur over time should therefore be more visible in the Gascon Rolls dataset than in more temporally focused datasets. Although the dataset used for this experiment was incomplete, there is enough material available to explore contemporary methods of time-based data visualisation. We developed a prototype in Python and R, using material drawn from the Sesame endpoint, to visualise the unfolding of the Gascon Rolls as a temporal graph.
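One simple way to approximate a time-varying graph is to group co-occurrence edges into consecutive time windows and compare successive windows. The sketch below does this with invented years and placeholder names; the window size and data are assumptions, and this is not the project's prototype.

```python
from collections import defaultdict
from itertools import combinations

# Invented sample: (year, entities tagged together in one entry).
DATED_ENTRIES = [
    (1317, ["A", "B"]),
    (1318, ["B", "C"]),
    (1321, ["A", "B"]),
    (1322, ["A", "D"]),
]


def temporal_snapshots(dated_entries, window=5):
    """Group co-occurrence edges into consecutive windows of `window` years."""
    snapshots = defaultdict(set)
    for year, entities in dated_entries:
        start = year - (year % window)          # first year of the window
        for edge in combinations(sorted(set(entities)), 2):
            snapshots[start].add(edge)
    return dict(sorted(snapshots.items()))


def edge_changes(snapshots):
    """Yield (window_start, edges_formed, edges_dropped) per window."""
    previous = set()
    for start, edges in snapshots.items():
        yield start, edges - previous, previous - edges
        previous = edges


snapshots = temporal_snapshots(DATED_ENTRIES)
for start, formed, dropped in edge_changes(snapshots):
    print(start, "formed:", sorted(formed), "dropped:", sorted(dropped))
```

Each snapshot can then be drawn as one frame of an animation, with formed and dropped edges indicating how the network changes between windows.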

Bubble view of people mentioned per roll

This view visualises people by roll, demonstrating the relative frequency of mentions of that person in the period covered by the roll. Clicking on each person entity brings up detail about how many times they are mentioned and canonical versions of the name given.

This was built in D3 on the basis of Solr data to evaluate the information held in each calendar, enabling a visual overview of the entity references in each. As can be seen in the table below, such an overview can provide striking evidence of the variation in entity reference counts and relative frequency. Similar visualisation was also performed on various other facets indexed by Solr, such as original name, person mention, place mention and entity key type (i.e., broadly, the sorts of subject covered by each calendar). This should be considered a proof-of-concept rather than a completed interface and was performed in part to explore the nature of the data held within the index.

This visualisation is based on circle packing. The packed circle chart has the benefit of producing visually attractive layouts and patterns. A corresponding limitation is that the areas of circles are difficult to compare accurately. Where a given use case requires the user to compare exact counts of entity references, the visualisation can usefully be supplemented with an indicator of the item count, provided perhaps as a tooltip or in response to a mouse click.
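The difficulty of comparing circles follows from the geometry: if area is to be proportional to the number of mentions, the radius must scale with the square root of the count, so a person mentioned four times as often is drawn with only twice the radius. A minimal sketch, with hypothetical function names and an arbitrary maximum radius:

```python
import math


def bubble_radius(count, max_count, max_radius=50.0):
    """Radius giving a circle whose area is proportional to `count`."""
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    return max_radius * math.sqrt(count / max_count)


def tooltip(name, count):
    """Exact count shown alongside the bubble, e.g. on hover or click."""
    return f"{name}: {count} mention{'s' if count != 1 else ''}"


print(bubble_radius(100, 100), bubble_radius(25, 100))
print(tooltip("John de Stonor", 1))
```

Showing the exact count in a tooltip compensates for the compression introduced by the square-root scaling.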

http://www.gasconrolls.org/en/research-tools/people/

Visualisation of fragmentary family trees

Fragmentary family trees were visualised as part of the prototype above. An example is shown in the figure below. The incomplete and at times ambiguous nature of the data made family trees challenging to visualise. This figure represents an attempt at an ‘informal’ representation, avoiding explicit handling of the unknowns, hence the use of non-traditional visual elements, such as curved connecting lines.

Now that the RDF indexing is complete, it should be possible to build more extensive family tree diagrams, although including externally sourced linked data and applying data linking techniques would probably improve the outcome.

Family trees generated using a related method are available at https://github.com/etonkin/family_trees_GSR. They were produced with Graphviz, an open-source tool. The code in this repository generates all family trees detectable within the Gascon Rolls as a corpus.
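As an illustration of how Graphviz input can be produced, the sketch below writes a small family tree in the DOT language, drawing uncertain relationships as dashed lines. The names, the tuple format and the dashed-line convention are assumptions made for this example; it is not the code held in the repository.

```python
# Invented sample: (parent, child, relationship_is_certain).
RELATIONS = [
    ("Alice", "Joan", True),
    ("William", "Joan", False),
]


def quote(name):
    """Quote a name for use as a DOT node identifier."""
    return '"' + name.replace('"', '\\"') + '"'


def family_tree_dot(relations, graph_name="family"):
    """Return DOT source with one parent-to-child edge per relation."""
    lines = [f"digraph {graph_name} {{", "  node [shape=box];"]
    for parent, child, certain in relations:
        style = "solid" if certain else "dashed"
        lines.append(f"  {quote(parent)} -> {quote(child)} [style={style}];")
    lines.append("}")
    return "\n".join(lines)


dot_source = family_tree_dot(RELATIONS)
print(dot_source)
```

Saved to a file such as tree.dot, the output can be rendered with the Graphviz command-line tool, for example `dot -Tsvg tree.dot -o tree.svg`.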

Visualisation of relationships between entities: exploring pardons

By making use of the calendared data itself, it is also possible to explore other types of relationship. For example, we used data drawn from the calendars to identify possible relationships between those accused of criminal activities and the individuals who gave an undertaking of mainprise on their behalf, essentially the equivalent of posting bail. This was visualised using entity co-occurrence within a paragraph context.

In C61_37 in particular we find many such relationships. A representative example is given in the figure below:

By aggregating these, we built a network of mainprise relationships between individuals. Full automation of this process is complicated by the variety of conventions and shorthand annotations used to encode this information, which differ between individual cataloguers. In total, over two hundred complete records of this kind were identified across the full dataset, and the data was cleaned manually where necessary.

The resulting network shows a large number of simple relationships: individuals who sought mainprise on a single occasion, and were granted it by individuals who are recorded only once as granting mainprise. A few more interesting and complex cases appear. For example, the following figure (left) shows a number of individuals who acted as mainpernors to up to five people. The accompanying figure (right) shows that an individual who acted as a mainpernor to three individuals then required mainprise himself; inspection of the data shows that the individual coded as 007356, John de Weston, mainprised Reyner de Berefrey, Jean Trie and Jean Roundell. He himself had been forced to seek pardon for ‘his failure to come before the king to satisfy him for his redemption for the disseisin of Alice, widow of William Angmund' by force and arms, for which he was outlawed in Sussex’.

This dataset is available at https://github.com/etonkin/mainpernors_GSR
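The aggregation into a network can be sketched as a directed graph in which each edge runs from a mainpernor to the person mainprised. The first three records below come from the example given above; the fourth uses a placeholder, since the identity of John de Weston's own mainpernor is not stated here. The code does not read the dataset in the repository.

```python
from collections import defaultdict

# (mainpernor, person mainprised). The last mainpernor is a placeholder.
RECORDS = [
    ("John de Weston", "Reyner de Berefrey"),
    ("John de Weston", "Jean Trie"),
    ("John de Weston", "Jean Roundell"),
    ("(unidentified mainpernor)", "John de Weston"),
]


def build_network(records):
    """Return (granted, received): who stood surety for whom, and the reverse."""
    granted = defaultdict(set)
    received = defaultdict(set)
    for mainpernor, mainprised in records:
        granted[mainpernor].add(mainprised)
        received[mainprised].add(mainpernor)
    return granted, received


def both_roles(granted, received):
    """People who acted as mainpernor and also required mainprise."""
    return sorted(set(granted) & set(received))


granted, received = build_network(RECORDS)
print(len(granted["John de Weston"]), both_roles(granted, received))
```

Counting the outgoing edges per person identifies frequent mainpernors, while the intersection of the two roles picks out the more complex cases.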

Map

The project also produced a series of geospatial representations of its research data.

Interactive Map

This map plots the names of places mentioned in the Gascon Rolls, demonstrating the relative frequency of mentions of particular places/regions (with medieval boundary layers) and linking through to the authority file information for that particular location, allowing the user to see a list of calendars where it is mentioned, and to then link through to the location where they are mentioned in the rolls themselves.

The map is built using the OpenStreetMap API, a freely available resource.

http://www.gasconrolls.org/en/research-tools/interactive-map/

Interactive map by roll

Each roll displays a map view of the entities encoded within, e.g. http://www.gasconrolls.org/en/edition/calendars/C61_43/document.html

Authority files for location with map

These pages present information about each place entity encoded in the project, including:

  • A map view of the entity
  • Where the place reference occurs
  • Variant names
  • Related places (where, for example, a parish forms part of another place)
  • Other entities which occur in the same entries where the given place entity is referenced
  • RDF representation of the entity

See, for example: http://www.gasconrolls.org/en/indexes/entity-011279.html

Emma Tonkin described the challenges in representing and visualising spatial data in a series of project blog posts.

Linked Data

Linked Data is a standards-based model for representing structured data in a manner which allows it to be interlinked with other data and then queried semantically. It is often published as open data, and it builds on Tim Berners-Lee’s proposal for the Semantic Web.

In this project, we aimed to use Linked Data principles where possible in order to make the data accessible to semantic and cross-project research which people might wish to do in future. We provide RDF/XML files for each person or place, for example: http://www.gasconrolls.org/en/indexes/entity-025964.html
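To give a sense of what such data looks like, the sketch below writes two statements about a place entity as RDF triples. It uses the N-Triples serialisation rather than RDF/XML because it is easier to read, and it assumes that the entity's page address can serve as its identifier. The rdfs:label property is standard; the example.org property and document address are hypothetical and are not the project's vocabulary.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
MENTIONED_IN = "http://example.org/vocab#mentionedIn"   # hypothetical property


def ntriple(subject, predicate, obj, literal=False):
    """Serialise one statement as a line of N-Triples."""
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        return f'<{subject}> <{predicate}> "{escaped}" .'
    return f"<{subject}> <{predicate}> <{obj}> ."


# Assumption: the entity page URL is used as the entity's URI.
london = "http://www.gasconrolls.org/en/indexes/entity-000928.html"
triples = [
    ntriple(london, RDFS_LABEL, "London (city)", literal=True),
    ntriple(london, MENTIONED_IN, "http://example.org/doc/1"),
]
print("\n".join(triples))
```

Because every statement names its subject with a URI, data from other projects that refer to the same URI can be merged with these statements without further alignment.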

OAI

An OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) endpoint was created using a stylesheet transformation over the XML data. This was successfully tested using a freely available OAI-PMH data validation service.
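For readers unfamiliar with the protocol, the sketch below shows how a harvester might form a ListRecords request and read record identifiers from the response. The verb, the metadataPrefix and resumptionToken arguments, and the response namespace are defined by OAI-PMH; the endpoint address and the sample response are invented, as the project's endpoint URL is not given here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Invented, minimal response for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:doc-1</identifier></header></record>
    <record><header><identifier>oai:example.org:doc-2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(endpoint, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{endpoint}?{urlencode(params)}"


def record_identifiers(response_xml):
    """Extract the identifier of every record header in a response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in
            root.iterfind(".//oai:header/oai:identifier", OAI_NS)]


print(list_records_url("http://example.org/oai"))
print(record_identifiers(SAMPLE_RESPONSE))
```

A full harvester would repeat the request with each resumption token returned by the server until no further token is supplied.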

Content (data) re-use

Due to the use of established standards on the project, research content can be, and has been, re-used for other purposes. In 2015, content from the Gascon Rolls project was shared with The National Archives’ Traces Through Time project, which experimented with the application of data analysis techniques (using Named Entity Recognition) to historical archival data.[1] Later that year, sample content from the project (along with other medieval projects DDH was involved in) was deposited with the MESA (Medieval Electronic Scholarly Alliance) network https://mesa-medieval.org/search?a=gsr to aid discovery and facilitate potential wider collaborations.

In a collaboration between the Department of Digital Humanities and King’s Digital Lab, we have recently re-published the data as part of a wider Legacy Data project at King’s College London which seeks to:

  • Facilitate use of project data in teaching programmes;
  • Enhance project data citability;
  • Make it easier for researchers to access and build on existing project data, and to integrate it with other collections;
  • Enhance research impact (beyond academic use) of project data (e.g. linking and integration within archival collections).

More information about the Legacy Data project is available here: https://data.kdl.kcl.ac.uk/dataset



[1] https://dcicblog.umd.edu/cas/wp-content/uploads/sites/13/2016/05/2.pdf