Chapter 9. Beyond accessibility: Using Shared Canvas in digital facsimile editions

Abstract

The genre of the scholarly edition is diverse in its forms, but it generally involves a reproduction of original source material in the context of a scholarly apparatus consisting of annotations which add scholarly value to the text by explaining the editorial decisions made in a particular passage, drawing attention to an interesting place in the text, or linking sections of the facsimile with secondary scholarship. This scholarly apparatus acts as a contextualization of the primary material with which it is concerned. Instead of being embedded within the original material, these additions are placed in margins, footers, and appendices separate from but engaged with the material. In every case of a footnote, marginal note, or other part of a scholarly apparatus, we can say that something provided by the editor is a “body” associated with something else that acts as the “target” of the commentary. This association of “body” and “target” is the essence of Open Annotation, an emerging standard on the World Wide Web for associating Web content in this manner. This sense of annotation—of associating one piece of content with another for some express purpose—is the foundation of the Shared Canvas data model, which we present in this chapter as one approach to representing digital facsimile editions in an open, shareable form.

 

Keywords

Scholarly, facsimile, edition, digital, framework, annotation

How to Cite

Smith, J., & Viglianti, R. (2017). Chapter 9. Beyond accessibility: Using Shared Canvas in digital facsimile editions. Digital Studies/le Champ Numérique, 6(6). DOI: http://doi.org/10.16995/dscn.16

The genre of the scholarly edition is diverse in its forms, but it generally involves a reproduction of original source material in the context of a scholarly apparatus consisting of annotations which add scholarly value to the text by explaining the editorial decisions made in a particular passage, drawing attention to an interesting place in the text, or linking sections of the facsimile with secondary scholarship (Price 2009). This scholarly apparatus acts as a contextualization of the primary material with which it is concerned. Instead of being embedded within the original material, these additions are placed in margins, footers, and appendices separate from but engaged with the material. In every case of a footnote, marginal note, or other part of a scholarly apparatus, we can say that something provided by the editor is a "body" associated with something else that acts as the "target" of the commentary. This association of "body" and "target" is the essence of Open Annotation, an emerging standard on the World Wide Web for associating Web content in this manner.[1] This sense of annotation—of associating one piece of content with another for some express purpose—is the foundation of the Shared Canvas data model, which we present in this chapter as one approach to representing digital facsimile editions in an open, shareable form.

In this chapter we explore the digital facsimile edition as expressed in the Shared Canvas and the subsequent International Image Interoperability Framework (IIIF) data models. Along the way, we briefly survey the history of the facsimile and the typical form of the digital facsimile edition as it has developed over the last few decades. Throughout the rest of this chapter, when we refer to the "digital edition," we primarily intend the facsimile form; however, many of the principles introduced by the Shared Canvas and IIIF data models are applicable to other forms of the digital edition.

The Shared Canvas data model was developed with respect to medieval manuscripts to provide a way for all of the representations of the manuscript to co-exist in an openly addressable and shareable form (Sanderson et al. 2011). A relatively well-known example of this is the Archimedes Palimpsest (Archimedes Palimpsest Project 2016). Each of the pages in the palimpsest was imaged using a number of different radiation energies (or colours of light) to bring out different characteristics of the parchment and ink. For example, some inks are visible under one set of energies while other inks are visible under a different set of energies. Because the original writing and the newer writing in the palimpsest used different inks, the images made using different energies allow the scholar to see each ink without having to consciously ignore the other ink. In some cases, the ink has faded in the visible spectrum to the point that it is no longer visible to the naked eye. The IIIF data model builds on the Shared Canvas data model with an emphasis on sharing image-based resources. While the Shared Canvas data model does not require a privileged resource, actor, or even file format when communicating the relationships between elements in a digital facsimile edition, the IIIF data model requires that projects use a specific style of image server, namely the IIIF Image Application Programming Interface (API), and a specific file structure based on the linked data style of JSON (JSON-LD), namely the IIIF Presentation API (see IIIF 2016). Because the IIIF data model is built on top of the Shared Canvas data model, we will focus on the Shared Canvas data model for the remainder of this chapter.

Our experience with the Shared Canvas data model pertains to the Shelley-Godwin Archive, a multi-year project supported by the National Endowment for the Humanities and aimed at putting online under an open data license all of the images, transcriptions, and other information required to create a digital facsimile edition as well as other forms of scholarly output based on the papers of Mary and Percy Shelley and their close family (see MITH 2015 for published materials). While we focus in this chapter on the Archive as an example of implementation when discussing the details of the Shared Canvas and IIIF data models, the core parts of the data models are supported by a growing number of projects and tools listed on the IIIF showcase website (see IIIF 2016 for a list of projects and tools using the Shared Canvas and IIIF data models). As part of the Archive, we extended the data model to support digital facsimile editions of nineteenth-century and later manuscripts. The primary problem that we encounter in these manuscripts is the messiness of the handwriting. Mary's notebooks in which she composed Frankenstein were not written for the public or intended to be readable beyond her own use of the material. The pages have crossed out sections, marginalia, interlinear additions, and occasionally letters crossing other lines of text. Our challenge in this context was to make sense of these notebooks by a combination of facsimile images, annotation, and transcription descriptively marked up with the Text Encoding Initiative (TEI) XML format.

The facsimile edition

Facsimile editions, whether print or digital, play a prominent role in textual scholarship because they provide a wide audience with access to materials that would otherwise be expensive to consult, typically requiring the researcher to travel to the location of the original artefact in order to study it. The purpose of the facsimile is to replicate the original object, such as a manuscript, as faithfully as technology allows, typically through photographic reproduction, with particular attention paid to those aspects of the original that are most useful for scholarly research, especially research into the original intent of the author or the scribe, or into other aspects that might not be obvious in an edited reading text or, in some cases, in the physical object itself (Kramer 2006). Lacking a trusted facsimile edition, a researcher must use the original manuscript or rely on secondary reports, such as transcriptions, without being able to authenticate those reports against the original. Facsimiles thus provide wider access to restricted or inaccessible materials, and digital facsimile editions can reduce the incremental cost of access to almost nothing.

In print, a facsimile edition must prioritize some aspects of the original over others based on the intended scholarly audience or purpose. For example, a facsimile edition produced for one audience might try to be true to the colour on the page as perceived by the human eye, reproducing the experience of looking at the original. Another edition might try to show how the page might appear in other wavelengths of light, such as ultraviolet or x-ray wavelengths, revealing different inks used at different times, or previous writing that was erased, such as was done with the Archimedes Palimpsest discussed above.

Affordances of the digital edition

It is important to consider how digital editions changed with the World Wide Web. As described in more detail by Daniel Sondheim et al. later in this volume, we have almost two decades of tradition to draw on in the development of the online digital edition (Price 2008). Many of these editions model their design on earlier projects published on CD-ROM and other formats developed before the Internet became ubiquitous. Just as early print books mirrored the forms of the manuscript book, early Web-based digital editions mirrored the forms of the CD-ROM or similar digital editions: a self-contained product that allowed access to the text through a particular interface. In some cases these editions reflected the publisher's desire to control the reader's experience with the content because that was the only model available at the time that could provide sustainable funding for publishing copies of the digital edition.

It might be supposed that the linking affordances of digital media brought a new way of working with scholarly editions, but in fact, argues Jerome McGann, "ANY scholarly-critical edition is 'research in hypertext format,'" regardless of the media in which the edition is produced (McGann 1996, emphasis in the original). For example, print editions consist of footnotes, links to other resources, commentary, etc., that have the scholar leafing through volumes of material following trails of references. The digital medium simply makes it easier to follow these trails. Early digital editions simply carried over into electronic form the critical apparatus developed over the years in print media, resulting by the mid-1990s in projects such as the Cervantes Project at Texas A&M University, a collection of digital resources centered on Cervantes and the text of Don Quixote (see Cervantes Project 2016 for the collection of resources produced by the project). In particular, we can examine the Electronic variorum edition of the Quixote (see Urbina 2016), part of the Cervantes Project, as an example of the early digital facsimile edition on the World Wide Web.

Figure 1: Screenshot of the Cervantes Variorum Edition layout with images and text.

As an example of the "radiant textuality" of McGann's vision for the scholarly edition, the Variorum edition contains the text and image scans of three editions based on twenty-three witnesses, displaying the text or image alone or together, and providing a multitude of potential combinations. Selections at the top of the screen allow navigation by edition, copy, chapter, and page, multiplying potential combinations in ways that could not easily, if at all, be accomplished in a fixed form on paper. The electronic interface eases navigation and allows readers to focus on those copies, editions, and other material that interest them without requiring them to actively ignore content that they do not need. This demonstrates two advantages of the digital edition, facsimile or otherwise, over print editions: the tailoring of the presentation to the needs of the reader, and the subsequent unburdening of the reader from having to ignore content that is not germane to the research at hand. This reduction in the cognitive load frees readers to focus more of their energy on their research by allowing them to keep before them only those parts of the edition that interest them. Other projects go even further, providing tools for analyzing the text rather than simply navigating through it, such as Robinson's edition of Chaucer's The wife of bath, which provides computer-assisted collation and automatic generation of information presented in the critical apparatus (Chaucer 1996). Such editions explore how interactive tools might be integrated into the presentation, going beyond the affordances of any print edition.

Regardless of whether digital editions focus on a single source or on multiple sources, virtual reunification and facsimile images have become a given of digital scholarly publication, providing an expansive set of capabilities and materials far beyond what can be offered in any print edition. This trend has renewed interest in diplomatic editions, where one document is transcribed in great detail. In print, such editions have typically been used as a surrogate of the original source document, but in the digital medium, transcription is often paired with a facsimile image and works as an interpretative guide to the document, identifying hard to read, deleted, and added text. Elena Pierazzo argues that while transcriptions in printed diplomatic editions were restricted by the limits of typography, in the digital medium it is necessary to choose "where to stop" in the absence of such restrictions (Pierazzo 2011). Defining such limits on the basis of the editors' research interests rather than typography would form the basis, argues Pierazzo, of a "new publication form called the 'digital documentary edition'" (Pierazzo 2011, 463). This approach is in line with a larger editorial theory of genetic editing, established mostly in continental Europe, which focuses on the genesis of a work by editing sketches and draft material. Peter Robinson has recently criticized the digital propagation of this trend, warning that by "making only digital documentary editions, we will distance ourselves and our editions from the readers" (Robinson 2013, 127). Robinson is concerned that, by focusing on the document rather than on the work, these editions bring back an inward-orientation to the humanities, while digital editions ought to take into account public readership as well as scholarly community. This preoccupation stems from an often siloed view of digital editions, where each resource is self-contained. 
However, digital facsimile editions, particularly when modelled according to linked open data principles as in the case of the Shared Canvas data model,[2] open themselves to integration with other editorial resources that may be more focused on the "work," encouraging a collaborative side of digital editing that Robinson himself identifies as the "potential of editions in the digital world [...] as an ever-continuing negotiation between editors, readers, documents, texts and works" (Robinson 2013, 127).

The Shared Canvas data model

The Shared Canvas uses an open data model that comes out of work related to medieval manuscripts such as the Archimedes Palimpsest. It uses linked data to bring together all of the needed Web resources to build digital facsimiles in a collaborative manner, allowing different scholars to contribute different aspects of a particular digital facsimile edition. Before we discuss the additional affordances that the Shared Canvas data model brings to digital facsimile editions, let us review the parts of the data model and how they relate to aspects of the digital edition, beginning with the concept of linked open data and building up through Open Annotation, the basis for much of the Shared Canvas data model.

Linked open data is data or content published and licensed such that "anyone is free to use, reuse, and redistribute it—subject only, at most, to the requirement to attribute and/or share-alike," with the additional requirement that when an entity such as a person, a place, or something else that has a recognizable identity is referenced in the data, the reference is made using a well-known identifier, called a uniform resource identifier or "URI," that can be shared between projects (for more information on the definition of open data, see Open Knowledge International 2016a, from which we draw the definition here). Together, the linking and openness allow conformant sets of data to be combined into new datasets that work together, allowing someone to publish their own data as an augmentation of an existing published data set without requiring extensive reformulation of the information before it can be used by anyone else. Open Annotation builds on linked open data by using well-known identifiers (URIs) to associate two different resources or parts of resources (for example, paragraphs in a document or regions of an image) with each other. In the Open Annotation data model, an annotation is simply some resource, called a "body," associated with another resource, called a "target," such that the "body" indicates something about the "target." Annotations may also have other information associated with them such as who made the annotation, what their motivation was for making the annotation, and when they made the annotation.
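
The body/target association described above can be sketched as a small JSON-LD-style structure. The following is a minimal illustration in Python, not a normative serialization: the example.org URIs and the helper name `make_annotation` are hypothetical, and the property names are only loosely modelled on the Open Annotation vocabulary.

```python
import json


def make_annotation(anno_uri, body_uri, target_uri, motivation, creator, created):
    """Build a minimal annotation: a body that says something about a target."""
    return {
        "@id": anno_uri,
        "@type": "oa:Annotation",
        "motivatedBy": motivation,   # why the annotation was made
        "annotatedBy": creator,      # who made it
        "annotatedAt": created,      # when it was made
        "hasBody": body_uri,         # the editor's contribution
        "hasTarget": target_uri,     # what the contribution is about
    }


# All URIs below are invented for the purpose of the example.
note = make_annotation(
    anno_uri="http://example.org/annotations/1",
    body_uri="http://example.org/commentary/note-12",
    target_uri="http://example.org/edition/page-3#xywh=100,200,400,50",
    motivation="oa:commenting",
    creator="http://example.org/people/editor",
    created="2016-01-01T00:00:00Z",
)

print(json.dumps(note, indent=2))
```

Because every part of the annotation is identified by a URI, neither the body nor the target needs to live on the same server as the annotation itself.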

Figure 2: Showing the relationships among canvases, zones, and sequences.

The main innovation of the Shared Canvas data model is the "canvas" representing the abstract concept of a manuscript page instead of privileging any single image or other concrete representation of it. Each folio has two canvases representing its recto and verso. Because Shared Canvas is built on linked data, each canvas has a unique URI by which it is referenced when indicating which images, transcriptions, or other information should be associated with the manuscript page represented by the canvas. Associated with the canvas URI is just enough information to allow us to draw a blank space onto which we can render images and transcriptions and provide a human-readable label that lets us easily identify which page we are visualizing from the manuscript. This information consists only of this label, by which to identify the canvas, and an "extent" (a height and a width) representing the aspect ratio of the page and defining a coordinate system by which images and text can be positioned on the canvas. Because a canvas has no intrinsic placement within a sequence, we can build any number of orderings for a set of canvases representing, for example, different expert opinions on how a set of pages might have appeared in the original compilation, or cases in which the folios of one manuscript were rebound with other manuscripts (for example, if the folios in one manuscript were used as endpapers in a number of other manuscripts). The Shared Canvas data model allows us to create multiple sequences representing all of the different ways in which the folios might be arranged, and the data model does not require that the digital edition privilege one sequence over all other sequences, though a typical edition will present an initial sequence when first viewed. Sequences of canvases serve as the backbone of the digital facsimile in the Shared Canvas data model. When using a visualization tool that understands this data model (i.e., a "Shared Canvas viewer"), the sequence is used to determine the next or previous canvas as the user pages through an edition. (By the time this book is available, several other Shared Canvas viewers should be available.)
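
To make the relationship between canvases and sequences concrete, here is a minimal sketch in Python. The class names, field names, and example.org URIs are assumptions made for illustration and are not taken from the Shared Canvas specification; the point is only that a canvas carries a URI, a label, and an extent, while ordering lives entirely in the sequences.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Canvas:
    uri: str      # unique identifier for the abstract page
    label: str    # human-readable label, e.g. "1r"
    width: int    # extent: defines aspect ratio and coordinate system
    height: int


@dataclass
class Sequence:
    label: str
    canvas_uris: List[str]  # the ordering is a property of the sequence only

    def next_canvas(self, uri: str) -> Optional[str]:
        i = self.canvas_uris.index(uri)
        return self.canvas_uris[i + 1] if i + 1 < len(self.canvas_uris) else None

    def previous_canvas(self, uri: str) -> Optional[str]:
        i = self.canvas_uris.index(uri)
        return self.canvas_uris[i - 1] if i > 0 else None


# Two folios, each with a recto and a verso canvas (hypothetical URIs).
folios = [
    Canvas("http://example.org/canvas/f1r", "1r", 2000, 3000),
    Canvas("http://example.org/canvas/f1v", "1v", 2000, 3000),
    Canvas("http://example.org/canvas/f2r", "2r", 2000, 3000),
    Canvas("http://example.org/canvas/f2v", "2v", 2000, 3000),
]

# The same canvases arranged in two different orders.
binding = Sequence("Current binding", [c.uri for c in folios])
reconstruction = Sequence(
    "Proposed original order (hypothetical)",
    [folios[2].uri, folios[3].uri, folios[0].uri, folios[1].uri],
)

print(binding.next_canvas(folios[0].uri))
print(reconstruction.next_canvas(folios[3].uri))
```

A viewer built along these lines would page through whichever sequence the reader selects, without either ordering being stored on the canvases themselves.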

Figure 3: The principal parts of an annotation.

While canvases represent the entire surface of a page, a Shared Canvas "zone" represents a smaller area of interest on the canvas that acts just like a canvas except that it represents only part of a page. Users cannot page through zones as they can a sequence of canvases, but otherwise, zones act like a canvas for the rest of the Shared Canvas data model: they have an extent that defines a separate coordinate space on the canvas with which they might be associated, and they can have a rotation angle, allowing users to treat the area represented by the zone as if they were rotating the page so that, for example, they can read text that runs vertically up the page. Zones can also represent areas in which the precise location and/or relation of some elements are known, while others are not. For example, if someone has a collection of fragments that are known to belong on the same page, but it is not known precisely where on the page they should go, the user can represent the fragments as zones, allowing the images and transcriptions to be matched-up without having to worry about how everything fits together outside of the fragments.
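
The idea that a zone has its own coordinate space and an optional rotation can be illustrated with a small coordinate transform. This is a sketch under stated assumptions rather than the data model's own definition: the `Zone` class, its field names, and the convention that rotation is measured clockwise in degrees about the centre of the region the zone occupies are all choices made here for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Zone:
    width: float           # extent of the zone's own coordinate space
    height: float
    x: float               # region of the canvas covered by the zone
    y: float
    w: float
    h: float
    rotation: float = 0.0  # assumed: degrees clockwise about the region's centre

    def to_canvas(self, zx: float, zy: float):
        """Map a point in zone coordinates to canvas coordinates."""
        # Scale from the zone's extent into the region on the canvas.
        px = self.x + zx * self.w / self.width
        py = self.y + zy * self.h / self.height
        # Rotate about the centre of the region (y axis points down the page).
        cx, cy = self.x + self.w / 2, self.y + self.h / 2
        t = math.radians(self.rotation)
        dx, dy = px - cx, py - cy
        return (
            cx + dx * math.cos(t) - dy * math.sin(t),
            cy + dx * math.sin(t) + dy * math.cos(t),
        )


# An unrotated zone covering a 100 x 50 region of the canvas.
plain = Zone(width=1000, height=1000, x=200, y=300, w=100, h=50)
# A zone holding text written sideways in a square region of the margin.
margin = Zone(width=1000, height=1000, x=0, y=0, w=100, h=100, rotation=90)

print(plain.to_canvas(0, 0))
print(margin.to_canvas(0, 0))
```

With a transform of this kind, a transcription can be laid out upright inside the zone and still be placed correctly over text that runs vertically on the page.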

This association of images, texts, and zones with canvases and other zones is done through the Open Annotation data model, which associates images of each page with the canvas for that page by "annotating" the canvas in relation to an image. This can be as simple as stating that the entire image maps to the entire canvas, but part of the image can be associated with part of the canvas just as well. With the Archimedes Palimpsest, multiple images are associated with each canvas, one for each energy of the electromagnetic spectrum with which the original pages were scanned. A Shared Canvas viewer allows the reader to select which colour image to see because some colours bring out the current text as seen by the naked eye, while other colours bring out the original text that was erased before the parchment was used again. Images may also be associated with zones as well as parts of a canvas. This is useful when a manuscript might have a flap, in which case the reader should be able to see what the page looks like with the flap down or with the flap up. The Shared Canvas data model can mirror the materiality of the original manuscript by using one image to represent the entire page and then layering-on smaller images, or parts of images, to represent the smaller features on the page through zones.
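
As a sketch of how several images might be attached to one canvas, the following Python example represents each image as an annotation whose target is either the whole canvas or a rectangular part of it. The `#xywh=` fragment follows the general style of media fragment URIs; the example.org URIs, the labels, and the helper names `parse_target` and `images_for` are hypothetical.

```python
from urllib.parse import urldefrag

CANVAS = "http://example.org/canvas/f1r"

# Each annotation says "this image (body) depicts this canvas or region (target)".
image_annotations = [
    {"label": "natural light",
     "hasBody": "http://example.org/images/f1r-visible.jpg",
     "hasTarget": CANVAS},
    {"label": "ultraviolet",
     "hasBody": "http://example.org/images/f1r-uv.jpg",
     "hasTarget": CANVAS},
    {"label": "flap raised",
     "hasBody": "http://example.org/images/f1r-flap.jpg",
     "hasTarget": CANVAS + "#xywh=600,100,300,400"},
]


def parse_target(target):
    """Split a target URI into (resource URI, (x, y, w, h) or None)."""
    uri, fragment = urldefrag(target)
    if fragment.startswith("xywh="):
        x, y, w, h = (int(v) for v in fragment[len("xywh="):].split(","))
        return uri, (x, y, w, h)
    return uri, None


def images_for(annotations, canvas_uri):
    """List the images that a viewer could offer for one canvas."""
    found = []
    for anno in annotations:
        uri, region = parse_target(anno["hasTarget"])
        if uri == canvas_uri:
            found.append({"label": anno["label"],
                          "image": anno["hasBody"],
                          "region": region})
    return found


choices = images_for(image_annotations, CANVAS)
for choice in choices:
    print(choice["label"], choice["region"])
```

A viewer could present the labels as a menu, drawing whole-canvas images as the base layer and region-specific images, such as the raised flap, on top.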

As described above, it is often desirable to associate a transcription with a page image. Transcriptions may be associated with the canvas in a number of ways. The easiest is to simply annotate a zone of the canvas with the corresponding text of the transcription, but Open Annotation allows transcriptions to be kept in TEI documents and mapped to canvases and zones as well. Through this mechanism, fragments of TEI-encoded texts can be positioned in a coordinate space, even layered over images, thus providing a framework for much tighter integration of image and transcription in facsimile editions. Using both the TEI model and the Shared Canvas data model allows a project to gain the best of both. A project can create a TEI document that represents the structure or other aspect of the original material without having to base any of the decisions about the TEI schema on how the resulting document might render as a webpage. The TEI document can focus on the semantic structure of the document or on the physical description of the document, whichever is appropriate for answering the questions posed by the researcher. Regardless of the emphasis chosen for the TEI document, the representation of the document through the Shared Canvas data model is based on its physical structure.

The Shared Canvas data model assumes that the transcription can be mapped to the canvas in near pixel-perfect fashion. Many optical character recognition (OCR) processors can provide bounding boxes for the words and lines in their output text, but automatic zoning is not available when working with handwritten manuscripts such as personal diaries or notebooks. The Shelley-Godwin Archive has developed a type of zone that works on a line and character basis instead of a pixel basis. Instead of associating the transcribed words and lines with locations on the page images, the Archive is associating lines of text with broader areas of the canvas, allowing side-by-side viewing of the transcription with images of the original pages.

The Shared Canvas data model can accommodate more than images and transcriptions. Any content that can be published to the World Wide Web can be associated with different parts of the facsimile edition, including scholarly commentary, alternate readings, and secondary scholarship, whether these be in the form of text, image, video, audio, or simply a notation pointing out connections between different publications. The Shared Canvas data model brings all of these pieces together in a document called the "manifest," a listing of all the components of the particular facsimile edition that an editor feels should be included: the set of canvases, the ordering of the canvases as sequences, annotations associating images or transcriptions with the canvases, and scholarly commentary. A manifest may contain multiple sequences representing a variety of scholarly opinions on how the canvases should be ordered as well as ranges of canvases and zones that represent sections of the larger work. Simple metadata can also be included in the manifest, such as the title of the work, the original author, and the editor. The "edition," in a sense, is the manifest; however, in a Shared Canvas viewer, the "edition" is not limited by this manifest. Anyone creating annotations following the Open Annotation data model can create relationships amongst anything that is addressable on the World Wide Web. A viewer that can work with the Shared Canvas data model, such as our "Shared Canvas viewer," will enable the user to create Open Annotations that associate the facsimile with other web objects: anything from shared highlights and bookmarks to class notes or even another interface to a bibliography of secondary material about a work. This flexibility in the Shared Canvas data model comes from its use of Open Annotation and linked data principles, enabling a range of affordances that have not been considered fundamental in prior digital editions on the World Wide Web.
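
The components of a manifest listed above can be sketched as a single nested structure. The example below is a simplified illustration in Python: the key names, the example.org URIs, and the consistency check `dangling_references` are assumptions made here and do not reproduce the actual Shared Canvas or IIIF Presentation API serialization.

```python
# A hypothetical, simplified manifest: metadata, canvases, sequences, annotations.
manifest = {
    "@id": "http://example.org/manifest/notebook-a",
    "metadata": {
        "title": "Notebook A (hypothetical example)",
        "author": "Example Author",
        "editor": "Example Editor",
    },
    "canvases": [
        {"@id": "http://example.org/canvas/f1r", "label": "1r", "width": 2000, "height": 3000},
        {"@id": "http://example.org/canvas/f1v", "label": "1v", "width": 2000, "height": 3000},
    ],
    "sequences": [
        {"label": "Current binding",
         "canvases": ["http://example.org/canvas/f1r", "http://example.org/canvas/f1v"]},
        {"label": "Alternative order",
         "canvases": ["http://example.org/canvas/f1v", "http://example.org/canvas/f1r"]},
    ],
    "annotations": [
        {"hasBody": "http://example.org/images/f1r.jpg",
         "hasTarget": "http://example.org/canvas/f1r"},
        {"hasBody": "http://example.org/tei/notebook-a.xml#line-4",
         "hasTarget": "http://example.org/canvas/f1r#xywh=100,400,1800,60"},
        {"hasBody": "http://example.org/commentary/note-1",
         "hasTarget": "http://example.org/canvas/f1v"},
    ],
}


def dangling_references(manifest):
    """Return canvas URIs used by sequences or annotations but never declared."""
    known = {canvas["@id"] for canvas in manifest["canvases"]}
    used = set()
    for sequence in manifest["sequences"]:
        used.update(sequence["canvases"])
    for annotation in manifest["annotations"]:
        used.add(annotation["hasTarget"].split("#")[0])
    return sorted(used - known)


print(dangling_references(manifest))
```

Since the manifest only refers to its parts by URI, the images, transcriptions, and commentary it lists can each be published by a different group.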

Affordances of the Shared Canvas data model

Just as the overarching theme of early digital editions was providing access for readers to previously inaccessible source materials, so the theme of linked data and the Shared Canvas data model is providing maximal agency to all participants in the scholarly community. The model we describe here is part of a larger movement and philosophy of knowledge generation. The Open Knowledge Foundation (a non-profit foundation founded in 2004 and dedicated to the promotion of open data and open content in all their forms) encourages the publication of datasets that can be mixed together to form new data collections, allowing the discovery of connections that would not be apparent by looking at any one dataset by itself (see Open Knowledge International 2016b for more information on the Open Knowledge Foundation). The Foundation has been a key proponent of the "open government data" movement stemming from the public use of information obtained under various open-records laws, developing software and tools for publishing government data under open licenses and in open formats. Just as the popular open-source blog software WordPress allows users to augment its behaviour by installing plugins, so too datasets become "plugins" providing certain aspects of a larger distributed work. Instead of a project having to build an exhaustive database of content in order to answer some research question, it can focus on those pieces that have not been done and made available elsewhere or for which the participants have special expertise.

This approach to knowledge generation encourages a collaborative model wherein various components (in this case, of a scholarly edition) are published by different groups. For example, one group could establish the canvases for the pages in a set of manuscripts and publish metadata for each of the pages while another group provides the images associated with the canvases. A third group could provide the transcriptions. Finally, an editor could tie all of these data publications together through the Shared Canvas manifest document. One of the fears that arises in freely publishing open data is that someone else might take the data and build on it to the detriment of the original data publisher. However, just as open-source developers build their reputation through the series of software that they publish or to which they contribute, so the people publishing data develop a reputation through the data they provide to the community. This data is informed by expert decisions of the specialist scholar.

Some digital editions restrict access due to the costs of maintaining the program code behind the edition because they consider the contextualization of the facsimile edition provided by this program code as the primary publication. However, by using the Shared Canvas data model, we can see the data as the primary publication. The critical component of a digital edition is the archive of materials that can be drawn together to build the facsimile edition. Focusing on data as the primary publication moves the digital edition into the arena of open-access publication. Just as scholars are concerned above all with widespread access to their research results, making digital editions available as free data allows scholars to focus on the intellectual work of creating the digital edition instead of how the digital edition might be sustained as a website or other presentation (Suber 2012).

One of the changes that linked data bring to digital projects, even projects that are not editions of original source material, is the devolution of agency in the community through the open world assumption of the Resource Description Framework (RDF), which forms the basis of much of linked data. The assumption is that the truth of a statement is independent of who might know or be aware of the statement. Thus, linked data assumes that no one has perfect knowledge of all statements about a particular subject. One important consequence of this assumption is that anyone may create an annotation using Open Annotation without having to own either the target of the annotation or the content that the annotation asserts is in some way associated with the target. For example, someone could create an Open Annotation consisting of a "body" pointing to a paragraph in a scholarly article and a "target" pointing to a scene in Hamlet without having to own, obtain permission, or otherwise have access to modify the scholarly article or the edition of the play that the annotation is about. The publisher of the article retains agency over the article, the publisher of the edition of Hamlet retains agency over that edition, and the author of the annotation retains agency in making the assertion and resulting association. Thus, someone could publish a list of annotations that link secondary scholarship with primary materials without having to own or grant access to either the secondary scholarship or the primary materials. The Shared Canvas data model, being built on linked data principles, would allow a digital edition to incorporate such a published list of annotations as a way to link the scholarly conversation with passages in the primary text.

Another use of Open Annotation in the Shared Canvas data model is in allowing editing of the primary materials without requiring specialized tools. We are exploring this in our continued development of our curation tools for the Shelley-Godwin Archive. By modeling edits as annotations of the digital representations of the primary materials as well as annotations of the connections between the digital representations as expressed through the Shared Canvas data model, we can allow anyone to offer changes to the digital edition as long as their tools understand Open Annotation. Because Open Annotation allows an annotation to document its own provenance—information such as the identity of the person creating the annotation—we can use trust models and other filtering mechanisms to provide semi-automated, if not fully automated, moderation of proposed edits to a digital edition, assuming appropriate trust in the tool chains managing the annotations.[3]
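One way to picture such provenance-based moderation is the following minimal sketch. It assumes that each proposed edit carries the identity of its annotator and that the project keeps a table of trust scores; the class `ProposedEdit`, the function `moderate`, the thresholds, and all sample values are hypothetical and do not describe the Shelley-Godwin Archive tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedEdit:
    annotator: str    # provenance: who created the annotation
    target: str       # URI of the passage the edit applies to
    replacement: str  # proposed new reading


def moderate(edits, trust, accept_at=0.8, review_at=0.3):
    """Sort proposed edits into accept / review / reject queues.

    Unknown annotators receive a trust score of 0.0 and are rejected.
    """
    decisions = {"accept": [], "review": [], "reject": []}
    for edit in edits:
        score = trust.get(edit.annotator, 0.0)
        if score >= accept_at:
            decisions["accept"].append(edit)
        elif score >= review_at:
            decisions["review"].append(edit)
        else:
            decisions["reject"].append(edit)
    return decisions


# Invented trust table and edits, for illustration only.
trust = {
    "http://example.org/people/editor": 0.9,
    "http://example.org/people/student": 0.5,
}
edits = [
    ProposedEdit("http://example.org/people/editor", "http://example.org/page/1#line-3", "reading A"),
    ProposedEdit("http://example.org/people/student", "http://example.org/page/1#line-4", "reading B"),
    ProposedEdit("http://example.org/people/stranger", "http://example.org/page/1#line-5", "reading C"),
]
decisions = moderate(edits, trust)
```

The middle queue is what makes the moderation semi-automated: only edits from annotators of intermediate trust need a human decision.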

One of the greatest departures from traditional digital facsimile editions that the Shared Canvas data model allows is the use of any appropriate tool with the data contained in the edition instead of relying on the tools provided by the publisher. We already saw how this might happen for editing a digital edition, but the openness of the data model allows a broader range of choices for interoperable tools and processes. For example, some projects provide collation tools to compare different versions of a work. Others might provide tools for assembling a custom edition from a subset of the available transcriptions and images. In a traditional edition, whatever tools are offered must be provided as part of the digital edition with which they work. When an edition is published using the Shared Canvas data model, the text and other materials in the edition become available for use by any tool that the reader can bring to the material. The scholars producing the edition are free to focus on the information peculiar to the original source material and not expend resources building yet another collation interface or custom edition builder. The Shared Canvas data model ensures that the reader can make the necessary connections by using the necessary software to experience the manifold dimensionality of tying together primary source material, secondary scholarship, and social commentary without relying on any one publisher or any one path through the material. Through linked data, the Shared Canvas data model ensures that readers are able to explore a text in the way that they wish to do so, allowing them to select appropriate parts of the available materials in a way that best fits their own research. This agency can act as a counterweight to Robinson's concern discussed above about distancing the edition from the reader.

Future directions for the Shared Canvas and IIIF data models and digital editions

This data model of linked open data encourages us to reconsider what we mean by publication of a scholarly edition. Instead of seeing the research product as a polished and final presentation of an edition on the World Wide Web, we should consider the resulting product to consist of the information required to construct the edition and see the particular presentation of the edition as ephemeral, much like a museum display of an artefact: the artefact is the object of interest, which does not lose scholarly value when removed from its contextualizing display. This model represents an important emerging technique for digital facsimiles, namely the separation of content such as page scans and transcriptions from the relationships among the various content components, and the separation of these from their presentation. The independent publication of each component gives other scholars the opportunity to incorporate this work more easily into their own research.

Textual scholarship can benefit from considering how other problems of content and visualization yield to this kind of componentization. Just as new geometries were discovered when the assumptions of Euclidean geometry were questioned, we can find new applications for the techniques used in the Shared Canvas data model by questioning their fundamental assumptions. For example, what if we were to explode the dimensional constraints of the canvas itself? Let us consider first what happens when a canvas is not constrained by a size. What does it mean for a canvas to be unbounded (or "infinite")? If it is unbounded on one dimension, perhaps horizontally, but only unbounded on one end, then it could be an infinite scroll with a definite beginning. If it is unbounded in both directions, then it is an infinite scroll with no beginning or end, at least in its definition. If the canvas is unbounded both horizontally and vertically, that is, if the canvas does not have any edges on either dimension, then it is an infinite space waiting to be tiled with imagery, such as Scott McCloud's Infinite Canvas concept for comics, in which the panels of the comic are able to progress in any direction on the surface (McCloud 2000).
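The cases just enumerated can be made concrete with a small sketch in which each edge of a canvas is either a number or absent. The classes `Axis` and `Canvas` and all dimensions are hypothetical; the published Shared Canvas data model, as described in this chapter, assumes a canvas of definite size.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Axis:
    lower: Optional[float]  # None means unbounded at this end
    upper: Optional[float]  # None means unbounded at this end

    def contains(self, value: float) -> bool:
        above = self.lower is None or value >= self.lower
        below = self.upper is None or value <= self.upper
        return above and below


@dataclass(frozen=True)
class Canvas:
    x: Axis
    y: Axis

    def contains(self, px: float, py: float) -> bool:
        """True if a point can be painted onto this canvas."""
        return self.x.contains(px) and self.y.contains(py)


# Invented dimensions, for illustration only.
page = Canvas(Axis(0, 1000), Axis(0, 1500))             # ordinary bounded page
scroll = Canvas(Axis(0, None), Axis(0, 1500))           # definite beginning, no end
endless_scroll = Canvas(Axis(None, None), Axis(0, 1500))  # no beginning or end
infinite = Canvas(Axis(None, None), Axis(None, None))   # unbounded in both dimensions
```

A point far to the right, such as `(10**9, 100)`, falls outside `page` but inside `scroll`, and a point with a negative horizontal coordinate is accepted only by the last two canvases.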

There are other dimensional geometries to consider besides the flat surface: spherical and toroidal, for example. A spherical canvas could represent the Earth's surface, providing a way to paint geographical information onto a canvas. For example, such a concept would allow us to match locations mentioned in a text with locations on the Earth's surface without requiring the use of a particular piece of software. A toroidal canvas is not something we encounter often in the humanities, but many video games of the latter part of the twentieth century have toroidal spaces. For example, in the classic Pac-Man game, passing through the tunnel on the left edge of the screen brings the user's avatar to the right edge; games such as Asteroids also connect the top of the screen to the bottom, so that play takes place on the surface of a torus. Connecting the top edge of a canvas to the bottom and the left edge to the right in the same way would give us a toroidal canvas.
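The wrap-around of a toroidal canvas amounts to modular arithmetic on both coordinates. The following is a minimal sketch; the function name `wrap` and the canvas dimensions are assumptions made for the illustration.

```python
def wrap(x, y, width, height):
    """Map any point onto a toroidal canvas of the given size.

    Leaving through one edge re-enters through the opposite edge.
    Python's % operator returns a non-negative result for a positive
    modulus, so negative coordinates wrap correctly as well.
    """
    return x % width, y % height


WIDTH, HEIGHT = 640, 480  # invented canvas size

left_exit = wrap(-1, 100, WIDTH, HEIGHT)       # leaves on the left, re-enters on the right
bottom_exit = wrap(320, 480, WIDTH, HEIGHT)    # leaves at the bottom, re-enters at the top
```

Here `left_exit` is `(639, 100)` and `bottom_exit` is `(320, 0)`. Wrapping only one of the two coordinates would instead describe a cylinder, which is one way of thinking about a scroll joined end to end.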

Another assumption is that the canvas represents a surface. It is easy to see how the infinite horizontal (or vertical) canvas could represent a scroll of indeterminate length, but what if the canvas represented something other than space, such as time? The result would be a set of timelines or temporal canvases representing, for example, different television channels or fictional eras in novels. Instead of representing space, these timelines represent the multitude of fictional and non-fictional histories that inform our sense of humanity. Using a central temporal canvas and a number of zones, we could represent different historical epochs and calendars similar to how rata die ("fixed date") and Julian dates are used in calendrical systems (Dershowitz and Reingold 1997; for a more in-depth discussion of temporal framing, see deTombe's chapter in this volume). Within the context of a particular calendar, events might be well ordered and their relationships understood, but we might not know how that calendar fits on the overall timeline of history, just as zones might represent information about fragments on a page even if we do not know precisely how the fragments fit relative to each other.
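A brief sketch may clarify how a central timeline and its zones could relate. We assume here that the central temporal canvas counts days as rata die, which Python's `date.toordinal()` provides for the proleptic Gregorian calendar, and that a zone may or may not be anchored to that timeline. The class `Zone`, its fields, and the sample events are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional

# Offset between a rata die day count and the Julian Day Number of that day.
RD_TO_JDN = 1721425


def rata_die(d: date) -> int:
    """Fixed day number: 1 January of year 1 (proleptic Gregorian) is day 1."""
    return d.toordinal()


def julian_day_number(rd: int) -> int:
    return rd + RD_TO_JDN


@dataclass
class Zone:
    name: str
    events: Dict[str, int]        # label -> day number in the zone's own calendar
    anchor: Optional[int] = None  # rata die of the zone's day 0, if known

    def ordered(self):
        """Events are well ordered within the zone even when it is unanchored."""
        return sorted(self.events, key=self.events.get)

    def on_timeline(self, label: str) -> Optional[int]:
        if self.anchor is None:
            return None  # we do not know where this calendar sits in history
        return self.anchor + self.events[label]


# Invented zones, for illustration only.
anchored = Zone("anchored calendar", {"first event": 0, "second event": 30},
                anchor=rata_die(date(2000, 1, 1)))
floating = Zone("fictional era", {"coronation": 10, "exile": 3})
```

The unanchored zone still answers questions about relative order (`floating.ordered()` places the exile before the coronation), while only the anchored zone can place its events on the central timeline.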

Temporal canvases and calendars can be used to describe data that is already available. The Broadcast Monitoring System (BMS), part of The Media Monitoring System, is an example of a tool whose output can be modeled using the temporal canvas idea. The BMS is a set of hardware and software components that provide near real-time transcription and translation of broadcast video in a number of languages (for example, Spanish on Univision or Arabic on Al Jazeera as provided through the DISH satellite network), storing that information for a significant period of time and allowing keyword searching of the material (The Media Monitoring System 2016). We could model each video channel as a timeline annotated with video segments, text transcriptions, and text translations, similar in nature to the annotations used in creating a digital facsimile. The primary difference is that the annotations would be painted onto a timeline instead of a surface. This example also illustrates that not all annotations need be textual since here we are associating segments of video with times in a timeline. Similarly, a scholarly edition of a movie could represent the timeline of the movie as a blank temporal canvas annotated with video segments, transcriptions, perhaps translations in a number of languages, and scholarly commentary. A future Shared Canvas movie player could select a translation to show as if it were closed-captioning, without modifying the video of the movie. More importantly, an instructor could add notes for a class or highlight particular scenes without having to modify anything in the primary material other than adding a link to their annotations.
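The movie-player scenario can be sketched as a lookup over annotations painted onto a timeline. This is an illustration only: the class `TimedAnnotation`, the function `captions_at`, the motivation labels, and the sample annotations are all invented, and no Shared Canvas movie player of this kind is described in the sources we cite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedAnnotation:
    start: float     # seconds from the beginning of the timeline
    end: float       # exclusive end of the annotated interval
    motivation: str  # e.g. "transcription", "translation", "commentary"
    language: str
    body: str


def captions_at(annotations, t, language):
    """Return the translations in one language that cover time t.

    The video itself is never modified; captions are chosen at playback time.
    """
    return [
        a.body
        for a in annotations
        if a.motivation == "translation"
        and a.language == language
        and a.start <= t < a.end
    ]


# Invented annotations on a single temporal canvas.
timeline = [
    TimedAnnotation(0.0, 4.0, "transcription", "es", "Buenas noches."),
    TimedAnnotation(0.0, 4.0, "translation", "en", "Good evening."),
    TimedAnnotation(4.0, 9.0, "translation", "en", "Here is the news."),
    TimedAnnotation(0.0, 9.0, "commentary", "en", "Note for class: opening sequence."),
]

shown = captions_at(timeline, 5.0, "en")
```

An instructor's notes would simply be further annotations with a different motivation, published separately and merged into the same timeline by the player.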

Another implication of a timeline might be the dating of published editions of a text. Here, we would create annotations asserting that a particular digital facsimile edition was published at a particular date or within a range of dates. Such annotations could also be used to date events mentioned within a text. Some work has been done in the TEI to represent dates and ranges in a consistent and accurate manner, but doing so using Open Annotation would allow someone to record the publication date without having access to modify any of the primary materials in a digital edition (Holmes, Jenstad and Butt 2013).

One important and seemingly implied assumption in the Shared Canvas data model at the moment is that visual non-image annotations are text and that musical annotations represent a sound recording. However, with the advent of the Music Encoding Initiative (MEI) mirroring for music what the TEI has done for text, music annotations can be represented as music notation on the page instead of as a sound recording. That is, instead of only providing a way for the reader to hear the music described in a manuscript, MEI in coordination with the Shared Canvas data model would allow the reader to see music rendered in various formats, be it a modern form of a medieval music score or a standardized typesetting of handwritten notes. Being able to include transcriptions of music notation can be useful when pages contain both text and music notation, which is not uncommon in medieval manuscripts (the current main focus of Shared Canvas data model use).

For works and artefacts that include more than written words and images, the Shared Canvas data model provides a way to tie different pieces together and accommodate tools for looking at a slice of the whole depending on which aspect one wants to emphasize. For example, one of the most complex art forms is opera. It requires a large number of skill sets for both its composition and its performance. Composer, librettist, publisher, theatre producers, singers, and designers all contribute to the creation of a variety of artefacts in multiple versions that make the preparation of a scholarly edition of an opera a daunting task. Does it consist of just the composer's music with the librettist's words, or also elements of staging? Which version of its performance is the most "correct" or "authoritative"? What role should choreography, set designs, or costumes take in a scholarly edition?

Figure 4: Wireframe of the interface of a hypothetical digital edition of an opera showing the video of a critically acclaimed performance, the score or libretto, and a schematic rendering of the stage placement of participants and the choreography.

Because there are many answers to these questions, one would need a way to accommodate multiple perspectives on this complex editorial work.[4] For example, in Figure 4, we can follow along in the score or the libretto while we watch a performance. We can see how the text relates to the choreography. By using linked data technologies, no one project has to do all of the work: separate projects can focus on their particular area of interest. As long as the projects use a shared set of names for entities, their products (which might be smaller data sets using a data model similar to Shared Canvas) can be merged into a larger work, just like all of the roles and elements in producing an opera.
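The role of shared names in such a merge can be shown with a minimal sketch in which each project publishes a set of subject, predicate, object statements. All URIs and the `ex:` property names are invented, and plain Python sets stand in for a real RDF store.

```python
# A shared name for the same aria, used by both hypothetical projects.
ARIA = "http://example.org/opera/act-1/aria-2"

score_project = {
    (ARIA, "ex:hasScorePage", "http://example.org/score/page-12"),
    (ARIA, "ex:hasLibrettoLine", "http://example.org/libretto#line-40"),
}

staging_project = {
    (ARIA, "ex:hasStagingDiagram", "http://example.org/staging/diagram-3"),
    (ARIA, "ex:hasVideoSegment", "http://example.org/video#t=120,185"),
}

# Because both data sets name the aria identically, merging is set union.
merged = score_project | staging_project


def describe(graph, subject):
    """List everything the merged data says about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


aria_description = describe(merged, ARIA)
```

Had the two projects coined different identifiers for the aria, the union would still succeed but `describe` would return two unconnected halves, which is why the agreement on names matters more than the agreement on tools.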

Conclusion

Through the open-data model we are able to imagine a digital facsimile edition that moves beyond the capabilities of the CD-ROM-style presentation and makes use of all of the affordances of the World Wide Web. The digital humanities already make extensive use of open source software and open access venues for publication. As part of an effort to complete the triad with linked open data, the Shared Canvas data model provides a ready framework for producing sharable digital scholarly editions amenable to remixing by the scholarly community. For the community to realize the full promise of linked data and the Shared Canvas data model, it must develop canonical identifiers for various domains—such as named entities, manuscript folios, or works—and ways to discover Shared Canvas components and other linked data by publishing data in appropriate repositories and indices. In addition, digital editions must move beyond centralized data models and embrace distributed architectures mirroring the World Wide Web.[5]


Notes

[1] The World Wide Web Consortium (W3C) supports two efforts concerning annotation: the Open Annotation Community Group charged with "driving use cases and requirements, and further discussion of annotation issues that are outside the scope of the Web Annotation [Working Group]"; and the Web Annotation Working Group (WAWG) "chartered to develop a set of specifications for an interoperable, sharable, distributed Web annotation architecture" (W3C 2010).

[2] Linked data is data that uses HTTP URIs to identify web objects in such a way that browsing to the URI provides information about the identified thing. Linked data allows one project to reference information in another project without having to replicate all of the information or worry about name collisions when merging data from other projects (for more information on linked data, see Berners-Lee 2009).

[3] For annotations expressed using the Open Annotation data model to be trusted outside of a trusted tool-chain, the annotations must be signed in such a way that the authorship or integrity cannot be disputed. That is, once the annotation is made, it should be algorithmically possible to tell (1) if the annotation has been modified since it was signed, and (2) if the person claiming to sign the annotation is indeed the person so claimed. Open Annotation is built on the Resource Description Framework (RDF). As such, it can take advantage of the signing algorithms or other cryptographic standards developed for RDF, of which a number may be emerging, though none are sufficiently far along at the time of this writing to be of any practical use (see Cloran and Irwin 2005; Reagle 2000; and Sayers and Karp 2004).

[4] Frans Wiering suggests taking a multidimensional approach for digital scholarly editions of music (Wiering 2009). A multidimensional model would accommodate transcriptions of each source relevant to the work being edited, but would also include sources from other media like audio and video to document performances. Wiering talks about "two dimensional slices" that can be taken from such a model to create different views and reading paths. Wiering's model is only theoretical; linked data technologies may offer a way to render his model in practice.

[5] Thanks go to Casey Boyle for sharing the idea of linking together social media with secondary scholarship and primary texts. Thanks to Robert Sanderson for being patient as we poked and prodded at the Shared Canvas data model.


Works cited / Liste de références

Archimedes Palimpsest Project. 2016. "Introduction to the Palimpsest." Archimedes Palimpsest. Accessed August 8. http://www.digitalpalimpsest.org/.

Berners-Lee, Tim. 2009. "Linked data." W3C. Accessed October 22, 2016. https://www.w3.org/DesignIssues/LinkedData.html.

Cervantes Project. 2016. [Homepage]. Accessed August 8. http://cervantes.tamu.edu/V2/CPI/index.html.

Chaucer, Geoffrey. 1996. The wife of Bath's prologue on CD-ROM, edited by Peter Robinson and Norman Blake. Canterbury Tales Project. Cambridge: Cambridge University Press.

Cloran, Russel and Barry Irwin. 2005. "XML digital signature and RDF." Poster presented at ISSA 2005 New Knowledge Today Conference, Sandton, South Africa, 29 June–1 July. http://icsa.cs.up.ac.za/issa/2005/Proceedings/Poster/026_Article.pdf.

Dershowitz, Nachum and Edward W. Reingold. 1997. Calendrical calculations. Cambridge: Cambridge University Press.

Holmes, Martin, Janelle Jenstad, and Cameron Butt. 2013. "Encoding historical dates correctly: Is it practical, and is it worth it?" Paper presented at Digital Humanities 2013. University of Nebraska, Lincoln, July 19. http://dh2013.unl.edu/abstracts/ab-179.html.

IIIF (International Image Interoperability Framework). 2016. [Homepage]. Accessed August 8. http://iiif.io//.

Kramer, Manfred. 2006. "What is a facsimile? The history and technique of the facsimile." Imagination, Almanach 1986–1993, Sammelheft, translated by Eric Canepa. Accessed August 8, 2016. http://www.omifacsimiles.com/kramer.html.

McCloud, Scott. 2000. Reinventing comics: How imagination and technology are revolutionizing an art form. New York: William Morrow.

McGann, Jerome. 1996. "Radiant textuality." Victorian Studies 39.3: 379-390.

MITH (Maryland Institute for Technology in the Humanities). 2015. "Shelley-Godwin Archive." Accessed August 8, 2016. http://mith.umd.edu/research/shelley-godwin-archive/.

Open Knowledge International. 2016a. "The open definition." Accessed August 8. http://opendefinition.org/.

Open Knowledge International. 2016b. [Homepage]. Accessed August 8. https://okfn.org/.

Pierazzo, Elena. 2011. "A rationale of digital documentary editions." Literary and Linguistic Computing 26.4: 463–477.

Price, Kenneth M. 2008. "Electronic scholarly editions." In A companion to digital literary studies, edited by Susan Schreibman and Ray Siemens. Oxford: Blackwell. http://www.digitalhumanities.org/companionDLS/.

Price, Kenneth M. 2009. "Edition, project, database, archive, thematic research collection: What's in a name?" Digital Humanities Quarterly 3.3. Accessed September 21, 2016. http://www.digitalhumanities.org/dhq/vol/3/3/000053/000053.html.

Reagle, Joseph M. 2000. "XML signature requirements." Accessed October 16, 2016. http://www.ietf.org/rfc/rfc2807.txt.

Robinson, Peter. 2013. "Towards a theory of digital editions." Variants 10: 105-132.

Rolsky, David. 2009-2010. "Calendar modules." Perl DateTime Project. Accessed August 8, 2016. http://datetime.perl.org/wiki/datetime/page/Calendar_Modules.

Sanderson, Robert, Benjamin Albritton, Rafael Schwemmer, and Herbert van de Sompel. 2011. "Shared canvas: A collaborative model for Medieval manuscript layout dissemination." Proceedings of the 11th ACM/IEEE Joint Conference on Digital Libraries, Ottawa, Canada, June. http://arxiv.org/abs/1104.2925/.

Sayers, Craig and Alan H. Karp. 2004. "RDF graph digest techniques and potential applications." HP labs. http://www.hpl.hp.com/techreports/2004/HPL-2004-95.html/.

Suber, Peter. 2012. Open access. Cambridge, MA: MIT Press.

The Media Monitoring System. 2016. [Homepage]. Accessed October 16. http://mms.tamu.edu/.

Urbina, Eduardo. 2016. "Electronic Variorum Edition of the Quixote (EVE-DQ)." Cervantes Project. Texas A&M University. Accessed August 8. http://www.csdl.tamu.edu:8080/veri/index-en.html.

W3C. 2010. "Web annotation working group." Accessed October 22, 2016. http://www.w3.org/annotation/.

Wiering, Frans. 2009. "Digital critical editions of music: A multidimensional model." In Modern methods for musicology: Prospects, proposals, and realities, edited by Tim Crawford and Lorna Gibson, 24-46. Abingdon: Routledge.

Authors

James Smith
Raffaele Viglianti

Licence

Creative Commons Attribution 4.0
