Introduction

This paper investigates the layers of remediation and hidden labour embedded in the digitization of early modern texts, using a bibliographic case study to demonstrate the stakes of innovation in digitization practices and of access to in-person archival visits. In Ghosts, Holes, Rips and Scrapes, Zachary Lesser defines ghosts as “[a] residue of the linseed oil in which lampblack, soot produced by burning oil, is suspended to create printer’s ink,” which, due to its acidity, creates a faded image on the adjacent leaf (Lesser 2021, 33). He reveals the utility of discovering ghosts, arguing that “[a] bibliographic ghost returns to the world that has forgotten it to reveal lost collections, lost sammelbands, lost histories” (Lesser 2021, 33). In the spirit of reconstructing these “lost histories,” this paper explores the metadata of Thomas May’s Two Tragedies, tracing its phases of remediation and asking (1) what its metadata can tell us about this book’s digitization history, and (2) whose hidden labour can be unveiled by exploring it. Through the case study of The Tragedie of Cleopatra, the paper argues that the ambiguities introduced by digitization necessitate a more integrated approach, combining innovative digitization processes with opportunities for in-person archival research to ensure that the provenance and materiality of texts are accurately represented. I demonstrate the urgent stakes of providing scholars with opportunities for in-person archival research by identifying a case in which it is impossible to tell from the digital surrogate alone whether a phenomenon is an embodiment of Lesser’s concept of a “ghost” or a mere relic of the digitization process. These stakes are especially high for scholars relying on digitizations to conduct research, as addressing these uncertainties requires innovation in digitization processes and sustained investment in methodologies that bridge the gap between digital and physical archives.

The Functional Requirements for Bibliographic Records (FRBR) model defines “Item” as “a single exemplar of a manifestation,” referring to a specific physical or digital work (Tillett 2003, 24). This view of physical or digital works as versions of the work shifts the notion of bibliographic records to a more disaggregated approach to digital objects. This shift is mirrored in Howarth’s analysis, where she argues that “[w]ith a shift in focus away from data aggregates—the bibliographic record as a whole—to component pieces of data (or disaggregated data), those data elements have the potential to be shared and used in diverse, even novel ways” (Howarth 2012, 773). Howarth’s assertion aligns with the conceptual framework of the FRBR model in that both suggest disaggregating traditional records into more flexible, reusable components. This allows for the potential integration of linked data, not only within institutional contexts, but also in social networks and broader digital platforms, enhancing the way digital documents and their representations are studied, categorized, and shared. By redefining the “Item” as a more adaptable digital entity, we can align bibliographic practices with the digital age, fostering a deeper understanding of both physical and digital objects. This reconceptualization both deepens the potential theoretical insights for book historians and paves the way for rethinking digitization methodologies that bridge the gap between physical archives and their digital surrogates.

Scholars have long worked to reckon with the affective implications of virtually perusing pre-modern manuscripts and early printed books. Medievalists such as Elaine Treharne use the term “dismemberment” to describe this strange process (Treharne 2013, 475). It is the transformation of Benjamin’s notion of the “aura” when manuscripts, which were once solely accessible in person, become digitized and remediated by the screen through which one views the object (Benjamin 1968, 4). Generations of scholars have invested in articulating the emotional experience of physically handling pre-modern manuscripts, dating back to Derek Pearsall’s canonical “The Value/s of Manuscript Study” (Pearsall 2000, 174). An array of responses followed Pearsall’s text throughout the 2000s and 2010s, such as that of Angela Bennet Segler, which hyperbolizes the “experience of contemporaneity between touching [and] brushing bodies” by handling a manuscript via the notion of “manuscript virginity” (Bennet Segler 2013, 52). It is the feeling of seeing a manuscript in person, and the disconnect that occurs in the process of remediation, that has historically caused scholars wonder, confusion, and discomfort. Dot Porter demonstrates the extent of the latter by using the uncanny valley to describe the effects of digitization on a manuscript’s “manuscript-ness” (Porter 2018). Because of this strangeness, there has been within the scholarly landscape what Robert Binkley refers to as a “fetishism” of utilizing printed books, as opposed to their digitized counterparts (Binkley [1935] 1948). The infamous distance that separates a researcher from a digitized object has led to decades of scholarship attempting to work through the tension between the significance of materiality and the accessibility of digitized texts.

Scholars have also examined how materiality and digitization shape the layered temporality that distances readers from their screens, extending previous discussions of the affective transmission produced by handling pre-modern manuscripts to the digital sphere. Bonnie Mak examines the layered temporality of digitizing texts, arguing that digitizing technologies “transmit” the conditions of that text over time (Mak 2014, 1516). As Mak notes, this context is framed by an array of metadata, which itself is complicated by a blend of old and new information (Mak 2014). Furthermore, Michael Gavin finds reverberations of the strange effects of the amalgamated temporality of metadata, arguing that “[b]ibliographic catalogues provoke a kind of sublime experience, an awareness of ambient textuality, whispering: Books like this, but different, exist” (Gavin 2019, 76). Thus, according to Gavin, the process of creating metadata for a pre-modern book incorporates it into a vast organizational schema, generating a sense of awe at the awareness of an expansive network of related texts that exist beyond immediate perception. The well-established scholarly tradition of exploring the implications of digitization for academic researchers has elucidated how digital mediation transforms our engagement with texts, altering both their material presence and conceptual interpretation, while also prompting the development of robust methodologies to address the epistemological challenges posed by digital surrogacy. My case study exposes the inherent limitations of current digitization practices and argues that without a deliberate and sustained commitment to providing scholars access to in-person archival research and furthering our digitization technology, our understanding of historical texts will remain potentially compromised.

Case study: May’s The Tragedie of Cleopatra

To explore the stakes of digitization, this study employs a case study approach that reveals the layers of remediation separating researchers from Thomas May’s The Tragedie of Cleopatra. As part of a broader investigation into ancient queens in early modern drama, the methodology began with a systematic examination of multiple editions of the text and their metadata. As shown in Figure 1, the English Short Title Catalogue (ESTC) identifies two versions of The Tragedie of Cleopatra, prompting a closer look at the origins of the metadata and the intervening layers of remediation. The ESTC, as one of the most centralized sources for early modern bibliographical data, serves as the initial point of analysis. Stephen Tabor contends that the ESTC must continuously evolve due to its collaborative nature: “The very success of the project in mobilizing contributions from such diverse sources creates monumental house cleaning problems. As long as the file grows, and people keep working on it […] the file will continue to sprout typos, mis-statements, and ghosts” (Tabor 2007, 384). This evolving, collaborative framework underscores the inherent challenges in maintaining quality control in digital projects, rendering their fallibility inevitable. In line with Bridget Whearty’s argument that “if humanities researchers wish to be information-literate about our data, we have to understand how they come into being” (Whearty 2022, 10), this methodology aims to bridge the gap between digital surrogates and in-person archival research. The following section details the systematic procedures employed in this investigation, demonstrating how innovations to bibliographic research technology are still needed to conduct research in early modern book history, particularly when scholars do not have access to in-person archival visits.

Figure 1

ESTC record of May’s The Tragedie of Cleopatra.

The first step in the bibliographic workflow was to note the ESTC number of Two Tragedies, which is important for identifying digitizations of the correct edition, along with the internal system number, author, variant titles, and publication details (Figure 2). From this page, one can also see that the book contains 190 pages formatted as a duodecimo, with a complex collation formula, “A2(-A1+chi2) B-D12 E4, 2A12(-A1,2,11,12+chi2) B-D12 E6,” which indicates irregularities in the construction of Two Tragedies: leaves A1, A2, A11, and A12 are missing, and the book instead contains two extra leaves, both called “chi².” The record also writes this information out briefly. Additionally, one can see where the 1996 CD-ROM is located and where the microfilm is produced, as well as where print and digital copies of this edition are held; the digital copies are found on Internet Archive and Early English Books Online (EEBO) (May 1654a; 1654b). As Figure 3 shows, the same information is available for the other printed version of the play. Once the EEBO scan had been identified, two editions of the play were accessible to read side by side.
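As a quick consistency check on the record, the collation formula quoted above can be tallied against the stated 190 pages. The following is a minimal sketch rather than a general collation parser: the function names are hypothetical, it handles only the notation that appears in this one formula, and it assumes the conventional 23-letter signing alphabet and the reading of “chi2” as two inserted unsigned leaves.

```python
import re

# Conventional 23-letter signing alphabet (no J, U, or W); an assumption here.
SIGNING_ALPHABET = "ABCDEFGHIKLMNOPQRSTVXYZ"

# One gathering or run of gatherings, e.g. "A2(-A1+chi2)", "B-D12", "2A12(...)".
TOKEN = re.compile(
    r"^(?P<series>\d+)?"             # optional series prefix, the "2" in "2A12"
    r"(?P<first>[A-Z])"              # first signature letter
    r"(?:-(?P<last>[A-Z]))?"         # optional end of a run, the "D" in "B-D12"
    r"(?P<leaves>\d+)"               # leaves per gathering
    r"(?:\((?P<changes>[^)]*)\))?$"  # optional cancels and insertions
)


def leaves_in_token(token: str) -> int:
    """Count the leaves described by a single token of the formula."""
    match = TOKEN.match(token)
    if match is None:
        raise ValueError(f"unrecognized collation token: {token!r}")
    first = match["first"]
    last = match["last"] or first
    gatherings = SIGNING_ALPHABET.index(last) - SIGNING_ALPHABET.index(first) + 1
    total = gatherings * int(match["leaves"])
    changes = match["changes"] or ""
    removed = re.search(r"-[A-Z]([\d,]+)", changes)  # e.g. "-A1,2,11,12"
    if removed:
        total -= len(removed.group(1).split(","))
    added = re.search(r"\+chi(\d+)", changes)        # e.g. "+chi2"
    if added:
        total += int(added.group(1))
    return total


def count_leaves(formula: str) -> int:
    """Sum the leaves across every whitespace-separated token."""
    tokens = [part.rstrip(",") for part in formula.split()]
    return sum(leaves_in_token(token) for token in tokens)


FORMULA = "A2(-A1+chi2) B-D12 E4, 2A12(-A1,2,11,12+chi2) B-D12 E6"
leaves = count_leaves(FORMULA)
print(leaves, "leaves,", leaves * 2, "pages")
```

Under these assumptions, the first sequence yields 43 leaves and the second 52, for 95 leaves in total, which at two pages per leaf agrees with the 190 pages given in the record.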

Figure 2

ESTC record of May’s Two Tragedies.

Figure 3

ESTC record of May’s The Tragedie of Cleopatra.

The layers of remediation separating the viewer and the material Two Tragedies became increasingly apparent after examining the metadata on Internet Archive (Figure 4) and downloading a torrent client (Figure 5). As Lisa Gitelman argues, dematerialization “can only be experienced in relation to a preexisting sense of matter and materialization” (Gitelman 2006, 86). Figure 6 contains information about the book from Internet Archive’s website, the top of which is straightforward: the scan’s upload date and the names associated with it, including the bookseller, the playwright, and Thomas Pennant Barton, a previous owner of the book. Figure 6 also states that the call number is BRLL, and that the camera that took the images is a Canon EOS 5D Mark II. The bottom entries are a bit less reader-friendly, but (with the help of Chat GPT-3.5) seem to indicate Internet Archive’s identifiers for this book (Figure 7). Parikka identifies the power of media archaeology as replacing a chronological perspective with a layered one, a “spirit of thinking the new and the old in parallel lines,” as opposed to linearly (Parikka 2012, 2). This reconfiguration allows for a deeper understanding of how past and present media cultures are interconnected, revealing how old technologies continue to shape modern practices and discourses, and offering insights into the material foundations of contemporary media.

Figure 4

Image courtesy of Internet Archive.

Figure 5

Image courtesy of ZBIGZ.

Figure 6

Image courtesy of Internet Archive.

Figure 7

Image of consultation with Chat GPT-4.0.

Thanks to the generous assistance of the British Library and Boston Public Library, more information became available about this book’s records and provenance. According to the Early Printed Collections Cataloguing and Processing Manager of the British Library at the time, this book was added to the ESTC in 1987, when the catalogue was expanded to include publications pre-1701 (Early Printed Collections Cataloguing and Processing Manager of the British Library, email messages to author, October 10–16, 2023). As noted by the Boston Public Library’s website about Barton’s collection, it was initially obtained by the library in 1873 and placed in the library’s Upper Hall, per his widow’s request: “When the Barton Collection was transferred to the Boston Public Library, it was housed, as per Cora Barton’s stipulations, within separate alcoves in the Upper Hall, where the more scholarly volumes in the library’s collection were held” (Boston Public Library 2024). The Curator of Rare Books and Manuscripts at the Boston Public Library at the time was able to provide a wealth of information about this book’s provenance and history. Regarding its provenance, although one cannot know for sure who the previous owner (“J.F.,” who inscribed their initials on the front flyleaf) is, according to the Curator of Rare Books and Manuscripts, we do know that half of this book is interleaved, which was likely done by “J.F.,” because this is “certainly not something Barton would have done” (Curator of Rare Books and Manuscripts at the Boston Public Library, email messages to author, October 8–December 13, 2023). In trying to identify how this book ended up at the Boston Public Library, the Curator of Rare Books and Manuscripts reported that Barton’s collection was obtained via an en-bloc purchase in 1873. They were then able to provide me access to Barton’s correspondence with booksellers, much of which contains itemized invoices of the purchased books; however, they cautioned that this would take a lot of manual work, describing it as “[a] bit like looking for a needle in a haystack, but every once in a while, one can find what they’re hoping for” (Curator of Rare Books and Manuscripts at the Boston Public Library, email messages to author, October 8–December 13, 2023).

As shown in Figure 8, an itemized invoice from John Russell Smith to Barton from December 1858 shows a book simply listed as “May,” which was lot 877 in the auction of books on December 9 that belonged to the late John Harward. The Curator of Rare Books and Manuscripts graciously directed me towards the auction catalogue (Figure 9) from the HathiTrust Digital Library, which shows that item 877 in the Harward auction is described as “May (Thomas) Julia Agrippina, 1654–Cleopatra, 1654, morocco, gilt edges” (S. Leigh Sotheby & John Wilkinson 1856–1858).

Figure 8

Images of an 1858 letter containing an invoice from John Russell Smith to Barton. At the top of the second page, the line item for May’s book is labelled “877.” Scan courtesy of the Boston Public Library.

Figure 9

Image of the auction catalogue. Item 877 in the Harward auction is described as “May (Thomas) Julia Agrippina, 1654–Cleopatra, 1654, morocco, gilt edges.” Image courtesy of HathiTrust Digital Library.

Through examining the scanned copy of the auction catalogue, which contains annotations identifying the buyers of each lot, it is clear that the buyer of lot 877 was John Russell Smith, who likely purchased it for Barton. It was thus confirmed that “[w]e can conclusively say, then, that Barton bought his copy from John Russell Smith, who had purchased it for him at the December 9, 1858 auction of John Harward’s books” (Curator of Rare Books and Manuscripts at the Boston Public Library, email messages to author, October 8–December 13, 2023).

Having gathered additional information about the book’s provenance, as well as about its history prior to its becoming a part of Barton’s collection, I return to Figure 6, where someone was credited by name with uploading this scan. Who are they, and how did they acquire it? Thanks to the power of social media, I was able to chat with the person who was Head Cataloguer at the Internet Archive scanning centre at the Boston Public Library from 2015 to 2018. Although they did not remember May’s obscure text, they provided insight into the invisible labour that had allowed it to appear on my screen, detailing the process by which books are scanned. They explained their role in making sure that the images were in focus and that no pages were cut off, all while handling the book’s metadata via MARC codes (Head Cataloguer of the Internet Archive scanning centre of the Boston Public Library [2015–2018], Instagram direct message correspondence with author, October 7–11, 2023).

It became apparent, through analyzing the detailed information on the scanning and uploading procedures, that the digitized version of Two Tragedies exhibits an unusual anomaly: it ends mid-play. This edition ends with Plancus saying: “No conquer’d Prince. / Did ever find a nobler way to death” (The Tragedie of Cleopatra, V.i). Here, Cleopatra has just stabbed herself, and Plancus is arguing that Cleopatra died as royally as she lived, despite Octavian Caesar winning the war. The final page contains a catchword, “had,” confirming its abrupt ending. The presence of this catchword demonstrates that there was more to say; however, nothing but a ghost remains.

In the aforementioned EEBO scan, there is a different final page containing Caesar’s memorialization of Cleopatra, where he says:

We will no longer strive ’gainst destiny.

Though thou art dead, yet live renowm’d for ever […]

No other Crown or Scepter after thine

Shall Aegypt honour: thou shalt be the last

Of all the raigning race of Ptolomey. (The Tragedie of Cleopatra, V.i)

Caesar thereby memorializes Cleopatra, declaring her death honourable despite her suicide. Without this scene, the text ends with Cleopatra killing herself in the face of Caesar; however, readers with access to this final speech witness Caesar taking control of Cleopatra’s memorialization and catalyzing his empire.

In place of this speech, in the Internet Archive scan, there is a ghost disrupting the play’s unfinished ending with a backwards imprint of another page (Figure 10). Through inverting the image of the ghost (Figure 11), a character list appears, which does not exist elsewhere in this edition, more clearly shown in the Early English Books Online scan of The Tragedie of Cleopatra (Figure 12). This imprint matches Lesser’s assertion that ghosts are often “on the final verso of the play” (Lesser 2021, 46). While these preliminary observations appeared promising, further verification was needed to be certain that this was the phenomenon being observed.
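The inversion step can be illustrated with a toy example. This is only a sketch under stated assumptions: it reads “inverting” as left-to-right mirroring of the backwards imprint, it is not necessarily the procedure used to produce Figure 11, and the function name and the tiny pixel grid are hypothetical stand-ins for a real scan.

```python
def mirror_horizontally(pixels):
    """Return a left-to-right mirrored copy of a row-major grid of pixel values."""
    return [list(reversed(row)) for row in pixels]


# Hypothetical 3x3 stand-in for a scanned region: 1 marks ink, 0 marks paper.
# The ink forms a backwards "L", as a reversed offset of a printed "L" would.
backwards_glyph = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
]

readable_glyph = mirror_horizontally(backwards_glyph)
for row in readable_glyph:
    print("".join("#" if value else "." for value in row))
```

Mirroring twice returns the original grid, so the operation loses no information from the scan. On an actual image file, an imaging library’s horizontal flip (for example, `ImageOps.mirror` in Pillow) would play the same role.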

Figure 10

Two Tragedies (1654), courtesy of Internet Archive.

Figure 11

Inverted Two Tragedies (1654) scan.

Figure 12

The Tragedie of Cleopatra (1639), courtesy of Early English Books Online.

So, what does one do after finding a ghost? Naturally, question whether it was really there. I had the privilege of asking a colleague who frequents the Boston Public Library to examine the book in person and help me determine whether what I had observed was a ghost or a mere relic of the digitization process. As seen in Figure 13, my colleague found that Two Tragedies does not contain a ghost, but rather contains the character list for the play, upright. The once-probable ghost was merely evoked by the layers of remediation separating me from this book. Without the ability to verify the ghost (or lack thereof) in person, these two phenomena would be indistinguishable.

Figure 13

Images of Two Tragedies (1654) at the Boston Public Library, courtesy of D. J. Schuldt.

Conclusion

The findings of this case study raise challenging questions surrounding how to interpret the results without glorifying in-person archival visits, which require scholars to receive enough funding to support their travels and lodging. Sarah Werner rightfully asks: “What could we come up with if we put some open-minded bibliographers and keen coders in a room together?” (Werner 2011). Perhaps the answer lies in widespread digital innovation: scholars such as Bill Endres argue for the use of 3D scanning of manuscripts as a means of better capturing the materiality of manuscripts (Endres 2024, 189). Werner and Endres both demonstrate the value of interdisciplinary collaboration. As Ashley Reed states with regard to digital humanities projects more broadly, “we should acknowledge and foreground the interdependencies between different kinds of labour and recognize the ecologies of creativity that make both art and scholarship possible” (Reed 2016, 38). It is this commitment to developing digital processes while centring humanistic knowledge that I argue should continue to be extended to digitization resources in particular. I thus contend that the digitization of early modern texts introduces ambiguities, digital anomalies that can obscure or distort the material history of these works, thereby challenging the reliability of digital surrogates for historical inquiry. This research highlights the stakes of providing scholars with opportunities for in-person archival study, particularly in cases where digital surrogates present anomalies that resist categorization. These stakes are especially high for scholars relying on digital scans to conduct research, as addressing these uncertainties requires innovation in digitization processes, and sustained investment in methodologies that bridge the gap between digital and physical archives, ensuring the reliability of digital surrogates for literary-historical research.

Although the “ghost” was ultimately identified as a mere relic of remediation, it nevertheless remains an object of scholarly interest. Epistemologically, the study challenges the reliability of digital surrogates by demonstrating how technical imperfections can mimic historical artifacts, thereby necessitating a more critical evaluation of digital reproductions as evidentiary sources. Methodologically, it underscores the imperative of corroborating digital observations with physical examination, an approach that, unfortunately, requires funding for scholars’ work. Institutionally, the study advocates for enhanced quality control in digitization projects and emphasizes the importance of maintaining accessible physical archives, thereby opening opportunities for early career researchers to access and analyze manuscripts in person, fostering a more comprehensive scholarly practice.

This case study demonstrates that digital anomalies are not isolated curiosities, but, rather, symptomatic of systemic challenges that undermine the reliability of digital surrogates. By revealing how technical imperfections can mimic historical artifacts, the study highlights the necessity of rigorous triangulation methods, namely, cross-referencing metadata and consulting multiple digital sources, particularly when direct verification through technician interviews or in-person archival visits is not feasible. The prevalence of these anomalies calls into question the assumption that digitization processes are foolproof and neutral, thereby advocating for enhanced quality control measures in digitization projects and sustained access to physical archives, particularly for early career scholars who might not have access to as many fellowship opportunities as their more senior colleagues. Ultimately, these findings underscore the urgency of reevaluating current methodological practices to ensure that digital reproductions serve as robust and trustworthy sources in bibliographic inquiry.

Competing interests

The author has no competing interests to declare.

Contributions

Editorial

Section Editor

  • Davide Pafumi, The Journal Incubator, University of Lethbridge, Canada

Copy and Production Editor

  • Christa Avram, The Journal Incubator, University of Lethbridge, Canada

Copy and Layout Editor

  • A K M Iftekhar Khalid, The Journal Incubator, University of Lethbridge, Canada

References

Benjamin, Walter. 1968. “The Work of Art in the Age of Mechanical Reproduction.” In Illuminations, edited by Hannah Arendt, 219–253. Harcourt, Brace & World.

Bennet Segler, Angela. 2013. “Touched for the Very First Time: Losing My Manuscript Virginity.” In Transparent Things: A Cabinet, edited by Maggie M. Williams and Karen Eileen Overbey, 39–55. Punctum Books.

Binkley, Robert. (1935) 1948. “New Tools for Men of Letters.” Yale Review 24 (Spring): 519–537. Reprinted in Selected Papers of Robert C. Binkley, edited by Max H. Fisch, 179–197. Harvard University Press. Accessed April 22, 2025. https://www.wallandbinkley.com/rcb/works/new-tools-for-men-of-letters.

Boston Public Library. 2024. “Thomas Pennant Barton Collection (Rare Books & Manuscripts).” Accessed April 22, 2025. https://guides.bpl.org/barton/overview.

Endres, Bill. 2024. “Digitization as Scholarly Intervention and Interpretive Act: A Case for 3D Capture in Studying the Agency of Materiality.” Manuscript Studies: A Journal of the Schoenberg Institute for Manuscript Studies 9 (2): 185–221. Accessed April 22, 2025. http://doi.org/10.1353/mns.2024.a945373.

Gavin, Michael. 2019. “How to Think About EEBO.” Textual Cultures 11 (1–2): 70–105. Accessed April 22, 2025. http://doi.org/10.14434/textual.v11i1-2.23570.

Gitelman, Lisa. 2006. Always Already New: Media, History, and the Data of Culture. MIT Press.

Howarth, Lynne C. 2012. “FRBR and Linked Data: Connecting FRBR and Linked Data.” Cataloging & Classification Quarterly 50 (5–7): 763–776. Accessed April 22, 2025. http://doi.org/10.1080/01639374.2012.680835.

Lesser, Zachary. 2021. Ghosts, Holes, Rips and Scrapes: Shakespeare in 1619, Bibliography in the Longue Durée. University of Pennsylvania Press.

Mak, Bonnie. 2014. “Archaeology of a Digitization.” Journal of the Association for Information Science and Technology 65 (8): 1515–1526. Accessed April 22, 2025. http://doi.org/10.1002/asi.23061.

May, Thomas. 1654a. Two Tragedies: Viz. Cleopatra, Queene of Aegypt, and Agrippina, Empress of Rome. Printed for Humphrey Mosely, London. Available at Internet Archive. Accessed April 22, 2025. https://archive.org/details/twotragediesvizc00mayt/page/n217/mode/2up.

May, Thomas. 1654b. Two Tragedies Viz. Cleopatra, Queene of Ægypt, and Agrippina, Empress of Rome. Printed for Humphrey Mosely, London. Available at ProQuest. Accessed April 22, 2025. https://www.proquest.com/books/two-tragedies-viz-cleopatra-queene-ægypt/docview/2264181258/se-2.

Parikka, Jussi. 2012. What Is Media Archaeology? Polity Press.

Pearsall, Derek. 2000. “The Value/s of Manuscript Study: A Personal Retrospect.” Journal of the Early Book Society for the Study of Manuscripts and Printing History 3: 167–181.

Porter, Dot. 2018. “The Uncanny Valley and the Ghost in the Machine: A Discussion of Analogies for Thinking about Digitized Medieval Manuscripts.” Paper presented at University of Kansas Digital Humanities Seminar, September 17. Available at Dot Porter Digital, October 31. Accessed April 22, 2025. http://www.dotporterdigital.org/the-uncanny-valley-and-the-ghost-in-the-machine-a-discussion-of-analogies-for-thinking-about-digitized-medieval-manuscripts/.

Reed, Ashley. 2016. “Craft and Care: The Maker Movement, Catherine Blake, and the Digital Humanities.” Essays in Romanticism 23: 23–38. Accessed April 22, 2025. http://doi.org/10.3828/eir.2016.23.1.4.

S. Leigh Sotheby & John Wilkinson. 1856–1858. Very Valuable Portion of the Extensive Library of John Harward, Esq. of Stourbridge […]. J. Davy and Sons, Printers. Available at HathiTrust Digital Library. Accessed April 15, 2025. https://babel.hathitrust.org/cgi/pt?id=hvd.32044080257959&seq=66.

Tabor, Stephen. 2007. “ESTC and the Bibliographical Community.” The Library 8 (4): 367–386. Accessed April 22, 2025. http://doi.org/10.1093/library/8.4.367.

Tillett, Barbara B. 2003. “The FRBR Model (Functional Requirements for Bibliographic Records).” Paper presented at the ALCTS Institute on Metadata and AACR2, San Jose, CA, April 4–5. Accessed April 22, 2025. https://www.loc.gov/catdir/cpso/frbreng.pdf.

Treharne, Elaine. 2013. “Fleshing out the Text: The Transcendent Manuscript in the Digital Age.” postmedieval: A Journal of Medieval Cultural Studies 4: 465–478. Accessed April 22, 2025. http://doi.org/10.1057/pmed.2013.36.

Werner, Sarah. 2011. “Fetishizing Books and Textualizing the Digital.” Wynken de Worde (blog), July 24. Accessed April 22, 2025. https://sarahwerner.net/blog/2011/07/fetishizing-books-and-textualizing-the-digital/.

Whearty, Bridget. 2022. Digital Codicology: Medieval Books and Modern Labor. Stanford University Press.