Absorbing DiRT: Tool Directories in the Digital Age

In the summer of 2017, Quinn Dombrowski, an IT staff member in UC Berkeley’s Research IT group, approached Geoffrey Rockwell about the possibility of merging the DiRT Directory with TAPoR, both popular tool discovery portals. Dombrowski could no longer offer the time commitment required to maintain the organizational structure of the volunteer-run tool directory (2018). This decommissioning of DiRT illustrates a set of problems in the digital humanities around tool directories and the tools within as academic contributions. Tool development, in general, is not considered sufficiently scholarly and often suffers from a lack of ongoing support (Ramsay & Rockwell, 2012). When tool discovery portals are no longer maintained due to a lack of ongoing funding, digital humanities knowledge and history are lost. While volunteer-based directories require less outright funding, managing and motivating those volunteers so that they remain actively involved in directory upkeep requires a vast amount of work to ensure long-term sustainability (Dombrowski, 2018). This paper will explore the difficult history of tool discovery catalogues and portals and the steps being taken to save the DiRT Directory by integrating it into TAPoR. In particular, we will:
– Provide a brief history of the attempts to catalogue tools for digital humanists, starting with the first software catalogues, such as those circulated through societies, and ending with digital discovery portals, including DiRT Directory and TAPoR.
– Discuss the challenges around the maintenance of discovery portals.
– Consider the design and metadata decisions made in the merging of DiRT Directory with TAPoR.

Keywords: tool directories; tools; TAPoR; DiRT Directory; digital infrastructure

to create its own historiography, to know itself through its history of intellectual contributions. This paper is about one type of resource, the tool directory, developed to try to keep track of the tools Digital Humanists have made. We will discuss the problems of tool development, discovery, and preservation; provide a brief history

Humanities disciplines that deal in discourse. Humanists not only tend to study discourse as a privileged form of expression, but we also think of (print) discourse as the medium for our academic exchanges. This has been a perennial problem in the Digital Humanities because it means that the tools or new media works that we both study and express ourselves in are difficult to value in the academy (Rockwell 2011). A set of tools like Voyant (voyant-tools.org) might have hundreds of thousands of users a year, but it is difficult to formally justify it as a scholarly contribution to a tenure and promotion committee that counts publications.

The problem is not limited to the scholarly value of tool building. Most would agree that tools and their associated documentation can bear meaning, but DH is still struggling with ways to formally evaluate them without the apparatus of journals and peer-review. The problem is the infrastructure of valuation starting with the ways we remember what has been done and why. This is a problem the Digital Humanities shares with overlapping fields like Instructional Technology and Game Studies, both of which also value software things as objects of study and objects of creation (for example, see Newman 2012 on the preservation of games). To properly value software things we need a stack of infrastructure, starting with records of what was done, as software has a way of disappearing so quickly as to be almost ephemeral. For example, FANGORN and SNAP are both historical tools that were designed specifically for Humanists to assist with text analysis, but they are no longer maintained for active use (TAPoR 2019). There are some organizations doing this preservation work. For example, the Internet Archive's Software Library preserves decades of computer software that can be accessed and used through their JSMESS emulator (Internet Archive 2014).
TAPoR 3.0 and the DiRT Directory (2019) are tool discovery portals for the digital age that try to meet the need for knowledge about tools by recording sufficient information about tools and other resources that can be discovered and surveyed, but, unlike the Internet Archive, they are not preserving the software itself. Instead, they provide access to the metadata and important information about a host of digital tools and software so that researchers can determine the best tool for their project, and also understand where the tools came from by examining the history. Nonetheless, this is only one model for how knowledge about tools can be gathered and organized.
The role of tool directories in the digital age should concern Digital Humanists as tool directories have a long history of supporting and providing recognition for

First, keep infrastructure small and simple enough that it can survive during dry funding spells. While the initial idea of the TAPoR "portal" was to integrate various resources from social media, text repositories, tools as web services, and ways of chaining tools in one place, this proved very hard to maintain. There were, and are, better resources available that the TAPoR project was trying to replicate in order to have a full-service portal. In version 2.0, TAPoR narrowed its focus to the discovery of tools. The tools themselves were spun off into projects like TAPoRware, TATToo, and most importantly, Voyant. TAPoRware was a set of tools designed specifically for TAPoR 2.0. They were simple tools that could be deployed as demonstration web services.
TATToo was an embeddable toolbar that could be put into other websites where it would operate on the content of whatever page it was on (See Rockwell et al. 2010).
Next, scale infrastructure down to what can be led and maintained by a faculty member with university support. Faculty already have access to a certain number of resources, depending on local computing support. Faculty at most research-intensive universities can get small local grants, involve research assistants, involve students, apply for grants and so on. Infrastructure that is scaled to the support that a faculty member can obtain on their own can survive the dry years; however, this necessitates a faculty lead for the project, rather than a librarian, IT staff (such as Dombrowski), or other alt-ac roles.
Keep infrastructure modular so that it can connect with other projects easily. Rather than trying to create a vertical portal that includes everything and would be complicated to maintain, TAPoR in version 2.0 focused on doing one thing well that others weren't doing, and on doing it in a way that could fit with other projects.
This can take multiple forms. While DiRT focused on technological integration by developing an API, the TAPoR project took the approach of "political" integration by making it easy to be written into other projects' grant proposals. The latter approach does not lead to further proliferation of infrastructure that must be maintained, making it, by definition, more sustainable. Do one thing well and then build out features as new opportunities, partners and projects need them. Version 3 of TAPoR began adding features that made sense for projects like Text Mining the Novel (https://novel-tm.ca/), which was contributing funding. As well, projects could take new features that need to be implemented and implement them in a more broadly reusable and integrated fashion within an existing framework. This may require some rework of the existing code and may appear more cumbersome, but it avoids bloated code in the long run. TAPoR 3.0 has, wherever possible, been implemented by the University of Alberta's Arts Resource Centre in a way that any custom code can be reused (and maintained) for other projects.
Finally, beware the siren call of crowdsourcing. Since the success of the Suda On Line (Mahoney 2009) there has been the hope that projects could get human labour from the crowd. The DiRT Directory has shown that often the work of motivating and organizing volunteers can be as time consuming as the work those volunteers do.
As promising as crowdsourcing is, its value lies more in how it can engage a broader community than in how much work it saves (Rockwell 2012). As shown above, there are other ways of securing ongoing support that allow for more ambitious projects. This is how the TAPoR project has survived, and hopefully will continue to survive. With the addition of DiRT Directory's tool data and a larger mandate, TAPoR 3.0 plans to involve more scholarly associations in the support of the infrastructure, which may provide other avenues for sustainability.
DiRT Directory's successes and failures in this regard may prove informative. In order to reduce the risk of being shut down by UC Berkeley's central IT division on account of being unrelated to Berkeley-specific IT service offerings, ownership of DiRT Directory was formally transferred to centerNet, an ADHO member organization. In this arrangement, centerNet provided an organizational home for the project, and would coordinate opportunities for partnerships and joint development with other centerNet projects (such as the project directory DHCommons, which itself faced sustainability challenges similar to DiRT's). In principle, centerNet's member centers would serve as an ongoing source of volunteers for maintaining DiRT. In practice, however, the volunteer model is crucially dependent on the active involvement of a project director, and the arrangement with centerNet had no provisions for financially supporting the director position. Such support could have come through a buy-out of time to ensure the director could continue to work on DiRT even in the absence of a DH-specific position funded by her employer. Nor were there provisions for replacing the director if she became unavailable, perhaps through a more financially sustainable dedicated graduate student position. Fundamentally, a tool directory's survival is dependent on funding (which may be modest but must be fairly consistent) to pay for a position of some sort that can ensure the currency of the listings, either through their own labor or through engaging a community of volunteers.

Absorbing DiRT
The decision to merge the DiRT Directory with TAPoR 3.0 was a difficult one. It not only highlights the lack of ongoing support for tool discovery portals, but it also represents the end of a project (Ruecker et al. 2012). As a part of this project, we specifically aimed to merge the two directories together into a larger tool discovery portal that combined the best parts of both original projects. This process used the following steps:
1. First, we examined the metadata structure of the DiRT Directory to see what information the site was holding on each tool. At the same time, we explored possible ways to integrate DiRT's data with TAPoR 3.0's.
2. Next, we mapped a crosswalk of the metadata on DiRT and the metadata on TAPoR 3.0. This allowed us to see which fields are shared between the sites and determine which fields would need to be added to TAPoR 3.0, and which fields on TAPoR 3.0 would need to be populated for the DiRT Directory's tools. This process required meeting with programmers at the University of Alberta's Arts Resource Centre. Kamal Ranaweera and Omar Rodriguez-Arenas worked with us, providing technical support and advice throughout the project, and, most importantly, completing the actual migration of the data to TAPoR 3.0.
3. After finalizing the data and fields mapping, we moved on to data cleaning. Quinn Dombrowski provided a spreadsheet in comma-separated-value format of all 988 tools. This file was uploaded to OpenRefine (http://openrefine.org/), which was used to make overarching changes to the data. For example, using our fields maps, we relabelled DiRT Directory's platform data to match TAPoR 3.0's web-usable data. The final step of this process was to delete any duplicates or empty tools. This process brought the total tool count to 950.
4. The final step of the project was to hand over the data to the Arts Resource Centre and allow the ingestion process to begin.
Overall, the process went very smoothly. We began the project in September 2017 and successfully integrated the tools from DiRT to TAPoR 3.0 in May 2018.
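The crosswalk and cleaning steps described above can be illustrated with a minimal Python sketch. It is not the procedure the team used (that work was done in OpenRefine and by the Arts Resource Centre programmers), and every field name, vocabulary value, and sample record below is a hypothetical placeholder, since the text does not list the actual DiRT or TAPoR 3.0 fields. The sketch only shows the general shape of the work: rename fields according to a crosswalk, relabel platform values, and drop empty or duplicate records.

```python
import csv
import io

# Hypothetical crosswalk: DiRT column name -> TAPoR 3.0 field name.
# The real field names are not given in the text; these are placeholders.
CROSSWALK = {
    "title": "name",
    "description": "detail",
    "platform": "web_usable",
}

# Hypothetical relabelling of DiRT platform values to a web-usable vocabulary.
PLATFORM_TO_WEB_USABLE = {
    "web": "Run in Browser",
    "windows": "Downloadable",
    "mac": "Downloadable",
}


def crosswalk_row(row):
    """Rename the mapped columns of one record; unmapped columns are dropped."""
    out = {}
    for source_field, target_field in CROSSWALK.items():
        out[target_field] = (row.get(source_field) or "").strip()
    platform = out["web_usable"].lower()
    out["web_usable"] = PLATFORM_TO_WEB_USABLE.get(platform, "Unknown")
    return out


def clean(rows):
    """Apply the crosswalk, then drop nameless records and duplicate names."""
    seen = set()
    cleaned = []
    for row in rows:
        record = crosswalk_row(row)
        key = record["name"].lower()  # duplicates are matched case-insensitively
        if not key or key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Made-up sample standing in for the comma-separated export of DiRT tools.
SAMPLE_CSV = """title,description,platform
Voyant,Text analysis environment,web
voyant,Duplicate entry,web
,Record with no name,windows
ExampleTool,Made-up desktop tool,mac
"""

records = clean(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for record in records:
    print(record)
```

In this sketch a duplicate is any record whose name matches an earlier one regardless of case. A real migration would likely need fuzzier matching and manual review, which is one reason an interactive tool such as OpenRefine suits this kind of cleaning.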
While the integration process is complete, there is an ongoing data cleaning project, as we not only integrated almost a thousand new tools into TAPoR 3.0, but also added some new fields to the descriptive metadata of the tools and expanded the scope of TAPoR 3.0 beyond text analysis. The most obvious issue is a lack of consistency in the descriptions across the tool directory. Moving forward, we continue to try to find the most effective way to share important information about tools through trial and error.

Discussion
Records of software take many forms, and in this essay, we have outlined a history of one form, directories of Digital Humanities tools, but there are other types of records like grant proposals, design and development documents, manuals, brochures, web documentation, reviews, conference papers and code. Developing memory infrastructure like directories is not as simple as preserving documentation.
It is also a matter of structuring the records of tools so that they can be managed and found (Bowker and Star 2000). As Bowker (2008) points out, the development of memory infrastructure is a structuring process of developing practices that are supported by infrastructure and in turn reinforce the need for infrastructure. This is where we are in the Digital Humanities; we have experimental infrastructure, but it hasn't yet been woven into the practices of the field, partly because the practices are still emerging. DH has not become disciplinary in the sense of a self-perpetuating field that has stable practices and infrastructure. This means that there isn't yet the recognition and support for infrastructure like directories and portals. It is possible to get grants to build them as experimental infrastructure, but we haven't found a way to weave them into a changing discipline so that they are maintained. By contrast, we have developed journals in the field that do have long-term support. This paper documents attempts to develop disciplinary infrastructure at, and as a moment of, disciplinary formation. The attempts, failures, and successes say much about our formation.
Inevitably one wonders how directories of tools could be better supported.
How might knowledge about tools be preserved and made available? Are directories the best way to do so, or should we give up and depend on Google to manage our history? Some directions suggest themselves: It may be time to go back to including reviews or notices about tools in journals. Journals in the Digital Humanities like DHQ (http://www.digitalhumanities.org/dhq/) have proven maintainable and could integrate tool reviews and support for directories into their online practices. Notices about notable new tools could be included in journal issues and then archived in a tool directory like TAPoR 3.0.

Grant et al: Absorbing DiRT
As mentioned above, we could learn from the MERLOT model, where there are editors who get credit for maintaining a sub-portal on best learning resources for a discipline (https://www.merlot.org). Tool directories could develop Associate Editor positions that would offer scholarly credit for curating and managing a list of specified tools. This would help maintain the directory while also providing an opportunity for moving tool directory maintenance into the traditional scholarly outputs that are more easily recognized by tenure and hiring committees. Finally, it may be prudent to recognize that tool directories have a life span tied to funding, thus making long term support unnecessary. The important thing is to find a way to preserve the data so that it can be passed on and reused as new projects arise with new models (Rockwell et al. 2014).

Conclusion
In conclusion, absorbing the DiRT Directory into TAPoR 3.0 forced our team to wrestle with some of the major problems facing the Digital Humanities as a field.
First and foremost is the debate on the importance of tools and tool development, as well as the role of tool directories in encouraging the maintenance of tools and software. Furthermore, the process of integrating the two directories encouraged us to consider the issues of long-term access and support. This leaves us with an important question to consider: what will happen if TAPoR 3.0 can no longer be maintained or supported? If the experience and data are archived in an accessible form, does it really matter if any particular tool like TAPoR 3.0 disappears?
We would like to thank Dr. Andrew Piper (McGill University) and NovelTM: Text Mining the Novel, a project funded by a SSHRC Partnership Grant, for their support with this project.