The field of language documentation has made great advances in the last decade by embracing digital tools and methods. This has greatly improved the quality of data collected, and the types of data that can be returned to the minority language speakers with whom linguists work. Although there have been many positive developments from embracing digital methods, this shift has also created an even greater divide between linguists and the project participants who do not have access to digital technologies (Language documentation and digital methods). In this paper I look at the responsibilities of researchers when it comes to using digital tools when working with people who do not have digital literacy. I draw on my own experiences of fieldwork with speakers of Tibeto-Burman languages in Nepal who have a range of digital proficiency (Observing the digital divide). I also discuss two language documentation projects that seek to bridge this digital divide, and the methods that they are employing in this endeavour, the Iltyem-iltyem sign website and the Aikuma language documentation phone application (Bridging the digital divide). From observation of my own practice and the work of others I then discuss the challenges we face in bridging the digital divide, and how we can meet them (Future directions and challenges). Although this paper is based on language documentation, the implications of this discussion are relevant to all in digital humanities whose work involves people who may not be fully digitally literate.
Of the approximately 6000 languages spoken in the world today, current trends indicate that around 90% will be moribund by the end of this century (Krauss 1992). While languages have risen and fallen into silence across human history, never has this process occurred on such a massive scale. 96% of the world's languages are spoken by just 4% of the population (Crystal 2000, p. 14), and many of these languages have scant documentation, meaning that there is a great deal of human linguistic diversity that will vanish without record. In the same decades that people came to realise this impending threat, digital technologies have become a mainstay of language documentation methods.
Placed as we are, at the start of a global haemorrhage of linguistic diversity, there is an imperative to document as much of this variety as we possibly can. This is a moral imperative, as language attrition is closely linked to imbalances in access to economic and educational advancement (Crystal 2000, pp. 77-79), but it is also a scientific imperative, as each language is a natural experiment in the breadth and limitations of human cognition (Evans 2010). With the improvement in digital technologies and methods, linguists are now able to collect more data of higher quality thanks to portable digital audio and video recorders, and this data can be processed in more sophisticated and efficient ways (Seifart 2012). I will briefly outline a contemporary language documentation workflow to demonstrate how deeply embedded digital processes are. I will then discuss some of the limitations of this workflow with regard to the relationship between a researcher and the community they work with.
Language documentation includes the recording of natural or elicited speech, which is then transcribed and analysed. Earlier generations of recordings were made on analogue tapes, but contemporary solid-state audio recorders and digital video cameras mean that the recordings are born digital. This allows us to record more materials, as there are fewer physical storage constraints, and allows for more expedient return of these materials to the speakers, as they can be transmitted in a number of formats depending on individual needs and preferences. There are a number of different transcription programs that allow for the transcription text to be time-aligned to the relevant speech segment, and then lexical databases can be used to build a corpus where these transcriptions can be collated and analysed. These databases can be used to create other outputs, including written versions of narratives, wordlists and dictionaries for the community, or academic publications. Archiving in digital archives such as the Pacific And Regional Archive for Digital Sources in Endangered Culture (PARADISEC) (Thieberger 2004) makes this work accessible to future audiences. This leads to issues relating to the return of archived data to communities that may no longer use their ancestral language, so that they may reconnect with original materials, which has been a starting point for many contemporary revival projects. Being able to link analysis back to the original recordings through time-aligned, accessible recordings has also set a new benchmark for transparent and replicable analysis, bringing linguistics further in line with the scientific method (Thieberger and Berez 2012).
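To make the idea of time-aligned transcription concrete, the following is a minimal sketch in Python rather than the file format of any particular transcription program. The `Annotation` class, its field names and the sample values are all hypothetical. The point is only that each stretch of transcription carries a start and end time, so that any line of analysis can be traced back to the corresponding moment in the recording.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """One transcribed stretch of speech, tied to a span of the recording."""
    start_s: float      # start time in seconds
    end_s: float        # end time in seconds
    transcription: str  # text in the documented language (placeholders below)
    translation: str    # free translation


def annotations_at(tier, time_s):
    """Return every annotation whose time span contains the given moment."""
    return [a for a in tier if a.start_s <= time_s < a.end_s]


# Hypothetical tier for a single recording; the strings are placeholders, not real data.
tier = [
    Annotation(0.0, 2.5, "utterance-1", "first utterance"),
    Annotation(2.5, 6.0, "utterance-2", "second utterance"),
]

# Look up what was being said three seconds into the recording.
hits = annotations_at(tier, 3.0)
```

A lexical database or corpus tool could then store a pointer to the recording and time span alongside each example, which is what makes the analysis checkable against the original audio.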
These opportunities that digital methods have opened up have certainly been welcomed in the field of language documentation, but embracing the digital humanities has also led to new methodological issues. One that has become apparent in my own research is that the increased use of a digital workflow has created even greater initial distance between myself and the people I work with, who do not have ready access to computers or Internet infrastructure. This topic is of particular importance as the relationship between linguist and community is an ongoing negotiation and discussion. Many researchers already recognise that it is a relationship of asymmetrical power, with the linguist usually being more socio-economically mobile and more educated than the people with whom they work (Bowern 2008, p. 165; Chelliah and de Reuse 2011, p. 163; Crowley 2007, p. 144). The increase in digital tools in language documentation is another factor that may be responsible for furthering the division between the digitally literate linguist and the people with whom they work.
Of course, this digital bridging is not always one-directional. Many linguists have proactively sought to improve their own digital literacy to maintain their relationship with the communities they work with. This may involve purchasing a smartphone or learning how to build web-based dictionaries instead of print ones. In the case of one colleague in North-East India, this engagement has included creating a Facebook account to maintain contact and build relationships with people in a number of communities, including Tangsa speakers, Tai communities interested in language revival (particularly Tai Ahom and Tai Khamyang), and students at Gauhati University assisting in linguistic research (Stephen Morey p.c.). Therefore, we should not assume that the transmission of digital literacy is always one-way.
As a researcher who began language documentation research in the last five years, I have always expected digital methods to be a feature of my workflow. This has had many benefits for my research outputs and for the kinds of materials I can return to the communities with whom I work. It has also created an unexpected dimension to my relationship with the people I work with. In this section I reflect on my own research experiences with speakers of Lamjung Yolmo and Kagate, two different Tibeto-Burman languages of Nepal. Although there are many similarities between the two languages, my experiences with speakers' access to technology, and understanding of digital data, have been very different and highlight some of the challenges that researchers in the digital humanities face.
The Lamjung Yolmo speakers with whom I worked had limited access to technology. This was, for some, in combination with having only limited general literacy. Other speakers were formally educated, but lacked digital literacy because of poverty and unreliable electricity in their villages. For this project I had received ethics clearance from my university, and had plain language statements and consent forms, which I also presented to people verbally. What I soon discovered is that these are useful tools for initial discussion, but a formal ethics apparatus is just the beginning of a more complex set of interactions, discussions and education on both sides. Two interactions around technology are particularly memorable, and highlight the nature of the divide between the digitally literate and the digitally illiterate. The first involves a 32-year-old illiterate woman and the second her 17-year-old high school-educated nephew.
Like many of her generation, 32-year-old Sanu (names have been changed) only attended school sporadically for a couple of years around the age of 6-8 before she was kept at home to work. She has minimal functional literacy, and only recently purchased her first mobile phone. I spent a lot of time with Sanu and her family while living in their village, and we spent some time listening to recordings so she could assist me in transcription. After several of these sessions she turned to me and observed that we could hear the laughter of the participants in the recording. I explained that the recorder picked up all sounds, just like a telephone. She then asked if laughter like this was kept in all recordings. I realised that until that point in the work process Sanu had not fully realised the implications of recording in the language documentation process, and that I had had no reason to suspect that she hadn't. We talked about this a great deal more, and came to a better understanding of our own ideas about the recording and documentation process.
In comparison to his aunt's experiences, 17-year-old Rajesh's opportunities demonstrate a new focus for Yolmo speakers in Lamjung on education as a route to economic mobility. Rajesh completed his high school certificate and was then given the opportunity to study at a college in the Chitwan area. I spent time with Rajesh during his final years of local schooling. He was bright, and a highly engaged student, but he had never used a computer until I sat with him and taught him how to use my laptop. Explaining basic user-interface concepts, such as the difference between an application icon and a directory folder icon, drove home for me the difference in our digital literacy. Although I had explained to him the recording and transcription process, this experience reminded me that there was so much underlying knowledge that he needed to learn first.
These experiences were in stark contrast to my interactions with the speakers of Kagate with whom I worked. Although I have spent less time working with the Kagate community, this work has been much more efficient, which is partly due to the participants' engagement with technology. One of the Kagate speakers I have worked most closely with to date is 25 years old and digitally literate, and worked for an Internet Service Provider before going overseas to do a Masters in IT. His knowledge of computer systems allowed him to understand the language documentation workflow and participate in it more fully. I have been able to purchase a computer and audio-recorder to leave with him and have trained him to transcribe recordings. His understanding of the Internet meant that he was able to comprehend the implications of Internet archiving, and communicate this to members of his family in a way that related to their own experiences of digital technologies.
Although my experiences have been different within these two communities, this is not necessarily a static situation. There are some Lamjung Yolmo speakers who have more access to computers and technology than the people I have come to work most closely with. Likewise, there are many speakers of Kagate who still live in villages with very little electricity and minimal technological access. I do not advocate that we necessarily seek out participants who are more technologically proficient to reduce the challenges we face, nor do I advocate exclusively focusing on those who lack access to digital technology at the expense of people who may make very engaged project participants. Instead we need to be mindful that different people bring different experiences to a language documentation project.
In working with Lamjung Yolmo and Kagate my experiences have been overwhelmingly positive, but the discussions of the ethical and practical implications of the digital workflow have been very different. In my work with Lamjung Yolmo speakers I have kept coming back to the same question: for participants who do not understand digital permanence and the pervasive spread of the Internet, how do we really measure informed consent for a project so deeply entrenched in digital methods? For me, this has involved ongoing discussion, demonstration and trust-building. This process has not always been easy, and has occasionally been time-consuming. This has led me to another question: what responsibility do we have for explaining how our digital tools and processes work? Digital methodologies can make our work even more esoteric and inaccessible for the very people with whom we work most closely. Linguists often talk about our role in empowering community members to meet their own linguistic needs and aspirations (Chelliah and de Reuse 2011, p. 141; Crowley 2007; Florey 2008), but how far does this extend to providing them with the digital literacy to enable this? To undertake this aim we have to build more time and equipment costs into our projects, and to explain to funders that this is an important part of our work, even if it doesn't necessarily lead to research outputs.
As Warschauer (2003) observes, digital technologies are best implemented in a meaningful context of literacy, education and community. This means that language documentation projects may offer one of the best avenues for sharing digital literacy with new populations. In my own work I have approached this in an ad hoc, participant-by-participant manner, but others are building digital education into the basis of their language documentation projects. I will now discuss two different projects that have formalised solutions to the concerns that I have raised in my own experience, and look at how participant education can co-exist with research outputs, and build richer documentation experiences for both endangered language speakers and linguistic researchers.
There are projects that are directly attempting to address the kinds of challenges that I discuss above. In this section I discuss two that I believe have innovative methodologies that engage the language speakers while still maintaining strong research outputs, making them compelling examples of the power the digital humanities have to engage new audiences with our research. The first is the Iltyem-iltyem sign language website and the second is the Aikuma mobile phone application. These projects have been developed around building digital capabilities for endangered language speakers as part of the documentation process, unlike the traditional documentation workflow, which makes no formal requirement for this and instead involves active intervention on behalf of individual researchers to address digital disparities between themselves and the communities they work with. I should note that these are not the only two projects to directly tackle digital capacity-building in the target communities, but they reflect two different approaches that address the same underlying issues.
The Iltyem-iltyem website was created as an online sign language dictionary resource for the alternate sign languages used in Indigenous communities in Central Australia. These sign language systems have been observed in many Central Australian language communities (Green and Wilkins, forthcoming; Kendon 1988), and sign is used as a strategy of speech avoidance as well as often being used with speech. Spanning a number of locations, the project included speakers/signers from five different languages: Anmatyerr, Warlpiri, Alyawarr, Ngaatjatjarra and Kaytetye. The project has been developed with the needs of these communities in mind. Many speakers of these languages who use the traditional alternate sign system are senior community members who have had relatively little interaction with digital technology and the Internet. Nevertheless they are enthusiastic about recording their knowledge for future generations. Below is a screenshot of the website, with videos for different signs (I've searched for the sign for "water").
The project was conceived with the Internet being the best platform for delivery, but also had to be developed in a way that allowed the participants to understand the full process. To ensure that participants had an opportunity to begin to understand the implications of videos being available on the Internet, a basic Tumblr blog was created during the development stages of the project. Although it is still live as of publication of this article, the intention is that eventually it will be taken down, having served its purpose as a training vehicle for the larger project. On this blog participants were involved in selecting videos and images to be shared. They could then view them online and share them with others (Green, Woods and Foley 2011). Below is a screen capture of the Tumblr blog.
The use of the Tumblr blog throughout the initial phases of the project meant that by the time the website went live in 2013, participants had some experience of sharing content online. This strategy helped reduce the digital divide between linguists and participants, and created greater understanding about the expectations held by each group. The final website also takes into account the perspectives and needs of the participants that were expressed throughout the initial development phases. Not only does the site require registration, but the administrators can easily alter what is available to view based on discussion with the community as a way of ensuring more fully-informed consent. This means that the content can be discussed and shared within the language communities first, allowing participants to decide when they will show particular content to a general audience (Carew 2013). As of September 2013, content from Anmatyerr, Ngaatjatjarra and Warlpiri became available online. Those materials that are yet to be made live, including recordings from Alyawarr and Kaytetye, are awaiting thorough discussion with the community for permissions to be granted to make them available on the site (Green p.c.). These access permissions can easily be changed when the community are ready to share the content. The project will also involve a thorough review once the website has been operating for 3 years. Should no further funding be available the site will be archived. The project has been supported by the Indigenous Language Support (ILS) and the Endangered Languages Documentation Program (ELDP), and is based at the Batchelor Indigenous Institute of Tertiary Education (BIITE) (Green p.c.). Coupled with the participant-inclusive project blog, this meant that across the project lifespan participants developed an understanding of the Internet as a platform and are able to have informed control over the final content available on the website.
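The access arrangement described above can be illustrated with a short sketch in Python. This is not the Iltyem-iltyem site's actual implementation, which is not described in the sources cited here. The `Video` class, the `visible_videos` function and the catalogue entries are hypothetical, and the sketch assumes only what the text states: visitors must be registered, and material from a language is shown only once its community has approved release.

```python
from dataclasses import dataclass


@dataclass
class Video:
    title: str
    language: str


def visible_videos(videos, approved_languages, is_registered):
    """Return the videos a visitor may see.

    Unregistered visitors see nothing; registered visitors see only
    material from languages whose communities have approved release.
    """
    if not is_registered:
        return []
    return [v for v in videos if v.language in approved_languages]


# Placeholder catalogue; the titles are illustrative, not real entries.
catalogue = [
    Video("sign for water", "Anmatyerr"),
    Video("sign for water", "Kaytetye"),
]

# Permissions live in one place, so releasing a language later is a small change.
approved = {"Anmatyerr", "Ngaatjatjarra", "Warlpiri"}

shown = visible_videos(catalogue, approved, is_registered=True)

# Once a community decides to share its material, only the permission set changes.
approved.add("Kaytetye")
shown_after_approval = visible_videos(catalogue, approved, is_registered=True)
```

Keeping the permissions separate from the recordings themselves is one way such a design could let consent be revisited without touching the underlying content.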
Aikuma has a similar focus on building participant digital literacy, although on a different digital platform. It is a free open source application for the Android operating system that can be networked across a collection of mobile phone handsets. Coupling the handsets with a local Wi-Fi router allows the phones to form a closed network of recordings, which can be listened to and translated, even in areas without existing mobile phone or Internet infrastructure. Although mobile phone recording has traditionally been of low fidelity, the quality of components is improving and the handsets are becoming increasingly cost-effective research tools. The other benefit of mobile technology is that in remote or regional communities this is often the first digital device that people will interact with. Mobile phone towers can often follow shortly after electricity, and generally long before Internet and landlines (and, in one area of Nepal I work in, well before plumbing and running water). With the number of mobile handsets projected to outstrip the global population in 2014 (International Telecommunication Union 2014), their familiarity to many is a way of building further digital competencies through the introduction of "smart phones" with touch screens and greater functionality.
The Aikuma app involves a workflow based on the Basic Oral Language Documentation methodology (BOLD) (Bird 2010; Reiman 2010). People can choose to record stories, narratives or songs, or listen to existing recordings. In listening they can also use the application to intersperse the existing recording with a clear re-speaking in their language, or a translation into another language. The networked nature of the handsets means that a story recorded on any device can be listened to by anyone, and multiple translations of a single story can be recorded, with or without the language worker or linguist present (Hanke and Bird 2013). Should the community successfully take up the devices, this would remove a bottleneck in the documentation process, as larger volumes of data can be recorded, and initial translations provided. From the perspective of the language documentation fieldworker this could be seen to create a new bottleneck further down the documentation workflow, with a greater number of texts still needing close transcription and analysis, but it's the community relationship to the device that I wish to return to.
By literally putting the documentation process in people's hands, this allows participants to determine what is recorded, and to take control of this process. The use of the network and BOLD methodology also has wider implications for participants' understanding of the workflow used in contemporary language documentation. In coming to understand that what they record can be listened to by anyone, and that additional annotations can be made on original recordings, people can learn in a practical way about the distribution and use of digital artefacts. This means people can immediately begin to understand the implications of the documentation project, and consent to participation is informed on a much more immediate level. Since writing the original abstract for this paper, I have started a small research project with the Aikuma team to test the effectiveness of the application in the collection of audio recordings in Nepal, and also as a tool for building better informed consent. The quality of recording on mobile phone handsets is not yet the same as that of a professional-quality solid-state audio device, but there is no reason that Aikuma could not be used in parallel with the linguist's own language documentation workflow. In this way the linguist could use Aikuma as a tool to help educate participants in digital data.
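The layering of re-speakings and translations over an original recording, as described above, can be sketched as a simple data structure. The following Python is an illustration only and does not reflect Aikuma's actual data model or code. The `Recording` and `Commentary` classes and all sample values are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Commentary:
    """A spoken layer added on top of an existing recording."""
    speaker: str
    kind: str      # "respeaking" or "translation"
    language: str


@dataclass
class Recording:
    """An original recording plus any spoken commentaries made on it."""
    title: str
    speaker: str
    commentaries: list = field(default_factory=list)

    def add_commentary(self, speaker, kind, language):
        if kind not in ("respeaking", "translation"):
            raise ValueError("kind must be 'respeaking' or 'translation'")
        self.commentaries.append(Commentary(speaker, kind, language))

    def translations(self):
        return [c for c in self.commentaries if c.kind == "translation"]


# Placeholder story: one original recording gathers layers from other
# participants, with or without the linguist present.
story = Recording("story-1", "speaker A")
story.add_commentary("speaker B", "respeaking", "source language")
story.add_commentary("speaker C", "translation", "contact language")
story.add_commentary("speaker D", "translation", "contact language")
```

Because every layer stays attached to the original, participants can see for themselves that a recording, once shared, can be heard and added to by others.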
While these are not the only projects that are proactively bridging the gap between the researcher and participants, they do illustrate some of the future challenges that linguistics and the digital humanities face. Above I discussed the benefits of building community education into language documentation projects; in this section I wish to touch on a range of challenges, and how we might rise to meet them. These challenges are broken down into language community challenges, research community challenges and technological challenges.
The biggest challenge we face in providing access to digital learning for the communities we work with is that these groups are highly heterogeneous, and all have their own needs and interests. Therefore, although I singled out the two projects above as being good models of engagement, they may not provide solutions relevant to all communities or projects. The Iltyem-iltyem project in particular was built upon the needs of a very specific group of communities where sign language traditions are still strong, and where interest in and access to the Internet is growing. Even Aikuma, which is designed to be as broadly applicable as possible, has not had complete success at all trial sites, as finding a text-minimal interface and process that remains functional is a challenge that is still being negotiated (Bird p.c.). Different communities will already have access to different levels of technology, as well as their own cultural attitudes that may inform their opinions regarding the use of technology and transfer of digital information. Linguists have always had to take into account the needs of different communities, but the technological needs pose a new level of need for considered discussion with the communities we work with. They may have expectations of what we can provide them in terms of technological infrastructure that are beyond the scope of what we are able to offer, or there may be people who feel left out from the process. These challenges are perhaps the most difficult of all, as there is no single solution, but the more researchers share their experiences of bringing digital humanities methodologies to the communities they work with, the better placed we will all be in our work.
The second community that we need to consider in this kind of work is the research community, and particularly the domain of the humanities, which poses some specific challenges. The first challenge is that the humanities has been slow to recognise digital outputs as important research outcomes. In Australia there are currently discussions underway to have archived corpora recognised as a research output by the Australian Research Council (ARC), which is the major public funding body in the country (Margetts et al. 2012; Thieberger 2012). Funding recognition will undoubtedly influence the importance universities place on these outputs. Just as data collection and corpus building need to be acknowledged, we also need to develop the expectation that in such projects there will be periods of community education, where other outputs might be slowed down or delayed (although in the longer run there may be many additional benefits). Some may argue that teaching people in remote communities about networked digital devices or Internet-distributed video is beyond the scope of a language documentation project; however, if we are really doing our work as responsibly as possible then it is imperative that we incorporate this knowledge-building into our project work.
The final challenge I wish to touch on briefly is that there are still some technological limitations to what we can do. Digital methods are still bounded by very real constraints in fieldwork situations. Many remote areas where language documentation occurs still have minimal electricity, little access to mobile phone reception and no Internet connection. In these situations the digital technologies used by researchers are often deployed strategically and sparingly, with little scope for additional resources to also be used. There is also a time factor to be considered, in that digital technologies have brought with them a whole new set of work expectations for researchers. As discussed in the workflow above, it is now an expectation that there are time-aligned annotations and comprehensive lexical databases, which in turn require stable orthographic conventions to operate. Materials now need to be archived, which is a time consuming process even when the data is well managed. All of these features of the contemporary language documentation workflow are important, but we also need to acknowledge they are extra time-pressures. Likewise, training participants in any stage of this process is also time consuming – both for the researcher and the participants – but, as discussed above, should also be taken into consideration in research timelines.
The final technological challenge is to reconcile the need for digital literacy with the realisation that the global digital landscape is developing at a rapid pace. With the rapid expansion of access to digital technologies that we are witnessing, it may be that for many communities that currently have limited digital literacy, the issues and challenges raised in this article will be redundant in the very near future, or it may be that new sets of challenges take their place. Either way, we need to be sensitive to the developing needs of participants in light of our own research practices and their own digital access.
The advances made in the last two decades of digital research in the humanities have been impressive, but they have also led to a growing divide between researchers and the communities with whom we engage. In sharing some of my own research experiences I hope that I have shown that there is a need to build more meaningful relations between the methods we use and the groups we work with. Projects like Iltyem-iltyem and Aikuma offer examples of just how this divide might be bridged. Although there are challenges in undertaking research that also engages in participant education, it is ultimately our responsibility as ethical researchers in the digital humanities to address the imbalance of knowledge and ensure our research is accessible to our participants, especially those who are currently without digital literacy.
 In the year since I wrote this article, I have returned to Nepal and visited Rajesh, who now attends university in Kathmandu. One of Rajesh's uncles, who works overseas, gave him a small tablet device, and Rajesh took great delight in repeatedly outperforming me at Candy Crush, a reminder that digital literacy is not static, and that there is growing access to technology in countries like Nepal.
 Since writing this paper I have come to work with the Aikuma team, both as a Research Assistant directly contracted by the project, and in my own Postdoctoral Research project with the Kagate community.
Bird, Steven. 2010. "A Scalable Method for Preserving Oral Literature from Small Languages." In The Role of Digital Libraries in a Time of Global Change, edited by Gobinda Chowdhury, Chris Koo and Jane Hunter, 5-14. New York, Berlin: Springer.
Bowern, Claire. 2008. Linguistic Fieldwork: A Practical Guide. Basingstoke, England; New York: Palgrave Macmillan.
Carew, Margaret. 2013. "Dimensions of a Non-Linear Workflow for Iltyem-iltyem — A Central Australian Sign Language." In Australian Linguistics Society Annual Conference. University of Melbourne, Melbourne. October 2-4.
Chelliah, Shobhana L., and Willem J. de Reuse. 2011. Handbook of Descriptive Linguistic Fieldwork. London: Springer.
Crowley, Terry. 2007. Field Linguistics: A Beginner's Guide, edited by Nicholas Thieberger, Oxford Linguistics. Oxford; New York: Oxford University Press.
Crystal, David. 2000. Language Death. Cambridge: Cambridge University Press.
Evans, Nicholas. 2010. Dying Words: Endangered Languages and What They Have to Tell Us. Malden, MA: Wiley-Blackwell.
Florey, Margaret. 2008. "Language Activism and the New Linguistics: Expanding Opportunities for Documenting Endangered Languages in Indonesia." In Language Documentation and Description, edited by Peter K. Austin, 120-35. London: School of Oriental and African Studies.
Green, Jennifer, and David Wilkins. (forthcoming). "With or Without Speech: Arandic Sign Language from Central Australia." Australian Journal of Linguistics.
Green, Jennifer, Gail Woods, and Ben Foley. 2011. "Looking at Language: Appropriate Design for Sign Language Resources in Remote Australian Indigenous Communities." In Sustainable Data from Digital Research, edited by Nicholas Thieberger, Linda Barwick, Rosey Billington and Jill Vaughan, 66-89. Melbourne: Custom Book Centre.
Hanke, Florian, and Steven Bird. 2013. "Large-Scale Text Collection for Unwritten Languages." Paper presented at the International Joint Conference on Natural Language Processing, Nagoya, Japan, October 14-18.
International Telecommunication Union. 2014. "The World in 2014: ICT Facts and Figures." ITU: Committed to Connecting the World. http://www.itu.int/en/ITU-D/Statistics/Pages/facts/default.aspx.
Kendon, Adam. 1988. Sign Languages of Aboriginal Australia: Cultural, Semiotic and Communicative Perspectives. Cambridge: Cambridge University Press.
Krauss, Michael E. 1992. "The World's Languages in Crisis." Language 68: 4-10.
Margetts, Anna, Stephen Morey, Simon Musgrave, Adam Schembri, and Nicholas Thieberger. 2012. "Assessing Curated Corpora as Research Output: Issues of Process and Evaluation." Paper presented at the Annual Conference for the Australian Linguistic Society, UWA, Perth, December 5-7.
Reiman, D. Will. 2010. "Basic Oral Language Documentation." Language Documentation and Conservation 4: 254-268.
Seifart, Frank. 2012. "The Threefold Potential of Language Documentation." In Language Documentation & Conservation Special Publication No. 3: Potentials of Language Documentation: Methods, Analyses, and Utilization, edited by Frank Seifart, Geoffrey Haig, Nikolaus P. Himmelmann, Dagmar Jung, Anna Margetts and Paul Trilsbeek, 1-6. Manoa: University of Hawai'i Press.
Thieberger, Nicholas. 2004. "Paradisec: The Pacific and Regional Archive for Digital Sources in Endangered Cultures." Continuo, Journal of the International Association of Music Libraries, Archives and Documentation Centres 33: 31-33.
Thieberger, Nicholas. 2012. "Counting Collections." Endangered Languages and Cultures, November 29. http://www.paradisec.org.au/blog/2012/11/counting-collections/.
Thieberger, Nicholas, and Andrea L. Berez. 2012. "Linguistic Data Management." In The Oxford Handbook of Linguistic Fieldwork, edited by Nicholas Thieberger, 90-118. Oxford: Oxford University Press.
Warschauer, Mark. 2003. Technology and Social Inclusion: Rethinking the Digital Divide. Cambridge: MIT Press.