1. Introduction: Participatory Action Research for language documentation and preservation

East Cree is a native American language of the Algonquian family, spoken in Northern Quebec in the James Bay area. It has 13 000 speakers spread over 9 different communities and a vast geographical area. There are two dialects, Northern East Cree and Southern East Cree, the latter consisting of two sub-dialects, Inland and Coastal. In 1995, Cree became the language of instruction from kindergarten up to Grade 3 in all Cree schools managed by the Cree School Board, creating a greater need for teaching resources for language and culture courses. The website was created to explore how information technology can assist the creation and distribution of Cree language resources. A participatory action research framework was adopted (Morris & Muzychka, 2002; Junker, 2002), which meant 1) that we would focus on the research PROCESS rather than on the research RESULTS; 2) that the success of our research would depend on the positive impact it had on the language and its speakers; 3) that we would define our goals and methods in collaboration with our partners.

A partnership was established in 2001 with Cree Programs, a department of the Cree School Board that specializes in creating resources for Cree language and culture courses. We work together to ensure the participation and feedback of speakers, curriculum designers, and teachers of the language. In the process, we are also training willing native speakers in relevant areas such as maintaining the online databases and editing and archiving digital sound records.

Because the Cree schools are spread over a vast territory, a first challenge was to overcome the problem of distance communication. Information Technology seemed perfectly suited for this, but few of the existing tools had been explored or adapted when we started. Another major challenge is that Cree uses a syllabics writing system to which computer technology has been up to now rather unfriendly (Jancewicz and Junker, 2002, 2003). With e-mail and chat-rooms becoming increasingly popular among native people, but only available in the colonial languages, we felt that these tools were one of the many reasons that native languages and culture are losing ground relatively fast to western influence. Another goal of our project was to record and try to preserve what was left of the memories of the elders and of the Cree way of thinking and general world view imprinted on the language with its several dialects.

2. The web databases

The web databases accessible at were developed in order to systematically organize language material and knowledge in culturally sensitive ways. We wanted, for example, to preserve the oral tradition, a thousand-year-old practice in Cree culture, and felt that Information Technology could offer a kind of support previously unavailable. By making old language material available again to the younger speakers via the internet, a medium that they love, we felt that there was a greater chance for language vitality and survival. The databases had to be accessible to all concerned, that is, not only to a few educators, but to all Cree people living in the Northern communities, and also to urban, off-reserve natives who need to reconnect with their ancestral culture and language. With Cree Programs offices in many different communities, hundreds of kilometres apart, the databases had to allow collaboration at a distance. They had to allow easy modifications, whether for updating the content or for maintaining the interface and the functionality. Finally, a lower cost compared to previous ways of doing things (print, paper, postage) was also a priority.

We started with a publication catalogue, at the request of Cree Programs staff and teachers. CP has published hundreds of books for the schools over the years and teachers needed a way to know what was available and get it to their classrooms. We then worked together on an oral stories database, in order to digitize, archive and organize old recordings of elders, recordings which were in danger of being damaged by time and were not accessible. These databases are multilingual: they display four languages (East Cree Northern, East Cree Southern, French and English), and two writing systems (syllabics and roman orthographies) are available. Our more recent developments include a Terminology forum and a Cree (syllabics) Chat room ( . Prototypes for Spelling lessons, Read- and Sing-along ( and a Linguistic Atlas ( were also developed using the principles we discuss in this paper. For the sake of brevity we illustrate our approach primarily with one database, the oral stories.

The Oral stories Database ( is populated with recordings from the Cree School Board, academic scholars (anthropologists and linguists), some community radio stations, and private (speakers') collections. At the Cree School Board, there was no systematic centralized archiving system in place. The tapes were scattered in several communities, and some had not been backed up. Some were already damaged. The original tapes were between 20 and 40 years old. The same was true of some (speakers' or scholars') private collections. There was at the same time an increasing need for educational material for the upper grades: children and teens who had gone through the Cree as a Language of Instruction Program (henceforth CLIP) since 1995 were thirsty for more language material in Cree. The focus of Cree Programs had been, as in many other native communities, almost exclusively on literacy training (Burnaby, 2004) and there was a lack of educational focus on the thousand-year-old oral tradition.

The Oral database was thus developed as a response to these needs and problems. We wanted a usable tool for collaboration and for storage of results. And we knew that our interfaces had to work for users with various (mostly low) technical expertise levels. The material had to be organized in culturally-relevant categories, and that organization had to be flexible, allowing for online changes without having to reprogram the database or the interfaces. For example, the Cree language distinguishes between stories that are tipachimuwin, personal stories, memoirs, and stories that are aatiyuuhkaanh, archetypal stories, legends and myths. Thus our Cree collaborators created in the interface categories that were based on the Cree language, rather than the English language. Because we were working primarily with users from the Cree School Board, the categorisation had to reflect the potential use by teachers of the stories, i.e. concerns with age-appropriateness, curriculum topics, etc. We also had to take into account the available hardware, speed of connection and software available in the Cree schools and homes.

Figures 1 and 2 are screen shots of the database as it is published on the web, and accessible to all users.

Figure 1: English item view for the general user

Figure 2: Cree List view for the general user

A set of users of the site (employees of Cree Programs) have been given access to a separate interface that gives them control not only over the content of the databases, but also over pieces of layout and even over code elements, as we will detail in the next section. The figures below illustrate this maintainers' interface: Figures 3 and 4 for content control and update, and Figure 5 for layout and code. For example, the 'Edit Terminology' window in the maintainers' interface shown in Figure 5 below controls the 'Topic list' that appears in the general users' interface shown above in Figure 2. This is how we were able to build a user interface in Cree even though the programmer on the project does not speak the language.

Figure 3: List view of the Maintainer's Interface

Figure 4: Edit Item view of the Maintainer's Interface

Figure 5: Edit Terminology view of the Maintainer's Interface

Now that we have presented the task that was ahead of us, and our current solution, let us look at how we approached the task and (partially) fulfilled it.

3. Technical concerns

With the requirements set up by our choice of research methodology (Participatory Action Research, henceforth PAR), and by our choice of presentation medium (Web), we had to address many technical concerns in ways that would not detract from the availability of the data. In this section we review these requirements and the solutions we settled on.

3.1 Theoretical and technical approaches following PAR requirements

We developed the online databases as an application of previous research (Luchianov, 2000, 2001) on design patterns. Design patterns in software engineering are a reusable methodology for systematically and consistently developing software (Alexander, 1977). The specific pattern used here leverages theories from various branches of Psychology, Software Engineering and Linguistics in order to set up a dynamic, active user model which provides recommendations for specific software implementation of tools fit for their task and user groups. This pattern describes methods like adaptive presentation (generation of documents from pieces according to user-specified levels of detail), reduced contextual clutter (present the most relevant data possible), context-preserving data mining (access to more data in a stack of notes - without having to move to other pages), data access in graph-oriented rather than serial paths (as suggested in Spinellis, 2001), transparent encapsulation of content, layout and code (allowing for a variety of visibility levels of the details of implementation of each element of the application), informative reports (on status and errors) and active path markers (buttons with pop-up descriptions rather than text-like hypertext links) - to mention a few. A more complete (and evolving) list is available online at

This research was continued throughout this project. The more useful aspects of the design patterns we are working with are that: (1) most of the components of the databases (code, templates and multilingual content) are shared; (2) the site is built based on a common set of content templates, and it uses a common set of JavaScript objects; (3) we continue to modularize, as shared functionality requires it, while keeping down to a minimum the number of files to be maintained; (4) multilingual features are built right in, for content (e.g. story descriptions), as well as for the interface layout and coding. [1]
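Point (4), together with footnote 1, describes adding a language as a matter of adding and populating a column. The sketch below illustrates that idea with Python's built-in sqlite3 module. It is only an illustration under stated assumptions: the table name `stories`, the `title_<code>` column naming, the language codes and the sample row are all hypothetical and are not taken from the project's actual schema or database engine.

```python
import sqlite3

# Hypothetical language codes; the real databases use their own naming.
LANGUAGES = ["en", "fr", "cree_north"]


def create_stories_table(conn, languages):
    """Create a table with one title column per supported language."""
    columns = ", ".join(f"title_{code} TEXT" for code in languages)
    conn.execute(f"CREATE TABLE stories (id INTEGER PRIMARY KEY, {columns})")


def add_language(conn, code):
    """Adding a language amounts to adding (and later populating) one column."""
    if not code.isidentifier():
        raise ValueError(f"unsafe language code: {code!r}")
    conn.execute(f"ALTER TABLE stories ADD COLUMN title_{code} TEXT")


def title_for(conn, story_id, code, fallback="en"):
    """Return the title in the requested language, or the fallback if it is empty."""
    if not (code.isidentifier() and fallback.isidentifier()):
        raise ValueError("unsafe language code")
    row = conn.execute(
        f"SELECT COALESCE(title_{code}, title_{fallback}) "
        "FROM stories WHERE id = ?",
        (story_id,),
    ).fetchone()
    return row[0] if row else None


# Made-up sample data, for illustration only.
conn = sqlite3.connect(":memory:")
create_stories_table(conn, LANGUAGES)
conn.execute(
    "INSERT INTO stories (id, title_en, title_fr) VALUES (1, 'The Bear', 'L''ours')"
)
add_language(conn, "cree_south")  # new dialect: one new, initially empty column
```

In this sketch a newly added language column starts out empty, so `title_for` falls back to another language until maintainers have populated it online.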

One of the most important aspects of the design pattern we settled on is known in Software Engineering and Human-Computer Interaction literature as fast prototyping (REF). Our first attempt at fast prototyping was to build an online syllabic communication tool. It evolved into a general transliteration tool for simultaneous entry of syllabic and roman orthography, used for example in a Cree chat room, and in a terminology forum. As far as the databases are concerned, we started to work in the now-classical Object-Oriented paradigm (e.g. White, 1994), spending considerable resources on creating development plans, database structures and object models; we then worked with the users on a first series of prototypes which resulted in the publication catalogue( This process took more than an academic term (5-6 months) to develop and test. For the second database (the oral records), we used fast prototyping from the start, and due to the design of the database engine, its prototype was ready in three days and finalized in an extra couple of weeks. We reused modules and built the simplest database structure, functionality and interface, and grew from there, as functionality was required by the native users.

Approaches like design patterns and fast prototyping have allowed us to meet most of the PAR requirements outlined in section 2. Making such flexible systems available over large distances, in (relatively cheap) collaboration-friendly environments, then becomes a more technical matter, which we discuss next.

3.2 Technical concerns following the presentation medium (Web)

Until a decade or so ago, computer programs used to be mercilessly optimized in order to reduce the resources they were using up. However, since the explosion of storage and processing power, size and speed optimizations seem to be relegated to dreaded computer courses and to the embedded systems where stringent resource limitations still exist. We live in a time where bloatware (software with a needlessly huge install footprint) is no longer a sin, but a common feature. However, as far as Web applications are concerned, we still have several limitations (Zhang, 2003): (1) heterogeneous client bases (since people of limited technical background tend to neglect upgrading their Web browsers, or do not know about the option), (2) low or nonexistent access to most client machines, a problem common to all Web applications, and (3) long transfer times negatively impacting the users' appreciation of the application and its data (since we need to supply data to areas that broadband Internet connectivity has not yet fully reached).

The object-based fast prototyping approach we used (simplifying on Guerrero, 1998) in developing databases and other aspects of the web site (like the dynamic menus) addressed each of these limitations with varying degrees of success. In most of the current web applications we looked at, a mainly server-side approach had been preferred. [2] Such an approach gives the developer better control over what clients see and it reduces the resources necessary for developing and maintaining the application. However, it results in very long waiting times, low scalability and more hardware resources necessary at the server side, because the entire interface (layouts and content) has to be built and transmitted to the client each time the client performs any action in the application. Since most of our target clients (Cree School Board employees and Cree community computers) are theoretically maintained by the same group of people, we chose a balanced client-server approach that allows us to decide the amount of work performed at the server side and to reduce the amount of data transferred to the clients.

For example, the database interface is handled by one JavaScript file (generator) plus one more for each language supported by the database. These files, like the rest of the graphical elements of the interface, are loaded only once per session (under the default settings on all web browsers we know of). The language files are also dynamically generated each time a maintainer changes something in the parts of the interface to which the maintainers wanted access (see Figure 5 above). All the server sends to the client is a set of properties for the main JavaScript object (the interface generator), as for example an array containing the records requested by the user at any given time - in a very compact format.
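The division of labour described above can be sketched as follows. This is a Python stand-in for what the paper describes as JavaScript running in the browser, written only to make the data flow concrete: the field names, label tables and sample record are hypothetical, and the actual compact format used by the project is not documented here.

```python
import json

# Hypothetical per-language label tables, standing in for the generated
# language files that the client loads once per session.
LABELS = {
    "en": {"title": "Title", "teller": "Storyteller", "minutes": "Length (min)"},
    "fr": {"title": "Titre", "teller": "Conteur", "minutes": "Durée (min)"},
}

# Field order agreed on by server and client, so that rows need no keys.
FIELDS = ["title", "teller", "minutes"]


def compact_payload(records):
    """Server side: send only the values, as an array of arrays."""
    rows = [[record[field] for field in FIELDS] for record in records]
    return json.dumps(rows, separators=(",", ":"), ensure_ascii=False)


def render(payload, lang):
    """Client side: combine the cached labels with the received rows."""
    labels = LABELS[lang]
    lines = []
    for row in json.loads(payload):
        pairs = zip(FIELDS, row)
        lines.append("; ".join(f"{labels[name]}: {value}" for name, value in pairs))
    return lines


# Made-up record, for illustration only.
records = [{"title": "Story A", "teller": "Elder B", "minutes": 12}]
payload = compact_payload(records)
```

Here `payload` carries no labels or markup at all, and the same payload can be rendered in any language for which a label table has been loaded, which is the property that lets maintainers change interface wording without touching the records.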

The downside of this approach is that the various platforms and browsers in use require special attention. There are differences in the JavaScript support and object models implemented in each version of each type of browser, in the way screen measurements are done, and in the way multilingual text is supported. Therefore, the software that drives the interface has to be designed with these differences in mind. Since it is unreasonable to implement each (known) difference from the start, we develop for the small range of browsers and platforms that our intended users have access to, and from time to time (mostly when we receive bug reports), we modify the layout objects to accommodate more of them. Currently, we support Internet Explorer 5, Netscape 4.7 and Opera 7 on the Windows platforms. Macintosh OSX systems which are Unicode-compatible are also supported, but our database interface exhibits some layout problems since we do not yet have a MacOSX testing machine.

Figure 2 shows an example of Cree syllabic text as part of the oral record database. The Cree Syllabic character set contains about 136 characters. No operating system supports this character set natively, but there are several font sets developed by researchers, publishers or enthusiasts in various formats, encodings and typefaces. Legacy operating systems like Windows 98 and lower, and MacOS 9.x and lower, support only 8-bit fonts, which do not allow mixing of syllabic and roman characters in the same text field without a lot of overhead in the form of font formatting tags of one sort or another. Unicode fonts solve that problem, but: (1) they are not supported well on the legacy platforms mentioned above, (2) many of the Web programming tools available handle Unicode poorly if at all and (3) the protocols define lots of encodings, many of them very verbose (up to a max. of 7 bytes for each character).
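The encoding overhead mentioned in point (3) can be made concrete with a small sketch. It compares how many bytes a single syllabic character and a single roman letter occupy in three common Unicode encodings, and checks whether each fits a legacy 8-bit codec. The choice of character (U+1403, from the Unified Canadian Aboriginal Syllabics block), of Latin-1 as the 8-bit codec, and of these three encodings is ours, for illustration; the sketch does not cover every encoding the protocols allow.

```python
# U+1403 is a character in the Unified Canadian Aboriginal Syllabics block.
SYLLABIC = "\u1403"
ROMAN = "i"


def encoded_sizes(text, encodings=("utf-8", "utf-16-le", "utf-32-le")):
    """Return the number of bytes the text occupies in each encoding."""
    return {name: len(text.encode(name)) for name in encodings}


def fits_in_8_bit(text, codec="latin-1"):
    """Check whether the text can be represented in a legacy 8-bit codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


syllabic_sizes = encoded_sizes(SYLLABIC)
roman_sizes = encoded_sizes(ROMAN)
```

In UTF-8 the syllabic character takes three bytes against one for the roman letter, so syllabic text sent in that encoding is roughly three times larger than the same number of roman characters.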

Finally, the limitation over which we have most control is long transfer times. As we mentioned before, we are separating content and layout. This is done by using Cascading Style Sheets and JavaScript objects in modular structures; we send very little redundant data from the server. Since the interface is programmed at the client-side, we connect to the server less often and the interface is very responsive, without being overly crowded. There is a caveat here: hiding functionality behind buttons which redraw the interface has proved to be problematic for beginners. So we resolved to have two versions of the interface, one for quick searching and browsing the entire catalogue (seen in Figure 1), and one with category-based browsing and searching on specific fields (seen in Figure 2). An additional one was suggested by the Cree maintainers during our last workshop but is not yet implemented; it is very comprehensive at the expense of being crowded.

The conditional coding required to render an interface on such a heterogeneous set of clients is rather difficult to maintain, especially for visually-oriented people. Our choice of client-side support (HTML, JavaScript and CSS) was driven by the following facts: (1) they generate a flowing, flexible layout, (2) they already have a large installed base and (3) the cost-of-ownership of the tools required for development is relatively low. However, since they require a programmer to check all changes, the cost-of-operation of the solution is higher than expected. So we are looking at an alternative. We have already prepared several prototypes using Macromedia Flash (the widely used multimedia editing program which allows for web-embedded animation, sound and video streaming and, lately, client-server applications). In versions before Flash MX2004, data transfer from the server was difficult and Unicode support was poor. However, with this new version, these problems seem to be fixed and the only deterrents to using this visually-oriented design and programming environment are (1) the fixed layout it offers (pages do not flow as in HTML; everything happens on a fixed-size 'stage'), (2) learning the Flash development style (peculiar for people new to multimedia) of half-visual-design, half-programming, (3) the higher price-of-ownership, and (4) the slightly higher difficulty of packaging Flash applications (originally designed for copyright protection) as open source, or even collaborative, projects.

All the measures to reduce the data transferred from the server that we have discussed so far concern the interface and the meta-data, that is, the description of the oral records. As far as optimizing the content itself is concerned, anyone who has used digitized sound knows the huge amount of storage needed for preserving sound in lossless formats. The sound format and sound-editing strategy we chose for the oral material resulted from our desire to balance speed, availability and protection from appropriation with the perceived quality of the sound. The mp3 format was clearly becoming the most widely used standard for web distribution of sound files, and since our project is not commercial, no fees had to be paid to the developers of the technology. After several tests with compressed Windows (wav), Quicktime (mov), various codecs for these formats, and the mp3-contender (ogg), we settled on an mp3 compression format. We judged that this format gives sufficient quality for the intended use, i.e. listening on the web and private CD burning, while preventing reappropriation of the material for commercial use and avoiding the high-pitched artefacts of equivalent .ogg compression. In order to reduce the amount of time spent waiting for the sound to download, we had to cut our stories into parts. Some oral records are more than half an hour long, but we allow the casual listener to get a short introduction from which they can gauge the sound quality and make other aesthetic judgements about the voice and attitude of the storyteller before having to download the entire record. For the same reason, we split files larger than 2Mb into parts, so that people with slow Internet connections can spread the download time over several sessions without the use of dedicated downloading software.
After consulting our collaborators about the acceptable quality loss in the spoken story-like records, we opted for a sound compression ratio of 29.4:1 (sound digitized with a sample rate of 44.1kHz and mp3-streamed at a rate of 24kbps), or 22.1:1 (mp3-streamed at 32kbps). That compression, plus the 2Mb arbitrary file size limit we set, suggested a maximum of 10 minutes of content for any given part, or a maximum wait of about 20 seconds per part on a fast connection (at average latency on a 512kbps DSL), or about 2.5 minutes on a 56kbps phone modem. There are issues of usability both for end-users and for maintainers.
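The compression ratios and the per-part length above can be checked with a short calculation. The sketch below assumes 16-bit mono source audio, which is what reproduces the stated ratios at a 44.1kHz sample rate, and reads the file size limit as 2 megabytes (2 × 1024 × 1024 bytes). Both are our assumptions, not figures given in the text, and the function names are illustrative.

```python
import math

SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16                # assumption: 16-bit samples
CHANNELS = 1                        # assumption: mono recordings
PART_LIMIT_BYTES = 2 * 1024 * 1024  # assumption: limit read as 2 megabytes


def uncompressed_kbps():
    """Bit rate of the uncompressed digitized sound, in kbps."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000


def compression_ratio(mp3_kbps):
    """Ratio between the uncompressed bit rate and the mp3 bit rate."""
    return uncompressed_kbps() / mp3_kbps


def max_minutes_per_part(mp3_kbps):
    """Longest stretch of sound that fits in one part at the given bit rate."""
    seconds = PART_LIMIT_BYTES * 8 / (mp3_kbps * 1000)
    return seconds / 60


def parts_needed(story_minutes, mp3_kbps):
    """Number of parts a story of the given length must be cut into."""
    total_bytes = story_minutes * 60 * mp3_kbps * 1000 / 8
    return math.ceil(total_bytes / PART_LIMIT_BYTES)
```

Under these assumptions the ratios come out at 29.4:1 for 24kbps and 22.05:1 for 32kbps, and one part holds roughly 11.7 minutes of sound at 24kbps or 8.7 minutes at 32kbps, which is in line with the limit of about 10 minutes per part. A half-hour story at 24kbps would need three parts.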

We designed the sound player object to make use of whatever browser plug-in each computer has installed. The plug-ins we suggest are QuickTime or Windows Media Player, since they appear as part of the interface and allow the user to control the sound (start/stop, volume, play-head location, etc.). However, if the user has installed some other sound player or sound-editing program that is set to handle mp3 files, our current solution opens that program as a helper in the background, with disconcerting effects (mainly, the database interface loses control over the playing sound and thus fails to terminate it when necessary). So we are considering writing our own player, in Flash, as we have done for the read-along and sing-along lessons (

To sum up, since we included user feedback in the core of the development cycle, we had generally good results on the programming side. The design pattern used ensured the consistency of the data presentation while adding a level of layout flexibility almost impossible with classical database development tools. On the user side, the interfaces were very responsive even over 56k modem connections, which resulted in many reports of user satisfaction. The design pattern implementation also allowed additions or modifications to be implemented almost as soon as they were suggested by the users, resulting in increased motivation on their part.

4. Conclusion

The creation of digital, on-line resources for threatened aboriginal languages presents many technical challenges. These challenges can only be met if framed in the larger context of research that also includes educational and ethical challenges. Our success in meeting our goal results from always keeping in mind our greater question: "How can information technology help language preservation and documentation and how can the process of creating this resource have a positive impact on the language and its speakers?" The technical approach discussed here is thus framed as an answer to this bigger question. Fast prototyping, open-source development, proprietary solutions, and database engine design choices were not explored for their own sake; they came as possible answers to our social and human concerns about the preservation of language and cultural diversity.

Works Cited

Alexander, Christopher, Sara Ishikawa, Murray Silverstein, Max Jacobson, Ingrid Fiksdahl-King, and Shlomo Angel. 1977. A Pattern Language. Oxford: Oxford University Press.

Burnaby, Barbara. 2002. How Have Aboriginal North Americans Responded to Writing Systems in Their Own Languages? Paper given at the Atlantic Provinces Linguistics Association Conference, St. John's, Memorial University.

Guerrero, Luis A., David A. Fuller. 1998. Objects for Fast Prototyping of Collaborative Applications. Proceedings of the 4th CYTED-RITOS International Workshop on Groupware, CRIWG'98, Rio de Janeiro, Brazil, September, 1998.

Jancewicz, Bill and Marie-Odile Junker. 2002. Cree on the Internet: How to Integrate Syllabics with Information Technology and the Web. Presented at the 34th Algonquian Conference, Kingston: Queen's University.

Jancewicz, Bill and Marie-Odile Junker. 2003. Frequently asked questions about Cree syllabics, Computer technology and the Web. In., web pages and PDF download.

Junker, Marie-Odile (ed). 2000-2004. The East Cree Language Web.

Junker, Marie-Odile. 2002. Participatory Action Research in Linguistics: What Does it Mean? / La recherche participaction en linguistique: Enjeux et significations. Presented at the Session on Ethics of Archiving Languages and Fieldwork, organized by the Aboriginal Language Committee, Canadian Linguistics Association Annual Congress, University of Toronto, May 2002.

Luchianov, Mircea-Radu. 2000. MoStaCon: Usability Study for an Experiment Design Tool. Master Thesis. Sofia: New Bulgarian University.

Luchianov, Mircea-Radu. 2001. MonDoc: Effective, personalized active-data documents. Ottawa: Carleton University. Available as a Pdf document at:

Morris & Muzychka. 2002. Participatory research and action. Ottawa: Canadian Research Institute on the Advancement of Women.

Spinellis, Diomidis. 2001. Notable design patterns for domain specific languages. Journal of Systems and Software 56(1):91-99.

White, Iseult. 1994. Rational Rose Essentials: Using the Booch Method. Redwood City, CA: Benjamin/Cummings.

Zhang, Jia, C.K. Chang, and J.-Y. Chung. 2003. "Mockup-driven Fast Prototyping Methodology for Web Requirements Engineering", Proceedings of the IEEE 27th Annual International Computer Software and Applications Conference (COMPSAC 2003), Nov. 3-6, 2003, Dallas, TX, USA, pp. 264-269.


1. This approach reduces the task of adding a new language or dialect to a matter of adding and populating one column in two of the database tables and changing the content of a few rows in other tables (and of course some relatively minor debugging), all of which can be done directly online. However, as with most web application development, whenever we make major changes to the database we do so safely on a development server and we upload changes to the public server only after preliminary debugging.

2. Dreamweaver and FrontPage templates, Syndeo, WebCT, Wikis, specialized open source catalogues.