Reading the Text's Mind: Lemmatisation and Interpretation from a Peircean Perspective

W Winder

doi:10.16995/dscn.234

Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?

T.S. Eliot, "Choruses from 'the Rock'"

1. Looking for a metalanguage for computational criticism

All novel, innovative approaches to a field of study go through a period of scrutiny where their methods and distinctive goals are debated. Computational approaches to interpretation have not been spared such debates, nor should they be. One instance can be found in Serge Lusignan's call for a moratorium on what was in 1985 the staple of editors of computational criticism: concordances, indexes, and frequency lists. His complaint was that critics would soon be better served by on-line, interactive services and that, in any case, "no [critical] question has been fundamentally changed by text processing" (Lusignan 1985: 212).[1] Computers produce data, and a great deal of it, but unfortunately we have no guarantee that the data produced will be pertinent for any given inquiry into the nature of texts.

Lusignan suggested that our efforts would be better spent elsewhere: "Computer-assisted research on texts will make no real progress unless it concentrates on perfecting models of analysis that are specific to the electronic text" (Lusignan 1985: 211). Until such models exist, the status of computer-generated data will be that of a quotation, i.e. something that "only has an emblematic value for a [critical] question that is defined independently in the silent communion between the printed text and its interpreter" (Lusignan 1985: 212).

Though the world of computers has evolved at a considerable pace, computational criticism still seems to be at the same juncture that Lusignan so pessimistically described nearly ten years ago. We still lack a clear statement of what the electronic text means for interpretation. If Lusignan's analysis is accurate, electronic texts may mean very little indeed.

In this paper, I hope to contribute to the development of a conceptual framework for computational criticism by placing one of its central practices, lemmatisation, in the context of C.S. Peirce's general theory of signs. Lemmatisation is on the practical level an essential and distinctive method of computational criticism: Choueka and Lusignan have described it as "one of the most important and crucial steps in many non-trivial text-processing cycles" (Choueka & Lusignan 1985: 147). I will argue that it has a central theoretical position as well, which, if given a systematic treatment, will lead to a more coherent way of understanding what makes computational criticism distinctive among interpretative practices.

The role of a semiotic framework is simply to organise and develop the terminology we use to describe our interpretative practices. While such meta-interpretative reflection may seem like an awkward, painfully abstract, and useless digression from the real work of interpreting texts, it is unavoidable since, in a very real way, a method of inquiry is substantially a working, growing terminology. The Russian formalists Yuri Shcheglov and Alexander Zholkovsky describe the relation between a theory and its terminology in these terms:

One of the crucial problems in the establishment of a science is the development of a special language in which it formulates the results of its inquiries. This language must be unambiguous (i.e. understood in the same way by all specialists), explicit and formally strict. Only then will it be able to perform its function of accumulating, ordering and storing information. The function of a metalanguage does not consist merely in introducing special terminology and notation, but in providing a conceptual framework capable of adequately reflecting the described object. In other words, a metalanguage, as a means for making statements about the object, is also the means of existence of a scientific theory. [...] An example of a result in chemistry is the periodic table of elements: it is not necessary to consult Mendeleev's works in order to use it, since the table is given in every textbook. Thus the metalanguage makes it possible for a science to keep a strict account of what it has achieved, to make use of the results of previous studies and to reduce the likelihood of repetitious research. (Shcheglov & Zholkovsky 1987: 19-20)

If we are to respond to such negative appraisals as Lusignan's, we must systematise the fundamental procedures and findings of computational criticism. In that way, we can avoid repetition and avoid misunderstandings about the meaning and value of our methodological constructs. For instance, Lusignan seems to suggest that a quotation has a subordinate role in interpretation. On the contrary, I hope to show that the unique approach to textual meaning practised in computational criticism inevitably makes quotations -- taken as attestations -- central to the interpretative process.

Such terminological clarifications are necessary since most traditional interpretative constructs take on a new meaning in the electronic medium. The following discussion of lemmatisation is thus inevitably but a small step in the more general effort to translate interpretative practices and terminology for the new medium.[2]

2. An analysis of lemmatisation: types, tokens, and tones

Those working in quantitative linguistics and lexicology associate lemmatisation with the methodology of frequency lists and concordances; the term "lemmatisation" is otherwise not widely known, and might be considered a technical term reserved to these fields.

But whether working in quantitative linguistics or not, all literate speakers have in fact a very practical experience with lemmatisation through the use of dictionaries. When we as readers turn to the dictionary for help with a word encountered while reading, we perform a lemmatisation. We associate a textual occurrence of a word with an idealised form of the word found in the dictionary. The dictionary headword is "idealised" because it stands for a range of possible occurrences. Thus, if we encounter the word "go" 10 times in a passage of Shakespeare, we will associate each occurrence with the same dictionary headword. On another level the headword is idealised because it represents obvious variants of a word: to know more about the words "went" and "gone", for example, we will look in the dictionary under "go". In English, we generally look for conjugated verb forms under infinitives, plural nouns under their singular form, abbreviations under their full form, lexicalised compounds under key words, etc.
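The dictionary consultation just described can be sketched as a small lookup table. The Python fragment below is only an illustration: `HEADWORDS` and `lemmatise` are hypothetical names, and the handful of entries stands in for a real lexicon.

```python
# Hypothetical mini-lexicon: each textual form points to its dictionary headword.
HEADWORDS = {
    "go": "go", "goes": "go", "went": "go", "gone": "go",
    "word": "word", "words": "word",
}

def lemmatise(form):
    """Return the headword for a textual form, or None if the toy lexicon has no entry."""
    return HEADWORDS.get(form.lower())

passage = "He went where others had gone and would go again"

# Pair every occurrence with its headword (None marks forms outside the toy lexicon).
pairs = [(form, lemmatise(form)) for form in passage.split()]
```

Here "went", "gone", and "go" are all paired with the single headword "go", which is the sense in which the headword is idealised.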

Dictionary headwords belong to a different world than the words we find in other texts. That fundamental difference is labelled differently in different philosophical traditions. For structuralists, dictionary headwords belong to "langue", the systematic side of language; text occurrences belong to "parole", the unsystematic side. For generative linguists, headwords are units of linguistic competence; occurrences, units of performance. For logicians, headwords represent logical classes, to which text occurrences belong as members; headwords are mentioned, occurrences used. For lexicologists, headwords are types and text occurrences are tokens.

The type/token distinction was originally given systematic study by Peirce. However, his distinction was between types, tokens, and tones.[3] The tone category of signs did not have the same success as its siblings, since it is a more delicate matter to distinguish tones, and they are not subject to the same quantitative treatment as tokens.

The three appear as manifestations of three modes of reality: existential reality (token), the reality of law (type), and the reality of qualities (tone). We will consider each in turn.[4]

2.1 Tokens

A token belongs to the existential world. It is a sign that represents by way of its particular place in time and space. By definition a token is unique and different from anything else. Thus, if I point to the token "swounds" in my copy of Shakespeare's works and say that it is misspelled, I am pointedly not saying that "swounds" is by nature a misspelled word (though that may coincidentally be in some sense true), nor even that in all Shakespeare's plays it is misspelled (which may be coincidentally true also). By calling it a token, I only wish to say that this particular example in my book is misspelled, without any further, perhaps unwarranted, generalisation. To say that a sign is a token is simply to point to what is absolutely unique in the occurrence, i.e. its position in time and space.

Peirce's definition of a token is narrower than the modern definition. Peirce's tokens expressly do not belong to a given type. They are simply text[5] positions, before any lemmatisation has occurred. Peirce reserves the term "replica" for lemmatised tokens, and I will follow his terminology here.

2.2 Types

A type belongs to the world of law. It is a sign that represents the law-like generality of a class. Again, when I point to "swounds" in my copy of Shakespeare and say that "I know this word", though I may never have read that particular text in my life (and therefore have never had any existential contact with this particular instance), what I mean is that the occurrence can be subsumed under a general model. I recognise it as an expression of a law or convention, in the same way that I recognise the falling of a rock as an expression of gravity. Unlike tokens, types cannot be pointed to any more than can the law of gravity; types are real but do not belong to the existential world where pointing is possible. If a rock fell, it would be incongruous to say "There is gravity." But in the case of a textual replica, I can translate the pointing to another world, and indicate the headword of a dictionary. To say "I know this word" is tantamount to saying that if a replica were found as a headword in a dictionary, I could paraphrase the dictionary text that follows it. The shift in text worlds is an essential property of the type-replica relationship, as is the predictability of the defining text. But fundamentally, a type is by nature a semiotic item that is beyond considerations of time and space.

2.3 Tones

Tones belong to the world of qualities. Qualities are the fundamental perceptual units that cannot or will not be analysed in a given investigation. For example, when reading the barely legible handwriting of one of Shakespeare's manuscripts we might point to a word and say that it looks like "swounds", or perhaps "swound" or even "smounds". We may not recognise individual letters, but still have the impression that we recognise the general form of the scrawl. Whether we recognise the form or not, we do indeed "cognise" some quality, i.e. something that has a value as a first impression, whatever interpretation we may finally bring to it; something that is at the same time both this and that, something distinct from time and space, and distinct from a law. That "something" is a tone, precisely because it does not require that we know what it is, only that we be aware of it as something that is not constrained by time and space. In other words, the same tone is free to appear simultaneously in many places.

This gift of ubiquity that a tone shares with a type often leads us to confuse them. Even a knowledgeable reader of Peirce such as Savan can have misgivings:

The difficulty ... is that there are no criteria of identity for qualities. ... So a quality is identical with or similar to those qualities of which it is judged to be a sign. If Locke's blind man judges the blare of a trumpet to be like red, so be it. The sound is a qualisign of the colour. Such arguments led Peirce to adopt the hypothesis of synaesthesia, that all the sensory modalities form one continuum of qualities. It ought to have led him to ask whether the notion of a qualisign [tone] was in any significant way different from that of a legisign [type]. (Savan 1988: 24, emphasis added)

A tone concerns what is possible, independent of the laws that relate the instances of those possibles; a type concerns necessary relations between possibles. In some contexts, the two are interdefinable, and therefore may appear to be equivalent, as Savan suggests. Thus, in modal logic a necessary proposition is defined as one whose meaning is not possibly not true (i.e. not possibly false); in formula:

Nr <=> -P -r

where "N" and "P" are the modal operators of necessity and possibility, and "r" a proposition.

However, that interdefinability depends on negation (a distinctive aspect of the token dimension) and the subtle principle of hierarchy of operators (a type distinction). Possibility is indeed involved in necessity, but nonetheless distinct from it: in Peircean terms, necessity (a third) is what determines individuals (seconds) to possess certain possibilities (firsts).

In practical terms, we recognise this distinction when we recognise the difference between a wildcard pattern and the class of target strings it delimits. The pattern has its own distinctive internal properties; the class, on the other hand, is an interpretation of the pattern in a given context. It depends on the expression of a rule. Thus, depending on the search context, two different patterns may delimit the same class of target strings. Programmers distinguish between these two dimensions when they recognise the difference between metacharacters (types) and escaped characters (tones).
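The difference between a pattern and the class it delimits can be made concrete with a short sketch. The Python fragment below uses the standard `fnmatch` and `re` modules; the word list and the variable names are assumptions chosen for illustration, not data drawn from any actual text.

```python
import fnmatch
import re

# Hypothetical word list standing in for the vocabulary of a source text.
vocabulary = ["swound", "swounds", "smounds", "sounds"]

# Two different wildcard patterns, each with its own internal properties...
pattern_a = "s[wm]ounds"   # second character must be "w" or "m"
pattern_b = "s?ounds"      # second character may be anything

# ...delimit the same class of target strings in this particular search context.
class_a = fnmatch.filter(vocabulary, pattern_a)
class_b = fnmatch.filter(vocabulary, pattern_b)

# As a metacharacter, "." expresses a rule (any character);
# once escaped, it stands only for itself.
dot_as_rule = re.fullmatch(r"s.ounds", "swounds") is not None
dot_escaped = re.fullmatch(r"s\.ounds", "swounds") is not None
dot_literal = re.fullmatch(r"s\.ounds", "s.ounds") is not None
```

With this vocabulary both patterns select "swounds" and "smounds"; with a different vocabulary (one containing "stounds", say) the two classes would diverge, which is why the class is an interpretation of the pattern in a given context.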

What are the tones of a text? When we read, we may certainly be aware of much more than we actually use in our reading. A letter may be smeared, the spacing of the text may vary, a higher proportion of e's may be found in one passage, a sequence of words may have a particular rhythm, etc. All these qualities we may perceive, but may choose to ignore, or not, when we read. Carmina figurata (visual patterns made with text) are good examples of how unique qualities can be found in any passage. Even in the more abstract medium of the electronic text, where the basic tone units are the set of ASCII characters, the possible number of tones is infinite, since even the shortest passage can be taken as the source of a combinatorial explosion which places it at the centre of an infinitely expanding cloud of associated patterns. However, the tones of a text are defined as those qualities that we do indeed wish to consider as fundamental and unanalysable in a given analysis. For instance, in most texts we will consider alphanumeric characters as unanalysable; we will not analyse them into bars and curves, or distinguish them according to their pitch in proportional spacing.

Thus, following this triadic division, dictionary headwords are types; source text occurrences are tokens; and character combinations are tones.

2.4. Complex signs and terminological conventions

These three sign classes are defined in relation to one another, so that, like dimensions of fractal geometry, their proportion is maintained on any structural level (macro, meso or micro). At the same time, the type/token/tone scale is hierarchical: a headword may subsume a set of occurrences; an occurrence may subsume a set of characters; and characters are unanalysed members of a tonal alphabet. This "sliding hierarchy" is represented in Peirce's phenomenology by the numbers 3, 2 and 1: a type is a third; a token is a second; a tone is a first.

In spite of this absolute hierarchical ordering of signs, a particular textual sign may belong to all three worlds at the same time. As the examples I have given show, an occurrence of "swounds" can be called a type, a token, or a tone, depending on what we wish to say about it. It is only in our discourse that we distinguish between them and highlight one aspect. In other words, the justification of that distinction cannot be found so much out there, in the text, as in our own discourse and in the critical practices we wish to describe.

In fact, we typically have difficulty maintaining a clear distinction between these three and we often use the term "word" to ambiguously refer to all three. There are at least two very good reasons for the confusion.

First, in a very real sense, type, token, and tone dimensions are indissociable; they are like the legs on which the word stands. When we consult a dictionary, we use the tonal qualities of the text to associate a textual token with a dictionary type. If ever there is a misprision at any of these levels, the lemmatisation fails as does the consultation. To return to the example of a misspelled "swounds": though we presented it as a case of tokenness, it is clear that all three worlds are involved. Misspelling concerns types as well as tones, since we must first recognise the occurrence as a word, and indeed see some resemblance between the misspelling and the correct spelling, and finally associate it with an occurrence of the type "swounds". So it is for all the examples we gave: there is no pure case where only one world appears alone; our critical discourse is needed to select a particular aspect for study.

Secondly, the highlighting we choose may be intentionally complex. We may wish to study signs that are intrinsically hybrid, i.e. that are more or less a tone, more or less a token, more or less a type. For instance, when we speak of the spelling of "swounds" in the first edition of Shakespeare's plays, we restrict the range of the type to a particular set of texts. In other words, the "swounds" type may be drawn from the special, restricted lexicon of that edition, and not all Shakespeare, nor all texts. At the same time, as laws, types can be more or less general, the more specific being instances of other, more general types; the occurrence "swounds" is a replica of the type "SWOUNDS", which in turn is a replica of the type "INTERJECTION", which is a replica of the type "WORD", etc.

Nor do tonal qualities have distinct borders: just as the colour red blends continuously into scarlet and purple, so do character combinations, which tend to resemble more or less other character combinations; wildcard, anagram, and fuzzy match functions are designed precisely to deal with this variability.
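The graded resemblance between character combinations can be illustrated with Python's standard `difflib` module. This is only a sketch: the list of forms, the cutoff of 0.8, and the helper `is_anagram` are assumptions made for the example.

```python
import difflib

# Hypothetical set of forms that more or less resemble "swounds".
forms = ["swounds", "swound", "smounds", "sounds", "wounds", "zounds"]

# Fuzzy match: keep the forms whose similarity ratio to "swounds" reaches the cutoff,
# ordered from most to least similar.
near = difflib.get_close_matches("swounds", forms, n=10, cutoff=0.8)

def is_anagram(a, b):
    """Two forms are anagrams when they contain exactly the same characters."""
    return sorted(a) == sorted(b)
```

Raising or lowering the cutoff moves the border of the class, which is the practical counterpart of saying that tonal qualities have no distinct borders: with the value chosen here "zounds" falls just outside the neighbourhood of "swounds".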

Finally, tokens, though particular, are not defined according to size: a word may be a token, but a fixed expression may be one as well, and so too a paragraph, or even a text.

While it is generally not a simple matter to preserve these distinctions, it is worth noting that lexicographic practice does distinguish very carefully between the three. I have drawn on that tradition by saying that the dictionary represents a separate world, the world of types. It is only a metaphor because, after all, a dictionary is a text. Yet, at the same time the tonal space of the dictionary is unique among texts. The lexicographer uses typography, format, and alphabetical order to set headwords apart, to detextualise them. Alphabetical order and headword format are lexicographic conventions that are used respectively to represent the tone and type spaces of texts.

3. Consultation and the meaning of lemmatisation

3.1 The structure of lemmatisation

In terms of these three worlds, lemmatisation might be described as a mapping, conditioned by tones, from a domain of tokens onto a range of types. That mapping can be viewed from several points of view.

3.1.1 Tag insertion

From a practical and computational point of view, a lemmatisation is realised by inserting a tag into a text. Thus, given the textual segment ".....a......", a lemmatisation is effected when the lemma is inserted into the segment and marked appropriately as a tag; "...A{a}..." might be the resulting sequence for this example, where the tag "A{}" is inserted into the original segment around its replica "a".

The kind of information the tagging conveys can be grammatical (as in "GO{gone}"), semantic ("SPHERE{ball}"), connotative ("DEATH{raven}") or other. The linguistic unit that is tagged is equally varied. It could be a morpheme, a word, a sentence or any other textual unit; titles are in fact lemmata of whole texts. Furthermore, tagging can form hierarchies: "SPHERE{NOUN{ball}}" would be an example of a complex lemmatisation.
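Tag insertion in the "A{a}" notation used above can be sketched in a few lines of Python. The lemma table, the function names `tag` and `nest`, and the sample sentence are all hypothetical; the sketch assumes that a form can be looked up directly, which sidesteps the real difficulty of disambiguation.

```python
import re

# Hypothetical lemma table: textual form -> lemma used as the tag.
LEMMAS = {"gone": "GO", "went": "GO", "ball": "SPHERE"}

def tag(text, lemmas):
    """Wrap each known form in its tag, so that 'gone' becomes 'GO{gone}'."""
    def wrap(match):
        form = match.group(0)
        lemma = lemmas.get(form.lower())
        # Forms without an entry are left untouched.
        return f"{lemma}{{{form}}}" if lemma else form
    return re.sub(r"[A-Za-z]+", wrap, text)

def nest(outer, inner, form):
    """Build a hierarchical tag such as 'SPHERE{NOUN{ball}}'."""
    return f"{outer}{{{inner}{{{form}}}}}"

tagged = tag("The ball has gone.", LEMMAS)
complex_tag = nest("SPHERE", "NOUN", "ball")
```

The tagged sentence keeps every character of the source segment and only adds material around the replicas, which is why the result can be read as a new text generated from the old one.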

In the model we are developing, tag insertion is the minimal interpretative act. It is the basis for generating from the source text a new text, a new attestation.

3.1.2 Reference of tags

That is the operational side of lemmatisation, but what are the consequences of that tagging? When we consult the dictionary we expand the source text; at the position of a given replica, we virtually insert the text of the definition. Our reading passes from the source text to the dictionary and back again. The two texts have been woven together, much as the passages of a hypertext are woven together by jumps. The reference from the lemma to the replica serves thus as the basis of a more general reference between two texts.

When we consult the dictionary, we are virtually mixing two texts to generate a third, hybrid text. The third text is in some sense more readable, i.e. it has more meaning than the first. That surplus of meaning is not simply poured from the dictionary into the source text, but rather develops from the reaction of the two.

The derivative text may be informationally richer than the source text, since it is often an expansion of the original text, but that is not a necessary condition for an increase in meaningfulness. It is more accurate to say that we recognise the tones in the hybrid text whereas we did not recognise them in the original text. This is particularly evident in the case where the dictionary expansion is taken from a bilingual lexicon, since the target word is not intrinsically richer than the source word. It simply belongs to a different language, or value system.
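The weaving of the dictionary into the source text can be sketched as an expansion step applied to a tagged text. In the Python fragment below, `DEFINITIONS`, `expand`, and the gloss itself are hypothetical, and the pattern only handles flat tags of the form "LEMMA{form}", not nested ones.

```python
import re

# Hypothetical dictionary entries keyed by lemma.
DEFINITIONS = {"GO": "to move from one place to another"}

def expand(tagged_text, definitions):
    """Splice the definition in after each tagged replica, yielding a hybrid text."""
    def splice(match):
        lemma, form = match.group(1), match.group(2)
        gloss = definitions.get(lemma)
        # A tag without a dictionary entry is simply reduced to its replica.
        return f"{form} [{gloss}]" if gloss else form
    return re.sub(r"([A-Z]+)\{([^{}]*)\}", splice, tagged_text)

hybrid = expand("They have GO{gone}.", DEFINITIONS)
```

Substituting a bilingual lexicon for `DEFINITIONS` would give the case mentioned above, in which the hybrid text is not longer or richer in information but is readable in a different value system.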

3.1.3 Naming

Finally, lemmatisation is a kind of naming. Though a tag is both paradigmatically and syntagmatically related to the replica (it sits "beside" and "above" the replica), lemmata represent their replicas metalinguistically.

What is the effect of naming? The simplest effect is to mark a position in the text, much as a library call number marks a position in a library. Named items can be manipulated on the level of their names, and not as complete units, which gives the name its peculiar efficiency.

But lemmata are not empty deictics; they have meaning in their own right. A lemma is more like a title in that it interacts with what it names. A title is an interpretative instruction, a sign-post for the reader's interpretation of the text. It thematises and systematises what it represents. A lemma represents an interpretative position, saying "this is what the interpreter calls an X". Thus there is a reciprocal relation between the lemma and what it names: the lemma derives its value from what it names and what it names is defined by the lemma.

3.2 Attestation generation

Lemmatisation is thus at the heart of several fundamental procedures: hypertextual linking of texts, generation of new texts, and creation of a surplus of meaning through naming.

It is also at the root of the dilemma that Lusignan pointed out. Whether it is done manually, interactively, or automatically, lemmatisation fixes the countable, objective, textual token between two worlds of essentially uncountable, subjective (i.e. non-token-like), co-textual tones and types. Laws are not discrete units, nor are qualities. One law or quality blends seamlessly into its neighbour, and because there is no limit to the shades of qualities and laws, early research was understandably lost on a sea of possible distinctions.

Though it exists of course in other fields, the problem of non-discrete tones and types is particularly severe for computational criticism, since computational critics necessarily take an extremely materialist view of the text, one that places the token at the centre of their methodology. Computational critics also have at their disposal tools that are powerful enough to reveal in detail a vast spectrum of textual tones and types.

These aptitudes and constraints lead computational critics to adopt a particular approach to meaning in which quotations play the central role. Their methodology must ultimately be based on procedures of lemmatisation similar to those that are used when consulting a dictionary.

Interpretation in computational criticism is implemented by generating a derivative text that has more meaning than the source text. The derivative text can be called a quotation, since it is a reorganised segment of the source text. A quotation in this extended sense is novel, however, in that it may contain words that are not in the source text (though editorial emendations are permitted even in orthodox quotations), and may not have the form of ordinary text at all, but remains quotation-like because it is purposefully derived from the source text. Thus, a frequency list, a concordance, or a distribution graph are all "quotations" in this extended sense.
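Two of the derivative texts just mentioned, the frequency list and the concordance, can be generated from a lemmatised source in a few lines. The Python sketch below is illustrative only: the source sentence, the lemma table, and the function `concordance` are assumptions, and tokens are represented simply as positions in the text.

```python
import re
from collections import Counter

# Hypothetical source text and lemma table.
source = "To go or not to go: he went, and they have gone."
LEMMAS = {"go": "GO", "went": "GO", "gone": "GO"}

# Tokens as text positions: (offset, form) for every word occurrence.
tokens = [(m.start(), m.group(0)) for m in re.finditer(r"[A-Za-z]+", source)]

# Frequency list: count the replicas gathered under each lemma.
frequency = Counter(
    LEMMAS[form.lower()] for _, form in tokens if form.lower() in LEMMAS
)

def concordance(lemma, width=12):
    """Return one keyword-in-context line per replica of the given lemma."""
    lines = []
    for start, form in tokens:
        if LEMMAS.get(form.lower()) == lemma:
            end = start + len(form)
            left = source[max(0, start - width):start]
            right = source[end:end + width]
            lines.append(f"{left:>{width}}[{form}]{right}")
    return lines

go_lines = concordance("GO")
```

Each concordance line is a reorganised segment of the source, and the frequency list contains a word ("GO") that never occurs in the source at all; both are nonetheless derived from it by a fixed procedure, which is what makes them quotation-like.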

A quotation has a particular interpretative force because it can be pointed at like a token (it is an instance), yet at the same time has the value of a type (it is generated in a "legal" fashion) and displays the qualities of tones (it has its own feel, its own "air de famille"). We will call the special quotations of computational criticism attestations.

The generation of attestations is analogous to the experimentation of chemists. Chemists too must deal with the continuity of qualities and laws found in nature. Meaningful units are generated through experiments in which samples are combined with known substances through standard procedures. Peirce gives a good example of how the terminology, meaning, and methods of a science are joined through experimentation:

If you look into a textbook of chemistry for a definition of lithium, you may be told that it is that element whose atomic weight is 7 very nearly. But if the author has a more logical mind he will tell you that if you search among minerals that are vitreous, translucent, gray or white, very hard, brittle, and insoluble, for one which imparts a crimson tinge to an unluminous flame, this mineral being triturated with lime or witherite rats-bane, and then fused, can be partly dissolved in muriatic acid; and if this solution be evaporated, and the residue be extracted with sulphuric acid, and duly purified, it can be converted by ordinary methods into a chloride, which being obtained in the solid state, fused, and electrolyzed with half a dozen powerful cells, will yield a globule of a pinkish silvery metal that will float on gasolene; and the material of that is a specimen of lithium. (¶2.330, quoted in Eco 1980: 86)

Peirce's definition of lithium is unique in that there is no attempt to establish an exhaustive inventory of the qualities that lithium may possess. There is no assumption that the definition can be substituted for the defined. Rather, the definition is a recipe for "baking up" some lithium, which the chemist may follow to come into direct existential contact with a replica of lithium.

I would like to suggest that chemists and computational critics share essentially the same methodology on this point: the extracted token of lithium has the same role as an attestation, and on the other hand, textual attestations are produced experimentally in the computer through the reactions of a source text in standard algorithms. In other words, textual attestations are kinds of precipitates and chemical precipitates are a kind of attestation.[6]

What is truly distinctive about computational interpretation is that its procedures, like those of chemistry, are designed to lead the reader to a replica of a textual feature, the attestation. In this way, the problems associated with the non-discrete type and tone co-texts are resolved: there is no need to have an inventory of tones or types because they are packaged and displayed in attestations, and their terminological value is captured in the algorithms of attestation generation. In short, large-scale, systematic generation of attestations is ultimately the distinctive feature of computational interpretation. Electronic texts are simply the indispensable medium of this truly novel approach to textual meaning.

4. Reading the text's mind

Lusignan traces computer criticism's failure to a crucial distinction between information and meaning: the telephone book is fat with information, but has little meaning. On the other hand, specifically human productions, such as the "lettre d'amour criblée de fautes d'orthographe" that Breton speaks of in Nadja, can have little or no information and yet still be laden with meaning. For Lusignan,

In the context of interpretation, the computer would appear to be a machine that uses the electronic text to produce information. The computer never attains the level of meaning. (Lusignan 1985: 210)

Since Shannon and Weaver (Shannon & Weaver 1964), we understand information in terms of a quantifiable set of possible choices. There is no correlative information-theoretic definition of meaning, but in the Peircean perspective it is clear that meaning is a biased tendency in the change of information. In other words, meaning is the direction that is taken by information in its transmission and growth:

...the meaning of a word really lies in the way in which it might, in a proper position in a proposition believed, tend to mould the conduct of a person into conformity to that to which it is itself moulded. Not only will meaning always, more or less, in the long run, mould reactions to itself, but it is only in doing so that its own being consists. (Peirce 1931-1958: 1.343)

Computers and electronic text participate in this structuring of our behaviour since they are signs and, like any sign, they serve as templates for our practices. In sharing information with signs we share our very essence:

Again, consciousness is sometimes used to signify the I think, or unity in thought; but the unity is nothing but consistency, or the recognition of it. Consistency belongs to every sign, so far as it is a sign; and therefore every sign, since it signifies primarily that it is a sign, signifies its own consistency. The man-sign acquires information, and comes to mean more than he did before. But so do words. Does not electricity mean more now than it did in the days of Franklin? Man makes the word, and the word means nothing which the man has not made it mean, and that only to some man. But since man can think only by means of words or other external symbols, these might turn round and say: "You mean nothing which we have not taught you, and then only so far as you address some word as the interpretant of your thought." In fact, therefore, men and words reciprocally educate each other; each increase of a man's information involves and is involved by, a corresponding increase of a word's information. (Peirce 1931-1958: 5.313, quoted in Crombie 1989)

The electronic medium is a system of signs that we use to "accumulate, order and store information". In the world of computer applications, expert systems are perhaps the most developed and elegant example of this constant dialogue with signs. The attestations of an expert system are direct responses to questions the user puts to the system. We find the meaning of the system's underlying electronic text in the regularity of its responses, a regularity which reflects the original regularity of the expert's discursive behaviour. When we are informed by the expert system, our behaviour is then moulded by the original behaviour of the expert.

To know what a word means is to know in what direction it will influence attestation generation; whether they have their source in computers or people, attestations are the concrete reflection of interpretative habits, habits that are by nature shared by the sign and the human interpreter, since they ultimately concern sign use. We inscribe in the computer software our own habits of interpretation; but when the computer responds in a regular fashion, it develops habits in us. Computers, like all signs, are repositories for our habits; and our habits are repositories for those of our semiotic systems.

If there is anything silent about this dialogue between signs and humankind that Lusignan and Peirce both rightly emphasise, it is only to be found in the inexhaustible effects of meaning, which by its very nature grows endlessly into the future. As Eliot suggests, there will always be meaning beyond our present language, whatever critical tools we bring to bear:

Endless invention, endless experiment,
Brings knowledge of motion, but not of stillness;
Knowledge of speech, but not of silence;
Knowledge of words, and ignorance of the Word.
("Choruses from 'the Rock'")

Notes

[1] Quotations from this text are my translation.

[2] See Winder 1993 for a discussion of the icon, index and symbol dimensions of electronic texts.

[3] Peirce's terminology varied. He often used the designations legisign/sinsign/qualisign instead of type/token/tone. We will use the type/token/tone designations here, since they are better known and lexically less baroque. Peirce used them as late as his last letters to Lady Welby in 1908.

[4] Much of what follows is inspired by David Savan's interpretation of Peirce's writings (Savan 1988).

[5] The type/token/tone distinction is not limited to language and textual signs; it applies to any semiotic system. However, throughout this article we are describing this distinction only in the restricted context of textual signs, and particularly with respect to electronic texts.

[6] The methodology of such radically different fields can be the same because ultimately any science is metalinguistically (meta-interpretatively) about exactly the same thing: the administration of meaning. Every method of enquiry must use its terminology to accumulate, order, and store meaning.

Bibliography

CHOUEKA, Yaacov, & Serge LUSIGNAN (1985). "Disambiguation by Short Contexts", Computers and the Humanities 19: 147-57.
CROMBIE, J. (1989). "L'Homme-signe et la conscience de soi", Semiotics and Pragmatics (ed. Gérard Deledalle), Philadelphia: John Benjamins: 215-29.
ECO, Umberto (1980). "Peirce et la sémantique contemporaine", Langages 58: 75-91.
LUSIGNAN, Serge (1985). "Quelques réflexions sur le statut épistémologique du texte électronique", Computers and the Humanities 19: 209-12.
PEIRCE, Charles Sanders (1931-58). Collected Papers of Charles S. Peirce, Cambridge: Harvard University Press, 8 vols.
SAVAN, David (1988). An Introduction to C. S. Peirce's Full System of Semeiotic, Toronto: Toronto Semiotic Circle.
SHANNON, C.E., & W. WEAVER (1964). The Mathematical Theory of Communication, Urbana: University of Illinois Press.
SHCHEGLOV, Yuri, & Alexander ZHOLKOVSKY (1987). Poetics of Expressiveness: A Theory and Applications, Philadelphia: John Benjamins.
WINDER, William (1993). "A New Notation: Towards a Theory of Interpretation for the Electronic Medium", Texte 13/14: 87-119.
