1. Introduction

In the mid-1550s, John Baret, an otherwise unknown schoolteacher, began to collect the materials for a new English-Latin dictionary. Now, more than four hundred years later, we can only wonder at the accomplishment. Unfortunately, we also frequently wonder just what he was doing, and whether he knew what he was doing. In his study of Renaissance dictionaries DeWitt Starnes puzzled over Baret's use of sources. Gabriele Stein concerned herself more with lexicographic technique, the selection and definition of headwords, and she too frequently had to throw up her hands in despair when trying to explain Baret's practices.[1] The problems are certainly complex, but the solutions are now within the realm of imagination, through the creation of early dictionary databases. However, even in this field there are many traps, most of which I have fallen into at one time or another, and now, wiser, I can dare to consider the matter all over again.

2. Baret and his Alvearies

John Baret we know almost exclusively through his two major lexicographic works, the Alvearie, or Triple Dictionarie of 1573, and the revised Alvearie or Quadruple Dictionarie of 1580, published two years after Baret's assumed death date, with at least some of the revisions the work of Abraham Fleming, a prolific if undistinguished specialist in this type of revision (Starnes 1954: 205). Baret received his BA from Trinity College, Cambridge in 1544-5, and his MA in 1558. The only other firm date is the year his will was registered, 1578.

In an introductory letter to the first edition, he explains that he had given his students, starting around 1555, the task of freshening up the Latin and the English of Thomas Eliot's Bibliotheca (1538). These "diligent Bees", the inspiration for the title of his work, reversed Eliot's order (Latin-English was shifted to English-Latin), gathered more "fine phrases" from classical authors, and finally "set them under severall Tytles, for the more ready finding them againe at their neede". These rough notes were then polished, apparently several years later, by former students now working at the Inns of Court, and, where their work proved unsatisfactory, by teachers at some London schools.

The team of workers and the span of time over which the dictionary was composed lead to great confusion about the sources, which Starnes tried to resolve, though he ultimately left many questions unanswered. One would expect the primary source to be Eliot's Bibliotheca, but in fact most of the borrowings from Eliot are rather from Cooper's Thesaurus, or Cooper's reworkings of Eliot (1548, 1552, and 1559). In addition, there is evidence that Baret consulted the works of Robert Estienne, Jean Véron, Richard Huloet, John Higgins, and Peter Levins as well (see Starnes 1954: 188-205). And of course, these authors were borrowing from each other, leaving behind quite a difficult task for the scholar interested in tracing sources.

The Alvearie of 1573 has English headwords, followed by Latin equivalents, Latin examples, and French equivalents. Occasionally Greek equivalents are provided as well; the 1580 Alvearie has more complete coverage of Greek, and adds more than 250 proverbs and some new vocabulary. This brief description of the contents, however, fails to represent the real complexity of the dictionary, and it is precisely this complexity that gives the book its greatest interest for modern scholars and poses the greatest challenge for the computerized structuring of the text.

Indeed, when one tries to construct the SGML tags for adequate description of the dictionary, all the problems leap to the fore. Many words that look like headwords and act like headwords do not receive the structural representation that Baret reserved for headwords, e.g., "Alehouse", buried in the entry for ALE, and thus not capitalized, printed in large bold, or preceded by the mark "¶" as are the headwords. In fact, even though listed under the main entry ALE, it is presented as an equivalent to "tipling house". This combined subentry is marked by an asterisk and given its own number in Baret's indexing system, for the Latin and French equivalents are found in the index. In addition to the twinned English words, the entry has three Latin equivalents, a French equivalent, and, in the 1580 edition, a Greek equivalent.

Creating an adequate representation of the information included just in the entry ALE illustrates many of the challenges. The entry has one headword (Ale) and two subentries that are marked with an asterisk and numbered ("Tipling house/alehouse" and "A common haunter of alehouses, or vittailing houses"). Between the main entry and the first subentry are two other phrases in which the word ale is used, once as an adjective ("ale cellar") and once qualified by an adjective ("stale ale"). Between the two subentries are three more expressions in which the noun ale is qualified by an adjective. The second subentry is a phrase, treated as a full subentry even though it is not a single English word.
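The structure just described can be pictured as a small tree of records. The following Python sketch is only an illustration under stated assumptions, not the encoding used in the project: the class and field names (`Entry`, `level`, `indexed`) are hypothetical, and the Latin, French and Greek equivalents are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """One unit of the dictionary. All names here are illustrative assumptions."""
    english: List[str]        # one or more twinned English forms
    level: str                # "main", "sub" or "subsub"
    indexed: bool = False     # True if marked with an asterisk and numbered
    children: List["Entry"] = field(default_factory=list)


# The ALE entry as described in the text (other-language equivalents omitted).
ale = Entry(["Ale"], "main", children=[
    Entry(["ale cellar"], "subsub"),
    Entry(["stale ale"], "subsub"),
    Entry(["Tipling house", "alehouse"], "sub", indexed=True),
    # The three further adjective + "ale" expressions are not quoted in the
    # article, so they are not reproduced here.
    Entry(["A common haunter of alehouses, or vittailing houses"], "sub",
          indexed=True),
])


def indexed_subentries(entry: Entry) -> List[Entry]:
    """Return the children that carry an asterisk and an index number."""
    return [c for c in entry.children if c.level == "sub" and c.indexed]
```

Even this toy version shows why the font alone is not enough: "alehouse" is retrievable only because the twinned subentry keeps both English forms side by side.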

The main entries and the subentries are indicated not only by the external markings of font and special characters, but also by the contents of their paragraphs. The Latin equivalents of nouns in these sections include the genitive case forms and note the gender, and those of verbs include the principal parts. In sub-subentries these are not marked.[2] These entries also include other vital types of information that must be differentiated in an encoding scheme aimed at scholars rather than typesetters. These include multiple equivalents, definitions / encyclopedic descriptions, etymological explanations, grammatical classification, commentaries on spelling and pronunciation, attributed and unattributed examples, and finally translations of examples. Each of these types of information might be referenced to any of the four languages represented, and frequently in an unbalanced way that makes encoding especially complicated and dangerously misleading.

2.1. Multiple Equivalents

In the 1580 edition of the Alvearie there are seven equivalents in English for the sub-entry "A common haunter of alehouses, or vittayling houses": aleknight, tipler, tospot, quaffer, quasser, rinsepicher, blowbottell; six Latin equivalents, four Greek equivalents and one French. Sometimes matching these up appropriately is important, as in the series of the subentry "Aide: stay: support: little stakes that stay up the vines: a proppe". The French equivalents match up well, with two equivalents (and some typically English confusion about gender) for the last term: Aide: Appuy: Support: Eschalas de vigne: une estaye, une estançon. Thus if one wanted to see what French words were considered equivalent to a certain English word over an extended period, using, for example, the versions of Palsgrave and Cotgrave that Ian Lancashire has created,[*1] one would have to list all words as equivalent of all others in such series, or else make a reasoned and numbered matching. The first solution is quicker, but potentially misleading; the second more accurate, but time-consuming and uncertain. In any case, just the simple numbering of the equivalents, and their relationship (numerically) to the equivalents posted in other languages, is of substantial interest for the history of lexicography. For how many English headwords are English equivalents provided? This gives a clue to the expansion of English vocabulary at a time when creating neologisms was a patriotic duty, and also furnishes the basis for the development of monolingual lexicography. How often is one English word related to two, three or four Latin words? or multiple French words? How often are multiple English words related to a single word from another language? The answers to these questions can provide insights into the lexicographic traditions and into lexicological development. 

The fact of multiple possible translations was a potent tool in helping the lexicographer unravel the knots of polysemy, and marking such cases helps the modern historian of lexicography trace the development of these techniques and relate them to the expansion of coverage. Of course, the words themselves promise invaluable evidence of first attestations and changes in meaning.
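The two ways of pairing equivalents discussed in this section can be contrasted in a short Python sketch. It uses the "Aide" series quoted above. The function names `all_to_all` and `positional` are hypothetical, and the grouping of the French terms by position is an editorial reading of the series, not something Baret marks explicitly.

```python
from itertools import product
from typing import List, Tuple

english = ["Aide", "stay", "support",
           "little stakes that stay up the vines", "a proppe"]

# French equivalents grouped by position; the last English term has two.
french_groups = [["Aide"], ["Appuy"], ["Support"],
                 ["Eschalas de vigne"], ["une estaye", "une estançon"]]


def all_to_all(eng: List[str], groups: List[List[str]]) -> List[Tuple[str, str]]:
    """Quick solution: every English word paired with every French word."""
    flat = [f for group in groups for f in group]
    return list(product(eng, flat))


def positional(eng: List[str], groups: List[List[str]]) -> List[Tuple[str, str]]:
    """Reasoned solution: pair each English word only with its own group."""
    return [(e, f) for e, group in zip(eng, groups) for f in group]


quick = all_to_all(english, french_groups)
matched = positional(english, french_groups)
print(len(quick), len(matched))  # prints "30 6"
```

The first list contains pairs such as ("Aide", "une estançon") that no reader of the entry would accept, which is the sense in which the quicker solution is potentially misleading.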

2.2. Definition / Encyclopedic Description

This type of description can appear in any of the three original languages of the Alvearie.

English: the Aier, one of the foure elements: sometyme winde.

French: an Aker, Acra, æ, f. ge. Iugerum, ri. Arpent. Autant de terre que deux beufz accouplez peuuent labeurer en un iour.

Latin: Ale, Ceruisia, æ. f. g. [...] Potus ex aqua & frugibus confectus, & præcipuè ex hordeo madefacto, & rursus clibanis siccato. De la cervoise.

This can also refer to the short phrases in which the entry word is put in context, as in "Aldermen of the citie". Frequently these distinguish between various alternative meanings, "The ball of the eye" (prunelle), "The ball of the hand" (paume); or the various meanings of ALL: "All the wholle", "All without exception", "All, every one seuerally". Baret goes to greatest lengths here to explain the differences among the Latin equivalents:

All. Omnis, & hoc [...] Differt autem Omnis à Toto, quod Omnis ad numerum refertur, Totus ad quantitatem. Totum dicimus corpus aliquod integrum: ut Tota domus. Omnis de vniversis, numero distinctis, ut Omnis grex. Confunduntur tamen plerunque.

It is by the explanation of the difference, instead of mere listing of alternate equivalents, that this procedure is distinguished from the preceding category (multiple equivalents).

2.3. Etymological Information

One of the substantial additions of Baret to the English dictionaries that preceded was the inclusion of some etymologies, however partial. Starnes (1954: 210) states that there are about a hundred of these, and Stein (1985: 283) adds the observation that a disproportionate number of these occur in the letter A, suggesting that this was a late project never completed as planned. While this distribution of etymological information is of interest for the history of lexicography, the distribution of language attributions may hold the key to understanding Baret's intentions in expanding this area. The majority of Baret's etymological commentaries trace words to French origins, at least initially, although the French is then sometimes traced back to the Latin, as in:

Attaynted seemeth to come of the french woorde Teinct, which is also deriued of the latin Tinctus. Infectus. Colore vel humore imbutus. And of the old latin woorde Attinctus, came of the feyned french Atteinct, which we yet vse in englishe. There may be also some proouable reason that it cometh of this french woorde Esteincte, which is in latin Extinctus of Extinguo, to put out and so leauing for the s (as frenche men dooe pronounce) it is almost our woorde Atteynt.

Both Stein and Starnes state that many of Baret's comments come from Calepino and Robert Estienne, but when the majority of the etymologies trace the relationship between English and French, this clearly cannot be the case. Sledd (1946) adds the observation that Baret offers few etymologies for words with Germanic roots, and concludes from this that Baret knew little of the history of his own language. This is belied by the references to the thorn and its value in the preamble to the letter D and other references to Saxon usage. Whatever the case, a safer explanation for both of these facts may be that Baret had a particular reason for being more concerned with words borrowed from French. This reason, I hypothesize, is his connection with the legal profession, at a time when Law French was being challenged as the language of the courts and the English language was gaining confidence. The etymological examples frequently relate to the law: FELON, BATTER, and GAILL. Although not all the etymologies can be justified by this theory (e.g., GARGOYLES), it would explain the reference to "feyned French" and, for the most part, both the distribution of words for which etymologies are offered and the distribution of languages to which English words are traced. To follow such developments so that we can accurately assess these factors, the tagging system must note not only the presence of etymological commentary, but also the languages mentioned. Furthermore, sometimes the etymological information is not between languages but within one, particularly Latin. While these are rarely original, comments such as "Obtrecto, tas, ex Ob & Tracto, a in e mutata" need to be marked for easy retrieval if we are to get a full picture of all the types of etymological information being included.
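The requirement that the tagging note the languages mentioned, and not just the presence of an etymology, might be sketched as follows in Python. This is a minimal illustration, not a proposed tag set: the record type `EtymologyNote`, its field names and the two-letter language codes are all assumptions, and the data are limited to the two cases quoted in this section.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EtymologyNote:
    word: str                      # the word being explained
    word_lang: str                 # language of that word (hypothetical codes)
    source_langs: Tuple[str, ...]  # languages named in the commentary


notes = [
    EtymologyNote("Attaynted", "en", ("fr", "la")),  # French, then Latin
    EtymologyNote("Obtrecto", "la", ("la",)),        # Latin explained by Latin
]


def language_counts(items: List[EtymologyNote]) -> Counter:
    """Count how often each language is named as a source."""
    return Counter(lang for n in items for lang in n.source_langs)


def language_internal(items: List[EtymologyNote]) -> List[EtymologyNote]:
    """Select etymologies that stay within a single language."""
    return [n for n in items if n.source_langs == (n.word_lang,)]
```

With the whole dictionary tagged in this way, the distribution of source languages per letter could be read off directly, which is the evidence the hypothesis about Law French would need.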

2.4. Grammatical Classification

In its simplest form, this entails listing the part of speech of the word in question, as in:

Universally: generally: altogither. Vniversè. Aduerb. Generalement.

This is expanded somewhat in the entry for AH:

Ah, is an interjection of sorrowing, applyed also to other affections: as indignation, desire, rebuking, correcting, laughing, etc. Ah alas. Ah dolentis. Virg.

Another frequent type of grammatical comment is the mention of the gender of nouns, often abbreviated. Baret occasionally notes as well that a noun is always plural (crepundia, under BABIE) or that a form is a diminutive (posticulum, under BACK). Tagging grammatical information as such is useful for the understanding of the relationship between grammatical and lexicographical practice and theory. Which grammatical categories require specification, and why? Are all forms that ostensibly belong to those categories so marked?

2.5. Spelling and Pronunciation

The most frequent use of this type of information is to show the accentuation of the Latin words, by noting if the penult is long or short (producta, correpta). However, this is not the sole function of these comments. In some cases they can show related words, as in AKE, which Baret describes as "the Verbe of this substantive Ach, Ch. being turned into K." Sometimes alternative spellings are explained, as in the syncopated form of vinculum ("Vinculum [...] vinclum per syncop." under BANDE).

The most complete commentaries on spelling come at the beginning of each letter, in the entry devoted to the letter itself. These commentaries are so detailed, with specific references to the work of other orthographers, that Sledd described them as the first major effort for spelling reform. This characterization contradicts Sledd's own account of the relationship between Baret's work and the earlier efforts of John Hart, cited by Baret in his preamble to the letter I, and of Thomas Smith, also explicitly cited by Baret.[3] This practice of adding long preambles concerning each letter is nonetheless unique among the English dictionaries of the time and represents one of Baret's real additions.

2.6. Examples

In form, the examples are not differentiated from the sub-subentries: both have the same font and lack the morphological information, but they serve different purposes and should ideally be distinguished for study of lexicographic practice. One clue to separating these two categories is that the examples have only Latin equivalents, and not French or Greek. Another is that the verb of the subentries is usually presented as an infinitive in English (e.g., "To succour"), whereas most often the verb is in a conjugated form in an example. However, the distinction is not always straightforward: what looks like a full sentence or at least a full proposition sometimes turns out to be a lengthy equivalent of a single word.
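The two clues mentioned above can be combined into a rough first-pass filter. The Python sketch below is only a heuristic under stated assumptions: the function name `classify` and its labels are hypothetical, and, as the paragraph warns, its output would still have to be checked by a reader.

```python
def classify(english: str, has_french: bool = False,
             has_greek: bool = False) -> str:
    """Guess whether a small-type item is a sub-subentry or an example.

    Clue 1: examples carry Latin equivalents only, never French or Greek.
    Clue 2: the verb of a subentry is usually an English infinitive.
    """
    if has_french or has_greek:
        return "sub-subentry"
    if english.strip().lower().startswith("to "):
        return "sub-subentry"      # infinitive form, as in "To succour"
    return "possible example"      # undecided: needs human checking


print(classify("To succour"))
print(classify("He hath payne in his head, or his head aketh."))
```

A noun-phrase sub-subentry that happens to have only a Latin equivalent would also fall into the last category, so the label is deliberately "possible example" rather than "example".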

Easiest to judge are the full-sentence examples:

Ah, ah I dye poore wench in laughing thee to scorne. Ah, ah, ah, perij defessa iam misera sum te ridendo. Ter.

Let the Graners be aired with the Northen windes, let in at small windowes. Granaria modicis fenestellis aquilonibus inspirentur. Colum.

He hath payne in his head, or his head aketh. Laborat è dolore capitis.

He is diseased in his eyes and he hath the tooth ake. Laborat è dolore oculorum & dentium.

Even here we can and should distinguish between attributed and unattributed examples, for a good study of Baret's lexicographic technique should account for the source of the example and why this example in particular was selected. With attributed examples we would want to compare the example selected against the examples possible in the source text; in the case of unattributed examples, we would want to determine why this example was created (if indeed it was his own invention).

What feature of the head-word is illustrated in the example? In the first example, the relationship is fairly straightforward, in that the link between the head-word AH and the Latin equivalent "Ah" is already established. The example shows the use and effect of repetition. In the second example, things are a little more confused. The example is at the very end of the entry, but nowhere in what has preceded is AIR used as a verb; the link between "to air" and "inspiro" has not been established. In the last two examples, no Latin equivalent was ever given for AKE; this may explain why the two examples are non-attributed. What is confusing is the repetition of identical constructions, "Laborat è dolore" followed by the genitive of the noun representing the afflicted part of the body. The situation is further complicated by the fact that none of the Latin equivalents for AKE is listed in the index of Latin words, which makes the Latin inaccessible.

The use of examples thus brings up a number of questions that can only be answered if different types of examples are marked in different ways. The features that need to be distinguished are: attributed vs. unattributed examples; relationship between headwords (or subentries) and examples; the link between numbered index listings of Latin words and the examples. Once these basic categories have been established, one might want to elaborate more complicated and subjective classificational schemes, as for instance one that would characterize the difference between the examples for AKE, dolet and laborat è dolore, as distinguishing alternative syntactic structures for expressing the same English idea.
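The first of these distinctions, attributed versus unattributed examples, lends itself to a simple mechanical test. The Python sketch below assumes that an attribution appears as an abbreviated author name at the very end of the example. The set `KNOWN_AUTHORS` is hypothetical and contains only the three abbreviations quoted in this article, so a real list would have to be compiled from the dictionary itself.

```python
from typing import Optional

# Only the abbreviations that appear in the examples quoted in this article.
KNOWN_AUTHORS = {"Ter.", "Colum.", "Virg."}


def attribution(example: str) -> Optional[str]:
    """Return the trailing author abbreviation, or None if there is none."""
    tokens = example.split()
    if tokens and tokens[-1] in KNOWN_AUTHORS:
        return tokens[-1]
    return None


attributed = attribution(
    "Ah, ah, ah, perij defessa iam misera sum te ridendo. Ter.")
unattributed = attribution("Laborat è dolore capitis.")
```

Here `attributed` holds the abbreviation and `unattributed` is `None`. The other two features, the relation to the headword and the link to the index, depend on judgment and cannot be detected this way.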

3. Database Production and Distribution

All this brings us to the problems of representing these features of the dictionary, distributing them, and making possible the addition of layers of description after the original database has been produced. The minimum description needed reproduces the physical nature of the text, and for this a number of systems are adequate. Our ultimate goal in approaching this dictionary is to make it available over the network, with a marked-up base text linked to additional mark-up and commentaries from any scholars who wish to share their contributions.
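One way to picture a base text with later layers of description is a stand-off scheme, in which annotations point into an unchanged text by character offset. The Python sketch below illustrates that idea only. It is not SGML and not the system under development, and the names `Annotation`, `span_of` and the layer labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    start: int   # character offset into the base text (inclusive)
    end: int     # character offset into the base text (exclusive)
    label: str   # what kind of information the span carries
    layer: str   # which contributor or pass added the annotation


# The entry quoted in section 2.4, kept untouched as the base text.
base = "Universally: generally: altogither. Vniversè. Aduerb. Generalement."


def span_of(text: str, fragment: str, label: str, layer: str) -> Annotation:
    """Build an annotation for the first occurrence of fragment in text."""
    start = text.index(fragment)
    return Annotation(start, start + len(fragment), label, layer)


annotations = [
    span_of(base, "Aduerb.", "part-of-speech", "base-markup"),
    span_of(base, "Generalement.", "french-equivalent", "base-markup"),
    span_of(base, "Vniversè.", "latin-equivalent", "later-scholar"),
]


def by_layer(items: List[Annotation], layer: str) -> List[Annotation]:
    """Select the annotations contributed by one layer."""
    return [a for a in items if a.layer == layer]
```

Because the base text never changes, a later contributor can add or withdraw a layer without disturbing anyone else's description.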

When my research assistants and I approached this text in the mid-1980s, we were aiming at a specific product that would appear in print: a kind of concordance of all French and English lexical materials from the Middle Ages and Renaissance. Each English word would be matched with the French equivalents proposed, listed in chronological order. At that point the database management options were limited, and we had never heard of SGML. As a result we culled from individual entries the French and English equivalents, to the extent that this was possible given the complexities of multiple entries of the kind described above. These we placed into an inadequate database program, and ultimately even the project we wanted to do proved impossible. Worse yet, anyone wanting to find some other kind of information from these dictionary sources (including ultimately ourselves) was simply out of luck.

Now much better full-text database managers and search programs are available, but interaction with SGML, often promised, is still difficult to come by. Creation of SGML documents is still quite a cumbersome affair, and the ability to take full advantage of documents so prepared is not built into most applications. Such efforts are underway, however. In conjunction with the National Center for Supercomputing Applications at the University of Illinois, we are pursuing projects that entail the integration of SGML-tagged documents in full international network distribution, with layers of tagging and reasonably powerful search mechanisms. One team is proposing the development of tools to be used within the WAIS (Wide-Area Information Server) format developed by Thinking Machines Corporation. Another is trying to extend the capabilities of Mosaic, a program created at the NCSA for networked distribution of documents and hypertext linking within such documents.

4. Conclusions

The lesson to be learned from our original project is that one cannot foresee the wide range of uses for which scholars might some day want to explore the documents we are producing. Even as I have tried to elaborate some of the types of information that I think must be captured in a representation of Baret's dictionary, we can all be sure that this list fails to include many features more imaginative scholars will discover. My research is a collaboration between my ideas and the ideas of the scholars who have preceded me: Sledd, Starnes, and Stein. We need to ensure that the research tools we develop will not only permit but encourage greater collaboration in the future. Therefore we need the most flexible and openly available systems that current technology can provide. With the help of such tools, perhaps one day we can determine if Baret really did know what he was doing.


[*1] Editors' note: Cf. I. Lancashire, CHWP, B.17.

[1] For example: "What is not clear, however, are the criteria by means of which Baret decided which of the headwords were worth making retrievable and which not" (Stein 1985: 280).

[2] For instance, "a baker of bread" is a subentry under the main entry to Bake. For the verb two Latin equivalents are given: Pinso, sis, sui, sere, pinsum; Coquo, is, coxi, quere, coctum. For the agent noun Baret provides several Latin equivalents: Pistor, ris; artocopus, pi; furnarius vel fornacarius, rii; Panifex, ficis; Panificus, ci. However, for the sub-subentry "A baker of fine cakes, or like thinges", no morphological information is furnished for Pistor dulciarius. One might argue that this would be repetitive, because the information has just been provided above for pistor. However the same cannot be argued for the other sub-subentries in the same heading: "To exercise bakers craft" (Furnarium exercere), "Bread baked in an oven" (Furnaceus panis) or "A pastie or pie of fishe baked" (Piscis operi pistorio incoctus).

[3] Hart is mentioned in the preamble to the letter I, concerning the representation of "I consonant" (<j>): "Wherein you may be better resolved if ye will consult with Maister H. Chesters booke which he hath diligently written of Ortographie, after long and painfull trauaile (as it well appeareth) in sundry languages". Hart himself cites the work of the French orthographic reformer Louis Meigret. Smith is cited in the preamble to the letter H, for example, concerning his suggestion that new symbols be invented for the sounds which English represents by combinations of consonants with h (<ch>, <th>, <sh>).


