"The history of lexicography in the Middle Ages", Olga Weijers tells us (Weijers 1989: 152), "is much less interesting for its matter than for its form." There are of course many aspects to dictionary form but the one that shall interest us here is the form of the dictionary entry and its development in medieval lexicography. Even within the nascent entry form of the earliest medieval dictionaries there develops an association between the information conveyed and the location in which it is to be found. It is true that the options for ordering the material are limited, yet it appears that many of the early choices made for the form of the medieval dictionary entry would have far-reaching effects on the organisation of later lexicons.

Every dictionary entry has as its base two poles, the headword and the definition, or lemma and gloss, reflecting the origin of all glossaries. But around these two poles other information is soon added as dictionaries take on a more formal pattern. Some of that information qualifies the headword, some expands the definition, while some relates to both. For convenience we can call that information which is non-lemmatic and non-definitional metalanguage. In a recent study of metalanguage in medieval lexicography, I drew attention to the privileging of three positions in the dictionary entry besides the headword and definition where metalinguistic terms might be found, namely the post-lemmatic, the post-definitional and the marginal (Merrilees 1991). That study was based on a fifteenth-century bilingual, that is Latin-French, dictionary, the Dictionarius of Firmin Le Ver. In his layout and organisation, Le Ver was drawing on a long tradition of medieval dictionary-making and it will be the purpose of this paper to look at the transmission of some of that tradition. The implications for the shaping of the modern dictionary should be self-evident.

1. Basic structures of the dictionary article

The minimal structure of a glossary, lexicon or dictionary entry consists of two parts, the lemma or headword and the gloss or definition. This description is sufficient to cover some very early and rudimentary texts such as the Glosses of Reichenau, but as early as the eleventh century, in the Elementarium of the Italian Papias, dictionary entries were already developing other constituent parts. Their nature is hardly surprising: their position can be defined by the relationship to the lemma and the definition. What is striking, however, as we will try to show, is the gradual association of certain kinds of metalinguistic information with each of the positions. Although these associations are far from rigid even by the time printing began to fix some of the medieval practices, the correlation of information and location can be traced through the evolution of the manuscript dictionaries of the Middle Ages.

It is not surprising to find that the most basic structure of lemma + definition (LD) is the most frequent individual type in Papias' Elementarium doctrinæ. In a preliminary sample count from a twelfth-century manuscript (BN lat. 7611) I found 43 LD examples out of 115 entries, or rather more than one in three. This sample was too small to be useful and I therefore analysed longer samples from a fifteenth-century printed copy of Papias, taking whole letters at different points in the text. The letters B, M and T were imported from an ASCII text into a database file that used four fields (lemma, postlemma, definition and postdefinition) to divide the material of the dictionary entry. Queries were run on the contents of the various fields and simple statistics generated from the results. For the entries consisting of simply lemma + definition, the proportion was higher than in the small manuscript sample: the letter B had 54.7% of its entries in that category, the letter M 46.7% and T 51.8%, giving 49.9% for all sample entries taken together, or roughly one in two. This was confirmed by the only portion based on an edition from manuscript sources, the letter A, where 46.9% of entries show the LD form (De Angelis 1977).
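The four-field division and the resulting count can be sketched as follows. This is a minimal illustration in Python and not the database software used for the Papias samples; the class name PapiasEntry, the functions is_plain_ld and ld_share, and the four sample records are hypothetical and serve only to show the shape of such a query.

```python
from dataclasses import dataclass


@dataclass
class PapiasEntry:
    """One dictionary entry divided into the four fields named in the text."""
    lemma: str
    postlemma: str = ""
    definition: str = ""
    postdefinition: str = ""


def is_plain_ld(entry: PapiasEntry) -> bool:
    """True when only the lemma and the definition are filled (the LD type)."""
    return bool(entry.lemma and entry.definition) and not (
        entry.postlemma or entry.postdefinition
    )


def ld_share(entries: list[PapiasEntry]) -> float:
    """Percentage of entries that have the plain LD structure."""
    if not entries:
        return 0.0
    return 100.0 * sum(is_plain_ld(e) for e in entries) / len(entries)


# Invented placeholder records, not transcriptions from Papias.
sample = [
    PapiasEntry("lemma-1", definition="gloss-1"),
    PapiasEntry("lemma-2", postlemma="dicitur", definition="gloss-2"),
    PapiasEntry("lemma-3", definition="gloss-3", postdefinition="unde ..."),
    PapiasEntry("lemma-4", definition="gloss-4"),
]
print(f"{ld_share(sample):.1f}% of the sample entries are plain LD")
```

Run over a whole letter, a function of this kind would yield per-letter percentages of the sort reported above.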

These figures however mean that a half or more of the entries did not have the simple LD structure and that the rest of the entries brought the other two positions, the post-lemmatic and the post-definitional, into play. In many manuscripts of Papias, the marginal position is also used, but its application seems at some variance with Papias' intention as stated in his prologue and we shall leave discussion of it aside for the moment.

1.1. The post-lemmatic position

The post-lemmatic position (PL) is commonly used by Papias who put the following kinds of information immediately after the lemma:

  1. definitional connectors such as dicitur, interpretatur, est etc., sometimes .i. (id est). This latter will be more frequently used by later lexicographers, esp. Le Ver where it dominates the PL position. Le Ver's Dictionarius has some 28,000 examples all told of "id est", most in a PL position. In Papias definitional connectors account for approximately 43% of the cases;
  2. indication of language, especially the term grece or a Greek form (27%);
  3. information on etymology, including derivation or composition (15%);
  4. grammatical information including parts of speech, esp. adverbium, and attributes such as gender or orthography (10%);
  5. occasional confirmative or emphatic elements: vero, generale, proprie.

The PL position itself is used in 30.1% of the entries in the sample letters, in some cases without any definitional material following (see below).

1.2. The post-definitional position

The other position that Papias uses is the post-definitional (PD) in approximately one entry in four (25.3%). In some cases it is difficult to label some information as post-definitional when it is in effect an extension of the definition or to distinguish categories of post-definitional material, particularly between quasi-etymological explanations and other expansions of the definition. However we have, with some caution as to the exactness of the percentages given, divided the PD examples as follows:

  1. etymological and derivational information (32%);
  2. definitional expansion (28%);
  3. definitional connectors (12%);
  4. language or language reference (11%);
  5. grammatical and phonetic or orthographic information (7%).

1.3. Marginal information

In the prologue to his Elementarium doctrinæ, Papias promises that his dictionary will provide much useful information, including indication of gender, declension and tense:[1]

Therefore we shall designate masculine with an .m., feminine with an .f., neuter with an .n., words of two or three common genders with a .c. or an .o., and those which are doubtful in a similar way [...]

In many of the Papias manuscripts, the scribes set out to place such information in the margin, along with names of authorities identified:

But the names of some authors will be written in the margin by means of their first several letters for the identification of their words.

This last is carried out much more faithfully in most manuscripts than the promise to provide grammatical information and the marginal grammatical indications usually drop out after three or four folios. The marginal position (M) then is largely reserved for references, not always sustained beyond the early folios, but some grammatical information is also found there.[2] As Daly and Daly state (Daly & Daly 1964: 234), the programme to provide grammatical and etymological information promised in the prologue "is carried out but indifferently" in the manuscripts they had seen. Nonetheless they note an awareness by Papias of an overall lexicographical plan.

2. Basic entry types in Papias

In sum an entry in Papias could have five parts, or fields, L PL D PD M, but in this analysis I have set aside the marginal position because its use is neither consistent nor statistically significant in the manuscripts I have examined. However, it is important in terms of the history of dictionary making to remark on its presence. With the marginal position set aside, there are in all five possible entry combinations:

  1. L D (lemma + definition) 50.0%;
  2. L PL (lemma + post-lemmatic metalinguistic information with no definition) 5%;
  3. L PL D (lemma + post-lemmatic information + definition) 19%;
  4. L D PD (lemma + definition + post-definitional information) 19%;
  5. L PL D PD (lemma + post-lemmatic information + definition + post-definitional material) 7%.[3] (See Figure 1.)
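Since the marginal field is set aside, each type is determined by which of the remaining fields are filled. The sketch below, in Python, encodes that reading and cross-checks the rounded percentages against the usage figures given in sections 1.1 and 1.2. The function name entry_type and the dictionary SHARES are illustrative names, and the classification rule is an assumption about how the categories were assigned.

```python
def entry_type(postlemma: str, definition: str, postdefinition: str) -> str:
    """Label an entry by the fields it fills; the lemma is always present."""
    parts = ["L"]
    if postlemma:
        parts.append("PL")
    if definition:
        parts.append("D")
    if postdefinition:
        parts.append("PD")
    return " ".join(parts)


# Rounded percentages as listed above.
SHARES = {"L D": 50, "L PL": 5, "L PL D": 19, "L D PD": 19, "L PL D PD": 7}

total = sum(SHARES.values())
pl_use = sum(v for k, v in SHARES.items() if "PL" in k.split())
pd_use = sum(v for k, v in SHARES.items() if "PD" in k.split())

print(entry_type("dicitur", "gloss", ""))  # L PL D
print(total, pl_use, pd_use)               # 100 31 26
```

The five types sum to 100%, and the totals of 31% and 26% for entries using the PL and PD positions agree to within about one percentage point with the 30.1% and 25.3% reported for those positions.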

3. Hugutio of Pisa and Johannes Balbus of Genoa

The next two great lexicographers in the medieval chain are Hugutio of Pisa (12th c.), author of the Magnæ Derivationes, and Johannes Balbus of Genoa (13th c.), author of the Catholicon, the first the principal source for the second. Both works take some content and methods from Papias.[4]

The grand principle of organisation for Hugutio is, as the name of his text implies, the linking of words derivationally, a methodology that has the very obvious result of increasing the use of derivational connectors such as inde and unde and the prepositions ab and ex. These usually occur in the post-definitional position and lead to the introduction of a new subheadword. Along with the derivational connectors there is also an increase in the frequency of the definitional connector .i. (id est), used post-lemmatically, in most cases after the derived subheadword. Overall there is an increase, compared to Papias, in the provision of information concerning derivation and composition, although Papias shows examples in both PL and PD positions. The derivational emphasis is carried over to Balbus, who introduces a greater degree of alphabetization than Hugutio while preserving many of the derivational links.[5]

Hugutio, Magnæ Derivationes

Machia {grece latine dicitur} pugna {unde} hec machina .e omne quod in genio paratur ut muralis balista et similia et proprie illa que ad pugnam parantur {licet} {et} quelibet artificiosa compositio vel constructio {dicatur} quandoque machina, {unde} hec machinula .le {diminutivum} et machinosus .a .um plenus machinis vel argumentosus ingeniosus ad machinas faciendas vel insidiosus {quia machina sepe invenitur pro} insidiis et machinor .aris machinas facere. [...] Item machina hic Machinis et sunt machina instrumenta edificiorum {sic dicta a machinis} [...] Machina {componitur cum 'siche' quod est anima et dicitur} sicomachia .i. pugna anime {et cum navis et dicitur} hec navimachia .i. navalis pugna [...]

Balbus, Catholicon

Machia {grece latine dicitur} pugna {et acuit penultima}
Machil est tunica talaris [...]
Machina {a machia dicitur} hec machina .ne: omne quod ingenio paratur: ut manualis (sic) balista et similia: {et proprie} illa que ad pugnam parantur {licet etiam} quelibet artificiosa compositio vel constructio {dicatur} machina et invenitur [...]
Machinis: {a machina .ne dicitur} hec machinis .nis et sunt machines instrumenta edificiorum [...]
Machinor: {a machina dicitur} machinor .naris .natus sum .nari .i. machinas facere [...]
Machinosus: {a machina dicitur} machinosus .sa .sum .i. plenus machinis vel argumentosus et ingeniosus ad machinas faciendas vel insidiosus: {quia machina sepe invenitur pro insidiis}
Machinula .le {diminutivum} parva machina

Despite the more discursive nature of their two texts, Hugutio and Balbus use the same positions as we have noted for Papias, except the marginal, and for very similar kinds of information. Balbus, as we note in this example, uses the post-lemmatic position in particular for etymological information, while in Hugutio these links are more often conveyed by unde in the post-definitional position. What is less frequent in both is the simple juxtaposition of headword + definition, the L + D structure that was so prevalent in Papias. The discursive nature of the organisation of these texts renders them less easily divisible into fields and I have not attempted yet to analyse them along the lines of the Papias material.
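The brace convention used in the excerpts above lends itself to mechanical extraction, which could be a first step towards dividing such discursive entries into fields. As a rough sketch in Python (the function name split_metalanguage is hypothetical, and the approach assumes that braces are never nested), the marked metalanguage of an entry can be separated from the rest with a regular expression:

```python
import re

# Matches one brace-enclosed segment that contains no further braces.
BRACED = re.compile(r"\{([^{}]*)\}")


def split_metalanguage(entry: str) -> tuple[list[str], str]:
    """Return the brace-enclosed segments and the entry text without them."""
    marked = BRACED.findall(entry)
    # Replace each marked segment by a space, then normalise whitespace.
    remainder = " ".join(BRACED.sub(" ", entry).split())
    return marked, remainder


entry = "Machinula .le {diminutivum} parva machina"
marked, remainder = split_metalanguage(entry)
print(marked)     # ['diminutivum']
print(remainder)  # Machinula .le parva machina
```

Deciding whether each extracted segment is post-lemmatic or post-definitional would still require knowing where the definition begins, which the braces alone do not record.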

4. The Dictionarius of Firmin Le Ver

Papias, Hugutio and Balbus, but most particularly the last, are sources for a dictionary that would appear to combine all the advances made in medieval lexicography, the Dictionarius (DLV) of Firmin Le Ver, a Carthusian prior of Abbeville in northern France, who compiled his bilingual work between the years 1420 and 1440. I have noted elsewhere how Le Ver used the two metalinguistic positions under discussion here, namely the PL and the PD, but it was also noted that he incorporates an extensive use of the marginal position for giving grammatical attributes of gender for nouns and voice for verbs. As far as I can tell the use of the margin comes directly from the Latin-French Aalma, a much reduced bilingual form of the Catholicon, and indirectly from Papias.[6] The following DLV entries for Machia, Machina and derivatives show Le Ver's use of the three positions, the uses again put within braces:

MACHIA - {grece, latine dicitur} pugna, {unde}
Machina, machine - {.i.} omne quod ingenio paratur
    ad pugnam ut balista et similia et manganum;
    licet et quelibet artificiosa constructio dicatur quandoque
    machina .i. abalestre, mangonniau, perriere
    ou aultres engins pour bataille ou bolevert, {unde}
Machinor .naris .natus sum vel fui, machinari                              {d}
    - {.i.} machinas facere vel parare, co<n>struere .i.
    faire et appareillier telz instrumens de guerre
Machinari {eciam dicitur} cogitare et proprie malum .i.                    {d}
    penser mal vel astute insidiari, moliri
Machinatus .a .um - {nomen et participium} - machinés, pensés
Machinatio .tionis - machinemens en mal
Machinamen .minis - idem, molimen, cogitatio in malum vel insidie
Machinamentum .menti - idem
Machinor .naris {componitur} Commachinor .naris

MACHINIS, huius machinis - {a *machina dicitur}                            {f}
Machines {sunt} instrumenta edificiorum, {dicta
    sic a} machinis quibus insistunt propter altitudinem
    parietum .i. eschafaus pour faire edefiches

MACHINOSUS - {a *machina, machine dicitur}
Machinosus .sa .sum - {.i.} plenus machinis
    vel ingeniosus ad machinas faciendas
    vel insidiosus, quia machina {sepe invenitur
    pro} insidiis {et comparatur}
Machinose - {adverbium - .i.} insidiose {et comparatur}
Machinositas .tatis - {.i.} insidiositas

MACHINULA .le - {diminutivum} - parva machina, idem {est}

We should point out that Le Ver reconfigures the Balbus material into macro-entries which allow convenient alphabetical searching through a base headword while preserving derivational links in an ordered set of subheadwords (Merrilees, forthcoming). Layout of course plays a major part and we must admit that our samples of Hugutio and the Catholicon do not reflect some of the very interesting progress that was being made by copyists in setting out those lexicons. Le Ver, too, draws on the work of earlier scribes to create a layout which is a model for showing off the text to the best visual advantage and which through spatial and textual organization provides a highly searchable document.

5. Analysing the material

My first analysis of Le Ver's Dictionarius, using the interactive concording program WordCruncher, noted some simple characteristics of Le Ver's use of the post-lemmatic, post-definitional and marginal positions. For the marginal position one could already observe from the manuscript that only abbreviations relating to gender and voice were present, a practice Le Ver adopted from the Aalma. WordCruncher showed that approximately one headword or subheadword in three was accompanied by such a marginal indicator and that the other two metalinguistic positions had very few indications of gender or voice. However, a good deal of other grammatical information was noted in the post-lemmatic position. For example, adverbium appears 4200 times in the DLV, and virtually every occurrence is in the post-lemmatic position. Another example of a growing association between the kind of information and its location is found in phonetic description. Phonetic information is given in two ways, either as an ablative absolute, like media correpta or penultima producta, or as a full verb, as in corripitur, producitur. Our analysis showed that the absolute expressions are almost invariably in the post-lemmatic position and the full verbal expressions are mostly post-definitional. This might be determined by the natural order of Latin syntax, but there are other cases where the full verb form is post-lemmatic as well as post-definitional, particularly the definitional connectors such as dicitur, interpretatur, etc.
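A tally of this kind, relating the form of a phonetic expression to the position in which it occurs, might be sketched as follows. The Python below is an illustration only and does not reproduce WordCruncher's workings; the function name tally and the three records are invented, and the two lists of terms contain only the examples cited above.

```python
from collections import Counter

# Forms named in the text; the lists are not exhaustive.
ABSOLUTE = ("media correpta", "penultima producta")
VERBAL = ("corripitur", "producitur")


def tally(records):
    """Count (form, position) pairs over (post_lemmatic, post_definitional) pairs."""
    counts = Counter()
    for post_lemmatic, post_definitional in records:
        for position, text in (("PL", post_lemmatic), ("PD", post_definitional)):
            if any(term in text for term in ABSOLUTE):
                counts[("absolute", position)] += 1
            if any(term in text for term in VERBAL):
                counts[("verbal", position)] += 1
    return counts


# Invented records for illustration only, not readings from the manuscript.
records = [
    ("penultima producta", ""),
    ("media correpta", "producitur"),
    ("", "corripitur"),
]
counts = tally(records)
for key in sorted(counts):
    print(key, counts[key])
```

With real data, a strong association would show up as counts concentrated in the ("absolute", "PL") and ("verbal", "PD") cells.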

For a closer analysis of Le Ver's use of the post-lemmatic and the post-definitional positions, text was imported as a delimited ASCII file into a database, in this case Paradox for Windows, and the query function of that program was used for searching the data. The task was both simpler and more complex than for Papias. Although the divisions within the entries were for the most part clearer, the database needed to account for the headword, the subheadword(s), post-lemmatic information, all definitional material in two languages, metalinguistic information that occurred within the definitional material, post-definitional material again in both languages, and finally marginal material. Because the definitions could be in both Latin and French, sometimes with the Latin first and sometimes with the French first, and because both could appear again after some intra-definitional item or information, it was decided to allow four fields to handle the definitional possibilities. This meant designing a database with ten fields:

  1. Lemma
  2. Sub-lemma
  3. Post-lemmatic
  4. Definition 1a
  5. Definition 1b (allows for a change of language)
  6. Intra-definitional
  7. Definition 2a
  8. Definition 2b (allows for a change of language)
  9. Post-definitional
  10. Marginal

Not only did this structure permit analysis of the use of the metalinguistic positions, it also allowed some examination of the role of French and Latin as well as giving a detailed profile of Le Ver's entry patterns. (See Figure 2.)
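The ten-field record can be illustrated with a small relational table. The sketch below uses Python's built-in sqlite3 module rather than Paradox for Windows; the column names are hypothetical renderings of the field names listed above, and the assignment of the MACHINULA entry's parts to fields is an illustrative guess, not the encoding actually used.

```python
import sqlite3

# Column names are hypothetical renderings of the ten fields listed above.
FIELDS = [
    "lemma", "sub_lemma", "post_lemmatic",
    "definition_1a", "definition_1b", "intra_definitional",
    "definition_2a", "definition_2b", "post_definitional", "marginal",
]

conn = sqlite3.connect(":memory:")
columns = ", ".join(f"{name} TEXT DEFAULT ''" for name in FIELDS)
conn.execute(f"CREATE TABLE entries ({columns})")

# One record adapted from the MACHINULA entry quoted in section 4; the
# division into fields is an illustrative guess.
conn.execute(
    "INSERT INTO entries (lemma, post_lemmatic, definition_1a, post_definitional) "
    "VALUES (?, ?, ?, ?)",
    ("MACHINULA .le", "diminutivum", "parva machina, idem", "est"),
)

# Share of records whose post-lemmatic field is filled.
filled, total = conn.execute(
    "SELECT SUM(post_lemmatic <> ''), COUNT(*) FROM entries"
).fetchone()
print(f"{100 * filled / total:.0f}% of records use the PL position")
```

Fields left unspecified default to the empty string, so a query on any one position can simply test whether its column is empty.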

For the post-lemmatic and post-definitional positions, some of the tendencies that were noted in Papias are more fully developed in Le Ver.

Firstly, the post-lemmatic position is more frequently used in the Le Ver samples (approx. 60% of entries) than in the Papias samples (approx. 30%). The most frequent type of PL information is again the definitional connector, 56% in Le Ver, 43% in Papias; the two other most frequent categories in Le Ver are grammar (including phonetic or orthographic information) 22% and etymology (including derivation and composition) 19%. Papias has 10% and 15% for these two categories. Language indicators or references to a language other than Latin are frequent in Papias (27%), though this is often the citing of a Greek equivalent; in Le Ver an indication of language accounts for just under 2% of PL cases.

For the post-definitional position, etymological and derivational information is again the most frequent category (55%), and grammatical/phonological information (21%) is more substantial here than in Papias (7%). Definitional expansion is less ambiguous in Le Ver and occurs in a smaller proportion of cases (8%). Definitional connectors and language reference, the third and fourth most frequent categories in Papias, are almost eliminated in Le Ver (less than 1% combined). Cross-referencing and references to authorities constitute 14% of PD material in Le Ver. (See Figure 3 and Figure 4.)

6. Conclusion

In looking at the four lexicographers mentioned so far, I am conscious that a concentration on Papias and Le Ver, four centuries apart, is an over-simplification and that more needs to be done on other early lexicographers such as Osbern of Gloucester, and that a means of dealing with the very complex Hugutio and Balbus texts also needs working out. I need hardly add that any attempt to encode dictionaries within a database framework is likewise fraught with oversimplification.[7] But there do seem to be conclusions to be drawn even from our simple samples. Firstly, with each successive lexicographer in the medieval transmission there is an increasing amount of metalinguistic information to be found. Secondly, the arrangement of this information becomes more ordered with each lexicographer. By the end of the Middle Ages, a lexicographer such as Le Ver had produced a dictionary the ordering processes of which may be observed at several levels. In this paper I have treated the dictionary as a database with a set of records each of which has a certain number of fields. One might also have chosen to treat the entry structure as a form of grammar with the headword as subject or theme and the rest as parts of the complement. Whatever model is appropriate, it is difficult to deny that a very sophisticated framework has been provided for the information to be conveyed.


[1] The translation appears in Daly & Daly 1964: 232.

[2] Many manuscripts contain a list of the abbreviations to be used to designate authorities.

[3] Error: ± 1%.

[4] The texts used for this analysis were Hugutio 1925 and Balbus 1971.

[5] In the following excerpts the relevant metalanguage is enclosed within braces.

[6] Roques 1938 omits not only the marginal information but also much of the metalinguistic material in the body of the entries.

[7] As William Kent reminds us (Kent 1978: vi): "Information is too amorphous, ambiguous, elusive to be pinned down precisely by the processes of computers."