Validating choices: Texts in the Trésor de la Langue Française [2003, rptd. 2008]
Abstract
The Trésor de la Langue Française (TLF) database was designed, and the texts for inclusion in it were chosen, in the late 1950s. This paper evaluates the TLF committee's choice of texts against two sources: a contemporary encyclopaedia of French literature, and subsequent published research on French literature as recorded in the MLA (Modern Language Association) online bibliography. Spearman's rank correlation coefficient and outlier analysis based on Mahalanobis distance are used to evaluate similarities among the sets of data. The conclusion is that the choices made in the mid-twentieth century were a reasonable reflection of scholarly interests, both at the time the database was constituted and subsequently up to the present. The method described in this paper is applicable to the evaluation of other full-text databases.
Keywords
Literature; corpora; French; statistics; Trésor de la langue française; Modern Language Association
This work is licensed under a Creative Commons Attribution 3.0 License.