Numbering the streaks of the tulip? Reflections on a Challenge to the Use of Statistical Methods in Computational Stylistics [1996, rptd. 2008]

J F Burrows

Abstract


Are statistical methods, which take randomness of data as their starting-point, appropriate to the study of something so highly systematic as the English language? The challenge seems justified by the non-random effects of context or semantics, transition, and recursion, despite the significant degree of unpredictability that remains. Yet valid statistical analysis often proceeds by assuming no such effects exist (the "null hypothesis"), then establishing whether they do. Furthermore, the postulate of randomness is not essential to descriptive statistics, which meets most of the requirements of computational stylistics.

The challenge may, however, have force in the area of predictive statistics, where the relationship between a specimen and a named population is in question and the notion of a random, representative sample is crucial. In answer, the author proposes the idea of specimens from a repertoire instead of the statistician's usual samples from a population, and looks forward to the establishing of a "grammar of probabilities" to replace the abstract postulate of randomness.

Keywords


computational stylistics, statistics, statistical analysis, randomness, probability, authorship, language, word-frequency

This work is licensed under a Creative Commons Attribution 3.0 License.