Patterns of Sentimentality in Victorian Novels


In this article, I discuss a project in which I use text mining and corpus linguistics to explore patterns of sentimentality in mid-Victorian novels. The strong and formulaic conventions of sentimentality as a genre enable me to investigate how text analysis, machine classification, and word cloud visualisations can reveal low-level patterns that correspond to the higher-level formations of sentimentality. Specifically, I use the Naïve Bayes classification algorithm and Dunning's Log Likelihood Ratio (through the MONK Project and WordHoard) to identify instances of sentimentality and isolate the words that are most salient in affective writing. I also use the study to think through issues raised when employing computational text analysis and distant reading to inform critical positions.




Computational text analysis, sentimentality, classification, quantitative analysis, MONK Project

How to Cite

Steger, S. (2013). Patterns of Sentimentality in Victorian Novels. Digital Studies/le Champ Numérique, 3(2). DOI:









How to "not read" affectively

Affective reading has, of late, become a critical focus in novel theory, but it is not exactly a new form of criticism. According to Nicholas Dames, Victorian novel theory, originating with G.H. Lewes, E.S. Dallas, and Alexander Bain, was driven by physiological investigation and literary analysis. As Dames explains it, the Victorian reader often thought of the novel as a mode of consumption, rather than as an object or as compositional practice (2004, 209). Whereas Post-Jamesian novel theory has focused on epistemology, these earlier theories emphasised a "poetics of affect," asking "how do we feel in fiction? in [sic] what ways, and with what rhythms, does the novel ask us to feel, to pay attention, to drift?" (2004, 209). Dames goes on to argue that nineteenth-century novel theory showed an "interest in order, or sequence, and the affective workings of various fictional sequences" (2004, 209). He ultimately reconsiders the novel as an engine for the production of affect, and he encourages his readers to conceptualise how physiological novel theory "suggests approaches to reading that might offer us ways to ask new questions of the novel’s role in culture" (2004, 215).

Scholars such as Richard Walsh, Robyn Warhol, and Suzanne Keen are employing affective reading to reshape the ways we think about sympathetic responses to reading novels. In his influential article, "Why We Wept for Little Nell: Character and Emotional Involvement" (1997), Richard Walsh argues that Nell’s death in Dickens’ The Old Curiosity Shop "exposes in an acute form the problem of emotional response to a fictional narrative," a response that proves problematic because it is not culturally-specific (1997, 308). Walsh suggests that we re-think our model of character to understand how our emotional involvement hinges upon "the recognition of values inherent in the discursive information given by a narrative, rather than in the actuality of the characters this information generates" (1997, 312). Taking a slightly different approach to emotional response, Robyn Warhol, in Having a Good Cry: Effeminate Feeling in Pop Culture Forms (2003), focuses specifically on the physical reactions that take place in a reader’s body during the act of reading and explores the narrative techniques that inspire affective response. In Empathy and the Novel (2007), Suzanne Keen likewise considers readers’ reactions to narrative but is specifically interested in social problem novels and the empathetic responses they inspire.

Affective reading has especially, and naturally, found proponents in those who study sentimentality. As critics rethink the notion of Victorian sentimentality as a naïve form of writing, they often also rethink and re-employ a criticism that has likewise been considered naïve: reader response (see, for example, Robert Solomon, Nicola Bown, and Emma Mason). The language critics use to describe reader response to sentimentality ranges from discussions of feeling and emotion, to descriptions of reading as a "somatic experience" (Warhol 2003, ix), to conceptualisations about the rather broadly-defined idea of "pathos," to considerations of the nuances between sympathy and empathy (see Keen 2007, 4-5). Correspondingly, reader response as an epistemology of affect has matured and expanded.

Yet Emma Mason claims that discussions of feeling and emotion are still considered "an outmoded and untheorised way of discussing something more complex and sophisticated" (2007, 1). The novelistic craft of evoking emotion has traditionally been looked upon with derision and even suspicion, most likely because of its associations with the derided mode of sentimentality. Rather than a physiological approach to reading, which was common in the nineteenth century, formalist modes of criticism emerged in the twentieth century as the method of the serious scholar. With them came a preference for critical distance from the text. Thus Percy Lubbock, in his seminal work The Craft of Fiction, advises, "so far from losing ourselves in the world of the novel, we must hold it away from us, see it in all its detachment, and use the whole of it to make the image we seek, the book itself" (1921, 10). Nicholas Dames interprets Lubbock as saying that "in essence, the job of the critic is not to read: to extract data from the novel that make up mental wholes, to avoid everywhere the temporal flow and affective identifications that infect novel-reading" (2004, 211). Formalist criticism involves holding the book away from us in order to see the whole. The type of criticism espoused by Lubbock in 1921 bears a very strong resemblance to the "new" idea of distant reading by means of textual analysis and visualisation. Given the increase in digitally-available texts and the rise of digital humanities, literary critics are turning to computers and algorithms to read literature in new ways. Franco Moretti posits that, with the influx of information available to scholars through technological advances, a "new critical method" becomes necessary: "distant reading" (2000, 55-56). He suggests that the scholar, instead of focusing on close reading, might concentrate instead on "units that are much smaller or much larger than the texts: devices, themes, tropes—or genres and systems" (2000, 57). In practice, distant reading for Moretti often involves reading statistical information about the texts (in his case, titles) rather than reading the texts themselves, a practice that John Unsworth has termed "not reading."

So, on the one hand, formalist reading or distant reading seems completely at odds with affective reading. On the other hand, David Pugmire implies that critical distance is especially crucial when the object of study is affective writing, for sentimentality is "blatant to those who are not in its thrall, while invisible to those that are" (2005, 125). In spite of this tension between a mode of reading that is distant enough for objective evaluation and a mode of reading that acknowledges and celebrates the ability of sentimentality to draw the reader in, the two are not necessarily mutually exclusive. I employ distant reading as a means of formalising affective responses inspired by sentimentality in mid-Victorian novels. While affective reading turns the focus to the reader and his or her response to the text, my approach begins with the text itself and considers sentimentality a mode for analysis. Instead of concentrating on the results of sentimentality, as do Keen, Walsh, and Warhol, I consider its origins: I exploit the critical distance that computer-assisted textual analysis provides to isolate patterns of Victorian sentimentality. In doing so, I’m invoking Stephen Ramsay’s concept of "algorithmic criticism," in which "we channel the heightened objectivity made possible by the machine" into productive, and ultimately subjective, critical work (2011, x).

I chose to work with sentimentality partially because of its long association with formulaic language and cliché. And what is cliché but a pattern? So sentimentality should carry its pattern close to the surface, ready to be gathered and studied, making it an ideal testbed in which to explore the effectiveness of techniques for automated classification and other forms of textual analysis.

Sentimental words

To begin, I needed to form a testbed of the novels that I would study. My focus is on eighty British novels (containing 3,921 chapters) published between 1838 and 1865, framed by Dickens’ Oliver Twist and Our Mutual Friend (the novels studied were those included in the Chadwyck-Healey Nineteenth Century Fiction collection). This focus arose for a couple of reasons: first, because Dickens is, in many ways, the forefather of Victorian sentimentality. Furthermore, these years, the "High Victorian" period, also mark the rise of the Victorian social problem novel, a genre that intersects with sentimentality. After narrowing the focus of my study to these novels and chapters, I created a training set that represented about 10% of the testbed: 409 chapters. Of these, I classified 186 as sentimental, and 223 as unsentimental (because of technical limitations in being able to select text "chunks," I selected the simplest division for novels: chapters).

The first, and easiest, means of beginning analysis is to look at vocabulary. Scholars of sentimentality have begun the work of singling out some of the unique markers of emotion in novels, and indeed argue that sentimental responses can be triggered by the use of a single word. For instance, Janet Todd argues that in general, "vocabulary in a sentimental work is conventional, repetitive, mannered and overcharged" and that "the words in which emotion is described and prescribed are themselves prescribed" (1986, 5). She argues "terms such as ‘benevolence,’ ‘virtue,’ ‘esteem,’ ‘delicacy,’ and ‘transport’ indicate sentimental doctrine and expect a sentimental understanding," while "a few adjectives such as the eulogistic ‘sweet,’ ‘grateful,’ and ‘delicate’" are frequently called upon, as are "the pejoratives, ‘cruel,’ and ‘base,’ and the negatives ‘unkind,’ ‘ungenerous,’ and ‘unfeeling’" (1986, 5). Likewise, Richard Walsh argues that "affective responses need not wait upon narrative: in fact they may be brought into play by a single word in isolation (consider the emotional freight of the word ‘murder,’ or the word ‘gentle’; or less obviously, the word ‘obviously’)" (1997, 312). In a similar vein, Erik Erämetsä’s work on eighteenth-century sensibility also focuses on a "sentimental vocabulary," where certain words "acquired emotional coloring" (1951, 9). Finally, Ben Zimmer’s article on how digital humanists are computing the "jargon" of the novel also focuses on how words trigger emotion: "Creative writers are clearly drawn to descriptive idioms that allow their characters to register emotional responses through telling bits of physical action" (2011). To examine the words that are used in sentimental chapters, I turned to WordHoard, a "philological tool" designed to precipitate "the close reading and scholarly analysis of deeply tagged texts" (2008), and used it to compare the vocabulary of the sentimental and unsentimental chapters by part of speech.

The top twenty adjectives for both the sentimental group of chapters and the unsentimental chapters look fairly similar; in both, "little" was followed by "old," and these are followed (in slightly differing orders) by "good," "only," "other," "like," and "great": the "usual suspects" list of adjectives. Mostly, the differences in ranking are five positions or fewer. When the difference in ranking was ten or more positions, I considered it a significant difference. The words that ranked significantly higher in sentimental chapters are shown in Table 1.
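The ten-position rule described above can be sketched in a few lines of Python. This is only an illustration: the ranks are copied from Tables 1 and 2, while the function name `rank_shifts` and the dictionary layout are assumptions of this sketch and not part of WordHoard.

```python
# Illustrative sketch: flag words whose rank differs by ten or more positions
# between the sentimental and unsentimental frequency lists.
# Ranks are copied from Tables 1 and 2; all names here are hypothetical.

THRESHOLD = 10  # "ten or more positions" counts as a significant difference

# word -> (rank in sentimental chapters, rank in unsentimental chapters)
adjective_ranks = {
    "dead": (22, 54),
    "happy": (24, 58),
    "wild": (25, 87),
    "certain": (41, 18),
    "small": (57, 23),
}


def rank_shifts(ranks, threshold=THRESHOLD):
    """Split words into those ranked notably higher in each group.

    A lower rank number means a more frequent word, so a word is
    "higher" in sentimental chapters when its sentimental rank is smaller.
    """
    higher_sentimental = {}
    higher_unsentimental = {}
    for word, (sent_rank, unsent_rank) in ranks.items():
        shift = unsent_rank - sent_rank  # positive: higher in sentimental
        if shift >= threshold:
            higher_sentimental[word] = shift
        elif -shift >= threshold:
            higher_unsentimental[word] = -shift
    return higher_sentimental, higher_unsentimental


sentimental, unsentimental = rank_shifts(adjective_ranks)
print(sentimental)    # {'dead': 32, 'happy': 34, 'wild': 62}
print(unsentimental)  # {'certain': 23, 'small': 34}
```

Because the comparison uses positions rather than counts, it says nothing about how large the underlying frequency gap is; it only shows that the ordering moved.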

Table 1: Adjectives Ranked Higher in Sentimental Chapters

Adjective   Sentimental Ranking   Unsentimental Ranking
Full        21                    31
Dead        22                    54
Happy       24                    58
Wild        25                    87
Low         31                    42
Strange     32                    48
Dark        36                    46
Deep        38                    62
Quiet       39                    52
Kind        40                    67

It appears as though sentimental authors attempt to invoke the mysterious in their scenes through their descriptions of things "strange," "dark," "deep," "quiet," "low," and "wild." Also, while we have an impression that sentimental scenes are sorrowful in nature, this list suggests to us that talking about and describing happiness is equally affective.

On the unsentimental side, Table 2 shows that the adjectives ranked higher have associations with actuality and surety, including "certain," "real," and "whole."

Table 2: Adjectives Ranked Higher in Unsentimental Chapters

Adjective   Unsentimental Ranking   Sentimental Ranking
Certain     18                      41
Small       23                      57
Half        24                      37
High        25                      48
Whole       29                      43
Real        30                      47
Large       33                      52
Best        34                      44
Short       37                      54
Pretty      39                      49
White       40                      60

The anomaly here is "small," which ranks fifty-seventh on the sentimental list; diminutiveness feels like it should belong on the sentimental side. "Small," however, shares an association with other words on the list that are quantitative: "half," "whole," "high," "large," and "short." All of these words point back to the general quality of concreteness in unsentimental chapters. Sentimentality, however, dwells more in the realm of the mysterious than in the measurable.

Moving from adjectives to nouns, Table 3 shows how words that have to do with relationships are ranked higher in sentimental chapters.

Table 3: Nouns Ranked Higher in Sentimental Chapters

Noun      Sentimental Ranking   Unsentimental Ranking
Mother    6                     36
Heart     9                     23
Child     10                    32
Boy       23                    34
Home      26                    42
God       27                    111
Arm       35                    56
Love      36                    65
Thought   39                    60

The first word that stands out is the word "mother," which is thirty positions higher than in the unsentimental ranking of most-frequently used nouns. The word reflects both the idea of motherhood and the personal relationship; both are sentimental. The next significant word, "child," also captures both a specific familial connection and a broader concept. The associations between childhood and sentimentality extend back to the Romantic period, with Wordsworth’s declarations that "trailing clouds of glory do we come from God" (1984, 64) and "Heaven is about us in our infancy" (1984, 66). In the Victorian period, the term "child" carried over Romantic connotations of innocence and piety that transformed the word into an emotive term. Speaking of piety, "God" also made the list; while ranked twenty-seventh on the sentimental side, He came in a rather low one hundred and eleventh on the unsentimental list. This makes perfect sense, given sentimentality’s association with religious rhetoric. The word "home" is also striking; what makes a house a home is absolutely sentimental in nature. In general, there is a trend in the sentimental list toward words that reflect intimate connections.

In contrast, the list of unsentimental rankings as shown in Table 4 represents an emphasis on more formal relationships.

Table 4: Nouns Ranked Higher in Unsentimental Chapters

Noun        Unsentimental Ranking   Sentimental Ranking
Lady        5                       25
Miss        10                      31
Friend      16                      28
Gentleman   18                      67
Woman       19                      30
People      28                      41
Morning     30                      42
Name        35                      46
Course      39                      71
Lord        40                      101

The frequency with which titles, such as "Lady," "Miss," "Gentleman," and "Lord," appear in the rankings suggests that unsentimental works emphasise hierarchy rather than community. The list also tends to describe general groups of people; instead of "mother" we get the more generalised stem word, "woman," and the even more distant "people." The exception, however, is "friend." While the difference is not huge between the lists (only a twelve-position difference in the rankings), it’s still somewhat perplexing why the word "friend," which we associate with a relationship that bears intense feelings, doesn’t follow the trend.

Finally, the verbs that are ranked higher in the sentimental chapters, shown in Table 5, reflect affect.

Table 5: Verbs Ranked Higher in Sentimental Chapters

Verb   Sentimental Ranking   Unsentimental Ranking
Cry    27                    37
Love   35                    80
Lie    39                    57
Die    45                    86
Bear   48                    63
Open   49                    67

"Cry" and "love" top the list of these feeling words, followed by words that I found in a separate study to correlate to deathbed scenes: "lie," "die," and "bear." In contrast, the word "open" doesn’t immediately call to mind associations with particular scenes of sentimentality. The context for the use of this verb in sentimental chapters often involves descriptions where somebody is opening his or her eyes, as in the scene from Elizabeth Gaskell’s Ruth where Ruth "almost hoped the swoon that hung around her might be Death, and in that imagination she opened her eyes to take a last look at her boy" (1999, 3.3). The word also hints at a larger theme of revelation and transformation in sentimental novels, which frequently document the figurative "opening" of characters’ hearts, eyes, and minds.

The verbs that rank higher in unsentimental chapters, presented in Table 6, include "want," "like," "talk," "mean," and "suppose."

Table 6: Verbs Ranked Higher in Unsentimental Chapters

Verb      Unsentimental Ranking   Sentimental Ranking
Want      36                      47
Like      39                      64
Talk      46                      63
Mean      47                      75
Suppose   49                      92

The fact that "suppose" ranks higher in unsentimental chapters suggests that the sentimental doesn’t seek a way out of uncertainties; it relies more on convictions than suppositions. Although the list of the top-rated adjectives in sentimentality suggests a link with the mysterious, this mystery leads to faith rather than doubt. What is striking about the rest of this list is just how common, how prosaic, these verbs are. "Love," for example, is ranked higher on the sentimental list, but the more common and less emotionally involved "like" is higher on the unsentimental list. Sentimental chapters, it seems, reach for more specialised and specific, not to mention extreme, verbs. Characters in sentimental chapters don’t perform the banal actions of talking or wanting. We imagine instead that they yearn, they utter, they cry, they wish, they desire.

If, indeed, sentimental authors do reach for specialised language to describe characters and their actions, this calls into question the true utility of these lists of the top n words in sentimental chapters and their rankings compared to unsentimental chapters. The more specialised language—the more nuanced words—would simply not appear at the top of the lists. These rarer words, however, are precisely the words that distinguish affective writing. The sentimental can be located in the difference between "she said" and "she murmured." So, while these rankings begin to provide a picture of sentimentality, we must look to different technologies to really define its shape.

Dunning’s Log Likelihood Ratio provides users a way to measure the significance of rare events in texts. Instead of using raw counts of words to compare two sets of texts, the ratio also takes into account the significance of a particular word in one text set (given the total number of words in that set) compared to the significance in another. Essentially, the log-likelihood ratio’s results are derived from the number of times a word actually occurs compared to the number of times it would be expected to occur if the two text sets being compared were homogeneous. Larger log likelihood values reflect larger discrepancies, indicating that a word is statistically significant in the training set. On the flip side, negative log likelihood values indicate that a word is under-utilised in the set. The results, as shown in Tables 7 and 8, bring outliers to the forefront and highlight the importance of many of the high-frequency words previously discussed.
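The observed-versus-expected comparison described above can be sketched as follows. This is a minimal illustration of the two-term form of the log-likelihood statistic commonly used for corpus comparison; whether MONK and WordHoard compute it in exactly this way is an assumption here. The function name `signed_log_likelihood`, the sign convention (negative when the word is under-used in the first set), and the counts in the example are all hypothetical.

```python
import math


def signed_log_likelihood(count_a, total_a, count_b, total_b):
    """Log-likelihood (G2) for one word, signed by direction of use.

    count_a, total_a: occurrences of the word and total words in the
                      analysis set (e.g. the sentimental chapters).
    count_b, total_b: the same figures for the reference set.
    A positive result means the word is relatively more frequent in the
    analysis set; a negative result means it is relatively less frequent.
    """
    # Expected counts if both sets used the word at the same overall rate.
    combined_rate = (count_a + count_b) / (total_a + total_b)
    expected_a = total_a * combined_rate
    expected_b = total_b * combined_rate

    def term(observed, expected):
        # A zero count contributes nothing to the sum.
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    g2 = 2.0 * (term(count_a, expected_a) + term(count_b, expected_b))
    over_used = count_a / total_a >= count_b / total_b
    return g2 if over_used else -g2


# Hypothetical counts: a word seen 30 times in 10,000 words of one set
# and 10 times in 10,000 words of the other.
print(round(signed_log_likelihood(30, 10_000, 10, 10_000), 2))  # 10.46
print(round(signed_log_likelihood(10, 10_000, 30, 10_000), 2))  # -10.46
```

When the two sets use a word at the same rate, observed and expected counts coincide and the value is zero, which is the "homogeneous" baseline the paragraph describes.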

Table 7: Words Over-Represented in Sentimental Chapters

Word     Log Likelihood Value
she      1773
mother   377
child    310
heart    243
love     210
tear     158
sorrow   150
come     146
doctor   142
face     134
thou     131
die      127
poor     114
pain     112
love     105
sob      103
can      100
sin      97
bed      95
happy    88
home     88
nurse    87
lie      86

Table 8: Words Under-Represented in Sentimental Chapters

Word         Log Likelihood Value
mr.          -728
duke         -206
gentleman    -204
lady         -148
bishop       -105
archdeacon   -98
hound        -97
brass        -91
admiral      -84
parliament   -81
party        -73
man          -71
island       -69
mrs.         -66
beadle       -65
member       -64
certain      -64
miss         -64
dog          -63
lord         -60
fox          -60
peer         -59
play         -58
general      -58

The first two words on the list of over-represented words are ones that would normally be thrown out as stopwords: "she" and "I." Stopwords are words that are so commonly used (articles, pronouns and conjunctions) that they are often filtered out in pre-processing for text analysis in order to improve performance or to bring attention to the rarer words in a text. (For a more detailed discussion, see "Basic Facts about Common and Rare Words.") And, given the emphasis above on how the log likelihood ratio helps uncover the "hidden gems" of a corpus, its rarer words, it might seem counter-intuitive to begin with a discussion of these two words as markers of sentimentality. Nonetheless, the fact that "she" and "I" top the lists of over-represented words in sentimental chapters proves significant.
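To make the pre-processing step concrete, the following sketch counts words with and without a stopword filter. The stopword list is a tiny hand-picked set chosen for illustration, and the function name `word_counts` is hypothetical; neither reflects the lists or routines used by MONK or WordHoard. The sample phrase is taken from the Gaskell passage quoted earlier.

```python
from collections import Counter

# A tiny hand-picked stopword list, for illustration only;
# lists used in practice are much longer.
STOPWORDS = {"a", "and", "at", "her", "i", "of", "she", "the", "to"}


def word_counts(text, drop_stopwords):
    """Count lower-cased words, optionally discarding stopwords."""
    tokens = [t.strip('.,;:!?"\'').lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if drop_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return Counter(tokens)


sample = "she opened her eyes to take a last look at her boy"

full = word_counts(sample, drop_stopwords=False)
filtered = word_counts(sample, drop_stopwords=True)

print(full["she"], full["her"])  # 1 2
print(sorted(filtered))          # ['boy', 'eyes', 'last', 'look', 'opened', 'take']
```

With the filter on, the pronouns vanish from the counts entirely, so a comparison run on the filtered output could never surface "she" as a marker.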

"She" not only tops the list, it dominates it, with a log likelihood value almost five times as large as the next word down. This suggests that sentimentality is, indeed, concerned—even overly concerned—with the feminine, and the statistical measurement can be read as an indicator of just how preoccupied affective writing is with the female gender. Even in the midst of a society where "he" is the standard pronoun, in what Wendy Martyna would later describe as the prevalence of "He-Man" grammar, we find chapters that are overflowing with "she," chapters that insist upon recording female experience. The over-use of "she" is, in fact, so significant that it should make us question the ways we categorise sentimentality as normalising. In fact, sentimentality defies convention by employing the female pronoun so frequently.

Like "she," the second word on the over-represented list, "I," would also be ignored. However, the fact that "I" is used in sentimental chapters even more than is commonly expected in other chapters suggests that sentimentality is unusually concerned with the personal. The sentimental, in the attempt to evoke feeling, must draw in the reader. One way that it does so is through the use of first-person narration. Robyn Warhol, for example, describes how Gaskell "frequently signals a precise identification of her narrative ‘I’ with the actual author and her narrative ‘you’ with the actual reader," in an attempt to "collapse the intra- and the extra-diegetic, to bring together the worlds within and outside the fiction" (2003, 48). And, in fact, the data backs Warhol up: in addition to "I," "you" appears as one of the over-represented words (thirty-eighth on the list). This use of "you" in a direct address to the reader also points to the ties between sentimentality and affect; the texts demand a response from the reader.

Thus stopwords, words that we think we should ignore, sometimes point to characteristics of texts that are highly significant. In fact, because we tend to overlook these words, including them in the analysis can help us recognise aspects of the text that we have internalised, such as the fact that sentimental texts are personal or that emotion is associated with femininity. By highlighting the degree to which "she" and "I" are over-represented, the analysis occasions a second look at how these words function in sentimental texts, so that what was once considered a banal observation is transformed into something much more evocative. Indeed, Martha Nell Smith describes a similar moment in her computational analysis of Emily Dickinson’s letters when she discovered that the word "mine" was a marker of eroticism: "the data mining has made us . . . plumb much more deeply into little four- and five-letter words, the function of which I thought I was already sure, and has also enabled me to expand and deepen some critical connections I've been making for the last 20 years" (Plaisant et al. 2006, 7-8). Smith’s study also serves as a model for how to negotiate between objective data derived from computational analysis and the subjective criticism and interpretation of a scholar.

Moreover, stopwords prove useful in stylometrics. In fact, John Burrows, in his work on authorship attribution, "concentrates exclusively on the form of computational stylistics in which all the most common words (whatever they may be) of a large set of texts are subjected to appropriate kinds of statistical analysis" (2004). He argues that these frequently-used words "constitute the underlying fabric of a text, a barely visible web that gives shape to whatever is being said" (2004). The words that we tend to ignore, it seems, often contribute most to the "style" of a text. Likewise, in a study of gender categorisation that employs techniques from both the stylometrics and text categorisation communities, Moshe Koppel, Shlomo Argamon, and Anat Rachel Shimoni reported that function words contribute most to categorisation (2002, 407). They identified the frequent use of the stopwords "as," "the," and "a" as characteristic features of fictional texts written by male authors, while the words "she," "for," "with," and "not" were identified as female features for categorisation. Thus, the over-abundance of "she" in sentimental texts perhaps speaks not only to a focus on women characters and stories, but also may point to a characteristic of writing style that is identified as feminine. As I will discuss below, machine classification suggests that sentimental marker words are also used more often by women authors, who in turn are more likely to write chapters that are classified as sentimental.

To return to the results from the Dunning’s log likelihood analysis, we find that, moving past the stopwords, many of the words that are over-represented in sentimentality are precisely the words that top the lists of most-frequently used words in sentimentality altogether. "Mother," for example, which was ranked sixth on the list of nouns most frequently used in sentimental chapters, tops the list of over-represented words once stopwords are removed. Likewise with words such as "child," "heart," "love," and "cry." These words appear so often that they are over-represented in sentimental chapters.

Some words, however, are not necessarily used frequently in sentimental chapters, but they are nonetheless over-represented when compared to unsentimental chapters. These are the rarer words, the words that are more distinctive markers of sentimentality. "Sorrow," for example, is only the 100th-most frequently used noun in sentimental chapters, yet it has a high log likelihood value of 150. "Sorrow" is a more specialised marker of emotion; it goes beyond common sadness and carries connotations of loss that make it particularly suited for use in sentimental chapters, where the narrative often hinges on the tension between the lost and the found. The word "thou," likewise, stands out as an instance of more specialised language. With a log likelihood value of 131, it occurred twice as often in the sentimental chapters as in the unsentimental chapters. "Thou" stands as a marker of the elevated style, and of the intimacy, employed in sentimentality that echoes the elevated emotional impact of the passage. "Pain" also stands as a marker for sentimentality. It earned a relatively high log likelihood value of 112, but was placed just before the word "sorrow" as the ninety-ninth-most-frequently used noun in sentimental chapters. Yet "pain" is used 265 times in the sentimental chapters and only sixty-nine times in unsentimental chapters. The word "pain" evokes associations between emotion and corporeality. In using the word, authors foreground the material response to sentimentality by insisting that emotions are felt.

Figure 1 displays the full results of the Dunning’s analysis in a wordcloud that renders graphically the words that are most salient to sentimentality—those over-represented. The visualisation shows the Dunning results of comparing the training set of sentimental chapters to the testbed that represents the benchmark of average novelistic discourse. It includes all the stop words and demonstrates exactly how strikingly over-represented the word "she" is, as it dominates the visual field.

Figure 1: Visualisation of Words Over-Represented in Sentimental Texts (Including Stopwords)

In Figure 2, which also represents a comparison of the training set to the testbed, I removed the stopwords. In the visualisation, we can see that words involving relationships (mother, child, father) and emotion words (love, heart, sob, tear) really stand out. There are some interesting tensions in the visualisation: between pain and comfort, life and death, happiness and agony, hope and despair. The sentimental, we see, is played out in extremes. The visualisation also features several pairs of synonymous words or words with alternate spellings, especially the familial words: "baby" and "babby," "papa" and "father," "mother" and "mama." Here we see a tension between the desire for the formal, elevated style that befits the "high" drama of a sentimental moment, and the inclination to want to capture intimacy through less formal and more colloquial language.

Figure 2: Visualisation of Words Over-Represented in Sentimental Texts


One of the most powerful tools that the Dunning’s algorithm offers is the ability to see the words that authors are not using in the sentimental chapters. To this end, I input the words with negative log likelihood values into Wordle visualisations calibrated to the words’ lack of frequency to capture an image of what sentimentality does not look like. Figure 3 includes stopwords and shows the words that were statistically used more often in the testbed than in the training set of sentimental chapters. Figure 4 eliminates stopwords and also shows the words that are under-represented in the sentimental training set. The visualisations echo some of the trends we saw in the lexicons; again we see that titles are not used as often in sentimental chapters, which prefer more intimate relationships. "Mr.," in fact, dominates the visualisation, just as "she" dominated in the over-representation. Tellingly, the most prominent titles are masculine: "Mr.," "gentleman," "duke," "admiral," "sir," and "archdeacon," although "lady," "Miss," and "Mrs." also appear. Words having to do with hunting also are prominent: "dog," "hound," "horse," and "fox." Unsurprisingly, words having to do with business and politics are under-represented in sentimental chapters, and these are sprinkled across the two visualisations.
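One plausible way to calibrate a word cloud to negative values is to size each word by the magnitude of its score, so that the most under-represented word is drawn largest. The sketch below does this for a handful of values copied from Table 8. The linear scaling, the font-size range, and the function name `cloud_weights` are assumptions made for illustration; the article does not specify how the Wordle input was actually prepared.

```python
# Values copied from Table 8 (a subset of the under-represented words).
under_represented = {
    "mr.": -728,
    "duke": -206,
    "gentleman": -204,
    "lady": -148,
    "bishop": -105,
}


def cloud_weights(scores, min_size=10, max_size=72):
    """Map negative scores to font sizes: the more negative, the larger.

    Uses simple linear scaling between min_size and max_size
    (an assumption of this sketch, not a documented Wordle setting).
    """
    magnitudes = {word: abs(s) for word, s in scores.items() if s < 0}
    largest = max(magnitudes.values())
    smallest = min(magnitudes.values())
    span = (largest - smallest) or 1  # avoid dividing by zero
    return {
        word: round(min_size + (m - smallest) / span * (max_size - min_size))
        for word, m in magnitudes.items()
    }


weights = cloud_weights(under_represented)
for word, size in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{word}\t{size}")
```

Linear scaling lets "mr." dwarf everything else, much as the text describes; a logarithmic scale would be a reasonable alternative if the smaller words need to stay legible.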

Figure 3: Visualisation of Words Under-Represented in Sentimental Texts (Including Stopwords)

Figure 4: Visualisation of Words Under-Represented in Sentimental Texts

If we were to compare the word clouds showing the over- and under-represented words in sentimentality, it would be easy to distinguish which one might represent the domestic sphere, indicated by words related to nursing and the home, and which might represent the masculine world filled with clubs, parties, hunting, cigars, parliament, boots, peers, and positions. While some of the titles in the under-represented word cloud, such as "archdeacon," "bishop," "curate," and "beadle" might at first suggest that religion lacks a place in sentimental texts, these under-represented words have more to do with the external organisation, as opposed to the internal experience, of religion. In contrast, in the words that are over-represented in sentimentality, the language carries more spiritual connotations: "prayer," "heaven," and "soul." Altogether, the words that are not used as often, the "negatives," serve as a sort of shadow to the "positives," giving dimension to the themes and patterns that stood out in the visualisation of the words that are salient to sentimentality.

Indeed, the word cloud visualisations provide something a chart of words only hints at: "pictures" of the texts. The modified Wordles are strikingly visual representations of "distant reading," and the graphics exemplify the shift from analysis to synthesis that Moretti espouses: Marc Bloch’s "years of analysis for a day of synthesis" (2000, 56-57). As Martin Mueller describes it, "you look at a work, the oeuvre of an author, or an entire canon in a wider context and you see (quite literally) what you can learn from very reductive models. Or if ‘reductive model’ sounds like a bad thing, think of ‘abstract,’ except that these word clouds—like other clouds only visible from a distance—are oddly concrete abstracts" (2008). The word cloud takes the results from the analysis (all the data, the counts of words and the results of the algorithms) and simplifies this data so that the "story" of sentimentality can be read at a glance.

Machine-reading sentimentality

Understanding the low-level characteristics of the sentimental chapters through lexicons, visualisations, and the use of Dunning’s algorithm provides a base upon which to broaden the scope to consider some of the larger patterns and trends of sentimentality. Based on my training set of sentimental and unsentimental chapters, I used the MONK Project’s application of the Naïve Bayes classification algorithm to predict, through supervised learning, which of the remaining chapters in the testbed could be labelled sentimental and which could be labelled unsentimental. This can be understood as a "more like these" approach, where a user educates the algorithm by means of a training set and asks the system to classify additional texts based upon the common features of the user-classified texts. The result is simply a list of all the chapters from the testbed with a classification of "sentimental" or "unsentimental." Of the 3,512 chapters that comprised the testbed set (the total 3,921 chapters, minus 409 chapters used for training), 943 chapters were classified "sentimental" by the system.
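The "more like these" approach can be sketched with a toy multinomial Naïve Bayes classifier. This is a generic textbook formulation with add-one smoothing, not MONK's implementation; the function names and the four miniature "chapters" are invented for illustration.

```python
import math
from collections import Counter

def train(labelled_docs):
    """labelled_docs: list of (tokens, label) pairs from a training set."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for tokens, label in labelled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def classify(tokens, model):
    """Return the label with the highest log posterior probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # class prior
        label_total = sum(word_counts[label].values())
        for token in tokens:
            if token in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[label][token] + 1) / (label_total + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented miniature training set; real features would come from whole chapters.
training = [
    ("she sob and tear fell".split(), "sentimental"),
    ("mother soothe the child with love".split(), "sentimental"),
    ("mr fox rode the horse".split(), "unsentimental"),
    ("the duke spoke in parliament".split(), "unsentimental"),
]
model = train(training)
print(classify("the mother sob".split(), model))
print(classify("the duke rode".split(), model))
```

The classifier only ever sees word counts, which is why its decisions can later be read back as a vocabulary of sentimentality.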

Some of the results of this classification experiment fit my expectations. For example, every single chapter from Dickens’ A Christmas Carol was ranked sentimental, which came as no surprise given the fact that the entire story revolves around a sentimental framework detailing the evolution of feeling in Scrooge. Dickens’ works as a whole were, unsurprisingly, well-represented in the sentimental chapters. Mrs. Henry Wood’s extremely sentimental novel East Lynne was also considerably over-represented in the sentimental chapters, with forty-one chapters classified as sentimental by the algorithm compared to the fourteen that were ranked unsentimental. On the other hand, R.M. Ballantyne’s rollicking adventure tale, The Coral Island, is completely unrepresented in the sentimental list of chapters, which seems natural given that the novel’s intended audience consisted primarily of young boys.

Conversely, a couple of surprises emerged from the classification. First, the system classified only two of the thirty-four chapters in Emily Brontë’s Wuthering Heights as sentimental. Emily was, by far, the least represented of the Brontë sisters in the sentimental lists. That, in and of itself, is not so surprising given the different styles of the sisters, but the consistency with which Wuthering Heights’ chapters were classified as unsentimental at first took me aback. Upon reconsideration, however, the fact that the features of Emily Brontë’s novel associate it with unsentimental rather than sentimental texts makes sense considering the ways that sentimentality is particularly associated with feminine writing. The classification seems to corroborate criticism regarding Emily Brontë’s masculine style, namely that she is, as Anne Mellor claims, "a literary cross-dresser" (1993, 186). Yet, lest this assertion appear too simplistic, let me be clear that I think there is a danger in associating distant reading with a type of surface-reading. It is critically irresponsible to use the results of distant reading to assign seemingly permanent labels to texts or to affirm that there are incontrovertible answers to literary questions. As Stephen Ramsay insists, the presence of a computer doesn’t provide any "truth value beyond what is already stipulated by any critical act" (2011, 80). I do not propose, then, that the results of the classification show that Mellor is “right” in her assertion, but I do think that they echo one critical perspective about how we can conceptualise Emily Brontë’s unique style. What at first seemed an unexpected result thus prompted me to reconsider a critical position with which I was already familiar.

The experiment also led to discoveries that were completely new. For example, I was very surprised to find how often chapters from Elizabeth Rundle Charles’ The Chronicles of the Schönberg-Cotta Family were classified as sentimental. Only two of the thirty-three chapters from the novel were classified as unsentimental (in this respect, it is almost the foil for Wuthering Heights). Charles’ novel is one with which I had not been familiar, but the results from the experiment highlighted it as a work that warrants a second look. Since there are, as Franco Moretti reminds us, so many thousands of nineteenth-century British novels out there that no one could possibly read them all (2000, 55), having an algorithm to serve as a filter is particularly useful. Beyond the utility of simply pointing me toward particular instances of sentimentality that I had not discovered, the machine classification also speaks to the prevalence of sentimentality in the period. When I first got the results, before I even started filtering through them, I was astonished by the sheer numbers. The 943 sentimental chapters comprise twenty-seven percent of the total testbed. Even if you only consider the 599 classifications that received perfectly sentimental scores from the algorithm, the chapters that the computer didn’t consider “grey” in terms of classification, the sentimental chapters comprised fifteen percent of the testbed. This seemed high to me at first, until I started going through the results and realised that some novels, including Mrs. Henry Wood’s East Lynne, George Eliot’s Romola, Gaskell’s North and South, Charlotte Yonge’s The Daisy Chain, and the aforementioned Chronicles of the Schönberg-Cotta Family, each had over thirty chapters classified as sentimental.

The results from the classification also provided a means by which to test my hypothesis that sentimental texts work to build emotional response by dispersing sentimental chapters among unsentimental chapters to produce a wave-like effect: a sort of seismograph of emotion. The dispersal of affect across a novel bears similarities to, and seems to be inspired by, the development of the cliffhanger in serial novels. The serialisation of novels reached an unprecedented popularity beginning with Dickens’ Pickwick Papers in 1836, just prior to the first year, 1838, included in the testbed for this study. The serial novel is associated with a heightening of tension at the end of one installment and the resolution (or partial resolution) of the conflict at the beginning of the next installment. While the term "cliffhanger" actually originates from a Thomas Hardy novel published in the late nineteenth century (in the 1873 A Pair of Blue Eyes, Hardy’s protagonist is literally left dangling from a cliff), skilled serial writers of the mid-century, especially Dickens, certainly manipulated the emotional involvement of the reader by means of narrative suspense and heightened emotional content at the end of an episode (for discussion of Dickens and serialisation, see Ellen Casey [1981] and Cynthia Whissell [2006]). Accordingly, we would expect to see that chapters of high sentimentality alternate with periods characterised by their lack of engagement with affect. We would also expect to see more sentimental chapters toward the beginnings and endings of novels to reflect both the initial affective "hook" and the final sentimental resolution.

In fact, the machine classifications of several serialised novels from the period seem to confirm this general pattern. Because Dickens’ "formula" is such a natural focus of critical work on Victorian serialisation, I began by examining the distribution of sentimentality in his works. David Copperfield contains sixty-four chapters published in twenty installments. The first three chapters, which were published together as the first installment of the novel in May 1849, were each classified as sentimental. In these chapters, Dickens introduces his characters and cultivates the readers’ emotional investment. Sentimental chapters are then sprinkled throughout the novel, with about one third (twenty-one) of the total number of chapters carrying a sentimental classification. In terms of the distribution across the novel, nine of the sentimental chapters appear in the first half of the novel, and the remaining thirteen are located in the latter half. Another of Dickens’ serialised novels, Dombey and Son, proves even more sentimental, in terms of the number of chapters classified as such, than David Copperfield. A full half of the novel’s sixty-two chapters bear the sentimental classification, and these tended to cluster more heavily toward the end of the novel. Only eleven of the thirty-two sentimental chapters can be found in the first half of the novel. In contrast, seven sentimental chapters turned up in the final four installments (the last ten chapters) of the novel. In Dombey and Son, at least, the sentimental moments become more numerous toward the end of the novel. Dickens built upon the readers’ empathy and investment in the characters, heightening his use of pathos as the novel reached its conclusion.

The general trend of “bookending” a work with sentimental chapters while distributing other moments of affect throughout the novel occurs in the works of other serial novelists of the period as well. In North and South, for example, Elizabeth Gaskell begins with a strong emotional hook; the first six chapters are sentimental. In William Makepeace Thackeray’s Vanity Fair, a novel that contains a scant six sentimental chapters, these cluster heavily toward the end of the novel. In Wilkie Collins’ No Name, the only paired set of sentimental chapters can be found in the final installment of the novel. In all of these works, the sentimental chapters do tend to be distributed across the works rather than grouped together in a sentimental section, but they build in frequency toward the end of the novels. Thus, the “seismograph of emotion” shows a steady increase in the use of sentimental moments as the novel proceeds and the plot builds.
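The kind of tally used in these comparisons (sentimental chapters per half of a novel, and runs of consecutive sentimental chapters) can be sketched as follows. The chapter labels in the example are invented and do not correspond to any novel in the study; the function names are illustrative.

```python
from itertools import groupby

def half_counts(flags):
    """flags: one boolean per chapter, in reading order (True = sentimental).

    Returns the number of sentimental chapters in the first and second half.
    """
    midpoint = len(flags) // 2
    return sum(flags[:midpoint]), sum(flags[midpoint:])

def sentimental_runs(flags):
    """Return (first_chapter, length) for each run of consecutive
    sentimental chapters, with chapters numbered from 1."""
    runs = []
    chapter = 1
    for is_sentimental, group in groupby(flags):
        length = len(list(group))
        if is_sentimental:
            runs.append((chapter, length))
        chapter += length
    return runs

# Invented twelve-chapter example, not data from the study.
flags = [True, True, False, False, True, False,
         False, True, False, True, True, True]
print(half_counts(flags))
print(sentimental_runs(flags))
```

In this toy sequence the opening pair and the closing run of three play the role of the "bookends," with isolated sentimental chapters dispersed between them.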

While the MONK Project enabled me to consider how sentimentality was distributed across a text, the results also led me to consider the distribution of sentimentality as a whole across the testbed. In the testbed of 3,512 chapters, 868 had female authors. Therefore, just over seventy-five percent of the chapters from the testbed were written by male authors. Yet, of the 943 machine-classified sentimental chapters, 507 were written by women. Even with the difference in representation in the testbed, chapters written by women authors comprise around fifty-four percent of the total chapters that the machine classified as sentimental. These results take us back to Koppel, Argamon, and Shimoni’s study of authors and gender attribution where marker words, like “she,” are used more often by women authors. The classification study suggests that women authors do employ sentimental markers (“she” included) more often. The sentimental style is linked not only to the recording of female experience, but to works written by women authors.
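The proportions in this comparison can be checked with a few lines of arithmetic. The sketch uses only the four counts reported above; the per-group rates it derives (the share of women's and of men's chapters classified as sentimental) are not figures reported in the study, and the variable names are illustrative.

```python
# Counts as reported in the text.
testbed_chapters = 3512
female_chapters = 868
sentimental_chapters = 943
sentimental_female = 507

male_chapters = testbed_chapters - female_chapters
sentimental_male = sentimental_chapters - sentimental_female

# Share of the testbed, and of the sentimental chapters, written by women.
female_share_testbed = female_chapters / testbed_chapters
female_share_sentimental = sentimental_female / sentimental_chapters

# Derived here, not reported in the article: the rate at which each
# group's chapters were classified as sentimental.
rate_female = sentimental_female / female_chapters
rate_male = sentimental_male / male_chapters

for name, value in [
    ("female share of testbed", female_share_testbed),
    ("female share of sentimental chapters", female_share_sentimental),
    ("sentimental rate, female-authored", rate_female),
    ("sentimental rate, male-authored", rate_male),
]:
    print(f"{name}: {value:.1%}")
```

On these counts, women wrote roughly a quarter of the testbed but over half of the sentimental chapters, which works out to a chapter by a woman being classified as sentimental several times as often as a chapter by a man.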

Because the machine-classification itself uses vocabulary as the feature by which it determines classification, it is useful to turn again to consider sentimental words. This time, however, the significant words derive from the computer’s rules rather than from readerly heuristics. Based on the training set, the computer learns which words are indicators of sentimentality.  It then creates a learning routine, a decision tree, to determine whether a new chapter should be classified as sentimental (see Figures 5 and 6).

Figure 5: Decision Tree

Figure 6: Decision Tree (Cont.)

The first thing the learning routine looks for is the word “sob.”  If a chapter has the word “sob,” it is likely to be sentimental.  If it doesn’t have “sob,” it then looks for “soothe,” and if it’s there, it’s likely to be sentimental.  If it doesn’t have “soothe,” but it has the word “yearning,” it is likely to be sentimental, and so on.  When we get to “droop,” there’s a split in the tree.  If the chapter has the word “droop” AND the word “ardent,” it’s likely to be sentimental.  If it doesn’t have the word “droop,” the system looks for the word “affliction” and then continues on from there, on down the leaves and nodes of the tree.  
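The walk through the tree can be sketched as nested conditionals. The sketch encodes only the nodes named in the paragraph above; the nodes the description passes over with "and so on," and the outcomes of branches it does not spell out, are left as explicit placeholders rather than guessed. The function name and return strings are illustrative.

```python
def classify_chapter(words):
    """Follow the described path through the decision tree.

    words: an iterable of the word forms that occur in a chapter.
    Only the branches spelled out in the prose are implemented.
    """
    present = set(words)

    # The first three tests each lead straight to a sentimental leaf.
    for marker in ("sob", "soothe", "yearning"):
        if marker in present:
            return "likely sentimental"

    # The description skips the nodes between "yearning" and "droop";
    # they are omitted here rather than invented.
    if "droop" in present:
        if "ardent" in present:
            return "likely sentimental"
        return "branch not described in the text"

    # Without "droop", the tree goes on to test "affliction" and
    # continues from there; later nodes are not reproduced.
    return "continue at the 'affliction' node"

print(classify_chapter(["she", "sob", "bitterly"]))
print(classify_chapter(["droop", "ardent", "gaze"]))
print(classify_chapter(["mr", "fox", "parliament"]))
```

Reading the rules this way makes clear that each test is a simple presence check: a single occurrence of a marker word is enough to send a chapter down a branch.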

The tree thus provides a path through the words that authors use in sentimental scenes.  The words on the decision tree, including “fever,” “emotion,” “droop,” “affliction,” “steadfast,” “happiness,” “embitter,” “service,” “Jesus,” “confession,” and “together,” are, in fact, words that often evoke or reflect emotion.  These “highly charged” words are ones we should particularly pay attention to as markers of sentimentality.

Textual scholarship by the numbers

Ultimately, general trends and patterns of sentimentality do rise to the surface in the telescopic view of sentimentality. For example, textual analysis highlighted how sentimentality emphasises embodiment and human interaction and how sentimentality is rooted in and concerned with the feminine sphere. Yet in my overall attempt to formalise it, I am not trying to suggest that sentimentality can be reduced to some write-by-numbers exercise or that the “question” of sentimentality is answerable through classification algorithms. Rather, I would like to end by suggesting that computational text analysis is less a methodology than it is a form of criticism that can be used to inspire interpretations based on the new perspective that digitally-enhanced reading provides.

There may not be a write-by-numbers formula for sentimentality, but using numbers as a means of thinking about sentimentality can nonetheless result in new critical and hermeneutical questions. We need to think less in terms of the incontrovertible results of statistical analyses and more in terms of the ways that analytical routines afford critics a place to commence, extend, and refocus their readings. As Stephen Ramsay suggests, we need to "create tools . . . that enable critical engagement, interpretation, conversation and contemplation" (2011, x). Truly provocative text mining projects are less concerned with proving that two plus two indeed equals four than they are with highlighting the textual moments where the numbers simply don’t add up.

This is not to say that there is no inherent value in confirmation and validation. Statistical analysis can, in fact, "confirm" something we already know about a text, such as the "fact" that sentimental texts frequently use words that describe emotion and tears. The moments where the numbers reinforce our understanding of the ways sentimental texts work assist us in painting a more complete picture of sentimentality as a whole. Furthermore, by transforming textual information into numeric data, we gain the ability to quantify extant trends and patterns in the texts. For example, we undoubtedly understood that gender plays a role in sentimentality even before the texts were manipulated in any way. Yet it is one thing to speculate and argue that sentimentality is rooted in the feminine, and it is another to point to a number that highlights the extent to which the pronoun “she” is over-represented in the sentimental corpus.

Or is it? Stephen Ramsay asks a similar question regarding employing machines to detect a pattern, and concludes: "it is the same thing at a different scale and with expanded powers of observation" (2011, 17). There is an inherent danger in privileging the quantifiable over the interpretive. When I have given talks that present my research, I’ve noticed that the audience tends to squirm a bit when I get to the slides that contain numbers. Lev Manovich has also noted this "nervous" tendency when he lectures on large-scale research on cultural artifacts (2012, 467-468). I don’t think this discomfort is rooted in some sort of innate and collective arithmophobia among humanities scholars; rather it derives from a tradition that equates numbers with the quest for proof. Proof, after all, is antithetical to humanistic endeavour. As soon as I switch to the slides of the manipulated word clouds, the comfort level in the room rises once again. Even though the word clouds present virtually the same information as the charts containing log-likelihood values, we somehow find it easier to view them as entities open to interpretation—as artistic representations. But no matter the format the data is presented in, the data doesn’t denote the end of analysis. As Manovich posits, "Ideally, we want to combine the human ability to understand and interpret—which computers can’t completely match yet—and the computer’s ability to analyse massive data sets using algorithms we create" (2012, 469). My project both begins and ends with human interpretation: from creating the training sets and testbeds to interpreting the results of the machine classifications. It represents an attempt to negotiate the relationship between close reading and distant reading, between algorithm and interpretation, and between objective data and affective response.

Works Cited

Bown, Nicola. 2007. “Introduction: Crying Over Little Nell.” Interdisciplinary Studies in the Long Nineteenth Century 4.

Burrows, John. 2004. "Textual Analysis." A Companion to Digital Humanities. Ed. Susan Schreibman, Ray Siemens, John Unsworth. Oxford: Blackwell.

Casey, Ellen. 1981. "‘That Specially Trying Mode of Publication’: Dickens as Editor of the Weekly Serial." Victorian Periodicals Review. 14.3: 93-101.

Dames, Nicholas. 2004. "Wave-Theories and Affective Physiologies: The Cognitive Strain in Victorian Novel Theories." Victorian Studies. 46.2: 206-216.

Erämetsä, Erik. 1951. "A Study of the Word ‘Sentimental’ and of other Linguistic Characteristics of Eighteenth Century Sentimentalism in England." Suomalaisen Tiedeakatemian Toimituksia: Annales Academiæ Scientiarum Fennicæ 74, Series B. Helsinki: 1-168.

Gaskell, Elizabeth. 1999. Ruth. Chapman and Hall: London, 1853. Chadwyck-Healey Ltd.

Keen, Suzanne. 2007. Empathy and the Novel. Oxford: Oxford UP.

Koppel, Moshe, Shlomo Argamon, and Anat Rachel Shimoni. 2002. "Automatically Categorizing Written Texts by Author Gender." Literary and Linguistic Computing. 17.4: 401-412.

Lubbock, Percy. 1921. The Craft of Fiction. London: Jonathan Cape. Project Gutenberg. E-text by David Clarke and Sankar Viswanathan.

Manovich, Lev. 2012. "Trending: The Promises and the Challenges of Big Social Data." Debates in the Digital Humanities. Ed. Matthew K. Gold. Minneapolis: University of Minnesota Press.

Martyna, Wendy. 1978. "What Does ‘He’ Mean? Use of the Generic Masculine." Journal of Communication: 131-8.

Mason, Emma. 2007. "Feeling Dickensian Feeling." Interdisciplinary Studies in the Long Nineteenth Century. 4.

Mellor, Anne. 1993. Romanticism and Gender. New York: Routledge.

MONK Project. 2007. Monk Project.

Moretti, Franco. 2000. "Conjectures on World Literature." New Left Review. 1 (Jan-Feb): 54-68.

Mueller, Martin. 2008. "Wordle and Dunning again." E-mail to MONK listserv. 28 July.

Plaisant, Catherine, James Rose, Loretta Auvil, Bei Yu, Matt Kirschenbaum, Martha Nell Smith, Tanya Clement, Greg Lord. 2006. "Exploring Erotics in Emily Dickinson's Correspondence with Text Mining and Visual Interfaces." Joint Conference on Digital Libraries 2006, Chapel Hill, NC, June.

Pugmire, David. 2005. Sound Sentiments: Integrity in the Emotions. Oxford: Clarendon Press.

Ramsay, Stephen. 2011. Reading Machines: Toward an Algorithmic Criticism. Urbana: University of Illinois Press.

Solomon, Robert. 2004. In Defense of Sentimentality. Oxford: Oxford UP.

Todd, Janet. 1986. Sensibility: An Introduction. London: Methuen & Co.

Unsworth, John. 2008. "How Not to Read a Million Books." With Tanya Clement, Sara Steger and Kirsten Uszkalo.

Walsh, Richard. 1997. "Why We Wept for Little Nell: Character and Emotional Involvement." Narrative. 5.3: 306-321.

Warhol, Robyn. 2003. Having a Good Cry: Effeminate Feelings and Pop Culture Forms.  Columbus, OH: Ohio State UP. 

Whissell, Cynthia. 2006. "Serial Publication and the Emotional Associations of Words in Dickens’ ‘David Copperfield.’" Psychological Reports. 99.3: 751-761.

WordHoard. 2008. Northwestern University.

Wordle. 2011. Jonathan Feinberg, dev.

Wordsworth, William. 1984. "Ode (‘There Was a Time’)." William Wordsworth: The Major Works. Ed. Stephen Gill. Oxford: Oxford UP.

Zimmer, Ben. 2011. "The Jargon of the Novel, Computed." The New York Times. 29 July.



Sara Steger (University of Georgia)





Creative Commons Attribution 4.0

