Predicting Author Gender Using Machine Learning Algorithms: Looking Beyond the Binary

This paper explores the relationship between digital humanities studies that utilize computer algorithms to identify author gender and feminist and queer literary theory. I argue that utilizing computer algorithms to sort literature into the categories “authored by a male” or “authored by a female” is too reductive in its treatment of gender as binary. However, I suggest computer algorithms could be utilized to explore the performative aspects of author gender and to ask larger questions about algorithmic criticism, the author as a subject, and the relationship between morphological and cultural properties of texts.
Keywords: machine learning algorithms; feminist theory; queer theory; author gender; gender expression; algorithmic criticism


Introduction
In the ongoing discussion of the role of algorithms in creating or confirming bias, digital humanities projects that rely on algorithms should naturally also come under consideration. This article examines studies that use machine learning algorithms to identify the gender of the author of fictional texts. While such studies provide an illuminating look at language usage, the binary approach of dividing texts by author gender not only confirms gender stereotypes but reinforces them by supporting the existence of a distinct "women's writing" style. The underlying methodology of many of these studies is part of a larger tradition of assuming writing by men as the default mode, thus characterizing writing by women as automatically deviant. Instead of describing writing by women, these studies end up characterizing the category "women" through writing. This definition is reductive and problematic.
This article is not meant to be an exhaustive examination of machine learning algorithms and author gender studies. Nor is it fair to conflate all efforts to identify elements of female writing as utilizing the same methodology. Rather, the aim is to question the utility of continuing to perform tests of authorship based on gender using computer algorithms, and to suggest that such tests performed in isolation from theory serve to reinforce outdated understandings of gender in literature. The majority of these projects ask questions of gender through computer algorithms that largely ignore advances made in feminist and queer literary studies. The usage of algorithms to identify author gender assumes a binary approach that rests upon an outdated understanding of gender as either male or female. Furthermore, this methodology reinforces what Argamon et al. call "a least common denominator approach" that all too easily creates stereotypical lists of characteristics that define writing by a woman (Argamon et al. 2009, 4).
Naturally, an algorithmic examination of literature written by men and women must operate within the realm of binaries. As Ben Verhoeven and Walter Daelemans (2018) recognize, "[u]sing non-binary gender is currently unfeasible for NLP [Natural Language Processing] research due to lack of data" (11). Argamon et al. (2009) confirm that a computer will find differences between texts because that is what computer algorithms are designed to do. However, I challenge the assumption that this binary approach is beneficial. More importantly, I question the interpretation of the results that computer algorithm approaches produce. It is too reductive to argue that a list of terms used more frequently by women than by men proves that women have a unique writing style. Instead, further questioning of the results of algorithmic criticism informed by queer and feminist literary theory is needed to examine how language usage is exemplary (or not) of the performative nature of gender. I refer here to Judith Butler's definition of gender performativity as the result of "sustained social performances" within the context of compulsory heterosexuality (Jagger 2008, 27).
Land: Predicting Author Gender Using Machine Learning Algorithms Art. 8, page 3 of 12
Stephen Ramsay (2011) discusses algorithmic criticism and the challenges of utilizing the scientific method in tandem with digital humanities projects. The author's work provides crucial clarification for this argument. Ramsay argues that algorithmic criticism, "criticism derived from algorithmic manipulation of text," exists only in its nascent form (2011, 2). For Ramsay, the role of text analysis is not merely to fact-check assumptions but rather to "assist the critic in the unfolding of interpretative possibilities" (9). Projects that utilize machine learning algorithms to identify author gender use the scientific method to present a problem (was a text written by a man or a woman?), identify a hypothesis, test results, present analysis, and form conclusions.
The problem with this methodology is that it operates under the fallacy that there is, as Ramsay identifies, one answer to the problem (18). Texts are either written by men or written by women and once that answer has been determined, there is nowhere to go with the analysis. Worse still, the analysis often reinforces understandings of women's writing that confirm long-held assumptions of how women view and write about the world. Understandably, researchers are eager to identify the computational power of machine learning algorithms and find satisfaction in predicting author gender with high degrees of accuracy. However, deeper questions should be asked about the relationship between an author and the text they produce, as well as how an author enacts gender. In so doing, digital humanists can expand their analysis to produce multiple answers that lead to further inquiry.

Assumptions of algorithm author gender studies
By utilizing algorithms to identify whether a piece of writing was written by a man or a woman, researchers make several assumptions. Such an analysis is predicated on the presumed existence of an identifiable genre called women's literature. Efforts to identify elements of women's literature using digital humanities tools in some ways mirror the methods early literary scholarship used to discuss writing by women. For example, George Henry Lewes (1852) [...] In using computer algorithms to isolate elements of a feminine style, scholars accept a binary understanding of gender and support the idea that women write differently than men. These assumptions mirror earlier attempts to identify women's literature. However, the critical difference between outdated views of women's literature and contemporary scholarship that relies on machine learning algorithms is that the latter claims to provide quantitative evidence of the existence of a women's literary style that, consciously or not, adheres to certain poetics. Such studies equate women's literature as a genre with work written by someone who identifies themselves (or is identified by scholars) as a woman in a binary system. Indeed, most of the studies refer interchangeably to "women" and "female" writers with little explanation of how they are using these terms.
Lesbian and queer feminists have problematized this binary understanding of gender. In her well-known essay "Gender Trouble," Judith Butler (2018, 2490) claims gender does not have a "natural" (biological) origin but is rather the effect of "institutions, practices and discourses with multiple and diffuse points of origin" on society. Butler was influenced by Monique Wittig's definition of "woman" and "man" as political and economic categories and her assertion that there is no natural "women" group. While Butler and Wittig disagreed about the materiality of gender and the ways in which heterosexual oppression should be opposed, both discussed gender performance (Jagger 2008). Butler understood gender as an "enforced cultural performance" that is itself performative (Ibid., 21). However, Butler viewed the performance of gender as a result of "the materiality of signs and signification" within a poststructuralist model that challenges the humanist understanding of the subject as a rational being capable of escaping both nature and culture to make objective statements (Ibid., 30). In other words, Butler argued against the idea that gender is performed in a theatrical sense and claimed rather that this performance be understood as a "speech act model based on a poststructuralist understanding of subjectivity" (Ibid., 21). The question of author agency and the ability of an individual author to create work independently of the material conditions under which they write is one area of inquiry that deserves attention in connection with algorithmic criticism of author gender. According to Butler, gender performance is not a conscious choice individual writers make but rather the result of a heteronormative, phallogocentric world order. Questions examining the extent to which individual authors choose to write "like a woman" and how gender is consciously or unconsciously performed could successfully be paired with machine learning algorithms to develop questions of author agency.
Indeed, one of the largest assumptions made in computer algorithm studies of author gender is that there is a direct relationship between the author and the text.
This understanding of the author goes against poststructuralist understandings of the author as dead. As Nancy Miller (1988, 104) recognizes, if the author is dead, it does not matter who writes. Computer algorithms that identify author gender reassert the importance of the author. Such algorithms could be used to explore the relationship between the author and the text further by comparing texts written by female authors under male pseudonyms with works written by those same authors under female names, as in the cases of J.K. Rowling (writing as Robert Galbraith) and Alice Sheldon (writing as James Tiptree Jr.).
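The comparison proposed above could be approached in many ways. What follows is only a minimal sketch, assuming that each text is reduced to the relative frequencies of a small set of function words and that two texts are compared by cosine distance. The word list, the function names, and the sample passages are hypothetical illustrations and are not drawn from any of the studies discussed here.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately short list of function words (an assumption for
# illustration; studies of this kind typically use much longer lists).
FUNCTION_WORDS = ["the", "and", "of", "to", "in", "for", "with",
                  "not", "but", "she", "he", "i", "you"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_distance(p, q):
    """Return 1 minus the cosine similarity of two frequency profiles."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm if norm else 1.0


# Invented placeholder passages, not quotations from any author.
text_signed_name = "She walked to the house and waited for the rain to stop."
text_pseudonym = "He did not wait for the others but went on with the work."

distance = cosine_distance(profile(text_signed_name), profile(text_pseudonym))
print(round(distance, 3))
```

A small distance between works published under different names would not settle any question about gender. It would only indicate that, on this particular set of features, the two bodies of text look alike, which is where interpretation would have to begin.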

Interpretative schemata of author gender studies
Algorithmic studies of literature use digital tools to identify elements of an "écriture féminine" in literary works. Scholars utilize machine learning algorithms to process a large number of texts written by men and women in the attempt to identify word usage (content analysis) and syntax patterns (linguistic analysis) that distinguish "women's literature" from "men's literature," or simply literature. Such an approach assumes men's writing as the norm against which women's writing deviates.
As Sara Mills argues in her work on the gendered sentence, these definitions of women's writing keep us "trapped within the notion of women's language being deviant, powerless and submissive, and male language being normal" (1995, 36).
The digital humanities studies considered for this article often present their results within this paradigm. Mills identifies the importance of "interpretative schemata" in understanding the underlying biases present in studies of author gender in writing (1995, 39). That is to say, the interpretative elements chosen to identify women's literature are arguably more important than the simple assertion that women write differently than men. Critically, the majority of digital humanities studies favour content analysis over linguistic analysis. As such, they point to the topics women write about as indicative of a "women's literature" rather than identifying a feminine style through linguistic features.
For example, Argamon et al. (2009) claimed a machine learning algorithm could predict whether a given text was written by a man or a woman with 90% accuracy.
Using content analysis, they identify, for example, that "male authors…use religious terminology rooted in the church, while female authors use secular language to discuss spirituality" (Argamon et al. 2009, 1). Burrows (2003) examined a corpus of English-language prose fiction and found that connector words, auxiliary verbs and the subjunctive could be used to identify female writing. However, Burrows also makes significant use of content analysis, claiming that "[f]emale authors, it seems, had more to say of females: both male and female authors had much to say of males" (2003, 344). [...] texts from the 16th through the 20th centuries. Pronouns, possessives and content words that express emotional states marked the works Olsen examined as written by a woman. Olsen explains the higher percentage of first person singular and second person plural pronouns among women writers as indicative of "more personalized and interactional" texts than those written by men (2005, 154). More recently, Sean Weidman and James O'Sullivan (2018) used a corpus of English-language fiction texts separated into Victorian, Modernist and Contemporary periods to perform statistical analysis. The authors identify a unique female style that becomes more pronounced over time and argue that each era of "women's literature" has its own distinctive features that distinguish it from previous and subsequent periods. In particular, they find that "[f]emales generally tend toward the language of place in the private, or micro-sense - 'home', 'kitchen', 'church', 'hallway', and 'school' - whereas males throughout all periods tend toward greater spaces - 'country', 'earth', 'city', 'town', and 'world'" (2018, 384).
The above studies all rely on content words to argue that women write differently than men, which is problematic as it creates an assumption that all women write about similar topics and ignores the historical and social realities that [...] to identify that discourse aspects of text (connector words such as but, moreover, and so) contribute to the prediction of the author's gender.
Similarly, Moshe Koppel, Shlomo Argamon, and Anat Rachel Shimoni (2002) analyzed a corpus of written and spoken English ranging from the 12th century to the present and found that connective words such as for, with, not and in are characteristic of female writing. However, such analysis still relies on an interpretative scheme that equates writing written by women with women's writing as a universal group.
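The kind of content analysis discussed in this section can be illustrated with a deliberately simple sketch. It assumes that texts have already been sorted into two labelled groups and ranks words by the difference in their relative frequency between the groups. This is not the method of any study cited here, which rely on more sophisticated classifiers; the function names and the toy texts are hypothetical.

```python
import re
from collections import Counter


def relative_frequencies(texts):
    """Return each word's share of all tokens across a group of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values()) or 1  # avoid division by zero
    return {word: n / total for word, n in counts.items()}


def distinctive_words(group_a, group_b, top_n=5):
    """Rank words by how much more frequent they are in group_a than group_b."""
    freq_a = relative_frequencies(group_a)
    freq_b = relative_frequencies(group_b)
    vocabulary = set(freq_a) | set(freq_b)
    diffs = {w: freq_a.get(w, 0.0) - freq_b.get(w, 0.0) for w in vocabulary}
    # Largest difference first; ties are broken alphabetically.
    ranked = sorted(diffs.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Invented toy texts; the group labels are arbitrary placeholders.
group_one = ["the kitchen was warm and the house was quiet"]
group_two = ["the city was loud and the road was long"]

for word, difference in distinctive_words(group_one, group_two, top_n=3):
    print(word, round(difference, 3))
```

The design point worth noticing is that the procedure returns a ranked list for any two-way split of a corpus, however the labels were assigned. The existence of such a list is therefore a property of the method, and what the list means remains a matter of interpretation.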

Performed Gender and Machine Learning Algorithms
In her work on femininity in early Canadian fiction, Misao Dean (1998) discusses the way in which female authors enact femininity as a practice in order to make themselves recognized within a larger male literary tradition. Dean asserts that efforts to equate femininity in literature with some kind of "feminine essence" or universal gendered self that "chooses to act as a member of one or other gender" are mistaken (1998, 7). Rather than positioning women's writing as either a conscious feminist act to reject the female voice or an essentialist essence that informs every woman's writing, Dean cites Judith Butler's understanding of femininity as "a norm which [women] occupy, reverse, [and] resignify to the extent that the norm fails to define us completely" (1998, 8). This occupation of femininity in writing must be considered when discussing the ways in which women have historically used language.
Dean's understanding of women's enactment of femininity provides a helpful point of comparison for analyzing author gender studies that rely on algorithms.
These studies either tacitly or unintentionally argue that women's writing as a category exists because there are both content and linguistic elements common to writing written by women. Dean's and Butler's arguments can be reconciled with the results of author gender analysis. Dean (1998, 9) argues that "[g]ender cannot… be escaped or thrown off" because it is within the ideological system of gender that literature written by women is created. Stated differently, women may write differently not due to some essentialist "essence" or shared female experiences but rather because women have had to claim femininity in order to be recognized as individual voices. Author gender studies that find women write differently than men could be combined with Butler's understanding of the performativity of gender to examine the role of the author in creating works of fiction.
Weidman and O'Sullivan (2018) begin to unpack some of these issues in their recent article on word usage in author gender attribution. In examining works of fiction from three different historical periods they find that "stereotypical stylistic differences between men and women fiction writers" exist across all three periods examined (2018, 383). Their analysis showed that works written by women clustered with other works written by women (and vice versa for male authors) (2018, 379).
They highlight female authors' tendency to be more object-oriented than males, referring to positioning of the body with words such as hair, fingers, skin, eyes, heart, face, cheeks, dress, and gown as opposed to male authors' use of directional language such as east, west, south, and north (2018, 383). As explored above, this kind of content analysis reinforces stereotypical views of what women can and should write about.
Weidman and O'Sullivan recognize, however, that relying on content words removes their context and that "just because we see that men and women tend to use words that indicate a particular topic is no barometer of how they are treating a said topic" (2018, 384). Furthermore, they recognize the potential that contemporary female authors make use of recognized feminine tropes with the intention of tearing down those tropes and "rehabilitating their relevance" (2018, 385). For example, the authors cite the opening scene of Margaret Atwood's 2009 novel The Year of the Flood as presenting an "ironic not-so-picturesque rooftop sunrise" that contains content words associated with female authors such as window, crying, birds, flower, garden, gate and balcony (2018, 385).
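One modest way to restore some of the context that word counts remove is a keyword-in-context listing, which shows each occurrence of a word together with its neighbouring words. The sketch below is an assumption about how such a listing might be produced and is not taken from Weidman and O'Sullivan; the function name, the window size, and the sample sentence are hypothetical.

```python
import re


def keyword_in_context(text, keyword, window=4):
    """Return (left context, match, right context) for each keyword occurrence."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


# Invented placeholder sentence, not a quotation from any novel.
sample = "The garden was not a garden at all but a rooftop of rusted gates."

for left, match, right in keyword_in_context(sample, "garden", window=3):
    print(f"{left:>20} [{match}] {right}")
```

A frequency table would record two occurrences of "garden" in the sample sentence. The listing shows that one of them is a negation, which is the kind of distinction a reading attentive to irony would need to recover.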
In her work on irony and femininity, Lydia Rainford (2005) discusses the role of the term "ironic mode" as it was used by Judith Butler and Linda Hutcheon. Rainford argues such thinkers claimed irony as a tool for the feminist to "use her alterity to her advantage, by using it to negate the terms of the prevailing hierarchy" (2005, 4). In this understanding of the ironic, women use their secondary position in the gender hierarchy as a form of "negative freedom" in order to question the validity of the very existence of gender. While Rainford's work goes on to question the political and disruptive powers of irony in feminist writing, it provides an important counterpoint to author gender studies. Considering that women may use language differently than men in order to subvert their place in the gendered hierarchy necessarily complicates such studies. As Weidman and O'Sullivan suggest, it is critical to ask questions that refute the easy conclusion that women are inherently different than men and thus must write differently.

Conclusion
In closing, and perhaps most importantly, it is critical for scholars to consider that computer algorithmic approaches to author gender reinforce understandings of gender as a binary and can ignore individual authors' own gender identification.