In 2019, a research article published in Science showed that a computational algorithm used by multiple institutions to guide decision-making for healthcare was underestimating the needs of Black patients (Obermeyer et al. 2019). The researchers determined that this bias was due to the algorithm’s use of previous datasets that related healthcare needs to their costs. Because healthcare costs rely on patients’ socioeconomic circumstances, such as accessibility and affordability, using them to predict healthcare needs perpetuates those socioeconomic circumstances, which stem from systemic racial biases (Obermeyer et al. 2019, 449). Ultimately, the findings depended on showing how the algorithm classified Black patients as less at risk than they actually were, compared to how it classified white patients (Obermeyer et al. 2019, 450–451). In other words, whiteness was positioned as the scientific control against which Black experience had to be tested in order for the algorithm to be critiqued.
The centring of whiteness as a scientific control is precisely why the study was effective in showing the discrepancies produced by the algorithm’s bias: it illuminated white privilege relative to Black patients. Since the study hinges on the comparison between distinct racial categories, however, its data only makes sense by keeping racial and ethnic minorities defined as such. Simply put, the data necessarily keeps whiteness at the centre because whiteness is the basis on which the data makes meaning. The fact that whiteness could be used as a scientific control not only speaks to how we understand whiteness but also to how we understand data itself: natural and neutral. How did our understandings of whiteness and data come to be intertwined?
I posit that data, at least in the United States, is an ideological technology of institutional racism, based in part on its role in determining how we even classify race to begin with. Although most clearly seen in quantitative studies involving race, the operations of data carry the logics of whiteness to domains beyond what are explicitly defined as racial issues. For while our understandings of race have shifted to encompass its social construction (that race is constructed by our cultural and political frameworks such that it cannot be captured definitively), our paradigmatic understanding of data has not. Rather, the characteristics of digital categorization and organization as natural and politically neutral are derived, I will show, from scientific racism and embedded into conceptions of data, captured and captive to its very collation, as Johanna Drucker might say (Drucker 2014, 128). For Drucker, data are contextual sites where we can analyze what constitutes them as a form of knowledge production, whereby we make apparent the ideologies that give them significance and legitimacy. Lisa Gitelman and Virginia Jackson, to give another example, consider in “Raw Data” Is an Oxymoron that “data are always already ‘cooked’” (Gitelman and Jackson 2013, 2) by the norms and standards of disciplinary institutions, themselves mutually constituted by their objects of analysis. That is, data cannot be understood apart from their institutional contexts. And yet we point to the numbers as neutral and unbiased, divorced from the specific processes of collection and categorization from which they arise. Even arguing for the necessity of the interpretation of data assumes its neutrality prior to interpretation, further fortifying the lines between data and its constructive contexts. So how did our conceptions of data as natural and neutral come about?
Even empiricism, the ideological basis of the scientific method, acknowledges intersubjective experience as the basis of evidence, constructing data from social consensus. What circumstances shifted our conceptions of data away from social consensus such that ideologies of racial classification became the very bones of data formation?
In this essay, I examine the racialization of data in the United States through the Tabulating Machine, developed by the German American inventor Herman Hollerith in the 1880s to automate census tabulation. Because scientific theories of race at the time posited racial categories to be biologically distinct and hierarchized, the formation of data as natural and neutral followed suit to validate those theories for the hegemonic enterprise of population management. In short, the datafication of race based on polygenic racial theories solidified (and continues to solidify) white supremacy. Consequently, methodologies of data formation and deployment centre whiteness in the United States. Whiteness, as I will later unpack by drawing on scholars such as David Roediger and Stephen Middleton, is an ideological framework that constructs a hierarchy in which a particular social group dominates, and exclusionary practices are rationalized by appealing to a natural order, purity, or status quo. The ways in which data itself reigns supreme in social and political realms, premised as they are on naturality and political neutrality, are inherited from the operations of whiteness. The datafication of race thus encoded a politics into data, reifying specific ideologies into data as well as into an ideology of data.
This argument follows the lead of media scholars who have been working to illuminate such practices of racializing technologies: Simone Browne has shown that dissolving the link between surveillance and the enforcement of white supremacy on Black bodies has been a methodological imperative of the discipline of sociology (Browne 2015); Lisa Nakamura has examined how the professionalization of IT work blurred the lines between human resources in digital industries and racial segregation (Nakamura 2002); Safiya Noble has explored how information science and computational standards have worked to conceal the ties between search algorithms and the perpetuation of racial inequalities (Noble 2018); and Thao Phan and Scott Wark consider how correlative training models in large-scale data processing technologies hide racialization through inductive logics (Phan and Wark 2021). Similarly, I argue that the digitization of census tabulation dissolved the links between racial data and the ideology of racial classification that prioritizes whiteness. I show that Hollerith’s Tabulating Machine produced not only an efficient method for constructing databases, but also an institutional system by which data could detach itself from the contexts of its own formation. Seen in this way, deconstructing data becomes inextricable from deconstructing whiteness. In order to understand the sociopolitical constructions and implications in and of data, then, as we do with race, we must continually examine it in light of the contexts by which it has emerged and is emerging.
The race to automate census tabulation
The census presents a site where race is a dynamic system of contested meanings with immediate legal and sociopolitical ramifications. More generally, I follow Michael Omi and Howard Winant in considering race as “a concept that signifies and symbolizes social conflicts and interests by referring to different types of human bodies” (Omi and Winant 2014, 110), paying particular attention to racial formation, or how racial identity is constructed by sociohistorical processes. As an artifact built on data, the census is a “schematic” the state uses to both propose and validate legal significations of race, primarily based on scientific and data-based symbols (Thompson 2016, 21). The conception of race as a biological property in the census, for example, has informed judicial decisions that perpetuate exclusionary and discriminatory practices. Despite considerable evidence from geneticists that greater genetic variances exist within alleged racial groups than between them (Appiah 1985, 30–32), court cases as recent as 1990 show that the legal conception of race has not progressed much beyond attention to biological characteristics (López 2013a, 242–243). Kenneth Prewitt, on the other hand, points to how the categorization of Hawaiians in the census of 2000 as a race distinct from Asians showed a shift from scientific or anthropological theories to the pragmatism of social policy (Prewitt 2013, 513). In this case, racial classification was not used to define biological properties but to rectify historical injustices. While racial categories were formerly used to exclude nonwhite groups from various civil rights on scientific grounds, they became a pragmatic way to reverse discriminatory policies by presenting data that showed systemic discrepancies along racial lines.
In both cases, however, the process of racial formation was already crystallized by the racial project of the census, through the linkage between its categorical structure and the representational methodologies of social sciences (Omi and Winant 2014, 122–123). In the United States, racial categories were tied so tightly to the scientificity of census data that the move to view race as a means for self-expression, manifesting in the 1997 Census Bureau decision to allow one to “mark one or more” races, was criticized as a problem of data. The NAACP, for instance, argued that for policymaking, defining race by self-expression “might disaggregate the apparent numbers of members of discrete minority groups, diluting benefits to which they are entitled as a protected class under civil rights laws and under the Constitution itself” (quoted in Prewitt 2013, 514). The fact that people can and have changed their multi-racial identifications, in other words, makes for unstable (and subsequently discredited as unreliable) data, making efforts that push for race-conscious policy more difficult to evidence.
But what is it that determines whether or not data is even stable or reliable, especially for the census? Until 1960, American citizens were not even allowed to mark their own race on the census. Instead, federally employed census enumerators were trained to observe and ascertain race, constructing data according to particular formats. And until 1880, these enumerators were U.S. Federal Marshals, signalling a direct tie between racialization and law enforcement (Kukutai et al. 2014, 3). Sociologist Matthew Snipp notes how demographic researchers today must use racial categories predetermined by the federal government to structure data reported by various organizations in order for their studies to have any credibility. In short, the decennial census and federal surveys are necessarily the benchmark against which all sociological research involving race is checked (Snipp 2003, 563–564). The racial categories of the census, however, cannot account for the fluidity and instability of racial formation because their construction erroneously assumes each individual to have a fixed, discrete, and clearly identifiable property of race. The data constructed by sociological studies based on those categories can thus only perpetuate an essentialist ideology of race, with its inevitable naturality and political neutrality. In this way, data formation becomes inextricable from racial formation. Data concerning race, consequently, is stable and reliable only insofar as it is able to validate and maintain racial formations that were already legally established.
In fact, from its very conception, the census in the United States has been an apparatus that formed “race” for legal and sociopolitical purposes. The first national census for the United States, taken in 1790, featured only two racial categories: “white” and “slave” (Morning 2002, 42). These categories were formulated by the Constitution (Article I, Section 2) to determine a state’s population for legislative representation and taxes, counting those labelled as “slave” according to the Three-Fifths Compromise. This formulation of race and its data were reliable insofar as they were able to validate and rationalize an economic system based on slave labour, and were not explicitly tied to biological or cultural traits. In 1820, however, the introduction of the categorical label of “colour” racialized the dichotomy between “white” and “slave” as phenotypical, distinguishing them based on skin colour. Even those categorized as “free coloured” were differentiated from “white,” attributing an inferior status to those racialized as nonwhite and solidifying white supremacy into the political structure.
By 1850, census enumeration evolved to adopt scientific theories for racial classification, following guidance from the American Statistical Association (Snipp 2003, 566). The seventh census labelled people “Black,” “White,” or “Mulatto,” explicitly racialized through biological differences. This scientific interpretation detached racial categories from sociopolitical status, with “Black slaves” and “Mulatto slaves” as separate categories, implying that one’s race was an intrinsic property necessarily prior to circumstance. To be sure, those biological theories, developed by scientists such as Charles Darwin, Francis Galton, Carl Vogt, Ernst Haeckel, and Nathaniel Shaler, were grounded in white hegemony. Scientific racism, specifically polygenism and eugenics, held that human races evolved from distinct origins, resulting in biological and phenotypical characteristics among various races that could be hierarchized into a natural order. Effectively, these theories were used in the United States to posit the inferiority of Black folk, as well as other nonwhite groups. A person of colour was inferior not because they had been sociopolitically labelled as such, but because of their race, an essential property deemed to be supported by biological and historical evidence. Census enumeration and tabulation processes informed and were informed by these theories, justifying discriminatory practices such as racial segregation, coerced sterilization, and exclusionary foreign policy (Omi and Winant 2014, 119). The census therefore served to scientifically rationalize such a racialized system, later shaping more extensive methodologies of data formation and analysis.
From the 1870s to the 1900s, issues of urbanization and immigration in the United States incited a greater need to use census data for population management (Anderson 2015, 87–93). As evidenced by policies such as the Chinese Exclusion Act of 1882 and the Dawes Severalty Act of 1887 (Indian General Allotment Act), economic growth was greatly impacted by the legal and political processes surrounding definitions of race, for vital industries such as railroads, manufacturing, and agriculture relied heavily on immigrant labour. However, since it took up to eight years to tabulate census enumerations, by which time the data might be significantly inaccurate, what the Census Office needed was not just a more reliable method of categorization and tabulation, but one that could handle the scale of increase and change in national demography (Anderson 2015, 102–104).
This federal need led to the professionalization of statistics, in large part due to the demand for trained experts who could both direct and, more importantly, interpret census tabulations. Francis Amasa Walker, statistician and superintendent of the census in the 1870s, prioritized the accuracy of census tabulation and developed regulations for data analysis methods, standardizing the ways census categories were to be interpreted (Anderson 2015, 88–94). More specifically, Walker’s theories on race held even certain European ethnic groups to be inferior on a biologically based racial hierarchy (Roediger 2005, 69–70). Consequently, these standardization efforts expanded the deployment of the census from its constitutional purpose of political apportionment to the description of the nation’s social climate, authorizing census data to be used as a basis for racialized policymaking. To support his efforts in developing a mechanism to automate census tabulation, Walker hired Herman Hollerith, a German American inventor who had just graduated from Columbia, as an instructor at the Massachusetts Institute of Technology (MIT), over which Walker presided in the 1880s.
In 1884, Hollerith designed an electrical machine to automate the processes of census tabulation (Figure 1). With such a machine, the federal government could justify the exorbitant expense of $3 million USD (about $80 million USD today after inflation) for census work. Additionally, the machine could amass the kind of data to validate the racial categories that justified economic growth based on the labour of people of colour. Termed the Tabulating Machine, Hollerith’s invention materially fastened data formation and tabulation to the racialized sociopolitical system of the census, becoming a cornerstone of modern computation (Spencer 2001; Black 2001). Inevitably, the sociopolitical ideologies surrounding the census, as a technology of race-based population management, were embedded into the workings of the Tabulating Machine. And yet, the computational methodologies of the machine were presented as independent of the sociopolitical contexts from which they emerged.
Figure 1. Hollerith’s electric tabulating machine. Wikimedia Commons. https://commons.wikimedia.org/wiki/File:1902_Hollerith_electric_tabulating_machine.jpg.
The method Hollerith proposed for handling the census in his 1884 patent can be broken down into three steps: digitization, verification, and tabulation. First, one would transcribe census information taken by enumerators into punched holes on a continuous strip of paper according to given categories; a punched hole or a combination of such would indicate sex, race, age, occupation, etc. That is, census information would be parsed into terms that would be computable, stripping it of its sociopolitical significations. Each set of holes would then be verified against a standard template to make sure they were punched according to proper dimensions, reenacting standardized profiling practices whereby a person’s race would be checked by the colour of their skin. Then the strip of paper would be continuously fed through a machine whereby punched holes would allow corresponding circuits to connect, actuating electromechanical counters to tabulate multiple sets of punched holes. What Hollerith specifically patented, in addition to this method and the machinery, was the combination of the circuitry with the “record-strip,” and then the “standard or templet” explicitly added in 1885 (Hollerith 1889, 17–18, 27).
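The first two of these steps can be sketched in code. The following is a minimal, hypothetical model, not the actual 1890 card format: a record is transcribed into a set of punched positions (digitization), then every hole is checked against a standard template of valid positions (verification). All field and category names are illustrative.

```python
# Hypothetical sketch of Hollerith's digitization and verification steps.
# The layout below is illustrative, not the historical card design.

LAYOUT = {
    "sex":  ["M", "F"],
    "race": ["W", "B", "Mu", "Ch", "Jp"],
}

# The "standard or templet": the set of all positions a valid card may punch.
TEMPLATE = {(field, value) for field, values in LAYOUT.items() for value in values}

def digitize(record):
    """Transcribe an enumerator's record into a set of punched positions."""
    return {(field, value) for field, value in record.items()}

def verify(holes):
    """Check every punched hole against the standard template."""
    return holes <= TEMPLATE

print(verify(digitize({"sex": "F", "race": "Ch"})))  # True
print(verify(digitize({"sex": "F", "race": "XX"})))  # False: off-template punch
```

The point of the sketch is that once a record is digitized, verification compares punches only against the template's fixed categories; whatever the template cannot express simply cannot be recorded.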
By 1887, Hollerith changed the medium on which census data would be inscribed from the continuous “record-strip” to separable “record-cards” (Hollerith 1889, 2), inspired by the Jacquard Loom and the work of Ada Lovelace and Charles Babbage (Figure 2). In this fashion, Hollerith’s methods adopted the ideology that anything was encodable into a series of true/false values, purporting that markers of identity such as gender, class, and race could be digitized. Such methods highlight, as 1965 Census Bureau director A. Ross Eckler explains, “the outstanding importance of the innovation which first reduced the data on the census schedule to a form which could be classified and counted by purely mechanical devices” (Truesdell 1965, iii). In essence, Hollerith’s innovation was the reduction of census data into a form that was mechanically countable. Racial formation was thus represented through punched holes: a digital format that implies a symmetrical process of signification across categories.
Figure 2. Punch card and circuitry design for tabulating punched cards. Columbia University Computing History. Herman Hollerith. https://www.columbia.edu/cu/computinghistory/hh/.
While race could previously be considered as a fluid amalgamation of phenotypical features, nationality, ancestry, etc., the punched card made race merely a matter of perforation. Paired with electrical circuitry to structure how the data on those punched cards would be processed, the Tabulating Machine technologically embodied and enacted the ideology that identity is reducible to digital information such that meaningful classification can happen regardless of the sociopolitical processes by which markers of identity are constructed. For instance, whiteness could be presented as formed by the same sociohistorical processes as blackness. By reducing race as such, the digital format fixed it to a particular racial formation and at the same time dissolved the link to sociopolitical motivations. In this way, the digital formation of data by Hollerith’s machine shows another technique by which computation encodes ideology and impacts cultural memory: drawing boundaries around what to remember and what to forget, including the memory of how those boundaries were drawn.
The roots of racial data structures
The first tabulation done by Hollerith’s machine was “Colour-Nativity by Age Groups” of the 1890 census: the population for each age group according to race and nativity, the latter being defined as place of birth relative to the United States. Specifically, the racial categories were “White,” “Black,” “Mulatto,” “Quadroon,” “Octoroon,” “Chinese,” “Japanese,” and “Indian” (Native American), while “nativity” determined whether one was born in the United States or not, the categories being either “native” or “foreign-born.” The categories relevant for this tabulation were not only an individual’s race and nativity, but also that of their father and mother.
For the tabulation process, each punched card would be laid across rows of tiny bowls of mercury, each bowl corresponding to a potentially punched hole on the card (Figure 3). A device with pins would then be brought down to enclose the card, each pin corresponding to a potentially punched hole. The pins were attached to springs such that if a hole had not been punched, the pin would compress upwards into the device; otherwise, the pin would meet with mercury in the underlying bowl, completing an electric circuit that would either connect to another circuit or actuate a corresponding electromechanical counter. If the hole for “Chinese” had been punched, for example, the respective pin would be able to make contact with the small bowl of mercury beneath that hole, completing the circuitry for the electromechanical counter that was specifically counting the Chinese population. If the tabulation required an accounting of a combination of categories (e.g., “Female Chinese”), then the circuitry would be wired such that the counter for “Female Chinese” would only be actuated if both the pins for “Female” and “Chinese” met the mercury. Rather than tallying each record by hand, the tabulators would thus only have to rewire circuitry according to the desired statistic and record the numbers on the counters, turning a process that previously took years into a matter of weeks.
Figure 3. Design for mercury bowls and pins for the tabulating machine. Columbia University Computing History. Herman Hollerith. https://www.columbia.edu/cu/computinghistory/hh/.
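The wiring described above amounts to a logical AND over punched positions: a counter increments only when every pin in its circuit meets mercury. A minimal sketch, with hypothetical category names, models each card as the set of holes punched on it and each counter as the set of holes its circuit requires:

```python
# Sketch of the Tabulating Machine's counting logic. Set inclusion plays
# the role of the completed circuit: a counter fires only if every pin
# it is wired to meets mercury (i.e., every required hole is punched).

def tabulate(cards, counters):
    counts = {name: 0 for name in counters}
    for holes in cards:
        for name, required in counters.items():
            if required <= holes:  # all pins in this circuit close
                counts[name] += 1
    return counts

cards = [{"F", "Chinese"}, {"M", "Chinese"}, {"F", "White"}]
counters = {"Chinese": {"Chinese"}, "Female Chinese": {"F", "Chinese"}}
print(tabulate(cards, counters))  # {'Chinese': 2, 'Female Chinese': 1}
```

Changing the `counters` mapping corresponds to rewiring the circuitry for a new statistic: the cards themselves never need to be re-read by hand.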
For the tabulation of “Colour-Nativity by Age Groups” particularly, the circuitry was wired as follows. If the “race” punched on a card was of a nonwhite category, the respective counter for age would directly be actuated. If “W” (white category) was punched, however, the circuit would check for nativity, with “FB” (foreign-born) leading to an accounting for age and “Nat” (native) to a further check on parentage. For a “Nat W” (native white) card, the circuitry would then check the nativity of the father. If the father was “FB,” then the card would read to the machine as “Native White/Foreign Parentage” (abbreviated NW/FP). If the father read “Nat,” the circuitry would then check the mother’s nativity. That is, if either parent was “FB,” the card was labelled as having mixed parentage and categorized as NW/FP. Otherwise, if both parents read “Nat,” then the card would be categorized as “Native White/Native Parentage” (abbreviated NW/NP). Once the circuitry had reached either NW/FP or NW/NP, then the age would be accounted for. Practically speaking, the pins that would actuate their respective part of the circuitry did so simultaneously, meaning that the information of race, nativity, parentage, and age was gathered by the machine all at once. The order of checking, then, was determined by how the circuits were connected.
The circuitry of information flow (before the accounting of age) can be visually represented (Figure 4). In such fields as computer science and statistics, this data structure would come to be known as a tree, a prominent characteristic of which is a one-way flow of information. Although we might think of the flow of water and nutrients in a tree as upwards from its roots to its leaves, it is conventionally represented in computer science as a top-down structure such that information flows downwards. The top node is the root of the tree, subsequent levels being derived from direct connections (as represented by the lines). Abstractly speaking, a node that branches down to another node is termed a parent in relation to the latter node, which is its child. In the diagram above, “White” is the parent node of “Native” and “Foreign-born White,” while “Native” is conversely a child node of “White.”
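The one-way flow of this tree can be rendered as nested conditionals that descend from race to nativity to parentage, using the essay’s abbreviations (FB, Nat, NW/FP, NW/NP). This is a hypothetical sketch of the classification logic only; as noted above, the actual machine read all pins simultaneously, with the wiring imposing the order of checks:

```python
# Sketch of the 1890 "Colour-Nativity" classification tree. Nonwhite
# races are counted directly; white cards descend through nativity,
# then the father's and mother's nativity.

def classify(card):
    if card["race"] != "W":
        return card["race"]            # nonwhite: counted by race alone
    if card["nativity"] == "FB":
        return "Foreign-born White"
    # native white: parentage decides the subcategory
    if card["father"] == "FB" or card["mother"] == "FB":
        return "NW/FP"                 # Native White / Foreign Parentage
    return "NW/NP"                     # Native White / Native Parentage

print(classify({"race": "W", "nativity": "Nat", "father": "FB", "mother": "Nat"}))  # NW/FP
print(classify({"race": "Ch", "nativity": "FB", "father": "FB", "mother": "FB"}))   # Ch
```

Note how the structure itself encodes the asymmetry the essay describes: only the “W” branch triggers any further inquiry into nativity or parentage, while every other category terminates immediately at the root.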
That the question of nativity descends from racial categories in this data structure suggests the assumption that race precedes geopolitical boundaries. An immigrant from Italy, for instance, would be racialized into the same category as a domestic citizen of Irish descent, regardless of the difference in their inherited ethnicities. For, as scholars of Whiteness Studies have shown, the United States in the nineteenth century categorized people as “White” if they were of pure European ancestry in order to rationalize the inferior status of mixed-race descendants of slaves (Middleton 2016, 20). And even though various European ethnicities, such as Finnish, Greek, or Southern Italian, were discriminated against, the necessary involution of colour into American racial hierarchy made it such that those discriminations were not codified into legal and political systems (Guglielmo 2000, 27). Furthermore, the civil rights of the naturalized descendants of those European ethnicities did not face the same systemic infringements and “hard racism” as did nonwhite groups (Roediger 2005, 11, 65–67). Whiteness, in any case, was posited as an attribute prior to sociopolitical circumstances, race informing, through a one-way flow, boundaries such as nationality. If race preceded nationality, however, then why and how were the categories for “Chinese” and “Japanese,” which explicitly reference nations, racialized as nonwhite?
The Chinese Exclusion Act of 1882 provided the impetus for defining “Chinese,” and later “Japanese,” as a racial category rather than one of nationality. To reduce immigration of labourers from China and Japan, Chinese and Japanese people were defined as either migrants from those countries or descendants from people of those nationalities. In other words, “Chinese” and “Japanese” were first defined by geopolitical boundaries. As seen from the cases Chae Chan Ping v. United States (1889) and Fong Yue Ting v. United States (1893), the Supreme Court rationalized the exclusion of Chinese immigrants from naturalization or civil rights as a matter of political sovereignty. While Chinese labourers were primarily imported as cheap labour to build the railroads during the Reconstruction Era, they were just as summarily deported after the railroads were finished, a removal justified as international foreign policy. However, these justifications were racialized by scientific discourse, which argued that Chinese and Japanese immigrants could not assimilate to the United States due to intrinsic racial differences, and accordingly could never be labelled as “white” (López 2013b, 777). In this way, the United States could assume an impartial stance seemingly divorced from social motivations, justifying racial exclusion as a scientifically natural course of action, while still being able to import labour from other nations. The constructions of “Chinese” and “Japanese” were thus encoded as nonwhite racial categories in order to maintain hegemonic control over Chinese and Japanese immigration on the basis of scientific authority.
But from the data structure above, census tabulations placed “White” on the same level of significance as “Chinese” and “Japanese,” implying whiteness to have a process of construction symmetrical to nonwhite categories. If “Chinese” and “Japanese” signified intrinsic racial attributes, then “White” should as well. And yet, there were no certain positive definitions of whiteness; racial discourse constructing people as “White” pertained neither to skin colour nor phenotype (for both vary widely across Europe, Asia, and Africa), nor to ancestry (for a person born of both European and African descent was labelled “Mulatto,” not “White”). Instead, whiteness was constructed as not “Black,” “Mulatto,” “Indian” (Native American), “Chinese,” or “Japanese”: a definition by negation (Gotanda 2013, 37). The placement of “White” on the same structural level as other racial categories therefore instituted negation as an intrinsic racial attribute, whiteness as an uncoloured colour. To be sure, an uncoloured colour does the double duty of instituting and delineating colour, the same way a scientific control becomes the category by which all other categories have meaning. With its symmetricality, the data structure digitized this double duty, encoding nonwhite racial categories as “Coloured,” with “White” encoded as both a racial category and the category by which other categories were coloured.
If we treat the data of the 1890 census contextually, as experimental evidence towards validating scientific racism, then the effects of this data structure become clearer. Melissa Nobles, along with other political scientists, has shown that from 1850 to the early 1900s, the census was used as a primary source of data for scientists to propose and test theories of race as a biological property (Nobles 2000, 1739). Polygenists such as Nathaniel Shaler and Louis Agassiz, who believed human races evolved from different points of origin, adapted Darwin’s theory of natural selection to argue for a social theory of racial struggle. Even though Darwin claimed that the human species descended from a common ancestor, polygenists used census tabulations as evidence for distinct races struggling against one another for survival. The census was therefore used circularly to propose and then validate racial categories as “natural,” even though its categories were, decades before, already sociopolitically constructed based on a racialized hierarchy. In this way, racial categories came to represent self-evidential features of the picture the census was to capture, detaching the census from its constitutional context. The “Mulatto” category, especially, was a vital site for polygenists, who aimed to show that racial intermixing led to infertility and moral deficiency.
To prove the disadvantages of racial hybridity, as well as the inferiority of African descendants, polygenists required statistical data over time. In the instructions for census enumerators in 1870, as Nobles’s study shows, the priority of advancing scientific theories of race through the census was explicit:
Be particularly careful in reporting the class Mulatto. The word here is generic, and includes quadroons, octoroons, and all persons having any perceptible trace of African blood. Important scientific results depend upon the correct determination of this class. (quoted in Nobles 2000, 1740)
In 1890, the instructions were more specific, defining for enumerators how to distinguish “mulatto” (“three-eighths to five-eighths black blood”), “quadroon” (one-fourth), and “octoroon” (one-eighth, or any trace at all), directing enumerators to define race according to skin colour or ancestry. These categories were used to count the distribution of racial intermixing and to determine rates of mortality. Statistical evidence of significant decline in racially intermixed populations would hence prove a polygenic theory of race. It is important to note, however, as Nobles does, that at no point in census history was “Mulatto” considered a person of “mixed White” ancestry, suggesting that the “White” category was presumed to be pure. In fact, Stephen Middleton has shown that this assumption of whiteness as unmixed was evident in United States legal discourse as early as 1842 (Middleton 2016, 14–16). Middleton quotes Judge Read’s ruling in Thacker v. Hawk: “The word ‘white’ means pure white, unmixed. […] A mixture of black and white is not white” (quoted in Middleton 2016, 15). Consequently, the lack of whiteness was grounds for exclusion from citizenship and civil rights, thereby positing whiteness as the hegemonic default. In this way, census categorization did not merely count race but, through the circular argument and validation of “natural” races, assisted in encoding sociopolitical statuses into constructions of race.
If census categories were criteria for comparing the lifespan and fertility of mixed-race populations with “pure-race” populations, then the construction of “White” in late nineteenth-century United States racial discourse must be considered through its role as a scientific control. Seen one way, the scientific control presents a category against which to test specific variables; put another way, it is a category in which said variables are necessarily absent. The negatory quality of “White,” then, as enacted by census enumerators, was justified by making the absence of certain characteristics a categorical property in and of itself. Once racialized, the “White” category as a scientific control implied the notion of purity and therefore reinforced the idea of pure races. The “Black” category, though, arguably another scientific control to compare against mixed-race populations, was presented as self-evident, its “purity” justified by skin colour or parentage of purely African descent. In this way, “Black” could be a control for racial mixture, representing the “unmixed” category against “Mulatto,” “Quadroon,” and “Octoroon.” Due to the aforementioned variety of European skin colour and genealogy, however, the property of “unmixed” could not be so directly imputed to the “White” category. As Ian Haney López aptly put it, “whites exist as a category of people subject to a double negative: They are those who are not nonwhite” (López 2013b, 780). Rather than being “unmixed,” whiteness as a scientific control was deduced to be an “unraced” race altogether.
While negations can be productive categories, such as “Other” or “None of the Above,” presenting any of them as a positive category in and of itself implies its symmetrical significance to other categories, which, because of its indeterminacy (e.g., uncoloured, unraced), gives it power over categorization: “I am categorized as one who categorizes the other.” By placing “White” on the same level as other racial categories, the data structure of a tree reified the negation of race as a positive racial property, substantiating it so much so that nationality could be derived from it. Conversely, the one-way information flow of the tree structure also implied that nationality does not inform constructions of race, validating the notion of racial purity apart from nationality. With whiteness paradoxically posited as a racially pure negation of race, it became a site on which racial hierarchy could be established.
Algorithmic whiteness
This anxiety to define and maintain an understanding of “White” as an “unraced” race, as a scientific control, can be further seen in how the notion of purity transferred to classifications of nativity. Specifically, the nodes of nativity that branched from “White” in the data structure imply a purification project by making the absence of foreign parentage the central clarification. To reiterate the logic of this clarification process, we can translate the data structure into a binary classification algorithm:
1. Is the “White” person foreign-born? If yes, terminate the process at “Foreign-born White”; if no, proceed to the next step.
2. Is the father of the “Native White” person foreign-born? If yes, terminate the process at “Native White/Foreign Parentage”; if no, proceed to the next step.
3. Is the mother of the “Native White” person foreign-born? If yes, terminate the process at “Native White/Foreign Parentage”; if no, terminate the process at “Native White/Native Parentage.”
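The branching logic above can be rendered as a short program. This is a minimal sketch under stated assumptions: the function name and boolean card fields are hypothetical conveniences, since the census schedules recorded places of birth as text, not true/false values.

```python
# A sketch of the 1890 nativity-classification branch for the "White"
# category. Each question terminates at the first attribute of
# foreignness, mirroring the one-way flow of the census tree structure.

def classify_white_nativity(person_foreign_born: bool,
                            father_foreign_born: bool,
                            mother_foreign_born: bool) -> str:
    if person_foreign_born:
        return "Foreign-born White"
    # Either foreign-born parent suffices for "Foreign Parentage";
    # mixed-nativity parentage is not distinguished from fully foreign.
    if father_foreign_born or mother_foreign_born:
        return "Native White/Foreign Parentage"
    return "Native White/Native Parentage"
```

Note how, in this rendering as in the census, a single foreign-born parent collapses into the same category as two, which is the generalization of mixture discussed below.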
For transparency’s sake, this algorithm was derived from the data structure such that the three classifying questions explicitly prioritize foreign birth. The questions could also be asked the other way (i.e., “is the ‘White’ person native-born?”), in which case positive and negative results would switch places, making those aspects of the algorithm arbitrary.
In either case, however, it is the attribute of “foreign” that necessarily terminates this part of the tabulation process. “Native Parentage” required both parents to have been born in the United States, while “Foreign Parentage” only required either parent (with more weight given to paternal parentage) to have been born elsewhere. That is, someone with parentage of mixed nativity was placed in the same category as someone with two foreign-born parents, both cases classified as having “Foreign Parentage.” This attention to parentage as a clarification of whiteness points to how certain European immigrants, particularly those from southern and eastern regions, were “conditionally white” (Brodkin 1998, 60). On one hand, these “Foreign-born White” or “Native White/Foreign Parentage,” termed “new immigrants,” were culturally discriminated against by their northern and western counterparts as biologically inferior (Roediger 2005, 36–37). At the same time, they were used as examples of whiteness to deny nonwhite groups access to citizenship and civil rights. David Roediger considers these “new immigrants” as in between systemic racism and full sociopolitical inclusion, whereby naturalization was determined by how much their genealogies had been assimilated into American society (Roediger 2005, 57–58). Simply put, naturalization was a process of whitening, appropriate only for particular racial groups whose whiteness was questionable. In this way, mixture (or miscegenation) was generalized by the census data structure as a foreign element against the project of purifying and assimilating an uncertain whiteness into “Native White/Native Parentage.”
On the other hand, questions of nativity for “Black,” “Mulatto,” “Quadroon,” “Octoroon,” “Chinese,” “Japanese,” and, most ironically, “Indian” went through a different process, seemingly forgotten in the reported count for “Colour-Nativity by Age Group.” For these seven subclasses of the group that was designated “coloured,” the process of tabulation is described as follows:
In order to get figures for native and foreign-born colored, the machine was wired, as indicted [sic] above, to reject the foreign-born Negroes and Indians and the native Chinese and Japanese, for a supplementary hand count by nativity. (Truesdell 1965, 67)
For the 1890 census, there were significantly fewer foreign-born “Negroes and Indians” than those born in the United States, and similarly fewer native-born Chinese and Japanese than foreign-born. It was more efficient, then, to first have the machine tabulate the age groups of those subclasses by their total populations, later counting by hand the foreign-born “Negroes and Indians” and the native-born Chinese and Japanese to round out the total counts, including the “White” population, for native and foreign-born. Therefore, the machine was wired to read all “Black,” “Indian,” and mixed-race punched cards as “Native,” and all cards designated “Chinese” and “Japanese” as “Foreign,” in order to get a total count for each of the “coloured” subclasses (Truesdell 1965, 67). For a card the machine would “reject,” the tabulators would record its nativity by hand and then switch to an alternate wiring that would count its age group. In other words, nativity did not factor into tabulations of age groups for nonwhite categories. In fact, racial categories themselves only factored in to classify them against the “White” category, as the age groups of those seven racial categories were collated into a single “Coloured” group.
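The wiring Truesdell describes can be sketched in code. The function and field names below are hypothetical stand-ins for the machine’s circuitry, an illustrative reconstruction rather than Hollerith’s actual wiring diagram:

```python
# The 1890 wiring hard-coded a nativity assumption for each "coloured"
# subclass; a card contradicting that assumption was "rejected" for a
# supplementary hand count, the same mechanism used for mispunches.

WIRED_NATIVE = {"Black", "Mulatto", "Quadroon", "Octoroon", "Indian"}
WIRED_FOREIGN = {"Chinese", "Japanese"}

def tabulate(card_race: str, card_foreign_born: bool) -> str:
    """Return 'counted' if the card matches the wiring's assumption,
    'rejected' if it must be set aside for a hand count."""
    if card_race in WIRED_NATIVE:
        return "rejected" if card_foreign_born else "counted"
    if card_race in WIRED_FOREIGN:
        return "counted" if card_foreign_born else "rejected"
    raise ValueError("not a 'coloured' subclass")
```

The sketch makes the asymmetry legible: nativity is not read from the card at all during the machine count; it is presumed by race.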
In the context of the census, what this data structure argues is that all “coloured” races, regardless of nationality and cultural differences, bear a similarity that makes them intrinsically distinct from the “White” race. And since the uncertainty of whiteness was being “purified” through processes of naturalization, a person with any trace of “colour,” regardless of nativity or racial mixture, was tacitly considered to be foreign. Indeed, we can read this assumption in the visual representation of the structural layout of Hollerith’s Tabulating Machine dials in Figure 5 (Truesdell 1965, 64). In the column “Ages by colour-nativity,” there is a top-down progression of increasing foreignness. The top line is for “Native White/Native Parentage,” the next for “Native White/Foreign Parentage,” the third for “Foreign Born White,” and the bottom is labelled for “Coloured.” Through the progression of data categories, we can see a trend that implies a decreasing nativity, from parentage to location of birth to colour. In spite of parentage or nativity, then, all “Coloured” people could be read as more foreign than a “White” person who was born in another country.
Structural Layout of Herman Hollerith’s Tabulating Machine dials. Truesdell, Leon E. 1965. The Development of Punch Card Tabulation in the Bureau of the Census. U.S. Government Printing Office. Page 64. https://ed-thelen.org/comp-hist/The_development_of_punch_card_tabulation-i.pdf.
It could be argued, of course, that the bottom line is a separate category from the first three, since “Coloured” could have been placed on the top row of the machine’s dials, making its position relative to nativity arbitrary. But since we understand census data as socially constructed, particularly informed by racial discourse, the progression of categories matters insofar as it demonstrates an underlying politics. While the progression of these categories does not necessarily indicate political intention, the structures of data point to the politics of their construction: data formation signals racial formation. After all, the categories were arranged as such. Furthermore, reading sociopolitical constructions in the progression of categories in “Ages of colour-nativity” does not take away from the fact that the seven racial subclasses of “coloured” were subsumed into one category regardless of nativity or racial mixture. That we can even consider the “Coloured” category’s arbitrary position evinces its sociopolitical status as “Other.”
Effectively, this digitized othering of racial categories bore different sociopolitical weights for each of them. For “Chinese” and “Japanese,” in the first count, recall that Hollerith’s machine was wired to read all of them as “Foreign” in order to get an efficient tabulation of their respective populations and their distribution in the “Coloured” age groups. Whether or not the census enumerators deemed Chinese and Japanese people who were born in the United States as foreigners, the machine read them as such. For the “Chinese” and “Japanese” cards that were “Native,” the machine was set to “reject” the same way it would a mispunched card. Put another way, a “native-born” Chinese or Japanese person would be read as an error by the machine, and enumerators would record the data of Chinese and Japanese people born in the United States by the same process they used for errors. The construction of census data for “Chinese” and “Japanese” thus followed the same logic that continues to mark Asian Americans with the quality of “permanent foreignness” (Kim 1999, 126), the figure of the “perpetual foreigner” (Lee 1996; Tuan 1998; Nakamura 2002; Uba 2002; Fickle 2019). The unassimilability of Chinese and Japanese people, a sociopolitical proposition that, as we previously noted, became scientifically racialized, was consequently encoded into procedures of data construction.
For the categories of “Black,” “Mulatto,” “Quadroon,” “Octoroon,” and “Indian,” Hollerith’s machine was wired to read all of them as “Native.” Practically speaking, this was done because there were significantly fewer foreign-born people of African and Indigenous American descent, making it more efficient to tabulate those by hand, later subtracting from the total respective counts to ascertain the “Native” populations for each category. The conflation of “Black,” “Indian,” and mixed-race categories into the same data construction procedure, however, suggests an understanding of nativity that validates and sustains a white supremacy over Native Americans through the process of naturalization. As we saw from instructions for census enumeration, “nativity” was defined as the place of birth relative to the United States, meaning that it was constructed from geopolitical circumstances. What could not factor into “nativity,” then, were ideas of indigeneity and property that did not already conform to the institution of the United States by a colonial white hegemony. A “Native Indian” was subsequently attributed a nativity of the same level of significance as the “Native Black” and “Native White” categories.
The scientific racialization of Native Americans, along with other people of colour, further made it such that nativity, let alone indigeneity, could not factor into constructions of race. As we saw in the data structure above, the one-way flow of information perpetuated the assumption that nativity was derived from race, rather than being part of the sociopolitical construction of race. That race preceded nativity allowed for questions of nativity to be applied across the board to all racial categories symmetrically, rationalizing the geopolitical basis of nativity as a natural and neutral classification tied merely to place of birth. To put it bluntly, the United States census tacitly established that all races only became native by the process of naturalization, regardless of whether one descended from people sold and imported by the transatlantic slave trade, or indigenous people who were colonized, or the colonizers themselves. Furthermore, the conflation of “Black” and “Indian” into the same data construction process allowed for the erroneous imputation of the genealogical naturalization of African Americans onto Native Americans. Even though Native Americans were indigenous to the Americas, their conflation with African Americans justified treating them as foreign “others” who had to be similarly naturalized or assimilated (i.e., whitened). With the definition of nativity allowing “White” or “Black” to be as “Native” as “Indian,” the census was therefore able to sustain a white hegemony and continue to legitimize colonial practices, with the data structure of Hollerith’s machine encoding it as a naturalized and politically neutral process.
As such, data structures and algorithms cannot be considered apart from the sociopolitical processes that construct them and which they in turn construct. The insight drawn from the comparative relationship of age groups between “White” and “Mulatto” populations, for example, is inextricably tied to the data structure being programmed such that a person classified as “Mulatto” could never be in the “Native White/Foreign Parentage” category, even if they had a parent of, say, Dutch ancestry. As we considered above, elements of foreignness were filtered out by the algorithmic process of purifying whiteness. Through an examination of how data was formed and represented in the first census count of Hollerith’s Tabulating Machine, we can analyze the interpretive activities by which constructions such as race, nativity, and nationality were digitized and perpetuated. Furthermore, we see how notions of data as natural and neutral find anchor in conceptions of racial categories as a priori, not just in how a racial politics was programmed into census categories but also in how the data structures in census tabulations were inseparable from racial projects such as polygenic theories or the Chinese Exclusion Act. In essence, reading the sociopolitical constructions of data can illuminate both the ideologies that are encoded into the data and the ideologies by which “data” is conceived.
Data formation
If conceptions of race have been constructed in part by data, then conceptions of data and digital computation have also depended in part on racial discourse. The authority of punched holes and hardwired tree structures as data formations for defining and deploying racial constructs is, as we have considered, derived from its role in maintaining a scientifically justified white hegemony. In turn, that hegemony validates the forms of knowledge those data formations produce, authorizing a data supremacy that allows those formations to be templates for evaluating other sociopolitical constructs. Not only is race reducible to a series of true/false punched holes, but so are markers of gender, class, sexuality, etc. The ideology that categories of such markers of identity are fixed and distinctly identifiable is similarly perpetuated through the accumulation of data, sustaining the dominance of the hegemonic default around which those categories revolve. Even the notion that markers of identity are discrete expressions of individuals is derived from the technology of the punched card.
As Hollerith described in his 1887 patent, “each card when properly punched becomes a permanent record of the individual […] and can be filed away as such, or the several records so formed can be classified and distributed” (Hollerith 1889, 2, emphasis added). Before 1887, Hollerith’s machine was designed for long sheets of “record-strips,” where census information for multiple people would be punched onto one strip and then fed continuously into the machine for processing. The switch to individual punched cards as a medium for inscribing data was so that census clerks could reorder the same materials instead of punching new record-strips for tabulating different statistical results. More than efficiency, however, it also shifted the paradigm for information processing. Individual punched cards had been used previously for textile manufacturing (Jacquard loom) and later mathematical computations (the work of Charles Babbage and Ada Lovelace), but the encoding inscribed on those cards differed from census data. For textiles and computation, specific instructions were punched into individual cards (e.g., for raising/lowering the warp, adding, subtracting). Those cards were strung together to form a sequence of instructions (i.e., an algorithm) that were then fed into their respective machines to produce their respective outputs. With census data, however, what would be processed by Hollerith’s machine was not a sequence of instructions, but discrete sets of information corresponding to particular individuals.
In this way, the Tabulating Machine was not operated as performing the instructions of punched cards, but rather as an automatic reader of them. Sequentiality was less relevant for census tabulation since the digital format of punched holes veiled the instructions for categorization as “mere” information, allowing said information to be detached from its instructors. For Hollerith’s machine specifically, its punched cards were detached from the sociopolitical goals of population management based on scientific racism, demonstrating that it was merely simulating the work of multiple human tabulators efficiently instead of performing federally mandated instructions for racial categorization. According to Hollerith himself, “the record-card thus formed can be prepared at any time or place and by unskilled operatives” (Hollerith 1889, 2), suggesting that the information each record-card held was fixed and consistent regardless of circumstance. The dissolution of a permanent record’s temporality, materiality, and sociality (for an “unskilled” person could now just read it anytime and anywhere) unfastened data from its formative contexts, creating a “statistical reality” that legitimized data as a rational and objective basis for racial policymaking (Prewitt 2013, 515). In short, the digital format of Hollerith’s computational method did not just encode instructions for racial categorization as the presence/absence of punched holes, but, more vitally, encoded the ideology of scientific racial formation into a conception of data itself.
It would be a mistake, of course, to consider Hollerith’s punched card and that of the Jacquard loom as distinct, one being wholly informational and the other being wholly instructional. Instructions are informational and, more importantly, information is an encoded set of instructions. And yet, for census tabulation, information was presented as distinct from instruction, categories divorced from their methods of categorization. Since the census, as shown previously, was being used to test and validate scientific theories of race, the instructions for proving or disproving those theories (i.e., their criteria) were encoded as census categories. For theories that held races to be natural and permanent categories, as opposed to constructed and emergent from sociopolitical discourse, an individual’s race was self-evident apart from processes of categorization. The census category of “White” in 1880, for example, was attributed to anyone who was not “Black,” “Mulatto,” “Indian” (Native American), or “Chinese.” In other words, the construction of the “White” category was based on an instructed negation of race, as far as the enumerator could tell, reifying whiteness as an “unraced” default. To then turn those categorization processes into a “permanent record” not only validated and solidified those categories but more so the methods by which recording became permanent.
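The 1880 instruction of negation can be sketched as follows. The function and its input are hypothetical renderings; enumerators worked from perception, not a data field, and the sketch only makes the instructed logic explicit:

```python
# Classification by negation: "White" is whatever remains after the
# enumerated races are excluded, an instructed default rather than a
# positively defined category.

ENUMERATED_1880 = {"Black", "Mulatto", "Indian", "Chinese"}

def classify_1880(perceived_race):
    # Any positive identification falls into an enumerated category;
    # the absence of one defaults to "White".
    if perceived_race in ENUMERATED_1880:
        return perceived_race
    return "White"
```

Written out, the default branch is visible as code in a way the census schedule never made it: whiteness is the `else`.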
The idea that these permanent records could be “filed away as such” (Hollerith 1889, 2) also introduced posterity, which, coupled with the dissolution of data’s formative contexts, formed the foundation for constructing databases. Specific persons could be digitized and stored away, ready to be searched and sorted according to particular kinds of information. Consequently, the punched card became a technology of subjectification, a way of producing knowledge about an individual that was encoded as mere representation, each record “complete in itself” (Hollerith 1889, 2). Criteria such as race, gender, and class were not only reified as inherent and permanent attributes of a person, but also as normative categories a priori and wholly distinct from one another. Furthermore, the methods by which such criteria became a priori and distinct were validated and then justified by digitization, encoding processes of social construction as politically neutral data such that it could be used politically. In that way, permanent records representing discrete subjects could be readily surveilled and managed, evolving into paradigmatic foundations for information theory and cybernetics in the early twentieth century (Hayles 1999, 52–54).
It should be noted that Hollerith’s machine was not the first system that attempted to create individual records for sorting census data. For the 1885 Massachusetts census, the clerk Charles F. Pidgin developed a tabulation method using individual cards on which the categorical information of each person was written. This method made it so that tabulators did not have to handle schedules redundantly to tally basic classifications for multiple categorical combinations; rather, they were able to sort the cards into respective groups and tally statistics according to each group. The difference between Pidgin’s system and Hollerith’s, however, is the digital form of the latter. The punched holes gave categorical information a materiality, allowing it to be structured and sorted mechanically. To group a set of cards together, as Hollerith suggested, one could thread a sorting needle through a given category such that only the cards with that particular hole punched could be strung. Not only did it afford a pragmatic method for structuring data, but also a means for error-detection, for any card that was strung through a punched hole that did not correspond to the others would be misaligned. By digitizing categorical information into a material form, the punched card enabled the development of data structures on the basis of a normalizing uniformity, where all punched holes hold the same degree of information. In terms of race, a hole punched for “White” held the same degree of information as that for “Black,” the only difference between them being which side of the categorical line they fell on.
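Hollerith’s needle-sorting can be modelled in miniature. The abstraction below, cards as sets of punched positions, is an illustrative assumption and not the actual layout of the 1890 card:

```python
# Threading a needle through one punched position keeps only the cards
# punched there; a card lacking the hole blocks the needle and
# "misaligns", which is how mispunches surfaced during sorting.

def thread_needle(cards, position):
    """Split cards into those strung on the needle at `position`
    and those that misalign (i.e., lack that punched hole)."""
    strung = [card for card in cards if position in card]
    misaligned = [card for card in cards if position not in card]
    return strung, misaligned
```

The uniformity the paragraph describes is built into the model: every hole is just a set member, so a punch for “White” and a punch for “Black” are informationally identical to the needle.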
This conception of data—that it lies flat on a horizontal plane—has made it seem like databases “are collections of individual items, where every item has the same significance as any other” (Manovich 1999, 194). With this characterization, databases have been contrasted with narrative, the latter prioritizing sequence, the former organization. This analogy further dichotomizes instruction and information, perpetuating the concept of data’s natural uniformity. To assume that “every item has the same significance as any other” in a database, furthermore, is to assume that the ways by which those items are organized and signified are arbitrary and politically neutral. Assuming such has enabled databases and narratives to be analogized as “natural enemies” (Manovich 1999, 199) or, conversely, as “natural symbionts” (Hayles 2007, 1603). While Lev Manovich conceptualizes database and narrative in competition with each other, where “each claims an exclusive right to make meaning of the world” (Manovich 1999, 199), N. Katherine Hayles classifies them as “organisms of different species that have a mutually beneficial relation” (Hayles 2007, 1603). And though it might be productive to analyze the dynamic relationship between sequence and organization in data processing, the belief in their intrinsic distinction, that each has “an exclusive right” or that they are each a “different species,” implies and reinforces the primacy of data as a priori.
Consequently, databases are presented by scholars of new media, such as Hayles and Manovich, as requiring narrative to interpret or explain, forgetting the interpretive and instructive processes that go into constructing and structuring data. This encoding of data as mere information, with each datum having “the same significance as any other,” is inherited from the process of digitization in Hollerith’s punched card. To summarize and reiterate, the punched hole in Hollerith’s census tabulation system embodied a double signification and collapsed two kinds of meanings through digitization: it is a binary indication of categorical information, and it is a method by which items are organized. Specifically, it encoded organizational principles from the polygenic ideologies of scientific racism by which the census derived racial classifications. The forms through which the punched hole generated meaning for the census, then, were the ways in which race had been formed to legitimize a sociopolitical system based on white hegemony. As such, digital information was imbued with authority based on population management, enabling data to reign supreme as a technology that maintains organizational structures derived from scientific racial formations.
Conclusion
If data formation is so inextricable from racial formation, then the logics of white supremacy that undergird early computational systems institute a data supremacy as well. Data captures and is captive to its own context, the contours of which outline the categories of data as well as their construction. For the 1890 United States census, its data bears the contours formed by the racialization of sociopolitical statuses from the birth of the nation, legitimized on the biological and anthropological bases by which races were argued to be natural and distinct, whiteness being the scientific control, the dominant default. That racial categories were natural and distinct allowed for the social constructions of race to be presented as categorization processes that were rational and politically neutral. These processes and their justifications were then digitized by Hollerith’s Tabulating Machine, which reified racial categories and their hegemonic formations through the data structures of punched cards and electromechanical circuitry. Due to its efficacy and efficiency for census tabulation, the machine’s digitization and automation processes not only validated the resultant data for cost-effective population management but, more importantly, encoded principles of race-based population management into subsequent data collation and computational methodologies.
As previously discussed, these methodologies perpetuate systemic inequalities by linking particular sociopolitical contexts to data structures and then dissolving those links by positing the naturality and neutrality of data and computation. Colin Koopman has examined how the practice of redlining stems from the professionalization of real estate work and the profession’s dissolution of the ties between real estate data and racial profiling (Koopman 2019). On another level, Ruha Benjamin (Benjamin 2019) has investigated how contemporary technologies such as facial recognition and virtual reality vocational training software encode racist processes by utilizing past datasets with embedded inequalities, similar to the findings of Obermeyer et al. (Obermeyer et al. 2019). The problematics of data have also manifested in the field of digital humanities through debates on the efficacy of quantification and data-based analysis for humanistic research, with scholars such as Johanna Drucker, Moya Bailey, and Anne Cong-Huyen practicing digital work as interpretive and meaning-making activities so as to reattach digitality to its material contexts.
Although this work is only one among many in showing how systemic injustices and inequalities can be perpetuated by digital technologies, its focus on punched cards and electromechanical circuitries as objects of analysis posits the value of understanding the constitutive role that digitality has in specific forms of knowledge production. For we have seen that digital technology is not merely a medium through which systemic inequalities are maintained, but is also part of how those inequalities are constructed and legitimized. The conflation of instruction and information in the punched card, for example, enabled the expression of the digital paradigm as a technology of subjectification, a way of producing knowledge about an individual that was encoded as essential and mere representation. Furthermore, the materiality of categorical information in the form of punched holes enabled the sorting and structuring of digitally representable individuals; and since instruction was encoded as information, the sorting and structuring methods themselves were similarly argued to be rational and politically neutral. Data structures and their corresponding algorithms, in other words, proliferated digital modes of knowledge production and subjectification by way of racial organization. What once directed machines to perform certain tasks became the means by which subjects were mechanized to maintain specific sociopolitical hierarchies.
What the census constructs, then, instead of some objective photograph that depicts reality, is a tableau, where actors are directed to sit and stand in certain ways, and a scene is painted or captured by a camera. Automating the tableau construction conceals direction and colouring processes so much so that we forget the actors behind the scenes, how they held a posture for a time, gazed at something or nothing, still and silent and statuesque. And what is produced looks so lifelike and colourful, as if it really were just a snapshot of the people in the city, their natural forms and conditions of living. At the same time, the chosen colour palette is subsumed to “purify” the image, presenting the image as visual data in need of interpretation; the coloured lines are simply an expression of the data, itself an expression of reality. In turn, we are coloured (or uncoloured) by the image, the structures and algorithms of colouring merely matters of aesthetics, definition, programming.
Remembering it as a tableau, however, we pay attention to how the lines are drawn, the actors directed, the colours seen and painted. To consider data or digital programs in a similar fashion is thus to read their structures and algorithms constructed in sociopolitical contexts. To be sure, these data structures and algorithms form part of the foundation for developing the logics and grammars of computer programming languages. And these languages that build our contemporary technologies, what we now refer to colloquially as code, are themselves undergirded by circumstances not unrelated to those we have considered in this essay. Why, for instance, are programming languages primarily anglophonic? How might military command structures have shaped the organizational principles of these languages? Or what about the gendered history of the term “computer” as applied to computation? Pragmatically speaking, we inherit these frameworks and paradigms, reproducing them in the technologies we build and use. Therefore, these technologies are our systems of cultural memory, by which we encode patterns of what is forgotten, what matters, what counts.
Competing interests
The author has no competing interests to declare.
Contributions
Editor-in-Chief
Daniel O’Donnell, The Journal Incubator, University of Lethbridge, Canada.
Translation Editor
Davide Pafumi, The Journal Incubator, University of Lethbridge, Canada.
Copy and Production Editor
Christa Avram, The Journal Incubator, University of Lethbridge, Canada.
Section, Copy, and Layout Editor
A K M Iftekhar Khalid, The Journal Incubator, University of Lethbridge, Canada.
References
Anderson, Margo J. 2015. The American Census: A Social History, 2nd ed. Yale University Press.
Appiah, Kwame Anthony. 1985. “The Uncompleted Argument: Du Bois and the Illusion of Race.” Critical Inquiry 12 (1): 21–37. Accessed November 2, 2025. https://www.jstor.org/stable/1343460.
Benjamin, Ruha. 2019. Race After Technology: Abolitionist Tools for the New Jim Code. Polity.
Black, Edwin. 2001. IBM and the Holocaust: The Strategic Alliance Between Nazi Germany and America’s Most Powerful Corporation. Crown Books.
Brodkin, Karen. 1998. How Jews Became White Folks & What That Says About Race in America. Rutgers University Press.
Browne, Simone. 2015. Dark Matters: On the Surveillance of Blackness. Duke University Press.
Drucker, Johanna. 2014. Graphesis. Harvard University Press.
Fickle, Tara. 2019. The Race Card: From Gaming Technologies to Modeling Minorities. New York University Press.
Gitelman, Lisa, and Virginia Jackson. 2013. “Introduction.” In “Raw Data” Is an Oxymoron, edited by Lisa Gitelman. MIT Press.
Gotanda, Neil. 2013. “A Critique of ‘Our Constitution Is Color-Blind.’” In Critical Race Theory: The Cutting Edge, 3rd ed., edited by Richard Delgado and Jean Stefancic, 35–37. Temple University Press.
Guglielmo, Thomas. 2000. White on Arrival: Italians, Race, Color, and Power in Chicago, 1890–1945. Oxford University Press.
Hayles, N. Katherine. 1999. How We Became Posthuman: Virtual Bodies in Cybernetics, Literature and Informatics. University of Chicago Press.
Hayles, N. Katherine. 2007. “Narrative and Database: Natural Symbionts.” PMLA 122 (5): 1603–1608. Accessed October 27, 2025. https://www.jstor.org/stable/25501808.
Hollerith, Herman. 1889. “Art of Compiling Statistics.” United States Patent Office, Letters Patent No. 395782.
Kim, Claire Jean. 1999. “The Racial Triangulation of Asian Americans.” Politics & Society 27 (1): 105–138. Accessed October 27, 2025. http://doi.org/10.1177/0032329299027001005.
Koopman, Colin. 2019. How We Became Our Data. University of Chicago Press.
Kukutai, Tahu, Victor Thompson, and Rachel McMillan. 2014. “Whither the Census? Continuity and Change in Census Methodologies Worldwide, 1985–2014.” Journal of Population Research 32 (1): 3–22. Accessed October 27, 2025. http://doi.org/10.1007/s12546-014-9139-z.
Lee, Stacey J. 1996. Unraveling the “Model Minority” Stereotype: Listening to Asian American Youth. Teachers College Press.
López, Ian F. Haney. 2013a. “The Social Construction of Race.” In Critical Race Theory: The Cutting Edge, 3rd ed., edited by Richard Delgado and Jean Stefancic, 238–248. Temple University Press.
López, Ian F. Haney. 2013b. “White by Law.” In Critical Race Theory: The Cutting Edge, 3rd ed., edited by Richard Delgado and Jean Stefancic, 775–782. Temple University Press.
Manovich, Lev. 1999. “Database as a Symbolic Form.” Convergence 5 (2): 80–89. Accessed October 27, 2025. http://doi.org/10.1177/135485659900500206.
Middleton, Stephen. 2016. “The Battle Over Racial Identity in Popular and Legal Cultures, 1810–1860.” In The Construction of Whiteness: An Interdisciplinary Analysis of Race Formation and the Meaning of a White Identity, edited by Stephen Middleton, David Roediger, and Donald M. Shaffer, 11–43. University Press of Mississippi.
Morning, Ann. 2002. “New Faces, Old Faces: Counting the Multiracial Population Past and Present.” In New Faces in a Changing America: Multiracial Identity in the 21st Century, edited by Loretta I. Winters and Herman L. DeBose, 41–67. SAGE Publications.
Nakamura, Lisa. 2002. Cybertypes: Race, Ethnicity, and Identity on the Internet. Routledge.
Noble, Safiya. 2018. Algorithms of Oppression: How Search Engines Reinforce Racism. New York University Press.
Nobles, Melissa. 2000. “History Counts: A Comparative Analysis of Racial/Color Categorization in US and Brazilian Censuses.” American Journal of Public Health 90 (11): 1738–1745. Accessed October 27, 2025. http://doi.org/10.2105/ajph.90.11.1738.
Obermeyer, Ziad, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. 2019. “Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations.” Science 366 (6464): 447–453. Accessed October 27, 2025. http://doi.org/10.1126/science.aax2342.
Omi, Michael, and Howard Winant. 2014. Racial Formation in the United States, 3rd ed. Routledge.
Phan, Thao, and Scott Wark. 2021. “Racial Formations as Data Formations.” Big Data &amp; Society 8 (2). Accessed October 27, 2025. http://doi.org/10.1177/20539517211046377.
Prewitt, Kenneth. 2013. “Racial Classification in America: Where Do We Go from Here?” In Critical Race Theory: The Cutting Edge, 3rd ed., edited by Richard Delgado and Jean Stefancic, 511–521. Temple University Press.
Roediger, David. 2005. Working Toward Whiteness: How America’s Immigrants Became White. Basic Books.
Snipp, C. Matthew. 2003. “Racial Measurement in the American Census: Past Practices and Implications for the Future.” Annual Review of Sociology 29: 563–588. Accessed October 27, 2025. https://www.jstor.org/stable/30036980.
Spencer, Stuart. 2001. “IBM and the Third Reich.” The Lancet 358 (9292): 1558. Accessed October 27, 2025. http://doi.org/10.1016/S0140-6736(01)06557-6.
Thompson, Debra. 2016. The Schematic State: Race, Transnationalism, and the Politics of the Census. Cambridge University Press.
Truesdell, Leon E. 1965. The Development of Punch Card Tabulation in the Bureau of the Census. U.S. Government Printing Office.
Tuan, Mia. 1998. Forever Foreigners or Honorary Whites? The Asian Ethnic Experience Today. Rutgers University Press.
Uba, Laura. 2002. A Postmodern Psychology of Asian Americans: Creating Knowledge of a Racial Minority. State University of New York Press.