Mark Olsen
ARTFL Project
University of Chicago
mark@barkov.uchicago.edu
Abstract
It is well documented that men and women use informal language, such as conversation and correspondence, in rather different ways, reflecting a wide variety of cultural forces and practices. There have been relatively few attempts to examine gender differences in more formal, published writing. Using two corpora, each of 300 French literary texts, by male and female authors, balanced by time period and genre and written between the 16th and 20th centuries, this study examines rates of word use and common lexical contexts associated with particular terms in an effort to isolate some characteristics of female writing in this time period. While there are clear differences in the words used by male and female authors not explained by genre or time period, it would appear that the meaning of these words was not significantly altered. It is suggested that these characteristics of earlier practices of feminine writing are indicative of the conscious agency of early women writers in establishing a distinct female literary voice and may inform future proponents of "écriture féminine".
Introduction: Dangerous Words
Some 25 years ago, Hélène Cixous provocatively anticipated a distinctly female practice of writing. She declared that écriture féminine would be marked by characteristics which challenge the logic of writing within the "phallocentric" tradition, by its focus on the female body, glorying in a femininity too long repressed, and breaking up received truth through laughter. Her argument, based on her view of the poetic, implied a form of "false consciousness" in that not all women would, or even could, produce texts from this alternative practice. Indeed, she attacked the history of women's writing discounting "the immense majority [of women writers] whose workmanship is in no way different from male writing"1. Further, she declared that écriture féminine cannot be defined or identified outside of itself:
It is impossible to define a feminine practice of writing, and this is an impossibility that will remain, for this practice can never be theorized, enclosed, encoded, coded -- which doesn't mean that it doesn't exist. But it will always surpass the discourse that regulates the phallocentric system: it does and will take place in areas other than those subordinated to philosophical-theoretical domination. It will be conceived of only by subjects who are breakers of automatisms, by peripheral figures that no authority can ever subjugate.2
A generation of French and American feminist critics have addressed, in a surprising variety of ways, Cixous' declaration that writing and knowledge are "embodied". In a recent article echoing Cixous, Michelle Kendrick writes,
The hypertext men, I argue, "galvanize" a tradition of writing that assumes the primacy of masculine reason, as instantiated in rhetorics of the "mind," where mind is defined outside of materiality and distinct from embodiment.3
Cixous's call for a distinctive feminine writing appears to be based on two questionable propositions: that écriture féminine cannot be defined outside of its own terms and that few women authors may be said to participate in this unique practice, effectively discounting the history of women's writing.
It is unfortunate that so influential a declaration would be auto-marginalized by positing its own epistemological, social, and historical rootlessness. While Cixous may be right in suggesting that a practice of women's writing would be hard to identify, she pushes her attack on "phallocentric systems" of knowledge -- male rationality and binary logic -- to such an extreme that she invalidates any attempt to examine the characteristics of any écriture féminine. More inexplicably, her conviction of the overarching power of phallocentric systems leads her to discount the "other voice" present in women writers, in France and other regions of Europe, from the late medieval period to the present.4 I would argue, however, that any putative feminine practice of writing should, at a minimum, be identifiable as recessive traits in the literary production of women who predate Cixous's declaration and that these traits may be detected using systematic methodologies.
Within important limitations of the available sample of women's literary texts and the methodologies employed, this paper suggests that there are clear and systematically identifiable differences between the literary writings of male and female authors in France between the 16th and the 20th centuries. These distinctions are largely noted at the level of word selection rather than in significantly different use of particular words or themes. This finding may imply an important element of agency in the creation of an identifiable female writing space in French during this period by use of words whose meanings are not significantly altered. Marie de Gournay, in her 1641 Grief des dames, expressed this level of agency when, in complaining of men who refuse to read or acknowledge the "woman writing," she argued that women "will be able to treasure up some dangerous words with an impeccable pedigree" and be able to turn "the tables on them, because [they] have heard them and [have] read the things that come from their hands."5 If there are identifiable female practices of writing established in the early modern period and later, it would appear that these traditions are both grounded in the history of literary culture and would be an important component of Cixous' prospective écriture féminine.
Background
There is a considerable body of recent work examining the impact of gender on language often demonstrating important, even critical, distinctions in male and female use of language.6 While much of this work has been restricted to less formal forms of communication -- conversation, e-mail, and student essays -- there is some evidence that gender is an important discriminant in more formal literary texts. Minna Palander-Collin, for example, finds in her study of 17th century private letters that there are marked differences between female and male writing, suggesting that the women's letters are more interactional, personal and "involved" than letters by men, which are common features of women's communication in Present-Day English.7 These characterizations are based on identification of specific features and words. Thus, following the work of Biber and Finegan, she suggests that,
involvement refers to linguistic features that show interaction between the speaker/writer and the listener/reader [...] private verbs (e.g., assume, believe, doubt, find, guess, know, suppose and think) and the first personal singular pronoun are important markers of involved style, among other features.8
Private letters, she concludes, tend to be more involved and personal, but gender differences can be detected.
These distinctions may extend into more formal published literature. In a ground-breaking recent study, Koppel, Argamon and Shimoni have used a wide variety of simple lexical and syntactic features to detect significant differences in published fiction and non-fiction texts by male and female authors in the British National Corpus (BNC). Combining document classification techniques and methods drawn from author attribution, the system is able, once trained on sets of documents where the gender of the author is known, to infer the gender of the author of an unseen document with approximately 80% accuracy. The classification system achieved moderately better performance for works of fiction than nonfiction.9 These impressive results are based on the identification of surprisingly simple feature sets.
[W]e begin with a very large set of lexical features that were chosen solely on the basis of their being more-or-less topic-independent. The features include a list of 405 function words and a list of n-grams of part-of-speech [...] and punctuation marks.10
The learning algorithm is optimized to reduce the necessary features to a relatively small number. The authors broadly characterize male features "as noun specifiers (determiners, numbers, modifiers)" and female features as "mostly negation, pronouns, and certain prepositions."11 Indeed, the authors suggest that very few features are required to identify gender of author, writing,
The extent to which frequencies of a small number of features can be parlayed into effective categorization is illustrated by the following fact: of the 58 documents in which the appears with frequency < 408 and herself appears with frequency > 5, all but two are by females.12
The evident success of text categorization techniques in classifying modern literary texts by gender of author suggests that there are gendered practices of writing and that these gendered traditions are grounded in the history of literary culture.
In the early 1990s, I attempted to examine the question of écriture féminine using the ARTFL database, only to be confronted by the very significant gender bias of the Frantext database as it was then constituted, concluding that the sample of texts by women authors (only 3.8% of the titles) was too limited to allow for useful comparisons.13 This limitation led directly to an ongoing effort to digitize a collection of French literary texts by women, ARTFL's French Women Writers Project14 to redress the gender bias of the collection. One of the conclusions drawn from this initial study was that the gender bias in the data used to compile a massive and "definitive" dictionary is itself an important example of one mechanism of how patriarchal language may be propagated and authorized.15
My initial studies of gender representation in early modern and modern French -- based exclusively on male writers describing the feminine -- produced some striking examples of long-term shifts in the use and meaning of common gender terms, such as femme.*. As shown in the following table,16 age categorization of women -- the collocation of young/old with forms of femme -- becomes one of the most notable patterns only towards the end of the 18th century, reflecting both the rise of the romantic novel and a new politics of desire.
Ranking of jeune and vieille as left collocates of femme.* by standard deviation in the ARTFL database. The smaller the rank, the more significant the collocate.
Period | jeune: Rank (std. dev.) | vieille: Rank (std. dev.) |
1600-49 | 107 (7.8) | 141 (6.5) |
1650-99 | 64 (8.9) | 20 (15.6) |
1700-49 | 53 (12.1) | 21 (17.0) |
1750-99 | 48 (17.2) | 16 (25.1) |
1800-49 | 6 (49.6) | 8 (46.7) |
1850-99 | 1 (133.5) | 4 (59.0) |
1900-49 | 2 (110.1) | 4 (72.3) |
Equally important are long-term continuities, such as the collocation of femme with possessives, suggesting that the semantic field of the feminine begins with putting her "in her place," being possessed by a male.17 Cixous' observation that "woman has always functioned 'within' the discourse of man" simply because women must express themselves in "the language of men and their grammar"18 is a position that is certainly implied by my initial study and needs to be taken into account when characterizing earlier examples of feminine writing.
It is disconcerting to report that while we have been able to create a modest sample of French women's writing from collections held by the ARTFL project and other sources -- to be discussed in the next section -- it would appear that the publishers of the largest French electronic text collections systematically overlook women writers. Of the 936 documents in the BASILE collection from Editions Champion, only 32 texts (3.4%) are by women authors. Similarly, only 3.9% (109 texts) of the Bibliothèque des Lettres collection by Editions Bibliopolis are by women writers, the vast majority (78 texts) of which are found in one collection, Autour du romantisme: le roman, 1792-1886, which consists almost exclusively of the works of one author, George Sand. While the situation is somewhat better among electronic publishers of English literature,19 it does appear that electronic publishers are still not addressing women's writing, an area where several academic electronic publishing projects,20 such as the Brown Women Writers project, have made considerable and laudable progress that should be supported in future.
Samples and Methodology
Given the limitations of available texts by women writers
in ARTFL's holdings, including those texts from external
collaborations, this study is based on the creation of
two samples of 300 texts roughly balanced by genre, collection
and time period driven by texts by French women writers.
The female texts are taken from the following collections.21
TLF/Frantext | 60 texts |
ARTFL FWW | 99 texts |
BASILE (Champion) | 32 texts |
Bibliothèque des Lettres (Bibliopolis) | 109 texts |
Genre | Female | Male |
Conte (Fable) | 30 | 29 |
Correspondance | 15 | 15 |
Histoire* | 17 | 0 |
Melanges litteraires | 18 | 18 |
Memoires | 9 | 8 |
Poesie | 17 | 10 |
Recit de Voyage | 5 | 3 |
Roman | 152 | 155 |
Theatre | 6 | 8 |
Traite ou essai* | 21 | 36 |
All of the documents denoted as Histoire* are texts in the history of science by Helene Metzger, which were balanced by documents classified as Traite ou essai*. It is important to note that genre classifications do vary by data provider and are subject to some disagreement.
The resulting corpora show some additional important skewing. As shown in the following table, the collection is strongly skewed to nineteenth century texts, owing to the predominance of romantic novelists in the available collection of female writers.
Authors (words) | 1500-1599 | 1600-1699 | 1700-1799 | 1800-1899 | 1900-1960 | Total |
Female | 363,140 | 2,152,653 | 2,784,015 | 12,055,319 | 1,198,012 | 18,553,139 |
Male | 165,785 | 3,152,021 | 5,529,055 | 15,450,265 | 2,714,009 | 27,011,135 |
Differences in text size may be due to selection criteria of particular projects/publishers in creation of the electronic texts or may reflect the fact that works by women were comparatively shorter (by number of words). Such a generalization must be balanced by the counter example of Madeleine de Scudéry's baroque novel Artamène ou Le Grand Cyrus (1649-53) which spanned some 30 volumes, 13,000 pages, and almost 2 million words.23
More problematic is the huge over-representation of the works of George Sand -- Amandine-Aurore-Lucile Dupin -- in the sample of women writers compared to any other author. The collection contains some 77 texts (7.44 million words) by Sand, of which 71 are novels (6.6 million words). By contrast, there are only 38 nineteenth century novels (2.7 million words) not by Sand. Sand doubtlessly played an important, if long underestimated, role in the evolution of the novel. Sand further questions, in her novels, both the sexual identity and gender destinies of her characters, as much as she did in her own life, a particularly salient element of an investigation of the characteristics of feminine writing. The towering achievements of Sand as the epitome of the successful woman writer in 19th century France make it natural that collections will tend to focus on her, possibly to the detriment of other worthy authors. Until a more balanced corpus can be established, I have generally tested the impact of Sand's works for any effects. In the examples below, Sand is usually reasonably close to other female novelists of the 19th century. I do indicate some distinctions -- "the Sand effect" -- which are described as required.
The following discussion is based on simple methodologies, frequencies of words as rates per 10,000 words, ratios of rates of use by female and male authors, and simple analysis of collocations. As part of this study, I extended the real-time frequency and timeseries reporting available under some implementations of PhiloLogic24 to handle any metadata with full SQL support to generate tables that allow simple comparisons between time periods as well as other criteria. For example, searching the female author timeseries analyzer for the pattern "aim.* ami[acest].* amour.*" (the space acts as an OR) generates a table with each matching word, frequency and rate per 10,000 per selected period, a total column for each word and a final total row that resembles the following extract (many rows deleted).25
Word | 1500-99 | Rate | 1600-99 | Rate | 1700-99 | Rate | 1800-99 | Rate | 1900-60 | Rate | Total | Rate |
aimable | 4 | 0.110 | 1043 | 4.845 | 1037 | 3.724 | 1331 | 1.104 | 73 | 0.609 | 3488 | 1.880 |
aimer | 67 | 1.845 | 727 | 3.377 | 992 | 3.563 | 3523 | 2.922 | 243 | 2.028 | 5552 | 2.992 |
Word Totals | 1819 | 50.090 | 10232 | 47.532 | 12735 | 45.743 | 44270 | 36.722 | 3889 | 32.462 | 72945 | 39.316 |
The analyzer allows for selection of particular date ranges, typically quarter centuries or decades depending on the database. Comparisons between male and female usage are calculated as the ratio of rates of occurrence of words (or total word lists). Thus, the total female rate for this pattern is 39.32, while the male total rate is 23.89, giving a ratio of 1.65. The system supports generation of tables to compare rates of word usage in arbitrary selections of documents. Thus, the pattern "aim.* amour.*" is found at a rate of 29.8/10000 in the novels by George Sand and 33.5/10000 in 19th century novels by female authors excluding Sand. The collocation analyzer in PhiloLogic functions in real time, counts a user specified span of words to the right and left of a pole word and displays the results as a table of decreasing frequencies of the left, right, and both sides. Results can be filtered to remove function words. In this study, I am using a modified filter file26 which does not include some function words, such as pronouns and possessives, given their importance as suggested by my earlier work and Koppel et al.
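The pattern search and rate calculation described above can be illustrated with a short Python sketch. This is not PhiloLogic's implementation: the function names are hypothetical, and the sketch assumes that each space-separated sub-pattern is a regular expression that must match a whole word. The sample word list is chosen for illustration; the frequency and corpus size are taken from the tables above.

```python
import re

FEMALE_WORDS = 18_553_139  # total words in the female sample (size table above)


def compile_pattern(pattern):
    """Treat each space-separated sub-pattern as an alternative (OR).

    Assumption: a sub-pattern has to match the whole word, not a substring.
    """
    alternatives = "|".join(f"(?:{part})" for part in pattern.split())
    return re.compile(rf"^(?:{alternatives})$")


def rate_per_10k(frequency, corpus_size):
    """Occurrences per 10,000 words of running text."""
    return 10000 * frequency / corpus_size


matcher = compile_pattern("aim.* ami[acest].* amour.*")
words = ["aimable", "aimer", "amitié", "amour", "amoureux", "âme", "homme"]
matched = [w for w in words if matcher.match(w)]

# Rate for "aimable" in the female sample (3488 occurrences in total).
aimable_rate = rate_per_10k(3488, FEMALE_WORDS)

# Female/male ratio of the total rates reported for the whole pattern.
ratio = 39.32 / 23.89

print(matched)
print(f"{aimable_rate:.3f} per 10,000; female/male ratio {ratio:.2f}")
```

A ratio above 1 indicates that the word or pattern is more frequent, relative to corpus size, in the female sample; a ratio below 1 indicates the reverse.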
Pronouns and Possessives
Examination of pronouns and possessives is an effective means to
depict the situation of the communicative act, in speech or literary
writing, by establishing the perspective and gender of actors.
The following table compares the female/male ratios of rates of
pronouns and of those possessives that are unambiguously
related to the subject, unlike son and sa, which
can bind to either il or elle.
Words | Female/Male ratio |
je j' je m' ma me mes moi mon | 1.54 |
tu toi t' tes ta ton | 1.30 |
il | 1.02 |
elle | 1.53 |
nous nos notre | 1.02 |
vos votre vous | 1.52 |
ils | 0.80 |
elles | 0.99 |
the French lexicon is divided into "masculine" and "feminine" [...] even though the meanings of words in each grammatical gender category cannot be linked to social gender in any general way.27
Il and elle may refer to things like hospitals and tables as well as to a man or a woman.
Examination of the chronological breakdowns of pronoun
and possessive use28 does not reveal any long
term patterns in female/male ratios of use, but does show considerable
variation between periods. The following table
shows the ratio of use falling from 3.45 to 1.38 between
two consecutive centuries, probably reflecting variations in period
styles, genres, or even the selection of particular authors.
Authors | 1500-99 | Rate | 1600-99 | Rate | 1700-99 | Rate | 1800-99 | Rate | 1900-60 | Rate | Total | Rate |
Female | 12429 | 342.26 | 87453 | 406.26 | 165964 | 596.14 | 495704 | 411.19 | 53325 | 445.11 | 814875 | 439.21 |
Male | 4130 | 249.12 | 37110 | 117.73 | 238366 | 431.11 | 439756 | 284.63 | 48569 | 178.96 | 767931 | 284.30 |
Ratio | | 1.37 | | 3.45 | | 1.38 | | 1.44 | | 2.49 | | 1.54 |
As noted above, the distinct chronological sampling biases limit confidence in long term temporal shifts in use rates, but it is useful to note that this distinction, as well as those in rates of use of elle and vous/vos, are marked in all periods.
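The per-period ratios in the table above can be recomputed from the raw counts. The following Python sketch is illustrative only; it assumes that the denominators are the per-period word totals given in the corpus size table, an assumption that reproduces the printed rates. All variable and function names are hypothetical.

```python
PERIODS = ["1500-99", "1600-99", "1700-99", "1800-99", "1900-60"]

# Words per period in each sample (from the corpus size table).
CORPUS_WORDS = {
    "Female": [363140, 2152653, 2784015, 12055319, 1198012],
    "Male": [165785, 3152021, 5529055, 15450265, 2714009],
}

# Occurrences of the pronoun/possessive group per period (table above).
OCCURRENCES = {
    "Female": [12429, 87453, 165964, 495704, 53325],
    "Male": [4130, 37110, 238366, 439756, 48569],
}


def rates(counts, sizes):
    """Rate per 10,000 words for each period."""
    return [10000 * count / size for count, size in zip(counts, sizes)]


female = rates(OCCURRENCES["Female"], CORPUS_WORDS["Female"])
male = rates(OCCURRENCES["Male"], CORPUS_WORDS["Male"])

# Female/male ratio of rates, period by period.
ratios = {period: f / m for period, f, m in zip(PERIODS, female, male)}

for period, ratio in ratios.items():
    print(f"{period}: {ratio:.2f}")
```

Because the ratio is taken between rates rather than raw counts, it is not distorted by the differing sizes of the two samples in each period, although it remains sensitive to which authors and genres dominate a given period.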
Not surprisingly, genre, as opposed to strict chronology, plays
a stronger role in the gendered use of pronouns and selected possessives.
The following table, comparing ratios and rates of use for
texts with and without novels (the latter excluding, of course, almost
all of George Sand's works), shows that the female preference
for je/me and vous/vos increases in texts aside from novels while
the female preference for elle remains stable.
Words | Ratio without Novels | Ratio with Novels | Rate without Novels | Rate with Novels |
je j' je m' ma me mes moi mon | 2.10 | 1.54 | 403.0 | 439.2 |
tu toi t' tes ta ton | 1.05 | 1.30 | 26.96 | 43.55 |
elle | 1.56 | 1.53 | 73.0 | 97.7 |
vous vos votre | 2.59 | 1.52 | 158.3 | 146.95 |
It is also important to note that the female rates of use of je/me and vous/vos (per 10,000) remain relatively stable in novels and non-novels, reflecting a general preference for a personal orientation in writing. By contrast, the ratio for tu/ton/toi declines to near equality. One would be led to suspect that the moderate female preference for the informal, singular you is thus a characteristic of the novelistic form as expressed by women authors. This is, however, almost certainly a "Sand effect". As shown in the following table, George Sand shows a more marked preference for the informal you than other 19th century female novelists, as well as all other 19th century female writers.
All George Sand Works | 58.24/10000 |
19th Century novels excluding Sand | 45.31/10000 |
19th Century works excluding Sand | 40.19/10000 |
Comparing rates of these pronouns and possessives in non-fiction
-- traité, essai, or histoire, containing 36 female titles and 38
male titles -- indicates that the more personal preference
persists. The following table shows that while the rates of use
decline dramatically from the more general rates, female/male
ratios indicate that the female preference for je/me and vous/vos
remains marked.
Words | Female Rate | Male Rate | Ratio |
je j' je m' ma me mes moi mon | 100.4 | 64.6 | 1.55 |
vous vos votre | 48.1 | 20.2 | 2.38 |
elle | 47.24 | 41.6 | 1.14 |
It is interesting to note that the ratio of female to male use of elle declines to near equivalence, suggesting that the female preference for elle is driven by selection of subjects, an option less available in non-fiction.
Use of selected pronouns and possessives suggests that female authors show a marked preference for a more personal and interactive style over time and across different genres. The selection of elle is also a marked feature over time and for most genres with the exception of non-fiction.
Homme et femme
The female preference for female subjects is also indicated in differential rates of use of homme.* and femme.*. The following tables show that both male and female authors refer to homme and its variants more often than femme.*. For the entire period, women authors, however, refer to homme 1.45 times as frequently as to femme, while male authors refer to homme 1.83 times as frequently as to femme. While women are not invisible in the writing of the male and female authors of this period, there clearly are some gendered differences in their appearances.
Female authors | 1500-99 | Rate | 1600-99 | Rate | 1700-99 | Rate | 1800-99 | Rate | 1900-60 | Rate | Total | Rate |
femme.* | 1033 | 28.45 | 1591 | 7.39 | 3107 | 11.16 | 15798 | 13.1 | 1119 | 9.34 | 22648 | 12.21 |
homme.* | 1213 | 33.4 | 2431 | 11.29 | 4184 | 15.03 | 23534 | 19.52 | 1494 | 12.47 | 32856 | 17.71 |
Ratio (homme/femme) | | 1.17 | | 1.53 | | 1.35 | | 1.49 | | 1.36 | | 1.45 |
Male authors | 1500-99 | Rate | 1600-99 | Rate | 1700-99 | Rate | 1800-99 | Rate | 1900-60 | Rate | Total | Rate |
femme.* | 185 | 11.16 | 2412 | 7.65 | 6955 | 12.58 | 21466 | 13.89 | 1372 | 5.05 | 32390 | 11.99 |
homme.* | 306 | 18.46 | 6227 | 19.75 | 12412 | 22.44 | 35623 | 23.06 | 4669 | 17.2 | 59237 | 21.93 |
Ratio (homme/femme) | | 1.65 | | 2.58 | | 1.78 | | 1.66 | | 3.4 | | 1.83 |
Rates per 10,000 words, in novels and in all other texts:
Authors | Word | Novels | Not novels |
Female | femme.* | 13.06 | 10.74 |
Male | femme.* | 14.96 | 7.31 |
Female | homme.* | 16.83 | 19.10 |
Male | homme.* | 23.03 | 20.17 |
Surprisingly, female authors refer to men somewhat less frequently in novels while male authors refer to men slightly more in novels. These distinctions are not due to the works of Sand, since the rate of femme.* in female novelists excluding Sand, some 81 documents, is actually higher, 13.76 per 10,000 words. Thus, male authors are far more likely to discuss women in novels than in other forms of literary writing.
As has been frequently noted,29 French does not have a distinction between woman and wife. Adding husbands to the list of terms does not have a significant impact. Husbands (mari/maris) are found somewhat more frequently in texts by female authors (2.57/10000) than in those by male authors (2.08/10000). But the relatively low frequencies in both male and female authors do not change the markedly different man/woman ratios shown above, which rise to 2.0 in male authors and 1.66 in female authors.30
In my earlier studies I noted the consistent location of "femme" in a grid of possession (sa, ma, votre, etc.) reflecting dual meaning of femme as woman and wife. The "rates of possession" as depicted in the following table by male and female authors are very similar.
Word | Male | Female |
sa | 3687 | 2701 |
ma | 1434 | 837 |
ses | 290 | 206 |
leurs | 281 | 136 |
ta | 280 | 155 |
votre | 244 | 335 |
Num Possess. | 6216 | 4320 |
Num femme.* | 32428 | 22739 |
Rate (percent) | 19.2% | 19.0% |
Using the same calculation, male and female authors show only minor differences in possession of husbands. Possessives collocate within two words to the left of mari/maris 73.8% (3,525/4,774) of the time in female authors and 67.8% in male writers.
A Global Word Comparison
We have seen that gender differences in style and subject selection can be identified using very simple measures on the most common of words. A global comparison of word frequencies also shows marked distinctions between the writing of male and female authors in this time period. Following methods used in authorship studies which are premised on the examination of highly frequent textual features that may suggest the "unconscious" style of an author, Koppel et al. based their examination on the frequency of function words and other common features.32 For the purposes of this study, however, comparison of rates of use of some "content words" shows important distinctions between male and female writing in this period which form an important part of the creation of an early feminine writing.
The collection of 300 documents by women writers contains 137,093 types and 18,623,547 tokens, while the male comparison collection has 226,576 types and 27,026,833 tokens. To compare the global frequencies, I selected words occurring at a rate greater than 0.9 per 10000 words in the collection of female authors, comparing the rate per 10000 of each word to the corresponding entry in the male sample. The general table33 is broken down into three sections, the first showing raw frequencies and rates for female and male usage and their ratio:
word | Female Freq | Female Rate | Male Freq | Male Rate | Ratio of Female/Male rates |
absence | 2234 | 1.19 | 1896 | 0.70 | 1.70 |
adieu | 4340 | 2.33 | 3552 | 1.31 | 1.77 |
affection | 2087 | 1.12 | 1604 | 0.59 | 1.88 |
affreux | 1735 | 0.93 | 1906 | 0.70 | 1.32 |
agréable | 1978 | 1.06 | 1541 | 0.57 | 1.86 |
ai | 56902 | 30.55 | 52952 | 19.59 | 1.55 |
aimable | 3488 | 1.87 | 1940 | 0.71 | 2.60 |
aimait | 2268 | 1.21 | 1927 | 0.71 | 1.70 |
aime | 9992 | 5.36 | 7786 | 2.88 | 1.86 |
aimer | 5552 | 2.98 | 3474 | 1.28 | 2.31 |
aimé | 3288 | 1.76 | 2542 | 0.94 | 1.87 |
aimée | 2040 | 1.09 | 1326 | 0.49 | 2.23 |
aise | 1785 | 0.95 | 1784 | 0.66 | 1.45 |
aller | 8027 | 4.31 | 8415 | 3.11 | 1.38 |
ami | 8550 | 4.59 | 9137 | 3.38 | 1.35 |
amitié | 6063 | 3.25 | 3934 | 1.45 | 2.23 |
amour | 17557 | 9.42 | 18339 | 6.78 | 1.38 |
amoureux | 1859 | 0.99 | 1833 | 0.67 | 1.47 |
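The selection and comparison step behind the table above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure actually used to build the general table: the function name and data layout are hypothetical, the token totals are those reported for the two collections, and the three sample words are taken from the table. Recomputed values may differ from the printed figures in the last digit.

```python
FEMALE_TOKENS = 18_623_547
MALE_TOKENS = 27_026_833


def compare(female_freq, male_freq, threshold=0.9):
    """Keep words whose female rate per 10,000 exceeds `threshold` and
    compute the female/male ratio of rates for each of them."""
    rows = []
    for word, f_count in female_freq.items():
        f_rate = 10000 * f_count / FEMALE_TOKENS
        if f_rate <= threshold:
            continue
        m_rate = 10000 * male_freq.get(word, 0) / MALE_TOKENS
        # A word absent from the male sample has no finite ratio.
        ratio = f_rate / m_rate if m_rate else float("inf")
        rows.append((word, f_rate, m_rate, ratio))
    # Strongest female preference first.
    return sorted(rows, key=lambda row: row[3], reverse=True)


female_freq = {"aimable": 3488, "amitié": 6063, "ami": 8550}
male_freq = {"aimable": 1940, "amitié": 3934, "ami": 9137}

rows = compare(female_freq, male_freq)
for word, f_rate, m_rate, ratio in rows:
    print(f"{word}: {f_rate:.2f} vs {m_rate:.2f}, ratio {ratio:.2f}")
```

Starting the selection from the male frequencies instead only requires swapping the two dictionaries and the two token totals, which yields male/female ratios for the words that are moderately frequent in the male sample.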
While identification of patterns or generalizations regarding such long lists of terms is somewhat impressionistic, we can make a few observations. As noted earlier, the female authors tend to favor a more personal style, indicated by the appearance of pronouns, possessives and common verbs in first person. There is a surprising array of what might be thought of as terms describing emotive, internal or subjective states including:
affection, calme, chagrin, confiance, courage, douceur, douleur, désir, envie, estime, espère, espoir, espérance, joie, imagination, larmes, passion, pense, peur, plaisir, plaire, respect, rêve, savoir (and verb forms), sensible, sentiment.*, sentir, solitude, souffrir, souvenir, sérieux, tendresse, triste, tristesse, vain, âme, and émotion.
Also notable are interactive terms denoting relations with others, such as ami.*, aim.*, amour.* and kinship terms including mère, père, oncle, maman, enfant, enfants, frère, mariage, parents, soeur, tante, and époux.
By contrast, the list of terms selected at a moderate frequency in the works by female authors that are much more common in texts by male authors includes a considerable number of abstractions, such as politique, peuple, religion, sujet, église, france, histoire, homme, justice, forme, ordre, lumière, nature, and paix. Also notable is a larger number of nouns or things, including ville, porte, terre, palais, or, mer, face, feu, garde, guerre, livre(s), eau, and chapitre, than is evident in the female authors listing.
Inverting the word selection to start with words that occur with a frequency of 0.9 per 10000 in the male collection results in an initial list of 901 words. The comparison table34 shows that 157 of these were used significantly more frequently by male authors than by females. By contrast, 268 terms in the female to male comparison were used significantly more frequently in the women's writing. It appears that words used by male authors are more likely to be used at high rates by female authors than the reverse. In addition to the characteristic words used by male authors mentioned above, this test reveals one additional feature of male writing, the use of numbers: 1, 2, cent, cents, cinq, deux, nombre, quatre, second, seconde, sept, trente, trois, uns and vingt.
This general comparison suggests that male and female writers in the samples may be clearly distinguished by their word selection. It is also suggestive to note that the words favored by male authors tend to be more frequently used by female authors, while this is less frequently the case for words favored by female authors, suggesting that male author vocabulary selection was somewhat more influential than female vocabulary which tended to be more specific to female authors.
Affection and Friends
The general comparison invites a number of more detailed examinations of particular themes, but I will take only one as an example of gender marked selection of vocabulary. As suggested above, female writers in the sample under consideration showed a marked selection for interactive terms denoting friendship, affection, and love.
A single search pattern "aim.* ami[acest].* amour.*" applied to the timeseries tool for each corpus generates lists35 of terms and relative frequencies for many words denoting love and friendship. As noted in the chronological summary table below, female authors generally use these words 1.65 times more frequently than male authors. There is also a marked decline in the rates of female author use, from 50 to 32 per 10,000 words, and a general narrowing of the distinction.
Authors | 1500-99 | Rate | 1600-99 | Rate | 1700-99 | Rate | 1800-99 | Rate | 1900-60 | Rate | Total | Rate |
Female | 1819 | 50.09 | 10232 | 47.53 | 12735 | 45.74 | 44270 | 36.72 | 3889 | 32.46 | 72945 | 39.32 |
Male | 232 | 13.99 | 5393 | 17.11 | 16962 | 30.68 | 37561 | 24.31 | 4392 | 16.18 | 64540 | 23.89 |
Ratio | | 3.58 | | 2.78 | | 1.49 | | 1.51 | | 2.01 | | 1.65 |
Genre certainly plays an important role in the selection of this set of interactive or affection words. In the following table, we see that non-fiction works (Hist/essai/trait) show a marked decline in rates of use, but also show a slightly stronger ratio between author genders than found in novels. Correspondence sees what would be an expected increase in the use of these words, but this is accompanied by a significant decline in the gender of author ratio.
Genre | Female Rate | Male Rate | Ratio |
Novels | 39.99 | 26.14 | 1.53 |
Hist/essai/trait | 22.96 | 13.14 | 1.75 |
Correspondence | 51.55 | 46.37 | 1.11 |
The distinctions between novels and correspondence may reflect gender marked differences in the public as opposed to more private use of these words. It would be a mistake, however, to read too much into this, since correspondence available in this sample is mainly from the 17th and 18th centuries. Each collection has 15 documents of correspondence, the female sample containing 1.8 million words and male authors represented by 1.3 million words.
The relative selection of this cluster of words (aim.*/ami.*/amour.*) in the 19th century is not a "Sand effect". Indeed, the rate of use of these terms in Sand's novels (37.01) is slightly lower than in other 19th century female novelists (42.32) and in all works in the women writers sample in the 19th century excluding Sand (41.33). Frequent use of these terms is spread among many prominent women authors throughout the period. After Sand, in terms of raw frequencies, they are Mme de Sévigné (17th century: 5,102 occurrences), Germaine de Staël (18-19th: 4,698), and Mme de Genlis (19th: 1,999), with the first 20th century author appearing in 8th rank: Simone de Beauvoir (20th: 1,458).
Comparing Meanings
We have seen that the male and female writers represented in our sample use many function and content words at different rates across many genres and time periods. This is most neatly summed up by the interesting example of body and soul. The female writers use âme 1.62 times as frequently as men, but use corps -- which signifies the physical body and a "political body" -- only half as often as male writers. Such differential rates of word use do not seem to entail that individual words or groups of words representing a theme or concept are given different meanings by male and female writers.
One gross measure of possible shifts in the meaning of particular words or themes related to the gender of the author is collocation: looking at the words which cluster around a keyword, typically within a defined span of words to the left and right of the word or words in question. Remaining with the example of "aim.* ami[acest].* amour.*", I used the PhiloLogic collocation function to generate, for each of the male and female author samples, a simple frequency table of collocates within a 5 word span (left and right) of the keyword. These tables filter most function words, except for personal pronouns and possessives, and rank the collocates by raw frequency. The female author table is based on 73,086 occurrences and the male database has 64,602 occurrences of our example pattern.
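The kind of collocation count described here can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not PhiloLogic's implementation: the stop list is a tiny hypothetical stand-in for the actual filter list, the input is assumed to be an already tokenized, lowercased list of words, and the function name is invented for the example.

```python
import re
from collections import Counter

# Keyword patterns from the example; each is matched against a whole word.
KEYWORDS = [re.compile(p) for p in ("aim.*", "ami[acest].*", "amour.*")]

# Hypothetical, very short stop list of function words. Personal pronouns
# and possessives are deliberately left out of it, so they are kept.
STOPWORDS = {"le", "la", "les", "de", "des", "du", "et", "un", "une", "que"}

def collocates(tokens, span=5):
    """Count words occurring within `span` words left or right of a keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if not any(p.fullmatch(tok) for p in KEYWORDS):
            continue
        # Words to the left and to the right of the keyword, keyword excluded.
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        counts.update(w for w in window if w not in STOPWORDS)
    return counts

if __name__ == "__main__":
    words = "mon cher amie je vous aime de tout mon coeur".split()
    # Collocates ranked by raw frequency, as in the tables discussed here.
    for word, freq in collocates(words, span=2).most_common():
        print(word, freq)
```

Ranking by raw frequency, as here, favors words that are common everywhere in the corpus, which is one reason the more refined collocation measures mentioned later in the paper may be desirable.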
The most striking feature about the two tables36 is how similar they are. Only 8 of the top 100 collocates in the female author table are not in the top 100 for male authors, from rank 80 down:
As noted above, in my earlier study I argued that the increasing collocation of young and old (jeune and vieille) with femme.* from the 17th to the 20th centuries may have reflected shifts in genre and developments of a literary politics of desire. As with our previous example, there appears to be very little distinction in the representation of femme.*, as expressed in collocation tables, between male and female authors. In this case, I took a simple 2 word (left and right) span collocation, filtering function words, except for personal pronouns and possessives, ranked by frequency. Again, there is a surprising degree of agreement, by rank and number, in the top 100 collocates in each table.37 The following table, listing the top 20 adjectives in the two word left collocates of femme.*, suggests the degree of similarity between the two tables.
Word | Female Rank | Female Freq | Male Rank | Male Freq |
jeune | 1 | 457 | 1 | 819 |
pauvre | 2 | 397 | 2 | 505 |
vieille | 3 | 228 | 3 | 364 |
bonne | 4 | 199 | 4 | 287 |
belle | 5 | 138 | 7 | 168 |
petite | 6 | 120 | 6 | 187 |
jolie | 7 | 112 | 5 | 196 |
jeunes | 8 | 102 | 12 | 94 |
seule | 9 | 82 | 13 | 80 |
première | 10 | 66 | 14 | 80 |
jolies | 11 | 60 | 11 | 107 |
honnête | 12 | 66 | 8 | 132 |
excellente | 13 | 54 | 21 | 41 |
vieilles | 14 | 53 | 9 | 124 |
pauvres | 15 | 49 | 16 | 71 |
grande | 16 | 38 | 17 | 53 |
belles | 17 | 34 | 15 | 72 |
aimable | 18 | 32 | -- | -- |
noble | 19 | 26 | 26 | 27 |
méchante | 20 | 26 | 27 | 26 |
Indeed, the only adjective not appearing in the male author collocation list, aimable, is still found 12 times within 2 words of femme.* in the male author sample. Its absence from the list is due to the lower frequency of aimable in that sample.
From these two examples, it appears that the differences in frequency of use of sets of terms by male and female authors are not translated into different meanings being assigned to these words. It also appears that the general representation of women, as indicated by the adjectives applied to femme.*, does not vary significantly between male and female authors.
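The degree of agreement between two ranked lists can be given a rough numerical form. The sketch below, in Python, uses the ranks from the femme.* adjective table above. The two summary measures (overlap within the top 20 and mean absolute rank difference) are illustrative choices made for this example, not measures used in the study.

```python
# (adjective, female rank, male rank) from the femme.* table; None marks
# an adjective that does not appear in the male author list.
RANKS = [
    ("jeune", 1, 1), ("pauvre", 2, 2), ("vieille", 3, 3), ("bonne", 4, 4),
    ("belle", 5, 7), ("petite", 6, 6), ("jolie", 7, 5), ("jeunes", 8, 12),
    ("seule", 9, 13), ("première", 10, 14), ("jolies", 11, 11),
    ("honnête", 12, 8), ("excellente", 13, 21), ("vieilles", 14, 9),
    ("pauvres", 15, 16), ("grande", 16, 17), ("belles", 17, 15),
    ("aimable", 18, None), ("noble", 19, 26), ("méchante", 20, 27),
]

def shared_in_top(ranks, n=20):
    """Count adjectives ranked within the top n in both lists."""
    return sum(1 for _, f, m in ranks if m is not None and f <= n and m <= n)

def mean_rank_difference(ranks):
    """Mean absolute rank difference over adjectives present in both lists."""
    diffs = [abs(f - m) for _, f, m in ranks if m is not None]
    return sum(diffs) / len(diffs)

if __name__ == "__main__":
    print("shared in both top 20:", shared_in_top(RANKS))
    print("mean absolute rank difference:", round(mean_rank_difference(RANKS), 2))
```

On these figures, 16 of the 20 adjectives most frequent with female authors also fall within the male authors' top 20, and adjectives present in both lists differ by fewer than three rank positions on average.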
Caveats and Findings
Based on the available sample of women's writings in French and the relatively simple methodologies employed in this examination, it appears that there are clear distinctions between male and female published literary writing in France from the 16th to the 20th centuries. The differences are clearly identifiable in the selection of words used in the two samples, as reflected in the use of both pronouns and possessives (function words) and less frequent content words. The differential rates of use of words are not reflected in different collocations -- a rough measure of the meanings of these words -- of selected themes, suggesting that the meaning of words, or at least of some words, is not significantly altered by the gender of the writer. Women's writing of the period is generally characterized by a more personal, interactive, and involved style. To judge by the available sample, women's writing is considerably more descriptive of internal and emotional states than male writing of the same period. Finally, comparing the representation of women by rates of "possession" and by frequent adjectives did not reveal the significant differences by gender of author that might have been expected.
These findings, which are more suggestive than conclusive, point to several limitations in the current study which need to be addressed in order to be able to describe a practice of women's writing in French. Most notably, it is clear that systematic comparisons must be based on far more representative samples of women's writing, with particular emphasis on earlier periods and on a wider array of literary genres. Even with more and better data, it appears that further refinement of quantitative approaches will be required. We need methods that facilitate global comparisons of large frequency lists by the introduction of better control of semantic fields as well as more refined collocation measures, particularly using word root forms rather than surface forms.
The selection of words, reflecting the selection of topics and treatments of topics, is not an unconscious effort. Women's writings of this period established a space in which women could write, in ways identifiably distinct from male authors, but within prevailing language norms. Prevailing linguistic norms do contain encoded sets of power relations. "Language," Cixous writes, "conceals an invincible adversary, because it's the language of men and their grammar."38 But to return to Marie de Gournay, women can and did use men's words "against" men in order to write. Thus, rather than depict language as an invincible adversary of women's writings, it would appear that women writing in French from the 16th to the 20th centuries did use language, in ways that can be readily identified using relatively simple means, to create the basis for future women's writing.
1. Hélène Cixous, "The Laugh of the Medusa" in Signs: Journal of Women in Culture and Society 1:4 (1976), p. 878.
2. Cixous, p. 883.
3. Michelle Kendrick, "The Laugh of the Modem: Interactive Technologies and l'ecriture feminine" in Rhizomes, Spring 2002, http://www.rhizomes.net/issue4/kendrick.html
4. Margaret Hill and Albert Rabail, "The Other Voice in Europe: Introduction to the Series", in Richard Hillman and Colette Quesnel, eds., Apology for the Woman Writing and Other Works (Chicago, 2002). They note that from the fourteenth century on, "the volume of women's writings crescendoed" and that it may generally be characterized as a development from less formal genres of writing to more formal publication.
5. Marie le Jars de Gournay, "The Ladies Complaint", Edited and Translated by Richard Hillman and Colette Quesnel in Apology for the Woman Writing and Other Works (Chicago, 2002), p. 105.
6. For a general overview of this wide ranging field see, Penelope Eckert and Sally McConnell-Ginet, Language and Gender (Cambridge, 2003). Earlier works by Deborah Tannen, You just don't understand: Women and men in conversation (Ballantine, 1990) and Gender and discourse (Oxford, 1994) are instructive.
7. Minna Palander-Collin, "Male and female styles in 17th century correspondence: I THINK", in Language Variation and Change, 11 (1999): 123-141.
8. p. 129
9. Moshe Koppel, Shlomo Argamon, and Amat Rachel Shimoni, "Automatically Categorizing Written Texts by Author Gender", Literary and Linguistic Computing 17:4 (2002): 401-12.
10. p. 404.
11. p. 408.
12. p. 408-9.
13. See Mark Olsen, "Gender representation and histoire des mentalités: Language and Power in the Trésor de la langue française," in Histoire et mesure VI (1991): 349-73 and "Quantitative Linguistics and Histoire des mentalités: Gender Representation in the Trésor de la langue française," in R. Köhler and B. B. Rieger (eds.), Contributions to Quantitative Linguistics (Kluwer, 1993): 361-381.
14. For a description of the French Women Writers project, see http://www.lib.uchicago.edu/efts/ARTFL/projects/FWW/.
15. Olsen, "Gender Representation", p. 369-70.
16. p. 364
17. See also Tuija Pulkkinen, "The History of Gender Concepts: The Concept of Woman", in History of Concepts Newsletter, 5 (2002): 2-5.
18. Cixous, p. 887.
19. Some commercial e-text publishers in English are specializing in women writing, such as Alexander Street Press with such titles as North American Women's Letters and Diaries.
20. See www.lib.uchicago.edu/e/ets/efts/Women.html for a partial list of current women writing projects and products.
21. Consult http://www.lib.uchicago.edu/efts/ARTFL/databases/champion/basile/ for information on the BASILE collection by Editions Champion and http://www.lib.uchicago.edu/efts/ARTFL/databases/bibliopolis/ for information concerning La Bibliothèque des lettres by Editions Bibliopolis.
22. In addition to tables found in this paper, there are a number of additional tables, too large to be suitable for print publication all of which are found on the WWW at:
23. This text is not included in this study as it was unavailable at the time when I was assembling the collections. Please consult http://www.artamene.org/ for further information on this project.
24. See http://www.lib.uchicago.edu/efts/ARTFL/philologic/
25. Full table available at: http://www.lib.uchicago.edu/efts/ARTFL/mark/papers/ACH2003/tables/fem.ami-amour.time.html
26. The collocate word filter list is available at: http://www.lib.uchicago.edu/efts/ARTFL/mark/papers/ACH2003/tables/cluster.filter.wrds
27. Eckert, p. 67.
28. Full table: http://www.lib.uchicago.edu/efts/ARTFL/mark/papers/ACH2003/tables/pron-poss.html
29. See for example, Pulkkinen's discussion, pp. 3-5.
30. Full table: http://www.lib.uchicago.edu/efts/ARTFL/mark/papers/ACH2003/tables/homme-mari.femme.html
31. Full collocation tables are available at http://www.lib.uchicago.edu/efts/ARTFL/mark/papers/ACH2003/tables/female.femme.col.html for Female Authors and http://www.lib.uchicago.edu/efts/ARTFL/mark/papers/ACH2003/tables/male.femme.col.html for Male Authors.
32. Koppel et al., p. 402.
33. The full Female to Male Author Comparison Table is available at http://www.lib.uchicago.edu/efts/ARTFL/mark/papers/ACH2003/tables/general.comp.html
34. The full Male to Female Author Comparison Table is available at http://www.lib.uchicago.edu/efts/ARTFL/mark/papers/ACH2003/tables/male.global.html
35. Complete comparison tables are available at http://www.lib.uchicago.edu/efts/ARTFL/mark/papers/ACH2003/tables/fem.ami-amour.time.html for female authors and http://www.lib.uchicago.edu/efts/ARTFL/mark/papers/ACH2003/tables/hom.ami-amour.time.html for male authors
36. Complete collocation tables for "aim.* ami[acest].* amour.*" are available for female authors at http://www.lib.uchicago.edu/efts/ARTFL/mark/papers/ACH2003/tables/fem.aim-amour.col.html and male authors at http://www.lib.uchicago.edu/efts/ARTFL/mark/papers/ACH2003/tables/hom.aim-amour.col.html.
37. Complete collocation tables for femme.* are available at http://www.lib.uchicago.edu/efts/ARTFL/mark/papers/ACH2003/tables/female.femme.col.html for Female Authors and http://www.lib.uchicago.edu/efts/ARTFL/mark/papers/ACH2003/tables/male.femme.col.html for Male Authors.
38. Cixous, p. 887.