Women's Underrepresentation in News Quotations

“Gender bias does worldwide damage. It’s a cause of low productivity on farms. It’s a source of poverty and disease. It’s at the core of social customs that keep women down.”



--Melinda Gates
The Moment of Lift: How Empowering Women Changes the World

Gender bias: What can we find from quotations?

Gender bias in the news media has gotten a lot of attention lately, and the variations in news coverage caused by gender have been studied from a variety of angles. However, quoting, which is common in all forms of news coverage, has received insufficient attention. Quoted material in the news helps us to look at gender bias in the news from a new perspective because quotes are straightforward, direct, and accurate expressions of the speaker's point of view.

As a result, we ask a new set of questions: Is it true that male speakers speak "louder" than female speakers? If so, is this caused by news coverage or by gender itself? And can we derive a portrait of each gender from quotations?

In short, our data story examines not only how different news outlets quote speakers of each gender, but also how political party and nationality relate to gender bias. In addition, we investigate gender bias across self-defined topics and at the linguistic level to better understand why bias appears in quotations.

Dataset Overview

Our research is based on the Quotebank[1] dataset, which covers the period from January 1, 2015 through April 30, 2020. Following the wrangling of the original dataset, we now have:

2,301,973

different speakers.

74,377,416

different quotations.

288,772,111

occurrences of quotations.

12,715

different news domains.

3,064

different speakers' party affiliations.

579

different speakers' nationalities.

Because only a tiny number of speakers in the sample had a gender other than male or female, we grouped them as "others" and limited our study to binary gender representations.

1. Men's voices are three times louder than women's!

For the period from January 1, 2015, through April 30, 2020, we show the gender percentages under three aggregations. (Occurrence is the total number of times quotations appear in the media, Speaker is the number of distinct speakers in the dataset, and Quotation is the number of distinct quotations.)

At first glance, it's easy to notice that over 80% of the speakers quoted by sources are men, which is consistent with past gender bias research[2] [3] [4]. You may be tempted to assume that there are many more men than women on the planet; however, this is not true of the world's population, or even of the United States (in both cases the gender ratio is about 1:1). Interestingly, the share of women is around 5 percentage points higher among speakers than in the other two aggregations, implying that although women account for about a quarter of all speakers, news outlets quote them less often. We are still far from gender equality, as evidenced by women's underrepresentation in society and in news articles.
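The three aggregations defined above can be sketched in a few lines of Python. This is only an illustration on made-up records, not the pipeline used for the figures: the tuple layout `(speaker, gender, quotation, occurrences)`, the `records` list, and the function name `gender_shares` are all assumptions.

```python
from collections import defaultdict

# Hypothetical toy records: (speaker, gender, quotation, occurrences in the media)
records = [
    ("Speaker A", "male", "quote 1", 5),
    ("Speaker A", "male", "quote 2", 3),
    ("Speaker B", "female", "quote 3", 2),
    ("Speaker C", "male", "quote 4", 4),
    ("Speaker D", "female", "quote 5", 1),
    ("Speaker E", "male", "quote 6", 2),
]

def gender_shares(records):
    """Return the gender percentages for the Speaker, Quotation and Occurrence views."""
    speakers = defaultdict(set)     # distinct speakers per gender
    quotations = defaultdict(set)   # distinct quotations per gender
    occurrences = defaultdict(int)  # total media occurrences per gender
    for speaker, gender, quotation, n_occurrences in records:
        speakers[gender].add(speaker)
        quotations[gender].add(quotation)
        occurrences[gender] += n_occurrences

    counts = {
        "Speaker": {g: len(s) for g, s in speakers.items()},
        "Quotation": {g: len(q) for g, q in quotations.items()},
        "Occurrence": dict(occurrences),
    }
    shares = {}
    for view, by_gender in counts.items():
        total = sum(by_gender.values())
        shares[view] = {g: 100 * c / total for g, c in by_gender.items()}
    return shares

shares = gender_shares(records)
for view, by_gender in shares.items():
    print(view, {g: round(p, 2) for g, p in by_gender.items()})
```

In this toy data the share of women drops from the Speaker view to the Occurrence view, which is the same pattern the paragraph above describes for the real dataset.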

To be more specific, we show the ten most quoted speakers for each year from 2015 to 2019. Unsurprisingly, men dominate the news quotations: they make up the majority of the top ten speakers in every year.


“According to the Interparliamentary Union, 77 per cent of the world’s parliamentarians are male, and only two out of 193 parliaments (in Rwanda and Bolivia) comprise at least 50 per cent of women.” [5]

The majority of the data in this dataset (Quotebank) comes from the news media, where politics is constantly a hot topic. Women, however, do not generally hold a dominant position in politics. It is therefore unsurprising that most of the top ten speakers are men and that there are far fewer quotations from women than from men.

Is the bias still present if we control for possible confounders? Let's take a look at Hillary Clinton and Donald Trump. In 2015, Donald Trump's number of quotations exceeded Hillary Clinton's by nearly two to one. However, because both were candidates in the 2016 US presidential election, differences in their social media exposure and influence (or even in the actual vote outcomes) are unlikely to account for so large a gap in quotations. The most probable source of such bias is news organizations and perhaps society at large.

2. Insights on gender bias in media sources, nationalities, and partisanships

There needs to be a fundamental shift in the way societies view women in government, one that does not see them as mere seat-fillers or stats on a chart, they must be viewed as a vital contributing factor to the betterment of the world.

Aysha Taryam

When it comes to providing background or analysis for reports, journalists have a lot of leeway, and the people who supply it are overwhelmingly male. This is possibly the most direct driver of gender bias in quotations. To test this conjecture, we choose seven of the most significant global news agencies and study their gender preferences and how these evolve over time.

In addition, we look into whether the speaker's nationality and political party affiliation are associated with gender bias in quotations. In other words, does a woman's country or party affect how often her voice is heard?

Women are at a disadvantage across news outlets

“But the truth — we are reminded every time we try to quote female experts — is that the gender balance of our articles is only the final step in a process of gender discrimination that begins long before we pick up a phone to begin reporting. We’ve learned to see our role as journalists as important, but also as just the most visible component of a vast social machinery that equates expertise with maleness.”

Amanda Taub and Max Fisher

Many media outlets have recognized the problem of gender inequality in reporting in recent years and have begun to address it (e.g. BBC's 50:50 program[6]), and some progress has been made. According to research, while women do not feature in the news as frequently as men, the percentage of women in the news increased from 2015 to 2020. However, it is still insufficient.


At the macro level, female speakers are always at a disadvantage, regardless of the news service. The percentage differences among news outlets are not substantial, and the trends are highly consistent. This leads us to believe that the preferences of individual outlets have only a minor impact on the formation of gender bias. Rather, they faithfully reflect society's disregard of women, as New York Times journalists Amanda Taub and Max Fisher put it.

The bias is also obvious (and terrible) among nationalities

In this part, we group speakers by nationality and select the top 7 countries by number of quotations. In addition, to investigate the impact of a country's development level on this bias, we choose India as a representative of developing countries.


What impact do female leaders have on the voices of women? Do those voices become "louder"? Britain between mid-2016 and early 2019 offers some supporting evidence. During that period, while Theresa May was Prime Minister of the United Kingdom, the percentage of female quotations in Britain climbed to 23.76 percent. This suggests that a female leader may do more than a male leader to reduce existing gender bias.


Nation      Gender Inequality Index   Rank
France      0.049                     8
Canada      0.080                     19
Germany     0.084                     20
Australia   0.097                     25
UK          0.118                     31
USA         0.204                     46
India       0.488                     123

What about the stage of development? The table above lists the 2020 Gender Inequality Index[7] published by the UN Development Programme. Although India's index (0.488, rank 123) is much higher, meaning greater inequality, than that of the developed countries in the data (e.g. 0.204, rank 46 for the United States and 0.049, rank 8 for France), the percentage of women's voices in those developed countries is not significantly better. We therefore infer that a country's level of development has no discernible impact on lowering gender bias in quotations.

Partisanship: Something different?

Gender bias varies more across political parties than it does across news outlets or nationalities. The Republican Party, for example, has a clear male bias (88.35 percent of its speakers are male), whereas 34.03 percent of the Democratic Party's speakers are female. This observation could be explained by the Democratic Party's platform.

“We are committed to ensuring full equality for women. Democrats will fight to end gender discrimination in the areas of education, employment, health care, or any other sphere. We will combat biases across economic, political, and social life that hold women back and limit their opportunities.” [8]

The Democratic Party's platform supports female candidates and defends the rights of women and of minorities (ethnic minorities, etc.), which suggests that it encourages women to speak up. It also points to a potential divide over feminism between the Democratic and Republican parties. Our statistics are consistent with this: the Democratic Party's most-quoted female speakers include Nancy Pelosi, Elizabeth Warren, Alexandria Ocasio-Cortez, and Kamala Harris.

3. Do topics reveal potential gender bias? How?

A gender-equal society would be one where the word ‘gender’ does not exist: where everyone can be themselves.

Gloria Steinem

The words of men and women may receive different levels of exposure under different news topics. It therefore makes sense to look at the gender ratio of quoted speakers under various topics in order to conduct a more detailed analysis. To extract related quotations from the data, we manually select 9 common news topics and define keywords for each of them. In addition, we classify the sentiment of each quotation as Positive, Negative, or Neutral to see how it differs between men and women on various issues.
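As a rough sketch of the keyword-based topic assignment described above, the snippet below tags each quotation with every topic whose keyword set it touches and then computes the percentage of women per topic. The `TOPIC_KEYWORDS` dictionary is a hypothetical stand-in built from a few keywords mentioned in this story; the actual keyword lists are not reproduced here, and the function names are assumptions.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the real ones were defined manually per topic.
TOPIC_KEYWORDS = {
    "sports": {"football", "tennis", "swimming"},
    "entertainment": {"art", "music"},
    "gender": {"abortion", "harassment"},
}

def topics_of(quotation):
    """Return the topics whose keywords appear as whole words in the quotation."""
    words = set(re.findall(r"[a-z']+", quotation.lower()))
    return [topic for topic, keywords in TOPIC_KEYWORDS.items() if words & keywords]

def women_share_by_topic(quotes):
    """quotes: iterable of (gender, text). Returns {topic: % of quotations by women}."""
    total, women = Counter(), Counter()
    for gender, text in quotes:
        for topic in topics_of(text):
            total[topic] += 1
            if gender == "female":
                women[topic] += 1
    return {topic: 100 * women[topic] / total[topic] for topic in total}

# Made-up example quotations
quotes = [
    ("male", "Football is all about the team."),
    ("male", "We love football and music."),
    ("female", "Tennis taught me patience."),
    ("female", "Art and music change lives."),
]
result = women_share_by_topic(quotes)
print(result)
```

A quotation can fall under several topics at once, and quotations that match no keyword are simply left out, which is one plausible reading of "extracting related quotations".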

Women seem to be sensual and benevolent, men seem to be aspiring and sports-focused

We begin by plotting the percentage of women speakers by month for each topic. As shown on the vertical axis of the chart, these topics cover a wide range of domains that commonly appear in the news. As can be seen, the news media treat male and female voices differently depending on the subject.

Quotations from women about education, health, lifestyle and entertainment are more prevalent than average. In politics, people, business and sports, by contrast, the percentage of women is below the global average shown in the three pie charts. To our surprise, the percentage of women speaking is highest under the topic of gender, yet even there it is only close to, and still below, that of men.

Looking at the topics where most quotations come from men, it is easy to see that these fields tend to be dominated by men. Take sports as an example: in the Forbes 2021 ranking of the top 50 highest-paid athletes, the highest-ranked woman, Naomi Osaka, is only 12th, and her annual earnings of $60 million are only one third of those of the top-ranked athlete, Conor McGregor[9].

To summarize briefly, women are more often quoted in caregiving roles, whereas quotations from men are closely related to sports and business. This reflects the stereotype of women as caregivers and men as sports enthusiasts and breadwinners.

Sentiment Analysis for specific Topics

Let's first look at gender. For certain painful topics, such as "abortion" and "sexual harassment", both men and women tend to express negative sentiment. This may mean that these social problems have not been significantly improved or resolved in society.

How about entertainment? Interestingly, women tend to be positive about "art", while men are not (40.53% are negative). But when it comes to "music", only 9.46% of men are negative. It seems that males prefer "music" more than "art".

For men, the most popular sports keyword is "football", which appears far more often than the others, especially "swimming"; this is consistent with common perceptions. When we think of football, the first image that comes to mind is male athletes running on the pitch. Women, by contrast, mention "tennis" more often (even more than a topic as common and popular as "football"). Put another way, football is considered more "masculine" than tennis.

In conclusion, for certain specific keywords, such as "art" and "tennis", we can indeed see a noticeable difference. Contrary to what one might expect, however, men and women show a similar sentiment distribution for the majority of common subjects.
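Percentages such as "40.53% are negative" are simply per-keyword, per-gender sentiment distributions. The sketch below shows that bookkeeping on made-up rows; it assumes each quotation has already been assigned one of the three labels by some sentiment classifier (which one is not specified here), and the row layout and function name are hypothetical.

```python
from collections import Counter, defaultdict

LABELS = ("positive", "neutral", "negative")

def sentiment_distribution(rows):
    """rows: iterable of (keyword, gender, label).
    Returns {(keyword, gender): {label: percentage}}, rounded to two decimals."""
    counts = defaultdict(Counter)
    for keyword, gender, label in rows:
        counts[(keyword, gender)][label] += 1
    distribution = {}
    for key, label_counts in counts.items():
        total = sum(label_counts.values())
        distribution[key] = {
            label: round(100 * label_counts[label] / total, 2) for label in LABELS
        }
    return distribution

# Made-up, pre-labelled quotations for the keyword "art"
rows = [
    ("art", "male", "negative"),
    ("art", "male", "negative"),
    ("art", "male", "positive"),
    ("art", "male", "neutral"),
    ("art", "male", "positive"),
    ("art", "female", "positive"),
    ("art", "female", "positive"),
    ("art", "female", "neutral"),
]
dist = sentiment_distribution(rows)
print(dist[("art", "male")])
print(dist[("art", "female")])
```

Comparing the two dictionaries for the same keyword is what reveals whether men and women differ on that keyword.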

4. Detecting the linguistic differences between men and women in quotation

It is difficult for a woman to define her feelings in language which is chiefly made by men to express theirs.

Thomas Hardy

We already know that men and women tend to display different preferences for different subjects (from the heat map and the sentiment analysis). Now we ask ourselves: can we predict a speaker's gender (i.e. male or female) from the quotation alone?

If the difference in quotation style between men and women is not as significant as we expect, the model we develop will be no better than random guessing (we cannot infer gender from what people say), i.e. about 50% on balanced data. But if the model does better than 50%, there is indeed some gender-specific signal in the quotations, which the model has picked up as a discriminative feature.

How do we design such a machine learning model?

Model Design

We choose a logistic regression model because this is a binary classification task. We label females as 0 and males as 1, so a large positive weight indicates that a word is male-oriented and a large negative weight that it is female-oriented.

Data Selection

For each year from 2015 to 2020, we randomly select 1 million quotations for each gender, which yields a balanced training set of 12 million quotations.
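The selection step can be sketched as stratified sampling over (year, gender) groups. The code below is an illustration under assumptions: the row layout `(year, gender, quotation)`, the function name `balanced_sample`, the fixed seed, and the tiny `per_group=2` (standing in for 1 million) are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def balanced_sample(rows, per_group, seed=0):
    """Draw the same number of quotations for every (year, gender) group."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    groups = defaultdict(list)
    for year, gender, quotation in rows:
        groups[(year, gender)].append(quotation)

    sample = []
    for (year, gender), quotations in sorted(groups.items()):
        if len(quotations) < per_group:
            raise ValueError(f"not enough quotations for {year}/{gender}")
        # Sample without replacement within the group
        for quotation in rng.sample(quotations, per_group):
            sample.append((year, gender, quotation))
    return sample

# Made-up, imbalanced corpus: 5 male and 3 female quotations per year
rows = []
for year in range(2015, 2021):
    rows += [(year, "male", f"male quote {year}-{i}") for i in range(5)]
    rows += [(year, "female", f"female quote {year}-{i}") for i in range(3)]

sample = balanced_sample(rows, per_group=2)
group_sizes = Counter((year, gender) for year, gender, _ in sample)
print(len(sample), set(group_sizes.values()))
```

With 6 years, 2 genders and 2 quotations per group the sketch returns 24 rows, mirroring how 1 million per gender per year adds up to 12 million.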

Data Processing

We use the bag-of-words model with TF-IDF weighting to convert each quotation into a vector, and we keep the 2,000 most frequent words as the features used for prediction.
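To make the pipeline concrete, here is a minimal pure-Python sketch of the same idea on four made-up quotations: a vocabulary capped at the most frequent words, TF-IDF vectors, and a logistic regression with labels female = 0 and male = 1. It is not the code behind the reported results. The smoothed IDF formula, the L2 normalisation, the plain gradient-descent training loop, and all function names are assumptions chosen to keep the example self-contained; in practice a library such as scikit-learn would normally do this work.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(quotes, max_features=2000):
    """Keep the max_features most frequent words across all quotations."""
    counts = Counter(word for quote in quotes for word in tokenize(quote))
    return [word for word, _ in counts.most_common(max_features)]

def tfidf_vectors(quotes, vocab):
    """Bag-of-words counts weighted by a smoothed IDF, then L2-normalised."""
    index = {word: i for i, word in enumerate(vocab)}
    docs = [Counter(w for w in tokenize(q) if w in index) for q in quotes]
    n_docs = len(quotes)
    doc_freq = Counter(word for doc in docs for word in doc)
    idf = {w: math.log((1 + n_docs) / (1 + doc_freq[w])) + 1 for w in vocab}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for word, term_freq in doc.items():
            vec[index[word]] = term_freq * idf[word]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(X, weights, bias):
    return [int(sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) >= 0.5)
            for xi in X]

# Made-up training data: 1 = male, 0 = female
quotes = [
    "We won the football game",
    "The team played a great game",
    "My family supports me",
    "I love my children and family",
]
labels = [1, 1, 0, 0]

vocab = build_vocabulary(quotes, max_features=2000)
X = tfidf_vectors(quotes, vocab)
weights, bias = train_logistic_regression(X, labels)
predictions = predict(X, weights, bias)
print(predictions)
```

Because males are labelled 1, words that occur only in the male toy quotations (such as "game") end up with positive weights, and words that occur only in the female ones (such as "family") end up with negative weights.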

How can we infer the gender bias from the weights in the model?

On our task, the R-squared score is 59.65%. As discussed above, a score above the 50% chance level means the model has picked up some gender-specific signal, so its weights are worth interpreting. Let us look at the most predictive words and see what we get.
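Since females are labelled 0 and males 1, reading off the most predictive words amounts to sorting the vocabulary by weight. The sketch below shows this on a hypothetical vocabulary with hypothetical weights; the function name `top_words` and the numbers are illustrative only and are not taken from the trained model.

```python
def top_words(vocab, weights, k=50):
    """Return (male_words, female_words): the k most positive and the k most
    negative weighted words, each ordered from strongest to weakest."""
    ranked = sorted(zip(vocab, weights), key=lambda pair: pair[1])
    female_words = [word for word, weight in ranked[:k] if weight < 0]
    male_words = [word for word, weight in reversed(ranked[-k:]) if weight > 0]
    return male_words, female_words

# Hypothetical vocabulary and weights, for illustration only
vocab = ["game", "team", "family", "children", "the"]
weights = [1.2, 0.8, -1.5, -0.9, 0.05]

male_words, female_words = top_words(vocab, weights, k=2)
print("male:", male_words)
print("female:", female_words)
```

Filtering on the sign of the weight keeps a word out of a gender's list when fewer than `k` words actually lean towards that gender.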

The figure below shows the 50 most predictive words for males and for females. These words reflect which topics are more likely to appear in a quotation given the speaker's gender. The representative words for male quotations are dominated by sports and business terms, which is in line with our earlier findings from the heat map. This confirms our observation: men are portrayed as "ambitious" and "active", while women are more closely associated with family.

Top 50 predictive words for male speakers

Top 50 predictive words for female speakers

In fact, these areas (e.g. business, sports, politics) have long been governed by men; women are already outnumbered there, let alone in the quotations that reflect them. The words for women relate more to personal traits and everyday life, which is also consistent with our earlier analysis. This mirrors the role assigned to women in the social division of labour: to be a good wife and a good mother, and to concentrate on everyday life rather than lead a particular industry.

5. There is a long way to go towards gender equality!

I raise up my voice — not so that I can shout, but so that those without a voice can be heard... We cannot all succeed when half of us are held back.

Malala Yousafzai

Many people may think that women's social status today is on par with men's or, if not equal, only slightly lower. However, our analysis of quotations provides evidence that gender bias exists not only in society at large but also in how each gender is quoted in the news. In general, gender bias in quotations can be summarized in the following three points:

Overall, women's voices are much weaker than men's, although some small progress has been made.

Neither news outlets nor nationalities explain gender bias as strongly as we might expect; the most likely cause is a problem shared across the whole world rather than one confined to a single country.

The sentiment distributions of men and women are similar for most topics. Nevertheless, at the social level, people implicitly or explicitly expect some words to be gender-specific: women are expected to be "home-carers" for children and husbands.

In retrospect, gender bias still exists today. Eliminating gender bias in quotations is not only the responsibility of journalists; each of us is also obliged to take it seriously so that women can have more diverse identities and possibilities. Only then can women's voices be quoted in the news alongside men's.