Numlock Sunday: Francesca Tripodi on Wikipedia's gender gap
By Walt Hickey
Welcome to the Numlock Sunday edition.
This week, I spoke to Francesca Tripodi, a senior researcher at the Center for Information Technology and Public Life at UNC Chapel Hill who wrote “Ms. Categorized: Gender, notability, and inequality on Wikipedia,” which was published in the journal New Media & Society. Here's what I wrote about it:
A new study of Wikipedia found that the share of biographies on the site that are about women rose from 16.83 percent in 2017 to 18.25 percent in 2020. However, the share of biographies nominated for deletion in a given month that were about women was reliably above 25 percent. For instance, in April 2017, when 16.93 percent of biographies on Wikipedia were about women, fully 41 percent of the biographies nominated for deletion were about women. Even though only 19 percent of biographical articles are about women today, a quarter of the biographical articles nominated for deletion are still reliably about women.
Tripodi is an assistant professor in the School of Information and Library Science who researches how people use digital platforms in unintended and unexpected ways. For this study, she spent years interviewing people and researching Wikipedia’s gender gap, exploring how the system for flagging articles for deletion makes it harder to get more women onto the internet’s encyclopedia.
Can you tell me a little bit about the research that you do?
My research focuses on two different kinds of things. Largely speaking, I look at what I refer to as socio-technical vulnerabilities: how platforms are used by communities or groups in ways that programmers do not anticipate or intend. I apply that focus to a variety of platforms, so in the past, I've looked at now-defunct apps like Yik Yak. I do a lot of work on inequality in Wikipedia, and I also look at how search engines are manipulated for political gain.
You have this new study of Wikipedia out that kind of looks at the number of biographies on the site that are of women. You wrote that they rose from 16.8 percent in 2017 to 18.25 percent in 2020. Obviously, that's still vastly under-representative. But then there's this other component of this that women accounted for a disproportionate number of the biographies that were nominated for deletion. Can you tell me a little bit about this study and what you looked into?
The study took place over many, many years, and it started with ethnographic work. Ethnography is a qualitative method of inquiry where you are doing participant observation; you’re sort of a fly on the wall at these different events. So I started doing ethnographic observations of edit-a-thons. In particular, I was focused on edit-a-thons that were trying to close the gender gap, because a lot of work has gone into showing that there's a gender gap in a couple of ways. One is in terms of content: not many articles are about women or women's interests. Then you also have the editorial gender gap, in which most editors are men. So, these edit-a-thons are designed to encourage women to edit Wikipedia, and also to close the gender gap in representation on the website. While I was at these edit-a-thons, I also met with new editors and long-time veteran editors, and I asked to do one-on-one interviews with them.
Then in my interviews, it was becoming really clear that a lot of people were frustrated with the process of trying to add women to Wikipedia. Not only because the notability criteria are really difficult for women to meet (and we can talk about that separately, but a lot of studies have looked at that), but also because many people were telling me that articles about notable women were being flagged and nominated for deletion. I sought to test that hypothesis, the central research question being, "Well, are men who are notable also being nominated for deletion at the same rate? Or is this something more specific to gender?"
So for that, I partnered with an amazing data scientist named Eric Rochester through the Scholars' Lab at the University of Virginia. I received a dissertation improvement grant through UVA, and Eric wrote a script to scrape all articles for deletion and then filter them down to biographies. Then I worked with undergraduate and graduate research assistants to clean up that data set. Essentially, he was able to grab for me a ridiculously large CSV file: about 22,000 biographies that were nominated for deletion from January 2017 through February 2020. Cleaning it up basically meant going to the articles for deletion page, looking at the decision rendered, determining the gender of the subject based on pronoun use, and then running descriptive statistics on the data set afterwards.
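The pronoun-based coding step described above can be sketched in a few lines. This is a hypothetical illustration, not the study's script: in the study the coding was done by Tripodi and her research assistants reading each page, and the pronoun lists, the (month, text) input format, and the function names infer_gender and monthly_share_women are assumptions made for this example. Text with no gendered pronouns, or a tie, is marked unknown rather than guessed, so subjects referred to with "they" would fall there too.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pronoun lists; the study's exact coding rules are not given.
FEMININE = {"she", "her", "hers", "herself"}
MASCULINE = {"he", "him", "his", "himself"}


def infer_gender(text: str) -> str:
    """Label a biography by whichever pronoun set appears more often."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    fem = sum(words[w] for w in FEMININE)
    masc = sum(words[w] for w in MASCULINE)
    if fem > masc:
        return "woman"
    if masc > fem:
        return "man"
    return "unknown"  # ties and pronoun-free text are left for manual review


def monthly_share_women(nominations):
    """Share of women among gender-coded nominations, per month.

    nominations: iterable of (month, text) pairs, e.g. ("2017-04", "...").
    Months with no gender-coded biographies get NaN.
    """
    counts = defaultdict(Counter)
    for month, text in nominations:
        counts[month][infer_gender(text)] += 1
    shares = {}
    for month, c in sorted(counts.items()):
        known = c["woman"] + c["man"]
        shares[month] = c["woman"] / known if known else float("nan")
    return shares


sample = [
    ("2017-04", "She founded a museum."),
    ("2017-04", "He was a player."),
    ("2017-04", "His team won; he retired."),
    ("2017-05", "A stub with no pronouns."),
]
print(monthly_share_women(sample))
```

Counting pronouns is a blunt proxy: quoted speech or a paragraph about a spouse can tip the count, which is one reason a human check of each page, as in the study, is the safer approach.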
It's such a comprehensive thing. I love how in the actual text of the study, you talked about the specific case of Lois K. Alexander Lane. I think that we're talking about kind of higher level statistics, but this seems like a really good example of a situation you were trying to write about. Do you maybe want to explain what happened to Ms. Lane?
What's interesting is there are a lot of observations that I saw at edit-a-thons, but this was not one that I witnessed; I was not at this edit-a-thon. I found out about Lois K. Alexander Lane from an interview with a young woman who was interested in fashion, who was telling me, "It's unbelievable that Lois Alexander Lane doesn't have a Wikipedia page." When I started my ethnographic observations way back, five years ago, she didn't have a Wikipedia page. Then, as my study was unfolding, I saw that a Wikipedia page for her was created, and when you go through the page history, you can see by matching dates that it was created at an edit-a-thon.
So, there's another level of bias documented at edit-a-thons: anecdotally, people say that if you try to submit an article through Wikipedia's submission process, it can be more difficult than just making the page active. Often editors will encourage people to skip submitting an article for inclusion and just make it an active page.
During this edit-a-thon, this editor had made this an active page. While they were editing, somebody took that page and moved it back into their sandbox, essentially sending it from main space to draft space, with a note saying "The proper protocol for creating a new page is to submit it for inclusion." So, this editor submits it for inclusion, and less than six minutes later it's declined, citing that this person is not yet notable enough for inclusion on Wikipedia. Now, this person had already passed away, and establishing notability for living subjects is more difficult than establishing notability for deceased subjects, because someone who's deceased has either already reached the threshold of notability or, quite frankly, can't any longer.
So, this person had already passed away and this person had developed fashion museums on their own in New York, and all of these archives were going to the Smithsonian. The Smithsonian had written a blog about it and the Washington Post had written an obituary about this person. Typically if the Washington Post is writing up an obituary on somebody, then they are considered notable, right? The Washington Post doesn't just write up obituaries for anybody.
Her notability had been very clearly established, and it was saying that she had yet to establish notability. So, this is just kind of one example of many that I was seeing in ethnographic observations or heard directly during my interviews. That's what got me thinking, "Okay, we need to look at this bigger. This is going to be a bigger project than qualitative methods can handle. We're going to have to test this."
The exploratory part was the ethnographic, where I really was able to understand that this was happening. In current datasets and in current research projects people were talking about, "Oh, it's really difficult for women to get a page because the notability criteria is biased against women." But this was something different, right? This was saying, "Well, even women who are meeting the threshold for inclusion are still being categorized as non-notable, even if they're meeting these criteria." That's what I sought to explore using quantitative methods then.
So, the quantitative results are really stark. Again, like I mentioned earlier, there's the 19 percent statistic. But you have these charts in this post that are, actually, just flat. They're not even improving. Can you explain a little bit about what you found?
Yeah, actually I'm really excited: I released these under a Creative Commons license, so now these charts are on the "Gender bias on Wikipedia" Wikipedia page.
Oh, wow. That's so cool.
Which makes me really excited, because that was kind of the whole point to begin with. But the sad part about the chart you're referring to is that it doesn't even show the progress that organizations are making. There are really incredible organizations, like Art+Feminism, GLAM, or Women in Red, that are volunteering hundreds of hours trying to close those gender gaps. Through their extreme devotion, we're making these very small incremental changes. The data I was able to get on the percentage of biographies on English Wikipedia that are about women comes from Women in Red's revision history, because they are constantly tracking what percentage of articles are about women; they're really trying to improve that statistic. So, from 2017 through 2020, it rose. I mean, yeah, it looks flat, but there was this small incremental rise from 16.83 to 18.25 percent from January 2017 through February 2020.
But what frustrated me is the percentage of biographies about women nominated for deletion. If you look just at biographies nominated for deletion, and there weren't this gender inequality on the site, then the share of women among the biographies nominated for deletion each month would be roughly equal to their share of biographies on the site. Right?
In January 2017, only 16.83 percent of biographies on the site were about women, so you would expect roughly 17 percent of the biographies nominated for deletion that month to be about women. But they're continuously over 25 percent. So, even though women make up less than 19 percent of biographies, they are routinely a quarter of the biographies nominated for deletion each month.
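Tripodi's comparison treats women's share of biographies on the site as the expected share of deletion nominations. A minimal sketch of checking one month against that baseline, assuming nominations behave like independent draws, is below. The counts are hypothetical: about 22,000 nominations over the 38 months from January 2017 through February 2020 averages to a few hundred per month, and 25 percent mirrors the pattern she describes. They are not figures from the study, and the interview does not say which tests the study itself used.

```python
import math


def z_score(k: int, n: int, p0: float) -> float:
    """Normal-approximation z for an observed share k/n against baseline p0."""
    return (k / n - p0) / math.sqrt(p0 * (1 - p0) / n)


def binom_upper_tail(k: int, n: int, p0: float) -> float:
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p0).

    Terms are computed in log space so large n does not overflow.
    """
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p0) + (n - i) * math.log1p(-p0)
        for i in range(k, n + 1)
    ]
    peak = max(log_terms)
    return math.exp(peak) * sum(math.exp(t - peak) for t in log_terms)


# Hypothetical month: 600 biographies nominated, 150 (25%) about women,
# against the January 2017 baseline of 16.83% of biographies on the site.
baseline = 0.1683
n_nominated, n_women = 600, 150
z = z_score(n_women, n_nominated, baseline)
p_value = binom_upper_tail(n_women, n_nominated, baseline)
print(f"z = {z:.2f}, one-sided p = {p_value:.1e}")
```

A month where the share of women among nominations falls back toward their share on the site, as Tripodi describes during the Donna Strickland coverage, would give a small z and a large p-value, which is what failing to reject the null hypothesis looks like. Treating nominations as independent draws is itself a simplifying assumption.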
You wrote about how this had a really tangible effect. You cited an incident in February 2018 where, because of the efforts of Women in Red, the percentage of biographies about women got up to about 17.9 percent, but then a bunch of them were deleted from Wikidata, dropping it. It took months, you wrote, to get it back up. It seems like a ton of effort is being expended, but it's oftentimes just being nullified by some of the more established figures.
Sure. Sometimes it definitely is one step forward, three steps back. Right? You have Women in Red doing a tremendous amount of volunteer work, raising this percentage of biographies to 17.9 percent. Yeah, it took almost a year and a half, 17 months, to get that percentage of women back up to 17.9 percent. There are so few to begin with that just a few mistaken deletions have a really, really big effect.
Just to kind of bring this home, is there anything that folks can be doing to help combat this? I know you wrote a little bit about the Donna Strickland effect. Is there anything looking to move the needle, or ways that folks can get involved?
I think if people want to get involved in this issue, there are a couple of ways they can. One of them: I highly encourage editing Wikipedia. I don't edit myself because everything is so transparent; I was worried that if I edited pages, it would somehow out people who had participated in my study, or who were in the study. What I really wanted to do was less to say, "Hey, there are a few people who are problematic" and more to say, "I think this is systemic; we're looking at a widespread pattern where women just aren't seen as being as valuable as men."
I think if people are trying to close this gender gap, look into these organizations like Art+Feminism and Women in Red. They are filled with really devoted editors who are so excited about editing Wikipedia, and want to help you learn how to edit, and have resources for pages that are missing that you can add to. They often partner with organizations like the Smithsonian or local libraries so they can get you those sources you need to establish that notability, so I think that's really important.
The other thing that I think people need to realize is that this deletion process does happen, so if they want to make sure that notable women persist, it's important that they star their pages (which adds them to their watchlists), that they keep tabs on them, and that they don't just log in for an edit-a-thon and never log in again. I would just say, know what you're getting into before you get started, and recognize that it's going to be more than just one day of volunteerism, but that your contribution is important and your contribution matters. The other thing I would say concerns the culture of deletionism. Deletionism is an important part of Wikipedia and does hold value to some extent, because there is a lot of vandalism on Wikipedia. People going through Wikipedia to make sure that vandalism doesn't stick, and deleting that content, is really important.
But when it comes to this idea of notability, I think the snap judgments people are making regarding notability need to be made a little more carefully. This shows up in my dataset, right? When there was all this news coverage surrounding Donna Strickland not having a Wikipedia page, all of a sudden you see that I can't reject my null hypothesis. Right?
All of a sudden everything's equal. Now that wasn't statistically significant and it didn't last for very long, but I think that it is possible for this equitable treatment of subjects to happen, but I think it only happens when it's at the forefront of people's minds. So, to me that means either there are a few people that are targeting women, or probably more likely there are these implicit biases that women are just not as worthy of inclusion.
Anything else you want to get in there?
The one thing that I think is super important is that this matters beyond Wikipedia. Something that I've been thinking about, for example, is that not only do we rely a lot on Wikipedia, but Google relies on Wikipedia, Alexa relies on Wikipedia, Siri relies on Wikipedia. Most notably, I've been noticing that people's ability to get that blue check mark on Twitter is connected to Wikipedia. If you go to How to Get Verified, you'll see that in order to get verified, journalists and business leaders need to have a Wikipedia page. That directly relates to my research, because if women are less likely to be seen as notable, and more likely to be removed from Wikipedia or denied entry to Wikipedia, then that also means we're less likely to get that coveted blue check mark. It's really about recognizing how many other information systems depend on Wikipedia, and that this is bigger than just inequality on the site.
Got it. That's a really, really great point. Where can folks find you and where can folks find your work?
I post a lot on Twitter, so I always love it when people follow me there. I'm at @ftripodi; that's where I share all of my new publications. I have my website, ftripodi.com, where I link my publications. I try to write pretty frequently for public audiences, so I'll share those things there. I also love Twitter because I can amplify other people's work that is tangentially related to the concepts that I'm studying.
If you have anything you’d like to see in this Sunday special, shoot me an email. Comment below! Thanks for reading, and thanks so much for supporting Numlock.