Twitter User Pronouns
Exploratory Data Analysis

Summary

Analysis of approximately 158,000 unique Twitter profiles, exploring preferred personal pronouns of users by extracting pronouns from user-defined profile text.

Overview of findings:

12,467

Twitter users with preferred personal pronouns identified

30.7%

of Twitter users with identified pronouns use gender non-conforming pronouns

3.8%

of Twitter users with gender non-conforming pronouns use neopronouns

Data Collection

The dataset explored here is a selection of around 220,000 individual tweets collected via the Twitter API, along with each user’s display name and bio as text fields. The tweets came from 157,625 unique accounts and were collected in batches of 100 over a two-day period. The search looked for tweets containing common English words from the NLTK English stopwords list, excluding any personal pronouns included in the analysis to avoid bias.
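The exclusion step can be sketched in Python. The stopword excerpt and the `ANALYSED_PRONOUNS` subset below are illustrative assumptions, not the exact lists used, and the API search itself (batches of 100 tweets) is not shown.

```python
# Illustrative excerpt of NLTK's English stopword list. The full list can be
# loaded with nltk.corpus.stopwords.words("english") after
# nltk.download("stopwords").
STOPWORDS_EXCERPT = [
    "i", "me", "my", "the", "and", "is", "for", "with",
    "he", "him", "his", "she", "her", "hers",
    "it", "its", "they", "them", "theirs",
]

# Hypothetical subset of the pronouns that the analysis later searches for.
ANALYSED_PRONOUNS = {
    "he", "him", "his", "she", "her", "hers",
    "it", "its", "itself", "they", "them", "theirs",
}


def search_terms(stopwords, excluded):
    """Return stopwords in their original order, minus any analysed pronoun."""
    return [w for w in stopwords if w.lower() not in excluded]


print(search_terms(STOPWORDS_EXCERPT, ANALYSED_PRONOUNS))
# ['i', 'me', 'my', 'the', 'and', 'is', 'for', 'with']
```

First-person words such as "i" and "my" remain in the search terms because they are not among the pronouns being counted.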

While Twitter profile information is publicly accessible data, when combined with the methods used to classify profiles based on the presence of pronouns, it provides a level of searchability outside the scope of the usual Twitter search functions. I’ve made the decision not to share the full dataset due to ethical and safety concerns, so this analysis explores aggregated results with no uniquely identifiable personal information.

Data

  • User ID
  • User display name
  • User bio text
  • Text content of tweet

Assumptions

  • Bios using singular pronouns are from accounts representing an individual human rather than a brand.
  • Pronouns identified are the person’s genuine preferred pronouns.

Extracting pronouns from bio and username text

Despite announcements in May 2021 and further speculation in March 2022, at the time of writing in July 2022 Twitter has not yet provided a dedicated pronouns field, a feature that has been available for over a year on platforms like Instagram and LinkedIn. Without a dedicated space for users to add their preferred pronouns to their profile, the text from users’ bios and display names was scraped and analysed to identify words and patterns that match those commonly used when expressing personal pronouns.

A list of common English pronouns was compiled, along with a list of common neopronouns from the LGBTQIA+ Wiki.

Common format for displaying preferred pronouns: he/they, she/her, xe/xem

After exploring the bio text, it was clear that while most people follow the conventional format of listing two preferred third-person personal pronouns separated by a forward slash (‘xxx/xxx’), the format becomes more complex when more than two pronouns are listed (e.g. ‘he/she/they’ or ‘he/him and they/them’).

I chose to search for each pronoun individually rather than as pairs, checking the surrounding characters to confirm the context. Each pronoun must be immediately preceded or followed by a forward slash, with no other alphanumeric character immediately at the other end. This helped to capture pronouns from bios that are formatted for brevity (e.g. ‘occupation|she/her|hobbies…’) and to capture all of the pronouns each person chose.

Pattern used for matching individual pronouns

"(?<![a-z0-9])she(?=\/)|(?<=\/)she(?![a-z0-9])"

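The same pattern can be generalised to every pronoun in a list. This is a minimal Python sketch using the standard `re` module. `PRONOUNS`, `pronoun_pattern` and `extract_pronouns` are hypothetical names, and the list is abridged. Matching is case-insensitive here; the original all-lowercase pattern would need the bios to be lowercased first.

```python
import re

# Hypothetical, abridged pronoun list; the analysis used a longer list of
# English, Portuguese, German and French pronouns plus neopronouns.
PRONOUNS = ["he", "him", "his", "she", "her", "hers", "they", "them", "xe", "xem"]


def pronoun_pattern(pronoun: str) -> re.Pattern:
    """Match the pronoun only when a forward slash touches one side and no
    letter or digit touches the other side."""
    p = re.escape(pronoun)
    return re.compile(
        rf"(?<![a-z0-9]){p}(?=/)|(?<=/){p}(?![a-z0-9])",
        re.IGNORECASE,
    )


PATTERNS = {p: pronoun_pattern(p) for p in PRONOUNS}


def extract_pronouns(text: str) -> set[str]:
    """Return every listed pronoun found in a bio or display name."""
    return {p for p, pattern in PATTERNS.items() if pattern.search(text)}


print(sorted(extract_pronouns("writer|she/her|cats")))   # ['her', 'she']
print(sorted(extract_pronouns("she/they")))              # ['she', 'they']
print(sorted(extract_pronouns("he/him and they/them")))  # ['he', 'him', 'them', 'they']
```

The lookarounds are what stop "he" from matching inside "she/they" or "her": in both cases a letter sits directly against the candidate word.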
Pronoun Use Overall

Preferred pronouns were specified in an identifiable format by 8.2% of Twitter users in the sample collected.

Adding, or not adding, pronouns to your Twitter profile is a decision that may be made for many reasons, from ideology to personal safety. So while this sample is large enough to explore the variety of pronouns used, it is not representative of the Twitter community as a whole.

PRONOUNS IN PROFILES

Each square in this plot represents 1% of the overall sample population.

Pronoun use by gender group

To produce the final classifications, the first step was to identify pronouns in four groups: masculine, feminine, traditional gender-neutral pronouns and gender-neutral neopronouns. Users were then classified further according to which combination of these four groups their pronouns fell into.

Although the tweets were collected by searching for common English words, a number of users still listed pronouns in other languages, most commonly Portuguese, with some German and French pronouns also identified. These were treated in the same way as their closest English translations.

Although “it”, “its”, “thy” and “thou” are not as commonly used as they/them, they have been classified as traditional gender-neutral pronouns because they are existing pronouns within the English language. This distinguishes them from neopronouns, which are generally new words created by gender non-conforming people as a deliberate evolution of language that aligns with their identities.

Masculine

  • he / him / his
  • ele / dele (Portuguese)
  • er / ihn (German)

Feminine

  • she / her / hers
  • ela / ella / dela (Portuguese)
  • ihr (German)

Neutral

  • they / them / theirs
  • elu / delu (Portuguese)
  • iel (French)
  • it / its / itself
  • thy / thou / thee

Neopronouns

  • xe / xem / xey
  • ve / ver / vem
  • fae / ae / faer / aer
  • ey / e / em / erself
  • ze / hir / zir / zim / zis

Note: “sie” has been removed from the classification process as it is both a common neopronoun and German for “she” and appeared to be used in both contexts in the dataset.
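The combination step can be sketched as a lookup from each identified pronoun to one of the four groups, followed by a rule over the set of groups found. This Python sketch assumes a hypothetical, abridged `GROUPS` mapping built from the lists above. It also assumes that any combination other than masculine-only or feminine-only counts as gender non-conforming, for example he/she or she/they. The original classification rules may differ.

```python
# Hypothetical, abridged mapping from pronoun to group, based on the lists above.
# "sie" is deliberately absent, as in the original analysis.
GROUPS = {
    "masculine": {"he", "him", "his", "ele", "dele", "er", "ihn"},
    "feminine": {"she", "her", "hers", "ela", "ella", "dela", "ihr"},
    "neutral": {"they", "them", "theirs", "elu", "delu", "iel",
                "it", "its", "itself", "thy", "thou", "thee"},
    "neopronoun": {"xe", "xem", "xey", "ve", "ver", "vem", "fae", "ae",
                   "faer", "aer", "ey", "e", "em", "erself",
                   "ze", "hir", "zir", "zim", "zis"},
}


def pronoun_groups(pronouns):
    """Return the set of groups that a user's pronouns fall into."""
    found = {p.lower() for p in pronouns}
    return {group for group, words in GROUPS.items() if found & words}


def classify(pronouns):
    """Assumed rule: masculine alone or feminine alone is 'only'; any other
    non-empty combination is gender non-conforming."""
    groups = pronoun_groups(pronouns)
    if not groups:
        return "No pronouns identified"
    if groups == {"masculine"}:
        return "Masculine pronouns only"
    if groups == {"feminine"}:
        return "Feminine pronouns only"
    return "Gender non-conforming pronouns"


print(classify({"she", "her"}))  # Feminine pronouns only
print(classify({"he", "they"}))  # Gender non-conforming pronouns
```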

PRONOUN USE BY GENDER GROUP

Each square in this plot represents 10 users.

We can’t be sure how many of the bios analysed belong to brand accounts. Classifying accounts as brand or human was deemed unnecessary here because the focus of the research is the range of pronouns used by accounts that do specify a preference, and bios containing singular pronouns are assumed to represent individuals. It is worth noting, however, that accounts with gender non-conforming pronouns make up 2.4% of the total sample population.

Exploring gender non-conforming pronoun combinations

The interactive plot below allows you to explore the unique combinations provided by users and provides some transparency on how these combinations were classified into more general groups. It distinguishes differences in pronoun word choice (e.g. she/they vs. she/them) but not the order in which the pronouns were listed (e.g. she/they and they/she are grouped together).
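Grouping combinations regardless of order, while keeping word choice, can be done by normalising each user’s pronouns into a sorted key. This is a minimal Python sketch. `combination_key` is a hypothetical helper, not necessarily how the original grouping was implemented.

```python
from collections import Counter


def combination_key(pronouns):
    """Order-insensitive key: she/they and they/she map to the same key,
    but she/they and she/them stay distinct."""
    return "/".join(sorted({p.lower() for p in pronouns}))


combinations = [["she", "they"], ["they", "she"], ["she", "them"]]
print(Counter(combination_key(c) for c in combinations))
# Counter({'she/they': 2, 'she/them': 1})
```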

Neopronouns

“Neopronouns are a category of new (neo) pronouns that are increasingly used in place of “she,” “he,” or “they” when referring to a person. Some examples include: xe/xem/xyr, ze/hir/hirs, ey/em/eir, and fae/faer/faers.”

K.R. Blevins | mykidisgay.com/blog/defining-neopronouns

The use of neopronouns dates back hundreds of years, with the most notable example being the pronoun “thon”, thought to be a contraction of “that one”, and originally coined in 1858 by Charles Crozat Converse. It has continued to be used to this day, and was even added to the Merriam-Webster dictionary in 1934.

Modern neopronouns developed in the Tumblr community, where the nounself form of pronouns (bun/bunself, vamp/vampself, fae/faeself) began to be used from around 2012 and has evolved within the community since then. The analysis below shows that the most commonly used neopronouns have since evolved into words closer in form to traditional pronouns, with one significant exception.

1.18%

of users with pronouns identified use neopronouns

3.84%

of gender non-conforming accounts use neopronouns

85.03%

of those who use neopronouns use them in combination with traditional pronouns.

USE OF NEOPRONOUNS

Each square in this plot represents 1 user.

Further analysis of the dataset

For this further analysis of the dataset, the users are split into four groups:

Masculine pronouns only

Feminine pronouns only

Gender non-conforming pronouns

No pronouns identified

To minimise the number of non-individual accounts in the “No pronouns identified” group, only profiles containing “my” or “I’m” were used. A random sample of 3,000 users was then selected for each group.
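The filter and sampling step might look like this in Python. This is a sketch under stated assumptions. The whole-word regex, the acceptance of both straight and curly apostrophes, and the fixed `seed` are illustrative choices. `looks_individual` and `sample_group` are hypothetical names.

```python
import random
import re

# Assumed filter: "my" or "I'm" as whole words, straight or curly apostrophe.
FIRST_PERSON = re.compile(r"\bmy\b|\bi['’]m\b", re.IGNORECASE)


def looks_individual(bio: str) -> bool:
    """True if the bio contains a first-person marker."""
    return FIRST_PERSON.search(bio) is not None


def sample_group(bios: list[str], n: int = 3000, seed: int = 0) -> list[str]:
    """Randomly select up to n bios without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(bios, min(n, len(bios)))


no_pronoun_bios = ["my views are my own", "Official account of Acme Ltd", "I’m a writer"]
print([b for b in no_pronoun_bios if looks_individual(b)])
# ['my views are my own', 'I’m a writer']
```

In this sketch the first-person filter would be applied only to the “No pronouns identified” group before sampling; the other three groups would be sampled directly.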

Most frequent words used in bio text by gender group

Rank | No pronouns provided | Gender non-conforming pronouns | Masculine pronouns only | Feminine pronouns only
-----|----------------------|--------------------------------|-------------------------|-----------------------
1 | good | 18 | game | account
2 | get | artist | fan | artist
3 | follow | art | artist | fan
4 | make | nsfw | like | writer
5 | game | queer | 18 | 18
6 | i ’ be | dni | love | love
7 | art | bi | writer | game
8 | friend | game | account | like
9 | people | 22 | nsfw | bi
10 | please | writer | art | mom
11 | man | account | i'm | life
12 | 18 | blm | bi | art
13 | live | 21 | make | i'm
14 | new | like | gay | trans
15 | work | sometimes | life | thing
16 | writer | 20 | pfp | black
17 | also | minor | enthusiast | much
18 | artist | old | thing | good
19 | country | trans | time | cat
20 | day | autistic | twitch | lover
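A ranking like the one above can be produced by counting words across bios. This Python sketch assumes that stopwords are removed and that each word is counted once per bio. It does not lemmatise, and entries such as “i ’ be” suggest the original text processing differed, so its output would not match the table exactly. `top_words` is a hypothetical helper.

```python
import re
from collections import Counter

# Letters, digits and straight or curly apostrophes form a token.
TOKEN = re.compile(r"[a-z0-9'’]+")


def top_words(bios, stopwords, n=20):
    """Rank words by the number of bios they appear in (each bio counts once)."""
    counts = Counter()
    for bio in bios:
        words = set(TOKEN.findall(bio.lower())) - set(stopwords)
        counts.update(words)
    return [word for word, _ in counts.most_common(n)]


bios = ["I love art", "art and games", "my art, my cat"]
print(top_words(bios, {"i", "and", "my"}, n=1))  # ['art']
```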

Bio text analysis

The sections below explore some of the common word groups seen in the table above by checking for the presence of words within profiles. While it may be assumed that terms like “bi”, “gay” or “trans” are used to represent the users’ identities, there is no common format for them as there is for pronouns, so the context in which these words are used is uncertain.
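Checking for the presence of a word can be sketched as a whole-word, case-insensitive search per bio, giving the proportion of profiles in a group that contain it. This is a minimal Python sketch. `share_with_word` is a hypothetical helper, and the whole-word rule is an assumption chosen so that, for example, “biology” is not counted as “bi”.

```python
import re


def share_with_word(bios, word):
    """Proportion of bios containing `word` as a whole word (case-insensitive)."""
    if not bios:
        return 0.0
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return sum(1 for bio in bios if pattern.search(bio)) / len(bios)


bios = ["bi disaster", "biology teacher", "Bi/pan", "no labels"]
print(share_with_word(bios, "bi"))  # 0.5
```

Running the same function over each group’s sample of 3,000 bios gives comparable proportions across groups.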

LGBTQIA+ representation within groups

While displaying pronouns in social media profiles and email signatures is now relatively common online and in workplaces, the practice initially grew within the LGBTQIA+ community. The table above shows that LGBTQIA+ representation is still very strong among users who include pronouns in their bios. Words like “queer”, “trans”, “bi” and “gay” appear among the most frequently used words for each group with pronouns identified, but none of them appear in the top 20 for the group with no pronouns identified.

The proportion of profiles containing any individual word is very low for all groups (a maximum of just over 1%). This may be because Twitter users more commonly display their identities with emojis or combinations of emojis. Unfortunately, in the format in which the text was extracted, many emojis share the same Unicode characters, so I was unable to extract context from the emoji codes with confidence.
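One plausible source of this overlap is an assumption here, not the author’s diagnosis. Many identity-related emojis are zero-width-joiner (ZWJ) sequences built from shared code points. This Python sketch shows that the rainbow flag and the transgender flag share their first three code points, so splitting text into individual characters loses the distinction between them.

```python
RAINBOW_FLAG = "\U0001F3F3\uFE0F\u200D\U0001F308"       # rainbow flag
TRANSGENDER_FLAG = "\U0001F3F3\uFE0F\u200D\u26A7\uFE0F"  # transgender flag


def code_points(s):
    """List each character of s as a U+XXXX code point label."""
    return [f"U+{ord(c):04X}" for c in s]


shared = sorted(set(code_points(RAINBOW_FLAG)) & set(code_points(TRANSGENDER_FLAG)))
print(shared)  # ['U+1F3F3', 'U+200D', 'U+FE0F']
```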

LGBTQ

The most widely used identifiers are the first 5 letters of the LGBTQIA+ acronym.

Other LGBTQIA+ identities

A list of LGBTQIA+ identity terms was compiled from the LGBTQIA+ Wiki (sexual orientation, gender identity, romantic attraction) and Stonewall websites, and the terms listed below appeared most frequently in the user bios analysed.

Autism & ADHD

Autism and ADHD are conditions that have become closely linked with the LGBTQIA+ community, and the plots below show that the proportion of users with “autistic” and “ADHD” on their profile is much higher in groups with pronouns identified vs. those without. There is a particularly high proportion of gender non-conforming users who also identify as autistic.

“Current research indicates that autistic people have higher rates of LGBT identities and feelings than the general population. A variety of explanations for this have been proposed; The Lancet’s ‘Commission on the future of care and clinical research in autism’ commented that it ‘might be part of a different concept of self, less reliance on or reference to social norms, or part of a neurodiverse lived experience of (and outlook on) the world.’ While autistic people are more likely to be non-heterosexual than the general population, the majority of autistic people are heterosexual.”

Autism & LGBT Identities | wikiwand.com

Neurodiversity terms