Understanding NamSor API precision for Gender inference

Using a library or an API to infer gender from a given name is a common way to fix data quality issues with name and title information, or to enrich a dashboard with additional demographic information.

But did you know? You might expect Jean, Julia and Karen to be female names. Karen is usually a female name, but in Armenia it is a male name. The same goes for Julià, which is a male name in Spain (Catalonia), and for Jean, which is a male name in France. Culture matters!

What’s the Gender Gap at the European Union institutions?

We used the OFFICIAL DIRECTORY OF THE EUROPEAN UNION (ISSN 2363-3271) to measure NamSor Gender API precision for European names.

Using title information (Mr, Ms), we find 35.33% women at the EU, compared to 35.45% if we infer the likely gender using the NamSor API. The two figures differ by 0.12 percentage points, so the error in measuring the gender gap is less than 1%.
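As a quick check of that claim, here is a minimal sketch that compares the two shares quoted above. The function name is ours and purely illustrative. Whether the "less than 1%" refers to the absolute or the relative difference is not stated in the post, so the sketch computes both.

```python
def gap_measurement_error(share_from_titles, share_from_names):
    """Return (absolute difference in percentage points, relative error in %)."""
    absolute = abs(share_from_names - share_from_titles)
    relative = absolute / share_from_titles * 100
    return absolute, relative


# Shares of women quoted in this post, in percent.
absolute, relative = gap_measurement_error(35.33, 35.45)
print(f"absolute difference: {absolute:.2f} percentage points")
print(f"relative error: {relative:.2f}%")
```

Both readings stay under 1% with the figures above.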

EUGenderGap2016

Overall, NamSor Gender API precision and recall are respectively 98.41% and 99.28%.
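For readers who want to run a similar check on their own data, here is a minimal sketch of the textbook precision and recall computation for one class. It assumes the title-derived gender is the ground truth and the name-derived gender is the prediction. The post does not spell out the exact definitions behind the figures above, so treat this as one plausible reading. The labels in the example are made up for illustration and are not taken from the EU directory.

```python
def precision_recall(truth, predicted, positive="female"):
    """Textbook precision and recall for a single class.

    truth     -- gender derived from the title (Mr / Ms), assumed correct
    predicted -- gender inferred from the name ("unknown" is allowed)
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels, for illustration only.
truth = ["female", "female", "male", "male", "female"]
predicted = ["female", "unknown", "male", "female", "female"]
precision, recall = precision_recall(truth, predicted)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

Note that an "unknown" prediction counts against recall here, but not against precision.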

NamSorGenderPrecision.png

Why does recognizing the name’s cultural origin or ethnicity matter?

What makes NamSor Gender so accurate is that we try to recognize the likely cultural origin and gender at the same time. It’s not always a simple task, but let’s take the example of Gabriele: all the ‘Italian’ instances of the given name Gabriele are associated with the title ‘Mr’, as it is a male name in Italy. Conversely, all the non-Italian instances of ‘Gabriele’ are associated with the title ‘Ms’.

Based on the combination of first and last names, a cultural context is assigned before trying to infer the likely gender.
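The idea can be illustrated with a toy two-step lookup. This is only a sketch of the principle, not how NamSor works internally. The tables are hypothetical, including the surname-to-culture mapping. Only the Gabriele rule (male in Italy, female elsewhere) comes from the example above.

```python
# Hypothetical surname -> culture table, for illustration only.
SURNAME_CULTURE = {"rossi": "Italian", "schmidt": "German"}

# Given name -> gender by culture, with a fallback for other cultures.
GIVEN_NAME_GENDER = {
    "gabriele": {"Italian": "male", "default": "female"},
}


def infer_gender(first_name, last_name):
    """Assign a cultural context first, then look up the gender in that context."""
    culture = SURNAME_CULTURE.get(last_name.lower(), "unknown")
    by_culture = GIVEN_NAME_GENDER.get(first_name.lower())
    if by_culture is None:
        return culture, "unknown"
    gender = by_culture.get(culture, by_culture.get("default", "unknown"))
    return culture, gender


print(infer_gender("Gabriele", "Rossi"))
print(infer_gender("Gabriele", "Schmidt"))
```

The same first name yields a different answer depending on the context inferred from the last name, which is the point of the Gabriele example.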

EU_Gabriele

What is the ‘Gender Scale’ numerical attribute?

The Gender Scale is a numerical value in the range [-1…0…+1] that reflects how likely the name is ‘Male’ or ‘Female’, with the value 0 representing ‘Unknown’. It reflects how confident we are in the result and is directly correlated with the API’s accuracy.

When the scale attribute is close to -1 (Female) or +1 (Male), the accuracy is close to 99%.
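To make the sign convention concrete, here is a minimal sketch of how a client might interpret the scale value. The function name and the `threshold` parameter are our own assumptions and not part of the API. Using the absolute value as a confidence score is also an assumption, based on the description above.

```python
def interpret_scale(scale, threshold=0.0):
    """Map a Gender Scale value in [-1, +1] to (label, confidence).

    Negative values mean female, positive values mean male, 0 means unknown.
    Values whose absolute value is below `threshold` (a hypothetical
    client-side cutoff) are also reported as unknown.
    """
    if not -1.0 <= scale <= 1.0:
        raise ValueError("scale must be in the range [-1, +1]")
    confidence = abs(scale)
    if scale == 0 or confidence < threshold:
        label = "unknown"
    elif scale < 0:
        label = "female"
    else:
        label = "male"
    return label, confidence


print(interpret_scale(-0.97))
print(interpret_scale(0.3, threshold=0.5))
```

Raising the threshold trades coverage for accuracy: fewer names get a label, but the labels that remain sit closer to -1 or +1.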

EU_GenderScalee.png

You can download the dataset used for this blog post here: (1), (2), (3). Image credits: The European Institute for Gender Equality (EIGE).

About NamSor

NamSor™ Applied Onomastics is a European vendor of sociolinguistics software (NamSor sorts names). NamSor’s mission is to help understand international flows of money, ideas and people. We proudly support Gender Gap Grader.
Reach us at: contact@namsor.com

 
