We’ve recently opened the GendRE API, which recognizes the gender of international names. What makes it so accurate? We recognize the cultural origin of a name, so we can tell that Andrea Rossini is most likely an Italian name (and a male name), whereas Andrea Parker is most likely an Anglo-Saxon name (and a female name); 声涛周 is most likely male; “O. Sokolova” is most likely female. Try those:
We’re continuously working on improving the accuracy of our software for particular countries and regions. This year, in August, the University of Glasgow will host the 25th International Congress of Onomastic Sciences, the premier conference in the field of name studies. So, how accurate is NamSor at recognizing Scottish names?
The following chart shows some backtesting results: how NamSor correctly or incorrectly classified ~3000 names that are labelled as Scottish in several databases (Freebase, WWI casualties) among one hundred other places/regions/cultures (from Ireland to Zimbabwe).
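For readers who want to run a similar check on their own data, here is a minimal sketch of this kind of backtest, not NamSor's actual implementation. The `backtest` helper and the `toy_classify` function are hypothetical: the toy classifier is a crude prefix rule used only so the example runs, and in practice you would pass in whatever classifier you are evaluating. The sample names are taken from the list in this post.

```python
from collections import Counter

def backtest(names, classify, expected="Scotland"):
    """Tally predicted origins for names whose known label is `expected`.

    `classify` is any callable mapping a full name to an origin label.
    Returns the share classified as `expected` and the full breakdown.
    """
    predictions = Counter(classify(name) for name in names)
    total = sum(predictions.values())
    accuracy = predictions[expected] / total if total else 0.0
    return accuracy, predictions

def toy_classify(name):
    # Hypothetical stand-in, NOT the NamSor algorithm: a crude surname
    # prefix rule (which would also match many Irish surnames).
    surname = name.split()[-1].lower()
    if surname.startswith(("mac", "mc")):
        return "Scotland"
    return "Other"

# Names labelled as Scottish (taken from the list in this post).
sample = [
    "alasdair macgregor",
    "scott mcculloch",
    "stuart munro",
    "hugh mackintosh",
]

accuracy, breakdown = backtest(sample, toy_classify)
print(f"{accuracy:.0%} of the labelled names were classified as Scotland")
print(breakdown.most_common())
```

The breakdown matters as much as the headline figure: it shows which other origins the misclassified names were assigned to, which is the information a chart like the one above conveys.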
Also, here is a list of “Very Scottish” names produced by the software: craig docherty, alasdair macgregor, malcolm finlayson, alistair urquhart, rikki ferguson, scott shearer, scott taggart, craig strachan, james wedderburn, scott muirhead, bobby prentice, scott mcculloch, stuart munro, grant munro, alistair woodburn, deborah mccallum, hugh mackintosh, scott chisholm, bobby shearer, billy abercromby. Do they make sense? Feedback welcome.
We hope to make it even better, so we can produce interesting DataViz from Twitter or other cool databases/services.
- Presentation of NamSor for migration studies (Turkish Migration Conference 2014)
- What’s in a Twitter name? Revealing the Irish, French, Indonesian digital diasporas