Tag Archives: Gender API

What’s the gender gap among accredited medical doctors in France?

The French ‘Haute Autorité de Santé’ (HAS) has published an open data set listing the accredited medical doctors in France. So what’s the gender gap among them?

[Figure: GenderGapGrader 2015, accredited medical doctors in France]

Further details,

About NamSor

NamSor™ Applied Onomastics is a European designer of name recognition software. NamSor is committed to promoting diversity and equal opportunity. NamSor launched NamSor Gender API, a free API to extract gender from personal names. http://www.namsor.com

About GenderGapGrader

GenderGapGrader’s mission is to publish gender gap estimates at the finest grain level, using whatever reference database we can identify for a particular industry: The Internet Movie Database (IMDB) for the film industry, “The Airman Database” for pilots… and more to come. http://gendergapgrader.com

Filed under General

Video Tutorial: NamSor extension for RapidMiner to measure a Gender Pay Gap and other Diversity Analytics

[NEW! Get your API Key directly from NamSor and use the online tool as a complement to RapidMiner]

This is a hands-on video tutorial on measuring the gender pay gap and other diversity analytics, with the example of open data published by the city of Palo Alto, California.

Palo Alto public data on employee salaries in 2013.

The original file has a lot of data but no diversity information, so we use the NamSor extension for RapidMiner to extract the gender and likely origin from names.

The NamSor API can infer gender with high precision, recognizing for example that Andrea Rossini is likely an Italian male name, whereas Andrea Parker is more likely a female name. Onomastics, or onomatology, is the study of the origin, history, and use of proper names.

Using NamSor to determine gender and likely country of origin
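As a rough illustration of the measurement this tutorial builds towards, the sketch below computes an unadjusted gender pay gap once each employee row carries an inferred gender. The rows, the field names and the gap definition (difference between male and female mean pay, as a share of male mean pay) are assumptions made for the example; they are not taken from the Palo Alto file or from the RapidMiner process.

```python
from statistics import mean


def unadjusted_pay_gap(rows):
    """Return (male mean - female mean) / male mean; other genders are ignored."""
    male = [r["salary"] for r in rows if r["gender"] == "male"]
    female = [r["salary"] for r in rows if r["gender"] == "female"]
    if not male or not female:
        raise ValueError("need at least one male and one female salary")
    return (mean(male) - mean(female)) / mean(male)


# Hypothetical rows; in practice 'gender' would come from the name-based step.
sample = [
    {"name": "employee 1", "salary": 100_000, "gender": "male"},
    {"name": "employee 2", "salary": 80_000, "gender": "male"},
    {"name": "employee 3", "salary": 90_000, "gender": "female"},
    {"name": "employee 4", "salary": 72_000, "gender": "female"},
    {"name": "employee 5", "salary": 50_000, "gender": "unknown"},
]

gap = unadjusted_pay_gap(sample)
print(f"Unadjusted pay gap: {gap:.1%}")
```

Rows whose gender could not be inferred are left out of both averages here; a real study would also want to report how many rows that excludes.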

For this tutorial, you will need,

Data mining can be fun, and open data and better corporate transparency can make a difference. Please RT and join us in thanking the City of Palo Alto for its transparency:

Recent Sony Hack, Diversity Problems and the Gender Pay Gap at Deloitte

The Recent Sony Hack Reveals Huge Diversity Problems and Gender Pay Gap Among Top-Level Executives in the film industry. This data, though unlawfully disclosed to the public, might shed light on some of the causes behind the huge gender gap in the film industry. Earlier this year, we lawfully and publicly measured the gender gap using Open Data published by The Internet Movie Database (IMDB). We genderized over 5 million names to produce this chart:

[Figure: IMDb gender gap, methodology]

[Figure: IMDb gender gap, table by role]

The public may also wonder: How Did 30,000 Deloitte Employees Get Caught Up in Sony’s Data Leak? Like many audit and consulting companies, Deloitte communicates periodically on its inclusiveness towards women in the workplace, participates in women’s empowerment initiatives, and even offers consulting services on gender diversity and corporate governance. Behind closed doors, the gender pay gap at Deloitte could very well look like this one, produced from actual data from a different large consulting firm:

[Figure: gender equality chart (teaser)]

We live in an age of transparency, where companies are encouraged to disclose more and more data on their gender diversity and their inclusiveness towards a more diverse workforce. Though the Sony data leak was caused by criminal activity, unintended disclosures of payroll data could also result from negligence. Today, anyone can use online services to ‘genderize‘ an HR database, or to detect whether a company has a bias towards employees of a certain origin. Employees, trade unions, journalists or feminist activists might have got hold of the Pastebin data (now removed) and may disclose the real picture of gender equality at Deloitte, making good use of our free gender gap API …

Companies should take such an event as a warning to take gender and cultural diversity seriously.

Consulting companies and auditors in particular should be aware of the absolute necessity to close the gap between how they communicate about gender equality and how they behave in the workplace. They should be the first users and promoters of Open Data as well as of benchmarking the Gender Gap in their sector of activity.

Recently, we had an informative exchange of tweets about how the Australian Government communicates on gender equality. See how transparent a government can be on Gender Equality, making good use of Open Data and technology.

Not only does the Australian Government publish an Open Dataset under the Workplace Gender Equality Act 2012

https://data.gov.au/dataset/wgea-dataset

but it also publishes the raw data at the finest grain level

http://data.gov.au/dataset/directory-gov-au-full-data-export

making it possible for citizens to see the data from other angles (looking at different segmentations, analyzing biases towards employees of a certain origin, etc.)

Earlier this year, we launched Gender Gap Grader, a global initiative to measure the gender gap across all professional fields, using Open Data and a public API.

Further reading:

Popular names of Delhi, Incredible India

Incredible India is the name of a successful place-marketing initiative launched in 2002 by the Indian Government, which uses the incredible diversity of India in terms of colors, landscapes, people, languages etc. to promote tourism in India. Because of this same diversity, Indian onomastics is a tough nut to crack. At NamSor, we’ve opened a free API to predict the gender of personal names according to the various languages and cultures (Andrea Rossini is male, Andrea Parker is female, Jean Durieux is …). We aim for 95 to 99% accuracy and 95 to 99% recall, in every country where possible.

This work is important. The status of women in India is a very current issue. At NamSor, we believe in the value of open data mining initiatives, such as Gender Gap Grader, to advance the empowerment of women worldwide. So we work hard to understand how names vary across the states and regions of India. For example, below are the most frequent male/female names in Delhi.

Most popular female and male names in Delhi, India

Name Female Male Likely Gender Total Pct
Sunita 51526 82 Female 51614 0.5%
Poonam 33524 79 Female 33603 0.3%
Raj Kumar 127 33313 Male 33440 0.3%
Anita 32816 45 Female 32861 0.3%
Ashok Kumar 29 29747 Male 29776 0.3%
Manoj Kumar 35 29197 Male 29232 0.3%
Anil Kumar 32 28943 Male 28975 0.3%
GEETA 28547 30 Female 28577 0.3%
Sunil Kumar 27 27438 Male 27465 0.3%
Santosh 22557 4465 Both 27022 0.3%
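The ‘Likely Gender’ column can be read as a simple majority rule over the Female and Male counts. The sketch below is one way to reproduce it; the 0.95 cut-off and the function name are assumptions chosen so that the sample rows match the labels in the table, since the source does not state the rule actually used.

```python
def likely_gender(female, male, threshold=0.95):
    """Label a name from its female/male record counts.

    A gender is assigned only when it reaches `threshold` of the gendered
    records; otherwise the name is treated as used by both.
    The threshold value is an assumption, not a documented setting.
    """
    total = female + male
    if total == 0:
        return "Unknown"
    if female / total >= threshold:
        return "Female"
    if male / total >= threshold:
        return "Male"
    return "Both"


# (female, male) counts copied from the Delhi table above.
delhi = {
    "Sunita": (51526, 82),
    "Raj Kumar": (127, 33313),
    "Santosh": (22557, 4465),
}

labels = {name: likely_gender(f, m) for name, (f, m) in delhi.items()}
print(labels)
```

With these counts, Santosh stays in the ‘Both’ bucket because roughly one record in six is male, which is the kind of name where a single hard label would mislead.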

An API to measure the Gender Gap across all professional fields

NamSor API was presented yesterday at the amazing APIDays.io Paris conference. The Gender Gap Grader project will be featured as a Keynote at the next APIDays conference in May.

Download presentation : APIDays-slides.pdf

Further reading:

NamSor and Gender Gap Grader are in the AngelList database of Startups, VC, Angels

We’ve analyzed the gender gap in the AngelList database of 650k profiles… we’re in it too. In perfect balance. Follow us on AngelList and hear more about our development in 2015 :) #datamining #machinelearning #bigdata #opendata

https://angel.co/namsor

https://angel.co/gender-gap-grader-1

[Figure: gender gap infographic]

Gender Gap Grading : read about the making-of and make yours!

Popular baby names in Tripura, Incredible India

Incredible India is the name of a successful place-marketing initiative launched in 2002 by the Indian Government, which uses the incredible diversity of India in terms of colors, landscapes, people, languages etc. to promote tourism in India. Because of this same diversity, Indian onomastics is a tough nut to crack. At NamSor, we’ve opened a free API to predict the gender of personal names according to the various languages and cultures (Andrea Rossini is male, Andrea Parker is female, Jean Durieux is …). We’ve already made it very accurate for most countries. Still, we have a lot of work to do on Indian names, as the precision of GendRE API v0.0.17 for India is not yet up to standard. We aim for 95 to 99% accuracy and 95 to 99% recall, in every country where possible.

This work is important. The status of women in India is a very current issue. At NamSor, we believe in the value of open data mining initiatives, such as Gender Gap Grader, to advance the empowerment of women worldwide. So we work hard to understand how names vary across the states and regions of India. For example, below are the most frequent male/female names in the state of Tripura. Stay tuned for the next version of GendRE API, as it will predict the gender of Indian names with much better precision. In the meantime, further reading:

Most popular female and male names in Tripura, India

Name Female Male Total Gender
Sabita 6364 6 6370 Female
Pradip 9 5077 5086 Male
Kalpana 4481 4481 Female
Anita 4424 13 4437 Female
Rina 4182 6 4188 Female
Ratna 4133 13 4146 Female
Ratan 20 4112 4132 Male
Narayan 11 4102 4113 Male
Namita 4079 6 4085 Female
Uttam 19 4043 4062 Male
Sbapan 6 3981 3987 Male
Dilip 7 3903 3910 Male
Dipali 3898 6 3904 Female
Bishbajit 12 3853 3865 Male
Gita 3853 3853 Female
Tapan 9 3798 3807 Male
Anjali 3617 3617 Female
Ranjit 7 3508 3515 Male
Sanjit 14 3443 3457 Male
Sbapna 3429 3429 Female
Lakshi 3366 29 3395 Female
Soma 3347 3347 Female
Sabitri 3287 3287 Female
Kajal 1930 1349 3279 Both
Suman 58 3218 3276 Male
Shipra 3244 3244 Female
Purnima 3168 3168 Female
Sunil 6 3119 3125 Male
Sujit 14 3103 3117 Male
Rita 3111 3111 Female
Sumitra 3088 10 3098 Female
Bimal 10 3086 3096 Male
Shefali 3091 3091 Female
Ajit 12 3050 3062 Male
Aarati 3022 3022 Female
Anjana 2983 8 2991 Female
Malati 2963 2963 Female
Babul 8 2952 2960 Male
Archana 2915 2915 Female
Samir 9 2852 2861 Male
Gautam 2840 2840 Male
Gopal 9 2813 2822 Male
Dipak 2791 2791 Male
Rekha 2756 2756 Female
Dulal 7 2736 2743 Male
Basanti 2727 2727 Female
Shyamal 11 2711 2722 Male
Minati 2677 2677 Female
Manik 21 2644 2665 Male
Shilpi 2632 7 2639 Female
Sima 2619 6 2625 Female
Bikash 10 2569 2579 Male
Sanjay 2540 2540 Male
Subhash 2529 2529 Male
Sajal 68 2443 2511 Male
Abhijit 8 2488 2496 Male
Litan 26 2465 2491 Male
Parimal 16 2465 2481 Male
Pratima 2421 2421 Female
Anima 2374 2374 Female
Manju 2309 50 2359 Female
Bina 2344 2344 Female
Sandhya 2327 10 2337 Female
Anil 2321 2321 Male
Mamata 2303 8 2311 Female
Ruma 2299 2299 Female
Rabindr 2295 2295 Male
Rajib 9 2263 2272 Male
Biplab 7 2250 2257 Male
Sukumar 2242 2242 Male
Abdul 2230 2230 Male
Chandan 13 2212 2225 Male
Shikha 2213 2213 Female
Rajesh 2196 2196 Male
Manoranjan 2195 2195 Male
Milan 1435 752 2187 Both
Nirmal 9 2173 2182 Male
Sarasbati 2177 2177 Female
Raju 46 2117 2163 Male
Aparna 2146 2146 Female
Zarna 2061 2061 Female
Rakhi 2044 10 2054 Female
Mayarani 2033 2033 Female
Tapas 2030 2030 Male
Rakesh 10 2009 2019 Male
Jyotasna 2005 2005 Female
Jayanti 1985 7 1992 Female
Santosh 1950 1950 Male
Subrat 1918 1918 Male
Ranajit 1917 1917 Male
Sandhyarani 1912 1912 Female
Bijay 22 1845 1867 Male
Suchitra 1833 15 1848 Female
Mira 1846 1846 Female
Haradhan 1815 1815 Male
Kabita 1784 8 1792 Female
Niranjan 1792 1792 Male
Gitarani 1772 1772 Female
Pramila 1761 6 1767 Female
Manika 1740 1740 Female

GGG & AngelList – a making-of

Tools, methodology, data sources, data output used to produce the article GenderGapGrader: AngelList.

We’ve opened the free GendRE API, which extracts gender from names. To make it usable by everyone, we’ve built an extension for RapidMiner, a leading open source data mining and predictive analytics software.

So you can run your own gender gap analysis, where and when it matters to you!

[Figure: make your own gender gap study]

Data Sources:

Data Mining Tools:

Data Output:

Estimates:

Tutorial:

What’s the Gender Gap in the European Union Whoiswho?

This summer, the European Union published an updated version of The Official Directory of the European Union [mirror: FYWW13001ENC.pdf, in ZIP/Excel format TheEUDirectory_xls.zip].

The EU Directory is not a good candidate for the GenderGapGrader initiative – because it’s not a truly global database like the FAA Airman Directory or the Internet Movie Database (IMDB), which we’ve used to disclose the gender gap among airline pilots or in the film industry.

But it’s a useful document, which lists people from all countries of Europe with their title (Mr, Ms). We can use it to validate our methodology, specifically compute the error rate of the software we use to predict the gender of names. The document lists about fifteen thousand EU executives, with different roles : Member, Head of Unit, Substitute, Director, Counsellor, Head of Division, Legal secretary, Vice-Chair, Attaché, First Secretary, Member of the Bureau, Administrator, Adviser, Head of Delegation, Second Secretary, Acting Head of Unit, Vice-President, Head of Service, Chair, Head of Office, Director-General, Delegate, Head of Sector, Assistant to the Director-General, Bureau member, Third Secretary, Judge, Head of private office, Acting Director, Head of Cabinet …

We’ve used RapidMiner with the Onomastics extension (available free in the RapidMiner MarketPlace) to calculate the Gender Gap, then compared the results with the actual counts based on titles (Mr, Ms). The software recognizes the origin of personal names to resolve cases such as Andrea Rossini (likely Male) or Andrea Parker (likely Female), Jean, Kim etc. Using the ‘gender scale’ metric (a numeric value between -1 and +1) instead of the predicted gender (Male, Female or Unknown) helps even out cases of truly genderless names such as Kerry.
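The validation described above can be sketched in a few lines: treat the title as ground truth, count how often the predicted gender disagrees with it, and compare the true female share with a scale-weighted estimate. Everything in this example is illustrative: the records are made up, the function name is ours, and the sign convention (negative for male, positive for female) is an assumption.

```python
def validate(records):
    """records: (title, predicted_gender, gender_scale) tuples.

    Returns (error_rate, actual_female_share, scale_female_share).
    """
    truth = {"Mr": "male", "Ms": "female"}

    # Error rate over the names for which a gender was actually predicted.
    decided = [(truth[t], g) for t, g, _ in records if g != "unknown"]
    if not decided:
        raise ValueError("no record has a predicted gender")
    errors = sum(1 for actual, predicted in decided if actual != predicted)
    error_rate = errors / len(decided)

    # Ground truth from titles.
    actual_female = sum(1 for t, _, _ in records if t == "Ms") / len(records)

    # Scale-weighted estimate: ambiguous names contribute little either way.
    female_weight = sum(s for _, _, s in records if s > 0)
    male_weight = sum(-s for _, _, s in records if s < 0)
    scale_female = female_weight / (female_weight + male_weight)
    return error_rate, actual_female, scale_female


# Made-up records for illustration only.
records = [
    ("Mr", "male", -1.0),
    ("Mr", "male", -0.8),
    ("Ms", "female", 1.0),
    ("Ms", "female", 0.6),
    ("Mr", "unknown", -0.1),  # an ambiguous name
    ("Ms", "male", -0.1),     # a misclassified name with a weak score
]

error_rate, actual_female, scale_female = validate(records)
print(f"error rate {error_rate:.1%}, actual {actual_female:.1%}, scale {scale_female:.1%}")
```

The point of the weighted estimate is that a weakly scored name, whether left Unknown or misclassified, moves the total far less than a confident one.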


[Figure: The EU Directory gender gap, gender scale]

In the chart above, the difference cannot be seen with the naked eye: onomastics is a reliable tool to measure the gender gap. In most cultures, names can be used to accurately predict gender (with the notable exceptions of Chinese and Korean names once transliterated).

The European Institute for Gender Equality is an EU agency based in Vilnius, Lithuania, which promotes gender equality, fights discrimination based on sex and raises gender awareness. What about promoting gender equality among EU political institutions? ;-) We believe such new data mining capabilities and growing open data initiatives can help public and private organisations find innovative solutions to close the gender gap.

[Figure: Using the RapidMiner Onomastics extension to measure the Gender Gap]

Additional information in the EU Directory makes it possible to visualize other factors affecting the gender gap: EU country of affiliation, role, …

[Figure: The EU Directory gender gap, chart a]

[Figure: The EU Directory gender gap, chart b]

We are continuously improving the software to predict the origin and gender of names. The upcoming GendRE API v0.0.16 will further improve accuracy. Also, please stay tuned for the upcoming GenderGapGrader study on women entrepreneurship, start-ups and access to financing (VCs, Business Angels…)

To redo this study or make your own

– Get the source data files used in this article [mirror: FYWW13001ENC.pdf, in ZIP/Excel format TheEUDirectory_xls.zip]

– Get RapidMiner with Onomastics extension and Documentation, watch the tutorial video

– Or download directly the results TheEuropeanUnionDirectory_genderized.zip

Creative Commons License
‘What’s the Gender Gap in the European Union Whoiswho?’ by gendergapgrader is licensed under a Creative Commons Attribution 4.0 International License. Based on a work at http://namesorts.com/2014/09/09/whats-the-gender-gap-in-the-european-union-whoiswho/.

GGG & The Airman Directory – a making-of

Tools, methodology and data used to produce the article ‘GenderGapGrader: Airline Pilots‘:

We’re disclosing the data used in the study. We’ve opened GendRE API which extracts gender from names. To make it usable by everyone, we’ve built an extension for RapidMiner, a leading open source data mining and predictive analytics software.

So you can run your own gender gap analysis, where and when it matters to you!

 

[Figure: make your own gender gap study]

Data Files:

Data Mining Tools:

Data Scope:

  • Commercial/Airline Pilots :

SELECT ALL_PILOTS_GENDERIZED_LICENCED.COUNTRY,
       ALL_PILOTS_GENDERIZED_LICENCED.TYPE,
       ALL_PILOTS_GENDERIZED_LICENCED.LEVEL,
       ALL_PILOTS_GENDERIZED_LICENCED.gender,
       Sum(ALL_PILOTS_GENDERIZED_LICENCED.gender_scale) AS SumOfgender_scale,
       Count(ALL_PILOTS_GENDERIZED_LICENCED.[UNIQUE ID]) AS [CountOfUNIQUE ID]
FROM ALL_PILOTS_GENDERIZED_LICENCED
WHERE (COUNTRY='USA' And TYPE='P' And (LEVEL='C' Or LEVEL='A'))
   Or (COUNTRY<>'USA' And TYPE='Y' And LEVEL='Y')
GROUP BY ALL_PILOTS_GENDERIZED_LICENCED.COUNTRY,
         ALL_PILOTS_GENDERIZED_LICENCED.TYPE,
         ALL_PILOTS_GENDERIZED_LICENCED.LEVEL,
         ALL_PILOTS_GENDERIZED_LICENCED.gender;
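For readers who prefer to work outside a database, the same selection and grouping can be sketched in plain Python. The column names and filter values are copied from the query above; the function names and the sample rows are hypothetical, and the meaning of the TYPE and LEVEL codes is left as the query has it.

```python
from collections import defaultdict


def is_commercial(row):
    """Mirror the WHERE clause of the query above."""
    if row["COUNTRY"] is None:
        return False  # SQL comparisons with NULL never match
    if row["COUNTRY"] == "USA":
        return row["TYPE"] == "P" and row["LEVEL"] in ("C", "A")
    return row["TYPE"] == "Y" and row["LEVEL"] == "Y"


def summarize(rows):
    """Group by (COUNTRY, TYPE, LEVEL, gender) -> [sum of gender_scale, row count]."""
    groups = defaultdict(lambda: [0.0, 0])
    for row in rows:
        if is_commercial(row):
            key = (row["COUNTRY"], row["TYPE"], row["LEVEL"], row["gender"])
            groups[key][0] += row["gender_scale"]
            groups[key][1] += 1
    return dict(groups)


# Hypothetical rows, for illustration only.
sample = [
    {"COUNTRY": "USA", "TYPE": "P", "LEVEL": "C", "gender": "Male", "gender_scale": -1.0},
    {"COUNTRY": "USA", "TYPE": "P", "LEVEL": "C", "gender": "Male", "gender_scale": -0.5},
    {"COUNTRY": "USA", "TYPE": "P", "LEVEL": "S", "gender": "Female", "gender_scale": 1.0},
    {"COUNTRY": "FRANCE", "TYPE": "Y", "LEVEL": "Y", "gender": "Female", "gender_scale": 0.75},
    {"COUNTRY": "FRANCE", "TYPE": "P", "LEVEL": "C", "gender": "Male", "gender_scale": -1.0},
]

result = summarize(sample)
for key, (scale_sum, count) in result.items():
    print(key, scale_sum, count)
```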

Raw Estimates:

In bold, the estimates cited in the article or the infographics.

All pilots gender gap, overall statistics:

Gender  SumOfgender_scale  CountOfUNIQUE ID
(blank) n/a 219
Female 35,419 48,106
Male -490,805 596,368
Unknown -6 7,513
Total Count n/a 652,206
% Female 6.73% n/a
% Male 93.27% n/a

 

All commercial pilots gender gap, overall statistics:

Gender  SumOfgender_scale  CountOfUNIQUE ID
Male -225955 272994
Female 13011 17697
Unknown -4 2322
Total Count n/a 293013
% Female 5.44% n/a
% Male 94.56% n/a
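The percentages in the two tables above appear to be derived from the gender_scale sums rather than from head counts: dividing the Female sum by the combined magnitude of the Female and Male sums reproduces the published figures. The sketch below checks that arithmetic. The function name is ours, and the formula is inferred from the numbers, not documented in the source; the small Unknown sums are left out, as they do not change the rounded result.

```python
def female_share_from_scale(sum_female, sum_male):
    """Share of the total gender_scale magnitude that falls on the female side."""
    female_weight = sum_female
    male_weight = abs(sum_male)  # male sums are negative in the tables
    return female_weight / (female_weight + male_weight)


# SumOfgender_scale values from the two tables above.
all_pilots = female_share_from_scale(35_419, -490_805)
commercial = female_share_from_scale(13_011, -225_955)
print(f"{all_pilots:.2%} {commercial:.2%}")  # 6.73% 5.44%

# For comparison, the plain head-count share among all pilots.
count_share = 48_106 / (48_106 + 596_368)
print(f"{count_share:.2%}")
```

The head-count share comes out somewhat higher than the scale-weighted one, which is what one would expect if the names predicted Female carry lower confidence on average than those predicted Male.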

 

Commercial/airline pilots gender gap, by country:

NB: this table was uploaded to Wikipedia to facilitate the sharing of alternative statistics and actual gender gap disclosures by major national airlines.

Country Pilots Estimate Female (%, Scale)
USA 218229 5.12%
UNITED KINGDOM 14684 6.37%
GERMANY 11881 7.11%
CANADA 6852 6.78%
SWITZERLAND 4736 6.45%
FRANCE 4396 7.62%
ITALY 2984 4.89%
AUSTRIA 2405 5.50%
SPAIN 2081 5.28%
NETHERLANDS 2068 6.10%
BELGIUM 2037 7.53%
MEXICO 1530 2.33%
AUSTRALIA 1472 6.58%
BRAZIL 1315 2.20%
SWEDEN 1260 8.20%
IRELAND 957 6.80%
JAPAN 732 5.58%
NORWAY 665 4.47%
ISRAEL 628 5.71%
SOUTH AFRICA 551 7.54%
DENMARK 485 4.37%
NEW ZEALAND 465 7.76%
INDIA 445 7.72%
ICELAND 427 15.63%
ARGENTINA 362 1.83%
GREECE 359 2.62%
KENYA 329 8.78%
POLAND 291 5.26%
TRINIDAD & TOBAGO 289 7.06%
VENEZUELA 280 3.79%
FINLAND 277 12.07%
CZECH REPUBLIC 263 2.73%
COLOMBIA 261 2.96%
LUXEMBOURG 215 9.56%
HONG KONG 214 3.89%
SRI LANKA 204 15.54%
CHILE 202 4.26%
PORTUGAL 193 2.93%
SINGAPORE 193 7.46%
UNITED ARAB EMIRAT 189 3.81%
CYPRUS 185 4.63%
GUATEMALA 160 2.78%
ECUADOR 156 4.47%
COSTA RICA 148 5.75%
HUNGARY 139 7.86%
NIGERIA 137 4.32%
PANAMA 131 7.07%
JAMAICA 127 8.86%
DOMINICAN REPUBLIC 120 2.04%
SAUDI ARABIA 120 3.19%
EL SALVADOR 105 3.02%
BAHAMAS 102 5.62%
PHILIPPINES 99 7.19%
NETHERLANDS ANTILL 86 5.48%
THAILAND 80 12.06%
WEST INDIES 78 5.51%
PERU 76 2.95%
SLOVENIA 74 14.21%
RUSSIA 69 9.76%
FRENCH WEST INDIES 69 2.74%
EGYPT 64 1.83%

 

Student pilots gender gap, by country:

Country Estimate %Female Students
USA 12.0%           93,395
GERMANY 9.0%             1,753
UNITED KINGDOM 8.5%             1,224
NETHERLANDS 10.8%                 746
SAUDI ARABIA 3.8%                 407
JAPAN 12.9%                 394
BELGIUM 11.3%                 306
SWITZERLAND 10.1%                 269
INDIA 12.5%                 262
EGYPT 0.9%                 233
NIGERIA 8.8%                 213
CANADA 11.9%                 210
MEXICO 2.9%                 177
ITALY 8.1%                 169
COLOMBIA 8.5%                 152
FRANCE 7.1%                 138
NORWAY 12.3%                 136
IRELAND 9.1%                 108
TURKEY 2.6%                 107
BRAZIL 6.2%                 101
BAHAMAS 8.9%                   89
AUSTRIA 4.7%                   88
ISRAEL 4.5%                   79
BAHRAIN 2.8%                   79
HONG KONG 9.5%                   71
RUSSIA 9.9%                   71
UNITED ARAB EMIRAT 17.1%                   66
SINGAPORE 23.3%                   65
ECUADOR 3.6%                   60
SPAIN 14.4%                   53
DENMARK 7.6%                   49
PANAMA 14.5%                   45
INDONESIA 10.2%                   43
SWEDEN 10.6%                   43
AUSTRALIA 4.1%                   39

 
