Tag Archives: RapidMiner

NamSor at RapidMiner Wisdom NYC 2016

In January this year, NamSor founder Elian CARSENAT was at the RapidMiner Wisdom conference in New York City. Watch theCUBE video.

The Big Apple: the World owns a piece.

We’ve analyzed NYC Open Data on Real Property (ACRIS) using RapidMiner with the NamSor customer segmentation tool. Based on the socio-linguistics of personal names, we inferred gender, cultural origin and ethnicity to produce various maps and data visualizations.

It is fascinating how diverse New York is. Who lives in NYC? Who owns property? How do people vote?

Check out our presentation to discover some of the findings:

Try NamSor Extension for client segmentation

You can try the NamSor API for free. The NamSor names processing extension is an open source RapidMiner add-on available for download in the RapidMiner MarketPlace.

About NamSor

NamSor™ Applied Onomastics is a European vendor of Name Recognition Software (NamSor sorts names). NamSor’s mission is to help understand international flows of money, ideas and people.

Reach us at: contact@namsor.com

 


Filed under EthnoViz, General

NamSor Extension for RapidMiner 6.5

RapidMiner empowers enterprises to easily mash up data, create predictive models and operationalize predictive analytics within any business process. It is a leading data mining tool for ‘big data’.

At RapidMiner Wisdom 2015, the user conference that took place in Ljubljana, Slovenia, a new release was launched with two forever-free editions and one commercial edition of RapidMiner™ Studio 6.5.

We’ve also updated our Names Processing Extension for RapidMiner, and it offers all the functions found in the NamSor API:

  • Parse a Personal Name
  • Infer Gender from a Personal Name
  • Infer Origin from a Personal Name

It is found in the MarketPlace, as well as on GitHub:

2015_NamSor_Ext_For_RapidMiner_65

Combined with other RapidMiner extensions, it can be used for many different use cases in the academic, public and private sectors, for example:

  • Gender Studies,
  • Migration Research,
  • Travel Industry,
  • ‘Big Data’ and predictive analytics,
  • Segmentation for Sentiment Analysis

You’ll find below our presentation at RapidMiner Wisdom


Filed under General

Video Tutorial: NamSor extension for RapidMiner to measure a Gender Pay Gap and other Diversity Analytics

[NEW! Get your API Key directly from NamSor and use the online tool as a complement to RapidMiner]

This is a hands-on video tutorial on measuring the gender pay gap and other diversity analytics, with the example of open data published by the city of Palo Alto, California.

Palo Alto public data on employee salaries in 2013.

The original file has a lot of data but no diversity information, so we use the NamSor extension for RapidMiner to infer gender and likely origin from names.
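Once each row carries an inferred gender, the pay gap itself is simple arithmetic. The following is a minimal sketch, not the method used in the video: it assumes the gap is measured on median salaries and expressed as a fraction of the male median. The function name `pay_gap` and the sample figures are hypothetical and are not taken from the Palo Alto file.

```python
from statistics import median


def pay_gap(rows):
    """Return the median pay gap as a fraction of the male median salary.

    rows: iterable of (gender, salary) pairs, where gender is "male" or "female".
    Assumption: the median is used; a real study may prefer another statistic.
    """
    rows = list(rows)
    male = [salary for gender, salary in rows if gender == "male"]
    female = [salary for gender, salary in rows if gender == "female"]
    if not male or not female:
        raise ValueError("need at least one salary for each gender")
    return (median(male) - median(female)) / median(male)


# Made-up figures for illustration only, not Palo Alto data.
sample = [
    ("male", 100000.0),
    ("male", 80000.0),
    ("female", 81000.0),
    ("female", 72000.0),
]
gap = pay_gap(sample)
print(f"median pay gap: {gap:.0%}")
```

A positive value means the male median is higher. Rows whose gender could not be inferred are simply ignored by this sketch.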

The NamSor API can infer gender with high precision, recognizing for example that Andrea Rossini is likely an Italian male name, whereas Andrea Parker is more likely a female name. Onomastics, or onomatology, is the study of the origin, history, and use of proper names.

Using NamSor to determine gender and likely country of origin

For this tutorial, you will need,

Data mining can be fun, and open data and better corporate transparency can make a difference. Please RT and join us in thanking the City of Palo Alto for their transparency:

About NamSor

NamSor™ Applied Onomastics is a European designer of name recognition software. NamSor is committed to promoting diversity and equal opportunity. NamSor launched GendRE API, a free API to extract gender from personal names. http://namsor.com

About GenderGapGrader

GenderGapGrader’s mission is to publish gender gap estimates at the finest grain level, using whatever reference database we can identify for a particular industry: The Internet Movie Database (IMDB) for the film industry, “The Airman Database” for pilots… and more to come. http://gendergapgrader.com


Filed under General

An API to measure the Gender Gap across all professional fields

NamSor API was presented yesterday at the amazing APIDays.io Paris conference. The Gender Gap Grader project will be featured as a Keynote at the next APIDays conference in May.

Download presentation : APIDays-slides.pdf

Further reading:

About NamSor

NamSor™ Applied Onomastics is a European designer of name recognition software. NamSor is committed to promoting diversity and equal opportunity. NamSor launched GendRE API, a free API to extract gender from personal names. We support the @GenderGapGrader initiative. http://namsor.com


Filed under General

Video tutorial – How to Extract the Gender of Personal Names, using RapidMiner

RapidMiner is a leading software platform for advanced analytics, including predictive analytics, data mining, and text mining. We’ve built an onomastics extension for RapidMiner to enrich any database and infer the gender of personal names across languages, cultures, alphabets and countries. The GendRE API offers unmatched accuracy, recognizing that “Andrea Rossini” is most likely an Italian name and so male, whereas “Andrea Parker” is most likely an Anglo-Saxon name and so female; 声涛周 is most likely male; “O. Sokolova” is most likely female.

We’ve used RapidMiner and GendRE API to measure the gender gap among EU Officials, mining the 2014 European Union Directory. This video tutorial will show you step-by-step how it was done:

To redo this study or make your own, download RapidMiner with the Onomastics extension and documentation.


Filed under General

What’s the Gender Gap in the European Union Whoiswho?

This summer, the European Union published an updated version of The Official Directory of the European Union [mirror: FYWW13001ENC.pdf, in ZIP/Excel format TheEUDirectory_xls.zip].

The EU Directory is not a good candidate for the GenderGapGrader initiative – because it’s not a truly global database like the FAA Airman Directory or the Internet Movie Database (IMDB), which we’ve used to disclose the gender gap among airline pilots or in the film industry.

But it’s a useful document, which lists people from all countries of Europe with their title (Mr, Ms). We can use it to validate our methodology, specifically to compute the error rate of the software we use to predict the gender of names. The document lists about fifteen thousand EU executives, with different roles: Member, Head of Unit, Substitute, Director, Counsellor, Head of Division, Legal secretary, Vice-Chair, Attaché, First Secretary, Member of the Bureau, Administrator, Adviser, Head of Delegation, Second Secretary, Acting Head of Unit, Vice-President, Head of Service, Chair, Head of Office, Director-General, Delegate, Head of Sector, Assistant to the Director-General, Bureau member, Third Secretary, Judge, Head of private office, Acting Director, Head of Cabinet …

We’ve used RapidMiner with the Onomastics extension (available free in RapidMiner MarketPlace) to calculate the Gender Gap, then compared the results with the actual counts based on titles (Mr, Ms). The software recognizes the origin of personal names to solve cases such as Andrea Rossini (likely Male) or Andrea Parker (likely Female), Jean, Kim etc. Using the ‘gender scale’ metric (a numeric value between -1 and +1) instead of the predicted gender (Male, Female or Unknown) helps even out cases of truly genderless names such as Kerry.
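The validation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the RapidMiner process itself: the function names, the 0.1 cut-off for ‘Unknown’, and the idea of turning an average scale into a share of women via (1 + mean) / 2 are all hypothetical choices made for this example. The only facts taken from the text are that the scale runs from -1 (male) to +1 (female) and that titles (Mr, Ms) give the actual gender.

```python
TITLE_TO_GENDER = {"Mr": "Male", "Ms": "Female"}


def predicted_gender(scale, threshold=0.1):
    """Map a gender scale in [-1, +1] to Male, Female or Unknown.

    The threshold is an assumed cut-off, not a documented API value.
    """
    if scale <= -threshold:
        return "Male"
    if scale >= threshold:
        return "Female"
    return "Unknown"


def error_rates(records, threshold=0.1):
    """records: iterable of (title, scale). Returns (error_rate, unknown_rate)."""
    total = errors = unknown = 0
    for title, scale in records:
        actual = TITLE_TO_GENDER[title]
        guess = predicted_gender(scale, threshold)
        total += 1
        if guess == "Unknown":
            unknown += 1
        elif guess != actual:
            errors += 1
    if total == 0:
        raise ValueError("no records to validate")
    return errors / total, unknown / total


def female_share_from_scale(scales):
    """Estimate the share of women from the average scale (assumed mapping)."""
    scales = list(scales)
    if not scales:
        raise ValueError("no scales given")
    mean = sum(scales) / len(scales)
    return (1.0 + mean) / 2.0
```

With the averaging approach, a genderless name scored near 0 contributes half a woman and half a man instead of being forced into one class, which is one way to read the remark about names such as Kerry.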


20140905_TheEUDirectory_GenderGap_scale_cc_vF

In the chart above, the difference cannot be seen with the naked eye: onomastics is a reliable tool to measure the gender gap. In most cultures, names can be used to accurately predict gender (with the notable exceptions of Chinese and Korean names once transliterated).

The European Institute for Gender Equality is an EU agency based in Vilnius, Lithuania, which promotes gender equality, fights discrimination based on sex and raises gender awareness. What about promoting gender equality among EU political institutions? ;-) We believe such new data mining capabilities and growing open data initiatives can help public and private organisations find innovative solutions to close the gender gap.

20140905_Genderizing_TheEUDirectory_using_RapidMiner

Using RapidMiner Onomastics extension to measure the Gender Gap

Additional information in the EU Directory makes it possible to visualize other factors affecting the gender gap: EU country of affiliation, role, …

20140905_TheEUDirectory_GenderGap_a_vF

20140905_TheEUDirectory_GenderGap_b_vF

We are continuously improving the software to predict the origin and gender of names. The next version, GendRE API v0.0.16, will further improve accuracy. Also, please stay tuned for the upcoming GenderGapGrader study on women’s entrepreneurship, start-ups and access to financing (VCs, business angels…).

To redo this study or make your own

– Get the source data files used in this article [mirror: FYWW13001ENC.pdf, in ZIP/Excel format TheEUDirectory_xls.zip]

– Get RapidMiner with Onomastics extension and Documentation, watch the tutorial video

– Or download the results directly: TheEuropeanUnionDirectory_genderized.zip


Creative Commons License
‘What’s the Gender Gap in the European Union Whoiswho?’ by gendergapgrader is licensed under a Creative Commons Attribution 4.0 International License. Based on a work at http://namesorts.com/2014/09/09/whats-the-gender-gap-in-the-european-union-whoiswho/.


Filed under General

RapidMiner to enrich Gender data

[UPDATE September 2014: watch the 3-minute tutorial video]

[UPDATE July 2014: NamSor Onomastics Extension is now available in RapidMiner MarketPlace]

[UPDATE June 2014: we have built an open source (AGPL) extension for RapidMiner, get it on GitHub]

With Open Data from the Internet Movie Database (IMDb) and a gender prediction API, it was possible to assess the gender gap in the global film industry in minutes. We found that only ~22% of three hundred thousand movie directors worldwide are women.

We used technical skills and a small program to do this first analysis. Could it be done using a friendlier data mining tool? This article shows how a similar gender study can be conducted with RapidMiner.

Get RapidMiner

Install RapidMiner from SourceForge with additional extensions (Help->Updates and Extensions) : Text Mining and Web Mining.

In this example, we will read an Excel file with two columns (firstName, lastName) and enrich it with a new column containing the gender (on a -1..+1 scale). Our test file is a list of members of the exclusive club ‘Le Siècle’ (2010), which periodically gathers the French élite: Club_LeSiecle.xlsx (Source: La Marseillaise/cryptome).

Import Excel Data

Drag and drop the Read Excel operator (Import->Data->Read Excel) and launch the Import Configuration Wizard.

2014_RapidMiner_1_ReadExcel

The default values should be fine throughout the wizard, except that Encoding should be set to UTF-8 (Unicode), which is required in particular if you would like to genderize Chinese, Russian or Arabic names.

Enrich Data by Webservice

Next, you will call the Gender prediction API to infer the likely sex/gender for each row in your Excel file.

Drag and drop the Enrich Data by Webservice operator (Web Mining->Services->Enrich Data by Webservice) and connect it to the Read Excel operator.

2014_RapidMiner_2_Enrich_by_WebService

You can use our free Gender API or the Freemium on Mashape. For this example, we shall use the free plain text API, entering this kind of URL:

NB: we also provide a REST JSON format, not used in this example

We need to configure the Enrich Data by Webservice operator to pass the parameters and assign the result to a new variable GenderScale (-1 is Male ..+1 is Female):

– query type: ‘Regular Region’

– attribute type: ‘Numerical’

– regular region queries: add a single attribute ‘GenderScale’ containing the entire result from calling the API (i.e. anything between the beginning of the line ^ and the end of the line $)

– request method: ‘GET’

– url: http://api.namsor.com/onomastics/api/gendre/<%FN%>/<%LN%>/fr (FN and LN will be replaced by the firstName and lastName at runtime)

– encoding: UTF-8

2014_RapidMiner_3_Enrich_by_WebService_Parameters
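For readers who want to see what the operator does for each row, here is a rough Python equivalent. It is a sketch under assumptions: the URL pattern is the one given in the parameters above, but this endpoint dates from 2014 and may no longer respond; the function names are hypothetical; and treating the trailing `fr` as a country hint is only a reading of the URL, so it is exposed here as the `country_hint` parameter.

```python
from urllib.parse import quote
from urllib.request import urlopen

# URL prefix taken from the operator parameters above (2014-era endpoint).
BASE_URL = "http://api.namsor.com/onomastics/api/gendre"


def gendre_url(first_name, last_name, country_hint="fr"):
    """Build the request URL, percent-encoding each name as UTF-8."""
    parts = [
        BASE_URL,
        quote(first_name, safe=""),
        quote(last_name, safe=""),
        country_hint,
    ]
    return "/".join(parts)


def parse_gender_scale(body):
    """Parse the plain-text response: one number, -1 (male) to +1 (female)."""
    value = float(body.strip())
    if not -1.0 <= value <= 1.0:
        raise ValueError(f"gender scale out of range: {value}")
    return value


def fetch_gender_scale(first_name, last_name, country_hint="fr", timeout=10):
    """Call the API for one name. Requires network access; not run here."""
    with urlopen(gendre_url(first_name, last_name, country_hint), timeout=timeout) as resp:
        return parse_gender_scale(resp.read().decode("utf-8"))


print(gendre_url("Philippe", "Jaffré"))
```

Percent-encoding the names plays the same role as the UTF-8 encoding setting in the operator: accented and non-Latin names reach the service intact.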

Write CSV

Next, you will write the output to a CSV file (Export->Data->Write CSV), setting an output file name and selecting UTF-8 encoding again.

Run the Process

Last, set the process encoding to UTF-8 and run it.

2014_RapidMiner_7_Run

The output should look like:

"FN";"LN";"GenderScale"
"Philippe";"Jaffré";-1.0
"Bertrand";"Collomb";-1.0
"André";"Lévy-Lang";-0.96

What’s the verdict? Women account for ~17% of the French élite club ‘Le Siècle’ (2010).
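The percentage can be recomputed from the CSV output with a short script. This is a minimal sketch that assumes the file uses semicolons and straight double quotes, as in the sample rows, and that any positive GenderScale counts as a woman; the function name `share_of_women` is hypothetical. The three rows embedded below are the sample rows shown above, so on their own they yield 0; the ~17% figure comes from the full file.

```python
import csv
import io

# The three sample rows from the output above, with straight quotes.
SAMPLE = '''"FN";"LN";"GenderScale"
"Philippe";"Jaffré";-1.0
"Bertrand";"Collomb";-1.0
"André";"Lévy-Lang";-0.96
'''


def share_of_women(csv_text):
    """Return the fraction of rows whose GenderScale is above zero."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    scales = [float(row["GenderScale"]) for row in reader]
    if not scales:
        raise ValueError("no data rows found")
    women = sum(1 for scale in scales if scale > 0)
    return women / len(scales)


print(f"share of women in sample: {share_of_women(SAMPLE):.0%}")
```

To run it on the real output, read the file with `open(path, encoding="utf-8")` and pass its contents to `share_of_women`.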

2014_RapidMiner_4_LeSiece_GenderGap

Further reading:

Meet us on 29 April 2014 at DataTuesday Paris with Girls in Tech Paris, on the topic ‘Women & Data’.


Filed under General