Tag Archives: United States

NamSor at RapidMiner Wisdom NYC 2016

In January this year, NamSor founder Elian CARSENAT was at the RapidMiner Wisdom conference in New York City. Watch theCUBE video.

The Big Apple: the World owns a piece.

We’ve analyzed NYC Open Data on Real Property (ACRIS) using RapidMiner with the NamSor customer segmentation tool. Based on the socio-linguistics of personal names, we inferred gender, cultural origin and ethnicity to produce various maps and data visualizations.

It is fascinating how diverse New York is. Who lives in NYC? Who owns property? How do people vote?

Check out our presentation to discover some of the findings:

Try NamSor Extension for client segmentation

You can try the NamSor API for free. The NamSor names processing extension is an open source RapidMiner add-on, available for download in the RapidMiner MarketPlace.

About NamSor

NamSor™ Applied Onomastics is a European vendor of Name Recognition Software (NamSor sorts names). NamSor's mission is to help understand international flows of money, ideas and people.

Reach us at: contact@namsor.com



Filed under EthnoViz, General

Video Tutorial: NamSor extension for RapidMiner to measure a Gender Pay Gap and other Diversity Analytics

[NEW! Get your API Key directly from NamSor and use the online tool as a complement to RapidMiner]

This is a hands-on video tutorial on measuring the gender pay gap and other diversity analytics, using open data published by the city of Palo Alto, California, as an example.

Palo Alto public data on employee salaries in 2013.

The original file has a lot of data but no diversity information, so we use the NamSor extension for RapidMiner to infer gender and likely origin from names.

The NamSor API can infer gender with high precision, recognizing for example that Andrea Rossini is likely an Italian male, whereas Andrea Parker is more likely a female name. Onomastics, or onomatology, is the study of the origin, history, and use of proper names.

Using NamSor to determine gender and likely country of origin
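Once a gender column has been inferred, the gap itself is simple arithmetic. The following is a minimal sketch in Python rather than the RapidMiner process shown in the video. The rows, the `median_pay_gap` helper and the choice of the median as the summary statistic are all assumptions made for illustration, and the figures are invented, not Palo Alto data.

```python
from statistics import median

# Hypothetical rows: (name, inferred_gender, annual_salary_usd).
# In the tutorial the gender column would come from the name-processing step;
# here it is hard-coded sample data, not real Palo Alto figures.
rows = [
    ("A. Example", "female", 82000),
    ("B. Example", "male", 90000),
    ("C. Example", "female", 98000),
    ("D. Example", "male", 110000),
    ("E. Example", "unknown", 75000),  # names with no confident gender are left out
]


def median_pay_gap(rows):
    """Return (median male pay - median female pay) / median male pay."""
    male = [salary for _, gender, salary in rows if gender == "male"]
    female = [salary for _, gender, salary in rows if gender == "female"]
    if not male or not female:
        raise ValueError("need at least one salary for each gender")
    m, f = median(male), median(female)
    return (m - f) / m


gap = median_pay_gap(rows)
print(f"Median gender pay gap: {gap:.1%}")
```

A positive value means the median male salary is higher. Rows whose gender could not be inferred are excluded rather than guessed, which keeps the estimate conservative but shrinks the sample.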

For this tutorial, you will need,

Data mining can be fun, and open data and better corporate transparency can make a difference. Please RT and join us in thanking the City of Palo Alto for its transparency:

About NamSor

NamSor™ Applied Onomastics is a European designer of name recognition software. NamSor is committed to promoting diversity and equal opportunity. NamSor launched GendRE API, a free API to extract gender from personal names. http://namsor.com

About GenderGapGrader

GenderGapGrader’s mission is to publish gender gap estimates at the finest grain level, using whatever reference database we can identify for a particular industry: The Internet Movie Database (IMDB) for the film industry, “The Airman Database” for pilots… and more to come. http://gendergapgrader.com


Filed under General

Making sense of Big Data : mining Twitter names

Millions of geo tweets in various languages, discussing anything from ‘hey, I’m here‘ to finance, geopolitics or marketing. How do you make sense of them?

We’ve used name recognition (applied onomastics) to filter information and produce unique maps of the e-Diasporas. Where are the digitally connected Italians, Turks and Russians today? They may be migrants, tourists, business travellers, students, visiting scientists…

To jump directly to the interactive map, click here: http://cdb.io/1iSeWw2 or read more about our methodology.

Italian, Russian, Turkish Twitter

TIP: Filter out layers and zoom in/out.
Below, we filtered out the Turkish Twitter layer to visualize where Russian and Italian tourists go on holiday in Turkey.

Russians and Italians in Turkey

The Italian America :

Italian America

Further reading :


Filed under EthnoViz

Tsarnaev Brothers: The right kind of Caucasian

There was a lot of confusion last week, in the aftermath of the identification of two suspected terrorists in Boston: Dzhokhar and Tamerlan Tsarnaev. Twitter users mistook the Czech Republic for Chechnya, and Czech Ambassador Petr Gandalovic had to issue an official statement. A Twitter user from Bulgaria complained that in the US suddenly “anyone with a last name ending on -EV or OV is a supposed Chechen“. Sarah Kendzior’s article The wrong kind of Caucasian drew a parallel with the case of Leon Czolgosz, a 28-year-old American of Polish descent who assassinated US President William McKinley in 1901.

Recognizing language and ethnicity is a critical function of the kind of name recognition software used by governmental security forces to match misspelled foreign names against existing watchlists. That the FBI didn’t know the Boston bomber had traveled to the volatile Dagestan region in Russia in 2012 because ‘his name was misspelled on travel documents‘ is rather surprising.
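To make the idea of matching a misspelled name against a list concrete, here is a minimal Python sketch using the standard library's `difflib`. It relies on generic string similarity only, which is an assumption made for illustration: it is not how NamSor or any security agency works, and it ignores transliteration, language and ethnicity, which are precisely what dedicated name recognition software adds. The `find_watchlist_match` helper, the 0.8 cutoff and the names are all hypothetical.

```python
import difflib


def find_watchlist_match(name, watchlist, cutoff=0.8):
    """Return the watchlist entry closest to `name`, or None if none is similar enough."""
    # Compare case-insensitively, but return the entry in its original spelling.
    normalized = {entry.lower(): entry for entry in watchlist}
    matches = difflib.get_close_matches(
        name.lower(), list(normalized), n=1, cutoff=cutoff
    )
    return normalized[matches[0]] if matches else None


# Hypothetical list and queries, invented for illustration only.
watchlist = ["Ivanov", "Kowalski", "Petrossian"]

print(find_watchlist_match("Ivanof", watchlist))    # misspelled variant of a listed name
print(find_watchlist_match("Kowalsky", watchlist))  # alternative transliteration
print(find_watchlist_match("Smith", watchlist))     # unrelated name, no match expected
```

The cutoff trades missed matches against false alarms: lowering it catches more spelling variants but also flags more unrelated names.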

Chechnya is a subject of the Russian Federation, but Chechen names are distinguishable from ethnic Russian names. The onomastics of the Russian Federation are complex due to the large number of ethnic groups, the population transfers during Soviet times, the rural exodus and various other factors. The following article shows our analysis of a few name classes, to illustrate the underlying complexity.

Russian Federation - a complex identity and ethnicity

About the author

Atlasys is a cartography workshop specializing in geopolitical analysis.

About the contributors

RUSSOSCOPIE™ is the distributor of INTEGRUM, a leading professional database covering Russian media, companies, and decision makers.

NAMSOR™ is a provider of applied onomastics, sociolinguistics and name recognition software.


Filed under EthnoViz