Tag Archives: bigdata

NamSor at RapidMiner Wisdom NYC 2016

In January this year, NamSor founder Elian CARSENAT was at the RapidMiner Wisdom conference in New York City. Discover theCUBE video.

The Big Apple: the World owns a piece.

We’ve analyzed NYC Open Data on Real Property (ACRIS) using RapidMiner with the NamSor customer segmentation tool. Based on the socio-linguistics of personal names, we inferred gender, cultural origin and ethnicity to produce various maps and data visualizations.

It is fascinating how diverse New York is. Who lives in NYC? Who owns a property? How do people vote?

Check out our presentation to discover some of the findings:

Try NamSor Extension for client segmentation

You can try the NamSor API for free. The NamSor names processing extension is an open source RapidMiner add-on available for download in the RapidMiner MarketPlace.

About NamSor

NamSor™ Applied Onomastics is a European vendor of Name Recognition Software (NamSor sorts names). NamSor’s mission is to help understand international flows of money, ideas and people.

Reach us at: contact@namsor.com

 


NamSor, onomastics applied to GEOINT

Two weeks ago, we participated in the GEOINT conference organized at the French Geographical Society by ENS/Sorbonne. You will find our presentation below (all slides from slide 3 onward are in English):

We will pursue our developments further in this field and participate in the Esri Startup Program. Esri is an international supplier of Geographic Information System software, web GIS and geodatabase management applications.

[read in French Onomastique et GEOINT à la Société de Géographie]


NamSor Extension for RapidMiner 6.5

RapidMiner empowers enterprises to easily mash up data, create predictive models and operationalize predictive analytics within any business process. It is a leading data mining tool for ‘big data’.

At RapidMiner Wisdom 2015, the user conference that took place in Ljubljana, Slovenia, a new release was launched with two forever-free editions and one commercial edition of RapidMiner™ Studio 6.5.

We’ve also updated our Names Processing Extension for RapidMiner, and it offers all the functions found in the NamSor API:

  • Parse a Personal Name
  • Infer Gender from a Personal Name
  • Infer Origin from a Personal Name

It is found in the MarketPlace as well as on GitHub.

[Image: 2015_NamSor_Ext_For_RapidMiner_65]

Combined with other RapidMiner extensions, it can be used for many different use cases in the academic, public and private sectors, for example:

  • Gender Studies,
  • Migration Research,
  • Travel Industry,
  • ‘Big Data’ and predictive analytics,
  • Segmentation for Sentiment Analysis

You’ll find below our presentation at RapidMiner Wisdom


[AGENDA] Paris GEOINT Conference

[UPDATE] read NamSor #GEOINT presentation at the French Geographical Society.

On 11/12 September in Paris, an international conference will gather geospatial intelligence professionals at the French Geographical Society (founded in 1827).

The scientific committee includes eminent researchers from Panthéon-Sorbonne University, IFG/Paris8, ENS, Bordeaux University as well as GEOINT professionals from the public and private sector (Ministry of Defense, Airbus Defense and Space).

Elian Carsenat, founder of NamSor, will present several applications of onomastics to mine ‘big data’ and infer valuable intelligence about identity and territories.

Other contributors include speakers from: DRM (French Military Intelligence), MoD, CNES, Magellium, IGN, GEO4I, GEO2012, THALES, Geodec Consult, DGGN (Ministry of Interior Affairs), ONERA, Carmenta, Spallian, ESRI, Luciad.

Please find below the detailed agenda and location of the event:

International conference

Directed by Philippe Boulanger

(Professor at the Institut français de géopolitique, Université Paris VIII)

Geospatial Intelligence

Technological revolution, spatial representation and geopolitical analysis

In partnership with the geography department of the École normale supérieure (Ulm) and Airbus Defense and Space

Programme

Société de géographie

Main amphitheatre

184 bd Saint-Germain (6th arrondissement), Paris

Friday 11 and Saturday 12 September 2015

Free admission, no reservation required

SCIENTIFIC COMMITTEE

Colonel Philippe Arnaud (Director, BGHOM, Ministry of Defense), Prof. Pierre Beckouche (Panthéon-Sorbonne University), Renaud Bellais (Public Affairs Department, Airbus Group), Prof. Philippe Boulanger (President, Institut français de géopolitique, Université Paris VIII), Prof. Emmanuèle Cunningham-Sabot (Head of the geography department, École normale supérieure de Paris), Col. Gilles Darricau (EMA, Ministry of Defense), Prof. Sébastien Laurent (Bordeaux University), Prof. Barbara Loyer (Director of the Institut français de géopolitique, Université Paris VIII), Myriam Fargues (Senior Lecturer, Panthéon-Sorbonne University), Alexandre Papaemmanuel (Key Account, Intelligence, Airbus Defense and Space), Prof. Yann Richard (Director of the geography faculty, Panthéon-Sorbonne University)

Friday 11 September

9:00: Opening

Session 1: Geoint in support of operations

Chair: Prof. Philippe Boulanger (Institut français de géopolitique, Université Paris VIII)

9:15-9:45: “The French armed forces’ Geoint centre: a centre of excellence at the Directorate of Military Intelligence, in support of operations” by General Christophe Gomart (Director of Military Intelligence)

9:45-10:15: “Military space observation: challenges and prospects” by General Jean-Daniel Testé (Director of the Joint Space Command)

10:15-10:30: Discussion

10:30-11:00: Break

Session 2: Geoint: new challenges, new balances

Chair: Alexandre Papaemmanuel (Key Account, Intelligence, Airbus Defense and Space) or another Airbus representative

11:00-11:20: “Development of space applications: when foresight and monitoring activities in other countries feed a new line of Geoint” by Murielle Lafaye (CNES) and Thierry Rousselin (Magellium).

11:20-11:40: “Geovisualisation, a tool for Geoint analysis” by Vincent Caillard (IGN).

11:40-12:00: “The shift from descriptive and geometric geography towards integrated geospatial reasoning” by Alain Zumsteeg (Délégation générale de l’armement, physical geography department).

12:00-12:15: Discussion

Session 3: Geoint: new challenges, new thinking

Chair: Prof. Yann Richard (Panthéon-Sorbonne University)

14:00-14:20: “The societal stakes of geolocation: Geoint and geopolitical analysis” by Eric Morel (Director, Defence and Security, Geospatial, Airbus Defense and Space) and Philippe Boulanger (IFG-Université Paris VIII)

14:20-14:40: “Geoint, a tool for action in the service of security” by Jean-Philippe Morisseau and Lionel Kerrello (GEO4I)

14:40-15:00: “Reflections and proposals for minimising them in the world of Geoint” by Nicolas Saporiti (Geo2012)

15:00-15:20: “Today’s tools in the service of tomorrow’s Geoint: predictive analytics and a foray into big data” by Philippe Larde (Thalès Communications-Security)

15:20-15:40: “From ‘Geoint’, the punchy but reductive slogan of an American agency… to mastery of decision support, a true global issue of influence and governance!” by Jean-Armel Hubault (Geodec Consult)

15:40-16:00: “Towards a legal framework for the development of Geoint: meeting point or synthesis of legal regimes?” by Numa Isnard (IDEST, Université Paris Sud)

16:00-16:15: Discussion

16:15-16:45: Break

Session 4: Round table on Geoint training and prospects (16:45-18:30)

Chair: Prof. Sébastien Laurent (Bordeaux University)

“Geospatial information produced for civil security purposes and its value for geopolitical analysis (reflections on a teaching experience)” by Myriam Fargues (Panthéon-Sorbonne University)

“Reflections on a Geoint curriculum for future managers, based on the 1999-2015 experience of Mines ParisTech and NTNU Trondheim” by Thierry Rousselin (head of the MP18 course “Geointelligence for Natural Resource Evaluation and Sustainable Management”), Honorary Professor Monget Jean Marie (creator and leader of the course from 1999 to 2006), Prof. Emeritus Sinding Larsen Richard (creator and leader of the NTNU course “Geointelligence for Natural Resource Evaluation and Sustainable Management” from 1999 to 2010), Karine Guérin (lecturer for the MP18 course at Mines ParisTech)

“Geopolitical training and Geoint” by Barbara Loyer (Professor and Director of the Institut français de géopolitique, Université Paris VIII)

“What training for geospatial intelligence?” by Bernard Kientz (Airbus Defence and Space)

Saturday 12 September

Session 5: Geoint tools, predictive analytics and security

Chair: Renaud Bellais (Public Affairs Department, Airbus Group)

9:00-9:20: “A prototype Geoint tool in the gendarmerie: the SC2, or mapping the ephemeral for crisis management” by Lt Col. Thibault Lucazeau and Cen (TA) Christophe Blanc (Operations Directorate, DGGN, Ministry of the Interior).

9:20-9:40: “How can imagery intelligence (ROIM) contribute to Geoint? Some answers” by Alain Michel (ONERA)

9:40-10:00: “The realities of Geoint in multinational overseas operations (OPEX)” by Marie Laboureix and Mans Beckman (Carmenta)

10:00-10:15: Discussion

10:15-10:45: Break

Session 6: Geoint tools and data management

Chair: Jean Gusinel (journalist specialising in defence and security issues, Le Point)

10:45-11:05: “Mapping Big Data, or how to anchor economic intelligence in the 21st century” by Guillaume Farde (Spallian)

11:05-11:25: “GIS, a central platform for Geoint” by Jérémie Majerowicz (ESRI)

11:25-11:45: “Onomastics applied to deciphering identity and territorial issues” by Elian Carsenat (NamSor.com)

11:45-12:05: “Luciad and Geoint” by Jérôme Lutz (Luciad)

12:15: Conclusion of the conference

PDF Version http://www.geographie.ens.fr/IMG/file/Programmation%20colloque%20Geoint%20version%20juin%202015.pdf


NamSor presented during Symposium on Academic Excellence

Our friend Tania Vichnevskaia of the French National Institute for Health (INSERM) presented the paper ‘Applying onomastics to scientometrics’ yesterday at the IREG international symposium organised by the University of Maribor and Shanghai Jiao Tong University.

In 2014, NamSor, a private start-up company, was asked by a European country to help measure the ‘brain drain’ affecting its competitiveness in the BioTech sector and to produce a global map of its scientific Diaspora (who they are, where they are and what they are doing). The objective was to build up the country’s international scientific cooperation and to engage its Diaspora.

Serendipity led analysts to discover interesting patterns in the way scientists’ names affect co-authorship and citation, not just for this particular country, but globally.

Last year, during the ICOS 2014 conference at Glasgow University, we presented how data mining millions of scientific articles in the PubMed/PMC life sciences database uncovered amazing patterns in the way scientists’ names correlate with whom they publish and whom they cite in their papers.

We were interested in mining the large commercial bibliographic databases (Thomson WoS, Scopus) because they offer better data quality on citations and useful additional information, compared to PubMed:

– firstly, they have the full name in addition to the short name cited with just initials; this significantly reduces the error rate of onomastic classification

– secondly, they link scientists to research institutions (affiliations) and geographies (country of affiliation); this allows additional analysis on the topic of Diasporas and brain drain, comparing, for example, the research output of Chinese and Chinese American scientists in the US with that of scientists in Mainland China;

– thirdly, those databases have a larger coverage in terms of scientific disciplines, allowing comparison between different fields of research.

So a collaboration started between NamSor and bibliometric experts at INSERM, the French National Institute for Health, to evaluate and visualize the effects of migration, Diaspora engagement and possibly cultural biases in Science.

This is Tania’s presentation at the conference:

What does the ‘onomastic millefeuille‘ of the global Cancer Research community look like?

[Image: 201501_ThomsonWoS_CancerResearch]

On this same topic:

The agenda of the Symposium is presented below:

2nd Maribor Academicus Event

Academic Excellence: BETWEEN HOLY GRAIL AND MEASURABLE OBJECTIVES

International symposium  organised by University of Maribor and Shanghai Jiao Tong University

within the IREG Project on Academic Excellence

19-20 January 2015, Maribor, Slovenia

Higher education can importantly benefit from the rankings and league tables when used in a context with clear perspective of what ranking actually reflects (Prof. Jan Sadlak, President of IREG)

Active participants at the conference will be:

  • Prof. Jan Sadlak, President of IREG,
  • Prof. Gero Federkeil, CHE (Coordinator of Multi-Ranking),
  • Prof. Nian Cai Liu, Jiao Tong University in Shanghai (Author of the Shanghai ranking list),
  • Prof. Seeram Ramakrishna, National University of Singapore,
  • Prof. Santo Fortunato, Aalto University,
  • Prof. Karin Stana Kleinschek, University of Maribor,
  • Prof. Henryk Ratajczak, member of Czech Academy of Sciences,
  • Prof. Edvard Kobal, Slovenian Science Foundation,
  • Roberta Sinatra, PhD, Northeastern University,
  • Tania Vichnevskaia, French National Institute for Health (INSERM),
  • Prof. Andrée Sursock, Senior Adviser at EUA,
  • Prof. Øivind Andersen, University of Oslo.

About NamSor

NamSor launched FDIMagnet, a consulting offering to help Investment Promotion Agencies and High-Tech Clusters leverage a Diaspora to connect with business and scientific communities abroad.


Hispanic, French, German names in the United States

NamSor has mapped Hispanic Twitter accounts around the world. Not just Hispanic: French and German as well.

This interactive world map of the Hispanic, French and German e-Diasporas was produced using Twitter account data.

To access the interactive map, click here: http://cdb.io/1dqVd2n

20140503_US_Twitter_GEOnomastics_vF

Twitter is an interesting source because about 3 per cent of Twitter accounts opt in to show their Tweet location (using GPS from a smartphone), so their tweets can be visualised on a map.

Our method of anthroponymic classification can be summarized as follows: judging only from the Twitter name and the publicly available list of all ~150k Olympic athletes since 1896, for which team (France, Spain or Germany) would the person most likely compete?
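The question above can be sketched as a small character-trigram classifier. This is a minimal sketch, not NamSor's actual implementation: the trigram features, the add-one smoothing, the function names and the toy reference list (invented names standing in for the Olympic athlete list) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy stand-in for the reference list of (athlete name, team) pairs.
# These names are invented for the example, not taken from real data.
TOY_REFERENCE = [
    ("Jean Dupont", "France"),
    ("Pierre Lefebvre", "France"),
    ("Marie Rousseau", "France"),
    ("Juan Garcia", "Spain"),
    ("Carlos Fernandez", "Spain"),
    ("Maria Rodriguez", "Spain"),
    ("Hans Schmidt", "Germany"),
    ("Karl Schneider", "Germany"),
    ("Anna Hoffmann", "Germany"),
]


def char_ngrams(name, n=3):
    """Split a name into overlapping character n-grams, with boundary markers."""
    padded = f"^{name.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(reference):
    """Count the n-grams observed for each team in the reference list."""
    counts = defaultdict(Counter)
    for name, team in reference:
        counts[team].update(char_ngrams(name))
    return counts


def most_likely_team(name, counts):
    """Return the team whose reference names best explain this name."""
    vocabulary = {gram for team_counts in counts.values() for gram in team_counts}
    best_team, best_score = None, -math.inf
    for team, team_counts in counts.items():
        total = sum(team_counts.values())
        # Sum of log-probabilities with add-one smoothing for unseen n-grams.
        score = sum(
            math.log((team_counts[gram] + 1) / (total + len(vocabulary)))
            for gram in char_ngrams(name)
        )
        if score > best_score:
            best_team, best_score = team, score
    return best_team


model = train(TOY_REFERENCE)
print(most_likely_team("Luis Fernandez", model))
```

With a reference list this small the result only shows the mechanics; a realistic list would need many names per team, and a name that fits no team well would still be assigned to one of them.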

NamSor Applied Onomastics is a European vendor of name recognition software (NamSor sorts names), which aims to help understand international flows of money, ideas and people. namsor.com


What’s in a Twitter name? A glance at the Irish digital Diaspora

To jump directly to the interactive map, click here: http://cdb.io/1beWaVB

(onomastics.co.uk reblog)

It’s been a while since I published a first ‘Feature of the Month’ in onomastics.co.uk and I can measure the progress made. The article, published in March 2013, showed maps of French and English investments in Africa, established by recognizing the names of Company Directors, instead of the traditional measurement of capital flows (FDI).

At the time, NamSor Applied Onomastics software was new and I was still exploring how such a data mining tool, which recognizes personal names, could be useful. I was uncertain whether the social benefits would exceed the risks inherent in such a powerful technology.

Names are a Code and contain a lot of information about an individual, but there is no determinism. Human groups of different levels can be recognized through names, but human societies are fractals. Each group can be broken down again and again, from different angles. A first name, a last name and a Twitter handle are part of a person’s identity and may indicate a social intent, membership of an ethnic or linguistic group, a geographic origin, beliefs, and so on. However, at the finest grain, every individual is unique and an exception to the group.

The genetic code was at one point thought to contain all the information needed to ‘build’ an individual from the physical point of view. After years of research, it seems that part of the information and the ‘algorithm’ are elsewhere… Still, there is huge interest in applied services such as 23andMe that ‘decrypt’ the genetic code to provide insights into a person’s ancestry, as well as hints about potential health issues.

The Name Code and the Genetic Code share the same ability to fascinate: each can, statistically, be shown to have an influence on your life, social status, average income, career… Both relate to a family history. Each Code can be misleading and yet insightful. Fleur Pellerin, the French SME & ICT Minister, was born Kim Jong-suk in South Korea. She is both truly French and truly Korean, one name indicating a culture, the other a phenotype and genetic heritage. Considering only the Genetic Code would be denying a part of our humanity, which comes from being a child, a teenager, experiencing life, interacting socially, being part of a country and a culture, making one’s choices.

Twin studies would tell a lot about the links between those two codes (Name, Genetic) – if only there were more twins. Even though identical twins possess the same genetic makeup, they may go through different experiences throughout their lives that shape their personality, behaviour, and psychopathology in ways that make them unique relative to each other (Hughes et al., 2005). Twins will have different first names. Twins might also have different last names if, hypothetically, one twin was raised in Russia and the other was adopted and raised in the United States. In that case, what would the Name Code and the Genetic Code tell about potential health issues (smoking or alcohol addiction, obesity and diabetes, life expectancy, etc.)?

An article published last month caught my eye, ‘Scientists seek volunteers willing to have genetic code published on internet‘: the hunt is on for 100,000 British volunteers to post their genetic information online in the name of science, as a North American open-access DNA project arrives in Europe. Personal Genome Project UK’s mission is ‘to make a wide spectrum of data about humans accessible to increase biological literacy and improve human health‘. The organization recognizes that ‘Even if a person’s name, home address or facial photograph is specifically excluded, a dataset like the one we are building is far from anonymous. It is simply too easy for someone to connect the dots and reveal a person’s identity.’ The Genetic Code is very personal data. Would you like to see yours published along with your Name Code and Identity? Yet if the identity of participants can be protected, I can see huge scientific value in such Open Data.

The Name Code, as such, is not personal data. Personal data is any information about yourself that you should be allowed to keep confidential. A name is given to you as a communication tool, to interact with the World. There is a social intent in giving a child a common name, or a rare name that will more immediately identify a person – though I believe that one should be allowed to change names, just as Casanova did (he named himself Chevalier de Seingalt). There are sometimes legitimate reasons to keep one’s name and identity secret: you should be free to do so, unless that freedom infringes on someone else’s rights. A personal name (except possibly when it becomes a trademark) doesn’t belong to anyone: it’s been used before, it’ll be used again, it’s often shared by several people, it’s found in the press, it’s made up for fiction books… Could a democracy work without citizens knowing their politicians’ names? How could historians do their research if we were to erase all personal names from the archives?

We see potential social benefits in applied onomastics and name data mining that clearly exceed the risks of misuse: not just in social sciences research, but also in economic development, tourism, marketing, health, urban planning… We’ve helped one EU country reach out to its Diaspora in the US to originate foreign direct investments (FDI) and create jobs. We’re currently helping a BioTech scientific cluster raise its game through a better understanding of where the talent lies in that field, and where the brain juice flows internationally. We’re trying to find local partners to launch AgroDiaspora, an economic development initiative in Africa to foster stronger links between Sustainable Agriculture Transformation Projects and top-level BioTech scientists of African heritage, who could help make local plants climate-change resistant, among other benefits. We are also very excited and enthusiastic about a paper we submitted to ICOS 2014, the XXV International Congress of Onomastic Sciences, which will take place in Glasgow in August, as we foresee a very positive outcome from that research.

In last month’s onomastics.co.uk feature ‘The Impact of Diasporas on the Making of Britain‘, Eleanor Rye mentions very interesting research into what surname-based sampling can reveal about historic male migrations in the UK and Ireland.

We are currently conducting similar applied research on Twitter. I love Twitter. The freedom to choose one’s handle and name. The limited amount of structured information that goes with an account: a location, a language, a short profile, a few pictures. What’s in a Twitter name or handle? Anything: real names, company names, fancy names, pictograms… The amount of information produced through Twitter is enormous, but it’s possible to filter this ‘bigdata’ in a way that makes sense of it. We created geographic maps of e-Diasporas by recognizing the Twitter names of geotagged tweets: Irish, Swedish, Russian, etc. We call this Twitter GEOnomastics, borrowing a term from Dr. Evgeny Shokhenmayer. Below is the map of the Irish e-Diaspora, along with Swedish and Russian.

Irish Twitter GEOnomastics

Click here to access the interactive map:
http://cdb.io/1beWaVB

How does it work? The software accurately recognizes that ‘NamSor Applied Onomastics’ (@NomTri) is probably a trade mark or a company name, whereas ‘Elian Carsenat’ (@ElianCarsenat) is probably a personal name – and most likely a French name. Fancy names are also recognized and filtered out.

We see wide applications of such maps. When Captain James Cook explored the seas in the 18th century, having accurate maps could mean life or death for a ship and its crew. Working out latitude had been known for centuries, but measuring longitude was still tricky and inaccurate. In today’s digital world, I see latitude as ‘recognizing the semantics’ in a message expressed in a particular language and longitude as ‘recognizing the culture’ of the target audience. We’re full of curiosity about how and to whom this map can be useful, possibly Twitter itself. We’re going from Paris to Dublin in two weeks to find out: we hope to meet people at Twitter European Headquarters. Twitter has just completed its IPO, but it is still not clear how it will make its money. We’ll also meet Irish urban planners, people working in the tourism industry, investment analysts and Diaspora experts.

Read our next posts to discover more Twitter GEOnomastics maps showing Irish, French, German, Spanish, Russian, Turkish, Swedish, Italian, Dutch e-Diasporas (or cultural influence).

NB. The maps are currently interactive, so you can zoom in and out of a particular territory, however this may be shut down in a month or two.

[onomastics.co.uk | get a pdf version | academia.edu] Related : Can name data mining help economic development?


Onomastics for Business Data Mining

This is a reblog of the original ParisTech Review article.

Can name data mining help economic development?

As of today, the main business application of onomastics is naming, or branding: finding the proper name for your company or your product to stand out in the world. Tellingly, Onoma – the Greek root for name – is also a registered trademark of Nomen, the naming agency founded by Marcel Botton in 1981. Nomen initially licensed one of Roland Moreno’s inventions, the Radoteur name generator, and created many distinctive and global brand names such as Vinci, Clio or Amundi. But once your business has a name, should you forget about onomastics? Not anymore. Globalization, digitalization and Big Data open new fields in which to experiment with disruptive applications in Sales & Marketing, Communication, HR and Risk Management. Though discriminating between names carries a high risk of abuse, it can also drive new, unexpected ways of developing poor areas.

Our human brain interprets names every day, as we understand a language, as we know a particular culture or region of the world: the likely menu of a restaurant, the industrial sector of a company… even a dog’s name might tell you something about its owner. Personal names (first name, last name, a Twitter handle) carry meanings which vary according to one’s language and culture, but often form an essential part of one’s identity.

Extracting semantics from names

How exactly my brain works is not clear even to me, but what if I could program a computer to extract semantics from names: would it provide valuable business intelligence? Some people in the US think so. The Central Intelligence Agency (CIA) has long-standing experience in extracting intelligence from personal names: back in the 80s it used LAS name recognition software to help identify Russian spies, recognize false identities and track Soviet influence. LAS could rely on the CIA to help collect a database of one billion names to calibrate the software. That’s about the total population of the developed world at the time.

After thriving on the surge in US security and foreign intelligence budgets post-9/11, LAS considered diversification and started to address other markets: Marketing and Financial Services Compliance (notably KYC, i.e. Know Your Customer). LAS was acquired by IBM in 2006. But to further increase their lead, in 2011 the US security agencies used the MITRE Corporation to help foster further “innovation in technologies of interest to the federal government. Challenge #1 entailed multicultural name matching—a technology that is a key component of identity matching, which involves measuring the similarity of database records referring to people. Uses include verifying eligibility for Social Security or medical benefits, identifying and reunifying families in disaster relief operations, vetting persons against a travel watch list, and merging or eliminating duplicate records in databases. Person name matching can also be used to improve the accuracy and speed of document searches, social network analysis, and other tasks in which the same person might be referred to by multiple versions or spellings of a name”. A name tells more than, or something different from, just a nationality of origin. For example, Boston terrorists Tamerlan and Dzhokhar Tsarnaev have names with a -v ending typical of Slavic names (as found in Russia or in Bulgaria) but can be recognized as originally from the Caucasus. There were media reports in the aftermath of the bombing that the FBI didn’t know that one of the Boston bombers had travelled to the volatile Dagestan region in Russia in 2012 because “his name was misspelled on travel documents”. However, this information remained unconfirmed and is probably not accurate given the massive US investment in name-matching technology.
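Name matching of the kind described in the MITRE quote, measuring how similar two spellings of a name are, can be illustrated with a few lines of standard-library Python. This is only a sketch under simple assumptions: accent stripping plus `difflib.SequenceMatcher` stands in for real multicultural name matching, and the function names, the 0.8 threshold and the sample names are hypothetical.

```python
import difflib
import unicodedata


def normalize(name):
    """Lower-case a name, strip accents and collapse repeated whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def similarity(name_a, name_b):
    """Return a similarity score between 0.0 and 1.0 for two names."""
    matcher = difflib.SequenceMatcher(None, normalize(name_a), normalize(name_b))
    return matcher.ratio()


def find_matches(query, records, threshold=0.8):
    """Return (score, record) pairs at or above the threshold, best first."""
    scored = [(similarity(query, record), record) for record in records]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


# Invented records: two transliterations of one name and an unrelated one.
records = ["Aleksandr Ivanov", "Maria Rossi"]
matches = find_matches("Alexander Ivanov", records)
print(matches)
```

A plain edit-based ratio like this knows nothing about transliteration rules, name order or nicknames, which is precisely the gap that dedicated name-matching software aims to close.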

In Europe, the legal framework to leverage such tools varies from country to country, but is generally very strict. Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data, article 8, states that “Member States shall prohibit the processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, […]”. In principle, this directive applies to Security Agencies as well; however, there are exemptions which member states can interpret differently.

By using the term ‘discriminatory ethnic profiling’, rather than the more common ‘ethnic profiling’, to describe the practice of basing law enforcement decisions solely or mainly on an individual’s race, ethnicity or religion, the European Union recognizes the need of security forces to understand the complex relationships that exist between nationality, geography, and more subjective concepts such as ethnic origins, cultural backgrounds, civilisations and religions. How the knowledge might be applied and how the data might be collected remain matters of national security. The UK and France, for example, are known to have different views on this topic. In any case, what is done in practice by anti-terrorism agencies is not public information.

Security, border control and the like are businesses in their own right. What about other sectors?

Customer intelligence: business potential and ethical issues

In Sales & Marketing, onomastics can be used to enrich a customer database with information extracted from names that would not be practically or economically available otherwise. So retailers and luxury brands – especially in food, clothing and cosmetics where ethnicity plays a significant role – can improve customer intelligence and use those insights to better interact through online channels. Echoing concerns expressed by early 20th century John Wanamaker “Half the money I spend on advertising is wasted; the trouble is I don’t know which half”, companies like L’Oreal that spend several billion dollars a year on communication and advertising continuously try to improve the efficiency of their targeting.

Let us look at a more sophisticated example, for example public–private partnership (PPP) projects in mining, energy or infrastructure. Those projects can have significant social impacts in a territory and raise various political or economic issues. Understanding the human geography and recognizing the interests of the communities cohabiting in that territory can be critical to obtain a buy-in from all stakeholders. Onomastics, combined with geo-demographic segmentation, can help rapidly build geographic maps that can be used both for decision making and communication purposes. Automatic name clustering is the underlying technology that will help decrypt the complex identities present in large or small territories (from a continent, to a road). The objective is to answer tough questions and manage unavoidable frustrations though appropriate communication. Where should a tramway line pass in a multi-ethnic region? How to redistribute offshore oil revenues in the lands?

Concerning HR, I recently spoke with an executive at a large European bank who regretted that not enough trustworthy expatriates had been sent to control a large acquisition in a BRIC country, which cost several hundred million Euros in write-offs. Among thousands of employees at the European head office, the bank could have recognized the names of a few people likely to accept an expatriation back to their home country. Having some people who knew both languages and both corporate cultures would have helped bridge the inter-cultural gap between the local management and other expatriates, saving millions of Euros.

In the digital world, onomastics brings a new angle to social graph analysis: it can help colourize online communities and profile opinion leaders according to their audience. On Twitter, for example, you can more easily create communication channels that are well targeted at a particular community (business expatriates, tourists, migrants, but also international investors…).

Let’s now consider a provocative and controversial use of onomastics that will help us move on to the topic of ethics. Different cultures, nationalities and social backgrounds imply different behaviours with respect to Money and Risk taking: earning, saving, spending, gambling, investing, donating, risking death and losing it all… It is a fact that people with aristocratic names (in places where there is such a thing as aristocratic names) earn more and obtain cheaper credit than people with names typical of the lower class or of a recent immigration wave. Why not take shortcuts? A bank could adjust the price of a credit according to the borrower’s name; a car insurance company could adjust its evaluation of the risk (including the risks of insurance fraud, dangerous driving…) according to the name on the application form. They would measure their risk better. Furthermore, they could offer more competitive prices to certain categories of clients and target them better commercially.

Such use is highly controversial, since it raises the question of Equality (or inequalities) and discrimination. But discrimination is a fact, and onomastics can allow us to better see and understand how it works. Why should people with different-sounding names hit glass ceilings in the first place, regardless of their skills? Casanova chose his own name, de Seingalt, and wondered whether D’Alembert would have attained his high fame, his universal reputation, if he had been satisfied with his name of M. Le Rond, or Mr. Allround.

I am a supporter of Equal Opportunity Rights. And yet, I built a powerful discrimination algorithm based on names. NamSor is a piece of name recognition software which applies onomastics to analyse global flows of money, ideas and people. Like any powerful new technology, it carries potential risks of abuse, but I believe there is a positive use for it.

One classical application where onomastics plays a significant role is called geo-demographics: it consists in analysing the sociology of a particular territory (including the cultural and ethnic origins of its inhabitants) as inferred from open sources and census data. Geo-demographics can be a useful tool to ensure, for example, that all populations have equitable access to public services, such as hospitals. The company Experian is one of the leaders in that field, and is especially strong in the UK.

The effective use of Big Data and Open Data is widely considered to be a critical enabler for future Smart Cities: enabling dynamic allocation of resources, more efficient use of energy, prompt response to a crisis and so on. The combination of social networks and mobile applications with geo-localized devices opens new possibilities. Recognizing the diversity of populations that cohabit across space and time can help design more inclusive cities and transportation systems. Sensors that discriminate populations (in the sense of perceiving) can draw the clear picture needed to prevent discrimination (in the sense of favouring) and help defuse some of the time bombs ticking here and there.

Targeting diasporas: a game changer for development?

But the most promising use of software such as NamSor could be elsewhere – though it still deals with territorial equality. It is quite common for regions of the world that are less economically developed to use their own weakness (poorer people) as a strength (cheaper labour) to attract investments. The idea is to trigger a virtuous circle of job creation, infrastructure development, better education, migration flow reversal, etc. commonly known as the FDI Magnet effect. The region becomes more attractive and gradually moves up in the global value chain. As it loses competitiveness in terms of cheap labour because of the new wealth of its population, it develops a different economy based on innovation, services, tourism, consumption.

Most countries implement some kind of policy to direct flows of investment to poorer regions, as a means to preserve their territorial cohesion and integrity. Those policies are most effective when they combine with successful private initiatives. So the objective of many Investment Promotion Agencies (IPAs) is not so much to attract big money as to attract a great business that will employ their people and help them grow. The global competition to attract such investments is fierce.

Poorer regions have another weakness, which can be turned into a strength. Emigration is generally an opportunity loss, but after some years it generates a Diaspora which can be leveraged to attract investments back to the region.

For example, Ireland took decisive steps during the early 1980s to proactively reconnect with its emigrants and with successful businessmen of Irish descent. Rebekah Berry reminds us that “as recently as 1986 Ireland was one of the poorest countries in the European Union, but [in 2002] it is one of the richest. The engine of this new Irish prosperity has been Foreign Direct Investment (FDI). [Between 1986 and 2002], the Irish have done almost everything right. They have attracted huge amounts of money from America – due largely to a century of personal and familial ties – and they have used this money to build factories”.

The regions of Ningxia, Gansu and Qinghai have among the lowest numbers of millionaires in China. But if they could reconnect with the few they have, in Beijing, Shanghai or even abroad, wouldn’t it make a difference?

For that purpose, onomastics can be a useful tool and it has served the development strategy of a European country, Lithuania.

InvestLithuania is the first Investment Promotion Agency (IPA) to use name recognition to originate FDI deals. With three million people living in Lithuania and nearly one million people of Lithuanian origin living abroad, there are a good many personal and familial ties to be leveraged to attract new investment projects to the country. NamSor name recognition software helped discover those ties. Another method to accelerate the origination of new investment leads is to better understand and leverage the existing network of foreign businessmen in the country itself. Domas Girtavicius, a Senior consultant at Invest Lithuania, said “we were impressed by the accuracy of the name recognition software: it reliably predicts the country of origin and the number of false positives is fully manageable”.

This project with InvestLithuania was very successful and consequently I was invited to participate in the World Lithuanian Economic Forum (WLEF), which took place in Vilnius this year, on the 3rd of June. This Forum is organized by Global Lithuanian Leaders (GLL), a non-profit association whose mission is to reconnect with Lithuanians and friends of Lithuania abroad. I found the GLL to be a great initiative, providing the country with a wealth of expertise from different parts of the world, across all domains (politics, education, culture, business…), and also bridging some of the cultural gaps that necessarily exist in such a matrix (place / domain). Specifically, the GLL helps bring elements of culture from the US and UK, such as entrepreneurship and business networking.

While some diasporas, especially those originating from the Mediterranean, have a millennia-old culture of business and personal networking, other countries struggle to adjust to their new situation. What is the value of a social network such as LinkedIn to the Lebanese Diaspora? Low. What better communication tool in Marseilles than “word of mouth” to launch Massilia Mundi, which aims to become the social network of that city’s international Diaspora? But for many Investment Promotion Agencies (IPAs), LinkedIn is an essential tool. For example, in traditional Lithuanian culture, people treasure strong family ties and personal links with close friends, but do not nurture a wide network of professional connections or casual contacts. I believe many countries are in a similar situation, where a dedicated organisation could help reconnect people: for them, tools such as social networks, professional databases and onomastics can make a difference.

Could that also work for regions in China? In 2005-2009, while I was working for a global consulting firm, I had the opportunity of managing a project in banking with a mix of Chinese and French teams: a team in Paris, which included several young ParisTech graduates of Chinese origin, and a team in Shanghai. I remember the excitement and the pleasure of the entire team – including myself – at doing a project connected with China, with the opportunity of travelling to Shanghai, tasting the food of different regions of China and being introduced to Chinese culture. Several people from that team, both French and Chinese, are now in China. Jing, now a dear friend, went back to Shanghai in 2009 and I remember how she still felt sentimentally bound to her original city of Xiangtan, Hunan – ready to help in any way she could. From this experience, I understood that if there existed such an organization as ‘Global Ningxia, Gansu and Qinghai Leaders’, it would not often encounter rebuffs when reaching out for help, money or expertise. Such an organization could be very helpful in closing the economic gap with other regions.

Technically, Chinese names are clearly recognizable amongst other nationalities or origins. So, by querying a professional database, we can produce an onomastic mapping of Chinese company directors. For example, the following map represents the density of Chinese and Japanese business communities in Southern Latin America, relative to each other.

[Map: density of Chinese and Japanese business communities in Southern Latin America, relative to each other]

Source: Factiva DF. Copyright 2013 NamSorts.com NomTri™ NamSor™ – All rights reserved
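As a rough illustration of how a relative-density map of this kind could be derived, the sketch below takes, for each region, the number of company directors whose names were classified as Chinese or as Japanese, and computes the Chinese share of the two. The function name relative_share, the region labels and the counts are all hypothetical and invented for the example; they are not Factiva or NamSor data, and this is not necessarily how the map above was produced.

```python
from typing import Dict, Tuple


def relative_share(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return, per region, the Chinese share among Chinese + Japanese directors.

    `counts` maps a region name to (chinese_directors, japanese_directors).
    Regions where neither community was detected are skipped.
    """
    shares = {}
    for region, (chinese, japanese) in counts.items():
        total = chinese + japanese
        if total == 0:
            continue  # nothing to compare in this region
        shares[region] = chinese / total
    return shares


# Purely illustrative counts (hypothetical, not real data).
example_counts = {
    "Region A": (30, 10),
    "Region B": (5, 15),
    "Region C": (0, 0),
}

if __name__ == "__main__":
    for region, share in relative_share(example_counts).items():
        print(f"{region}: {share:.0%} Chinese / {1 - share:.0%} Japanese")
```

A share close to 1 would be drawn in one colour and a share close to 0 in the other, which is enough to contrast the two communities region by region.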

How many of those successful Chinese businessmen (or businessmen of Chinese origin) come from Ningxia, Gansu or Qinghai? This is where applied onomastics can be a game changer. Not that all questions are solved. At the present time, the available software allows us to detect phenomena, not to understand them perfectly. For instance, I would like to share two data visualizations produced as part of this effort, which I found beautiful and promising.

[Figure: Chinese onomastics data visualization, teaser 1]

[Figure: Chinese onomastics data visualization, teaser 2]

What do we see here? Something – something that still needs to be analysed and understood, but something that may be of great value for someone trying to locate and identify potential investors or decision makers. Chinese last names actually raise specific challenges: they have been in use for many centuries and, with rare or less common names disappearing over time, about one hundred names now account for the large majority of the population. But first names still carry regional differences, poetry and other semantics. Roots may be almost invisible, yet onomastics can still track them. And the more difficult the tracking, the more valuable the findings.

Logo Paris Tech Review

This content is licensed under a Creative Commons Attribution 3.0 License

You are free to share, copy, distribute and transmit this content


Download documents: Onomastics for Business.pdf (English version), Onomastique et Big Data.pdf (French version). Mirrors: [Harvard.edu] [arXiv]


Filed under FDI Magnet, General

Onomastics to monitor Russian media

Business intelligence firm AESMA used NamSor name recognition software to analyze the coverage of Arab personalities and businesses by the Russian media between January 2004 and March 2013, using the INTEGRUM database.

Arab People In Russian Media (2013)

About NamSor

NamSor™ Applied Onomastics is a European vendor of Name Recognition Software (NamSor sorts names). NamSor mission is to help understand international flows of money, ideas and people.

Reach us at: contact@namsor.com

 


Filed under EthnoViz

Name Is the Game, still

In 2003, Richard J. De Lotto and Kimberly Collins at Gartner published ‘Name Is the Game’, a five-page document about Language Analysis Systems (LAS), a small US company specialised in Name recognition. This company had been around for almost twenty years, with hardly any relevant competitor. In fact, it was a “Sole Source Provider” to the US Government for 17 years, meaning that the US administration could find no good alternative at the time. Born in the US during the Cold War, LAS had intelligence and security agencies as its primary market. Apart from its software product and team of experts, LAS had a major asset: a database of about one billion international names from all countries (that’s about 100% of the population of the more developed countries, so it probably included my name). Given the state of development of the Internet at the time, it’s unlikely that those names were collected from open sources. More likely, they were collected as part of a mutually beneficial collaboration with US agencies. After thriving on the surge in US security and foreign intelligence budgets post-9/11, LAS considered diversification and started to address other markets: Marketing and Financial Services Compliance.

LAS was acquired by IBM in 2006. In his blog, Jeff Jonas of IBM describes the benefits of bridging Name resolution and Identity resolution. Today, LAS technology is part of the IBM® InfoSphere® suite of products and is known as IBM Entity Analytics Solutions or IBM Global Name Analytics. So what can it do?

Among other features, given a person’s name the software can predict (with probabilities) its cultural and linguistic classification, country of origin, gender and spelling variants.
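To give a feel for what “predicting with probabilities” from a name can mean, here is a minimal sketch of a naive Bayes classifier over character bigrams. It is a textbook technique chosen for illustration only, not the method used by LAS, IBM or NamSor, which is not described here. The class ToyNameClassifier, the helper bigrams and the six training names are all hypothetical; a real system would rely on a far larger reference database.

```python
import math
from collections import Counter, defaultdict


def bigrams(name):
    """Split a name into character bigrams, with start (^) and end ($) markers."""
    padded = f"^{name.lower()}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class ToyNameClassifier:
    """Naive Bayes over character bigrams with add-one smoothing (illustrative)."""

    def __init__(self):
        self.gram_counts = defaultdict(Counter)  # label -> bigram -> count
        self.label_counts = Counter()            # label -> number of names
        self.vocabulary = set()                  # every bigram seen in training

    def train(self, examples):
        for name, label in examples:
            self.label_counts[label] += 1
            for gram in bigrams(name):
                self.gram_counts[label][gram] += 1
                self.vocabulary.add(gram)

    def predict_proba(self, name):
        """Return a dict label -> probability; the probabilities sum to 1."""
        total_names = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        log_scores = {}
        for label, n_names in self.label_counts.items():
            total_grams = sum(self.gram_counts[label].values())
            score = math.log(n_names / total_names)  # prior of the label
            for gram in bigrams(name):
                count = self.gram_counts[label][gram]  # 0 if never seen
                score += math.log((count + 1) / (total_grams + vocab_size))
            log_scores[label] = score
        # Normalise in log space to avoid underflow on long names.
        top = max(log_scores.values())
        weights = {lab: math.exp(s - top) for lab, s in log_scores.items()}
        norm = sum(weights.values())
        return {lab: w / norm for lab, w in weights.items()}


# Hypothetical toy training set, far too small for any real use.
TOY_EXAMPLES = [
    ("Kowalski", "Polish"), ("Nowakowski", "Polish"), ("Wisniewski", "Polish"),
    ("Rossi", "Italian"), ("Bianchi", "Italian"), ("Ricci", "Italian"),
]

if __name__ == "__main__":
    classifier = ToyNameClassifier()
    classifier.train(TOY_EXAMPLES)
    print(classifier.predict_proba("Rossini"))
```

Returning a probability for every label, rather than a single answer, is what lets a user decide how many false positives are acceptable for a given application.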

Ain’t It Cool?


Filed under General