Onomastic sampling for migration studies

On Friday morning, I had the opportunity to present our breakthrough data mining technology at the Regent’s University Turkish Migration Conference (TMC2014, London).

The supporting presentation can be downloaded here (20140530_TMS2014_Pitch_vFf.pdf) or viewed online here.

During the sessions that followed, given by researchers from many countries (Turkey, the US, the UK, Germany, the Netherlands, Sweden, Norway, Belgium …), I learned some of the ‘jargon’ of migration studies, as well as something about the research methodologies specific to the field.

My initial vision was that onomastics (the recognition of personal names) could be applied to discover new migration patterns. This vision grew out of several preliminary meetings with international organizations concerned with migration issues. Census data can take up to three years to process. As states struggle to provide timely and accurate data to international organizations (such as the OECD, the IOM, the United Nations High Commissioner for Refugees (UNHCR), …), these organizations can turn to Big Data to identify and monitor new trends. One challenge is identifying data sources that provide valuable information about less digitally connected migrants. Still, Twitter, LinkedIn, Google, Facebook, D&B, Thomson WoS … combined with applied onomastics can tell us a lot about the changing migration patterns of STEM workers, innovators and entrepreneurs.

STEM workers: workers in science, technology, engineering and mathematics; the arts are occasionally included as well (STEAM workers).

With several TMC2014 sessions focused on the question of Turkish identity, or on the particular migration and integration patterns of the Turkish, Kurdish, Alevi or Circassian communities, applied onomastics clearly offers an innovative tool for looking at data from different angles (nationality/birthplace/ethnicity/gender/…).

However, I found that many research studies start from an initial theoretical hypothesis. Researchers then apply qualitative or quantitative methods (occasionally both) to assess it. Purely quantitative methods such as ‘data mining’ or ‘graph analysis’ are seen as dehumanizing by researchers (anthropologists, sociologists, historians …) who are primarily interested in the human story of migration. Most researchers conduct surveys to gather the data for their study: they find people, talk to them and ask questions. How do researchers identify the group of people to be surveyed (the sample)? During the conference, I learned another piece of jargon: network/snowball sampling.

Network/snowball sampling: snowball sampling is based on selecting target people through personal networks. In a first step, important people within the target group are identified (the initial sample); they in turn identify further people who can also be addressed in the survey (McKenzie & Mistiaen, 2007, p. 2; Salentin, 1999, p. 124).
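Snowball sampling behaves like a breadth-first walk through a referral network, one ‘wave’ at a time. The Python sketch below is a minimal illustration under stated assumptions, not a procedure taken from the cited papers: the `contacts` mapping (who refers whom), the function name `snowball_sample` and its two stopping rules are all hypothetical.

```python
from collections import deque


def snowball_sample(contacts, seeds, max_waves, max_size):
    """Collect respondents wave by wave, starting from the initial sample.

    contacts  -- hypothetical dict: person -> list of people they refer
    seeds     -- the initial sample identified by the researcher (wave 0)
    max_waves -- how many referral waves to follow beyond the seeds
    max_size  -- stop once this many respondents have been collected
    """
    sample = []
    seen = set()
    queue = deque()
    for person in seeds:
        if person not in seen:
            seen.add(person)
            queue.append((person, 0))
    while queue and len(sample) < max_size:
        person, wave = queue.popleft()
        sample.append(person)
        if wave == max_waves:
            continue  # last wave reached: interview, but do not ask for referrals
        for referred in contacts.get(person, []):
            if referred not in seen:  # each person is surveyed at most once
                seen.add(referred)
                queue.append((referred, wave + 1))
    return sample


contacts = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"]}
print(snowball_sample(contacts, seeds=["A"], max_waves=2, max_size=10))
```

With two waves, F (referred only by D, a wave-2 respondent) is never reached. More generally, the sample can only contain people reachable from the seeds, which is why the choice of initial sample matters so much.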

As is often the case, this new term was the magic keyword for finding additional resources and understanding how NamSor technology could fit into the current state of migration research methodology:

This document clearly describes the various methodologies to identify the initial population of a study and the various sampling procedures. Onomastic sampling is one of them.

‘In many countries, migrants constitute a substantial part of society. In public opinion research, however, they are often inadequately or not at all considered. This paper gives a systematic overview of the underlying methodological challenges that cause this situation. Those challenges are twofold and concern (1) the definition and distinction of the terms migrant and foreigner to describe the target group and (2) the selection of adequate sampling procedures.’

‘The methodological challenge of selecting adequate sampling procedures

Even after defining the target population, researchers still face difficulties regarding sampling. The problems tackled can be divers, for instance in what way the target population can be contacted (which survey modes are culturally accepted?) and how the individual respondents can be selected (e.g. does last-birthday work?). The paper discusses four central sampling procedures which regularly come up in the literature and which are seemingly appropriate for these kinds of surveys:

1. Sampling procedures on the basis of administrative records,

2. Area sampling, like e.g. random-route-procedures,

3. Network/snowball sampling, and

4. Onomastic sampling procedures based on foreign names from directories.’

How can NamSor software help?

1. Sampling procedures on the basis of administrative records

In this sampling method, administrative records do not reflect the fine-grained identity of populations: ‘Turkish nationality’ or ‘Born in Turkey’ encompasses many different populations. Applied onomastics can help refine samples into more targeted populations (Turkish, Alevi, Kurdish, Syrian, …).

2. Area sampling, like e.g. random-route-procedures

In this sampling method, it’s critical to understand the geo-demographics of a territory in order to know where different migrant populations are concentrated. Applied onomastics can help assess the density of migrant populations at various levels (region/city/district or road) from various public data sources.
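Once records from a public data source have been geocoded and their names classified, estimating density per area is a simple aggregation. The sketch below assumes the classification step has already happened (it is not shown and is not NamSor’s API); the function name, area names and origin labels are invented for illustration.

```python
from collections import Counter


def origin_share_by_area(records, target_origin):
    """Share of records classified as target_origin in each area.

    records -- iterable of (area, origin) pairs; origin is the label a name
               classifier (assumed, not shown) assigned to the person's name
    Returns a dict {area: share}, ordered from the densest area to the sparsest.
    """
    totals = Counter()
    hits = Counter()
    for area, origin in records:
        totals[area] += 1
        if origin == target_origin:
            hits[area] += 1
    shares = {area: hits[area] / totals[area] for area in totals}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


records = [
    ("District 1", "Turkish"), ("District 1", "Turkish"), ("District 1", "Dutch"),
    ("District 2", "Dutch"), ("District 2", "Turkish"),
    ("District 2", "Dutch"), ("District 2", "Dutch"),
]
print(origin_share_by_area(records, "Turkish"))
```

Here District 1 (2 of 3 records) ranks ahead of District 2 (1 of 4), so it would be the natural starting point for a random-route procedure. Shares only reflect the people present in the data source, so a source with uneven coverage will skew the ranking.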

3. Network/snowball sampling

In this sampling method, the researcher’s personal network is used as an initial seed to identify further prospects for interviews. Applied onomastics could help analyse researchers’ personal networks (from social networks such as Twitter, or academic sources such as bibliographic databases) to identify larger seed networks and generate better samples. That could help reduce the risk of bias induced by the researcher’s network (which may reinforce the researcher’s own personal or cultural biases).

4. Onomastic sampling procedures based on foreign names

Dictionaries of given names and family names associated with a particular culture have been used for sampling.
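In its simplest form, dictionary-based onomastic sampling keeps the directory entries whose surname appears in a list associated with the target group, then draws a random sample from that sampling frame. The Python sketch below is a minimal illustration: the function name and the three-surname list are hypothetical, a real dictionary would be far larger, and an exact lookup like this one misses transliteration variants and names shared across cultures.

```python
import random


def onomastic_sample(directory, surname_dictionary, sample_size, seed=None):
    """Draw a random sample of directory entries whose surname is in the dictionary.

    directory          -- list of (first_name, last_name) pairs, e.g. a phone book
    surname_dictionary -- surnames assumed to be typical of the target group
    sample_size        -- number of entries to draw from the matching frame
    seed               -- optional seed so the draw can be reproduced
    """
    # Case-insensitive exact match; no handling of spelling variants.
    wanted = {name.strip().casefold() for name in surname_dictionary}
    frame = [entry for entry in directory if entry[1].strip().casefold() in wanted]
    if sample_size >= len(frame):
        return frame  # the whole frame is smaller than the requested sample
    return random.Random(seed).sample(frame, sample_size)


directory = [
    ("Ayşe", "Kaya"), ("Jan", "de Vries"), ("Mehmet", "DEMIR"),
    ("Piet", "Bakker"), ("Ali", "Yılmaz"),
]
print(onomastic_sample(directory, ["Kaya", "Demir", "Yılmaz"], sample_size=2, seed=1))
```

Fixing the seed makes the draw reproducible, which helps when the sampling procedure has to be documented alongside the survey.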

NamSor software goes beyond this technique, using sociolinguistics to recognize the likely origin of a person from a (firstName, lastName) pair with high accuracy. NamSor software can help researchers conduct onomastic sampling not just from telephone directories but also from a wide range of modern data sources (social networks, opt-in commercial databases, …) with high precision and fine-grained targeting.

Conclusion

NamSor’s powerful technology raises many data privacy and ethical questions, but we’re glad to say that if science and migration studies can be good for society, NamSor can be too.

About NamSor:
NamSor’s mission is to help understand international flows of money, ideas and people. NamSor launched GendRE API, a free API for analysing gender equality using open data. http://namesorts.com/api/


Turkish Onomastics and Migration Patterns

Next week at the Regent’s University Turkish Migration Conference (TMC2014, London), Elian Carsenat will present a breakthrough data mining technology that applies onomastics (the recognition of personal names) to the discovery of new migration patterns.

As states struggle to provide timely and accurate data to international organizations (such as the OECD, the IOM, the United Nations High Commissioner for Refugees (UNHCR), …), these organizations can turn to Big Data to identify and monitor new trends. What can Twitter, LinkedIn, Google, Facebook, D&B, Thomson WoS … tell us about the changing migration patterns of highly educated professionals and entrepreneurs? We’ll present how applied onomastics and Big Data can be a game changer in migration studies, with vast implications for how countries or even regions can engage their diasporas (to attract FDI and remittances, to build networks of expertise, …).

We look forward to seeing you at the Regent’s University Turkish Migration Conference (TMC2014, London). Full program here.

Download the supporting presentation: 20140530_TMS2014_Pitch_vFf.pdf

Further reading:


DataViz of the Dutch Digital Diaspora

As the final map in our Twitter GEOnomastics series, we present today the Dutch e-Diaspora.

To prepare the map, we classified Twitter names as Dutch, Turkish or Spanish and kept only the tweets carrying a geotag (about 3% of tweets).
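The filtering step described above can be sketched in Python as follows. The `classify` argument stands in for a name-origin classifier and is hypothetical (it is not NamSor’s actual API), and the `Tweet` record is an assumed minimal shape. Because only a small fraction of tweets carry a geotag, the sketch checks the geotag first and classifies only those names; the order does not change which points end up on the map.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set, Tuple


@dataclass
class Tweet:
    user_name: str
    lat: Optional[float]  # None when the tweet has no geotag
    lon: Optional[float]


def map_points(
    tweets: Iterable[Tweet],
    classify: Callable[[str], Optional[str]],
    origins: Set[str],
) -> List[Tuple[str, float, float]]:
    """Return (origin, lat, lon) for geotagged tweets whose author's name
    is classified into one of the target origins."""
    points = []
    for tweet in tweets:
        if tweet.lat is None or tweet.lon is None:
            continue  # only geotagged tweets can be placed on the map
        origin = classify(tweet.user_name)
        if origin in origins:
            points.append((origin, tweet.lat, tweet.lon))
    return points


# Toy lookup table standing in for a real name classifier.
toy_labels = {"Jan de Vries": "Dutch", "Ayşe Kaya": "Turkish"}
tweets = [
    Tweet("Jan de Vries", 52.37, 4.89),
    Tweet("Ayşe Kaya", None, None),
    Tweet("John Smith", 51.51, -0.13),
]
print(map_points(tweets, toy_labels.get, {"Dutch", "Turkish", "Spanish"}))
```

Each resulting (origin, lat, lon) triple can then be loaded as a point on a separate map layer per origin.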

Emigration from the Netherlands has been happening for at least the last eight centuries. In several former Dutch colonies and trading settlements, there are ethnic groups of partial Dutch ancestry. Emigrants from the Netherlands since the Second World War went mainly to the United States, Canada, Australia, New Zealand, and until the 1970s South Africa. There are recognisable Dutch immigrant communities in these countries. Smaller numbers of Dutch immigrants can be found in most developed countries. In the last decade, short-range cross-border migration has developed along the Netherlands borders with Belgium and Germany. Source: Wikipedia

To access the interactive map: http://cdb.io/1fsjItu

Dutch Digital Diaspora

Finally, we present a summary of the different Twitter GEOnomastics maps we’ve published so far:

NamSor Applied Onomastics is a European vendor of name recognition software (NamSor sorts names), which aims to help understand international flows of money, ideas and people. namsor.com

NamSor will be at Big Data Paris on 2 April 2014 and will present, at 5 PM, the potential benefits of mining Big Data to reduce inequalities and promote Foreign Direct Investment in less favoured territories using Diaspora Marketing. Meet us there!


Making sense of Big Data: mining Twitter names

Millions of geotagged tweets in various languages, discussing anything from ‘hey, I’m here’ to finance, geopolitics or marketing. How do you make sense of them?

We’ve used name recognition (applied onomastics) to filter information and produce unique maps of the e-Diasporas. Where are digitally connected Italians, Turks and Russians today? They may be migrants, tourists, business travellers, students, visiting scientists…

To jump directly to the interactive map, click here: http://cdb.io/1iSeWw2, or read more about our methodology.

Italian, Russian, Turkish Twitter

TIP: filter layers and zoom in/out.
Below, we filtered out the Turkish Twitter layer to visualize where Russian and Italian tourists go on holiday in Turkey:

Russians, Italians in Turkey

Italian America:

Italian America

Further reading:
