Harmonizing Corruption Items in Cross-national Surveys

by Ilona Wysmułek, Graduate School for Social Research, Polish Academy of Sciences

In corruption research, surveys are among the major sources of our knowledge about the subject (Heath, Richards and de Graaf 2016; Karalashvili, Kraay and Murrell 2015).  However, there are several methodological challenges to studying cross-national trends in corruption with public opinion data. Corruption, given its secretive nature, is a phenomenon that is hard to capture in the interview situation. Some respondents are reluctant to answer sensitive questions and some may understand the concept differently than intended by researchers (Azfar and Murrell 2009; Bertrand and Mullainathan 2001). Moreover, international survey projects dealing with corruption continue to face challenges of unequal country representation. Estimation of rare event determinants also remains problematic, given that reported corruption instances are, for most modern democracies, highly infrequent.

To overcome some of these methodological problems, I apply ex-post harmonization of cross-national survey data in corruption research. In my dissertation project, I study corruption perception and individual corruption experience of giving informal payments (as a bribe or a gift) in public schools in Europe. I use cross-national survey data on corruption in public schools in Europe combined with country-level indicators, for example from the World Bank Education Statistics and OECD’s Education at a Glance. I follow the Survey Data Recycling (SDR) framework developed by the research team of Kazimierz M. Slomczynski, which provides a blueprint for ex-post survey data harmonization and for integrating surveys and other data sources (please see corruption project for more detailed information).

In this article, I present an overview of existing survey data suitable for research on corruption, and their documentation. The overview of available data is important in the process of conducting research, similar to summarizing relevant literature; however, scholars rarely discuss it explicitly. This information is essential for creating a ‘common file’ with source variables of interest – itself a key step in ex-post harmonization. The growing availability of survey data during the last 20 years offers rich topic coverage and multiple research opportunities, but also demands knowing where data sources are located, and what issues on corruption they cover. I intend for this article to help others find these sources.

By and large, the leading role in providing public opinion data for corruption analysis is played by two international organizations: Transparency International, with the Global Corruption Barometer survey project based on an adult population sample, and the World Bank, with the World Bank Enterprise Survey based on a firm-level sample (Holmes 2015). Additionally, a number of high-quality international public opinion surveys cover the topic of corruption, along with other items on government perceptions, democratic values, and institutional experiences; however, research on corruption rarely uses them.

In reviewing questionnaires and codebooks of international public opinion surveys in search of items on corruption, I started with the pool of 22 international projects in the Data Harmonization Project. Based on a review of the literature and consultations with experts, I also went through the collections of data archiving institutions, such as the Inter-university Consortium for Political and Social Research (ICPSR), GESIS Data Archive for the Social Sciences, and ROPER Public Opinion Research Archive. From the pool of possible data sources, I selected the 19 listed in Table 1, based on the criteria listed below.

Selection Criteria

I pooled the datasets and their documentation (master codebooks and questionnaires for project waves) for all relevant survey project waves meeting the following criteria:

–    Surveys are cross-national and feature European countries;

–    Their samples are intended to be representative of the adult population of a given country;

–    They are academically driven and available free of charge for non-commercial use (in public domain or upon request);

–    Their documentation is written in English;

–    They contain at least one item dealing with corruption.

Using the Cygwin command-line environment for automatic search, I checked the input files (codebooks, questionnaires and SPSS dictionaries) for lines matching the following key words and their grammatical variations: ‘corrupt’, ‘bribe’, ‘gift’, ‘tip’, ‘favor’ (‘favour’), ‘compensation’, ‘reward’, ‘payment’, ‘present’, ‘tie’, ‘connection’ and ‘network’. When the search detected a matching key word, I explored the neighboring questions and response categories to establish the contextual meaning of the key word and to document the availability of filtering questions (e.g. contact with an institution) and follow-up questions (e.g. amount of bribe). The key words reflect a broad definition of corruption as ‘the misuse of public position for private gains’ (see Heath, Richards and de Graaf 2016 for an overview of corruption definitions) and capture items that ask about corruption in general and about its specific types, such as bribe-giving or nepotism.
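
The key word search described above can be sketched in a few lines. The article's search ran in Cygwin and its exact commands are not given, so the Python version below is only an illustration. The function name `find_keyword_lines`, the stem list, the `context` parameter and the sample questionnaire text are all assumptions. Matching any word that begins with a stem is a crude stand-in for grammatical variations: it will also catch unrelated words such as ‘presentation’, which is why every hit still needs manual review of its context.

```python
import re

# Key word stems taken from the article; treating them as word-initial
# stems is an assumption made for this sketch.
STEMS = [
    "corrupt", "bribe", "gift", "tip", "favor", "favour", "compensation",
    "reward", "payment", "present", "tie", "connection", "network",
]

# Match any word that starts with one of the stems, ignoring case.
PATTERN = re.compile(r"\b(?:" + "|".join(STEMS) + r")\w*", re.IGNORECASE)


def find_keyword_lines(text, context=2):
    """Return one record per line that contains a key word.

    Each record holds the 1-based line number, the matched words, and the
    neighboring lines, so that filter and follow-up questions around the
    hit can be reviewed by hand.
    """
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        found = sorted({m.group(0).lower() for m in PATTERN.finditer(line)})
        if found:
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append({
                "line_no": i + 1,
                "keywords": found,
                "context": lines[start:end],
            })
    return hits


if __name__ == "__main__":
    # Hypothetical questionnaire fragment, not taken from any real survey.
    sample = "\n".join([
        "Q10. Did you have contact with a public school in the last 12 months?",
        "Q11. Did you or anyone in your household pay a bribe to school staff?",
        "Q12. How much was the bribe, in local currency?",
        "Q13. How satisfied are you with your life?",
    ])
    for hit in find_keyword_lines(sample, context=1):
        print(hit["line_no"], hit["keywords"])
```

Returning the surrounding lines together with each hit mirrors the manual step in the text: the filter question (contact with an institution) and the follow-up question (amount of bribe) usually sit directly next to the matched item.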

Characteristics of Selected Datasets

Table 1 presents basic information on survey projects that met my selection criteria. The table shows the full name of the project, its abbreviation, the number of waves in which the corruption item appeared, year coverage, and availability of documentation sources for survey projects.  It also shows the total number of items on corruption available in the survey project.

Table 1. International Surveys featuring Corruption Items: Basic Characteristics and Documentation

Survey name (abbreviation) | Number of waves | Years* | Master documentation (# questionnaires, # SPSS dictionaries) | Number of corruption items
Special surveys
Eurobarometer Corruption Themed 64.3, 68.2, 72.2, 76.1, 79.1 (EB_corr) | 5 | 2005-2013 | 5, 5 | 283
Global Corruption Barometer (GCB) | 8 | 2003-2013 | 8, 1 | 349
International Crime Victims Survey (ICVS)** | 4 | 1992-2005 | 4, 1 | 108
Life in Transition Survey (LITS) | 2 | 2006-2010 | 2, 2 | 43
Large general surveys
European Social Survey (ESS)** | 2 | 2004-2010 | 2, 2 | 5
European Values Study (EVS)** | 3 | 1990-2008 | 3, 1 | 4
International Social Survey Programme (ISSP) | 3 | 2004-2009 | 3, 3 | 7
World Values Survey (WVS)** | 4 | 1989-2005 | 4, 1 | 5
Smaller surveys: general
Asia Europe Survey (ASES) | 1 | 2000 | 1, 1 | 3
Comparative Study of Electoral Systems (CSES) | 1 | 2001 | 1, 1 | 1
European Quality of Government Survey (QoG) | 2 | 2010-2013 | 2, 2 | 20
General Eurobarometer (EB) | 7 | 1997-2012 | 7, 7 | 12
International Social Justice Project (ISJP)** | 2 | 1991-1996 | 2, 1 | 4
Pew Global Attitudes Project (PEW) | 4 | 2002-2012 | 4, 4 | 9
Smaller surveys: regional
Candidate Countries Eurobarometer (CCEB) | 2 | 2003 | 2, 2 | 5
Caucasus Barometer (CB) | 4 | 2009-2012 | 4, 4 | 10
Consolidation of Democracy in Central and Eastern Europe (CDCEE)** | 2 | 1990-1998 | 1, 1 | 11
New Baltic Barometer (NBB)** | 6 | 1993-2004 | 1, 1 | 14
Values and Political Change in Postcommunist Europe (VPCPCE)** | 1 | 1993 | 5, 5 | 2
TOTAL | 63 | 1989-2013 | 35, 26, 45 | 895

* Note that in situations when survey projects do not specify the year of survey wave but do give year brackets (for example WVS 2005-2007), I recorded the year when the questionnaire was first launched (WVS 2005).
** The number of documentation sources differs from the number of data waves in two cases: (1) when a merged dataset and documentation covering all survey waves were used (see CDCEE and NBB), and (2) when the datasets and their documentation were available at the country level only, not at the project wave level (see VPCPCE). In all cases, if both a master codebook and a master questionnaire were available, priority was given to the master questionnaire.

I divided surveys into three categories:

  1. Survey projects with a block of corruption items (“special surveys”): all selected survey projects and their waves that contain a block of items on corruption (more than ten items per survey wave);
  2. Large general survey projects with some corruption items: survey projects that are large in terms of country and year coverage, with fewer than ten items on corruption;
  3. Smaller survey projects with some corruption items: smaller survey projects covering a minimum of three countries, with fewer than ten items on corruption per survey wave, subdivided into other general survey projects and other regional survey projects.

For each item on corruption, I gathered information on: survey name, survey wave, year of survey wave, name of the variable, question wording, response categories, comments (including mainly information on filtering items and waves repeating the item).
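
The per-item information listed above can be pictured as one row per corruption item in the ‘common file’. The sketch below is an assumption about how such a row might be laid out, not the structure actually used in the project: the class name `CorruptionItem`, the field names and the CSV format are illustrative, and the example values are placeholders rather than real survey content. The helper `launch_year` applies the rule from the note to Table 1, recording the first year when a wave is given as a year bracket.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CorruptionItem:
    """One corruption item as documented for the common file (illustrative)."""
    survey: str       # survey abbreviation, e.g. "WVS"
    wave: str         # survey wave or edition label
    year: int         # year the questionnaire was first launched
    variable: str     # variable name in the source dataset
    question: str     # question wording
    responses: str    # response categories
    comments: str = ""  # filtering items, waves repeating the item


def launch_year(year_label):
    """Return the first year of a label such as '2005-2007' or '2005'."""
    return int(year_label.split("-")[0].strip())


def to_csv(items):
    """Serialize items to CSV text with one header row and one row per item."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(CorruptionItem)],
        lineterminator="\n",
    )
    writer.writeheader()
    for item in items:
        writer.writerow(asdict(item))
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values only; not an actual item from any survey.
    example = CorruptionItem(
        survey="WVS",
        wave="example wave",
        year=launch_year("2005-2007"),
        variable="example_var",
        question="(placeholder question wording)",
        responses="(placeholder response categories)",
        comments="(placeholder: filter question precedes this item)",
    )
    print(to_csv([example]))
```

Keeping question wording and response categories next to the variable name makes it possible to compare items across projects before deciding which of them can be harmonized into a common target variable.
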
A total of 895 items on corruption are distributed in the 19 international survey projects and their 63 waves (or editions) that met the selection criteria. Generally, the number of corruption items is not equally distributed across different surveys and survey waves. Some selected survey waves contained the block of corruption items, whereas other waves of the same survey project may contain one or two items.

My study revealed that several public opinion survey projects contain a large block of corruption items, and that a large number of survey projects include several corruption-related questions per wave. Despite all the information available, attempts to harmonize these data and to verify our knowledge using multiple data sources are rare (see Table 6). Interest in the subject has grown rapidly, especially since 2003, which opens new possibilities for corruption research (see Figure 1).

Figure 1. Number of survey project waves and the average number of corruption variables per survey project wave by year



References

Azfar, O. & P. Murrell (2009) Identifying Reticent Respondents: Assessing the Quality of Survey Data on Corruption and Values. Economic Development & Cultural Change 57(2): 387-411.

Bertrand, M. & S. Mullainathan (2001) Do People Mean What They Say? Implications for Subjective Survey Data. American Economic Review 91: 67-72.

Heath, A. F., L. Richards & N. D. de Graaf (2016) Explaining Corruption in the Developed World: The Potential of Sociological Approaches. Annual Review of Sociology 42(1): 51-79.

Karalashvili, N., A. Kraay & P. Murrell (2015) Doing the Survey Two-Step: The Effects of Reticence on Estimates of Corruption in Two-Stage Survey Questions. World Bank Policy Research Working Paper 7276.

Seligson, M. A. (2006) The Measurement and Impact of Corruption Victimization: Survey Evidence from Latin America. World Development 34(2): 381-404.

van Kesteren, J. N. (2007) Integrated Database from the Crime Victim Surveys (ICVS) 1989-2005. Tilburg: INTERVICT.

Ilona Wysmułek is a research assistant at the Research Group on Comparative Analysis of Social Inequality and a PhD candidate in sociology at the Institute of Philosophy and Sociology, Polish Academy of Sciences. Her dissertation research is supported by the Mobility Grant of the Ministry of Science and Higher Education of Poland funding a half-year visiting fellowship (winter-spring 2016) at the Mershon Center for International Security Studies, The Ohio State University and the University of Chicago (National Opinion Research Center) (1292/MOB/IV/2015/0).

This article is adapted from the Harmonization newsletter.