Building Multi-Source Databases for Comparative Analyses

From December 16 to 20, 2019, CONSIRT (Cross-national Studies: Interdisciplinary Research and Training program of The Ohio State University and the Polish Academy of Sciences) is organizing the international event Building Multi-Source Databases for Comparative Analyses. The event comprises two days of conference-style presentations on survey data harmonization in the social sciences, followed by a three-day workshop on ex-post survey data harmonization methodology.

Both the conference and the workshop will be held at the Institute of Philosophy and Sociology, Polish Academy of Sciences, Warsaw, Poland. They are organized jointly within the Survey Data Recycling (SDR) Project (NSF 1738502) and the Political Voice and Economic Inequality across Nations and Time (POLINQ) Project (NCN 2016/23/B/HS6/03916).

About the Conference

The Conference (December 16-17) aims to facilitate discussion of the methodology of survey data harmonization, and to support collaboration on a co-edited book that Christof Wolf (University of Mannheim and GESIS) and the PIs of the Survey Data Recycling (SDR) Project, Kazimierz M. Slomczynski, Irina Tomescu-Dubrow, and J. Craig Jenkins, are preparing. To garner insights from discipline-specific and interdisciplinary views on the challenges inherent to harmonization and how these challenges are met, the conference will bring together contributions from sociology, political science, demography, economics, and health and medicine.

About the Workshop

The Workshop (December 18-20) will feature the SDR database as a key empirical resource for discussing substantive and methodological considerations in building multi-source databases for comparative analyses. The SDR database covers more than four million respondents surveyed between 1966 and 2017 in ca. 140 countries. It contains individual-level measures of socio-demographics, political attitudes and behaviors, social capital, and well-being, constructed via ex-post harmonization of social survey data pooled from ca. 3,400 national surveys stemming from 23 major cross-national survey projects, including the World Values Survey, the European Social Survey, and the International Social Survey Programme.

The SDR database also contains metadata on source survey quality and the harmonization process, stored as control variables that are available for analyses. An initial version of the SDR database, covering the period 1966-2013, 1,721 national surveys from 22 cross-national projects, and 2.2 million respondents, is available from Harvard Dataverse or by contacting the SDR project.

Experiences within the Survey Data Recycling (SDR) and the Political Voice and Economic Inequality across Nations and Time (POLINQ) projects inform the Workshop. Using the SDR and other databases, POLINQ constructs a dataset of survey-based aggregate measures of political participation and representation featuring young and established democracies since the 1990s.

Day 1 of the Workshop will be devoted to discussing (a) survey data recycling (SDR) as a framework for reprocessing extant cross-national survey data and ex-post harmonization, (b) the structure of the SDR database, and (c) conceptual and practical issues of constructing datasets stemming from the SDR database, including for the POLINQ project. Discussions will be led by members of the SDR and POLINQ projects.

Day 2 will be devoted to missing data imputation. Stef van Buuren, professor of Statistical Analysis of Incomplete Data at the University of Utrecht and statistician at the Netherlands Organisation for Applied Scientific Research TNO in Leiden, will deliver lectures on missing data imputation for survey datasets with a multi-level structure, focusing on how to solve comparability problems through multiple imputation. Dr. Michał Kotnarowski from the Institute of Philosophy and Sociology, Polish Academy of Sciences, will lead the computer lab session.

Day 3 will be devoted to discussing the use of individual-level data from cross-national surveys to construct measures of characteristics of countries in given years (macro-level). Social scientists frequently aggregate survey data, yet they rarely discuss the extent to which country-year indicators constructed via aggregation are valid and reliable. The task is especially difficult when aggregation involves behavioral and attitudinal survey items that lack ‘external benchmarks’ against which to judge the summary statistics derived from survey data. Discussions will be led by members of the SDR and POLINQ projects.
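
As a rough illustration of the aggregation step discussed on Day 3, the following minimal sketch computes a country-year indicator from individual-level records. The records, the variable (participation in a demonstration), and the function name are hypothetical and are not taken from the SDR database; the sketch ignores survey weights and simply keeps the counts of valid and missing answers next to each share, since these counts bear on how reliable the aggregate is.

```python
from collections import defaultdict

# Hypothetical harmonized records (not SDR data):
# (country, year, took_part_in_demonstration), None = item nonresponse
records = [
    ("PL", 2010, 1),
    ("PL", 2010, 0),
    ("PL", 2010, 0),
    ("PL", 2010, None),
    ("DE", 2010, 1),
    ("DE", 2010, 1),
    ("DE", 2012, 0),
]


def aggregate_country_year(rows):
    """Aggregate individual answers into unweighted country-year shares.

    Returns {(country, year): {"n_valid", "n_missing", "share"}}, where
    "share" is None if a country-year has no valid answers.
    """
    cells = defaultdict(lambda: {"n_valid": 0, "n_missing": 0, "total": 0})
    for country, year, value in rows:
        cell = cells[(country, year)]
        if value is None:
            cell["n_missing"] += 1
        else:
            cell["n_valid"] += 1
            cell["total"] += value

    result = {}
    for key, cell in cells.items():
        share = cell["total"] / cell["n_valid"] if cell["n_valid"] else None
        result[key] = {
            "n_valid": cell["n_valid"],
            "n_missing": cell["n_missing"],
            "share": share,
        }
    return result


indicators = aggregate_country_year(records)
for (country, year), stats in sorted(indicators.items()):
    print(country, year, stats)
```

In this toy example the indicator for DE in 2012 rests on a single respondent, which shows why a country-year share should be read together with the number of valid answers behind it.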

Information about the conference and workshop, including the program, abstracts, and supplementary materials, can be found on the EVENTS page.