SDR Data Structure and Documentation
The SDR2 database comprises three datasets produced in the Survey Data Recycling project.
These files are:
- The MASTER file, which stores harmonized information for a total of 4,402,489 respondents. It includes:
  a. 50 harmonized target variables (T-prefixed variables);
  b. 74 harmonization control variables (C-prefixed variables), which capture source item properties and harmonization decisions. Harmonization controls generally do not vary across respondents of a given national survey; exceptions are noted in the General Variable Report (GVR) of the relevant target variable;
  c. 7 source data quality controls (Q-prefixed variables), measured at the respondent level, which flag, for example, missing case IDs, non-unique cases, and cases with suspicious values of source age or household size.
- The PLUG-SURVEY file is an auxiliary dataset containing controls for source data quality and a set of technical variables needed for merging this file with the MASTER file. Quality control variables at the national survey level pertain to two main aspects: (1) survey methodology and survey quality as reflected in documentation, and (2) general data quality (e.g., presence of non-unique records in a national survey, availability and correctness of survey weights). The file also includes quality control variables measured at the survey wave level, which capture processing errors, that is, discrepancies between data and documentation for selected variables.
- The PLUG-COUNTRY file is a dictionary of countries and territories used in the MASTER file. It contains basic geographical information: standardized alphabetic and numeric ISO-3166 country codes, country names, and codes and names for micro- and macro-regions of the world.
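The prefix convention described above can be used to sort MASTER variables by role. The following is a minimal sketch in plain Python, assuming the column names are already available as a list. T_ID and T_DATAFILE come from this documentation; T_AGE, C_AGE_SOURCE, and Q_NONUNIQUE are hypothetical names used only for illustration.

```python
from collections import defaultdict

# Column names for illustration. T_ID and T_DATAFILE are documented key
# variables; the remaining names are hypothetical placeholders.
columns = ["T_ID", "T_DATAFILE", "T_AGE", "C_AGE_SOURCE", "Q_NONUNIQUE"]


def group_by_prefix(names):
    """Group variable names by the letter before the first underscore."""
    groups = defaultdict(list)
    for name in names:
        prefix = name.split("_", 1)[0]
        groups[prefix].append(name)
    return dict(groups)


groups = group_by_prefix(columns)
print(groups)
```

Note that the T prefix is also carried by key variables such as T_ID and T_DATAFILE, so the T group is not limited to the harmonized target variables.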
Structure of the SDR2 Database: description of files and their contents
Name of file | Content description | Key variables / case identifiers | Documentation files |
MASTER | Individual (respondent) level data, including target, harmonization control, and quality control variables | T_DATAFILE, T_ID, T_PROJECT_NAME, T_PROJECT_WAVE, T_COUNTRY_L1U, T_COUNTRY_L2U | SDR2_MASTER_File_Overview.pdf; SDR2_Missing_Codes_Schema.pdf; SDR2_Source_Data_Files.xlsx; SDR2_Cotton_File.xlsm; SDR2_Metadata.xlsx; SDR2_MASTER_Syntax_File.sql; General Target Variable reports (GVR, pdf files), Detailed Target Variable reports (DVR, xlsx files), and Cross-Walk Table reports (CWT, xlsx files) for the SDR2 target variables (see Appendix for the detailed file list) |
PLUG-COUNTRY | Country-level data, including ISO-3166 country codes and names of geographical regions | T_COUNTRY | SDR2_PLUG_COUNTRY_File_Overview.pdf; SDR2_Source_Data_Files.xlsx |
PLUG-SURVEY | Characteristics of national surveys, including sampling method and response rate, indicators of survey quality, availability and correctness of survey weights, and discrepancies between data and documentation | T_PROJECT_NAME, T_PROJECT_WAVE, T_COUNTRY_L1U, T_COUNTRY_L2U | SDR2_PLUG_SURVEY_File_Overview.pdf; SDR2_Source_Data_Files.xlsx |
The table above provides basic information about the SDR2 files, including their names as referenced throughout the SDR2 documentation, their key variables, and the names of the corresponding documentation files. The key variables can be used either to link individual respondents in the MASTER file to the source data files (T_DATAFILE and T_ID) or to merge the MASTER file with the PLUG files. The link between MASTER's T_DATAFILE and a source data file can be established through the list of source data files in SDR2_Source_Data_Files.xlsx. T_ID is an SDR variable: it does not appear in the source data files, but it refers to the position of a case (respondent) in the corresponding source file and therefore uniquely identifies that case within the source data file.
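As an illustration of merging on the key variables, the sketch below attaches PLUG-SURVEY records to MASTER records using the four key variables the two files share. It is a minimal example in plain Python under stated assumptions: each file has been loaded as a list of dictionaries, each key combination occurs at most once in PLUG-SURVEY, and all values as well as the Q_EXAMPLE column are hypothetical. In practice a data-frame library or statistical package would likely be used instead.

```python
# Key variables shared by MASTER and PLUG-SURVEY, as listed in the table.
SURVEY_KEYS = ("T_PROJECT_NAME", "T_PROJECT_WAVE", "T_COUNTRY_L1U", "T_COUNTRY_L2U")


def survey_key(row):
    """Build the composite key identifying a national survey."""
    return tuple(row[k] for k in SURVEY_KEYS)


def merge_master_with_survey(master_rows, survey_rows):
    """Left-join survey-level records onto respondent-level records."""
    lookup = {}
    for s in survey_rows:
        key = survey_key(s)
        if key in lookup:
            # Assumption: one PLUG-SURVEY record per key combination.
            raise ValueError(f"duplicate survey key: {key}")
        lookup[key] = s
    merged = []
    for m in master_rows:
        extra = lookup.get(survey_key(m), {})
        # Respondent-level values take precedence if names overlap.
        merged.append({**extra, **m})
    return merged


# Hypothetical toy data; values and Q_EXAMPLE are invented for illustration.
master = [
    {"T_ID": 1, "T_PROJECT_NAME": "PROJ_A", "T_PROJECT_WAVE": 1,
     "T_COUNTRY_L1U": "PL", "T_COUNTRY_L2U": "PL"},
    {"T_ID": 2, "T_PROJECT_NAME": "PROJ_A", "T_PROJECT_WAVE": 1,
     "T_COUNTRY_L1U": "PL", "T_COUNTRY_L2U": "PL"},
    {"T_ID": 1, "T_PROJECT_NAME": "PROJ_B", "T_PROJECT_WAVE": 2,
     "T_COUNTRY_L1U": "DE", "T_COUNTRY_L2U": "DE"},
]
survey = [
    {"T_PROJECT_NAME": "PROJ_A", "T_PROJECT_WAVE": 1,
     "T_COUNTRY_L1U": "PL", "T_COUNTRY_L2U": "PL", "Q_EXAMPLE": 1},
]

merged = merge_master_with_survey(master, survey)
```

Respondents without a matching survey record are kept unchanged, so the merged list always has as many rows as the MASTER input.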
We thoroughly documented each of the three SDR2 datasets, as well as each of the substantive target variables, following established standards in the literature. All documentation can be accessed on the SDR2 page of the Harvard Dataverse.