Big Data and Political Behavior

by J. Craig Jenkins, Kazimierz M. Slomczynski, and Joshua Kjerulf Dubrow

The wealth of quantitative data—including data from cross-national survey projects, official governmental and nongovernmental organization (NGO) statistics, newspapers and electronic newswires, and a variety of Internet-based websites, blogs, and social media sites—has generated a large and growing empirically based literature on political behavior.

Yet, social scientists have only begun to use this wealth to its fullest capacity, as advances in computing infrastructures, methods, and Internet communication technologies create new opportunities for developing and integrating diverse types of information into social science data. Social science faces the challenge of “big data,” a new era of the quantification and analysis of political behavior on an unprecedented scope and scale.

Will it rise to this challenge?

We guest edited a special issue of the International Journal of Sociology on big data and political behavior. In our introduction, “Political Behavior and Big Data,” we address recent uses of “big data,” its multiple meanings, and the potential that this may have in building a stronger understanding of political behavior.

“Big data” has become a buzzword with no common definition. In general, big data refers to any data set that has an unusually large number of cases and is composed of a diversity of sources. The number of variables can be very small, there is no limit on time or space coverage, and there is no specification of how diverse “diversity of sources” should be.

In academia, big data research can be found almost everywhere, but the bulk of it is located in a handful of disciplines and published within the past few years.

FIGURE 1 “Big Data” as a Web of Science topic in all academic products, 2008–2014. Note: Academic products include article, book, book review, book chapter, editorial material, proceedings paper, review, and letters.

As research on big data in sociology and political science is so new—the vast majority of publications appeared in the past year and a half—there is no discernible trend in how social science talks about, or uses, big data. Without much empirical work to draw upon, sociology and political science tend to speculate on what big data means for their academic disciplines and for society in general. Some social scientists who study political behavior see an advantage in more and bigger data. Others are concerned about ethics in the age of big data.

The biggest source of big data for studying political behavior is the Internet and, at this early date, two of its most prominent sources are search engines (e.g., Google) and social media. Articles on the substantive promise and methodological perils of social media point to problems of sample bias (Twitter users are not evenly distributed in the population, and thus are not fully representative) and to the inherent ambiguities of the data. These ambiguities stem primarily from Silicon Valley companies’ refusal to provide academics with the information necessary to understand the who, what, when, and where of their data. An issue that has received little attention is how to integrate existing social science data, which have a known range of reliability, with big data sources such as social media, which have an unknown range of reliability. Social media and the Internet have become main ways of communicating and acting politically; their integration with existing social science data has become inevitable.

Our purpose for the issue of the International Journal of Sociology is to meaningfully join the upsurge in popularity of big data in the social sciences by calling for more methodological research. An upsurge in popularity is not necessarily a revolution. As the authors reveal, to make good on the substantive promise of big data we would first need to solve big data’s inherent methodological problems. At this early date, we are only now beginning to identify and solve these problems. We hope future researchers will heed the warnings, add to the growing body of methodological studies, and chart their research course with care.