In: Political Analysis: PA; the official journal of the Society for Political Methodology and the Political Methodology Section of the American Political Science Association, Volume 19, Issue V2, pp. 1-4
In their landmark study of a field experiment, Gerber and Green (2000) found that get-out-the-vote calls reduce turnout by five percentage points. In this article, I introduce statistical methods that can uncover discrepancies between experimental design and actual implementation. Applying this methodology shows that Gerber and Green's negative finding is caused by inadvertent deviations from their stated experimental protocol. The initial discovery led the authors to revise the original data and retract the numerical results in their article. Analysis of their revised data, however, reveals new systematic patterns of implementation errors. Indeed, treatment assignments in the revised data appear to be even less randomized than before the corrections. To adjust for these problems, I employ a more appropriate statistical method and demonstrate that telephone canvassing increases turnout by five percentage points. This article demonstrates how statistical methods can detect and correct implementation problems in field experiments.
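The core diagnostic here, checking whether treatment assignment behaves as if randomized, can be illustrated with a balance test. The sketch below is not the article's method; it is a generic permutation test on hypothetical data, in which a pre-treatment covariate (say, past turnout) should be unrelated to assignment under true randomization.

```python
import random

def permutation_balance_test(covariate, assigned, n_perm=2000, seed=0):
    """Permutation test of covariate balance: under true randomization,
    the observed treated-control mean difference in a pre-treatment
    covariate should be unexceptional among shuffled assignments."""
    rng = random.Random(seed)
    treat = [x for x, a in zip(covariate, assigned) if a == 1]
    ctrl = [x for x, a in zip(covariate, assigned) if a == 0]
    obs = sum(treat) / len(treat) - sum(ctrl) / len(ctrl)
    pooled = list(covariate)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        t, c = pooled[:len(treat)], pooled[len(treat):]
        diff = sum(t) / len(t) - sum(c) / len(c)
        if abs(diff) >= abs(obs) - 1e-12:  # count equally or more extreme draws
            extreme += 1
    return obs, extreme / n_perm

# Hypothetical data: past turnout (0/1) and treatment assignment.
covariate = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
assigned  = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
diff, pval = permutation_balance_test(covariate, assigned)
```

A small p-value would flag imbalance that randomization alone is unlikely to produce, i.e. a possible deviation from the stated protocol.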
Data Analysis for Social Science provides a friendly introduction to the statistical concepts and programming skills needed to conduct and evaluate social scientific studies. Using plain language and assuming no prior knowledge of statistics and coding, the book provides a step-by-step guide to analyzing real-world data with the statistical program R for the purpose of answering a wide range of substantive social science questions. It teaches not only how to perform the analyses but also how to interpret results and identify strengths and limitations.
The democratic peace—the idea that democracies rarely fight one another—has been called "the closest thing we have to an empirical law in the study of international relations." Yet, some contend that this relationship is spurious and suggest alternative explanations. Unfortunately, in the absence of randomized experiments, we can never rule out the possible existence of such confounding biases. Rather than commonly used regression-based approaches, we apply a nonparametric sensitivity analysis. We show that overturning the negative association between democracy and conflict would require a confounder that is forty-seven times more prevalent in democratic dyads than in other dyads. To put this number in context, the relationship between democracy and peace is at least five times as robust as that between smoking and lung cancer. To explain away the democratic peace, therefore, scholars would have to find far more powerful confounders than those already identified in the literature.
In: Political Analysis: PA; the official journal of the Society for Political Methodology and the Political Methodology Section of the American Political Science Association, Volume 29, Issue 3, pp. 405-415
The two-way linear fixed effects regression (2FE) has become a default method for estimating causal effects from panel data. Many applied researchers use the 2FE estimator to adjust for unobserved unit-specific and time-specific confounders at the same time. Unfortunately, we demonstrate that the ability of the 2FE model to simultaneously adjust for these two types of unobserved confounders critically relies upon the assumption of linear additive effects. Another common justification for the use of the 2FE estimator is based on its equivalence to the difference-in-differences estimator under the simplest setting with two groups and two time periods. We show that this equivalence does not hold under more general settings commonly encountered in applied research. Instead, we prove that the multi-period difference-in-differences estimator is equivalent to the weighted 2FE estimator with some observations having negative weights. These analytical results imply that, contrary to popular belief, the 2FE estimator does not represent a design-based, nonparametric estimation strategy for causal inference. Instead, its validity fundamentally rests on the modeling assumptions.
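The two-group, two-period equivalence mentioned above is easy to verify directly. Below is a minimal sketch with hypothetical numbers, not the article's analysis: the 2FE slope, computed by double demeaning, coincides with the difference-in-differences estimate in the 2×2 case.

```python
def twoway_fe(y, d):
    """Two-way fixed effects slope via double demeaning:
    beta = sum(d~ * y~) / sum(d~^2), where x~ subtracts the unit and
    time means and adds back the grand mean."""
    n, t = len(y), len(y[0])
    def demean(m):
        ui = [sum(row) / t for row in m]                      # unit means
        tj = [sum(m[i][j] for i in range(n)) / n for j in range(t)]  # time means
        gm = sum(sum(row) for row in m) / (n * t)             # grand mean
        return [[m[i][j] - ui[i] - tj[j] + gm for j in range(t)] for i in range(n)]
    yt, dt = demean(y), demean(d)
    num = sum(yt[i][j] * dt[i][j] for i in range(n) for j in range(t))
    den = sum(dt[i][j] ** 2 for i in range(n) for j in range(t))
    return num / den

# Hypothetical 2x2 panel: unit 1 is treated in period 1 only.
y = [[1.0, 2.0], [2.0, 5.0]]
d = [[0.0, 0.0], [0.0, 1.0]]
did = (y[1][1] - y[1][0]) - (y[0][1] - y[0][0])  # difference-in-differences
fe = twoway_fe(y, d)
```

The article's point is that this exact equivalence breaks down with more units, periods, and staggered treatment, where 2FE becomes a weighted estimator with some negative weights.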
Although it is widely known that the self-reported turnout rates obtained from public opinion surveys tend to substantially overestimate actual turnout rates, scholars sharply disagree on what causes this bias. Some blame overreporting due to social desirability, whereas others attribute it to nonresponse bias and the accuracy of turnout validation. While we can validate self-reported turnout by directly linking surveys with administrative records, most existing studies rely on proprietary merging algorithms with little scientific transparency and report conflicting results. To shed light on this debate, we apply a probabilistic record linkage model, implemented via the open-source software package fastLink, to merge two major election studies—the American National Election Studies and the Cooperative Congressional Election Survey—with a national voter file of over 180 million records. For both studies, fastLink successfully produces validated turnout rates close to the actual turnout rates, leading to public-use validated turnout data for the two studies. Using these merged data sets, we find that the bias of self-reported turnout originates primarily from overreporting rather than nonresponse. Our findings suggest that those who are educated and interested in politics are more likely to overreport turnout. Finally, we show that fastLink performs as well as a proprietary algorithm.
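fastLink's actual model is estimated by EM, but the underlying probabilistic-record-linkage idea, in the Fellegi-Sunter tradition, can be sketched simply: each candidate record pair receives a log-likelihood-ratio weight built from per-field agreement probabilities. The m and u probabilities below are hypothetical, not estimates from the linked election studies.

```python
import math

def match_weight(agree, m_probs, u_probs):
    """Fellegi-Sunter style weight for one record pair: sum over fields
    of log(m/u) if the field agrees, else log((1-m)/(1-u)), where
    m = P(agree | true match) and u = P(agree | non-match)."""
    w = 0.0
    for a, m, u in zip(agree, m_probs, u_probs):
        w += math.log(m / u) if a else math.log((1 - m) / (1 - u))
    return w

# Hypothetical m/u probabilities for (first name, last name, birth year).
m = [0.95, 0.97, 0.90]
u = [0.05, 0.01, 0.10]
w_full = match_weight([1, 1, 1], m, u)   # all three fields agree
w_none = match_weight([0, 0, 0], m, u)   # no field agrees
```

Pairs with high weights are declared matches; a survey respondent who claims to have voted but matches no voter-file record with a turnout flag is a candidate overreporter.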
Many researchers use unit fixed effects regression models as their default methods for causal inference with longitudinal data. We show that the ability of these models to adjust for unobserved time‐invariant confounders comes at the expense of dynamic causal relationships, which are permitted under an alternative selection‐on‐observables approach. Using the nonparametric directed acyclic graph, we highlight two key causal identification assumptions of unit fixed effects models: Past treatments do not directly influence current outcome, and past outcomes do not affect current treatment. Furthermore, we introduce a new nonparametric matching framework that elucidates how various unit fixed effects models implicitly compare treated and control observations to draw causal inference. By establishing the equivalence between matching and weighted unit fixed effects estimators, this framework enables a diverse set of identification strategies to adjust for unobservables in the absence of dynamic causal relationships between treatment and outcome variables. We illustrate the proposed methodology through its application to the estimation of GATT membership effects on dyadic trade volume.
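The within-unit comparison that unit fixed effects models implicitly perform can be made concrete. The sketch below uses a hypothetical panel, not the GATT application: the one-way fixed effects slope comes entirely from within-unit variation, which is exactly why time-invariant confounders drop out.

```python
def unit_fe(y, d):
    """One-way (unit) fixed effects slope via within-unit demeaning.
    Only within-unit variation in treatment identifies the effect;
    anything constant within a unit is differenced away."""
    num = den = 0.0
    for yi, di in zip(y, d):
        my, md = sum(yi) / len(yi), sum(di) / len(di)
        num += sum((a - my) * (b - md) for a, b in zip(yi, di))
        den += sum((b - md) ** 2 for b in di)
    return num / den

# Hypothetical panel: two units, three periods; unit levels differ
# but the within-unit treatment effect is 2 in both units.
y = [[1.0, 3.0, 1.0], [5.0, 5.0, 7.0]]
d = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
beta = unit_fe(y, d)
```

The article's matching framework makes explicit which of these within-unit comparisons the estimator is averaging, and with what weights.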
In: Political Analysis: PA; the official journal of the Society for Political Methodology and the Political Methodology Section of the American Political Science Association, Volume 24, Issue 2, pp. 263-272
In both political behavior research and voting rights litigation, turnout and vote choice for different racial groups are often inferred using aggregate election results and racial composition. Over the past several decades, many statistical methods have been proposed to address this ecological inference problem. We propose an alternative method to reduce aggregation bias by predicting individual-level ethnicity from voter registration records. Building on the existing methodological literature, we use Bayes's rule to combine the Census Bureau's Surname List with various information from geocoded voter registration records. We evaluate the performance of the proposed methodology using approximately nine million voter registration records from Florida, where self-reported ethnicity is available. We find that it is possible to reduce the false positive rate among Black and Latino voters to 6% and 3%, respectively, while maintaining the true positive rate above 80%. Moreover, we use our predictions to estimate turnout by race and find that our estimates yield substantially less bias and lower root mean squared error than standard ecological inference estimates. We provide open-source software to implement the proposed methodology.
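The Bayes's-rule combination of surname and geography described above is commonly known as Bayesian Improved Surname Geocoding (BISG). A minimal sketch, using hypothetical probabilities rather than actual Census Surname List or Florida values, and assuming surname and location are independent given race:

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """BISG sketch: combine the surname-based race distribution with
    the geographic distribution via Bayes's rule, assuming surname and
    residence location are conditionally independent given race."""
    unnorm = {r: p_race_given_surname[r] * p_geo_given_race[r]
              for r in p_race_given_surname}
    z = sum(unnorm.values())
    return {r: v / z for r, v in unnorm.items()}

# Hypothetical inputs: the surname list says this name is mostly held
# by Latino individuals, while the voter's block group is heavily white.
p_surname = {"white": 0.05, "black": 0.03, "latino": 0.90, "other": 0.02}
p_geo = {"white": 0.80, "black": 0.05, "latino": 0.10, "other": 0.05}
post = bisg_posterior(p_surname, p_geo)
```

Here the surname evidence dominates, so the posterior still favors "latino", but less strongly than the surname list alone; thresholding such posteriors is what drives the false/true positive rates reported above.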
In: Political Analysis: PA; the official journal of the Society for Political Methodology and the Political Methodology Section of the American Political Science Association, Volume 21, Issue 2, pp. 141-171
Social scientists are often interested in testing multiple causal mechanisms through which a treatment affects outcomes. A predominant approach has been to use linear structural equation models and examine the statistical significance of the corresponding path coefficients. However, this approach implicitly assumes that the multiple mechanisms are causally independent of one another. In this article, we consider a set of alternative assumptions that are sufficient to identify the average causal mediation effects when multiple, causally related mediators exist. We develop a new sensitivity analysis for examining the robustness of empirical findings to the potential violation of a key identification assumption. We apply the proposed methods to three political psychology experiments, which examine alternative causal pathways between media framing and public opinion. Our analysis reveals that the validity of original conclusions is highly reliant on the assumed independence of alternative causal mechanisms, highlighting the importance of the proposed sensitivity analysis. All of the proposed methods can be implemented via an open source R package, mediation.
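The "predominant approach" this article critiques, reading the average causal mediation effect (ACME) off linear SEM path coefficients, can be sketched in a few lines. The data below are hypothetical and noiseless so the product-of-coefficients arithmetic is exact; this is the baseline the article generalizes, not the proposed estimator itself.

```python
def path_acme(t, m, y):
    """Product-of-coefficients ACME with one mediator: slope a from
    M ~ T, slope b on M from Y ~ T + M; ACME = a * b. With multiple
    causally related mediators this logic breaks down, which is the
    article's point."""
    n = len(t)
    mt, mm, my = sum(t) / n, sum(m) / n, sum(y) / n
    td = [x - mt for x in t]; md = [x - mm for x in m]; yd = [x - my for x in y]
    s11 = sum(x * x for x in td); s12 = sum(a * b for a, b in zip(td, md))
    s22 = sum(x * x for x in md)
    s1y = sum(a * b for a, b in zip(td, yd)); s2y = sum(a * b for a, b in zip(md, yd))
    a = s12 / s11                               # M ~ T slope
    det = s11 * s22 - s12 * s12                 # Y ~ T + M by normal equations
    direct = (s1y * s22 - s12 * s2y) / det      # coefficient on T (direct effect)
    b = (s11 * s2y - s12 * s1y) / det           # coefficient on M
    return a * b, direct

# Hypothetical noiseless data: Y = 2*M + T, and M rises by 2 under treatment.
t = [0, 0, 0, 0, 1, 1, 1, 1]
m = [0, 1, 0, 1, 2, 3, 2, 3]
y = [0, 2, 0, 2, 5, 7, 5, 7]
acme, direct = path_acme(t, m, y)
```

Here the mediated effect (4) plus the direct effect (1) recovers the total treated-control difference of 5; with causally dependent mediators, no such clean decomposition holds without the article's additional assumptions.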
Empirical testing of competing theories lies at the heart of social science research. We demonstrate that a well-known class of statistical models, called finite mixture models, provides an effective way of testing rival theories. In the proposed framework, each observation is assumed to be generated either from a statistical model implied by one of the competing theories or, more generally, from a weighted combination of multiple statistical models under consideration. Researchers can then estimate the probability that a specific observation is consistent with each rival theory. By modeling this probability with covariates, one can also explore the conditions under which a particular theory applies. We discuss a principled way to identify a list of observations that are statistically significantly consistent with each theory and propose measures of the overall performance of each competing theory. We illustrate the relative advantages of our method over existing methods through empirical and simulation studies.
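The key quantity here, the probability that a given observation is consistent with each rival theory, is the mixture model's posterior "responsibility". A minimal sketch with two hypothetical normal models and known parameters (the article estimates these quantities rather than fixing them):

```python
import math

def normal_pdf(x, mu, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def theory_posterior(x, pi_a, mu_a, sd_a, mu_b, sd_b):
    """Posterior probability that observation x was generated by
    theory A's model rather than theory B's, given mixing weight pi_a
    and each theory's implied density (Bayes's rule)."""
    fa = pi_a * normal_pdf(x, mu_a, sd_a)
    fb = (1 - pi_a) * normal_pdf(x, mu_b, sd_b)
    return fa / (fa + fb)

# Hypothetical rival theories implying different outcome means.
p = theory_posterior(x=1.8, pi_a=0.5, mu_a=2.0, sd_a=1.0, mu_b=-2.0, sd_b=1.0)
```

An observation near theory A's predicted mean gets a posterior near one; modeling the mixing weight with covariates then tells you where each theory applies.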
In: Political Analysis: PA; the official journal of the Society for Political Methodology and the Political Methodology Section of the American Political Science Association, Volume 20, Issue 1, pp. 47-77
The validity of empirical research often relies upon the accuracy of self-reported behavior and beliefs. Yet eliciting truthful answers in surveys is challenging, especially when studying sensitive issues such as racial prejudice, corruption, and support for militant groups. List experiments have attracted much attention recently as a potential solution to this measurement problem. Many researchers, however, have used a simple difference-in-means estimator, which prevents the efficient examination of multivariate relationships between respondents' characteristics and their responses to sensitive items. Moreover, no systematic means exists to investigate the role of underlying assumptions. We fill these gaps by developing a set of new statistical methods for list experiments. We identify the commonly invoked assumptions, propose new multivariate regression estimators, and develop methods to detect and adjust for potential violations of key assumptions. For empirical illustration, we analyze list experiments concerning racial prejudice. Open-source software is made available to implement the proposed methodology.
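The simple difference-in-means estimator that the article improves upon is worth seeing concretely: because only the treated group's list contains the sensitive item, the gap in mean item counts estimates the prevalence of the sensitive attitude. The counts below are hypothetical.

```python
def list_diff_in_means(control_counts, treated_counts):
    """Difference-in-means estimator for a list experiment: treated
    respondents see the control items plus one sensitive item, so the
    difference in mean reported counts estimates the proportion
    holding the sensitive attitude."""
    mc = sum(control_counts) / len(control_counts)
    mt = sum(treated_counts) / len(treated_counts)
    return mt - mc

# Hypothetical item counts (control list has 3 items, treated list 4).
control = [1, 2, 2, 3, 1, 2, 2, 3]
treated = [2, 3, 2, 3, 2, 3, 3, 4]
est = list_diff_in_means(control, treated)
```

This estimator uses no covariates, which is precisely the limitation the article's multivariate regression estimators address.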