Replicating and improving quantitative war data
In: SAGE Research Methods. Cases
This case describes an effort to improve quantitative data on interstate wars. The effort arose from three concerns about the very widely used Correlates of War data set on interstate wars. First, the data set contains factual errors. Second, it applies its coding rules inconsistently. Third, its structure aggregates information in a manner that conceals important detail and in some cases leads to inappropriate empirical inferences. These concerns led my coauthors and me to create a new data set on interstate wars, the Interstate War Data. That data set attempted to replicate the Correlates of War data, in the sense of using essentially the same coding rules, albeit with more extensive historical research, more consistent application of the coding rules, and a slightly different structure that provides more information to users. The results of our efforts were quite surprising. In more than one third of Correlates of War interstate wars, we found an error (from flawed historical research or inconsistent application of coding rules) in the coding of at least one consequential variable: whether the conflict qualified as a war, who started the war, who participated in the war, and who won the war. We also found that the way Correlates of War structures some wars, namely by aggregating very complicated multilateral conflicts such as World War II, omits a tremendous amount of important information about war participation and outcomes. This project demonstrated the critical importance of investing time and energy in maximizing the quality of data.