
Causal Inference without Balance Checking: Coarsened Exact Matching

Published online by Cambridge University Press:  04 January 2017

Stefano M. Iacus
Affiliation:
Department of Economics, Business and Statistics, University of Milan, Via Conservatorio 7, I-20124 Milan, Italy. e-mail: stefano.iacus@unimi.it
Gary King*
Affiliation:
Institute for Quantitative Social Science, Harvard University, 1737 Cambridge Street, Cambridge, MA 02138
Giuseppe Porro
Affiliation:
Department of Economics and Statistics, University of Trieste, P.le Europa 1, I-34127 Trieste, Italy. e-mail: giuseppe.porro@econ.units.it
* e-mail: king@harvard.edu (corresponding author)

Abstract

We discuss a method for improving causal inferences called “Coarsened Exact Matching” (CEM), and the new “Monotonic Imbalance Bounding” (MIB) class of matching methods from which CEM is derived. We summarize what is known about CEM and MIB, derive and illustrate several new desirable statistical properties of CEM, and then propose a variety of useful extensions. We show that CEM possesses a wide range of statistical properties not available in most other matching methods but is at the same time exceptionally easy to comprehend and use. We focus on the connection between theoretical properties and practical applications. We also make available easy-to-use open source software for R, Stata, and SPSS that implements all our suggestions.
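
The abstract names the method without spelling out its steps, so the following is only a minimal Python sketch of the idea the name suggests: coarsen each covariate into bins, match exactly on the binned values, and prune strata that lack either treated or control units. The function names (`coarsen`, `cem_weights`), the toy data, the cutpoints, and the control-unit weighting rule are all assumptions made for illustration; this is not the interface of the authors' R, Stata, or SPSS software.

```python
from collections import defaultdict


def coarsen(value, cutpoints):
    """Return the index of the bin that `value` falls into (hypothetical helper)."""
    return sum(value >= c for c in cutpoints)


def cem_weights(rows, treat_key, cutpoints_by_covariate):
    """Return one weight per row; a weight of 0.0 means the row is pruned."""
    # Step 1: coarsen every covariate and group rows by the coarsened signature.
    strata = defaultdict(list)
    for i, row in enumerate(rows):
        signature = tuple(
            coarsen(row[name], cuts)
            for name, cuts in cutpoints_by_covariate.items()
        )
        strata[signature].append(i)

    # Step 2: keep only strata holding at least one treated and one control unit.
    matched = []
    for members in strata.values():
        treated = [i for i in members if rows[i][treat_key] == 1]
        controls = [i for i in members if rows[i][treat_key] == 0]
        if treated and controls:
            matched.append((treated, controls))

    # Step 3 (assumed weighting rule): treated units get weight 1; controls are
    # reweighted so each stratum mirrors the treated share in that stratum.
    n_treated = sum(len(t) for t, _ in matched)
    n_controls = sum(len(c) for _, c in matched)
    weights = [0.0] * len(rows)
    for treated, controls in matched:
        for i in treated:
            weights[i] = 1.0
        for i in controls:
            weights[i] = (len(treated) / len(controls)) * (n_controls / n_treated)
    return weights


# Toy data and cutpoints, invented purely for illustration.
rows = [
    {"treated": 1, "age": 23, "educ": 12},
    {"treated": 0, "age": 27, "educ": 12},
    {"treated": 0, "age": 24, "educ": 11},
    {"treated": 1, "age": 45, "educ": 16},
    {"treated": 0, "age": 47, "educ": 14},
    {"treated": 1, "age": 35, "educ": 10},  # no control in its stratum
    {"treated": 0, "age": 33, "educ": 15},  # no treated unit in its stratum
]
cutpoints = {"age": [30, 40], "educ": [13]}
weights = cem_weights(rows, "treated", cutpoints)
print(weights)
```

In this sketch the original, uncoarsened covariate values stay in `rows`, so a later analysis could use the returned weights together with the full data; coarser cutpoints would keep more units at the cost of looser matches.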

Type
Research Article
Copyright
Copyright © The Author 2011. Published by Oxford University Press on behalf of the Society for Political Methodology 


Footnotes

Edited by Jonathan N. Katz

Authors' note: Open source R, Stata, and SPSS software to implement the methods described herein (called CEM) is available at http://gking.harvard.edu/cem; the CEM algorithm is also available via a standard interface offered in the R package MatchIt. Thanks to Erich Battistin, Nathaniel Beck, Matt Blackwell, Andy Eggers, Adam Glynn, Justin Grimmer, Jens Hainmueller, Ben Hansen, Kosuke Imai, Guido Imbens, Fabrizia Mealli, Walter Mebane, Clayton Nall, Enrico Rettore, Jamie Robins, Don Rubin, Jas Sekhon, Jeff Smith, Kevin Quinn, and Chris Winship for helpful comments. All information necessary to replicate the results in this paper appears in Iacus, King, and Porro (2011b).

References

Abadie, Alberto, and Gardeazabal, Javier. 2003. The economic costs of conflict: A case study of the Basque Country. American Economic Review 93: 113–32.
Abadie, Alberto, and Imbens, Guido W. 2007. Bias-corrected matching estimators for average treatment effects. Unpublished manuscript. http://ksghome.harvard.edu/aabadie/research.html.
Austin, Peter C., and Mamdani, Muhammad M. 2006. A comparison of propensity score methods: A case-study estimating the effectiveness of post-AMI statin use. Statistics in Medicine 25: 2084–106.
Battistin, Erich, and Chesher, Andrew. 2004. The impact of measurement error on evaluation methods based on strong ignorability. Working paper, Institute for Fiscal Studies, London.
Carpenter, Daniel Paul. 2002. Groups, the media, agency waiting costs, and FDA drug approval. American Journal of Political Science 46: 490–505.
Cochran, William G., and Rubin, Donald B. 1973. Controlling bias in observational studies: A review. Sankhya: The Indian Journal of Statistics, Series A 35, Part 4: 417–66.
Crump, Richard K., Hotz, V. Joseph, Imbens, Guido W., and Mitnik, Oscar. 2009. Dealing with limited overlap in estimation of average treatment effects. Biometrika 96: 187.
Dehejia, Rajeev H., and Wahba, Sadek. 1999. Causal effects in nonexperimental studies: Re-evaluating the evaluation of training programs. Journal of the American Statistical Association 94: 1053–62.
Dehejia, Rajeev H., and Wahba, Sadek. 2002. Propensity score matching methods for non-experimental causal studies. Review of Economics and Statistics 84: 151–61.
Diamond, Alexis, and Sekhon, Jasjeet. 2005. Genetic matching for estimating causal effects: A new method of achieving balance in observational studies. Working paper, http://jsekhon.fas.harvard.edu/ (accessed 2005).
Freedman, David, and Diaconis, Persi. 1981. On the histogram as a density estimator: L2 theory. Probability Theory and Related Fields 57: 453–76.
Galdo, Jose, Smith, Jeffrey, and Black, Dan. 2008. Bandwidth selection and the estimation of treatment effects with unbalanced data. Working paper, University of Michigan.
Girosi, Federico, and King, Gary. 2008. Demographic forecasting. Princeton, NJ: Princeton University Press. Unpublished manuscript. http://gking.harvard.edu/files/smooth/ (accessed 2008).
Hansen, Ben. 2008. The prognostic analogy of the propensity score. Biometrika 95: 481–88.
Heckman, James, Ichimura, H., and Todd, P. 1997. Matching as an econometric evaluation estimator: Evidence from evaluating a job training program. Review of Economic Studies 64: 605–54.
Ho, Daniel, Imai, Kosuke, King, Gary, and Stuart, Elizabeth. 2007. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis 15: 199–236. http://gking.harvard.edu/files/abs/matchp-abs.shtml (accessed 2007).
Iacus, Stefano M., King, Gary, and Porro, Giuseppe. 2009. CEM: Coarsened Exact Matching Software. Journal of Statistical Software 30(9), http://gking.harvard.edu/cem.
Iacus, Stefano M., King, Gary, and Porro, Giuseppe. 2011. Multivariate matching methods that are Monotonic Imbalance Bounding. Journal of the American Statistical Association. http://gking.harvard.edu/files/abs/cem-math-abs.shtml.
Iacus, Stefano M., King, Gary, and Porro, Giuseppe. 2011b. Replication data for: Causal inference without balance checking: Coarsened Exact Matching. Murray Research Archive [distributor] V1 [version]. http://hdl.handle.net/1902.1/15601.
Iacus, Stefano M., and Porro, Giuseppe. 2007. Missing data imputation, matching and other applications of random recursive partitioning. Computational Statistics and Data Analysis 52: 773–89.
Iacus, Stefano M., and Porro, Giuseppe. 2008. Invariant and metric free proximities for data matching: An R package. Journal of Statistical Software 25(11): 1–22.
Iacus, Stefano M., and Porro, Giuseppe. 2009. Random recursive partitioning: A matching method for the estimation of the average treatment effect. Journal of Applied Econometrics 24: 163–85.
Imai, Kosuke, King, Gary, and Nall, Clayton. 2009. The essential role of pair matching in cluster-randomized experiments, with application to the Mexican universal health insurance evaluation. Statistical Science 24(1): 29–53. http://gking.harvard.edu/files/abs/cluster-abs.shtml.
Imai, Kosuke, King, Gary, and Stuart, Elizabeth. 2008. Misunderstandings among experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society, Series A 171, 2: 481–502. http://gking.harvard.edu/files/abs/matchse-abs.shtml (accessed 2008).
Imai, Kosuke, and van Dyk, D. A. 2004. Causal inference with general treatment regimes: Generalizing the propensity score. Journal of the American Statistical Association 99: 854–66.
Imbens, Guido W. 2000. The role of the propensity score in estimating dose-response functions. Biometrika 87: 706–10.
Imbens, Guido W. 2003. Sensitivity to exogeneity assumptions in program evaluation. American Economic Review 96: 126–32.
Imbens, Guido W. 2004. Nonparametric estimation of average treatment effects under exogeneity: A review. Review of Economics and Statistics 86: 4–29.
Imbens, Guido W., and Angrist, Joshua D. 1994. Identification and estimation of local average treatment effects. Econometrica 62: 467–75.
King, Gary, Honaker, James, Joseph, Anne, and Scheve, Kenneth. 2001. Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review 95: 49–69. http://gking.harvard.edu/files/abs/evil-abs.shtml (accessed 2001).
King, Gary, Nielsen, Richard, Coberley, Carter, Pope, James, and Wells, Aaron. 2011. Comparative effectiveness of matching methods for causal inference.
King, Gary, and Zeng, Langche. 2006. The dangers of extreme counterfactuals. Political Analysis 14: 131–59. http://gking.harvard.edu/files/abs/counterft-abs.shtml.
King, Gary, and Zeng, Langche. 2007. When can history be our guide? The pitfalls of counterfactual inference. International Studies Quarterly 51: 183–210. http://gking.harvard.edu/files/abs/counterf-abs.shtml.
Lalonde, Robert. 1986. Evaluating the econometric evaluations of training programs. American Economic Review 76: 604–20.
Lu, Bo, Zanuto, Elaine, Hornik, Robert, and Rosenbaum, Paul R. 2001. Matching with doses in an observational study of a media campaign against drug abuse. Journal of the American Statistical Association 96: 1245–53.
Manski, Charles F. 1995. Identification problems in the social sciences. Cambridge, MA: Harvard University Press.
Mielke, Paul W., and Berry, Kenneth J. 2007. Permutation methods: A distance function approach. New York: Springer.
Morgan, Stephen L., and Winship, Christopher. 2007. Counterfactuals and causal inference: Methods and principles for social research. Cambridge: Cambridge University Press.
Rosenbaum, Paul R., Ross, Richard N., and Silber, Jeffrey H. 2007. Minimum distance matched sampling with fine balance in an observational study of treatment for ovarian cancer. Journal of the American Statistical Association 102: 75–83.
Rubin, Donald B. 1976. Inference and missing data. Biometrika 63: 581–92.
Rubin, Donald B. 1987. Multiple imputation for nonresponse in surveys. New York: John Wiley.
Rubin, Donald B. 2001. Using propensity scores to help design observational studies: Application to the tobacco litigation. Health Services & Outcomes Research Methodology 2: 169–88.
Rubin, Donald B. 2006. Matched sampling for causal effects. Cambridge, UK: Cambridge University Press.
Scott, David W. 1992. Multivariate density estimation. Theory, practice and visualization. New York: John Wiley & Sons, Inc.
Shimazaki, Hideaki, and Shinomoto, Shigeru. 2007. A method for selecting the bin size of a time histogram. Neural Computation 19: 1503–27.
Smith, Jeffrey A., and Todd, Petra E. 2005. Does matching overcome LaLonde's critique of nonexperimental estimators? Journal of Econometrics 125: 305–53.
Washington, Ebonya L. 2008. Female socialization: How daughters affect their legislator fathers' voting on woman's issues. American Economic Review 98: 311–32.