Stat Methods Med Res. 2022 Nov 10:9622802221129044. doi: 10.1177/09622802221129044. Online ahead of print.
The regression discontinuity design is a quasi-experimental design that estimates the causal effect of a treatment when its assignment is defined by a threshold on a continuous variable. The design assumes that subjects with measurements within a bandwidth around the threshold belong to a common population, so that the threshold can be seen as a randomising device that assigns treatment to those falling just above it and withholds treatment from those falling just below. Bandwidth selection is a critical decision in a regression discontinuity analysis, as results may be highly sensitive to this choice. A few methods to select the optimal bandwidth, mainly from the econometric literature, have been proposed, but their use in practice is limited. We propose a methodology that tackles the problem from an applied point of view and takes units’ exchangeability, that is, their similarity with respect to measured covariates, as the main criterion for selecting subjects for the analysis, irrespective of their distance from the threshold. We cluster the sample using a Dirichlet process mixture model to identify balanced and homogeneous clusters. Our proposal exploits the posterior similarity matrix, which contains the pairwise probabilities that two observations are allocated to the same cluster in the Markov chain Monte Carlo sample. We then include in the regression discontinuity analysis only those clusters for which there is stronger evidence of exchangeability. We illustrate the validity of our methodology with both a simulated experiment and a motivating example on the effect of statins on cholesterol levels.
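To make the posterior similarity matrix concrete, the following minimal sketch estimates it as the proportion of Markov chain Monte Carlo draws in which two observations share a cluster label. It is an illustration under stated assumptions and not the authors' implementation: the function name `posterior_similarity_matrix`, the layout of the allocation array (one row per draw, one column per unit) and the toy labels are all hypothetical.

```python
import numpy as np


def posterior_similarity_matrix(allocations):
    """Estimate pairwise co-clustering probabilities from MCMC draws.

    allocations: integer array of shape (n_draws, n_units); entry [t, i] is
    the cluster label of unit i in draw t (assumed layout, for illustration).
    """
    allocations = np.asarray(allocations)
    n_draws, n_units = allocations.shape
    psm = np.zeros((n_units, n_units))
    for labels in allocations:
        # Indicator matrix: 1 where units i and j share a label in this draw.
        psm += labels[:, None] == labels[None, :]
    # Averaging the indicators gives the Monte Carlo estimate.
    return psm / n_draws


# Toy allocations (invented for illustration): 4 draws, 4 units.
draws = np.array([
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
])
psm = posterior_similarity_matrix(draws)
print(psm)
```

Because only label equality within a draw is compared, the estimate is unaffected by label switching across draws. In this toy case units 0 and 1 share a cluster in three of the four draws, so their estimated similarity is 0.75.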