Testing for reviewer anchoring in peer review: A randomized controlled trial

PLoS One. 2024 Nov 18;19(11):e0301111. doi: 10.1371/journal.pone.0301111. eCollection 2024.

ABSTRACT

OBJECTIVE: Peer review frequently follows a process in which reviewers first provide initial reviews, authors respond to these reviews, and reviewers then update their reviews based on the authors’ response. Evidence on whether this process is useful is mixed, and anecdotal complaints that reviewers insufficiently update their scores are frequent. In this study, we investigate whether reviewers anchor to their original scores when updating their reviews, which would be one potential explanation for the lack of updates in reviewer scores.

DESIGN: We design a novel randomized controlled trial to test whether reviewers exhibit anchoring. In the experimental condition, participants initially see a flawed version of a paper that is corrected after they submit their initial review, while in the control condition, participants see only the corrected version. We take several measures to ensure that, in the absence of anchoring, the revised scores of reviewers in the experimental group should be identically distributed to the scores from the control group. Furthermore, we construct the reviewed paper to maximize the difference between the flawed and corrected versions, and employ deception to hide the true purpose of the experiment.

RESULTS: Our randomized controlled trial includes 108 researchers as participants. First, we find that our intervention was successful at creating a difference in perceived paper quality between the flawed and corrected versions: Using a permutation test with the Mann-Whitney U statistic, we find that the experimental group’s initial scores are lower than the control group’s scores in both the Evaluation category (Vargha-Delaney A = 0.64, p = 0.0096) and Overall score (A = 0.59, p = 0.058). Next, we test for anchoring by comparing the experimental group’s revised scores with the control group’s scores. We find no significant evidence of anchoring in either the Overall (A = 0.50, p = 0.61) or Evaluation category (A = 0.49, p = 0.61). The Mann-Whitney U counts the individual pairwise comparisons across groups in which the value from the specified group is greater, while the Vargha-Delaney A is the same statistic normalized to [0, 1].
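
To make the statistics above concrete, the following is a minimal Python sketch of a permutation test that uses the Mann-Whitney U as its test statistic, together with the Vargha-Delaney A. It is an illustration under stated assumptions, not the authors' analysis code: the function names, the scores in the usage lines, the one-sided alternative, the half-credit handling of ties, and the number of permutations are all assumptions introduced here.

```python
import random


def mann_whitney_u(x, y):
    """Count pairs (xi, yj) in which xi > yj.

    Ties are credited one half each; this is a common convention and an
    assumption here, not something the abstract specifies.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def vargha_delaney_a(x, y):
    """Normalize U by the number of cross-group pairs, giving a value in [0, 1]."""
    return mann_whitney_u(x, y) / (len(x) * len(y))


def permutation_p_value(x, y, n_permutations=10000, seed=0):
    """One-sided permutation p-value for 'x tends to be greater than y'.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose U is at least as large as the observed U. The +1 terms
    count the observed labeling itself, so the p-value is never zero.
    """
    rng = random.Random(seed)
    observed = mann_whitney_u(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mann_whitney_u(pooled[:n_x], pooled[n_x:]) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Made-up scores for illustration only; these are not data from the study.
    control_scores = [6, 7, 5, 8, 6, 7]
    experimental_initial_scores = [4, 5, 6, 3, 5, 6]
    print("A =", vargha_delaney_a(control_scores, experimental_initial_scores))
    print("p =", permutation_p_value(control_scores, experimental_initial_scores))
```

Under this convention, A = 0.5 means that neither group tends to score higher than the other, which matches how the near-0.5 values in the anchoring comparison are read as no evidence of a difference.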

PMID:39556577 | DOI:10.1371/journal.pone.0301111

By Nevin Manimala