Integr Environ Assess Manag. 2026 Jan 27:vjag006. doi: 10.1093/inteam/vjag006. Online ahead of print.
ABSTRACT
The revised 2023 European Food Safety Authority (EFSA) guidance on the risk assessment of plant protection products on bees introduced a major change in the statistical evaluation of higher tier studies, replacing difference testing with equivalence testing. This paper evaluates several statistical models for equivalence testing of colony strength endpoints in honey bee semi-field studies, including a t-test, a two-way ANOVA, and a linear mixed-effects model incorporating an autoregressive (AR) structure. Using a range of simulated scenarios, model performance was compared to determine suitability and the likely level of replication needed to conclude a low risk for a test substance with a true effect size of <10% reduction in colony strength. The linear mixed-effects model with AR structure and baseline adjustment offered the highest statistical power among the tested approaches. In all simulated scenarios, achieving 80% power to conclude equivalence required substantially more replication than the minimum of three replicates recommended in the EPPO (2010) test guideline. Under the best-case scenario, a minimum of seven replicates was needed when the true effect size was 0, whereas effects close to the equivalence margin (a true 9% reduction) required extremely large sample sizes, up to 612 replicates, to achieve sufficient power. Potential modifications to the study design to reduce replication needs were also explored. Reducing initial inter-colony variability alone did not meaningfully decrease required sample sizes, whereas increasing temporal correlation among repeated observations improved power and lowered replication requirements. Nevertheless, it remains questionable whether the large numbers of replicates illustrated here are manageable in a practical study setup. Caution is needed during the implementation of the equivalence approach for regulatory evaluation until applicants and regulatory bodies better understand whether such studies can be feasibly designed and conducted to demonstrate acceptable risk against the specific protection goals.
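To illustrate why power falls so sharply as the true effect approaches the equivalence margin, the following is a minimal simulation sketch, not the models evaluated in the paper. It assumes a single time point, normally distributed colony strength scaled so the control mean is 1, a hypothetical coefficient of variation (`cv`), and a large-sample z-test in place of the t-test, ANOVA, or mixed-effects model with AR structure. The function name `equivalence_power` and all default parameter values are illustrative assumptions.

```python
import math
import random
import statistics


def equivalence_power(n, true_reduction, cv=0.2, margin=0.10,
                      alpha=0.05, n_sim=2000, seed=1):
    """Estimate by simulation the power to conclude equivalence.

    Assumed (hypothetical) setup: control ~ Normal(1, cv) and
    treated ~ Normal(1 - true_reduction, cv), with n replicates per group.
    H0: treated mean <= (1 - margin) * control mean (reduction >= margin).
    Equivalence is concluded when H0 is rejected by a one-sided z-test.
    """
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha)
    k = 1.0 - margin  # treated mean must exceed k * control mean
    rejections = 0
    for _ in range(n_sim):
        control = [rng.gauss(1.0, cv) for _ in range(n)]
        treated = [rng.gauss(1.0 - true_reduction, cv) for _ in range(n)]
        # Linear contrast: treated mean minus k times control mean
        diff = statistics.fmean(treated) - k * statistics.fmean(control)
        se = math.sqrt(statistics.variance(treated) / n
                       + k * k * statistics.variance(control) / n)
        if se > 0 and diff / se > z_crit:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    # Compare a true effect of 0 with a true 9% reduction at several n.
    for n in (5, 30, 100):
        print(n, equivalence_power(n, 0.0), equivalence_power(n, 0.09))
```

Under these assumptions, the distance between the true effect and the margin is what drives the test statistic, so a true 9% reduction leaves only one percentage point of headroom and needs far more replicates than a true effect of 0. The z approximation is slightly liberal for small n, so a t-based or mixed-model analysis would be preferable in practice.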
PMID:41592234 | DOI:10.1093/inteam/vjag006