Sci Rep. 2026 Jun 2. doi: 10.1038/s41598-026-53540-1. Online ahead of print.
ABSTRACT
While full ledger access is theoretically possible on public blockchains, in practice it is often unavailable. What can be observed is constrained by storage limits, client design, indexing services, and off-chain execution pathways. As a result, empirical blockchain analysis rarely uses the entire ledger object; it typically relies on observable projections instead. This research recasts blockchain observability as a problem of inference under incomplete observation. The framework defines a full ledger, an observable ledger, and an observability mechanism, and uses them to study identifiability, information loss, and irreducible uncertainty under coarsened access. The simulation study assesses three distinct visibility regimes: independent Bernoulli, clustered, and activity-dependent. Across all three regimes, reduced visibility raises variance, mean squared error, root mean squared error (RMSE), and uncertainty inflation. The most severe deterioration occurs when visibility depends on the state of the underlying ledger. The empirical study uses Google BigQuery's publicly indexed Ethereum block data spanning blocks 18,000,000 to 18,001,000. Over this period, descriptive summaries reveal substantial block-level variation in gas used, transaction count, and base fee per gas. Controlled-missingness experiments on the observed slice show that RMSE and bias in the trend estimate grow with increasing missingness, and that the degree of distortion depends significantly on whether the incompleteness is MCAR-like, MAR-like, or MNAR-like. These results show that partial observability is not just a secondary data issue; it can significantly affect inference on Ethereum block-level summaries.
PMID:42230851 | DOI:10.1038/s41598-026-53540-1
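The three visibility regimes named in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: the synthetic ledger, the masking rules, the cluster length, the strength of the activity dependence, and all function names are assumptions chosen only to show how the RMSE of a simple block-level mean can be compared across independent Bernoulli, clustered, and activity-dependent visibility.

```python
import math
import random


def simulate_ledger(n_blocks, rng):
    """Hypothetical full ledger: per-block activity with a mild trend plus noise."""
    return [100.0 + 0.05 * t + rng.gauss(0.0, 20.0) for t in range(n_blocks)]


def bernoulli_mask(values, p_visible, rng):
    """Each block is visible independently with probability p_visible."""
    return [rng.random() < p_visible for _ in values]


def clustered_mask(values, p_visible, rng, run_length=20):
    """Visibility is decided once per contiguous run of blocks (assumed run length)."""
    mask = []
    while len(mask) < len(values):
        visible = rng.random() < p_visible
        mask.extend([visible] * run_length)
    return mask[:len(values)]


def activity_dependent_mask(values, p_visible, rng):
    """Busier blocks are less likely to be visible (an assumed MNAR-like rule)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against a constant series
    mask = []
    for v in values:
        z = (v - lo) / span  # activity rescaled to [0, 1]
        p = min(1.0, max(0.0, p_visible + 0.4 * (0.5 - z)))
        mask.append(rng.random() < p)
    return mask


def rmse_of_mean(mask_fn, p_visible, n_blocks=500, n_reps=200, seed=0):
    """RMSE of the observed-block mean against the full-ledger mean."""
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(n_reps):
        ledger = simulate_ledger(n_blocks, rng)
        mask = mask_fn(ledger, p_visible, rng)
        observed = [v for v, visible in zip(ledger, mask) if visible]
        if not observed:
            continue  # skip replicates where nothing is visible
        true_mean = sum(ledger) / len(ledger)
        observed_mean = sum(observed) / len(observed)
        squared_errors.append((observed_mean - true_mean) ** 2)
    if not squared_errors:
        return float("nan")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    regimes = [
        ("independent Bernoulli", bernoulli_mask),
        ("clustered", clustered_mask),
        ("activity-dependent", activity_dependent_mask),
    ]
    for p in (0.9, 0.5, 0.3):
        for name, mask_fn in regimes:
            print(f"p_visible={p:.1f}  {name:22s} RMSE={rmse_of_mean(mask_fn, p):.3f}")
```

Under these assumptions, the independent Bernoulli mask only adds sampling noise, while the activity-dependent mask also shifts the observed mean away from the full-ledger mean, which is the kind of state-dependent distortion the abstract singles out as most severe. The numbers this sketch produces say nothing about the magnitudes reported in the paper.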