THE CARBON FOOTPRINT OF BIOINFORMATICS

Mol Biol Evol. 2022 Feb 10:msac034. doi: 10.1093/molbev/msac034. Online ahead of print.

ABSTRACT

Bioinformatic research relies on large-scale computational infrastructures, which have a non-zero carbon footprint, but so far no study has quantified the environmental costs of bioinformatic tools and commonly run analyses. In this work, we estimate the carbon footprint of bioinformatics (in kilograms of CO2 equivalent, kgCO2e) using the freely available Green Algorithms calculator (www.green-algorithms.org). We assessed (i) bioinformatic approaches in genome-wide association studies (GWAS), RNA sequencing, genome assembly, metagenomics, phylogenetics and molecular simulations, as well as (ii) computation strategies, such as parallelisation, CPU (central processing unit) vs. GPU (graphics processing unit), cloud vs. local computing infrastructure, and geography. In particular, we found that biobank-scale GWAS had a substantial carbon footprint and that simple software upgrades could make it greener; for example, upgrading from BOLT-LMM v1 to v2.3 reduced the carbon footprint by 73%. Moreover, switching from the average data centre to a more efficient one can reduce the carbon footprint by ∼34%. Memory over-allocation can also be a substantial contributor to an algorithm's greenhouse gas (GHG) emissions. The use of faster processors or greater parallelisation reduces running time but can lead to a greater carbon footprint. Finally, we provide guidance on how researchers can reduce power consumption and minimise kgCO2e. Overall, this work elucidates the carbon footprint of common analyses in bioinformatics and provides solutions which empower a move toward greener research.
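
To make the factors named in the abstract concrete (running time, number of cores, allocated memory, data centre efficiency and geography), the following is a minimal sketch and not the calculator's actual implementation. It assumes a simple model in which energy is running time multiplied by the power drawn by cores and allocated memory, scaled by the data centre's power usage effectiveness (PUE), and emissions are energy multiplied by the carbon intensity of the local electricity. The function name and every numeric value below are hypothetical placeholders chosen for illustration; none of them comes from the paper.

```python
def carbon_footprint_kgco2e(
    runtime_h: float,
    n_cores: int,
    core_power_w: float,
    core_usage: float,
    memory_gb: float,
    memory_power_w_per_gb: float,
    pue: float,
    carbon_intensity_g_per_kwh: float,
) -> float:
    """Estimate emissions in kgCO2e under the simple model assumed above.

    All parameters are illustrative inputs; this is a hypothetical helper,
    not the Green Algorithms calculator's API.
    """
    # Power drawn by the job: active cores plus *allocated* (not used) memory.
    power_w = n_cores * core_power_w * core_usage + memory_gb * memory_power_w_per_gb
    # Energy in kWh, including data centre overhead (cooling etc.) via PUE.
    energy_kwh = runtime_h * power_w * pue / 1000.0
    # Convert gCO2e to kgCO2e.
    return energy_kwh * carbon_intensity_g_per_kwh / 1000.0


# Hypothetical baseline job (placeholder values, not taken from the paper).
baseline = dict(
    runtime_h=10.0,
    n_cores=8,
    core_power_w=10.0,
    core_usage=1.0,
    memory_gb=64.0,
    memory_power_w_per_gb=0.4,
    pue=1.6,
    carbon_intensity_g_per_kwh=500.0,
)

base = carbon_footprint_kgco2e(**baseline)
# Same job in a more efficient data centre (lower, hypothetical PUE).
efficient_dc = carbon_footprint_kgco2e(**{**baseline, "pue": 1.2})
# Same job with four times the memory requested (over-allocation).
over_allocated = carbon_footprint_kgco2e(**{**baseline, "memory_gb": 256.0})
# Twice the cores, but runtime only drops to 60% (imperfect scaling).
more_parallel = carbon_footprint_kgco2e(
    **{**baseline, "n_cores": 16, "runtime_h": 6.0}
)

for label, value in [
    ("baseline", base),
    ("efficient data centre", efficient_dc),
    ("memory over-allocated", over_allocated),
    ("more parallel", more_parallel),
]:
    print(f"{label}: {value:.3f} kgCO2e ({value / base:.2f}x baseline)")
```

Under these assumed inputs, the sketch reproduces the qualitative pattern described above: a lower PUE scales emissions down proportionally, requesting more memory than needed raises them even though the work done is unchanged, and doubling the cores without halving the running time finishes sooner but emits more.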

PMID:35143670 | DOI:10.1093/molbev/msac034

By Nevin Manimala