Nucleic Acids Res. 2022 Nov 18:gkac1068. doi: 10.1093/nar/gkac1068. Online ahead of print.
Carbohydrate Active EnZymes (CAZymes) are significantly important for microbial communities to thrive in carbohydrate rich environments such as animal guts, agricultural soils, forest floors, and ocean sediments. Since 2017, microbiome sequencing and assembly have produced numerous metagenome assembled genomes (MAGs). We have updated our dbCAN-seq database (https://bcb.unl.edu/dbCAN_seq) to include the following new data and features: (i) ∼498 000 CAZymes and ∼169 000 CAZyme gene clusters (CGCs) from 9421 MAGs of four ecological (human gut, human oral, cow rumen, and marine) environments; (ii) Glycan substrates for 41 447 (24.54%) CGCs inferred by two novel approaches (dbCAN-PUL homology search and eCAMI subfamily majority voting) (the two approaches agreed on 4183 CGCs for substrate assignments); (iii) A redesigned CGC page to include the graphical display of CGC gene compositions, the alignment of query CGC and subject PUL (polysaccharide utilization loci) of dbCAN-PUL, and the eCAMI subfamily table to support the predicted substrates; (iv) A statistics page to organize all the data for easy CGC access according to substrates and taxonomic phyla; and (v) A batch download page. In summary, this updated dbCAN-seq database highlights glycan substrates predicted for CGCs from microbiomes. Future work will implement the substrate prediction function in our dbCAN2 web server.