Brief Bioinform. 2026 May 4;27(3):bbag267. doi: 10.1093/bib/bbag267.
ABSTRACT
Transcriptome assembly and quantification are crucial steps in the differential expression analysis of RNA-seq data. Because assembly precedes quantification, its results inevitably influence quantification outcomes. This study investigates how transcriptome assembly algorithms affect quantification in next-generation RNA-seq data analysis, and evaluates the assemblers from the perspective of their quantification results. We assess the assembly quality and stability of three commonly used transcriptome assemblers (StringTie2, Scallop, and Cufflinks) on both simulated and real datasets. Our evaluation provides references for downstream analyses and identifies the most effective and stable pipeline: HISAT2 for read alignment combined with StringTie2 for assembly. Furthermore, we compare data generated by RNA-seq simulation tools with real RNA-seq data and show that simulated data fail to fully capture the complexity of real data. Through this analysis, we identify transcript features associated with poor assembly and quantification performance, highlighting two extreme cases: long, low-expression transcripts, which are often overlooked, and short transcripts, which are prone to quantification errors. These findings point to directions for future software development.
PMID:42202281 | DOI:10.1093/bib/bbag267