Microb Genom. 2026 Apr;12(4). doi: 10.1099/mgen.0.001690.
ABSTRACT
Whole-genome sequencing provides a vast amount of genetic information, but its use in clinical and epidemiological studies often depends on the accurate inference of genomic variants. Comparative genomic studies in Mycobacterium tuberculosis typically involve mapping short reads from a diverse population to the same reference genome. This approach can lead to the incorrect characterization of many genomic regions that are susceptible to mapping bias when the reference is too distantly related to the sample. We analysed the consequences of mapping reads from different lineages of M. tuberculosis to the commonly used reference H37Rv and showed that the mapping bias varied depending on both the lineage and the gene mapped. To resolve these issues, we propose a new hybrid workflow which involves three steps: first, building a de novo assembly from short reads; second, aligning this assembly to a reference genome; and finally, mapping the reads to this aligned assembly. We show that many of the lineage and gene biases were corrected using this approach, which leads to a better characterization of lineages and hypervariable regions in comparative analysis. Our proposed approach will enable researchers to elucidate more genetic variations in M. tuberculosis and other bacterial pathogens.
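The mapping bias described above can be illustrated with a toy sketch. It is not the authors' workflow and does not use any real mapper: reads are placed without gaps and accepted only if they stay under a mismatch cap, and the "assembly" is idealised as the sample genome itself. All names (`reference`, `sample`, `count_mapped`, `max_mismatches`) and the sequences are hypothetical and chosen only for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, target, max_mismatches):
    """Return the best ungapped position of `read` in `target`, or None.

    A placement is accepted only if it has at most `max_mismatches`
    mismatches. This is a crude stand-in for a real short-read mapper.
    """
    best = None
    for pos in range(len(target) - len(read) + 1):
        d = hamming(read, target[pos:pos + len(read)])
        if d <= max_mismatches and (best is None or d < best[1]):
            best = (pos, d)
    return None if best is None else best[0]


def count_mapped(reads, target, max_mismatches=1):
    """Count reads that can be placed on `target` under the mismatch cap."""
    return sum(map_read(r, target, max_mismatches) is not None for r in reads)


# Hypothetical 40-base reference (deliberately contains no T).
reference = "ACGGCAAGCC" * 4

# Hypothetical sample: identical to the reference except for a 6-base
# divergent block, mimicking a hypervariable region in a distant lineage.
sample = reference[:17] + "TTTTTT" + reference[23:]

# Simulated error-free 12-base reads tiling the sample genome.
read_len = 12
reads = [sample[i:i + read_len] for i in range(len(sample) - read_len + 1)]

# Mapping to the distant reference loses reads over the divergent block;
# mapping to the sample's own (idealised) assembly recovers all of them.
print("mapped to reference:", count_mapped(reads, reference), "of", len(reads))
print("mapped to assembly: ", count_mapped(reads, sample), "of", len(reads))
```

In this toy setting, every read that overlaps the divergent block by two or more bases fails to map to the reference, while all reads map to the sample-derived sequence. Real mappers use gapped, scored alignment, so the effect is less clear-cut in practice, but the direction of the bias is the same as the one the abstract describes.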
PMID:41961532 | DOI:10.1099/mgen.0.001690