Tool that uses rearrangement distances to build a relatedness network of plasmid genomes. Allows one to explore relatedness of plasmids in a way that respects the biological mechanisms through which they vary.
Code: https://github.com/iqbal-lab-org/pling
Main Paper: https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.001300
Protocols for how to use it: https://www.biorxiv.org/content/10.1101/2025.09.02.673752v1
Sequence alignment against millions of bacterial genomes, enabling BLAST-style queries against all current bacterial data (e.g. AllTheBacteria).
Code: https://github.com/shenwei356/LexicMap
AMR gene identification from long reads, designed to correctly identify multi-copy genes.
Code: https://github.com/Danderson123/amira
Preprint: https://www.biorxiv.org/content/10.1101/2025.05.16.654303v2
Rigorous assembler for tiled amplicon sequencing of viruses.
Tool for rapid, lightweight analysis of Mycobacterium tuberculosis, Staphylococcus aureus, Shigella sonnei, Salmonella Typhi and Salmonella enterica serotype Paratyphi B, giving species/lineage information and drug resistance predictions.
Code: https://github.com/Mykrobe-tools/mykrobe
Papers: https://doi.org/10.1038/ncomms10063, https://doi.org/10.12688/wellcomeopenres.15603.1
Bacterial genomes can be remarkably variable even within a species, leading to the concept of a pan-genome. With standard tools, it is only possible to study SNP/mutation variation in the parts of the genome that are shared by all samples in a cohort (the "core"). Using a new genome graph implementation, we developed a new tool, pandora, for joint analysis of SNP and gene presence/absence information across entire bacterial pan-genomes. Pandora supports Nanopore and Illumina data.
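To make the core/accessory distinction concrete, here is a minimal sketch on made-up gene presence data (the sample and gene names are hypothetical and illustrative only). The core is the intersection of every sample's gene set, which is the only part a core-restricted SNP analysis can compare; the accessory genes are everything else.

```python
from typing import Dict, Set

# Hypothetical gene presence data for a tiny cohort (illustrative only).
presence: Dict[str, Set[str]] = {
    "sample1": {"dnaA", "gyrA", "blaTEM", "traI"},
    "sample2": {"dnaA", "gyrA", "traI"},
    "sample3": {"dnaA", "gyrA", "blaCTX-M"},
}


def core_genes(presence: Dict[str, Set[str]]) -> Set[str]:
    """Genes present in every sample: the 'core' that core-only SNP analysis is restricted to."""
    return set.intersection(*presence.values())


def accessory_genes(presence: Dict[str, Set[str]]) -> Set[str]:
    """Genes present in some samples but not all: invisible to a core-only SNP analysis."""
    return set.union(*presence.values()) - core_genes(presence)
```

Even in this toy cohort, three of the five genes fall outside the core, so any variation inside them would be missed by a core-restricted analysis.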
Python implementation of the Recursive-Cluster-Collapse algorithm described in the Pandora paper; it builds genome graphs from either an MSA or a VCF. Used by both gramtools and pandora.
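As a rough illustration of the "collapse" idea, the sketch below splits an MSA into conserved intervals (kept as invariant sequence) and variable intervals (turned into a site whose alleles are the distinct ungapped sequences). It is a simplified, non-recursive sketch, not the tool's implementation: as we understand it, the real algorithm also clusters the sequences in each variable interval and recurses into the clusters, which is omitted here. The function name, the `min_match_length` parameter and the output representation are assumptions.

```python
from typing import List, Tuple, Union

# ("invariant", sequence) or ("site", [allele, ...]); a hypothetical representation.
Segment = Tuple[str, Union[str, List[str]]]


def collapse_msa(msa: List[str], min_match_length: int = 3) -> List[Segment]:
    """Split an MSA into invariant intervals and variant sites (no clustering or recursion)."""
    ncols = len(msa[0])
    if any(len(row) != ncols for row in msa):
        raise ValueError("MSA rows must all have the same length")
    # A column is conserved if every row has the same non-gap character there.
    conserved = [len({row[i] for row in msa}) == 1 and msa[0][i] != "-" for i in range(ncols)]

    # Group columns into maximal runs of equal conservation status.
    runs = []
    start = 0
    for i in range(1, ncols + 1):
        if i == ncols or conserved[i] != conserved[start]:
            runs.append((start, i, conserved[start]))
            start = i

    # Conserved runs shorter than min_match_length are absorbed into neighbouring variable intervals.
    intervals = []  # (is_match, start, end)
    for run_start, run_end, is_cons in runs:
        is_match = is_cons and run_end - run_start >= min_match_length
        if intervals and not is_match and not intervals[-1][0]:
            intervals[-1] = (False, intervals[-1][1], run_end)
        else:
            intervals.append((is_match, run_start, run_end))

    segments: List[Segment] = []
    for is_match, s, e in intervals:
        if is_match:
            segments.append(("invariant", msa[0][s:e]))
        else:
            # The distinct ungapped sequences in this interval become the site's alleles.
            segments.append(("site", sorted({row[s:e].replace("-", "") for row in msa})))
    return segments
```

For example, `collapse_msa(["AAACGTTT", "AAATGTTT", "AAAC-TTT"])` yields an invariant `AAA`, a site with alleles `C`, `CG` and `TG`, and an invariant `TTT`.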
Tool for joint analysis of SNP/indel variation in cohorts, allowing analysis of mutations on different haplotypes and on alternate backgrounds to long deletions. The underlying data structure is a generalised BWT. Application has been focussed primarily on surface antigens in P. falciparum. Gramtools supports Illumina data.
Tool for combining multiple callsets (VCFs) made for the same sample using different variant callers (e.g. samtools, freebayes), using a genome graph to adjudicate where the callsets disagree. Used heavily in the CRyPTIC project to analyse tens of thousands of M. tuberculosis genomes.
Code: https://github.com/iqbal-lab-org/minos
Paper: https://link.springer.com/article/10.1186/s13059-022-02714-x
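The sketch below shows only the bookkeeping that precedes adjudication: taking the union of two callsets and separating records both callers agree on from those that need resolving. It does not show the genome-graph genotyping itself. The dictionary representation of a callset and the function name are hypothetical stand-ins for real VCF parsing. Note that this naive exact-key comparison treats differently represented but equivalent variants (e.g. a left- versus right-aligned indel) as disagreements.

```python
from typing import Dict, Optional, Tuple

# Hypothetical minimal callset: (chrom, 1-based pos, REF) -> ALT. Real input would be VCF files.
Key = Tuple[str, int, str]


def split_callsets(
    a: Dict[Key, str], b: Dict[Key, str]
) -> Tuple[Dict[Key, str], Dict[Key, Tuple[Optional[str], Optional[str]]]]:
    """Split the union of two callsets into agreed records and records needing adjudication."""
    agreed: Dict[Key, str] = {}
    disputed: Dict[Key, Tuple[Optional[str], Optional[str]]] = {}
    for key in sorted(a.keys() | b.keys()):
        alt_a, alt_b = a.get(key), b.get(key)
        if alt_a == alt_b:
            agreed[key] = alt_a
        else:
            # Called by only one caller, or called with a different ALT allele.
            disputed[key] = (alt_a, alt_b)
    return agreed, disputed
```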
Tool which runs multiple (Illumina) variant callers (usually Cortex and samtools, but you could change that) with different strengths, and then combines the results rigorously using minos.
Code: https://github.com/iqbal-lab-org/clockwork
Paper: see the Minos paper above, where it is evaluated on M. tuberculosis, S. aureus and K. pneumoniae.
Tool (introduced in the minos paper) for evaluating a VCF file of calls when you have a high-quality truth assembly (as is common with bacteria, where there are no issues with phasing calls). A probe is constructed for each record in the VCF, using flanking sequence from the reference genome (with nearby variants applied), and is then mapped to the truth genome; this allows varifier to measure precision. Measuring recall depends on having reliable true variants. Varifier uses minimap and nucmer to compare the reference genome and truth assembly, filters the minimap+nucmer calls with the same probe method to exclude errors, and takes the result as a conservative "truth set" of variants, which it then uses to measure recall.
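The probe idea for precision can be sketched as follows, under simplifying assumptions: exact substring search (on either strand) stands in for mapping the probe to the truth assembly, nearby variants are not applied to the flanks, and the variant record format, flank length and function names are hypothetical. The recall side is not shown.

```python
from typing import List, Tuple

# Hypothetical minimal variant record: (0-based position, REF, ALT).
Variant = Tuple[int, str, str]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def make_probe(ref_genome: str, var: Variant, flank: int) -> str:
    """Reference flank + ALT + reference flank. Unlike varifier, nearby variants are NOT applied."""
    pos, ref, alt = var
    if ref_genome[pos:pos + len(ref)] != ref:
        raise ValueError(f"REF {ref!r} does not match reference at {pos}")
    left = ref_genome[max(0, pos - flank):pos]
    right = ref_genome[pos + len(ref):pos + len(ref) + flank]
    return left + alt + right


def precision(ref_genome: str, truth_genome: str, variants: List[Variant], flank: int = 5) -> float:
    """Fraction of calls whose probe occurs exactly (on either strand) in the truth assembly."""
    if not variants:
        raise ValueError("no variants to evaluate")
    hits = 0
    for var in variants:
        probe = make_probe(ref_genome, var, flank)
        # Exact substring search stands in for mapping the probe with an aligner.
        if probe in truth_genome or revcomp(probe) in truth_genome:
            hits += 1
    return hits / len(variants)
```

With reference `GATTACAGCTTGACCGTAGC` and a truth assembly carrying a T>A SNP at position 9, a call of that SNP produces a probe found in the truth, while a spurious call at position 15 does not, giving a precision of 0.5.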
Tool for creating a k-mer index of large sets of microbial sequence data.
Code: https://github.com/iqbal-lab-org/BIGSI
Paper: https://pubmed.ncbi.nlm.nih.gov/30718882/
Note that if you want to build a BIGSI of many samples, the method outlined in the paper is quite memory-intensive; a better method of merging indexes is documented on the wiki. However, BIGSI is now quite outdated and mostly of historical interest.
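For readers curious about how a bit-sliced index answers queries, here is a toy sketch of the general idea: each sample gets a Bloom filter over its k-mers, and the filters are stored transposed, so that `rows[j]` holds bit `j` of every sample's filter. Looking up a k-mer then means AND-ing a few rows. The parameters (`m`, `num_hashes`, `k`), the SHA-1-based hashing and all names are assumptions for illustration, not BIGSI's actual settings or code. Bloom filters can give false positives but never false negatives.

```python
import hashlib
from typing import Iterator, List


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer of seq (forward strand only in this sketch)."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def hash_positions(kmer: str, num_hashes: int, m: int) -> List[int]:
    """Map a k-mer to num_hashes Bloom filter bit positions via salted SHA-1 (an arbitrary choice)."""
    return [
        int.from_bytes(hashlib.sha1(f"{i}:{kmer}".encode()).digest()[:8], "big") % m
        for i in range(num_hashes)
    ]


class BitSlicedIndex:
    """One Bloom filter per sample, stored transposed: rows[j] holds bit j of every sample's filter."""

    def __init__(self, m: int = 1024, num_hashes: int = 3, k: int = 5) -> None:
        self.m, self.num_hashes, self.k = m, num_hashes, k
        self.rows = [0] * m  # bit s of rows[j] is set if sample s's filter sets bit j
        self.samples: List[str] = []

    def add_sample(self, name: str, seq: str) -> None:
        col = len(self.samples)
        self.samples.append(name)
        for km in kmers(seq, self.k):
            for j in hash_positions(km, self.num_hashes, self.m):
                self.rows[j] |= 1 << col

    def query_kmer(self, kmer: str) -> int:
        """AND the rows for this k-mer's bit positions; surviving bits mark candidate samples."""
        hits = (1 << len(self.samples)) - 1
        for j in hash_positions(kmer, self.num_hashes, self.m):
            hits &= self.rows[j]
        return hits

    def query(self, seq: str, threshold: float = 1.0) -> List[str]:
        """Return samples containing at least `threshold` of the query's k-mers (false positives possible)."""
        query_kmers = list(kmers(seq, self.k))
        if not query_kmers:
            return []
        counts = [0] * len(self.samples)
        for km in query_kmers:
            hits = self.query_kmer(km)
            for s in range(len(self.samples)):
                if (hits >> s) & 1:
                    counts[s] += 1
        return [name for name, c in zip(self.samples, counts) if c / len(query_kmers) >= threshold]
```

The transposed layout is the point of the design: a query touches only `num_hashes` rows per k-mer, regardless of how many samples are indexed.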
High-performance C++ reimplementation of BIGSI, incorporating new ideas; it is faster and uses less disk space.
This rather venerable tool builds coloured de Bruijn graphs and uses them to detect variation between a sample and a reference, or between different samples.
Code: https://github.com/iqbal-lab/cortex
Paper: https://www.nature.com/articles/ng.1028
Cortex is no longer actively developed, but it is still heavily used. In particular, our group has analysed large cohorts of M. tuberculosis by running both cortex and samtools and then combining the results with minos; this pipeline is packaged in Clockwork. We don't think there is a modern reimplementation of this tool. ska comes close and uses similar ideas, so if you are interested only in SNPs, we would recommend using it; we are not sure what the best recommendation is if you care about indels.
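The core idea of a coloured de Bruijn graph can be sketched in a few lines: each k-mer records the set of colours (samples, or the reference) that contain it, and edges are implicit between k-mers overlapping by k-1 bases. Walking a sample's sequence and reporting runs of k-mers absent from the reference colour shows where its path diverges; for an isolated SNP this is a run of k novel k-mers flanked by shared ones, i.e. one side of a bubble. This is a minimal illustration under stated assumptions (forward strand only, no error cleaning, hypothetical function names), not Cortex's bubble or path-divergence caller.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_coloured_graph(colours: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the set of colours containing it.
    Edges are implicit: k-mer x links to y when x[1:] == y[:-1]."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for colour, seq in colours.items():
        for i in range(len(seq) - k + 1):
            graph[seq[i:i + k]].add(colour)
    return graph


def divergent_runs(graph: Dict[str, Set[str]], sample_seq: str, ref: str, k: int) -> List[Tuple[int, int]]:
    """Return half-open k-mer index ranges where the sample's path leaves the reference colour."""
    runs: List[Tuple[int, int]] = []
    start = None
    n = len(sample_seq) - k + 1
    for i in range(n):
        in_ref = ref in graph.get(sample_seq[i:i + k], set())
        if not in_ref and start is None:
            start = i  # path leaves the reference colour
        elif in_ref and start is not None:
            runs.append((start, i))  # path rejoins the reference colour
            start = None
    if start is not None:
        runs.append((start, n))
    return runs
```

For a reference `GATTACAGCTTGACCGTAGC` and a sample carrying a SNP at position 9, with k = 5 the sample's k-mers 5 to 9 are absent from the reference colour, giving the single run `(5, 10)`.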