Using ALF to simulate large, closely related populations of bacteria

I am currently trying to use ALF (the stand-alone version) to simulate data from a custom tree, with realistic parameters for SNP rate, INDEL rate, gene loss and recombination rate. This is a little different from what I think the program was originally designed for – small numbers of divergent organisms – but is probably an easier problem.

ALF is good because it includes many features of evolution that more naive models don’t encompass, and it produces output that is useful for further simulation and testing work.

I’ve made the following notes and tweaks to fix issues as I’ve gone along, which I hope will be of use to anyone trying to use the software for this purpose:

  • Custom INDEL distributions must be specified in the parameters file as follows (note the double brackets):
    IndelModel(0.02,'CUSTOM', [[0.5,0.25,0.2,0.05]], 20)

    (thanks to the author Daniel Dalquen for helping me with this)

  • Custom trees must have no labels on the internal nodes. Alternatively, to make ALF ignore these labels, you can remove the InternalLabels argument on line 820 of lib/simulator/SE_DataOutput.drw
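  • Make sure ‘unitIsPam’ is set to false for trees with branch lengths in substitutions per site, which is the default unit for e.g. RAxML trees. Otherwise the branch lengths are read as PAM units (1 PAM corresponds to 0.01 expected substitutions per site), so the tree would be treated as 100 times shorter than intended.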
  • If you’re simulating many lateral gene transfer (LGT) events involving multiple genes, you’ll run into a ‘transLoc out of range’ error due to a bug in the code. This can be fixed by changing line 604 of lib/simulator/SE_Evolutionary_Events.drw to:
    place := Rand(0..length(geneR[org]) - lgtSize);
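If you generate parameter files from a script, it can help to check the custom INDEL distribution before ALF sees it. Below is a minimal Python sketch (the function name `custom_indel_model` is hypothetical, not part of ALF) that formats the `IndelModel` line with the required double brackets. It assumes, without confirmation from the ALF documentation, that the inner list is a probability distribution over indel lengths and therefore should sum to 1; the rate and the final argument are simply copied through.

```python
from math import isclose


def custom_indel_model(rate, length_probs, last_arg):
    """Format an IndelModel(...) line for an ALF parameters file (hypothetical helper).

    Assumption: length_probs is a probability distribution over indel lengths,
    so its entries should sum to 1. rate and last_arg are copied through as-is.
    """
    total = sum(length_probs)
    # Allow for floating-point rounding when summing the probabilities.
    if not isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"indel length probabilities sum to {total}, not 1")
    probs = ",".join(repr(p) for p in length_probs)
    # The distribution must be wrapped in a nested list: note the double brackets.
    return f"IndelModel({rate},'CUSTOM', [[{probs}]], {last_arg})"
```

For example, `custom_indel_model(0.02, [0.5, 0.25, 0.2, 0.05], 20)` returns exactly the `IndelModel` line given in the first note.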
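If you would rather not patch ALF to get around internal node labels, they can be removed from the tree file beforehand instead. This is a minimal Python sketch, assuming a Newick tree whose internal labels (for example support values in a RAxML bipartitions file) are unquoted and contain no `:`, `,`, `;` or parentheses; the function name is illustrative.

```python
import re

# An internal-node label is whatever text directly follows a closing
# parenthesis, up to the next ':' (branch length), ',' , '(' , ')' or ';'.
# Assumption: labels are unquoted and contain none of those characters.
_INTERNAL_LABEL = re.compile(r"\)[^:,;()]+")


def strip_internal_labels(newick: str) -> str:
    """Remove internal-node labels (e.g. support values) from a Newick string."""
    return _INTERNAL_LABEL.sub(")", newick)
```

For example, `((A:0.1,B:0.2)95:0.3,C:0.4)100;` becomes `((A:0.1,B:0.2):0.3,C:0.4);`. Leaf names are untouched because they never directly follow a `)`.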
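The reasoning behind the LGT fix is that when a block of `lgtSize` consecutive genes is moved, its start position cannot be drawn from the whole gene range, or the end of the block can fall past the last gene. The Python sketch below shows the corrected bound in 0-based indexing; it illustrates the idea only and is not a translation of ALF's Darwin code, whose exact indexing convention is not confirmed here. All names are hypothetical.

```python
import random


def choose_block_start(n_genes: int, block_size: int, rng: random.Random) -> int:
    """Pick a 0-based start so that genes[start:start + block_size] fits in the genome.

    Illustrative only; names are hypothetical and not taken from ALF.
    """
    if not 0 < block_size <= n_genes:
        raise ValueError("block must be non-empty and no longer than the genome")
    # randint is inclusive at both ends, so the largest valid start is
    # n_genes - block_size; drawing up to n_genes - 1 could overrun the end.
    return rng.randint(0, n_genes - block_size)
```

This mirrors the fix above: the upper limit of the random draw is reduced by the block size rather than left at the full gene count.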

I have also written some helper scripts, which can be found at https://github.com/johnlees/bioinformatics/tree/master/sequence_evolution/ALF

  • gff2darwin.pl: Helps convert gff annotation files to custom input starting sequences
  • alf_db_to_fasta.pl: Converts the DB output format into a single FASTA contig for an organism -> observed organism genome
  • alf_msa_concat.pl: Converts MSA output (which is by gene) into true alignments by organism -> true alignment
  • genes_to_contig.pl: Concatenates all contigs (the output of alf_msa_concat.pl) into a whole genome alignment file -> true alignment for population
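The idea behind turning per-gene alignments into per-organism alignments can be sketched in Python as follows. This is not the Perl script linked above; it assumes each per-gene alignment is FASTA whose header's first word is the organism name (ALF's actual header format may differ), and it pads an organism missing from a gene, for example through gene loss, with gap characters so that every organism's row keeps the same length.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}, using the header's first word as name."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def concat_alignments(gene_alignments, organisms):
    """Concatenate per-gene alignments ({organism: aligned seq}) into one row per organism.

    Organisms absent from a gene get a run of gaps of that gene's alignment width.
    Organisms present in an alignment but not listed in `organisms` are ignored.
    """
    rows = {org: [] for org in organisms}
    for aln in gene_alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences in a gene alignment must all have the same length")
        width = widths.pop()
        for org in organisms:
            rows[org].append(aln.get(org, "-" * width))
    return {org: "".join(parts) for org, parts in rows.items()}
```

For example, concatenating a gene aligned in organisms A and B with a gene present only in A gives B a trailing run of gaps for the second gene.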