MAFFT Module

This module provides a Julia wrapper for MAFFT (Multiple alignment program for amino acid or nucleotide sequences). It provides functions to call MAFFT either with one of several pre-configurations (analogous to the aliases provided by MAFFT, see the MAFFT manpage) or with custom parameters. For general use of MAFFT, consult the MAFFT manual or manpage.

Tested with MAFFT v7.215 (2014/12/17)

Dependencies

  • MAFFT has to be installed
  • Julia Packages
    • FastaIO

Exported functions

mafft

mafft(fasta_in::String, preconfiguration=:default)

Runs MAFFT on the provided FASTA file and returns the alignment in the FastaIO data format. By default, MAFFT is called with the --auto option.

fasta_in: path to FASTA file

preconfiguration: optional; either a pre-configuration symbol (see "Supported pre-configurations (strategies)") or an array of strings with command-line arguments for MAFFT
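As a rough illustration of how a wrapper like this could assemble the MAFFT call, here is a minimal sketch in current Julia syntax. The helper name `build_mafft_cmd` is hypothetical and not part of this module; the sketch only builds the command and does not run it, so it works without MAFFT installed.

```julia
# Hypothetical helper (not part of this module): assemble, but do not run,
# the MAFFT command line for a FASTA file and a list of arguments.
function build_mafft_cmd(fasta_in::AbstractString, args::Vector{String}=["--auto"])
    # Interpolating a vector into a command literal expands it
    # into separate command-line arguments.
    return `mafft $args $fasta_in`
end

cmd = build_mafft_cmd("examples/fasta/il4.fasta", ["--localpair", "--maxiterate", "1000"])
println(cmd)
```

MAFFT writes the alignment to standard output, so a command built this way could be captured with `read(cmd, String)` on a system where MAFFT is installed.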

mafft_from_string

mafft_from_string(fasta_in::String, preconfiguration=:default)

Calls MAFFT with the given FASTA string as input and returns aligned FASTA in the FastaIO dataformat.

fasta_in: FASTA string

preconfiguration: optional; either a pre-configuration symbol (see "Supported pre-configurations (strategies)") or an array of strings with command-line arguments for MAFFT

mafft_from_fasta

mafft_from_fasta(fasta_in, preconfiguration=:default)

Calls MAFFT with the given FASTA in FastaIO format and returns the aligned FASTA in the FastaIO data format.

fasta_in: FASTA in FastaIO format

preconfiguration: optional; either a pre-configuration symbol (see "Supported pre-configurations (strategies)") or an array of strings with command-line arguments for MAFFT

mafft_profile

mafft_profile(group1::String, group2::String)

Group-to-group alignments

group1 and group2 have to be files with alignments. Returns aligned FASTA in the FastaIO dataformat.

mafft_profile_from_string

mafft_profile_from_string(group1::String, group2::String)

Group-to-group alignments with input strings in FASTA format.

group1 and group2 have to be strings with alignments in FASTA format.

mafft_profile_from_fasta

mafft_profile_from_fasta(group1, group2)

Group-to-group alignments with input in FastaIO format.

group1 and group2 have to be alignments in FastaIO format.

Helper functions for aligned FASTA

This module also includes a few helper functions for the FastaIO dataformat (which is returned by the mafft functions of this module).

alignment_length

alignment_length(fasta)

Returns the length of the alignment.

fasta: A FastaIO dataformat object
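To make the idea concrete, here is a minimal sketch of such a helper. It assumes the FastaIO data format is a vector of (description, sequence) tuples, as returned by FastaIO's `readfasta`; the name `alignment_length_sketch` is hypothetical and this is not necessarily how the module implements `alignment_length`.

```julia
# Assumes the FastaIO data format: a vector of (description, sequence) tuples.
# `alignment_length_sketch` is a hypothetical name, not the module's function.
function alignment_length_sketch(fasta)
    isempty(fasta) && return 0
    len = length(fasta[1][2])
    # Every row of an alignment must have the same number of columns.
    all(length(seq) == len for (_, seq) in fasta) ||
        error("sequences differ in length; input is not an alignment")
    return len
end

aligned = [("seq1", "MK-LV"), ("seq2", "MKALV")]
alignment_length_sketch(aligned)  # returns 5
```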

to_aminoacids

to_aminoacids(fasta)

Converts a FastaIO-formatted array into an array of BioSeq AminoAcid.

fasta: A FastaIO dataformat object

print_fasta

print_fasta(fasta)

Prints a FastaIO object in a nicely formatted way to the screen.

fasta: A FastaIO dataformat object

Supported pre-configurations (strategies)

The following MAFFT strategies are supported by built-in pre-configurations, which can be selected by passing the corresponding symbol (given in parentheses) to the function calls. The descriptions were taken from the MAFFT manpage.

  • L-INS-i (:linsi): probably most accurate; recommended for <200 sequences; iterative refinement method incorporating local pairwise alignment information
  • G-INS-i (:ginsi): suitable for sequences of similar lengths; recommended for <200 sequences; iterative refinement method incorporating global pairwise alignment information
  • E-INS-i (:einsi): suitable for sequences containing large unalignable regions; recommended for <200 sequences
  • FFT-NS-i (:fftnsi): iterative refinement method; two cycles only
  • FFT-NS-2 (:fftns): fast; progressive method
  • NW-NS-i (:nwnsi): iterative refinement method without FFT approximation; two cycles only
  • NW-NS-2 (:nwns): fast; progressive method without the FFT approximation
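The list above can be pictured as a lookup table from symbol to command-line arguments. The following is a sketch only: the argument lists follow the alias definitions in the MAFFT manpage, while the names `PRECONFIGURATIONS` and `preconfiguration_args` are hypothetical and may not match how this module stores or resolves its pre-configurations.

```julia
# Hypothetical lookup table: pre-configuration symbol => MAFFT arguments.
# Argument lists follow the alias definitions in the MAFFT manpage;
# the names used here are illustrative and not part of this module.
const PRECONFIGURATIONS = Dict{Symbol,Vector{String}}(
    :default => ["--auto"],
    :linsi   => ["--localpair", "--maxiterate", "1000"],
    :ginsi   => ["--globalpair", "--maxiterate", "1000"],
    :einsi   => ["--ep", "0", "--genafpair", "--maxiterate", "1000"],
    :fftnsi  => ["--retree", "2", "--maxiterate", "2"],
    :fftns   => ["--retree", "2", "--maxiterate", "0"],
    :nwnsi   => ["--retree", "2", "--maxiterate", "2", "--nofft"],
    :nwns    => ["--retree", "2", "--maxiterate", "0", "--nofft"],
)

# Resolve either a symbol or an explicit argument array to a list of arguments.
preconfiguration_args(p::Symbol) = PRECONFIGURATIONS[p]
preconfiguration_args(p::AbstractVector{<:AbstractString}) = String.(p)
```

Dispatching on the argument type keeps the two calling styles (symbol or array of strings) behind one function name.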

Usage

mafft("examples/fasta/il4.fasta")

Runs mafft with the provided fasta file and returns the alignment in FastaIO dataformat. By default mafft is called with the --auto option.

mafft("examples/fasta/il4.fasta", ["--localpair", "--maxiterate", "1000"])

Calls MAFFT with custom arguments. The arguments have to be an array of strings. This call is equivalent to:

mafft("examples/fasta/il4.fasta", :linsi)

References

  • Katoh, Standley 2013 (Molecular Biology and Evolution 30:772-780)
    MAFFT multiple sequence alignment software version 7: improvements in performance and usability.
    (outlines version 7)
  • Kuraku, Zmasek, Nishimura, Katoh 2013 (Nucleic Acids Research 41:W22-W28)
    aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity.
    (describes an interactive sequence collection/selection service by aLeaves, MAFFT and Archaeopteryx)
  • Katoh, Frith 2012 (Bioinformatics 28:3144-3146)
    Adding unaligned sequences into an existing alignment using MAFFT and LAST.
    (describes the --add and --addfragments options)
  • Katoh, Toh 2010 (Bioinformatics 26:1899-1900)
    Parallelization of the MAFFT multiple sequence alignment program.
    (describes the multithread version)
  • Katoh, Asimenos, Toh 2009 (Methods in Molecular Biology 537:39-64)
    Multiple Alignment of DNA Sequences with MAFFT. In Bioinformatics for DNA Sequence Analysis edited by D. Posada
    (outlines DNA alignment methods and several tips including group-to-group alignment and rough clustering of a large number of sequences)
  • Katoh, Toh 2008 (BMC Bioinformatics 9:212)
    Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework.
    (describes RNA structural alignment methods)
  • Katoh, Toh 2008 (Briefings in Bioinformatics 9:286-298)
    Recent developments in the MAFFT multiple sequence alignment program.
    (outlines version 6; Fast Breaking Paper in Thomson Reuters' ScienceWatch)
  • Katoh, Toh 2007 (Bioinformatics 23:372-374)
    PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences.
    (describes the PartTree algorithm)
  • Katoh, Kuma, Toh, Miyata 2005 (Nucleic Acids Res. 33:511-518)
    MAFFT version 5: improvement in accuracy of multiple sequence alignment.
    (describes [ancestral versions of] the G-INS-i, L-INS-i and E-INS-i strategies)
  • Katoh, Misawa, Kuma, Miyata 2002 (Nucleic Acids Res. 30:3059-3066)
    MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.
    (describes the FFT-NS-1, FFT-NS-2 and FFT-NS-i strategies)