MAFFT Module
This module provides a Julia wrapper for MAFFT (Multiple alignment program for amino acid or nucleotide sequences). It provides functions to call MAFFT with different pre-configurations (analogous to the aliases provided by MAFFT; see the MAFFT manpage) or with custom parameters. For general use of MAFFT, consult the MAFFT manual or manpage.
Tested with MAFFT v7.215 (2014/12/17)
Dependencies
- MAFFT must be installed
- Julia packages:
  - FastaIO
Exported functions
mafft
mafft(fasta_in::String, preconfiguration=:default)
Runs MAFFT on the given FASTA file and returns the alignment in the FastaIO data format. By default MAFFT is called with the --auto option.
fasta_in: path to FASTA file
preconfiguration: optional; either a preconfiguration symbol (see Supported pre-configurations) or custom command-line arguments for MAFFT as an array of strings
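A file-based call of this kind can be pictured as building a mafft command line from an argument vector and reading the alignment from MAFFT's standard output. The sketch below only constructs the command and does not run it; build_mafft_cmd is a hypothetical helper, not part of this module, and actually executing the command requires a MAFFT installation.

```julia
# Hypothetical sketch: build (but do not run) the MAFFT command for a FASTA file.
# `build_mafft_cmd` is NOT part of this module's API.
function build_mafft_cmd(fasta_in::AbstractString, args::Vector{String}=["--auto"])
    # Interpolating a Vector into backticks splats its elements into separate
    # arguments, so no shell quoting is involved.
    return `mafft $args $fasta_in`
end

cmd = build_mafft_cmd("examples/fasta/il4.fasta")
println(cmd.exec)  # the argument vector that would be executed

# With MAFFT installed, the alignment could be captured from stdout with:
# aligned_text = read(cmd, String)
```

Passing arguments as a vector rather than a single string keeps each option a separate argv entry, which mirrors the "array of strings" convention used by the module's functions.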
mafft_from_string
mafft_from_string(fasta_in::String, preconfiguration=:default)
Calls MAFFT with the given FASTA string as input and returns the aligned FASTA in the FastaIO data format.
fasta_in: FASTA string
preconfiguration: optional; either a preconfiguration symbol (see Supported pre-configurations) or custom command-line arguments for MAFFT as an array of strings
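Since the MAFFT command line takes an input file path, a string-based wrapper needs somewhere to put the string. One plausible approach, assumed here rather than taken from this module's source, is a temporary file that is always removed afterwards; with_fasta_tempfile is a hypothetical helper.

```julia
# Hypothetical sketch: write a FASTA string to a temporary file, pass the
# path to `f`, and clean up. `with_fasta_tempfile` is NOT part of this module.
function with_fasta_tempfile(f::Function, fasta_str::AbstractString)
    path, io = mktemp()            # temporary file, already open for writing
    try
        write(io, fasta_str)
        close(io)                  # flush before another process reads the file
        return f(path)
    finally
        isopen(io) && close(io)    # in case writing failed
        rm(path; force=true)       # always remove the temporary file
    end
end

fasta = ">seq1\nACGTACGT\n>seq2\nACGTTCGT\n"
contents = with_fasta_tempfile(fasta) do path
    # With MAFFT installed, this could instead be: read(`mafft --auto $path`, String)
    read(path, String)
end
println(contents == fasta)
```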
mafft_from_fasta
mafft_from_fasta(fasta_in, preconfiguration=:default)
Calls MAFFT with the given FASTA data in FastaIO format as input.
fasta_in: FASTA in FastaIO format
preconfiguration: optional; either a preconfiguration symbol (see Supported pre-configurations) or custom command-line arguments for MAFFT as an array of strings
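FastaIO data is assumed here to be a vector of (description, sequence) tuples, the shape returned by FastaIO's readfasta. A minimal sketch of serializing such records back into FASTA text, for example before handing them to MAFFT through a file, could look like this; fasta_records_to_string is a hypothetical helper (FastaIO's own writefasta writes records directly to a file).

```julia
# Hypothetical sketch: serialize (description, sequence) tuples to FASTA text.
# `fasta_records_to_string` is NOT part of this module or of FastaIO.
function fasta_records_to_string(records)
    io = IOBuffer()
    for (desc, seq) in records
        println(io, '>', desc)  # header line
        println(io, seq)        # whole sequence on a single line
    end
    return String(take!(io))
end

records = [("seq1", "ACGTACGT"), ("seq2", "ACGTTCGT")]
print(fasta_records_to_string(records))
```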
mafft_profile
mafft_profile(group1::String, group2::String)
Group-to-group alignment.
group1 and group2 must be paths to files containing alignments. Returns the aligned FASTA in the FastaIO data format.
mafft_profile_from_string
mafft_profile_from_string(group1::String, group2::String)
Group-to-group alignment with FASTA strings as input.
group1 and group2 must be strings containing alignments in FASTA format.
mafft_profile_from_fasta
mafft_profile_from_fasta(group1, group2)
Group-to-group alignment with input in FastaIO format.
group1 and group2 must be in FastaIO format and must already be alignments.
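Because the profile functions expect each group to be an alignment already, a quick sanity check on the inputs is that every sequence, gaps included, has the same length. This sketch assumes (description, sequence) tuples; is_alignment is a hypothetical helper, not part of this module.

```julia
# Hypothetical sketch: an (ungapped or gapped) set of rows only forms an
# alignment if all rows have the same length. `is_alignment` is NOT part of this module.
function is_alignment(records)
    isempty(records) && return false
    n = length(records[1][2])  # length of the first sequence
    return all(length(seq) == n for (_, seq) in records)
end

group1 = [("a", "AC-GT"), ("b", "ACTGT")]
group2 = [("c", "ACGT"), ("d", "ACGTT")]
println(is_alignment(group1))  # true
println(is_alignment(group2))  # false
```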
Helper functions for aligned FASTA
This module also includes a few helper functions for the FastaIO data format (which the mafft functions of this module return).
alignment_length
alignment_length(fasta)
Returns the length of the alignment.
fasta: a FastaIO data format object
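All rows of an alignment span the same number of columns, so the alignment length is the common sequence length. The sketch below illustrates the idea under the same (description, sequence) tuple assumption; alignment_length_sketch is hypothetical and may differ from the module's alignment_length, for instance in how it treats rows of unequal length.

```julia
# Hypothetical sketch of the alignment-length idea; NOT this module's alignment_length.
function alignment_length_sketch(records)
    isempty(records) && return 0
    lens = unique(length(seq) for (_, seq) in records)
    # An alignment must have exactly one distinct row length.
    length(lens) == 1 || error("sequences differ in length; input is not an alignment")
    return lens[1]
end

println(alignment_length_sketch([("a", "AC-GT"), ("b", "ACTGT")]))  # 5
```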
to_aminoacids
to_aminoacids(fasta)
Converts a FastaIO-formatted array into an array of BioSeq AminoAcid.
fasta: a FastaIO data format object
print_fasta
print_fasta(fasta)
Prints a FastaIO object to the screen in a readable format.
fasta: a FastaIO data format object
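A hypothetical pretty-printer in the spirit of print_fasta might print each header followed by its sequence wrapped at a fixed width. The 60-column default and the IO argument are choices made for this sketch, not documented properties of the module's function; slicing by index assumes ASCII sequences.

```julia
# Hypothetical sketch of a FASTA pretty-printer; NOT this module's print_fasta.
# Taking an IO argument makes the output easy to redirect or capture.
function print_fasta_sketch(io::IO, records; width::Int=60)
    for (desc, seq) in records
        println(io, '>', desc)
        # Wrap the sequence into lines of at most `width` characters (ASCII assumed).
        for start in 1:width:length(seq)
            stop = min(start + width - 1, length(seq))
            println(io, seq[start:stop])
        end
    end
end

print_fasta_sketch(stdout, [("seq1", "ACGTACGTAC")]; width=4)
```

With width=4 the example prints the header line followed by ACGT, ACGT and AC on separate lines.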
Supported pre-configurations (strategies)
The following MAFFT strategies are supported by built-in preconfigurations, which are selected by passing the corresponding symbol (given in parentheses) to the function calls. The descriptions were taken from the MAFFT manpage.
- L-INS-i (:linsi): probably most accurate; recommended for <200 sequences; iterative refinement method incorporating local pairwise alignment information
- G-INS-i (:ginsi): suitable for sequences of similar lengths; recommended for <200 sequences; iterative refinement method incorporating global pairwise alignment information
- E-INS-i (:einsi): suitable for sequences containing large unalignable regions; recommended for <200 sequences
- FFT-NS-i (:fftnsi): iterative refinement method; two cycles only
- FFT-NS-2 (:fftns): fast; progressive method
- NW-NS-i (:nwnsi): iterative refinement method without FFT approximation; two cycles only
- NW-NS-2 (:nwns): fast; progressive method without the FFT approximation
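Each strategy alias corresponds to a fixed set of MAFFT options. The table below encodes the equivalences listed in the MAFFT manpage (for example, linsi is mafft --localpair --maxiterate 1000, which matches the Usage section), plus --auto for :default. PRECONFIGURATIONS and mafft_args are hypothetical names; the module may store its preconfigurations differently.

```julia
# Hypothetical sketch: preconfiguration symbols mapped to MAFFT options,
# following the equivalent command lines given in the MAFFT manpage.
const PRECONFIGURATIONS = Dict{Symbol,Vector{String}}(
    :default => ["--auto"],
    :linsi   => ["--localpair", "--maxiterate", "1000"],
    :ginsi   => ["--globalpair", "--maxiterate", "1000"],
    :einsi   => ["--ep", "0", "--genafpair", "--maxiterate", "1000"],
    :fftnsi  => ["--retree", "2", "--maxiterate", "2"],
    :fftns   => ["--retree", "2", "--maxiterate", "0"],
    :nwnsi   => ["--retree", "2", "--maxiterate", "2", "--nofft"],
    :nwns    => ["--retree", "2", "--maxiterate", "0", "--nofft"],
)

# A symbol selects a preconfiguration (unknown symbols throw a KeyError);
# a vector of strings is passed through unchanged as custom arguments.
mafft_args(p::Symbol) = PRECONFIGURATIONS[p]
mafft_args(p::Vector{String}) = p

println(mafft_args(:linsi))
```

Dispatching on the argument type lets one keyword-free parameter accept either form, which matches how the documented functions take either a symbol or an array of strings.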
Usage
mafft("examples/fasta/il4.fasta")
Runs MAFFT on the given FASTA file and returns the alignment in the FastaIO data format. By default MAFFT is called with the --auto option.
mafft("examples/fasta/il4.fasta", ["--localpair", "--maxiterate", "1000"])
Calls MAFFT with custom arguments, which must be given as an array of strings. This call is equivalent to:
mafft("examples/fasta/il4.fasta", :linsi)
References
- Katoh, Standley 2013 (Molecular Biology and Evolution 30:772-780)
  MAFFT multiple sequence alignment software version 7: improvements in performance and usability.
  (outlines version 7)
- Kuraku, Zmasek, Nishimura, Katoh (Nucleic Acids Research 41:W22-W28)
  aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity.
  (describes an interactive sequence collection/selection service by aLeaves, MAFFT and Archaeopteryx)
- Katoh, Frith 2012 (Bioinformatics 28:3144-3146)
  Adding unaligned sequences into an existing alignment using MAFFT and LAST.
  (describes the --add and --addfragments options)
- Katoh, Toh 2010 (Bioinformatics 26:1899-1900)
  Parallelization of the MAFFT multiple sequence alignment program.
  (describes the multithread version)
- Katoh, Asimenos, Toh 2009 (Methods in Molecular Biology 537:39-64)
  Multiple Alignment of DNA Sequences with MAFFT. In: Bioinformatics for DNA Sequence Analysis, edited by D. Posada.
  (outlines DNA alignment methods and several tips including group-to-group alignment and rough clustering of a large number of sequences)
- Katoh, Toh 2008 (BMC Bioinformatics 9:212)
  Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework.
  (describes RNA structural alignment methods)
- Katoh, Toh 2008 (Briefings in Bioinformatics 9:286-298)
  Recent developments in the MAFFT multiple sequence alignment program.
  (outlines version 6; Fast Breaking Paper in Thomson Reuters' ScienceWatch)
- Katoh, Toh 2007 (Bioinformatics 23:372-374)
  PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences.
  (describes the PartTree algorithm)
- Katoh, Kuma, Toh, Miyata 2005 (Nucleic Acids Res. 33:511-518)
  MAFFT version 5: improvement in accuracy of multiple sequence alignment.
  (describes [ancestral versions of] the G-INS-i, L-INS-i and E-INS-i strategies)
- Katoh, Misawa, Kuma, Miyata 2002 (Nucleic Acids Res. 30:3059-3066)
  MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.
  (describes the FFT-NS-1, FFT-NS-2 and FFT-NS-i strategies)