DNAMAN provides fast and optimal alignment methods for aligning a large number of DNA and protein sequences. The multiple alignment function is compatible with many other sequence analysis programs. A multiple sequence editor (MASED) is used to handle alignment results. The editor can produce homology and phylogenetic trees in graphic windows, and export alignment results in different sequence formats.
DNAMAN recognizes eight sequence formats, which are distinguished by keywords or initials at the beginning of each file. Sequence files in these formats can be used for multiple alignment without conversion. If your sequence files do not contain the corresponding keywords or initials, you may add this text to the files and then save them.
DNAMAN accepts the following sequence formats:
DNAMAN format (keyword: ORIGIN)
- Single or multiple sequences
GenBank format (keyword: LOCUS and ORIGIN)
- Single or multiple sequences
EMBL/Swiss Prot formats (keyword: ID)
GCG/MSF format (Initial: PileUp)
CLUSTAL format (Initial: CLUSTAL)
FASTA format (Initial: >)
NBRF/PIR format (Initial: >)
GDE format (Initial: # for DNA, % for protein)
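The keyword and initial rules listed above can be sketched as a small format guesser. This is only an illustration of the idea, not DNAMAN's actual detection code. The function name `guess_sequence_format` is hypothetical, and the rule used to tell NBRF/PIR from FASTA (a two-character type code followed by `;`, as in `>P1;`) is an assumption based on the usual NBRF/PIR header layout, since both formats share the `>` initial.

```python
import re

def guess_sequence_format(text):
    """Guess a sequence file format from its leading keyword or initial."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "unknown"
    first = lines[0].lstrip()
    has_origin = any(ln.startswith("ORIGIN") for ln in lines)

    # Formats identified by the initial characters of the first non-blank line.
    if first.startswith("PileUp"):
        return "GCG/MSF"
    if first.startswith("CLUSTAL"):
        return "CLUSTAL"
    if first.startswith(">"):
        # Assumption: NBRF/PIR headers look like ">P1;NAME".
        return "NBRF/PIR" if re.match(r">[A-Z][A-Z0-9];", first) else "FASTA"
    if first.startswith("#") or first.startswith("%"):
        return "GDE"

    # Formats identified by keywords.
    if first.startswith("LOCUS") and has_origin:
        return "GenBank"
    if re.match(r"ID\s", first):
        return "EMBL/Swiss-Prot"
    if has_origin:
        return "DNAMAN"
    return "unknown"

print(guess_sequence_format(">seq1\nACGT\n"))
print(guess_sequence_format("my sequence\nORIGIN\nacgt\n"))
```

A file that matches none of the rules is reported as "unknown", which corresponds to the case where you would add the keyword or initial by hand.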
If the Fast Alignment option is chosen on the first page of the Multiple Alignment dialog box, DNAMAN performs the multiple alignment with a fast alignment method. DNAMAN first aligns every sequence with every other sequence, then constructs a homology tree from the pairwise alignment results, and finally builds up the multiple alignment by following the homology tree and reusing the pairwise alignments. With this method you can quickly align a large number of DNA or protein sequences. If the sequences have low degrees of divergence, the fast alignment delivers a relatively accurate multiple alignment.
You may choose the Quick Alignment or Dynamic Alignment method for the pairwise alignment on the second page of the dialog box. Quick Alignment performs pairwise alignment of all sequences using the method developed by Wilbur and Lipman (1983, Proc. Natl. Acad. Sci. USA 80:726). Dynamic Alignment performs pairwise alignment of all sequences using a dynamic programming method. Dynamic Alignment aligns sequences more accurately, but may be much slower than Quick Alignment when the number of sequences is large and the sequences are long.
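The step from pairwise results to a homology tree can be sketched as follows. The text does not say which clustering method DNAMAN uses, so this sketch assumes simple average-linkage clustering on pairwise similarity. The helpers `percent_identity` and `build_guide_tree` are hypothetical names, and the similarity measure here is a crude position-by-position identity rather than a real pairwise alignment score.

```python
from itertools import combinations

def percent_identity(a, b):
    """Crude similarity: fraction of identical positions over the shorter length."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0

def build_guide_tree(names, similarity):
    """Average-linkage clustering; returns nested tuples giving the join order."""
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in names]

    def avg_sim(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(similarity[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two most similar clusters first.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_sim(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TTGTACCA"}
similarity = {frozenset((x, y)): percent_identity(seqs[x], seqs[y])
              for x, y in combinations(seqs, 2)}
tree = build_guide_tree(list(seqs), similarity)
print(tree)
```

In this toy input, A and B are the most similar pair and are joined first; C is added afterwards. A progressive alignment would then add sequences in the same order.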
There are four parameters in the Quick Alignment method:
1) Gap penalty is a negative score applied to each gap insertion. The penalty is fixed and does not depend on the size of the gap.
2) K-tuple is the number of consecutive identical residues that count as an exactly matching fragment. Increasing the K-tuple value decreases the sensitivity of the alignment but speeds it up.
- For DNA alignment, the K-tuple size is 1 to 6.
- For protein alignment, the K-tuple size is 1 to 3.
3) No. of Top diagonals is the number of diagonals with the most K-tuple matches that are kept for alignment. Increasing the Top Diagonals value may slightly increase the sensitivity of the alignment but slightly reduces the alignment speed.
4) Window size is the number of diagonals around each of the top diagonals. Increasing the Window size may increase the sensitivity for alignment but slightly reduces the alignment speed.
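To see how K-tuple, Top Diagonals, and Window size interact, here is a minimal sketch of the diagonal-selection idea behind k-tuple methods. It is not DNAMAN's implementation. The function name `top_diagonal_band` is hypothetical, and treating the window as a number of diagonals added on each side of a top diagonal is an assumption.

```python
from collections import Counter, defaultdict

def top_diagonal_band(seq_a, seq_b, ktuple=2, top_diagonals=2, window=1):
    """Return (k-tuple hit counts per diagonal, set of diagonals kept)."""
    # Index every k-tuple of seq_a by its start positions.
    index = defaultdict(list)
    for i in range(len(seq_a) - ktuple + 1):
        index[seq_a[i:i + ktuple]].append(i)

    # A k-tuple match starting at (i, j) lies on diagonal i - j.
    hits = Counter()
    for j in range(len(seq_b) - ktuple + 1):
        for i in index.get(seq_b[j:j + ktuple], ()):
            hits[i - j] += 1

    # Keep the diagonals with the most hits, widened by `window` on each side.
    band = set()
    for diag, _count in hits.most_common(top_diagonals):
        band.update(range(diag - window, diag + window + 1))
    return hits, band

hits, band = top_diagonal_band("GGACGTACGT", "ACGTACGT",
                               ktuple=3, top_diagonals=1, window=1)
print(hits[2], sorted(band))
```

A larger K-tuple produces fewer hits to count, which is why it is faster but less sensitive. More top diagonals or a wider window enlarge the region that is examined afterwards.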
There are three parameters in the Dynamic Alignment method:
1) Gap open penalty is a negative score for opening each gap.
2) Gap extension penalty is a negative score for extending each residue in an existing gap.
- For DNA alignment, the default penalty is 5.
- For protein alignment, the default penalty is 0.1.
3) Weight matrix. You may assign a transition (A->G, C->T) weight for DNA alignment. For protein alignment, you must choose one of the weight matrices for similarity calculation.
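The three parameters can be illustrated with a small global alignment scorer that uses separate gap open and gap extension penalties and a transition weight for DNA. This is a generic affine-gap dynamic programming sketch, not DNAMAN's code. The names `dna_score` and `affine_global_score`, the match, transition, and mismatch values, the `gap_open` default, and the convention that a gap of length n costs `gap_open + n * gap_extend` are all assumptions made for the example.

```python
NEG_INF = float("-inf")
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def dna_score(x, y, match=1.0, transition=0.5, mismatch=0.0):
    """Score a residue pair; transitions (A<->G, C<->T) get partial credit."""
    if x == y:
        return match
    return transition if frozenset((x, y)) in TRANSITIONS else mismatch

def affine_global_score(a, b, gap_open=10.0, gap_extend=5.0, score=dna_score):
    """Best global alignment score with affine gap penalties."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -gap_open - i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = -gap_open - j * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap pays gap_open once; extending pays only gap_extend.
            X[i][j] = max(M[i - 1][j] - gap_open - gap_extend,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open - gap_extend)
            Y[i][j] = max(M[i][j - 1] - gap_open - gap_extend,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open - gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])

print(affine_global_score("ACGT", "ACT", gap_open=2.0, gap_extend=1.0))
```

Because opening a gap costs more than extending one, the scorer prefers a few long gaps over many short ones.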
If you choose the Full Alignment, Profile Alignment, or New Sequences on Profile option on the first page of the Multiple Alignment dialog box, DNAMAN performs the multiple alignment with an optimal alignment method. DNAMAN first performs pairwise alignment of all necessary sequences, then constructs a homology tree from the pairwise alignment results, and finally builds up the multiple alignment by following the homology tree with an optimal group alignment.
There are three types of optimal alignment:
1) Full Alignment. If the sequence files contain one or more multiple alignment profiles, DNAMAN disregards the original alignments in the profiles and realigns all sequences completely.
2) Profile Alignment. Two multiple alignment profiles must be supplied. DNAMAN aligns the two profiles with each other without disturbing the original alignment within each profile.
3) New Sequences on Profile. One multiple alignment profile and one or more sequences are supplied. DNAMAN aligns the sequences to the profile without changing the profile's original alignment. The profile file has to be the first one in the sequence list box.
* A multiple alignment profile is a set of aligned sequences. DNAMAN allows you to export aligned sequences into a text window and save the data as a multiple alignment profile.
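One common way to align a profile "without disturbing" its original alignment is to treat the profile as a block: any new gap is inserted as a whole column across every sequence in the profile, so the existing columns stay intact. The sketch below illustrates that reading only; whether DNAMAN works exactly this way internally is an assumption. The function name `insert_gap_columns` and the sample profile are hypothetical.

```python
def insert_gap_columns(profile, positions, gap="-"):
    """Insert a gap column before each listed column index of the original profile."""
    widths = {len(row) for row in profile.values()}
    if len(widths) != 1:
        raise ValueError("all rows of a profile must have the same length")
    result = {}
    for name, row in profile.items():
        chars = list(row)
        # Insert from right to left so the remaining indices stay valid.
        for pos in sorted(positions, reverse=True):
            chars.insert(pos, gap)
        result[name] = "".join(chars)
    return result

profile = {"s1": "ACG-T", "s2": "AC-GT"}
padded = insert_gap_columns(profile, [2, 5])
for name, row in padded.items():
    print(name, row)
```

Deleting the inserted columns from the result gives back the original profile unchanged, which is the property the Profile Alignment and New Sequences on Profile options rely on.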
In all three methods, DNAMAN constructs a homology tree using the same approach as the Fast Alignment. You may choose the Quick Alignment or Dynamic Alignment method for similarity calculation. After the tree is constructed, dynamic programming is used to optimize the group alignment (Feng and Doolittle, 1987, J. Mol. Evol. 25:351-360; Thompson, et al., 1994, Nucleic Acids Res. 22:4673-4680). This method generates better alignments but can be very slow with long sequences.
There are several parameters in the final multiple alignment:
- For DNA alignment, the default penalty is 5.
- For protein alignment, the default penalty is 0.1.
- BLOSUM (blocks substitution matrix). From Henikoff and Henikoff, 1992, Proc. Natl. Acad. Sci. USA 89:10915. This is the default protein weight matrix.
- PAM (percent accepted mutation matrix). From Dayhoff et al., 1978, in “Atlas of Protein Sequence and Structure”, Vol 5, Suppl.3, pp 345, National Biomedical Research Foundation, Silver Spring, Maryland, USA.
- Identity Matrix. This matrix gives every identical amino acid pair the same weight and gives no credit to mismatched pairs.
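The simplest of these weight matrices can be written out directly. The sketch below builds an identity matrix for the 20 standard amino acids and uses it to score two already aligned, gap-free fragments. The match and mismatch values of 1 and 0 and the names `identity_matrix` and `ungapped_score` are assumptions for illustration. BLOSUM and PAM matrices are looked up the same way but give graded scores to similar, non-identical residues.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def identity_matrix(alphabet=AMINO_ACIDS, match=1, mismatch=0):
    """Build a weight matrix that rewards only identical residue pairs."""
    return {(x, y): (match if x == y else mismatch)
            for x in alphabet for y in alphabet}

def ungapped_score(a, b, matrix):
    """Sum matrix weights over the positions of two equal-length, gap-free strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(matrix[(x, y)] for x, y in zip(a, b))

matrix = identity_matrix()
print(ungapped_score("MKV", "MRV", matrix))
```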