• Multiple sequence alignment
  • DNAMAN provides fast and optimal alignment methods for aligning a large number of DNA and protein sequences. The multiple alignment function is compatible with many other sequence analysis programs. A multiple sequence editor (MASED) is used to handle alignment results. The editor can produce homology and phylogenetic trees in graphic windows, and export alignment results in different sequence formats.



    1. Performing multiple alignment


    2. Input formats for multiple alignment
    3. DNAMAN recognizes 8 sequence formats, distinguished by the keywords or initial characters at the beginning of each file. Sequence files in these formats can be used for multiple alignment without conversion. If your sequence files do not contain the corresponding keywords or initials, you may add this text to the files and then save them.

      DNAMAN accepts the following sequence formats:

      DNAMAN format (keyword: ORIGIN)

      - Single or multiple sequences

      GenBank format (keywords: LOCUS and ORIGIN)

      - Single or multiple sequences

      EMBL/Swiss-Prot formats (keyword: ID)

      GCG/MSF format (Initial: PileUp)

      CLUSTAL format (Initial: CLUSTAL)

      FASTA format (Initial: >)

      NBRF/PIR format (Initial: >)

      GDE format (Initial: # for DNA, % for protein)
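
As an illustration of how such keyword-based recognition could work, here is a minimal Python sketch. It is not DNAMAN's own code: the function name `guess_format` is hypothetical, and the rule for telling NBRF/PIR from FASTA (a two-character type code followed by a semicolon, as in `>P1;`) is an assumption based on the usual PIR header layout, since both formats start with `>`.

```python
def guess_format(text):
    """Guess a sequence file format from its leading keyword or initial."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "unknown"
    first = lines[0]
    if first.startswith("LOCUS"):
        return "GenBank"
    if first.startswith("ID "):
        return "EMBL/Swiss-Prot"
    if first.startswith("PileUp"):
        return "GCG/MSF"
    if first.startswith("CLUSTAL"):
        return "CLUSTAL"
    if first.startswith(">"):
        # Assumption: NBRF/PIR headers look like ">P1;name";
        # any other ">" header is treated as FASTA.
        if len(first) > 3 and first[3] == ";":
            return "NBRF/PIR"
        return "FASTA"
    if first[0] in "#%":
        # "#" marks DNA and "%" marks protein in GDE files.
        return "GDE"
    # DNAMAN format: an ORIGIN keyword without a leading LOCUS line.
    if any(ln.startswith("ORIGIN") for ln in lines):
        return "DNAMAN"
    return "unknown"
```

A file that returns "unknown" here corresponds to the case described above, where the keyword or initial has to be added by hand before the file can be aligned.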



    4. Fast alignment method
    5. If the Fast Alignment option is chosen on the first page of the Multiple Alignment dialog box, DNAMAN performs the multiple alignment with a fast alignment method. With this method, DNAMAN aligns every pair of sequences, constructs a homology tree from the pairwise alignment results, and finally builds up the multiple alignment along the homology tree, reusing the pairwise alignment results. Using the fast alignment method, you can quickly align a large number of DNA or protein sequences. If the sequences have low degrees of divergence, the fast alignment delivers a relatively accurate multiple alignment.
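
The tree-building step can be sketched in Python as follows. This is only an illustration under stated assumptions: DNAMAN's actual clustering method is not described here, so the sketch uses plain average-linkage clustering, and `identity` is a hypothetical placeholder score (ungapped fraction of matching positions) standing in for a real pairwise alignment score.

```python
from itertools import combinations

def identity(a, b):
    """Placeholder pairwise score: ungapped fraction of identical positions."""
    longest = max(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / longest if longest else 0.0

def guide_tree(seqs):
    """Average-linkage clustering on pairwise similarity.

    seqs maps names to sequences; the result is a nested tuple of names.
    """
    # Each cluster key (a name or a nested tuple) maps to its member names.
    clusters = {name: [name] for name in seqs}
    sim = {frozenset(p): identity(seqs[p[0]], seqs[p[1]])
           for p in combinations(seqs, 2)}

    def cluster_sim(c1, c2):
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(sim[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two most similar clusters into one node of the tree.
        c1, c2 = max(combinations(clusters, 2), key=lambda p: cluster_sim(*p))
        clusters[(c1, c2)] = clusters.pop(c1) + clusters.pop(c2)
    return next(iter(clusters))
```

The nesting of the returned tuple gives the order in which sequences would be joined: the innermost pairs are the most similar and are combined first.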

      You may choose the Quick Alignment or Dynamic Alignment method for the pairwise alignment on the second page of the dialog box. Quick Alignment performs pairwise alignment of all sequences using the method developed by Wilbur and Lipman (1983, Proc. Natl. Acad. Sci. USA 80:726). Dynamic Alignment performs pairwise alignment of all sequences using a dynamic programming method. The Dynamic Alignment method aligns sequences more accurately, but may be much slower than Quick Alignment when the number of sequences is large and the sequences are long.

      There are four parameters in the Quick Alignment method:

      1) Gap penalty is a negative score for each gap insertion. It is a fixed penalty, independent of the size of the gap.

      2) K-tuple defines the minimum number of consecutive identical residues that count as an exactly matching fragment. Increasing the K-tuple value decreases alignment sensitivity but speeds up the alignment.

      - For DNA alignment, the K-tuple size is 1 to 6.

      - For protein alignment, the K-tuple size is 1 to 3.

      3) No. of Top Diagonals defines how many of the diagonals with the most K-tuple matches are used. Increasing the Top Diagonals value may slightly increase alignment sensitivity but slightly reduces alignment speed.

      4) Window size is the number of diagonals around each of the top diagonals. Increasing the Window size may increase the sensitivity for alignment but slightly reduces the alignment speed.
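
To make the K-tuple, Top Diagonals, and Window size parameters concrete, here is a minimal Python sketch of diagonal counting in the style of a K-tuple search. It is not DNAMAN's implementation: the function name, the default values, and the choice to widen each top diagonal by `window` diagonals on both sides are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def top_diagonals(a, b, ktuple=2, n_top=4, window=4):
    """Count exact k-tuple matches per diagonal and keep the best diagonals."""
    # Index every k-tuple of sequence a by its start position.
    positions = defaultdict(list)
    for i in range(len(a) - ktuple + 1):
        positions[a[i:i + ktuple]].append(i)

    # A match between a[i:] and b[j:] lies on diagonal i - j.
    hits = Counter()
    for j in range(len(b) - ktuple + 1):
        for i in positions.get(b[j:j + ktuple], ()):
            hits[i - j] += 1

    best = [d for d, _ in hits.most_common(n_top)]
    # Assumption: the window extends `window` diagonals on each side.
    bands = [(d - window, d + window) for d in best]
    return hits, bands
```

A larger `ktuple` produces fewer, more specific matches, which is why it trades sensitivity for speed; larger `n_top` and `window` values enlarge the region that a later alignment step would have to examine.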

      There are three parameters in the Dynamic Alignment method:

      1) Gap open penalty is a negative score for opening each gap.

      2) Gap extension penalty is a negative score for each residue by which an existing gap is extended.

      - For DNA alignment, the default penalty is 5.

      - For protein alignment, the default penalty is 0.1.

      3) Weight matrix. You may assign a transition (A->G, C->T) weight for DNA alignment. For protein alignment, you must choose one of the weight matrices for similarity calculation.
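
The interplay of the three parameters can be sketched with a small affine-gap dynamic programming routine in Python. This is an illustrative sketch, not DNAMAN's code. It assumes that a gap of length L costs `gap_open + L * gap_ext`, that sequences are uppercase A/C/G/T, and that the DNA weights in `dna_score` (including the 10.0 default for `gap_open`) are hypothetical values chosen for the example.

```python
NEG = float("-inf")
PURINES = {"A", "G"}

def dna_score(x, y, match=1.0, transition=0.5, transversion=0.0):
    """Hypothetical DNA weights; transitions are A<->G and C<->T."""
    if x == y:
        return match
    same_class = (x in PURINES) == (y in PURINES)
    return transition if same_class else transversion

def affine_score(a, b, gap_open=10.0, gap_ext=5.0, score=dna_score):
    """Global alignment score with affine gaps (three-state DP)."""
    n, m = len(a), len(b)
    # M: residues aligned; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -gap_open - i * gap_ext
    for j in range(1, m + 1):
        Y[0][j] = -gap_open - j * gap_ext
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = score(a[i - 1], b[j - 1])
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap pays gap_open plus one extension; extending pays gap_ext.
            X[i][j] = max(M[i - 1][j] - gap_open - gap_ext,
                          Y[i - 1][j] - gap_open - gap_ext,
                          X[i - 1][j] - gap_ext)
            Y[i][j] = max(M[i][j - 1] - gap_open - gap_ext,
                          X[i][j - 1] - gap_open - gap_ext,
                          Y[i][j - 1] - gap_ext)
    return max(M[n][m], X[n][m], Y[n][m])
```

For protein alignment, the `score` argument would be replaced by a lookup in the chosen weight matrix; the recurrence itself stays the same.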



    6. Optimal Alignment
    7. If you choose the Full Alignment, Profile Alignment, or New Sequences on Profile option on the first page of the Multiple Alignment dialog box, DNAMAN performs the multiple alignment with an optimal alignment method. DNAMAN first performs pairwise alignment of all necessary sequences, constructs a homology tree from the pairwise alignment results, and finally builds up the alignment along the homology tree using optimal group alignment.

      There are three types of optimal alignment:

      * A multiple alignment profile is a set of aligned sequences. DNAMAN allows you to export aligned sequences into a text window and save the data as a multiple alignment profile.

      In all three methods, DNAMAN constructs a homology tree using the same approach as the Fast Alignment. You may choose the Quick Alignment or Dynamic Alignment method for similarity calculation. After the tree is constructed, dynamic programming is used to optimize the group alignment (Feng and Doolittle, 1987, J. Mol. Evol. 25:351-360; Thompson et al., 1994, Nucleic Acids Res. 22:4673-4680). This method generates better alignments but can be very slow with long sequences.
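
Group alignment differs from pairwise alignment mainly in that the dynamic programming compares columns of two existing alignments rather than single residues. The Python sketch below shows one common way to score a pair of columns, by averaging residue-to-residue scores over all cross pairs. The function names, the match and mismatch values, and the choice to score any pair involving a gap as zero are assumptions for illustration, not a description of DNAMAN's scoring.

```python
def columns(alignment):
    """Turn a list of equal-length aligned rows into a list of columns."""
    return list(zip(*alignment))

def column_score(col_a, col_b, match=1.0, mismatch=0.0):
    """Average pairwise score between two alignment columns.

    Pairs that involve a gap character contribute zero (an assumption).
    """
    total = 0.0
    for x in col_a:
        for y in col_b:
            if x != "-" and y != "-":
                total += match if x == y else mismatch
    return total / (len(col_a) * len(col_b))
```

A score of this kind would take the place of the single-residue score in the pairwise recurrence, so the cost of each step grows with the number of sequences in the two groups, which is consistent with this method being slow for large inputs.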

      There are several parameters in the final multiple alignment: