• Sequence Assembly
  • This chapter explains how to assemble DNA fragments and edit contigs derived from assembly.



    Assembly of DNA fragments
      Choosing the Sequence | Sequence Assembly command opens the Sequence Assembly dialog box. For best results, choose assembly parameters suited to your data.



    Sequence sources
      Sequence fragment data can be retrieved from sequence files, sequence channels and DNA databases. Press the Add File button to retrieve sequences from disk files, the Folder button to retrieve all files in a folder, the Channel button to retrieve sequences from channels, or the Database button to retrieve them from the default database. To remove a sequence from the list, select it and press the Remove button; to remove all sequences in the list, press the Clear button.

      The sequence files for assembly should be in DNAMAN format with the ORIGIN keyword (see Section IV.3). If any of your sequence files do not use the ORIGIN format, check the Load entire file if format unknown option. Trace files (ABI and SCF) from automated sequencing are also accepted for sequence assembly.

      DNAMAN can remove uncertain regions at the ends of DNA fragments. To do this, check the Remove flanking regions when ACGT% < option and set a percentage. When this option is turned on, DNAMAN calculates the ACGT% (the percentage of A, C, G and T bases) of the flanking regions of all input sequences and removes flanking regions whose ACGT% is below the value you set.
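
      To make the ACGT% criterion concrete, here is a minimal Python sketch under stated assumptions: the sliding-window size, the one-base-at-a-time trimming from each end, and the names `acgt_percent` and `trim_flanks` are all hypothetical and are not DNAMAN's documented implementation.

```python
# Hypothetical sketch of ACGT%-based trimming of flanking regions.
# The window size and end-scanning strategy are assumptions, not DNAMAN's documented method.

def acgt_percent(segment: str) -> float:
    """Percentage of unambiguous bases (A, C, G, T) in a segment."""
    if not segment:
        return 0.0
    clean = sum(1 for base in segment.upper() if base in "ACGT")
    return 100.0 * clean / len(segment)


def trim_flanks(seq: str, min_acgt: float, window: int = 20) -> str:
    """Trim each end while the window at that end has an ACGT% below min_acgt."""
    start, end = 0, len(seq)
    # Advance the 5' end one base at a time while the leading window is low quality.
    while end - start >= window and acgt_percent(seq[start:start + window]) < min_acgt:
        start += 1
    # Pull back the 3' end the same way, using the trailing window.
    while end - start >= window and acgt_percent(seq[end - window:end]) < min_acgt:
        end -= 1
    return seq[start:end]
```

      For example, with `window=10` and `min_acgt=100`, a read made of five N bases, forty clean bases and four N bases is trimmed to the forty clean bases. A sequence that is low quality throughout is cut down to fewer than `window` bases rather than discarded.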

      You can also remove vector sequence from all source sequences. To do this, load the vector sequence into the default sequence channel. If the source sequences come from several vectors, combine all the vectors in one sequence file and load that file into the default sequence channel. When this option is turned on, DNAMAN compares every input sequence against the vector and removes flanking regions that contain vector sequence. The shorter the vector sequence, the faster the comparison, so you can speed up the process by deleting unneeded sequence from the vector.
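
      The vector comparison can be illustrated with a simple exact-match approach. The sketch below is an assumption, not DNAMAN's algorithm: it collects every k-mer of the vector on both strands (k = 12 is an arbitrary choice) and trims each end of a read for as long as consecutive read k-mers are found in that set. Building the k-mer set takes time proportional to the vector length, which mirrors the point that a shorter vector makes the comparison faster. All function names are hypothetical.

```python
# Hypothetical sketch of vector trimming by exact k-mer matching.
# k, the two-strand k-mer set and the "run up to the end" rule are assumptions.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def vector_kmers(vector: str, k: int) -> set[str]:
    """All k-mers of the vector on both strands; a shorter vector gives a smaller set."""
    kmers = set()
    for strand in (vector.upper(), reverse_complement(vector.upper())):
        for i in range(len(strand) - k + 1):
            kmers.add(strand[i:i + k])
    return kmers


def trim_vector(read: str, vector: str, k: int = 12) -> str:
    """Remove vector-derived sequence that runs up to either end of the read."""
    kmers = vector_kmers(vector, k)
    seq = read.upper()
    n = len(seq)
    # 5' end: advance while the k-mers starting at 0, 1, 2, ... occur in the vector.
    p = 0
    while p + k <= n and seq[p:p + k] in kmers:
        p += 1
    start = p - 1 + k if p > 0 else 0
    # 3' end: same test on the k-mers ending at n, n-1, n-2, ...
    q = n
    while q - k >= start and seq[q - k:q] in kmers:
        q -= 1
    end = q + 1 - k if q < n else n
    return read[start:end]
```

      Because only exact k-mer matches are used, vector sequence that contains sequencing errors, or that is separated from the read end by a few non-vector bases, is not removed by this sketch; a more tolerant comparison would be needed in those cases.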

      When you work on a sequencing project, it is a good idea to create a database for the project, in which you can save all of the project's sequences and vectors. Keeping them in a database keeps the project well organized and makes sequence assembly easier.



    Methods of sequence assembly
      You can choose either the Quick Alignment or the End Comparison method for sequence assembly. The Quick Alignment method is recommended in most cases.

      The End Comparison method progressively compares the ends of the fragments. No insertions are introduced at the sequence ends, and overlaps are accepted according to the qualification criteria. Use the End Comparison method only for a small number of fragments whose overlaps are short and of high quality.
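
      A gapless comparison of two fragment ends might look like the following Python sketch. `min_overlap` and `min_identity` are hypothetical stand-ins for the qualification criteria, whose actual names and defaults are not given here, and the sketch checks only one orientation (the 3' end of `left` against the 5' end of `right`).

```python
# Hypothetical sketch of a gapless end-to-end overlap test.
# min_overlap and min_identity stand in for the qualification criteria (assumed names).

def end_overlap(left: str, right: str, min_overlap: int = 20, min_identity: float = 0.9) -> int:
    """Length of the longest qualifying overlap between the 3' end of left and
    the 5' end of right, compared base by base without gaps; 0 if none qualifies."""
    left, right = left.upper(), right.upper()
    longest = min(len(left), len(right))
    # Try the longest possible overlap first and shorten it one base at a time.
    # max(..., 1) avoids testing a zero-length overlap.
    for length in range(longest, max(min_overlap, 1) - 1, -1):
        suffix, prefix = left[-length:], right[:length]
        matches = sum(1 for a, b in zip(suffix, prefix) if a == b)
        if matches / length >= min_identity:
            return length
    return 0
```

      Because no gaps are allowed, a single insertion or deletion inside the overlap shifts every following base out of register, which is why this kind of comparison suits short, high-quality overlaps.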

      With the Quick Alignment method, all sequences are aligned on both the plus and minus strands using the quick alignment algorithm (Wilbur and Lipman, 1983, Proc. Natl. Acad. Sci. USA, 80:726-730). The sequences are then assembled according to the qualification criteria, and inserts are added in overlapping regions where necessary. This method can find the correct contigs even when the sequences contain ambiguities.
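
      The core of the Wilbur and Lipman approach is counting short exact word matches (k-tuples) along each diagonal of the comparison matrix. The simplified Python sketch below shows only that word-matching stage, applied to the plus strand and to the reverse complement (minus strand) of the second sequence. It omits the gapped alignment that follows, and k = 4 and all function names are illustrative assumptions.

```python
# Simplified sketch of the k-tuple (word) stage of a Wilbur-Lipman style comparison.
# Only the diagonal scoring is shown; gapped alignment of the best region is omitted.
from collections import Counter, defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def best_diagonal(a: str, b: str, k: int = 4) -> tuple[int, int]:
    """Return (diagonal offset i - j, number of shared k-tuples) for the best diagonal."""
    positions = defaultdict(list)          # k-tuple -> start positions in a
    for i in range(len(a) - k + 1):
        positions[a[i:i + k]].append(i)
    diagonals = Counter()                  # offset i - j -> number of word hits
    for j in range(len(b) - k + 1):
        for i in positions.get(b[j:j + k], ()):
            diagonals[i - j] += 1
    if not diagonals:
        return 0, 0
    offset, hits = diagonals.most_common(1)[0]
    return offset, hits


def compare_both_strands(a: str, b: str, k: int = 4) -> tuple[str, int, int]:
    """Score b against a as given (plus) and as its reverse complement (minus)."""
    a, b = a.upper(), b.upper()
    plus = best_diagonal(a, b, k)
    minus = best_diagonal(a, reverse_complement(b), k)
    if minus[1] > plus[1]:
        return ("minus", *minus)
    return ("plus", *plus)
```

      The offset `i - j` tells you where the second sequence starts relative to the first. A single mismatch or ambiguous base destroys at most k of the words on a diagonal, so a true overlap still scores well; this robustness is one reason word-based searches tolerate ambiguities.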

      Parameters involved are:



    Assembly analysis
      Click the Assemble button to start searching for overlapping sequences. DNAMAN first performs pairwise comparisons of all input sequences, updating the results as it goes, and displays the number of overlaps found so far. Sequence assembly runs in a separate thread, so you can perform other tasks while it is in progress.

      There are five options for processing sequence assembly.

      You can stop the pairwise comparison at any time during the assembly process by pressing the Stop button.
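
      A threaded, stoppable all-against-all comparison can be sketched in Python as below. The class name `PairwiseSearch` and the caller-supplied `overlaps` test are hypothetical; the sketch only illustrates running the comparisons on a worker thread, reporting a running count, and honoring a stop request between comparisons.

```python
# Hypothetical sketch: all-against-all comparison on a worker thread,
# with a running overlap count and a stop flag (the role of a Stop button).
import itertools
import threading
from typing import Callable


class PairwiseSearch:
    def __init__(self, seqs: list[str], overlaps: Callable[[str, str], bool]):
        self.seqs = seqs
        self.overlaps = overlaps              # caller-supplied pairwise overlap test
        self.found: list[tuple[int, int]] = []
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()                      # checked before each comparison

    def join(self) -> None:
        self._thread.join()

    def count(self) -> int:
        with self._lock:                      # safe to poll from another thread
            return len(self.found)

    def _run(self) -> None:
        # Compare every unordered pair of sequences exactly once.
        for i, j in itertools.combinations(range(len(self.seqs)), 2):
            if self._stop.is_set():
                return
            if self.overlaps(self.seqs[i], self.seqs[j]):
                with self._lock:
                    self.found.append((i, j))
```

      Usage: create the object with your sequences and any pairwise overlap test, call `start()`, poll `count()` to update a progress display, and call `stop()` to end the search early; pairs found before the stop remain in `found`.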

      If the sequences cannot all be assembled into one continuous sequence, they may be divided into separate groups, which are also indicated during the merge process. Isolated sequences, which overlap no other sequence, are left out of the assembly result. Click the Show Result button to display the result in a sequence assembly editor.
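
      Once the pairwise overlaps are known, dividing the sequences into groups amounts to finding connected components. The Python sketch below uses union-find for this. It is an illustrative assumption, not DNAMAN's documented procedure, and it only groups sequence indices without building the contig layout or consensus.

```python
# Hypothetical sketch: group sequences from pairwise overlaps with union-find.
# Sequences that overlap nothing are dropped, as isolated sequences are.

def group_overlaps(n: int, overlaps: list[tuple[int, int]]) -> list[list[int]]:
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving keeps trees shallow
            x = parent[x]
        return x

    # Merge the groups of every overlapping pair.
    for a, b in overlaps:
        parent[find(a)] = find(b)

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Keep only groups of two or more sequences; singletons are isolated.
    return [members for members in groups.values() if len(members) > 1]
```

      For example, `group_overlaps(6, [(0, 1), (1, 2), (3, 4)])` returns `[[0, 1, 2], [3, 4]]`; sequence 5 overlaps nothing and is left out.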



    Sequence assembly editor
      The Sequence Assembly Editor facilitates the visualization and editing of assembled sequences. There are three windows in a sequence assembly editor: Name list window, Sequence window and Graphic window.