This chapter explains how to assemble DNA fragments and edit contigs derived from assembly.
Choosing the Sequence | Sequence Assembly command opens the Sequence Assembly dialog box. For best results, choose parameters appropriate to your data before starting the assembly.
Sequence fragment data can be retrieved from sequence files, sequence channels and DNA databases. Press the Add File button to load sequences from disk files, the Folder button to load all files in a folder, the Channel button to load sequences from channels, or the Database button to load sequences from the default database. To remove a sequence from the list, select it and press the Remove button; to remove all sequences, press the Clear button.
The sequence files for assembly should be in DNAMAN format with the ORIGIN keyword (see Section IV.3). If any of your sequence files is not in this format, check the Load entire file if format unknown option. Trace files (ABI and SCF) from automated sequencing are also accepted for sequence assembly.
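The following Python sketch illustrates the difference between the two loading modes. It assumes a GenBank-style layout in which sequence lines follow a line starting with ORIGIN and end at a `//` line, with position numbers and spaces mixed in. The function name `read_origin_sequence` is hypothetical, and this is not DNAMAN's actual parser.

```python
import re


def read_origin_sequence(text, load_entire_if_unknown=False):
    """Hypothetical reader; assumes a GenBank-style 'ORIGIN ... //' layout."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().upper().startswith("ORIGIN"):
            body = []
            for seq_line in lines[i + 1:]:
                if seq_line.strip().startswith("//"):
                    break  # end-of-record marker
                body.append(seq_line)
            # Strip position numbers, spaces and punctuation; keep residue letters.
            return re.sub(r"[^A-Za-z]", "", "".join(body))
    if load_entire_if_unknown:
        # Fallback mode: treat every letter in the file as sequence.
        return re.sub(r"[^A-Za-z]", "", text)
    raise ValueError("no ORIGIN keyword found")
```

Without the fallback flag, a file lacking ORIGIN is rejected; with it, every letter in the file is taken as sequence, which is why the option should only be used for files that contain nothing but sequence.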
DNAMAN can remove uncertain regions at the ends of a DNA fragment. To do so, check the remove flanking regions when ACGT% < option and enter a percentage. When this option is turned on, DNAMAN calculates the base composition of the flanking regions of all input sequences and removes flanking regions whose ACGT% is below the set value.
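One way to picture this kind of trimming is a sliding window: compute ACGT% (the percentage of A, C, G and T among all characters, so ambiguity codes such as N lower it) for each window and keep only the span from the first to the last window that meets the threshold. The sketch below assumes a fixed window length (the hypothetical `window` parameter); DNAMAN's actual definition of a flanking region may differ.

```python
def acgt_percent(segment):
    """Percentage of unambiguous bases (A, C, G, T) in a segment."""
    if not segment:
        return 0.0
    good = sum(1 for base in segment.upper() if base in "ACGT")
    return 100.0 * good / len(segment)


def trim_flanks(seq, min_acgt, window=20):
    """Keep the span from the first to the last window with ACGT% >= min_acgt.

    `window` is a hypothetical parameter chosen for illustration.
    """
    if len(seq) < window:
        return seq if acgt_percent(seq) >= min_acgt else ""
    good_starts = [i for i in range(len(seq) - window + 1)
                   if acgt_percent(seq[i:i + window]) >= min_acgt]
    if not good_starts:
        return ""  # no window qualifies: the whole fragment is uncertain
    first, last = good_starts[0], good_starts[-1]
    return seq[first:last + window]
```

A stricter threshold trims more: with a 10-base window, a 100% threshold removes every N from `"NNNNN" + "ACGT" * 5 + "NNNNN"`, while 80% leaves two Ns at each end.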
You can also remove vector sequence from all source sequences. To do so, you must load the vector sequence into the default sequence channel. If the source sequences come from several vectors, combine all vector sequences in one sequence file and load it into the default sequence channel. When this option is turned on, DNAMAN compares all input sequences against the vector and removes flanking regions that contain vector sequence. The shorter the vector sequence, the faster the comparison, so you can speed up the process by deleting unneeded parts of the vector sequence.
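A simplified way to picture vector removal is to find the longest prefix and the longest suffix of a read that occur verbatim in the vector (on either strand) and cut them off. The sketch below uses exact matching and a hypothetical `min_match` length to avoid trimming on short chance matches; a real implementation would also tolerate sequencing errors. Each check scans the whole vector, which is why a shorter vector sequence makes the comparison faster.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def trim_vector(read, vector, min_match=12):
    """Remove vector sequence from both ends of `read` (exact-match sketch)."""
    targets = (vector.upper(), reverse_complement(vector.upper()))
    up = read.upper()

    def in_vector(fragment):
        # Each test scans the full vector, so cost grows with vector length.
        return any(fragment in target for target in targets)

    start = 0  # end of the longest read prefix found in the vector
    for length in range(len(up), min_match - 1, -1):
        if in_vector(up[:length]):
            start = length
            break
    end = len(up)  # start of the longest remaining suffix found in the vector
    for length in range(len(up) - start, min_match - 1, -1):
        if in_vector(up[len(up) - length:]):
            end = len(up) - length
            break
    return read[start:end]
```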
When you work on a sequencing project, we suggest creating a database for it. You can save all of the project's sequences and vectors in the database, which keeps the project well organized and simplifies sequence assembly.
You can choose either the Quick Alignment or the End Comparison method for sequence assembly. Quick Alignment is recommended in most cases.
The End Comparison method progressively compares the ends of each fragment. No insertions are introduced at the sequence ends, and overlaps are accepted according to the qualification criteria. Use the End Comparison method only when there are few fragments, the overlaps are short, and the overlap quality is high.
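An ungapped end comparison can be sketched as follows: slide the start of one fragment along the end of another without introducing gaps, and accept the longest overlap that meets a minimum length and identity. Here `min_overlap` and `min_identity` are hypothetical stand-ins for the qualification criteria, not DNAMAN's parameter names.

```python
def end_overlap(left, right, min_overlap=20, min_identity=0.9):
    """Longest ungapped overlap of the end of `left` with the start of `right`.

    Returns the overlap length, or 0 if no overlap meets both criteria.
    """
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        a = left[-length:]
        b = right[:length]
        matches = sum(x == y for x, y in zip(a, b))
        if matches / length >= min_identity:
            return length
    return 0
```

Because no gaps are allowed, a single insertion or deletion in the overlap shifts every following base out of register, which is why this approach suits short, high-quality overlaps.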
With the Quick Alignment method, sequences are aligned using the quick alignment algorithm (Wilbur and Lipman, 1983, Proc. Natl. Acad. Sci. USA, 80:726-730), comparing all sequences on both the plus and minus strands. The sequences are assembled according to the qualification criteria, and insertions are added in overlapping regions if necessary. This method can find the correct contigs even when the sequences contain ambiguities.
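The core idea of the Wilbur and Lipman approach is to look up short words (k-tuples) shared by two sequences and count them per diagonal; a diagonal with many shared words marks a likely overlap, and repeating the search against the reverse complement covers the minus strand. The sketch below shows only this word-counting step, with a hypothetical word size `k=4`. The published algorithm adds windowed diagonal scoring and a final alignment, and DNAMAN's implementation details are not documented here.

```python
from collections import Counter, defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def best_diagonal(a, b, k=4):
    """Return (diagonal, hits): the offset i - j with the most shared k-tuples."""
    index = defaultdict(list)  # k-tuple -> positions in a
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    diagonals = Counter()
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            diagonals[i - j] += 1
    if not diagonals:
        return None, 0
    return diagonals.most_common(1)[0]


def compare_both_strands(a, b, k=4):
    """Compare b and its reverse complement with a; report the better strand."""
    a = a.upper()
    plus = best_diagonal(a, b.upper(), k)
    minus = best_diagonal(a, revcomp(b), k)
    return ("+",) + plus if plus[1] >= minus[1] else ("-",) + minus
```

For example, a fragment that starts 5 bases into `a` is reported as `("+", 5, hits)`, while the reverse complement of a piece of `a` is reported on the `"-"` strand.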
Parameters involved are:
Click the Assemble button to start searching for overlapping sequences. DNAMAN first performs pairwise comparisons of all input sequences and continuously updates the results, showing the number of overlaps found so far. Sequence assembly runs in a separate thread, so you can perform other tasks while it is in progress.
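Comparing every input sequence with every other one means n(n-1)/2 pairs for n fragments, each tested in both orders. The sketch below uses a simple exact suffix/prefix overlap as a stand-in for the real comparison (an assumption for brevity) and reports the running number of overlaps through an optional, hypothetical `on_progress` callback, similar to the way the dialog box updates its count.

```python
from itertools import combinations


def exact_overlap(left, right, min_len):
    """Length of the longest suffix of `left` equal to a prefix of `right`."""
    floor = max(min_len, 1)  # a zero-length slice would match trivially
    for length in range(min(len(left), len(right)), floor - 1, -1):
        if left[-length:] == right[:length]:
            return length
    return 0


def find_overlaps(fragments, min_len=10, on_progress=None):
    """Compare every pair of fragments in both orders; return (left, right, length)."""
    found = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(fragments.items(), 2):
        for left, right, l_seq, r_seq in ((name_a, name_b, seq_a, seq_b),
                                          (name_b, name_a, seq_b, seq_a)):
            length = exact_overlap(l_seq, r_seq, min_len)
            if length:
                found.append((left, right, length))
                if on_progress:
                    on_progress(len(found))  # e.g. refresh an overlap counter
    return found
```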
There are five options for processing sequence assembly.
You may stop pairwise comparison any time during the assembly process by pressing the Stop button.
If the sequences cannot all be assembled into one continuous sequence, they may be divided into separate groups, which are also indicated during the merge process. Isolated sequences (those that overlap no other sequence) are left out of the assembly result. Click the Show Result button to display the result in a sequence assembly editor.
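Dividing fragments into groups amounts to finding the connected components of the overlap graph: fragments linked by any chain of overlaps form one group (a contig), and a fragment with no overlaps stays isolated. The union-find sketch below assumes the overlapping pairs have already been found; the function name and return format are illustrative only.

```python
def group_fragments(names, overlaps):
    """Split fragments into overlap groups (contigs) and isolated fragments."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlaps:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    contigs = [members for members in groups.values() if len(members) > 1]
    isolated = [members[0] for members in groups.values() if len(members) == 1]
    return contigs, isolated
```

With overlaps a-b, b-c and d-e among fragments a to f, this yields the groups [a, b, c] and [d, e], and f is isolated.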
The Sequence Assembly Editor is used to view and edit assembled sequences. It has three windows: the Name list window, the Sequence window and the Graphic window.
The Name list window shows the names of all assembled sequences, each next to its sequence in the Sequence window. You can change the display order by dragging a name up or down, and you can rename a sequence by double-clicking its name.
The Sequence window lists all sequences, with the consensus sequence at the top. You cannot edit the consensus sequence directly, but you can change it by editing the overlapping source sequences: any modification of a source sequence may change the consensus sequence.
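The dependence of the consensus on the source sequences can be illustrated with a simple majority rule: at each aligned column, the most frequent character among the sequences covering it becomes the consensus base. This is an assumed rule for illustration; DNAMAN's own consensus rules (for example, how ties or ambiguity codes are handled) may differ. In the sketch, rows are padded with spaces where a read does not cover a column, and `-` marks a gap.

```python
from collections import Counter


def consensus(rows):
    """Majority-rule consensus of aligned rows (assumed rule, for illustration)."""
    width = max(len(row) for row in rows)
    out = []
    for col in range(width):
        votes = Counter(row[col].upper() for row in rows
                        if col < len(row) and row[col] != " ")
        if not votes:
            out.append("N")  # no read covers this column
            continue
        base, _ = votes.most_common(1)[0]  # ties: first character encountered
        if base != "-":  # a majority gap drops the column from the consensus
            out.append(base)
    return "".join(out)
```

Editing one source sequence, for example replacing a gap with a base, can flip the majority at that column and so change the consensus.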
The edit functions for assembled sequences are:
1) Add or remove bases
2) Add or remove gaps
3) Select a block of one sequence and delete it
4) Change selected sequences to uppercase letters
5) Change selected sequences to lowercase letters
6) Find a sequence in the project
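Finding a sequence in the project can be pictured as a case-insensitive substring search over every sequence, reporting each match with a 1-based position. This is a hypothetical sketch; whether DNAMAN also searches the complementary strand or allows mismatches is not stated here.

```python
def find_in_project(sequences, query):
    """Return (name, 1-based position) for every case-insensitive match."""
    q = query.upper()
    if not q:
        return []
    hits = []
    for name, seq in sequences.items():
        s = seq.upper()
        pos = s.find(q)
        while pos != -1:
            hits.append((name, pos + 1))
            pos = s.find(q, pos + 1)  # allow overlapping matches
    return hits
```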
The Graphic window contains the sequence assembly diagram, which consists of three elements.
1) A straight line represents the consensus sequence. The length and position of the consensus sequence are indicated at the two ends of the line.
You can move the whole diagram within the Graphic window: place the cursor at the beginning of the consensus sequence and, when the cursor changes shape, press the left mouse button and drag the diagram to the desired location. You can also change the length of the straight line (consensus sequence): place the cursor at the end of the consensus sequence and, when the cursor changes to an east-west arrow, press the left mouse button and drag the end to the desired position.
2) A rectangle in the diagram indicates an interruption in the consensus sequence and is used to separate different sequence groups. You can make the interruption larger or smaller by moving the rectangle to the left or right; the sequences in the Sequence window move accordingly.
3) Each source sequence is shown as an arrow line whose arrowhead indicates the direction of the sequence. You can move an arrow line up or down by pointing at it with the cursor and dragging. The names of the input sequences can be displayed on the arrows or on the left side of the panel.
See Section V.1 for adding text to the Sequence Assembly diagram.
Click the Options button in the Graphic window to select options for the editor.
Click the Export button in the Graphic window to export sequences to a text window. You can export either all sequences or the consensus sequence only.