DNAMAN provides many methods of searching for sequences. You may search for a set of DNA or protein sequences and display the results in a graphic window. You may also search for direct repeats, mirror repeats, inverted repeats or stem-loop structures. DNAMAN also searches for amino acid sequences and possible open reading frames in a given DNA sequence.
DNAMAN searches both strands of the current DNA sequence for nucleotide sequences and presents the search results in a graphical window or a text window.
Choose the Sequence | Search for | Sequence command to open the Search window. In this window you can search for nucleotide sequences and for predefined consensus sequences.
Clicking the Query button opens the Enter Sequence dialog box. Type the nucleotide sequence you want to search for. Several query formats are supported, and letters are not case-sensitive:
type: AGGCGATG search: AGGCGATG
type: AGGCNNNGATG search: AGGCNNNGATG (N = A, C, G, or T.)
type: AGGC(N3)GATG search: AGGCNNNGATG
type: AGGC(N3-10)GATG search: AGGC and GATG with 3 to 10 nucleotides between them.
type: AGGC(X)GATG search: AGGC and GATG with any number of nucleotides between them.
type: AGGC[ACGT/TCGA]GATG search: AGGCACGTGATG or AGGCTCGAGATG
You may also use IUPAC code to search for nucleotides. Choose the Info | Nucleotides command to display the IUPAC code table.
For example:
type: AGGWCGAT (W = A or T.) search: AGGACGAT or AGGTCGAT
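You can reproduce the query formats above with ordinary regular expressions. The Python sketch below only illustrates the documented syntax and is not DNAMAN's own code. The function name `query_to_regex` is hypothetical. The sketch also assumes that (X) means "zero or more bases, shortest match first", which the manual does not confirm.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def query_to_regex(query: str) -> str:
    """Translate a DNAMAN-style query into a regex (hypothetical helper)."""
    q = query.upper().replace(" ", "")  # queries are case-insensitive
    out = []
    i = 0
    while i < len(q):
        ch = q[i]
        if ch == "(":
            j = q.index(")", i)
            inner = q[i + 1:j]
            if inner == "X":
                # Assumption: (X) = zero or more bases, shortest match first.
                out.append("[ACGT]*?")
            else:
                # (N3) or (N3-10): a fixed or ranged run of any bases.
                m = re.fullmatch(r"N(\d+)(?:-(\d+))?", inner)
                if m is None:
                    raise ValueError(f"unsupported group: ({inner})")
                low, high = m.group(1), m.group(2) or m.group(1)
                out.append(f"[ACGT]{{{low},{high}}}")
            i = j + 1
        elif ch == "[":
            j = q.index("]", i)
            # [ACGT/TCGA] lists alternative subsequences separated by "/".
            alternatives = q[i + 1:j].split("/")
            out.append("(?:" + "|".join(query_to_regex(a) for a in alternatives) + ")")
            i = j + 1
        elif ch in IUPAC:
            out.append(IUPAC[ch])
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} in query")
    return "".join(out)
```

For example, `query_to_regex("AGGC(N3)GATG")` returns `"AGGC[ACGT]{3,3}GATG"`, and the result of `query_to_regex("AGGWCGAT")` matches both AGGACGAT and AGGTCGAT.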
For example, type GATT in the Search dialog box and click the OK button to display the list of matching sequences.
DNAMAN allows you to search for more than one sequence and list all sites. Click the Query button again to open the Search dialog box, type another sequence (for example, AATAAA) and click the OK button. DNAMAN adds the new matches to the list of search results and to the search diagram. The number following each found sequence indicates the query group it belongs to.
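The sketch below shows two ideas from this procedure: searching both strands and numbering each hit by its query group. It rests on several assumptions that do not describe DNAMAN's documented behavior:

- The minus strand is searched by looking for the reverse complement of each query in the plus strand.
- Positions are reported in plus-strand coordinates.
- A palindromic query is reported only once.

The sketch handles only A, C, G, T and N. The name `search_both_strands` is hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def search_both_strands(target: str, queries: list[str]) -> list[tuple[int, str, int, int]]:
    """Return (group, strand, start, end) hits, 1-based, in plus-strand coordinates."""
    target = target.upper()
    hits = []
    for group, query in enumerate(queries, start=1):
        plus = query.upper()
        minus = revcomp(plus)
        # A palindromic query (e.g. GAATTC) is searched once to avoid duplicate hits.
        strands = [("+", plus)] if plus == minus else [("+", plus), ("-", minus)]
        for strand, pattern in strands:
            # The zero-width lookahead lets overlapping matches be reported.
            regex = re.compile("(?=" + pattern.replace("N", "[ACGT]") + ")")
            for m in regex.finditer(target):
                start = m.start() + 1
                hits.append((group, strand, start, start + len(plus) - 1))
    hits.sort(key=lambda h: (h[2], h[0]))  # order by position, then group
    return hits
```

For example, `search_both_strands("CCAATCC", ["GATT"])` finds AATC, the reverse complement of GATT, and returns `[(1, "-", 3, 6)]`.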
Clicking the Export button displays the search results in a text window, which you can print with the print command.
There are two panels in the search window: Sequence List and Graphic Presentation.
The Sequence List window contains four columns.
You may change the size of each column by resizing the corresponding header.
DNAMAN also displays the search results graphically. The presentation has two lines. The upper line (Sequence Line) represents the target sequence, and the lower line (Zoom Line) shows an enlarged view of part of the upper line.
The graphic presentation is a general graphic document, so you can work with it using the tools described in Chapter V (adding text objects, copying graphs).
The positions of all found subsequences are shown on the Sequence Line. Different subsequence groups are indicated with different colors. If a subsequence is checked in the Sequence List window, its position is marked with an arrow.
How to zoom: Place the cursor on one of the two long vertical lines, and the cursor changes to an up-arrow. Move the arrow to focus on a smaller region. As you slide the up-arrow along the Sequence Line, the current position is shown in the left corner of the screen.
How to display a sequence: Place the cursor on the Sequence Line or Zoom Line. Press and hold the left mouse button while moving the mouse to select a region. When you release the button, a dialog box shows the selected sequence.
How to move the graph: Place the cursor at the left end of the Sequence Line. When the cursor changes shape, press the left mouse button and drag the diagram to wherever you want. You can also change the relative position of the two lines by moving the Zoom Line. Place the cursor at the left end of the Zoom Line, press the left mouse button and drag the Zoom Line to an appropriate position.
How to resize the graph: You can change the length of the Sequence Line and the Zoom Line. Place the cursor at the right end of either line. When the cursor changes shape, press the left mouse button and drag the end of the line to an appropriate position.
You may remove the checked subsequences by clicking the Rem. Check button, or clear the list by using the Rem. All button.
Clicking the Options button opens the Sequence Searching Options dialog box. You may modify the following parameters:
You may search for consensus sequences, such as promoters and regulatory factor binding sites, in the graphic search window.
Clicking the Consensus button opens the Consensus Sequence dialog box. You can select an individual consensus sequence by clicking its name, and the related information is displayed in the Sequence Information box. You can also search for every sequence in the consensus list by clicking the Search All button.
The consensus sequence information is stored in a data file, which you can edit by clicking the Edit button. The file format is simple. Entries are separated by //, and each entry consists of three lines: the name, the consensus sequence and the related information. After editing, choose the File | Save command to save the changes.
IUPAC code can be used in a consensus sequence.
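If you need to read this data file outside DNAMAN, a small parser is enough. The sketch below assumes the following:

- Each // stands on its own line.
- Blank lines can be ignored.
- Any lines after the second belong to the information field.

The actual file may differ, so compare it with the sample file in the CONSENS folder. The names `ConsensusEntry` and `parse_consensus` are hypothetical.

```python
from typing import NamedTuple


class ConsensusEntry(NamedTuple):
    name: str
    consensus: str  # may contain IUPAC codes
    info: str


def parse_consensus(text: str) -> list[ConsensusEntry]:
    """Parse '//'-separated entries of name, consensus and information lines."""
    entries, block = [], []
    # A trailing "//" sentinel flushes the last entry even if the file lacks one.
    for raw in text.splitlines() + ["//"]:
        line = raw.strip()
        if line == "//":
            # Blocks with fewer than two lines (name + consensus) are skipped.
            if len(block) >= 2:
                name, consensus, *info = block
                entries.append(ConsensusEntry(name, consensus.upper(), " ".join(info)))
            block = []
        elif line:
            block.append(line)
    return entries
```

Each parsed consensus can then be passed to any IUPAC-aware matcher.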
You may search for a set of sequences, such as a list of consensus sites. The results are shown in text format.
Choose the Sequence | Search for | Nucleotide Sequence Set or Protein Sequence Set command. A dialog box appears where you may choose the file containing the search list.
Click the File button to browse for and select the list file. The list file may be in DNAMAN consensus format or in a user-defined format. If you choose DNAMAN consensus sequence list, you can ignore the remaining parameters. A sample DNAMAN consensus file is in the CONSENS folder of the DNAMAN program. If you choose User-defined sequence list, you must enter information about the list file format, and DNAMAN extracts the listed sequences according to that information.
Example 1:
List file content:
CONSENSUS ID SEQUENCE COMMENTS
ASD23 GGGCGTAACCCATTTTC ! reference book
You should
Example 2:
List file content:
ORIGIN
NAME ID:
ASD23
DNA Sequence:
GGGCGTAACCCATTTTC
Reference:
reference book here
You should
The searching results are shown in a text window. If a sequence is found, DNAMAN displays
DNAMAN shows no information for sequences that are not found.
DNAMAN searches for direct repeats, mirror repeats and inverted repeats. Choose the Sequence | Search for | Direct Repeats, Mirror Repeats or Inverted Repeats command, and type the minimum length (in bases) of a repeat. The results list the repeats found in the current sequence as pairs of sites. A sequence that occurs more than twice is therefore not flagged directly. You can recognize it by checking the positions and sequences of the reported repeats. For example, if the sequence AAGCTGCGTG appears at positions 50 and 205, and AGCTGCGTG appears at position 506, the results will be:
Length(bp)  Sites     Direct Repeat Sequence
10          50, 205   AAGCTGCGTG
9           51, 506   AGCTGCGTG
9           206, 506  AGCTGCGTG
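A brute-force sketch of a direct-repeat search follows. It compares every pair of positions and extends each match to the right. It keeps only pairs that cannot be extended one base to the left, so every reported pair is maximal in both directions. That is consistent with the example above, where the copy at 51 appears separately from the copy at 50. This is an illustration, not DNAMAN's algorithm. The name `direct_repeats` is hypothetical, and the O(n²) loop is practical only for short sequences.

```python
def direct_repeats(seq: str, min_len: int) -> list[tuple[int, int, int, str]]:
    """Return (length, site1, site2, repeat) for maximal direct-repeat pairs (1-based)."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n):
        for j in range(i + 1, n):
            # Skip pairs that would extend one base to the left (not maximal).
            if i > 0 and seq[i - 1] == seq[j - 1]:
                continue
            k = 0
            while j + k < n and seq[i + k] == seq[j + k]:
                k += 1
            if k >= min_len:
                hits.append((k, i + 1, j + 1, seq[i:i + k]))
    return hits
```

A sequence that occurs three times shows up as three pairs. For example, `direct_repeats("ACGTCACGTGACGT", 4)` returns `[(4, 1, 6, "ACGT"), (4, 1, 11, "ACGT"), (4, 6, 11, "ACGT")]`.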
If more than 200 repeats are found, DNAMAN asks you to increase the minimum length of the repeat unit. Alternatively, you can define a smaller analysis region with the Sequence | Sequence Channel | Analysis Range Definition command (see section VI.2).
DNAMAN searches for complementary sequences in the current sequence that potentially form stem loop/hairpin structures. Choose the Sequence | Search for | Stem Loops menu. There are two parameters related to the searching: minimum length (bases) of paired nucleotides in the stem of a hairpin structure and maximum length (bases) of unpaired nucleotides between the complementary sequences.
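A simple way to picture this search uses the two parameters directly. For each possible loop position and length up to the maximum, extend outward while the flanking bases are complementary. Report the hairpin if the stem reaches the minimum length. The sketch below is not DNAMAN's implementation and makes these assumptions:

- Only Watson-Crick pairs count; G·T wobble pairs are ignored.
- The loop length runs from 0 to the maximum. Real hairpins need a loop of a few bases, so you may want to raise the lower bound.
- The name `stem_loops` is hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def stem_loops(seq: str, min_stem: int, max_loop: int) -> list[tuple[int, int, int, str]]:
    """Return (start, stem_length, loop_length, hairpin) for candidate hairpins (1-based)."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for loop_start in range(1, n):
        for loop_len in range(max_loop + 1):
            loop_end = loop_start + loop_len  # index of the first base after the loop
            if loop_end >= n:
                break
            # If the loop's outermost bases pair, the same stem is found with a shorter loop.
            if loop_len >= 2 and PAIR.get(seq[loop_start]) == seq[loop_end - 1]:
                continue
            stem = 0
            while (loop_start - 1 - stem >= 0 and loop_end + stem < n
                   and PAIR.get(seq[loop_start - 1 - stem]) == seq[loop_end + stem]):
                stem += 1
            if stem >= min_stem:
                left = loop_start - stem
                hits.append((left + 1, stem, loop_len, seq[left:loop_end + stem]))
    return hits
```

For example, `stem_loops("AAGGGCTTTTGCCCAA", 4, 4)` finds the hairpin GGGC-TTTT-GCCC starting at position 3.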
DNAMAN searches for an amino acid sequence (in one letter code) and its variations from the six reading frames of the current sequence.
Mismatches can occur at any position in the amino acid sequence.
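Searching six frames for an amino acid sequence amounts to translating each frame and sliding the query along the translation while counting mismatched residues. The Python sketch below makes these assumptions:

- It uses the universal genetic code.
- It reports positions in plus-strand coordinates.
- Minus-strand frames are numbered from the 5' end of the reverse complement.

These conventions, and the names `find_peptide` and `max_mismatches`, are illustrative assumptions rather than DNAMAN's documented behavior.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard (universal) genetic code; "*" marks stop codons.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate codon by codon; codons containing other letters become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def find_peptide(dna: str, peptide: str, max_mismatches: int = 0):
    """Search all six reading frames for a peptide, allowing mismatches anywhere."""
    dna, peptide = dna.upper(), peptide.upper()
    n, size = len(dna), len(peptide)
    hits = []
    for strand, seq in (("+", dna), ("-", dna.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            protein = translate(seq[frame:])
            for p in range(len(protein) - size + 1):
                window = protein[p:p + size]
                mismatches = sum(a != b for a, b in zip(window, peptide))
                if mismatches <= max_mismatches:
                    start = frame + 3 * p + 1  # 1-based, on this strand
                    end = start + 3 * size - 1
                    if strand == "-":  # map back to plus-strand coordinates
                        start, end = n - end + 1, n - start + 1
                    hits.append((strand, frame + 1, start, end, window, mismatches))
    return hits
```

For example, `find_peptide("ATGGCTTGGTAA", "MSW", max_mismatches=1)` reports the translated window MAW at positions 1-9 of frame 1 with one mismatch.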
For example:
Protein consensus sequences
Click the Consensus button in the Search dialog box to open the consensus sequence box. The consensus sequence you choose is loaded into the Search dialog box. You can edit the consensus list, and amino acid consensus sequences are edited the same way as DNA consensus sequences. For the file format, see the SEARCHNT.DAT file in the CONSENS folder of the DNAMAN program.
You can search for open reading frames (ORFs) in the six reading frames of the current sequence. By default, DNAMAN uses the universal genetic code table for the search. In some cases you may want to use CTG as a start codon, either instead of ATG or in addition to it. To do this, choose the Protein | Genetic Code Table command and select an appropriate genetic code table (see section XI.1). DNAMAN then searches for ORFs according to the start and stop codons in the selected table.
DNAMAN shows only the two largest ORFs from each reading frame.
Possible Open Reading Frame in example (1-886)
Strand  RF  AA Num  Position  Sequence
Plus    3   121     219-584   ATG...TAA
Plus    2   119     173-532   ATG...TGA
Plus    1   73      1-222     ......TGA
Plus    2   40      2-124     ......TAA
Plus    3   27      732-815   ATG...TAA
Plus    1   16      721-771   ATG...TGA
Minus   1   109     379-708   ATG...TGA
Minus   2   35      440-547   ATG...TAG
Minus   2   30      653-745   ATG...TAA
Minus   3   15      3-50      ......TGA
Minus   3   14      843-887   ATG......
Minus   1   8       265-291   ATG...TAA
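The ORF search can be sketched as follows. In each of the six frames, open an ORF at the first start codon after the previous stop and close it at the next in-frame stop codon. Then keep the two longest ORFs per frame. The residue count excludes the stop codon, and the position range includes it. Both match the columns above: 219-584 spans 122 codons and 121 residues.

This is a simplified illustration, not DNAMAN's code. It makes these assumptions:

- Minus-strand positions are given in plus-strand coordinates.
- Partial ORFs that run off either end of the sequence (the "......" entries above) are not reported.
- The names `find_orfs`, `top`, `starts` and `stops` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(dna: str, top: int = 2, starts=("ATG",), stops=("TAA", "TAG", "TGA")):
    """Return the `top` longest start...stop ORFs per reading frame (hypothetical helper)."""
    dna = dna.upper()
    n = len(dna)
    results = []
    for strand, seq in (("Plus", dna), ("Minus", dna.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            orfs, start = [], None
            for i in range(frame, n - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon in starts:
                    start = i  # first start codon since the frame start or the last stop
                elif start is not None and codon in stops:
                    aa = (i - start) // 3  # residues; the stop codon is not counted
                    s, e = start + 1, i + 3  # 1-based; the stop codon is included
                    if strand == "Minus":  # report in plus-strand coordinates
                        s, e = n - e + 1, n - s + 1
                    orfs.append((strand, frame + 1, aa, s, e))
                    start = None
            orfs.sort(key=lambda orf: orf[2], reverse=True)
            results.extend(orfs[:top])
    return results
```

To imitate a genetic code table that also allows CTG as a start codon, call `find_orfs(seq, starts=("ATG", "CTG"))`.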
To get an overview of all ORFs in the current sequence, use the Protein | Overview command, which displays the six reading frames graphically (see section XI.2).