• Sequence Search
  • DNAMAN provides many methods of searching for sequences. You may search for a set of DNA or protein sequences, and display the results in a graphic window. You may also search for direct repeats, mirror repeats, inverted repeats or stem-loop structures. DNAMAN also searches for amino acid sequences and possible open reading frames of a given DNA sequence.

    1. Searching for nucleotide sequences

      DNAMAN searches both strands of the current DNA sequence for nucleotide sequences and presents the search results in a graphic window or a text window.

      Choose the Sequence | Search for | Sequence command to open the Search window, in which you can search for nucleotide sequences and defined consensus sequences as well.
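      As an illustration of what a both-strand search involves, the sketch below finds an exact query on the plus strand and finds its reverse complement to locate minus-strand hits. This is not DNAMAN's implementation. The function names and the (strand, start, end) output with 1-based, plus-strand coordinates are assumptions made for the example.

```python
# Sketch: exact search for a nucleotide query on both strands of a target.
# Assumption (not from DNAMAN): hits are reported as (strand, start, end),
# 1-based and inclusive, in plus-strand coordinates.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def search_both_strands(target: str, query: str) -> list[tuple[str, int, int]]:
    target, query = target.upper(), query.upper()
    k = len(query)
    hits = []
    # A minus-strand hit is a plus-strand occurrence of the query's reverse complement.
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        start = target.find(pattern)
        while start != -1:
            hits.append((strand, start + 1, start + k))
            start = target.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


print(search_both_strands("GATTACAAATC", "GATT"))
# [('+', 1, 4), ('-', 8, 11)]
```

      A palindromic query such as GAATTC is its own reverse complement, so this sketch reports it once on each strand at the same position.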



    2. Query formats

      Clicking the Query button opens the Enter Sequence dialog box. Type the nucleotide sequence to search for. Several query formats are available (letters are case-insensitive):

      type: AGGCGATG search: AGGCGATG

      type: AGGCNNNGATG search: AGGCNNNGATG (N = A, C, G, or T.)

      type: AGGC(N3)GATG search: AGGCNNNGATG

      type: AGGC(N3-10)GATG search: AGGC and GATG with 3 - 10 nucleotides between them.

      type: AGGC(X)GATG search: AGGC and GATG with any number of nucleotides between them.

      type: AGGC[ACGT/TCGA]GATG search: AGGCACGTGATG or AGGCTCGAGATG

      You may also use IUPAC code to search for nucleotides. Choose the Info | Nucleotides command to display the IUPAC code table.

      For example:

      type: AGGWCGAT (W = A or T.) search: AGGACGAT or AGGTCGAT
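      One way to picture these query formats is as regular expressions. The sketch below is a hypothetical translator, not DNAMAN's parser. It assumes that (X) also matches zero intervening bases and that spaces in the query are ignored. The helper names `query_to_regex` and `find_query` are illustrative only.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# One token of the query syntax: (N3), (N3-10), (X), [ALT/ALT], or a single letter.
TOKEN = re.compile(r"\(N(\d+)(?:-(\d+))?\)|(\(X\))|\[([A-Z/]+)\]|([A-Z])")


def letters_to_regex(letters: str) -> str:
    try:
        return "".join(IUPAC[c] for c in letters)
    except KeyError as err:
        raise ValueError(f"not an IUPAC nucleotide code: {err.args[0]!r}") from None


def query_to_regex(query: str) -> str:
    q = query.upper().replace(" ", "")
    parts, pos = [], 0
    while pos < len(q):
        m = TOKEN.match(q, pos)
        if m is None:
            raise ValueError(f"cannot parse query at {q[pos:]!r}")
        low, high, any_gap, alts, letter = m.groups()
        if low is not None:      # (N3) -> exactly 3 bases; (N3-10) -> 3 to 10 bases
            parts.append(f"[ACGT]{{{low}}}" if high is None else f"[ACGT]{{{low},{high}}}")
        elif any_gap:            # (X) -> any number of bases (assumed to include zero)
            parts.append("[ACGT]*?")
        elif alts is not None:   # [ACGT/TCGA] -> one of the listed subsequences
            parts.append("(?:" + "|".join(letters_to_regex(a) for a in alts.split("/")) + ")")
        else:                    # single letter, including IUPAC ambiguity codes
            parts.append(letters_to_regex(letter))
        pos = m.end()
    return "".join(parts)


def find_query(target: str, query: str) -> list[tuple[int, str]]:
    # The lookahead lets overlapping matches be reported; positions are 1-based.
    pattern = re.compile(f"(?=({query_to_regex(query)}))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(target.upper())]


print(query_to_regex("AGGWCGAT"))                      # AGG[AT]CGAT
print(find_query("ttAGGCAAAGATGtt", "AGGC(N3)GATG"))   # [(3, 'AGGCAAAGATG')]
```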

      Type “GATT”, for example, in the Search dialog box and then click the OK button to display the list of the found sequences.

      DNAMAN allows you to search for more than one sequence and list all sites. Click the Query button again to open the Search dialog box, type another sequence (for example, AATAAA), and click the OK button. DNAMAN adds the new hits to the search results list and to the search diagram. The number following each found sequence indicates the query group it belongs to.

      Clicking the Export button displays the search results in a text window, which you can then print.

      There are two panels in the search window: Sequence List and Graphic Presentation.



    3. Sequence list window

      The Sequence List window contains four columns.

      You may change the size of each column by resizing the corresponding header.



    4. Graphic presentation

      DNAMAN also displays the search results graphically. The presentation contains two lines: the upper line (Sequence Line) represents the target sequence, and the lower line (Zoom Line) shows an enlarged view of a region of the upper line.

      The graphic presentation is a general graphic document, so you can edit it with the tools described in Chapter V (adding text objects, copying graphs…).

      The positions of all found subsequences are shown on the Sequence Line. Different subsequence groups are indicated with different colors. If a subsequence is checked in the Sequence List window, its position is marked with an arrow.

      How to zoom: Place the cursor on one of the two long vertical lines; the cursor changes to an up-arrow. Drag the arrow to focus on a smaller region. As you slide the up-arrow along the Sequence Line, the current position is shown in the left corner of the screen.

      How to display a sequence: Place the cursor on the Sequence Line or the Zoom Line, then press and hold the left mouse button while moving the mouse to select a region. When you release the mouse button, a dialog box shows the selected sequence.

      How to move the graph: Place the cursor at the left end of the Sequence Line. When the cursor changes shape, press the left mouse button and drag the diagram to the desired location. You can also change the position of the Zoom Line relative to the Sequence Line: place the cursor at the left end of the Zoom Line, press the left mouse button, and drag the Zoom Line to the desired position.

      How to resize the graph: You can change the length of the Sequence Line and the Zoom Line. Place the cursor at the right end of either line; when the cursor changes shape, press the left mouse button and drag the end of the line to the desired position.

      You may remove the checked subsequences by clicking the Rem. Check button, or clear the list by using the Rem. All button.



    5. Sequence searching options

      Clicking the Options button opens the Sequence Searching Options dialog box, where you can modify the following parameters:



    6. Searching for consensus sequences

      You can search for consensus sequences, such as promoters and regulatory factor binding sites, in the graphic search window.

      Clicking the Consensus button opens the Consensus Sequence dialog box. Select an individual consensus sequence by clicking its name; the related information is displayed in the Sequence Information box. To search for every consensus sequence in the list, click the Search All button.

      The consensus sequence information is stored in a data file, which you can edit by clicking the Edit button. The file format is simple: entries are separated by “//”, and each entry consists of three lines (the name, the consensus sequence, and the related information). After editing, choose the File | Save command to save the changes.

      IUPAC code can be used in a consensus sequence.
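      To make the data file format concrete, here is a hypothetical reader for "//"-separated entries and a matcher that expands IUPAC codes. It is a sketch, not DNAMAN code. It assumes "//" stands on its own line, that any information text spanning several lines is joined with spaces, and that only the strand as given is searched. The entry names in the example are placeholders.

```python
import re
from dataclasses import dataclass

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


@dataclass
class Consensus:
    name: str
    sequence: str
    info: str


def parse_consensus_file(text: str) -> list[Consensus]:
    """Parse name / sequence / information entries separated by '//' lines."""
    entries, current = [], []
    for raw in text.splitlines() + ["//"]:  # trailing sentinel flushes the last entry
        line = raw.strip()
        if line == "//":
            if current:
                if len(current) < 2:
                    raise ValueError(f"entry needs a name and a sequence: {current!r}")
                name, sequence, *info = current
                entries.append(Consensus(name, sequence.upper(), " ".join(info)))
            current = []
        elif line:
            current.append(line)
    return entries


def search_consensus(target: str, entries: list[Consensus]) -> list[tuple[str, int, str]]:
    """Return (name, 1-based position, matched text) for every hit on the given strand."""
    target = target.upper()
    hits = []
    for entry in entries:
        pattern = "".join(IUPAC[base] for base in entry.sequence)  # expand IUPAC codes
        for m in re.finditer(f"(?=({pattern}))", target):
            hits.append((entry.name, m.start() + 1, m.group(1)))
    return hits


sample = """site1
TATAWAW
first placeholder entry
//
site2
GGCCAATCT
second placeholder entry
//"""
print(search_consensus("ccTATAAATgg", parse_consensus_file(sample)))
# [('site1', 3, 'TATAAAT')]
```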



    7. Searching for a set of sequences

      You can search for a set of sequences, such as a list of consensus sites. The results are shown in text format.

      Choose the Sequence | Search for | Nucleotide Sequence Set or Protein Sequence Set command. A dialog box appears where you may choose the file containing the search list.

      Click the File button to browse for and select the list file. The list file may be in DNAMAN consensus format or in a user-defined format. If you choose DNAMAN consensus sequence list, you can ignore the remaining parameters; a sample DNAMAN consensus file is provided in the CONSENS folder of the DNAMAN program. If you choose User-defined sequence list, enter the information that describes the list file format, and DNAMAN extracts the listed sequences accordingly.

      Example 1:

      List file content:

      CONSENSUS ID   SEQUENCE             COMMENTS
      ASD23          GGGCGTAACCCATTTTC    ! reference book
      …
      …
      …
      

      You should

      Example 2:

      List file content:

      ORIGIN
      NAME ID:
        ASD23
      DNA Sequence:
        GGGCGTAACCCATTTTC
      Reference:
        reference book here
      
      …
      …
      …
      

      You should

      The searching results are shown in a text window. If a sequence is found, DNAMAN displays

      DNAMAN does not show any information on the sequences if they are not found.
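      For a list laid out like Example 1, extracting and searching the entries could look like the sketch below. The parameters `id_col`, `seq_col`, `header_lines` and `comment` are hypothetical stand-ins for the format information the dialog box asks for, not DNAMAN's actual field names. As the text describes, only entries that are found appear in the report. This sketch checks both strands.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_list_file(text: str, id_col: int = 0, seq_col: int = 1,
                    header_lines: int = 1, comment: str = "!") -> list[tuple[str, str]]:
    """Extract (id, sequence) pairs from a whitespace-separated list file."""
    entries = []
    for line in text.splitlines()[header_lines:]:
        fields = line.split(comment, 1)[0].split()  # drop the comment, split columns
        if len(fields) > max(id_col, seq_col):      # skip blank or incomplete lines
            entries.append((fields[id_col], fields[seq_col].upper()))
    return entries


def positions(target: str, pattern: str) -> list[int]:
    """1-based start positions of every (possibly overlapping) occurrence."""
    return [i + 1 for i in range(len(target) - len(pattern) + 1)
            if target.startswith(pattern, i)]


def search_set(target: str, entries: list[tuple[str, str]]):
    """Return (id, sequence, plus_sites, minus_sites) for entries found on either strand."""
    target = target.upper()
    report = []
    for ident, seq in entries:
        plus = positions(target, seq)
        minus = positions(target, seq.translate(COMPLEMENT)[::-1])
        if plus or minus:  # entries that are not found are left out entirely
            report.append((ident, seq, plus, minus))
    return report


listing = """CONSENSUS ID   SEQUENCE             COMMENTS
ASD23          GGGCGTAACCCATTTTC    ! reference book
XYZ1           CCCCCC               ! placeholder entry
"""
print(search_set("ccGGGCGTAACCCATTTTCgg", parse_list_file(listing)))
# [('ASD23', 'GGGCGTAACCCATTTTC', [3], [])]
```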



    8. Searching for repeat sequences

      DNAMAN searches for direct, mirror, and inverted repeat sequences. Choose Sequence | Search for | Direct Repeats, Mirror Repeats, or Inverted Repeats, and type the minimum length (bases) of a repeat. The results list the repeats found in the current sequence as pairs of sites. A sequence that occurs more than twice is not flagged directly; you can identify it by comparing the positions and sequences of the listed repeats. For example, if AAGCTGCGTG appears at positions 50 and 205, and AGCTGCGTG appears at position 506, the results are:

      Length(bp)   Sites       Direct Repeat Sequence
      10            50, 205    AAGCTGCGTG
       9            51, 506    AGCTGCGTG
       9           206, 506    AGCTGCGTG
      

      If more than 200 repeats are found, DNAMAN asks you to increase the minimum repeat length or to define a smaller analysis region with the Sequence | Sequence Channel | Analysis Range Definition command (see section VI.2).
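      A simple way to obtain pairwise output like the table above is to compare every pair of sites. Each match is extended to the right, and a match is kept only if it cannot be extended to the left, so every repeated pair appears once at its full length. This quadratic sketch is not DNAMAN's algorithm. The `max_results=200` cutoff only imitates the behavior described above, and the function name is hypothetical. Mirror and inverted repeats would compare the sequence against its reverse or reverse complement instead.

```python
def direct_repeats(seq: str, min_len: int, max_results: int = 200) -> list[tuple[int, int, int, str]]:
    """Return (length, site1, site2, sequence) for each maximal repeated pair (1-based sites)."""
    s = seq.upper()
    n = len(s)
    found = []
    for i in range(n):
        for j in range(i + 1, n):
            # Start only where the match cannot extend to the left,
            # so each repeat pair is reported once at its full length.
            if i > 0 and s[i - 1] == s[j - 1]:
                continue
            k = 0
            while j + k < n and s[i + k] == s[j + k]:
                k += 1
            if k >= min_len:
                found.append((k, i + 1, j + 1, s[i:i + k]))
                if len(found) > max_results:
                    raise ValueError("too many repeats: raise min_len or narrow the range")
    return found


print(direct_repeats("ACGTTTTACGT", 4))  # [(4, 1, 8, 'ACGT')]
```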

    9. Searching for Stem-Loop structures

      DNAMAN searches the current sequence for complementary sequences that could form stem-loop (hairpin) structures. Choose the Sequence | Search for | Stem Loops menu. The search has two parameters: the minimum length (bases) of the paired region in the stem, and the maximum length (bases) of the unpaired loop between the complementary sequences.
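      The two parameters correspond directly to the search sketched below. For each candidate loop, the code grows a stem outward while the flanking bases are complementary. It keeps a hairpin when the stem reaches the minimum length. Several assumptions here are not from DNAMAN: only Watson-Crick pairs (no G-T wobble) are allowed, `min_loop=3` is a hypothetical lower bound on the loop size, and the function name is illustrative.

```python
# Watson-Crick pairs only; G-T wobble pairs are ignored (an assumption).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def stem_loops(seq: str, min_stem: int, max_loop: int,
               min_loop: int = 3) -> list[tuple[int, int, int, int]]:
    """Return (start, end, stem_length, loop_length); start and end are 1-based."""
    s = seq.upper()
    n = len(s)
    hits = []
    for p in range(n):                        # p: first base of the loop
        for g in range(min_loop, max_loop + 1):
            q = p + g                         # q: first base after the loop
            if q >= n:
                break
            # If the loop's end bases can pair, the same hairpin is found
            # with a shorter loop, so skip it here to avoid duplicates.
            if g - 2 >= min_loop and (s[p], s[q - 1]) in PAIRS:
                continue
            k = 0                             # extend the stem outward base by base
            while p - 1 - k >= 0 and q + k < n and (s[p - 1 - k], s[q + k]) in PAIRS:
                k += 1
            if k >= min_stem:
                hits.append((p - k + 1, q + k, k, g))
    return hits


print(stem_loops("GGGAAAACCC", 3, 5))  # [(1, 10, 3, 4)]
```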



    10. Searching for amino acid sequences

      DNAMAN searches all six reading frames of the current sequence for an amino acid sequence (in one-letter code) and its variants.

      Mismatches can occur at any position in the amino acid sequence.
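      The sketch below shows a six-frame search with mismatches. It translates each frame with the standard genetic code and slides the query along the translation, counting mismatches at any position. It is an illustration under several assumptions, not DNAMAN's method: windows containing a stop codon are skipped, codons containing non-ACGT characters translate to X, frames on the minus strand are numbered from the start of the reverse complement, and positions are given in plus-strand coordinates.

```python
from itertools import product

# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_dna(dna: str) -> str:
    # Codons containing non-ACGT characters translate to 'X'.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def search_peptide(dna: str, peptide: str, max_mismatches: int = 0):
    """Return (strand, frame, start, end, match, mismatches); positions are 1-based, plus strand."""
    dna, peptide = dna.upper(), peptide.upper()
    n, m = len(dna), len(peptide)
    hits = []
    for strand, s in (("+", dna), ("-", dna.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            protein = translate_dna(s[frame:])
            for i in range(len(protein) - m + 1):
                window = protein[i:i + m]
                if "*" in window:          # do not match across stop codons
                    continue
                mismatches = sum(a != b for a, b in zip(window, peptide))
                if mismatches <= max_mismatches:
                    b = frame + 3 * i      # 0-based start on strand s
                    e = b + 3 * m          # 0-based exclusive end on strand s
                    start, end = (b + 1, e) if strand == "+" else (n - e + 1, n - b)
                    hits.append((strand, frame + 1, start, end, window, mismatches))
    return hits


print(search_peptide("TTTAGCCAT", "MAK"))  # [('-', 1, 1, 9, 'MAK', 0)]
```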

      For example:

      Protein consensus sequences

      Click the Consensus button in the Search dialog box to open the consensus sequence dialog box. Choose a consensus sequence to load it into the Search dialog box. You can also edit the consensus list; amino acid consensus sequences are edited in the same way as DNA consensus sequences. For the file format, see the SEARCHNT.DAT file in the CONSENS folder of the DNAMAN program.



    11. Searching for open reading frames

      You can search for open reading frames (ORFs) in all six reading frames of the current sequence. By default, DNAMAN uses the universal genetic code table. In some cases you may want to use CTG instead of, or in addition to, ATG as a start codon. To do this, choose the Protein | Genetic Code Table command and select an appropriate genetic code table (see section XI.1). DNAMAN then searches for ORFs using the start and stop codons of the selected table.

      DNAMAN shows only the two largest ORFs in each reading frame.

      Possible Open Reading Frame in example (1-886)

      Strand  RF  AA Num  Position    Sequence
       Plus    3   121    219-584     ATG...TAA
       Plus    2   119    173-532     ATG...TGA
       Plus    1   73       1-222     ......TGA
       Plus    2   40       2-124     ......TAA
       Plus    3   27     732-815     ATG...TAA
       Plus    1   16     721-771     ATG...TGA
      Minus    1   109    379-708     ATG...TGA
      Minus    2   35     440-547     ATG...TAG
      Minus    2   30     653-745     ATG...TAA
      Minus    3   15       3-50      ......TGA
      Minus    3   14     843-887     ATG......
      Minus    1   8      265-291     ATG...TAA

      To get an overview of all the ORFs in the current sequence, use the Protein | Overview command, which displays the six reading frames graphically (see section XI.2).
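      The sketch below produces a report shaped like the ORF table above. It scans each of the six frames, opens an ORF at a start codon, or at the beginning of the frame for a 5'-partial ORF shown as "......TGA". It closes the ORF at a stop codon, or at the 3' end for an ORF shown as "ATG......". It then keeps the `keep` longest ORFs per frame. The `starts` and `stops` arguments stand in for a selected genetic code table. This is an assumption-laden illustration, not DNAMAN's implementation. In particular, the minus-strand frame numbering (counted from the start of the reverse complement) and the plus-strand coordinates are guesses.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq: str, starts=("ATG",), stops=("TAA", "TAG", "TGA"), keep: int = 2):
    """Return (strand, frame, aa_count, start, end, summary) for the `keep` longest ORFs
    per reading frame; start and end are 1-based plus-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    report = []
    for strand, s in (("Plus", seq), ("Minus", seq.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            orfs = []            # (begin, end, has_stop), 0-based, end exclusive
            begin = frame        # the frame start may open a 5'-partial ORF
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if codon in stops:
                    if begin is not None and i > begin:
                        orfs.append((begin, i + 3, True))
                    begin = None
                elif begin is None and codon in starts:
                    begin = i
            if begin is not None:                    # no stop before the 3' end
                end = begin + (n - begin) // 3 * 3
                if end > begin:
                    orfs.append((begin, end, False))
            orfs.sort(key=lambda orf: orf[1] - orf[0], reverse=True)
            for b, e, has_stop in orfs[:keep]:
                aa_count = (e - b) // 3 - (1 if has_stop else 0)  # stop codon is not counted
                start, end = (b + 1, e) if strand == "Plus" else (n - e + 1, n - b)
                head = s[b:b + 3] if s[b:b + 3] in starts else "..."
                tail = s[e - 3:e] if has_stop else "..."
                report.append((strand, frame + 1, aa_count, start, end, f"{head}...{tail}"))
    return report


print(find_orfs("ATGAAATAG")[0])                         # ('Plus', 1, 2, 1, 9, 'ATG...TAG')
print(find_orfs("CTGAAATAG", starts=("ATG", "CTG"))[0])  # ('Plus', 1, 2, 1, 9, 'CTG...TAG')
```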