• Sequence and Text Files
  • This chapter introduces the DNAMAN sequence files. It also illustrates how to use, handle and edit your sequence files.

    DNAMAN sequence files are in text format. You may edit a sequence file with the DNAMAN text editor or any other text editor. To view or count the length of any part of a sequence, use a fixed-width font in the editor. DNAMAN uses Monaco as the default editor font; you may change to another font if necessary. When you open a sequence file in another word processor, use a fixed-width font so that the sequence aligns correctly.

    1. Characters in DNA sequence

      DNAMAN recognizes the characters “A” (Adenine), “C” (Cytosine), “G” (Guanine), “T” (Thymine) and the IUPAC codes as components of a DNA sequence. These letters are case insensitive: both uppercase and lowercase letters are accepted.
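The character rule above can be checked with a few lines of Python. This is only a minimal sketch, not DNAMAN's own validation: the constant `DNA_CHARS` and the function `invalid_dna_chars` are hypothetical names, and the set assumes that "IUPAC code" means the standard nucleotide ambiguity symbols (R, Y, S, W, K, M, B, D, H, V, N). Whether DNAMAN also accepts "U" or gap symbols is not stated here, so they are left out.

```python
# The four bases plus the standard IUPAC ambiguity symbols.
# Assumption: this is the set meant by "IUPAC code"; "U" and gap symbols are excluded.
DNA_CHARS = frozenset("ACGT" "RYSWKMBDHVN")


def invalid_dna_chars(seq: str) -> list[tuple[int, str]]:
    """Return (1-based position, character) for each character outside DNA_CHARS."""
    return [
        (pos, ch)
        for pos, ch in enumerate(seq, start=1)
        if ch.upper() not in DNA_CHARS  # comparison is case insensitive
    ]


print(invalid_dna_chars("atgRYn"))  # []
print(invalid_dna_chars("ATGXCA"))  # [(4, 'X')]
```

Reporting positions rather than a plain yes/no makes it easier to find a stray character in a long sequence.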

    2. Characters in protein sequence

      DNAMAN recognizes all alphabetic letters as components of a protein sequence, except the letters “B”, “J”, “O” and “Z”. The letter “X” stands for any amino acid. These letters are case insensitive: both uppercase and lowercase letters are accepted.
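The protein rule can be sketched the same way. The names `PROTEIN_CHARS`, `STOP` and `is_protein_sequence` are hypothetical, and tolerating "*" is an assumption based on the amino acid example in this chapter, which ends with that symbol.

```python
import string

# Letters accepted by the rule above: A-Z without B, J, O and Z.
PROTEIN_CHARS = frozenset(string.ascii_uppercase) - frozenset("BJOZ")

# Assumption: "*" marks a stop, as at the end of the chapter's amino acid example.
STOP = "*"


def is_protein_sequence(seq: str) -> bool:
    """True if every character is an accepted letter (any case) or the stop symbol."""
    return all(ch.upper() in PROTEIN_CHARS or ch == STOP for ch in seq)


print(is_protein_sequence("MTKHSCITGM"))  # True
print(is_protein_sequence("mtkx*"))       # True
print(is_protein_sequence("MBK"))         # False
```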

    3. ORIGIN format

      The “ORIGIN” keyword separates the comments from the sequence content in a sequence file. The keyword has to be placed on a separate line, at the beginning of that line. Any text before the keyword is treated as comments and is not included in sequence analyses. If a sequence file does not contain the keyword, you may simply type “ORIGIN” on its own line immediately before the sequence content.

      For example:

      1) Nucleotide sequence file

      ORIGIN
      1     ATGACAAAAC ACTCATGTAT TACGGGAATG
      31    ATGGTGTCTA TGGATCGTTC AATTGCATCT

      2) Amino acid sequence file

      ORIGIN
      1     MTKHSCITGM MVSMDRSIAS CMIMHMLNQF
      31    SCACESGIEY PATCASASIN V*
      

      In the absence of the ORIGIN keyword, a text file may still be used for analysis. After you confirm, the content of the file can be loaded into a sequence channel or used for other analyses (multiple sequence alignment, sequence assembly).
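As an illustration of the ORIGIN rule, the following Python sketch splits the text of a file into comments and sequence. It is not DNAMAN's parser. The function name `split_origin` and the sample comment line are hypothetical, and the sketch assumes the keyword is written in uppercase and that position numbers and spaces are the only non-sequence characters after it (in-sequence annotations are not handled).

```python
def split_origin(text: str) -> tuple[str, str]:
    """Split file text into (comments, sequence) at the first ORIGIN line."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        # The keyword must start the line and stand alone on it.
        if line.rstrip() == "ORIGIN":
            comments = "\n".join(lines[:i])
            body = "".join(lines[i + 1:])
            # Drop position numbers and spacing; keep letters and "*".
            sequence = "".join(ch for ch in body if ch.isalpha() or ch == "*")
            return comments, sequence
    raise ValueError("no ORIGIN keyword found")


sample = (
    "Sample comment line\n"  # hypothetical comment text
    "ORIGIN\n"
    "1     ATGACAAAAC ACTCATGTAT TACGGGAATG\n"
    "31    ATGGTGTCTA TGGATCGTTC AATTGCATCT\n"
)
comments, sequence = split_origin(sample)
print(comments)       # Sample comment line
print(len(sequence))  # 60
```

Raising an error when the keyword is missing mirrors the text above only loosely: DNAMAN asks for confirmation instead, so a caller could catch the exception and fall back to treating the whole file as sequence.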

    4. Annotations in sequence

      You may define annotations for sequence analysis (Defined Annotation), or place annotations in the sequence for display (In-sequence Annotation).

      1) Defined Annotation

      Defined Annotations are used in sequence analyses such as translating a nucleotide sequence into a protein sequence and drawing sequence maps. They are placed before the keyword "ORIGIN", under the keyword "FEATURES", in a format similar to that of GenBank files. For example:

      
      FEATURES
           mRNA            join(196..486,765..972,1422..1516)
                           /name="a-globin"
           CDS             join(358..486,765..972,1422..1516)
                           /gene="a-globin"
           terminator      1835..1837
           polyA_signal    1900..1905
           polyA_signal    1994..1999
           mRNA            join(6008..6230,6318..6522)
                           /name="b-globin"
           CDS             join(5825..5916,6008..6230,6318..6446)
                           /gene="b-globin"
      ORIGIN
      ...
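The location strings in the FEATURES example can be read with a short Python sketch. It assumes GenBank-style coordinates, that is, 1-based positions with inclusive ends, which this chapter does not state explicitly. The names `parse_location` and `extract` are hypothetical, and operators such as `complement(...)` are not handled.

```python
import re


def parse_location(location: str) -> list[tuple[int, int]]:
    """Turn '1835..1837' or 'join(a..b,c..d)' into (start, end) pairs."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def extract(sequence: str, location: str) -> str:
    """Join the segments named by a location.

    Assumption: positions are 1-based and the end position is included.
    """
    return "".join(sequence[start - 1:end] for start, end in parse_location(location))


print(parse_location("join(358..486,765..972,1422..1516)"))
# [(358, 486), (765, 972), (1422, 1516)]
print(extract("ATGACAAAAC", "join(1..3,8..10)"))  # ATGAAC
```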
      

      DNAMAN does not treat every entry under FEATURES as an annotation. All annotations must be defined in a system file (e.g. Annotat.dat). For example:

      
      DNAMAN Sequence Annotation list
      ////
      total number=18
      0 Intron
      1 Exon
      2 RBS
      3 CDS
      4 Sig_peptide
      5 Mat_peptide
      6 promoter
      7 enhancer
      8 polyA_signal
      9 terminator
      10 5'UTR
      11 3'UTR
      12 misc_binding
      13 protein_binding
      14 Repeat_region
      15 repeat_unit
      16 rep_origin
      17 primer_binding
      

      This file is saved in the DNAMAN system folder. You may edit it to add records. Removing records is not recommended: databases may already use these records, and deleting them may result in misleading annotation information.
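To inspect such a list outside DNAMAN, a small Python sketch can read the numbered records into a dictionary. The on-disk layout of Annotat.dat is assumed here to match the listing above (a title line, a "////" line, a count line, then one "index name" record per line). The function name `parse_annotation_list` and the trimmed three-record sample are hypothetical.

```python
def parse_annotation_list(text: str) -> dict[int, str]:
    """Map each record index to its annotation name, skipping header lines."""
    names: dict[int, str] = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        # A record is "<index> <name>"; header lines do not start with a number.
        if len(parts) == 2 and parts[0].isdecimal():
            names[int(parts[0])] = parts[1].strip()
    return names


# Trimmed, hypothetical sample in the assumed layout.
sample = (
    "DNAMAN Sequence Annotation list\n"
    "////\n"
    "total number=3\n"
    "0 Intron\n"
    "1 Exon\n"
    "2 RBS\n"
)
print(parse_annotation_list(sample))  # {0: 'Intron', 1: 'Exon', 2: 'RBS'}
```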

      2) In-sequence Annotation

      If you need to include information after the ORIGIN keyword in a text file, you can add it as an annotation. Text enclosed in parentheses '( )' is not treated as part of the sequence during analysis.

      Choose the File | Open command to display a sequence file. After typing the annotations into the sequence, choose the File | Save command to save the file. An annotation may span more than one line; however, each line has to start with '(' and end with ')'.

      Annotations in a sequence do not appear in the windows of sequence analysis results, such as sequence composition, conversions, alignment or translation.

      Reformatting a sequence with the Edit | Format | Sequence command removes all annotations from the sequence.
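The effect of ignoring in-sequence annotations can be imitated with a regular expression. This is a sketch under stated assumptions rather than DNAMAN's behavior: annotations are taken to be un-nested parenthesized runs that do not cross a line break, and the name `strip_annotations` and the sample annotation text are hypothetical.

```python
import re

# One annotation: "(" ... ")" with no nested parentheses and no line break inside.
ANNOTATION = re.compile(r"\([^()\n]*\)")


def strip_annotations(sequence_text: str) -> str:
    """Remove parenthesized annotations, then keep only letters and '*'."""
    without_notes = ANNOTATION.sub("", sequence_text)
    return "".join(ch for ch in without_notes if ch.isalpha() or ch == "*")


text = (
    "1     ATGACA(start region)AAAC\n"  # hypothetical annotation text
    "(a note on its own line)\n"
    "11    ACTC\n"
)
print(strip_annotations(text))  # ATGACAAAACACTC
```

Removing the annotations before filtering characters matters: filtering first would keep the letters inside the parentheses as if they were residues.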