This chapter introduces the DNAMAN sequence files. It also illustrates how to use, handle and edit your sequence files.
DNAMAN sequence files are in text format. You may edit a sequence file with the DNAMAN text editor or any other text editor. To view or count the length of any part of a sequence, use a fixed-width font in the editor. DNAMAN uses Monaco as the default editor font; you may change to another font if necessary. When you open a sequence file in another word processor, use a fixed-width font so that the sequence aligns correctly.
DNAMAN recognizes the characters A (Adenine), C (Cytosine), G (Guanine), T (Thymine) and the IUPAC codes as components of a DNA sequence. These letters are case insensitive: both uppercase and lowercase letters are accepted.
DNAMAN recognizes all alphabetic letters as components of a protein sequence, except the letters B, J, O and Z. The letter X stands for any amino acid. These letters are case insensitive: both uppercase and lowercase letters are accepted.
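The two alphabets described above can be checked with a few lines of Python. This is only an illustrative sketch, not DNAMAN's own code: the function name invalid_characters is hypothetical, and the nucleotide set assumes that "IUPAC code" means the standard ambiguity letters R, Y, S, W, K, M, B, D, H, V and N.

```python
# Illustrative alphabet check; an assumption-based sketch, not DNAMAN code.
# Assumed: "IUPAC code" covers the standard nucleotide ambiguity letters.
IUPAC_NUCLEOTIDES = set("ACGT") | set("RYSWKMBDHVN")

# All alphabetic letters except B, J, O and Z, as stated in the text.
PROTEIN_LETTERS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ") - set("BJOZ")


def invalid_characters(sequence, alphabet):
    """Return the sorted list of characters in `sequence` not found in `alphabet`.

    The comparison is case insensitive, matching the rule described above.
    """
    return sorted(set(sequence.upper()) - alphabet)


print(invalid_characters("atgcRYn", IUPAC_NUCLEOTIDES))  # []
print(invalid_characters("MTKHSBZ", PROTEIN_LETTERS))    # ['B', 'Z']
```

An empty list means every character belongs to the chosen alphabet.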
The ORIGIN keyword separates the comments from the sequence content in a sequence file. The keyword must be placed on a separate line, at the beginning of that line. Any text before the keyword is treated as a comment and is not used in sequence analyses. If a sequence file does not contain the keyword, you may simply type ORIGIN on its own line immediately before the sequence content.
For example:
1) Nucleotide sequence file
ORIGIN
1 ATGACAAAAC ACTCATGTAT TACGGGAATG
31 ATGGTGTCTA TGGATCGTTC AATTGCATCT
2) Amino acid sequence file
ORIGIN
1 MTKHSCITGM MVSMDRSIAS CMIMHMLNQF
31 SCACESGIEY PATCASASIN V*
In the absence of the ORIGIN keyword, a text file may still be used for analysis. The content of the file can be loaded into a sequence channel or used for other analyses (multiple sequence alignment, sequence assembly) once the user confirms.
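As a rough illustration of the ORIGIN rule, the following Python sketch splits the text of a sequence file into comments and sequence. It is not DNAMAN's parser. The function name split_sequence_file is hypothetical, the keyword match is assumed to be case sensitive, and parenthesized in-sequence annotations are not handled.

```python
def split_sequence_file(text):
    """Split file text into (comments, sequence) at the first ORIGIN line.

    Returns (None, None) when no line starts with ORIGIN.
    Assumption: the keyword is matched case sensitively at column 1.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("ORIGIN"):
            comments = "\n".join(lines[:i])
            body = "".join(lines[i + 1:])
            # Keep letters and '*' only; this drops position numbers and spaces.
            sequence = "".join(ch for ch in body if ch.isalpha() or ch == "*")
            return comments, sequence
    return None, None


example = """Sample nucleotide file
ORIGIN
1 ATGACAAAAC ACTCATGTAT TACGGGAATG
31 ATGGTGTCTA TGGATCGTTC AATTGCATCT
"""

comments, sequence = split_sequence_file(example)
print(comments)       # Sample nucleotide file
print(len(sequence))  # 60
```

Returning (None, None) for a file without the keyword leaves it to the caller to decide whether to treat the whole text as sequence, in the same way that DNAMAN asks for confirmation.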
You may define annotations for use in sequence analysis (Defined Annotation), or place annotations within a sequence for visualization (In-sequence Annotation).
1) Defined Annotation
Defined Annotations are used in sequence analyses such as translating a nucleotide sequence to a protein sequence and drawing sequence maps. Defined Annotations are placed before the keyword "ORIGIN", under the keyword "FEATURES". The format is similar to that of GenBank files. For example:
FEATURES
mRNA join(196..486,765..972,1422..1516)
/name="a-globin"
CDS join(358..486,765..972,1422..1516)
/gene="a-globin"
terminator 1835..1837
polyA_signal 1900..1905
polyA_signal 1994..1999
mRNA join(6008..6230,6318..6522)
/name="b-globin"
CDS join(5825..5916,6008..6230,6318..6446)
/gene="b-globin"
ORIGIN
...
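The location strings in the example above follow a GenBank-like syntax. The Python sketch below shows one way to read them. It assumes 1-based inclusive coordinates, as in GenBank, and it handles only plain ranges and join(...). The helper names parse_location and total_length are hypothetical.

```python
import re


def parse_location(location):
    """Turn '358..486' or 'join(358..486,765..972)' into (start, end) tuples."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def total_length(segments):
    """Count the bases covered, assuming 1-based inclusive coordinates."""
    return sum(end - start + 1 for start, end in segments)


cds = parse_location("join(358..486,765..972,1422..1516)")
print(cds)                # [(358, 486), (765, 972), (1422, 1516)]
print(total_length(cds))  # 432
```

Under these assumptions the a-globin CDS covers 432 bases, a multiple of three, as expected for a coding region.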
DNAMAN does not treat every entry under FEATURES as an annotation. All annotation types must be defined in a system file (e.g. Annotat.dat). For example:
DNAMAN Sequence Annotation list
////
total number=18
0 Intron
1 Exon
2 RBS
3 CDS
4 Sig_peptide
5 Mat_peptide
6 promoter
7 enhancer
8 polyA_signal
9 terminator
10 5'UTR
11 3'UTR
12 misc_binding
13 protein_binding
14 Repeat_region
15 repeat_unit
16 rep_origin
17 primer_binding
This file is saved in the DNAMAN system folder. You may edit it to add records. Removing records is not recommended: existing databases may already use these records, and deleting them may cause annotations to be displayed incorrectly.
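The relationship between the annotation list and the FEATURES entries can be sketched in Python as follows. This is an assumption-based illustration only. It treats every line of the form "index name" as a record, ignores all other lines, and compares names case sensitively, none of which is confirmed DNAMAN behaviour. The function name parse_annotation_list is hypothetical.

```python
import re


def parse_annotation_list(text):
    """Map index -> annotation name for lines of the form '<index> <name>'."""
    names = {}
    for line in text.splitlines():
        match = re.fullmatch(r"\s*(\d+)\s+(\S+)\s*", line)
        if match:
            names[int(match.group(1))] = match.group(2)
    return names


# A shortened copy of the list shown above, for illustration.
sample = """DNAMAN Sequence Annotation list
////
total number=18
0 Intron
1 Exon
2 RBS
3 CDS
"""

known = set(parse_annotation_list(sample).values())
for key in ["mRNA", "CDS"]:
    print(key, key in known)
```

With this shortened list, CDS is found and mRNA is not, which is the kind of distinction that decides whether a FEATURES entry is treated as an annotation.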
2) In-sequence Annotation
If you have to include some information after the ORIGIN keyword in a text file, you can add it as an annotation. Text enclosed in parentheses '( )' is not treated as part of the sequence during analysis.
Choose the File | Open command to display a sequence file. After typing the annotations in the sequence, choose the File | Save command to save the file. An annotation may span more than one line, but each line has to start with '(' and end with ')'.
Annotations in a sequence do not appear in the windows of sequence analysis results, such as sequence composition, conversions, alignment or translation.
Reformatting a sequence with the Edit | Format | Sequence command removes all annotations from the sequence.
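The effect of ignoring in-sequence annotations can be imitated in Python as shown below. This is a minimal sketch rather than DNAMAN's implementation. It assumes that annotations are not nested, and the function name strip_annotations is hypothetical.

```python
import re


def strip_annotations(sequence_text):
    """Remove '(...)' annotations, then keep only letters and '*'."""
    # Assumption: parentheses are never nested inside an annotation.
    without_notes = re.sub(r"\([^()]*\)", "", sequence_text)
    return "".join(ch for ch in without_notes if ch.isalpha() or ch == "*")


annotated = """1 ATGACAAAAC ACTCATGTAT
(start of the coding region;)
(this note spans two lines)
21 TACGGGAATG
"""

print(strip_annotations(annotated))  # ATGACAAAACACTCATGTATTACGGGAATG
```

The annotations are removed before the letters are filtered, so words inside the parentheses never reach the sequence.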