This chapter explains how to handle DNA and protein sequences in DNAMAN.
You may choose a sequence once and perform many kinds of analyses on it. The sequence must first be loaded into a sequence channel. DNAMAN provides 20 sequence channels to keep active sequences in memory. You can simultaneously analyze and compare all sequences in the channels. You may also switch from one sequence to another during analysis without going back to disk files.
A sequence channel contains not only the sequence, but also related information: 1) sequence name, 2) sequence type (DNA or protein), 3) topology (linear or circular DNA), 4) analysis region and 5) annotations.
There are several methods to load sequences into sequence channels. They are shown in the six sub-menus of the Sequence | Load Sequence menu.
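The five items held by a channel can be pictured as a small record. The following Python sketch is only an illustration of that idea, not DNAMAN's internal format; the class names, the field names and the 1-based inclusive region convention are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MAX_CHANNELS = 20  # the text states that DNAMAN provides 20 channels


@dataclass
class Annotation:
    """Hypothetical annotation record: a label over a 1-based inclusive range."""
    label: str
    start: int
    end: int


@dataclass
class SequenceChannel:
    """Illustrative model of the information a channel is said to hold."""
    name: str
    sequence: str
    seq_type: str = "DNA"                      # "DNA" or "Protein"
    circular: bool = False                     # False = linear, True = circular
    region: Optional[Tuple[int, int]] = None   # analysis region (assumed 1-based)
    annotations: List[Annotation] = field(default_factory=list)

    def analysis_sequence(self) -> str:
        """Return the part of the sequence inside the analysis region."""
        if self.region is None:
            return self.sequence
        start, end = self.region
        return self.sequence[start - 1:end]


channel = SequenceChannel("EXAMPLE1", "TTTGACTGCCACTTCC", region=(4, 9))
print(channel.analysis_sequence())  # prints GACTGC
```

Keeping the analysis region next to the sequence means that any later analysis can ask the channel for the region of interest instead of slicing the sequence itself.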
If a sequence file is open in a text window, selecting sequence content activates the From Selection command. Choose the activated Sequence | Load Sequence | From Selection command to load the selection into the default sequence channel. You may load any part of the text into a sequence channel. If annotations are present in the selection, they are automatically excluded from the sequence content. The "ORIGIN" keyword marks where the annotations end and the sequence content begins.
DNAMAN sequence files (in ORIGIN format) and GCG files can be retrieved directly from disk. Choosing the Sequence | Load Sequence | From Sequence File or GCG File command opens an Open File dialog box.
Choose the sequence file and press the Open button to load the sequence. If the file does not have the sequence file extension (*.seq), you may select all files (*.*) in the Files of type list.
A GenBank file may contain more than one sequence. Each sequence starts with the ORIGIN keyword and ends with //. Annotations may also be included in the file. Any sequence in the file can be loaded directly into DNAMAN memory using Sequence | Load Sequence | From GenBank File. When you click a sequence name in the sequence list box, the sequence information appears in the Information box. Click the Load button to load this sequence into the default sequence channel.
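The ORIGIN and // delimiters make it easy to pull the sequences out of such a file with a short script. The Python sketch below shows one way to do so; it is not the parser DNAMAN uses, the function name extract_sequences is hypothetical, and the sample text is a made-up fragment in GenBank-like layout.

```python
import re
from typing import List


def extract_sequences(text: str) -> List[str]:
    """Return the sequence of every ORIGIN ... // record found in text."""
    sequences = []
    current = None  # None while we are outside a sequence block
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("ORIGIN"):
            current = []                      # a new sequence block begins
        elif stripped.startswith("//"):
            if current is not None:
                sequences.append("".join(current))
            current = None                    # the block is closed
        elif current is not None:
            # Drop position numbers and blanks, keep the letters only.
            current.append(re.sub(r"[^A-Za-z]", "", line).upper())
    return sequences


sample = """LOCUS       EXAMPLE1
ORIGIN
        1 tttgactgcc acttcctcga
       21 tgaaggtttt
//
LOCUS       EXAMPLE2
ORIGIN
        1 acgtacgt
//
"""
for seq in extract_sequences(sample):
    print(seq)
```

Lines that appear before ORIGIN (the annotations) are skipped, and a block that is never closed by // is not returned.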
The Sequence | Load Sequence | From Database command is activated by selecting a database as the default database. Use this command to open a DNAMAN Database dialog box. When you click a record name, the information of the record appears in the Information box. Click the Load Seq button to load this sequence into the default sequence channel.
You may simultaneously load many sequences into sequence channels using the Sequence | Load Sequence | Multiple command. These sequences can be in one or many files (e.g. multiple alignment files and GenBank files may contain many sequences). The sequences will be loaded into the default channel and the next available channels.
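The placement rule can be sketched as follows. This Python example is an illustration under stated assumptions, not a description of DNAMAN's actual behavior: it assumes that the first sequence replaces the content of the default channel, that later sequences skip channels that are already occupied, and that loading stops when no free channel is left. The function name load_multiple and the 0-based channel indexes are hypothetical.

```python
from typing import List, Optional

MAX_CHANNELS = 20  # number of channels stated in the text


def load_multiple(channels: List[Optional[str]],
                  default: int,
                  sequences: List[str]) -> List[int]:
    """Place sequences into the default channel, then into free channels after it.

    channels holds one entry per channel, with None for an empty channel.
    Returns the indexes of the channels that received a sequence.
    """
    used = []
    slot = default
    for i, seq in enumerate(sequences):
        if i > 0:
            # Assumption: later sequences skip channels already in use.
            while slot < len(channels) and channels[slot] is not None:
                slot += 1
        if slot >= len(channels):
            break  # no free channel left; remaining sequences are not loaded
        channels[slot] = seq
        used.append(slot)
        slot += 1
    return used


channels: List[Optional[str]] = [None] * MAX_CHANNELS
channels[1] = "OLD"                       # channel 1 is already occupied
print(load_multiple(channels, 0, ["A", "B", "C"]))  # prints [0, 2, 3]
```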
You may change the sequence properties using the Analysis Definition function.
Choosing the Sequence | Current Sequence | Analysis Definition command opens the Analysis Definition dialog box.
You can define the following properties of a sequence:
DNAMAN can display the current sequence and its related sequences such as reverse complement sequence and double stranded sequence. When a single-stranded sequence is shown, DNAMAN also indicates its nucleotide composition and the predicted molecular weights.
Choosing the Sequence | Display | Sequence and Composition command invokes a dialog box. You may check six options for DNA sequences: 1) sequence and composition; 2) reverse sequence; 3) complementary sequence; 4) reverse complementary sequence; 5) double-stranded sequence and 6) RNA sequence. If the default sequence is a protein, only the first option is available.
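The related sequences listed above are simple transformations of the current sequence. The Python sketch below shows how each could be computed; it illustrates the definitions only and is not DNAMAN code. It handles the four standard bases (A, C, G, T) and leaves any other character unchanged, which is an assumption of the example.

```python
from collections import Counter
from typing import Dict

# Translation table for the four standard bases, upper and lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(seq: str) -> str:
    return seq[::-1]


def complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    return complement(seq)[::-1]


def to_rna(seq: str) -> str:
    return seq.replace("T", "U").replace("t", "u")


def double_stranded(seq: str) -> str:
    """Show the sequence with its complementary strand on the line below."""
    return seq + "\n" + complement(seq)


def composition(seq: str) -> Dict[str, int]:
    """Count how often each base occurs."""
    return dict(Counter(seq.upper()))


dna = "TTTGACTGCC"
print(reverse(dna))             # prints CCGTCAGTTT
print(complement(dna))          # prints AAACTGACGG
print(reverse_complement(dna))  # prints GGCAGTCAAA
print(to_rna(dna))              # prints UUUGACUGCC
print(composition(dna))
```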
If annotations have been defined in a sequence, you have options to include or exclude any of these annotations. The requested sequence information is displayed in a text window. With the Excluding and Display Only options for sequence annotations, you may remove some regions (e.g. introns) from the original sequence, or display only the regions of interest (e.g. exons).
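The effect of the two options can be sketched in a few lines of Python. This is an illustration only, not DNAMAN's implementation; the function names and the representation of an annotated region as a 1-based inclusive (start, end) pair are assumptions made for the example.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end), assumed 1-based and inclusive


def display_only(seq: str, regions: List[Region]) -> str:
    """Keep only the annotated regions, joined in order (e.g. exons)."""
    return "".join(seq[start - 1:end] for start, end in sorted(regions))


def exclude(seq: str, regions: List[Region]) -> str:
    """Remove the annotated regions from the sequence (e.g. introns)."""
    kept = []
    pos = 0  # index of the first base not yet handled
    for start, end in sorted(regions):
        kept.append(seq[pos:start - 1])  # bases before this region
        pos = max(pos, end)              # continue after the region
    kept.append(seq[pos:])               # bases after the last region
    return "".join(kept)


gene = "AAAACCCCGGGGTTTT"
introns = [(5, 8)]
exons = [(1, 4), (9, 12)]
print(exclude(gene, introns))      # prints AAAAGGGGTTTT
print(display_only(gene, exons))   # prints AAAAGGGG
```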
You may draw a map of the current sequence using the Sequence | Draw Sequence Map command. If elements have been defined within the sequence, DNAMAN will incorporate them in the map. You may also add or remove elements by editing the map. Sequence maps are in DMP format. See section X.4 for how to modify a DMP file.
You may visualize the thermodynamic properties of a DNA molecule using the Sequence | Plot DNA Properties menu. The plot is drawn in a graphic window. See section V.5 for how to handle the plot window.
To search for sequences homologous to your sequence, you may access the BLAST E-mail Server by sending a query sequence to the National Center for Biotechnology Information (NCBI) at the National Library of Medicine. The query sequence is compared with the DNA or protein sequence databases at NCBI using the Basic Local Alignment Search Tool (BLAST). You may find more information about the BLAST server on the NCBI web site. You may also refer to the publication of Altschul, S.F. et al., 1990. J. Mol. Biol. 215:403.
DNAMAN allows you to format the default sequence into a BLAST document for accessing the BLAST Server. There are five formats:
1) Blastn compares a nucleotide query sequence against a nucleotide sequence database.
2) Blastx compares the six-frame conceptual translation products of a nucleotide query sequence against a protein sequence database.
3) Tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
4) Blastp compares an amino acid query sequence against a protein sequence database.
5) Tblastn compares a protein query sequence with a nucleotide sequence database dynamically translated in all six reading frames.
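The choice among the five formats depends only on the type of the query, the type of the database and, for nucleotide against nucleotide, whether both sides are to be translated. The Python sketch below restates the list above as a lookup; the function name choose_program and its parameters are hypothetical and are not part of DNAMAN.

```python
def choose_program(query_type: str,
                   database_type: str,
                   translate_both: bool = False) -> str:
    """Return the BLAST program name for a query/database combination.

    query_type and database_type are "DNA" or "Protein".
    translate_both only matters for a DNA query against a DNA database.
    """
    key = (query_type, database_type)
    if key == ("DNA", "DNA"):
        # Nucleotide against nucleotide: compare directly, or compare
        # the six-frame translations of both sides.
        return "tblastx" if translate_both else "blastn"
    table = {
        ("DNA", "Protein"): "blastx",      # translated query, protein database
        ("Protein", "Protein"): "blastp",  # protein query, protein database
        ("Protein", "DNA"): "tblastn",     # protein query, translated database
    }
    if key not in table:
        raise ValueError(f"unknown combination: {key}")
    return table[key]


print(choose_program("DNA", "DNA"))      # prints blastn
print(choose_program("Protein", "DNA"))  # prints tblastn
```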
For example:
Blastn
e-mail to: blast@ncbi.nlm.nih.gov
PROGRAM blastn
DATALIB nr
BEGIN
>Nt sequence of EXAMPLE1
TTTGACTGCCACTTCCTCGATGAAGGTTTTACTGCCAAG
Blastp
e-mail to: blast@ncbi.nlm.nih.gov
PROGRAM blastp
DATALIB nr
BEGIN
>Protein sequence
FDCHFLDEGFTAKDILDQKINEVSSSDDKDAFYVADLGDILK
The BLAST documents are text files. Save a BLAST document and send it to the BLAST E-mail Server, whose address is displayed in the first line. To obtain instructions on using the BLAST E-mail Server, send a message to the following address: blast@ncbi.nlm.nih.gov.
You may use the Sequence | Current Sequence | Analysis Definition command to define any region of the default sequence as a query sequence. You may also modify the content of a BLAST document by deleting or adding any nucleotide or amino acid sequence.
The BLAST documents are derived from a template file (blasttem.dat) that is stored in the DNAMAN system folder. You may modify the template if necessary.
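Because a BLAST document is plain text, a document with the same layout as the examples in this section can also be assembled by a short script. The Python sketch below follows the line order of those examples (address, PROGRAM, DATALIB, BEGIN, title, sequence). It does not read blasttem.dat, whose content is not described here; the function name, the parameters and the 1-based inclusive region convention are assumptions made for the example.

```python
from typing import Optional, Tuple


def build_blast_document(program: str,
                         title: str,
                         sequence: str,
                         region: Optional[Tuple[int, int]] = None,
                         database: str = "nr",
                         address: str = "blast@ncbi.nlm.nih.gov") -> str:
    """Return a BLAST document laid out like the examples in this section."""
    if region is not None:
        start, end = region          # assumed 1-based, inclusive
        sequence = sequence[start - 1:end]
    lines = [
        f"e-mail to: {address}",
        f"PROGRAM {program}",
        f"DATALIB {database}",
        "BEGIN",
        f">{title}",
        sequence,
    ]
    return "\n".join(lines) + "\n"


doc = build_blast_document(
    "blastn",
    "Nt sequence of EXAMPLE1",
    "TTTGACTGCCACTTCCTCGATGAAGGTTTTACTGCCAAG",
    region=(1, 10),                  # use only the first ten bases as the query
)
print(doc)
```

Passing a region plays the same role as defining an analysis region before formatting: only that part of the sequence becomes the query.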