Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide.

When and Where: Wednesdays, 1pm-2pm, Room 438, Library Admin Building, beginning September 10, 2003.


Nucleic Acid Analysis Resources. September 24, 2003

Overview

By clever sleuthing or careful sequencing you obtain a nucleic acid sequence. What tools are available to analyze the sequence? What is it? Has it been mapped to a chromosome? Does it exhibit any codon biases? Does it have a coding region? Are there any vector sequences contaminating it?

 

Introduction/Scope

If we begin with an actual nucleic acid sequence, we can quickly perform a BLAST search to identify it. We might begin with BLASTN (nucleic acid query versus a nucleic acid database). This search could be narrow or broad in scope. BLASTN should tell us whether our sequence is similar to any other sequences in the database. In many cases, this will also turn up a putative translation to protein and a host of links to informative resources, which might tell us about associated diseases, for instance. If no translation is available, we might use BLASTX (nucleic acid query, translated in all six reading frames, versus a protein database). If we are lucky, the BLASTX search may indicate some or all of the coding regions which are assembled to make a product. We might also BLAST the sequence against the chromosomes of the parent species.

It's quite possible that links which turn up from these BLAST searches will lead us to the information known about this sequence, or let us know we have something truly unique. On the other hand, we may discover that our query is "close" to a known sequence but not identical to it. Do the differences reflect our (sequencing?) errors? Better yet, we may see that it's identical to something for which no annotations are yet available. How can we proceed to assess the new sequence? If we have done a species-specific search, we might look for orthologs, and from there get our understanding underway.

NCBI has begun a series of curated databases. These databases contain annotations collated by NCBI for a particular gene, often referenced in LocusLink (more on LocusLink later). These sequences are Reference Sequences (RefSeq). BLAST and ENTREZ searches can locate such sequences. You can recognize them by their distinctive accession number formats (as compared with GenBank accessions). Here is the RefSeq Accession Key. Note: if you use GCG at MUSC, our GenBank datafiles do NOT contain the RefSeq entries. If you use ENTREZ to locate RefSeq sequences and then try to fetch them within GCG, you will not be successful, because the RefSeq entries are not in the MUSC GenBank archive at present.
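As a rough illustration of telling the two accession styles apart: RefSeq accessions carry a two-letter prefix followed by an underscore (NM_ for mRNA, NP_ for protein, and so on), while GenBank accessions contain no underscore. The sketch below assumes that rule alone and is no substitute for the RefSeq Accession Key. The function name looks_like_refseq is hypothetical, and the accessions in the demo are there only to show the format.

```python
import re

# Assumption: a RefSeq-style accession is two uppercase letters, an underscore,
# then digits, with an optional ".version" suffix (for example NM_000546.5).
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")


def looks_like_refseq(accession):
    """Return True if the accession has the prefix_digits form used by RefSeq."""
    return bool(REFSEQ_PATTERN.match(accession.strip().upper()))


if __name__ == "__main__":
    # Sample accessions used purely as format examples.
    for acc in ["NM_000546", "NP_000537.3", "U49845", "AF123456.1"]:
        kind = "RefSeq-style" if looks_like_refseq(acc) else "GenBank-style"
        print(acc, "->", kind)
```

A check like this can flag, before you try a fetch in GCG, which accessions from an ENTREZ result are likely to be missing from the local GenBank archive.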

Open Reading Frame Finder: ORF Finder at NCBI

Electronic PCR at NCBI

Screen a sequence for vector sequence contamination: VecScreen at NCBI

Match mRNA or mRNAs to a genomic Contig with SPIDEY at NCBI
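To make the idea behind an open reading frame search concrete, here is a minimal sketch that scans all six reading frames for stretches running from an ATG to the next in-frame stop codon. It is not the algorithm NCBI's ORF Finder uses: it assumes the standard stop codons, ATG as the only start codon, and a hypothetical min_codons cutoff. All function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons):
    """Yield (start, end, frame) for ATG...stop ORFs on one strand.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield (start, i + 3, frame)
                start = None


def find_orfs(seq, min_codons=30):
    """Return (strand, frame, start, end) tuples for all six reading frames.

    Minus-strand coordinates refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for start, end, frame in orfs_in_strand(s, min_codons):
            results.append((strand, frame + 1, start, end))
    return results


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATAGCC", min_codons=3):
        print(orf)
```

ORFs that run off the end of the sequence without a stop codon are ignored here, which is a simplification; a partial cDNA may well contain such an open-ended frame.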

All commercial sequence analysis program suites have a host of DNA tools which perform restriction enzyme mapping, codon bias analysis, translation, and many additional analyses. If you have access to such software, you should probably make use of it, because an integrated analysis tool kit will be the most time-efficient way for you to examine your sequence. The GCG command line, GCG via SeqLab, GCG via SeqWeb, GCG via W2H, the EMBOSS command line, EMBOSS via W2H, MacVector, Vector NTI Suite, Sequencher, and even old DNA Strider all perform the basic functions. By the way, ALL of the packages shown in the screen shots above are available here at MUSC. Currently, however, many of these same basic functions are also available from specific web sites (e.g., the Baylor College of Medicine Sequence Analysis tools, which need no registration, or Biology WorkBench, where you have to create an account). There is not currently a site as comprehensive as many of these packages. Besides, you STILL have to learn how to use each site/software combination in a handy way.

All of the links below are also listed in various top-level or internal pages on the BCR Home Page (http://bcr.musc.edu).

Basic DNA analysis, reformatting, statistics, CpG analysis, and more are available from the Sequence Manipulation Suite (MUSC mirror).

Restriction Enzyme Cut Site Mapping is available from two decent sites: WebCutter and TACG.
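At its simplest, restriction mapping is a search for each enzyme's recognition sequence. The sketch below is not how WebCutter or TACG work internally; it assumes a tiny hand-picked enzyme table and reports where each recognition site begins (1-based), not the exact cut position within the site. The names ENZYMES and find_sites are illustrative.

```python
# A small illustrative table of well-known recognition sequences.
# All five are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "SmaI": "CCCGGG",
    "NotI": "GCGGCCGC",
}


def find_sites(seq, enzymes=ENZYMES):
    """Return {enzyme: [1-based start positions of each recognition site]}."""
    seq = seq.upper()
    hits = {}
    for name, site in enzymes.items():
        positions = []
        pos = seq.find(site)
        while pos != -1:
            positions.append(pos + 1)  # convert to 1-based coordinates
            pos = seq.find(site, pos + 1)  # step by 1 to catch overlapping sites
        hits[name] = positions
    return hits


if __name__ == "__main__":
    demo = "AAGAATTCGGGATCCGAATTC"
    for enzyme, positions in find_sites(demo).items():
        print(enzyme, positions)
```

Real mapping tools also handle degenerate recognition sequences (such as N or R positions) and non-palindromic sites on the opposite strand, which this sketch leaves out.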

Primer design became Web accessible some years ago, courtesy of MIT Whitehead and Steve Rozen; see PRIMER3-MIT, PRIMER3-MUSC.

Transcription Factor Binding Site finding, Promoter Location, and UTR analysis (from a BCR Web page) are available here.

Gene Finding tools other than BLASTX (also from a BCR Web page).

Nearly every Bioinformatics-related web server/site out there has something like a "related sites" link. Here's one of the better maintained sites: Amos Bairoch's Bioinformatics Links. Here's a site that's part of the SRS suite at EBI in the UK. Click on the Tools tab and pick a Nucleic function.


 

A Worked Example

Below you have a short sequence to use as a trial for some of the web sites we have been examining. The sequence is already in FASTA format. What is the sequence most like? From which species? Are there vector contaminants? Are there coding regions in this DNA? Does it match any proteins?
>92403 Unknown for testing
CCCACAGGGGGACCGGCCCTGTGACCCCTCACCGGGGCCGTGGGCCCGAGCCCCGGACTT
CCCTAAGCCGGCAATGACCGCCTGCGCCCGCCGAGCGGGTGGGCTTCCGGACCCCGGGCT
CTGCGGTCCCGCGTGGTGGGCTCCGTCCCTGCCCCGCCTCCCCCGGGCCCTGCGCCGGCT
CCCGCTCCTGCTGCTCCTGCTTCTCCTGCAGCCCCCCGCCCTCTCCGCCGTGTTCACGGT
GGGGGTCCTGGGCCCCTGGGCTTGCGACCCCATCTTCTCTCGGGCTCGCCCGGACCTGGC
CGCCCGCCTGGCCGCCGCCCGCCTGAACCGCGACCCCGGCCTGGCAGGCGGTCCCCGCTT
CGAGGTAGCGCTGCTGCCCGAGCCTTGCCGGACGCCGGGCTCGCTGGGGGCCGTGTCCTC
CGCGCTGGCCCGCGTGTCGGGCCTCGTGTGTCCGGTGATCCCTGCGGCCTGCCGGCCAGC
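
Before pasting a sequence like this into a web form, it can help to confirm that it parses as FASTA and to look at its length and base composition. The following is a minimal sketch under simple assumptions: a ">" header line followed by sequence lines, and hypothetical function names parse_fasta and gc_fraction. The demo uses only the header and first line of the test sequence above.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gc_fraction(seq):
    """Return the fraction of bases that are G or C (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    sample = (
        ">92403 Unknown for testing\n"
        "CCCACAGGGGGACCGGCCCTGTGACCCCTCACCGGGGCCGTGGGCCCGAGCCCCGGACTT\n"
    )
    for header, seq in parse_fasta(sample):
        print(header, len(seq), round(gc_fraction(seq), 3))
```

A strongly skewed GC fraction is already a hint worth following up with the CpG and codon bias tools listed earlier.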

Here is a section of genomic DNA that's about 7,000 bp. This is a section of some
"not quite ready for prime time" sequence, but it's all I'm giving you for now.
Is it coding or junk? If coding, what protein? What species? Are there CpG islands?
Are there repeat regions or regions of codon bias?
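
For the CpG island question, one widely cited rule of thumb (Gardiner-Garden and Frommer) flags windows of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6, where observed/expected = (number of CpG x window length) / (number of C x number of G). The sketch below applies that rule in a sliding window. The thresholds are an assumption brought in for illustration; the Sequence Manipulation Suite or other sites may use different criteria. The function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc = (c + g) / n if n else 0.0
    ratio = (cpg * n) / (c * g) if c and g else 0.0
    return gc, ratio


def cpg_windows(seq, window=200, step=50, min_gc=0.5, min_ratio=0.6):
    """Return (start, end, gc, ratio) for each window passing both thresholds.

    Coordinates are 0-based and end-exclusive. Thresholds are assumptions
    taken from the commonly cited rule of thumb, not from any specific site.
    """
    hits = []
    for start in range(0, max(len(seq) - window + 1, 0), step):
        gc, ratio = cpg_stats(seq[start:start + window])
        if gc > min_gc and ratio > min_ratio:
            hits.append((start, start + window, gc, ratio))
    return hits


if __name__ == "__main__":
    # Artificial sequence: 200 bp of AT repeats followed by 200 bp of CG repeats.
    demo = "AT" * 100 + "CG" * 100
    for hit in cpg_windows(demo):
        print(hit)
```

Overlapping windows that pass would normally be merged into a single island; that step is omitted to keep the sketch short.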


Sample Questions/Data

Those two sequences up above and your own curiosity ought to provoke you to try some of the sites!

 


 

Created by ESH 8-18-2003; updated 8-22-2003 18:30

email to Starr about this page