Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and Worldwide.

When and Where: Wednesdays 1-2pm, Room 438, Library Admin Building, beginning September 10, 2003.


Overview NON-human Genome Resources November 19, 2003


 

Introduction/Scope

The short answer is that there are a LOT of projects out there. The technology that sequenced the human genome is still available, and it is now being harnessed to examine the next series of useful genomes. Agricultural species of plants and animals are being worked on, as are the pathogens and parasites with global impact. There is considerable scientific as well as political discourse about which projects should be pursued and in what order.

Here is a casual listing of genome pages and projects. The main thing you notice as you click through even this partial listing is the HUGE variability in progress and in access to data. NCBI and ENSEMBL are far and away the best-integrated sites for analysis via the web. The vast majority of non-bacterial genome projects have only limited coverage of their respective genomes, and this annoying fact of life is just something you have to deal with. The encouraging thing is that, with a wide array of open source software out there, you can adapt software to your own project without reinventing the wheel too much.

Your favorite bacterium revealed

 

[T] - TaxTable; [P] - ProtTable; [C] - COG Table; [D] - 3-D neighbors; [L] - BLAST; [S] - CDD search; [F] - FTP

Examples for E coli K12

T Taxonomic distribution
P Gene by gene Table of Genes
C COG summary
D Proteins with known structure
L BLAST Query page for that genome
S Structure Conserved Domains
F Download as FASTA format the whole genome
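
Once a genome has been downloaded in FASTA format, it can be read with a few lines of code. The following is a minimal sketch, assuming a plain FASTA text file; the function name read_fasta and the two toy records are made up for illustration and are not part of any NCBI tool.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up two-record example standing in for a downloaded genome file.
example = StringIO(">seq1 toy record\nATGC\nGGTA\n>seq2 another toy record\nTTAA\n")
records = list(read_fasta(example))
for name, seq in records:
    print(name, len(seq))
```

For a real download, replace the StringIO object with an opened file, for example open("genome.fasta"), where the file name is whatever you saved the genome as.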

BLASTING some or all the microbial genomes.

PDB neighbors for bacterial proteins

Bacterial genome sequencing projects listing

A logical question to ask is how similar are two species. The NCBI TAXPLOT tool is designed to help visualize an answer to this problem.

Homologene You can download all of the Homologene content

COG/KOG Tool (new). Clusters of Orthologous Groups. Old COG page. COG Help

Virus Resources Naming/Taxonomy Conventions

What is a virus? Influenza Virus Replication Summary. Viral genome statistics. Viruses by Taxonomic Group

BLAST versus Viral genomes. Related viral proteins Virus COGs = VOGS.


 

A worked Example

Let's go to the Homologene page and search for CASP7

This gives us a few hits

Some more of the CASP7 Homologene entry

Let's try a similar search via the COG/KOG tool

 

From this page we can select KOGNITOR and perform a BLAST search against the KOG database. Here is the link to the CASP7 sequence

 

Clicking on KOG3573 shows the taxonomic distribution of the hits. Note that there are no yeast or plant COG hits.

 

One level further shows

 


Pathways in COG

The Phylogenetic page allows you to perform some interesting selection procedures. For example, suppose you were looking for potential drug targets in gram positive bacteria which would have no effect on humans. You might start by using this tool. In the interactive page you see three icons in each cell of the table. These represent the selection criteria that can be applied. The green orb is a check, meaning must be present. The red x means must be absent. The yellow orb means you do not care. The reset button is in the upper left. In the image below, the yellow orb in the size cell is checked and 66 COGs are indicated. Of these, 13 are in Archaea, 12 in Grampos and 7 in alpha, for instance. Clicking the "show" button displays all 66.

Here is the same Phylogenetic window, this time set up to select COGs which exist in grampos bacteria (12) but are absent in eukaryotes (3 with red x); the rest are marked as neutral (yellow orb).
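
The present/absent/do-not-care selection can be pictured as a simple filter over the set of groups in which each COG occurs. The sketch below is only an illustration of that logic, not the COG site's own code; the COG names, group memberships and the function name matches are all hypothetical.

```python
# Selection states, mirroring the three icons in each table cell.
PRESENT, ABSENT, ANY = "present", "absent", "any"


def matches(cog_groups, criteria):
    """Return True if the groups a COG occurs in satisfy every criterion."""
    for group, state in criteria.items():
        if state == PRESENT and group not in cog_groups:
            return False
        if state == ABSENT and group in cog_groups:
            return False
        # ANY places no constraint on the group.
    return True


# Hypothetical COG identifiers and group memberships, for illustration only.
toy_cogs = {
    "COG_A": {"grampos", "archaea"},
    "COG_B": {"grampos", "eukaryota"},
    "COG_C": {"archaea"},
    "COG_D": {"grampos"},
}

# Must be in gram positives, must be absent from eukaryotes, archaea neutral.
criteria = {"grampos": PRESENT, "eukaryota": ABSENT, "archaea": ANY}
selected = sorted(name for name, groups in toy_cogs.items() if matches(groups, criteria))
print(selected)
```

With these toy data only COG_A and COG_D pass, because COG_B occurs in eukaryotes and COG_C is missing from the gram positives.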

The Show button displays 70 COGs which fit the selection criteria. Note that the authors have elected to display results with two similar pastel color schemes that actually represent two distinct features. On the left of the image below the colors map to broad groupings of species, whereas on the right the colors code for functional categories.

Species colors

Gene Function Colors

This is not helpful to the viewer to say the least.

Below are the 70 COGs. Again the left pastels are organisms the right pastels are functional groups.

From this set of 70 COGs you might for instance seek a target for penicillin. This is wonderful bioinformatics but what if the drug is hyperallergenic?

If you examine one of the COGs, e.g. COG0195, you see the following. The minuses in the pink bar highlighted at the top correspond to the eukaryota section, where no entries have been selected. The yellow bar at the top represents the Archaea, with 13 entries. In the phylogram on the right, the 13 yellow diamonds correspond to the 13 Archaea COGs that have been selected. Out of view at the bottom of this screen shot are 12 grampos entries. If the full tree were visible you could count the 12 light blue grampos COGs. The tree is not linked, but the left hand table entries are linked to archived COG/BLAST searches.

Using TAXPLOT is interesting as a nifty comparison tool.

You can select species and then either genes or function and you will see:

Here is a result comparing three species of bacteria and selecting ion transport proteins. You can see the matches and you can see the BLink to the lower right.
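
One way to read such a plot, assuming each point is a protein from the reference genome placed by its best BLAST score against each of the two comparison genomes, is to ask which side of the diagonal it falls on. The sketch below illustrates that reading only; the protein names and scores are invented, and closer_genome is a hypothetical helper, not part of TAXPLOT.

```python
def closer_genome(score_a, score_b):
    """Say which comparison genome gives the higher best-hit score."""
    if score_a > score_b:
        return "A"
    if score_b > score_a:
        return "B"
    return "tie"  # the point lies on the diagonal


# Hypothetical rows: (protein, best score vs genome A, best score vs genome B).
toy_hits = [
    ("protein1", 250, 120),
    ("protein2", 90, 310),
    ("protein3", 180, 180),
]

summary = {"A": 0, "B": 0, "tie": 0}
for _name, score_a, score_b in toy_hits:
    summary[closer_genome(score_a, score_b)] += 1
print(summary)
```

A strong excess of points on one side would suggest that the reference genome's proteins are, on the whole, more similar to that comparison genome.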

Looking up SARS virus example: start at the VIRUS homepage and enter SARS.

Click on the Complete Genome Link

Starting with ENTREZ search for Influenza A virus
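
The same kind of Entrez search can also be expressed as a URL for NCBI's E-utilities esearch service. The sketch below only builds the URL and does not contact NCBI; the choice of the nucleotide database, the [Organism] field tag and the retmax value are assumptions made for illustration, and build_esearch_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Base address of the E-utilities esearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nucleotide", retmax=20):
    """Build (but do not send) an esearch query URL for the given term."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


url = build_esearch_url("Influenza A virus[Organism]")
print(url)
```

Pasting the printed URL into a browser should return the matching record identifiers as XML, which is roughly what the web search form does behind the scenes.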

Retrovirus Genomes

Retrovirus BLAST to identify genotype.

Align a sequence to retroviral genomes

Specific pages for retroviruses

Country distribution of HIV-1


Sample Questions/Data

Use the COG phylogenetic table to find targets for drugs in Gram positive bacteria. These would need to be essential for grampos bacteria but absent from archaea, for instance.

See if you can locate any sequences from the 1918 influenza epidemic.

What type of virus is the PARVOVIRUS?

Seven clones of the same isolate of HIV-2 subtype A have been partially sequenced. All of them are from a German woman infected via heterosexual contact with a Senegalese man in 1983. The woman has since died of AIDS. Both the woman and the Senegalese man had AIDS at the time blood samples were collected. Human immunodeficiency virus type 2 (HIV-2) has been sequenced (complete proviral genome, Accn. M30502; this sequence is also an isolate from Germany, but the person probably got infected in Mali). Approximately how many differences in the sequence are found for each of the 7 isolates in the envelope glycoprotein (env) gene? This is an alignment problem.
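
Once two sequences have been aligned (by BLAST or any alignment tool), counting the differences is a matter of walking down the columns. The following is a minimal sketch, assuming the two rows are already aligned to equal length with "-" for gaps; the short sequences are made up and are not HIV-2 data.

```python
def count_differences(aligned_a, aligned_b):
    """Count columns where two aligned rows differ (mismatch or gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for a, b in zip(aligned_a, aligned_b) if a != b)


# Made-up aligned fragments; "-" marks a gap introduced by the alignment.
reference = "ATGGCT-AAGC"
clone = "ATGACTGAAGC"
print(count_differences(reference, clone))
```

Running the same count for each of the 7 clones against the env region of the reference gives the per-isolate answer the question asks for.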


 


 

Created by ESH 8-18-2003; updated 11-17-2003 11:30

email to Starr about this page