For ID mapping,
input IDs are mapped directly to genes from KEGG GENES.
For sequence similarity mapping, each input sequence is BLASTed
against all sequences in KEGG GENES.
The default cutoffs are BLAST E-value < 10^-5 and rank ≤ 5.
Users can customize the thresholds according to their needs.
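As a rough illustration of the default cutoffs just mentioned, here is a minimal Python sketch that filters tabular BLAST hits. It is not KOBAS's own code. It assumes the standard 12-column tabular BLAST layout (E-value in column 11, bit score in column 12) and assumes that "rank" means a hit's position among one query's hits when ordered by E-value. The function name `filter_blast_hits` is hypothetical.

```python
from collections import defaultdict

EVALUE_CUTOFF = 1e-5  # default from the talk: E-value < 10^-5
RANK_CUTOFF = 5       # default from the talk: rank <= 5


def filter_blast_hits(lines, evalue_cutoff=EVALUE_CUTOFF, rank_cutoff=RANK_CUTOFF):
    """Return {query_id: [subject_id, ...]} for hits passing both cutoffs.

    Assumption: `lines` are rows of 12-column tabular BLAST output, and
    rank is the hit's position per query after sorting by E-value
    (ties broken by higher bit score).
    """
    hits = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        hits[query].append((evalue, -bitscore, subject))

    kept = {}
    for query, query_hits in hits.items():
        query_hits.sort()  # smallest E-value first
        kept[query] = [
            subject
            for rank, (evalue, _, subject) in enumerate(query_hits, start=1)
            if rank <= rank_cutoff and evalue < evalue_cutoff
        ]
    return kept
```

Passing different values for `evalue_cutoff` and `rank_cutoff` corresponds to customizing the thresholds.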
The second program “identifies” statistically significantly enriched
pathways and diseases by comparing results from the first program against the background, which is usually all genes in the whole genome,
or all probe sets on a microarray.
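To make the idea of comparing a sample against a background concrete, here is a small Python sketch of one common enrichment test, the hypergeometric test. The talk does not say which test is applied, so this choice is an assumption, and the function name and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) under the hypergeometric distribution.

    N: number of genes in the background
    K: number of background genes annotated to the term
    n: number of genes in the input sample
    k: number of sample genes annotated to the term
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of seeing k or more annotated genes by chance.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Toy numbers, made up for illustration only.
p_value = hypergeom_enrichment_p(k=8, n=100, K=40, N=5000)
```

A small p value here means the term shows up in the sample more often than random draws from the background would suggest.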
As previously mentioned, one of KOBAS 2.0’s advantages is that it can
use sequence similarity mapping
to annotate input genes from species that are not yet well-represented in existing pathway databases.
It can also map the genes from other species to human diseases
to predict whether these genes may be good candidates for studying human diseases.
In this paper, the authors demonstrated how to use this function in KOBAS 2.0.
The researchers analyzed the microarray expression profiles in rhesus monkeys in two major hippocampal subdivisions critical for memory and
cognitive function:
cornu ammonis (CA) and dentate gyrus (DG).
They identified 371 up-regulated genes
and then used both DAVID and KOBAS 2.0 to identify enriched pathways and diseases.
DAVID can only perform ID mapping to rhesus genes in its two pathway
databases KEGG PATHWAY and Panther,
and as a result, identified no statistically significantly enriched pathways or diseases.
On the other hand, KOBAS 2.0 supports sequence similarity mapping by BLAST to annotate the rhesus gene set
and can thus take full advantage of the abundant data on human pathways and diseases.
Of the 371 genes, 130 were mapped to existing pathways and diseases,
with 61 genes related to pathways
and 30 genes involved in diseases after statistical tests.
These results are consistent with known functional differences between the two regions.
The authors also compared KOBAS 2.0 with popular GO enrichment analysis tools
including FuncAssociate 2.0, Ontologizer 2.0, BiNGO, and EASE,
showing that the enriched pathways identified by KOBAS 2.0 are more specific and informative than those from the other analysis tools.
KOBAS 2.0 hence offers more insights into the biological processes.
In conclusion, KOBAS 2.0 is an excellent tool for annotating and analyzing
pathways and diseases.
Our references include KOBAS 2.0 and KOBAS 1.0.
KOBAS can be accessed at the site shown here.
Finally, I would like to express my appreciation to all of our group members,
to TA Meng Wang, and to Dr. Ge Gao and Dr. Liping Wei.
Thank you.
Now I’ll show you how to use the KOBAS website.
First, go to the KOBAS homepage.
Click the “Annotate” button in the left corner, and this will take us to the annotation page.
To run it correctly, we need to choose the proper input file format,
which can be gene FASTA sequences,
tabular BLAST output,
or an ID list.
We choose Gene ID as an example
and upload a file from the local disk.
The species should be Human.
Click “Run”,
and here is the annotation of the genes.
The annotation lists the genes corresponding to every ID in the uploaded file.
Click on a specific gene to view more comprehensive information,
including gene name, definition, and so on.
Cross-links to other databases are also provided.
Then we can run a hypothesis test on this annotation information
to find the pathways or diseases most likely to be related to the query genes.
At the top of the output results, we use this file as the sample input for “identify”,
and you can click “show available database according to the species used in Sample Input”.
Here we can see that many databases are available.
Users can select databases according to their needs.
Here, we run it with all options left at their defaults.
Click "Run"
This process may take a few minutes.
After the run is finished, we can check the output in the analysis history on the left.
The result lists the term, the database it comes from,
ID, sample number,
input background number, p value and corrected p value.
We can also click to check the information related to diseases.
In theory, a smaller p value means a more credible identification result.
So we can sort by corrected p value in ascending order
and find the pathway or disease the genes are most likely to be involved in.
Users can choose a p value cutoff according to their needs.
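For anyone who wants to reproduce this sorting step outside the website, here is a minimal Python sketch. The talk does not say how the corrected p value is computed, so the Benjamini-Hochberg procedure used here is an assumption. The function names and the 0.05 cutoff are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    corrected = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        corrected[i] = running_min
    return corrected


def rank_terms(results, cutoff=0.05):
    """results: list of (term, p_value) pairs.

    Returns (term, p_value, corrected_p) tuples sorted by corrected p value
    in ascending order, keeping only those below `cutoff`.
    """
    corrected = benjamini_hochberg([p for _, p in results])
    rows = [(term, p, q) for (term, p), q in zip(results, corrected)]
    rows.sort(key=lambda row: row[2])
    return [row for row in rows if row[2] < cutoff]
```

Changing `cutoff` corresponds to choosing a different p value threshold.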
Then we can click each term for more detailed information,
which will contribute to further research.
That’s our presentation. Thank you all!