| Application Build: 246 Database Build: 2008-04 |
The UniProt (Swiss-Prot, TrEMBL and PIR) data source in the GO Consortium database uses UniProt entry names, which are tedious to associate with gene names. It is often more convenient to reference genes by their short name or symbol. The enhanced name feature permits seamless use of gene names to query the entries that are already present in the UniProt data source. For human entries, these gene names are further filtered to include only those recognized by HUGO. The enhancement does not add any genes that are not already present in UniProt. GoMiner users can either use the enhancement on our server or install it on a local server.
Figure 1 shows the overall flow of the enhancement process. A Unix shell script downloads data files from the UniProt and HUGO web servers, parses the UniProt files, and constructs a mapping between UniProt entry names and gene names. For human entries, it filters the gene names to eliminate those that are not HUGO names. The script creates two different mappings to gene names: one uses HUGO for the human filtering, and the other uses MatchMiner.
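The mapping and filtering step can be illustrated with a minimal Python sketch. This is not the actual shell script: the function name `build_mapping`, the in-memory input format, and the use of the `_HUMAN` entry-name suffix to detect human entries are all assumptions made for illustration, and the `XXXX_HUMAN` and `ZZZZ_RAT` records are made up.

```python
def build_mapping(uniprot_records, hugo_symbols, human_suffix="_HUMAN"):
    """Map UniProt entry names to gene names (hypothetical helper).

    uniprot_records: iterable of (entry_name, gene_name) pairs, assumed to
    have been parsed from the downloaded UniProt files already.
    hugo_symbols: collection of gene symbols recognized by HUGO.
    """
    approved = {symbol.upper() for symbol in hugo_symbols}
    mapping = {}
    for entry_name, gene_name in uniprot_records:
        if not gene_name:
            # No gene name available for this entry: leave it unmapped.
            continue
        is_human = entry_name.endswith(human_suffix)  # assumption: suffix marks species
        if is_human and gene_name.upper() not in approved:
            # Human gene names that HUGO does not recognize are filtered out.
            continue
        mapping[entry_name] = gene_name
    return mapping


records = [
    ("P2X1_HUMAN", "P2RX1"),
    ("BRC2_HUMAN", "BRCA2"),
    ("XXXX_HUMAN", "NOTAGENE"),  # made-up entry with a non-HUGO name
    ("KAPB_MOUSE", "PRKACB"),    # non-human entries skip the HUGO filter
    ("ZZZZ_RAT", ""),            # made-up entry with no gene name
]
hugo = {"P2RX1", "BRCA2", "TPMT"}
mapping = build_mapping(records, hugo)
```

Only the human entries are checked against the HUGO list; entries from other species pass through unchanged, and entries without a usable gene name are simply absent from the mapping.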
Figure 1: Flow Diagram for Gene Association Enhancement
For most species, the two versions perform the same. For human, the HUGO version is usually more up to date, since the latest HUGO file is downloaded as part of the processing. The MatchMiner version is able to draw from a broader collection of databases, but the files used are not updated as frequently. The mapping file is fairly simple, and users could decide to create one to support other kinds of identifiers.
A Java program creates an enhanced version of the original GO Consortium database. This program adds a new column, officialname, to the Gene_Product table and populates it with the gene names. The program joins the UniProt data with the data provided by the GO Consortium, using the Entry Name field in UniProt and the symbol column from GO as the common element. If no gene name is available, the original UniProt Entry Name is used (that is, the entry in the symbol column is copied into the officialname column). If the Enhanced Names option is chosen from the Lookup settings menu (Figure 3), this additional column is used as part of GoMiner's searches for gene product associations.
| id | symbol | dbxref_id | species_id | full_name | officialname |
|---|---|---|---|---|---|
| 688363 | P2X1_HUMAN | 713272 | 16 | P2X purinoceptor 1 | P2RX1 |
| 688364 | TPMT_HUMAN | 713274 | 16 | Thiopurine S-methyltransferase | TPMT |
| 688365 | P2Y4_HUMAN | 713276 | 16 | P2Y purinoceptor 4 | P2RY4 |
| 688366 | BRC2_HUMAN | 713279 | 16 | Breast cancer type 2 susceptibility protein | BRCA2 |
| 757437 | KAPB_MOUSE | 779097 | 25 | cAMP-dependent protein kinase, beta-catalytic subunit | PRKACB |
| 760719 | KAP0_RAT | 782412 | 49 | cAMP-dependent protein kinase type I-alpha regulatory chain | PRKAR1A |
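The column population described above can be sketched in Python. This is an illustration only, not the Java program itself: it uses an in-memory SQLite database as a stand-in for the real GO Consortium database, a reduced `gene_product` schema, a hypothetical `enhance` function, and one made-up row (`XXXX_HUMAN`) to show the fallback case.

```python
import sqlite3


def enhance(conn, mapping):
    """Add an officialname column and fill it from the mapping (hypothetical helper).

    When an entry name has no mapped gene name, the symbol itself is copied
    into officialname, as the text describes.
    """
    cur = conn.cursor()
    cur.execute("ALTER TABLE gene_product ADD COLUMN officialname TEXT")
    cur.execute("SELECT id, symbol FROM gene_product")
    rows = cur.fetchall()
    cur.executemany(
        "UPDATE gene_product SET officialname = ? WHERE id = ?",
        [(mapping.get(symbol, symbol), gp_id) for gp_id, symbol in rows],
    )
    conn.commit()


# Stand-in database with a reduced schema (assumption, not the real GO schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene_product (id INTEGER PRIMARY KEY, symbol TEXT, full_name TEXT)"
)
conn.executemany(
    "INSERT INTO gene_product VALUES (?, ?, ?)",
    [
        (688363, "P2X1_HUMAN", "P2X purinoceptor 1"),
        (688366, "BRC2_HUMAN", "Breast cancer type 2 susceptibility protein"),
        (1, "XXXX_HUMAN", "Made-up product with no gene name"),
    ],
)

enhance(conn, {"P2X1_HUMAN": "P2RX1", "BRC2_HUMAN": "BRCA2"})
result = conn.execute(
    "SELECT symbol, officialname FROM gene_product ORDER BY id"
).fetchall()
```

After the call, the two real rows carry their gene names in officialname, while the made-up row keeps its entry name there, so a search on officialname still matches every gene product.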

Figure 3: Selecting the UniProt Gene Name Association Enhancement in the Lookup Settings
GoMiner was originally developed jointly by the Genomics and Bioinformatics Group (GBG) of LMP, NCI, NIH and the Medical Informatics and Bioimaging group of BME, Georgia Tech/Emory University. It is now maintained and under continuing development by GBG.