GoMiner    Application Build: 246     Database Build: 2008-04     Genomics and Bioinformatics Group

Enhanced UniProt Gene Name Associations:

Motivation:

The UniProt (Swiss-Prot, TrEMBL and PIR) data source in the GO Consortium database identifies proteins by UniProt entry names, which are tedious to associate with gene names. It is often more convenient to reference genes by their short name or symbol. We therefore created an enhancement that permits seamless use of gene names to query the entries that are already present in the UniProt data source. For the human entries, these gene names are further filtered to include only those recognized by HUGO. The enhancement does not add any genes that are not already present in UniProt. Users of GoMiner™ can either use the enhancement on our server or install it on a local server.

Process Flow:

Figure 1 shows the overall flow of the enhancement process. A Unix shell script downloads data files from the UniProt and HUGO web servers, parses the UniProt files, and constructs a mapping between UniProt entry names and gene names. For human entries, it filters the gene names to eliminate those that are not HUGO names. The script creates two different mappings to gene names: one uses HUGO for the human filtering, and the other uses MatchMiner.

Figure 1: Flow Diagram for Gene Association Enhancement

For most species, the two versions produce the same results. For human, the HUGO version is usually more up to date, since the latest HUGO file is downloaded as part of the processing. The MatchMiner version is able to draw from a broader collection of databases, but the files it uses are not updated as frequently. The mapping file is fairly simple, and users could create their own to support other kinds of identifiers.
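The mapping and human-only filtering described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual shell script: the function name `build_mapping`, the pre-parsed input pairs, and the use of the `_HUMAN` entry-name suffix to detect human entries are all assumptions made for the example. The row `XYZ1_HUMAN` is made up to show a gene name that HUGO does not recognize.

```python
# Hypothetical sketch of the mapping step; names and inputs are assumptions.

def build_mapping(uniprot_pairs, hugo_symbols):
    """Map UniProt entry names to gene names.

    uniprot_pairs: iterable of (entry_name, gene_name) tuples, assumed to
        have been parsed already from the downloaded UniProt files.
    hugo_symbols: set of gene symbols recognized by HUGO.
    """
    mapping = {}
    for entry_name, gene_name in uniprot_pairs:
        if not gene_name:
            continue  # no gene name available for this entry
        # Assumption: human entries are detected by their entry-name suffix.
        is_human = entry_name.endswith("_HUMAN")
        if is_human and gene_name not in hugo_symbols:
            continue  # human gene names must be recognized by HUGO
        mapping[entry_name] = gene_name
    return mapping


pairs = [
    ("P2X1_HUMAN", "P2RX1"),
    ("BRC2_HUMAN", "BRCA2"),
    ("KAPB_MOUSE", "PRKACB"),      # non-human: not filtered against HUGO
    ("XYZ1_HUMAN", "NOTAHUGOSYM"),  # made-up row: dropped by the HUGO filter
]
hugo = {"P2RX1", "BRCA2"}
mapping = build_mapping(pairs, hugo)
```

Entries that are dropped here simply have no gene name in the mapping, so a later step can fall back to the original UniProt entry name.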

A Java program creates an enhanced version of the original GO Consortium database. This program adds a new column, officialname, to the Gene_Product table and populates it with the gene names. The program joins the UniProt data with the data provided by the GO Consortium, using the Entry Name field in UniProt and the symbol column from GO as the common element. If no gene name is available, the original UniProt Entry Name is used (that is, the entry in the symbol column is copied into the officialname column). If the Enhanced Names option is chosen from the Lookup settings menu (Figure 3), this additional column is used as part of GoMiner's searches for gene product associations.

id | symbol | dbxref_id | species_id | full_name | officialname
688363 | P2X1_HUMAN | 713272 | 16 | P2X purinoceptor 1 | P2RX1
688364 | TPMT_HUMAN | 713274 | 16 | Thiopurine S-methyltransferase | TPMT
688365 | P2Y4_HUMAN | 713276 | 16 | P2Y purinoceptor 4 | P2RY4
688366 | BRC2_HUMAN | 713279 | 16 | Breast cancer type 2 susceptibility protein | BRCA2
757437 | KAPB_MOUSE | 779097 | 25 | cAMP-dependent protein kinase, beta-catalytic subunit | PRKACB
760719 | KAP0_RAT | 782412 | 49 | cAMP-dependent protein kinase type I-alpha regulatory chain | PRKAR1A

Figure 2: Revised Database Table for the UniProt Gene Name Association Enhancement.
Original elements from the GO Consortium are shown in black. The enhancement (the officialname column) is shown in red.
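The column addition and fallback rule illustrated in Figure 2 can be sketched as below. The actual enhancement is a Java program run against the GO Consortium database; this Python version uses an in-memory SQLite table as a stand-in, and the function name `add_official_names`, the reduced table layout, and the row `UNMAPPED_HUMAN` are assumptions made purely for illustration.

```python
import sqlite3

# Hypothetical sketch; the real program is written in Java and its
# internals are not shown in this document.

def add_official_names(conn, mapping):
    """Add an officialname column to gene_product and populate it."""
    cur = conn.cursor()
    cur.execute("ALTER TABLE gene_product ADD COLUMN officialname TEXT")
    # Fallback: copy the UniProt entry name from symbol into officialname.
    cur.execute("UPDATE gene_product SET officialname = symbol")
    # Overwrite with the gene name wherever the mapping provides one.
    cur.executemany(
        "UPDATE gene_product SET officialname = ? WHERE symbol = ?",
        [(gene, entry) for entry, gene in mapping.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene_product (id INTEGER PRIMARY KEY, symbol TEXT, full_name TEXT)"
)
conn.executemany(
    "INSERT INTO gene_product VALUES (?, ?, ?)",
    [
        (688363, "P2X1_HUMAN", "P2X purinoceptor 1"),
        (688366, "BRC2_HUMAN", "Breast cancer type 2 susceptibility protein"),
        (1, "UNMAPPED_HUMAN", "Made-up row with no gene name"),
    ],
)
add_official_names(conn, {"P2X1_HUMAN": "P2RX1", "BRC2_HUMAN": "BRCA2"})

rows = conn.execute(
    "SELECT symbol, officialname FROM gene_product ORDER BY id"
).fetchall()

# A lookup by gene name, as a search on the added column might do.
hits = conn.execute(
    "SELECT id FROM gene_product WHERE officialname = ?", ("BRCA2",)
).fetchall()
```

Copying symbol into officialname first and then overwriting mapped rows keeps the column fully populated, so a search on officialname still finds entries that have no gene name.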

Figure 3: Use the Lookup Settings to use the UniProt Gene Name Association Enhancement


We would like to hear from you. You can reach the team via email.

GoMiner was originally developed jointly by the Genomics and Bioinformatics Group (GBG) of LMP, NCI, NIH and the Medical Informatics and Bioimaging group of BME, Georgia Tech/Emory University. It is now maintained and under continuing development by GBG.

Notice and Disclaimer