| Application Build: 246 Database Build: 2008-04 |
GoMiner uses the databases provided by the GO Consortium. These databases combine information from a number of different consortium participants, include information from many different organisms and data sources, and are referenced using a variety of different gene product identification approaches. The current database version is 2008-04.
GoMiner and its supporting scripts include a number of different features to enhance and query the GO database to make it easier to integrate the results of user experiments. This page provides a summary of these enhancements and options. It also provides links to the more detailed descriptions and instructions for users to establish their own local copies of the GO database with these enhancements.
By default, the GoMiner GUI application queries information from all organisms and all data sources. For each gene in the input file, GoMiner searches every data source and organism in the GO Consortium database for matching entries, and then reports all of the gene category associations found for those matches. You may wish to restrict your search to particular organisms or to particular data sources. To do so, make your selections from the Organism and Data Source menus, respectively, before you read in the total genes file. Corresponding filters are available on the command-line interface.
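The filtering behavior described above can be illustrated with a minimal Python sketch. It is not GoMiner's code: the `Association` record, the `find_categories` function, and all gene, organism, data source, and category names are hypothetical placeholders. It only assumes that an unset filter means "all", matching the default described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """One gene-to-category association (hypothetical, simplified record)."""
    symbol: str
    organism: str
    data_source: str
    category: str


def find_categories(gene, associations, organisms=None, data_sources=None):
    """Return the sorted categories found for `gene`.

    A filter left as None means "no restriction", mirroring the default
    of searching all organisms and all data sources.
    """
    hits = set()
    for a in associations:
        if a.symbol != gene:
            continue
        if organisms is not None and a.organism not in organisms:
            continue
        if data_sources is not None and a.data_source not in data_sources:
            continue
        hits.add(a.category)
    return sorted(hits)


# Placeholder data, invented purely for illustration.
ASSOCIATIONS = [
    Association("GENE1", "organism A", "source X", "category 1"),
    Association("GENE1", "organism B", "source Y", "category 2"),
    Association("GENE1", "organism A", "source Y", "category 3"),
    Association("GENE2", "organism A", "source X", "category 1"),
]
```

With no filters, `find_categories("GENE1", ASSOCIATIONS)` reports the categories from every organism and data source. Passing `organisms={"organism A"}` or `data_sources={"source Y"}` narrows the matches in the same way the Organism and Data Source menus do.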
GoMiner can enhance the search for genes in the GO Consortium database in three ways: Enhanced Names (UniProt Only), Cross Reference, and Synonym. Choose the approach you want from the Lookup Settings menu, which is activated after you have loaded the GO terms (File menu). Corresponding filters are available on the command-line interface. You can use these lookup options alone, in combination, or not at all.
The UniProt (Swiss-Prot, TrEMBL and PIR) data source in the GO Consortium database does not include gene names. It uses UniProt entry names instead, and these entry names are tedious to associate with gene names. The Enhanced Names (UniProt Only) feature makes it easier to use gene names (including HUGO names) to query the data from UniProt. We created an enhancement that permits seamless use of gene names for the entries that are already present in the UniProt data source. For the human entries, these gene names are further filtered to include only those recognized by HUGO. The enhancement does not add any genes that are not already present in UniProt. We have a more detailed description of the algorithms used for this enhancement. In short, if you are using the UniProt data source and your input is in the form of gene names, you may want to consider using this option. It has no impact on the other data sources. The disadvantage of the Enhanced Names (UniProt Only) option is that it requires additional revisions to the database distributed by the GO Consortium. These enhancements are already applied to the database we make available on our discover server. However, if you want to install your own local copy of the database, you need to run additional scripts to implement this feature.
UniProt used to be known in GoMiner as SPTR. The current GoMiner application can filter by either UniProt or SPTR. The current GoMiner database primarily uses the UniProt flag, although a handful of entries still carry SPTR data source identifiers. If you are using an older build of the database and you want to select Swiss-Prot and TrEMBL data, use the SPTR option. If you are using our database, or have rebuilt a newer database from GO, use the UniProt flag.
In addition to the identifiers found in the symbol column of the gene_product table, the GO Consortium database also provides a cross-reference table with additional identifiers. Like the symbol column, this cross-reference table includes many different types of identifiers. A complete list of the available identifier types can be found on the GO site. When the Cross Reference option is selected, GoMiner will search both the symbol column and the cross-reference table for the genes you submit. This feature is available using the regular distribution of the GO Consortium database.
In addition to the identifiers found in the symbol column of the gene_product table, the GO Consortium database also provides a gene product synonym table with additional identifiers. Like the symbol column, this synonym table includes many different types of identifiers. When the Synonym option is selected, GoMiner will search both the symbol column and the synonym table for the genes you submit. This feature is available using the regular distribution of the GO Consortium database. This option is also useful if you are using yeast data with ORF identifiers.
We used to have an option called "Original." This style of lookup is still available if you leave all of the lookup options (Enhanced Names (UniProt Only), Cross Reference, and Synonym) unchecked.
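The way the Cross Reference and Synonym options widen a lookup can be sketched as follows, using an in-memory SQLite database. This is an illustrative assumption, not GoMiner's implementation or the real GO schema. Only the `gene_product` table and its `symbol` column are named on this page; the `cross_reference` and `synonym` table names, their columns, the `lookup` function, and all identifiers are hypothetical. The Enhanced Names option is left out because it depends on database revisions not described here.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene_product (id INTEGER PRIMARY KEY, symbol TEXT);
        CREATE TABLE cross_reference (gene_product_id INTEGER, xref_key TEXT);
        CREATE TABLE synonym (gene_product_id INTEGER, product_synonym TEXT);
        INSERT INTO gene_product VALUES (1, 'ENTRY1_HUMAN'), (2, 'GENE2');
        INSERT INTO cross_reference VALUES (1, 'XREF0001');
        INSERT INTO synonym VALUES (1, 'GENE1'), (2, 'ORF0002');
    """)
    return conn


def lookup(conn, identifier, cross_reference=False, synonym=False):
    """Return sorted gene product ids matching `identifier`.

    The symbol column is always searched. Each enabled option adds one
    more table to the search; with both options off, this corresponds
    to the "Original" style of lookup.
    """
    queries = ["SELECT id FROM gene_product WHERE symbol = ?"]
    if cross_reference:
        queries.append(
            "SELECT gene_product_id FROM cross_reference WHERE xref_key = ?")
    if synonym:
        queries.append(
            "SELECT gene_product_id FROM synonym WHERE product_synonym = ?")
    sql = " UNION ".join(queries)  # UNION also removes duplicate ids
    rows = conn.execute(sql, [identifier] * len(queries)).fetchall()
    return sorted(row[0] for row in rows)
```

In this sketch, `lookup(conn, "GENE1")` finds nothing because `GENE1` appears only in the synonym table, while `lookup(conn, "GENE1", synonym=True)` returns the matching gene product. The options are independent flags, so they can be combined freely.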
For a number of reasons, you may find it helpful to install your own local copy of the database. Most users find that doing so improves GoMiner's performance substantially over using the database we provide on our discover server. Some users are interested in updating their database more frequently than we update ours. Other users are interested in making their own customizations. We support local databases using either Derby or MySQL. This guide will help you choose the right database configuration. The MySQL installation is a bit more complex, and has additional documentation.
We would like to hear from you. You can reach the team via email.
GoMiner was originally developed jointly by the Genomics and Bioinformatics Group (GBG) of LMP, NCI, NIH and the Medical Informatics and Bioimaging group of BME, Georgia Tech/Emory University. It is now maintained and under continuing development by GBG.