GoMiner    Application Build: 457     Database Build: 2011-01     Genomics and Bioinformatics Group

High-Throughput GoMiner Process Overview

High-Throughput GoMiner carries out a number of processing steps. All of these (except as noted) are the same for both the command-line and the web interfaces. The following text and graphic describe this process flow.

  1. Configuration: In the command-line version, parameters for running the program are established by editing a configuration file. In the web version, the user selects the parameters from the web interface, and the appropriate configuration file is generated.
  2. Quality Assurance: The program checks the total- and changed-gene files for several types of errors, including errors in gene name formatting. We recently showed that Excel can inadvertently alter gene identifiers through its default date and floating-point format conversions. High-Throughput GoMiner protects the user from those errors by scanning the input identifiers for probable instances of such conversions. If a problem is detected, the command-line application exits with an error code, and the web application returns a web page describing the error.
  3. GoMiner Execution: The command-line interface of an instance of GoMiner is invoked to generate a gene-category export file that is used for internal processing. The total-gene file functions as both the total- and changed-gene input files.
  4. Random-Gene File Generation for Computing FDR: A set of random-gene files is generated by sampling the genes in the total-gene file. Each random-gene file contains the same number of genes as the changed-gene file. The random-gene files are used for computing the false-discovery rate (FDR).
  5. Mapping Genes to GO Categories: The genes in the changed- and random-gene files are joined with the entries in the gene-category export file generated in step 3. The Gene Ontology contains categories called "obsolete" and "unknown." To avoid introducing errors in subsequent statistical computations, we add a processing step in which genes are removed if they appear only in "obsolete" or "unknown" categories. The net effect is to expunge those genes from further consideration in both the total- and changed-gene files.
  6. Result Integration: The program generates reports that integrate the results computed from the multiple changed-gene files. These integrated reports include FDR estimates, files from which clustered image maps (CIMs) can be generated, and data from external resources, such as transcription factor information.
  7. Email Notification: In the web version, once the reports are generated, the user is sent an email message with a link from which to download the results.
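
Steps 2, 4, and 5 above can be illustrated with a minimal Python sketch. It is not the GoMiner implementation: the function names, the regular expressions used to spot Excel-style conversions, the choice of sampling without replacement, and the rule of recognizing "obsolete" and "unknown" categories by a substring in the category name are all assumptions made for illustration.

```python
import random
import re

# Assumed, illustrative patterns for identifiers that Excel may have converted.
# A date-like value such as "2-Sep" is what a symbol like SEPT2 can turn into;
# a value such as "2.31E+19" is what an all-digits-plus-E identifier can turn into.
_MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_LIKE = re.compile(rf"^\d{{1,2}}-(?:{_MONTHS})$", re.IGNORECASE)
FLOAT_LIKE = re.compile(r"^\d\.\d+E\+\d+$", re.IGNORECASE)


def find_suspect_identifiers(identifiers):
    """Step 2 (sketch): return identifiers that look like converted dates or floats."""
    return [g for g in identifiers if DATE_LIKE.match(g) or FLOAT_LIKE.match(g)]


def make_random_gene_sets(total_genes, changed_count, n_sets, seed=None):
    """Step 4 (sketch): draw n_sets random gene lists from the total-gene list.

    Each list has changed_count genes, matching the size of the changed-gene
    file. Sampling without replacement is an assumption of this sketch.
    """
    rng = random.Random(seed)
    pool = sorted(set(total_genes))  # de-duplicate and fix the order
    return [rng.sample(pool, changed_count) for _ in range(n_sets)]


def is_excluded_category(category):
    """Hypothetical test: treat names containing these words as excluded."""
    name = category.lower()
    return "obsolete" in name or "unknown" in name


def drop_uninformative_genes(gene_to_categories):
    """Step 5 (sketch): keep genes that appear in at least one other category.

    Genes mapped only to "obsolete" or "unknown" categories are removed.
    Genes with no categories at all are also dropped, which is an assumption.
    """
    return {
        gene
        for gene, categories in gene_to_categories.items()
        if any(not is_excluded_category(c) for c in categories)
    }


if __name__ == "__main__":
    print(find_suspect_identifiers(["TP53", "2-Sep", "2.31E+19", "BRCA1"]))
    print(make_random_gene_sets(["A", "B", "C", "D", "E"], 2, 3, seed=0))
    print(drop_uninformative_genes({
        "G1": {"apoptosis"},
        "G2": {"biological process unknown"},
    }))
```

In a pipeline like the one described, a non-empty result from the identifier check would correspond to the error exit in step 2, and the filtered gene set would be applied to both the total- and changed-gene lists before any statistics are computed.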

A Graphical View of the Process Flow

GoMiner™ is a development of the Genomics and Bioinformatics Group, Developmental Therapeutics Branch (DTB), Developmental Therapeutics Program (DTP), Center for Cancer Research (CCR), National Cancer Institute (NCI). We would like to hear from you. You can reach the team via email.

GoMiner was originally developed jointly by the Genomics and Bioinformatics Group (GBG) of LMP, NCI, NIH and the Medical Informatics and Bioimaging group of BME, Georgia Tech/Emory University. It is now maintained and under continuing development by GBG.
