LeFE Build:8 Genomics and Bioinformatics Group

Understanding Your Results

Overview

The output of a LeFEMiner analysis consists of a ranked table in which each row corresponds to a gene category. The results page also displays additional information about the statistics and characteristics of the categories. Here we provide a description of the contents of that results page in order to help you interpret the biological meaning of your LeFEMiner results.

Table Column Headers

  • Category Rank: The rank of the category according to its median category p-value.
  • Category.Name: The category's name.
  • Category.Size: The number of genes in your gene expression files that are also in the category.
  • Median Category p-Value: The median p-value of the category. This value is used solely for ranking the categories and should not be used as a metric of statistical significance. Typically, p-values < 0.04 indicate possible biological associations between the category and the biology represented by the uploaded signature file.
  • FDR: Because the category’s median p-value does not account for multiple comparisons, this column reports the statistical significance of finding each category. It is estimated by permuting the contents of the signature file 4 times and determining the rate at which the permuted data exceed each observed Median Category p-Value from the unpermuted data. See the original LeFE manuscript for additional details.
  • Original.Category.Order: The original custom category input order. It can be useful if you wish to reorder your custom input categories into the order in which they were uploaded.
  • Top 5 most Important Genes with Importance Scores: For every category, this column lists the top 5 most important genes and their importance values. It is useful for understanding exactly which genes may be the most important. Typically, importance scores over 0.75 are biologically relevant, but that can vary from experiment to experiment.
  • Importance.Score Distribution: LeFEMiner creates an Importance Plot for the top 100 categories. The Importance Plots allow the user to compare the distribution of category genes' importance scores with that of the negative control genes. Clicking a thumbnail graphic in this column opens the full, high-resolution image in a new window. More information about those graphics can be found in the Interpreting Your Importance Plots section below.

  • Note: This website will only display the top 100 results. A complete copy of your results can be accessed if you follow any of the links to the Results Archive on your results page.
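
The FDR column described above rests on a permutation comparison, which can be illustrated with a short Python sketch. This is not LeFEMiner's own code. It assumes one simple reading of the description: that a permuted value "exceeds" an observed one when its median p-value is at least as small. The function name `permutation_rates`, the category names, and all numbers are hypothetical; see the original LeFE manuscript for the actual procedure.

```python
from bisect import bisect_right


def permutation_rates(observed, permuted_runs):
    """Return, per category, the fraction of permuted median p-values that are
    less than or equal to that category's observed median p-value.

    Assumption (not taken from LeFEMiner): "exceeding" an observed value means
    a permuted median p-value that is at least as small.
    """
    # Pool the median p-values from every permutation of the signature.
    pooled = sorted(p for run in permuted_runs for p in run)
    if not pooled:
        raise ValueError("at least one permuted median p-value is required")
    # bisect_right counts how many pooled values are <= the observed value.
    return {
        name: bisect_right(pooled, p_value) / len(pooled)
        for name, p_value in observed.items()
    }


# Hypothetical inputs: two categories and four permutations of the signature.
observed = {"category_A": 0.01, "category_B": 0.30}
permuted_runs = [
    [0.20, 0.50],
    [0.005, 0.40],
    [0.35, 0.60],
    [0.25, 0.45],
]
rates = permutation_rates(observed, permuted_runs)
for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

A category whose observed median p-value is rarely matched by the permuted data receives a low rate, which is the intuition behind treating the FDR column, rather than the median p-value itself, as the measure of significance.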

Interpreting Your Importance Plots

Clicking on any one of the small thumbnails in the Importance Score Distribution column of the results table will open a new window with a high-resolution image of the corresponding category's importance plot. Importance Plots are an extremely valuable way of understanding the relative importance of each gene in a category.
Sample Importance Plot
Importance Plots consist of a set of red hash marks below two probability distribution functions (PDFs), one colored red and one colored black. The hash marks represent the median importance score for each of the category’s genes.
The PDFs are the relative probabilities of finding a gene with a particular importance score. In the above example of Myoblast Fusion (left), the red distribution (category genes) is significantly different in shape from the black distribution (negative control genes).
There are two humps in the monocarboxylic acid transport category’s red distribution. The left hump represents the category’s genes that are of roughly equal importance to the background genes. The right hump represents the component of that category that is significantly more important than the background genes. Distributions such as that of Myoblast Fusion could be interpreted to mean that LeFE identified this category as containing genes that are more important for the Random Forest model than the background genes. Likewise, biologically significant results are sometimes associated with a right-shifted, unimodal red distribution relative to the negative control distribution.
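
If you have the per-gene importance scores from your Results Archive, the visual comparison described above can be approximated numerically. The Python sketch below is only an illustration and is not part of LeFEMiner: it reports the fraction of a category's genes whose importance score lies above a high percentile of the negative control scores, which roughly corresponds to the size of a "right hump". The function name `fraction_above_control`, the 95th-percentile cutoff, and the scores are all assumptions chosen for the example.

```python
from statistics import quantiles


def fraction_above_control(category_scores, control_scores, percentile=95):
    """Return (fraction, threshold), where threshold is the given percentile of
    the negative control scores and fraction is the share of category genes
    whose importance score is strictly above that threshold.
    """
    if not category_scores:
        raise ValueError("category_scores must not be empty")
    # quantiles(..., n=100) returns the 99 percentile cut points of the data.
    cut_points = quantiles(control_scores, n=100, method="inclusive")
    threshold = cut_points[percentile - 1]
    above = sum(1 for score in category_scores if score > threshold)
    return above / len(category_scores), threshold


# Hypothetical median importance scores (one value per gene).
control_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
category_scores = [0.2, 0.3, 0.9, 1.1]

fraction, threshold = fraction_above_control(category_scores, control_scores)
print(f"threshold = {threshold:.2f}, fraction above = {fraction:.2f}")
```

A fraction near zero suggests that the category's genes look like the negative control genes, while a large fraction suggests a right hump or a right-shifted distribution. The plot itself remains the better guide to the shape of the distribution.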

LeFE™ is a development of the Genomics and Bioinformatics Group, Laboratory of Molecular Pharmacology (LMP), Center for Cancer Research (CCR), National Cancer Institute (NCI). Please email us with any problems, questions or feedback on the tool.
