LeFE Build:8 Genomics and Bioinformatics Group

Frequently Asked Questions

Running LeFE:

  1. Help, I have no idea what I’m doing. What can I do?
  2. What if I don’t want to use Gene Symbols or Affymetrix probe IDs as my gene identifiers?
  3. What if I don’t have the minimum number of samples to run LeFE?
  4. What version of Gene Ontologies does LeFE use?
  5. What if my gene expression file has multiple probes for some of the genes?
  6. What takes so long for LeFE to run?

After LeFE Has Run:

  1. Where are my results? I didn't get an email back from the LeFE website!
  2. Why do I get slightly different results when I run the same dataset through LeFE twice?
  3. How do I interpret my results when an ensemble gets a median p-value of zero?
  4. Can I use the median p-value as a representation of the statistical significance of finding an ensemble?
  5. How does LeFE compute an FDR?
  6. Why do some genes have importance scores less than zero?
  7. How do I cite LeFE?
  8. Wow, I love LeFE. Who do I have to thank?

Running LeFE:

Question: Help, I have no idea what I’m doing. What can I do?
Answer: Don’t worry, the LeFE web application is designed to make this tool as simple as possible. First read over the paper and the How It Works page. Next, take a stab at uploading data to the website and seeing what happens. Finally, if you’re still confused, email eichler@mail.nih.gov and we’ll help you get started using LeFE.

Question: What if I don’t want to use Gene Symbols or Affymetrix probe IDs as my gene identifiers?
Answer: LeFE maps genes to categories using gene symbols or Affymetrix probe IDs. If you cannot provide data in one of those formats, then you must create and upload custom categories that match the gene identifiers in your gene expression data file.

Question: What if I don’t have the minimum number of samples to run LeFE?
Answer: If you are running LeFE on a regression problem and have fewer than 15 samples, consider converting it to a classification problem: choose a threshold and segment your samples into classes based on their regression values.
If you are running LeFE on a classification problem and have fewer than 3 samples, you should probably try our lab’s web-based GOMiner application instead. In either case, LeFE performs best with as many samples as possible.
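
As a rough illustration of the thresholding idea, the Python sketch below splits samples into two classes at the median of their regression values. The function name, the sample values, and the choice of the median as the default threshold are assumptions made for this example only; LeFE does not prescribe a particular threshold, and you should pick one that makes biological sense for your data.

```python
from statistics import median

def to_classes(values, threshold=None):
    """Label each sample 'high' or 'low' relative to a threshold.

    If no threshold is given, the median of the values is used
    (an illustrative default, not a LeFE recommendation).
    """
    if threshold is None:
        threshold = median(values)
    return ["high" if v > threshold else "low" for v in values]

# Hypothetical regression values for 8 samples.
response = [0.2, 1.7, 0.9, 2.4, 0.1, 1.1, 3.0, 0.5]
labels = to_classes(response)
print(labels)
```

The resulting labels can then be supplied as the class assignment in place of the original continuous values.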

Question: What version of Gene Ontologies does LeFE use?
Answer: LeFE uses the GO and Annotate packages in R and Bioconductor 1.8. The GO annotations were downloaded from Entrez Gene on March 15, 2006.

Question: What if my gene expression file has multiple probes for some of the genes?
Answer: Your results will be biased! LeFE will not operate correctly when multiple probes exist for some of the genes. Therefore, it is highly recommended that you de-duplicate the repeated probes before running LeFE. The software will not do this for you, so it is your responsibility.
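
One possible way to de-duplicate probes before uploading is sketched below in Python. It keeps, for each gene, the single probe with the highest variance across samples. That selection rule, the function name, and the example data are assumptions for illustration; LeFE does not mandate a strategy, and averaging the probes for a gene is another common choice.

```python
from collections import defaultdict
from statistics import pvariance

def collapse_probes(rows):
    """Keep one probe per gene: the one with the highest variance across samples.

    rows: list of (gene_id, [expression values]) tuples, one per probe.
    Returns a dict mapping gene_id -> expression values of the kept probe.
    """
    by_gene = defaultdict(list)
    for gene, values in rows:
        by_gene[gene].append(values)
    # For each gene, pick the probe whose values vary most across samples.
    return {gene: max(probes, key=pvariance) for gene, probes in by_gene.items()}

# Hypothetical input: two probes for TP53, one for EGFR.
rows = [
    ("TP53", [5.0, 5.1, 4.9, 5.0]),
    ("TP53", [3.0, 6.0, 2.0, 7.0]),
    ("EGFR", [8.0, 8.5, 7.5, 9.0]),
]
unique = collapse_probes(rows)
print(unique)
```

After this step every gene identifier appears exactly once, which is the condition LeFE expects.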

Question: What takes so long for LeFE to run?
Answer: Unless you are unlucky and happen to be submitting jobs to LeFE simultaneously with many other users, you’re probably not waiting in our queues for very long. LeFE is processed on the supercomputing resources at the Advanced Biomedical Computing Center (ABCC), which dedicates 4 to 6 processors and 5 GB of RAM exclusively to your job. It takes 5-30 seconds to apply LeFE to each gene ensemble.

After LeFE Has Run:

Question: Where are my results? I didn't get an email back from the LeFE website!
Answer: LeFE can sometimes take up to 4 hours to complete a job, especially if we're busy. If after that long you still have not received a response email, one of two events may have occurred. First, it's possible that we did send you an email but it was mistaken for junk email by your email client software. We have observed that emails from our webserver are occasionally treated as junk mail by Microsoft Outlook. Second, it is possible that our web application encountered an error and never emailed you. If you don't have an email from us in your junk mail directory, and you haven't heard back from the web server in 4 hours, please resubmit your job or contact us for support. We're sorry for the inconvenience.

Question: Why do I get slightly different results when I run the same dataset through LeFE twice?
Answer: LeFE is a stochastic algorithm that has several random components. We’ve carefully configured LeFE’s internal parameters to minimize the inter-run differences in results. Unfortunately, some small changes in results may still occur. Although two runs of LeFE may disagree slightly, both are technically correct, so you may choose which run to report.

Question: How do I interpret my results when a category gets a median p-value of zero?
Answer: The p-value is computed using a permutation t-test with 1000 iterations. Therefore, a median p-value of 0 is technically p<0.001. However, that number is not meant to reflect the statistical significance of the result and it should not be used as such. See the next question and answer for more information.
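
To see why a reported 0 really means p<0.001, consider the generic two-group permutation t-test sketched below in Python. This is not LeFE's internal code; the function names, the use of Welch's t statistic, and the example data are assumptions for illustration. The point is only that with 1000 iterations the smallest nonzero value the estimate can take is 1/1000, so a result of 0 means no permutation matched the observed statistic.

```python
import random
from statistics import mean, variance

def t_stat(a, b):
    """Welch's t statistic for two samples."""
    return (mean(a) - mean(b)) / ((variance(a) / len(a) + variance(b) / len(b)) ** 0.5)

def permutation_p(a, b, iterations=1000, seed=0):
    """Fraction of label shuffles whose |t| is at least the observed |t|."""
    rng = random.Random(seed)
    observed = abs(t_stat(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(pooled)  # reassign group labels at random
        if abs(t_stat(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    return hits / iterations  # always a multiple of 1/iterations

# Hypothetical, strongly separated groups: the estimate is at or near 0.
group_a = [9.1, 9.4, 8.8, 9.6, 9.0, 9.3, 8.9, 9.5]
group_b = [1.2, 0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.4]
p_separated = permutation_p(group_a, group_b)

# Hypothetical, overlapping groups: the estimate is large.
p_overlap = permutation_p([1, 2, 3, 4, 5, 6], [1.5, 2.5, 3.5, 4.5, 5.5, 6.5])
print(p_separated, p_overlap)
```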

Question: Can I use the median p-value as a representation of the statistical significance of finding a category?
Answer: No! The median p-value is solely intended to be used for ranking the categories according to their biological association. It is not meant to quantify the statistical significance of the results and is in fact incorrect if used that way. The p-value is computed by using a null distribution that cannot provide an unbiased estimate of the category’s statistical significance.

Question: How does LeFE compute an FDR?
Answer: Running LeFE requires a lot of computing power, as is evident from the long processing times. Therefore, we have developed a fast-running version of LeFE that first runs all gene categories through the algorithm quickly and then refines the statistics on the highest-scoring (lowest p-value) categories. This faster version allows us to run 4 FDR iterations, and we find that this method provides suitably stable FDR estimates.

Question: Why do some genes have importance scores less than zero?
Answer: This is one of those quirky behaviors of random forests. It can arise when a gene of little or no importance to the model is permuted and by chance leads to an improvement in model accuracy. Negative gene importance scores should be treated as if they were 0. The magnitude of the negative importance scores has no meaning.
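
If you post-process your results yourself, treating negative scores as 0 can be done with a one-line clip, as in the Python sketch below. The gene names, score values, and function name are hypothetical and serve only to illustrate the rule stated above.

```python
def clip_importances(scores):
    """Replace negative importance scores with 0, leaving others unchanged."""
    return {gene: max(0.0, score) for gene, score in scores.items()}

# Hypothetical importance scores for four genes.
scores = {"GENE_A": 0.042, "GENE_B": -0.003, "GENE_C": 0.0, "GENE_D": 0.017}
clipped = clip_importances(scores)
print(clipped)
```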

Question: How do I cite LeFE?
Answer: LeFE's citation is as follows: Eichler GS, Reimers M, Kane D, Weinstein JN, "The LeFE algorithm: embracing the complexity of gene expression in the interpretation of microarray data.", Genome Biology, 2007 Sep 10;8(9):R187

Question: Wow, I love LeFE. Who do I have to thank?
Answer: LeFE was designed and developed at the NCI's Genomics and Bioinformatics Group by Gabriel Eichler, Ph.D., with the help of Mark Reimers, Ph.D. The website and front end of the LeFEMiner application were created by David Kane and Sohana Chowdhury. The website's back end was built by Gabriel Eichler. All of this work was done under the guidance of the project's PI, John Weinstein, M.D., Ph.D.


LeFE™ is a development of the Genomics and Bioinformatics Group, Laboratory of Molecular Pharmacology (LMP), Center for Cancer Research (CCR), National Cancer Institute (NCI). Please email us with any problems, questions or feedback on the tool.
