How LeFE Works

LeFE is applied iteratively to each gene category. This picture depicts a toy example that demonstrates how LeFE would score the strength of association between a set of five microarray experiments and a single category with two genes. In reality, the experiment would contain more microarrays, and this process would be repeated for every category.
C: A subset of genes not in the currently analyzed category is selected to serve as negative control genes. The number of negative control genes selected is proportional to the number of genes in the category.
D: The vector of signature values (orange) and a composite matrix, consisting of the category’s genes and the negative control genes, are input into a random forest machine learning algorithm.
E: The random forest is trained to learn the signature vector and assesses the importance of each gene to its trained model. The random forest’s multivariate models consider each gene’s importance within its biological context.
F: The result of training the random forest is a set of gene importance scores, one for every gene input into the random forest.
G: A non-parametric permutation t-test is used to determine whether the genes in the category were deemed more important to the random forest models than the negative control genes.
To ensure convergence of the algorithm, steps C through G are repeated multiple times.
H: The result of the process, run on a single gene category, is (i) the median importance score of every gene in the category, (ii) the category’s median permutation t-test p-value and (iii) an importance plot that compares the distribution of importance scores for the genes in the category and all of the different negative control genes.

Last Updated: August 16, 2007
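
Steps C and G above can be illustrated with a minimal Python sketch. It is not LeFE's actual implementation. The function names, the `ratio` parameter, the Welch-style t statistic and the one-sided p-value formula are all assumptions made for illustration, since the description above does not specify them. The random forest of steps D through F is left out, so the importance scores are simply taken as inputs.

```python
import random
import statistics


def sample_negative_controls(all_genes, category, ratio, rng):
    """Step C: draw negative control genes from outside the category.

    `ratio` (controls per category gene) is a hypothetical parameter; the
    text only says the count is proportional to the category size.
    """
    category_set = set(category)
    pool = [g for g in all_genes if g not in category_set]
    n_controls = min(len(pool), ratio * len(category))
    return rng.sample(pool, n_controls)


def t_statistic(a, b):
    """Welch-style t statistic for mean(a) - mean(b) (an assumed choice).

    Both groups need at least two values so a variance can be computed.
    """
    diff = statistics.mean(a) - statistics.mean(b)
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    if se2 == 0:
        if diff == 0:
            return 0.0
        return float("inf") if diff > 0 else float("-inf")
    return diff / se2 ** 0.5


def permutation_t_test(category_scores, control_scores, n_perm, rng):
    """Step G: one-sided permutation p-value.

    Tests whether category genes have higher importance scores than the
    negative control genes by shuffling the group labels `n_perm` times.
    """
    observed = t_statistic(category_scores, control_scores)
    pooled = list(category_scores) + list(control_scores)
    k = len(category_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if t_statistic(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

In a full run, these two functions would be called once per repeat of steps C through G, with the importance scores coming from a freshly trained random forest each time. The category's reported p-value in step H would then be `statistics.median` of the per-repeat p-values.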
LeFE™ is a development of the Genomics and Bioinformatics Group, Laboratory of Molecular Pharmacology (LMP), Center for Cancer Research (CCR), National Cancer Institute (NCI). Please email us with any problems, questions or feedback on the tool.