LeFE Build:8 Genomics and Bioinformatics Group

Signature File Format

The Signature file is used in Step 1 of the LeFE web interface. The signature file is extremely simple.

    This file should contain one line giving the class or continuous value associated with each sample. Each value is alphanumeric, and values are separated by spaces. The sample annotations are assumed to be in the same order as the columns of the gene expression data uploaded in step 3.
    • A continuous valued example is: 1.2 1.4 1.5 2.0 2.1 1.9 1.3 1.4 6.2 2.4
    • A two class example is: Mut Wt Wt Wt Wt Wt Mut Mut Mut Mut Mut Mut Wt Mut
    • A three class example is: A B A C C A B C C C A A C C B A B A B C C
    Note: Do not use quotes. The label for a class must be written the same way for every sample in that class, or the file will not be interpreted correctly.

A sample signature file is: p53.cls
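The signature format described above can be checked before upload with a few lines of Python. This is only a sketch: the function name `parse_signature` is hypothetical, and the rule it uses to tell continuous values from class labels (every value parses as a number) is an assumption, not necessarily how LeFE itself decides.

```python
def parse_signature(line):
    """Split a one-line signature into one value per sample.

    Returns ("continuous", [floats]) when every value parses as a number,
    otherwise ("class", [labels]). This rule is an assumption made for
    illustration; numeric class labels such as "1 2 1" would be reported
    as continuous.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("signature line is empty")
    if any('"' in t or "'" in t for t in tokens):
        raise ValueError("quotes are not allowed in a signature file")
    try:
        return "continuous", [float(t) for t in tokens]
    except ValueError:
        return "class", tokens


# The two-class example from the text above.
kind, values = parse_signature("Mut Wt Wt Wt Wt Wt Mut Mut Mut Mut Mut Mut Wt Mut")
print(kind, len(values), sorted(set(values)))
```

Printing the distinct labels is a quick way to spot a class that has been spelled two different ways.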

Gene Expression File Format

The Gene Expression File is used in Step 3 of the LeFE web interface. It is a tab-delimited file format that contains gene expression values:

  • The first line starts with the word "SAMPLES", followed by a tab-delimited list of sample names. The first line is ignored by the web application, but may help users ensure proper ordering of the columns.
    • Line format: SAMPLES(tab)(sample 1 name)(tab)(sample 2 name)(tab)(sample 3 name)...(tab)(sample N name)
  • The remainder of the data file contains data for each gene. There is one line for each gene and one column for each of the samples. The gene identifiers must be either Affy hgu133A/hgu95A probe IDs or gene symbols. The gene identifier type is set later, in step 4 of the web application.
    • Line format: (gene name)(tab)(col 1 data)(tab)(col 2 data)(tab) ... (col N data)
    • For example: ABCB1 -104 -152 -158 ... -44
  • Note: Expert users may choose to upload a gene expression matrix with alternative gene identifiers. In that case they must use the 'Custom Categories' feature in step 5, and the file specifying the custom categories must be defined on the same set of custom gene identifiers.

    Sample Gene Expression file: p53.gex
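The layout above can be read with a short Python sketch. The function name `read_expression`, the sample names, and the TP53 row in the demo data are made up for illustration; only the ABCB1 values come from the example above. The check on the word "SAMPLES" is stricter than the web application, which ignores the first line.

```python
import io


def read_expression(handle, n_samples=None):
    """Parse a tab-delimited expression file into (sample_names, {gene: values}).

    If n_samples is given (for example, the number of values in the
    signature file), every gene row must have exactly that many columns.
    A gene identifier that appears twice keeps only its last row.
    """
    header = handle.readline().rstrip("\r\n").split("\t")
    if header[0] != "SAMPLES":
        raise ValueError('first line must start with "SAMPLES"')
    sample_names = header[1:]

    genes = {}
    for line_no, line in enumerate(handle, start=2):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        gene, values = fields[0], [float(v) for v in fields[1:]]
        if n_samples is not None and len(values) != n_samples:
            raise ValueError(
                f"line {line_no}: expected {n_samples} values, got {len(values)}"
            )
        genes[gene] = values
    return sample_names, genes


# Made-up three-sample file; in practice, open the real file instead.
demo = io.StringIO(
    "SAMPLES\ts1\ts2\ts3\n"
    "ABCB1\t-104\t-152\t-158\n"
    "TP53\t12.5\t40\t7\n"
)
names, genes = read_expression(demo, n_samples=3)
print(names, genes["ABCB1"])
```

Passing the signature length as `n_samples` catches the most common mismatch, a signature whose number of values differs from the number of expression columns.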

Custom Categories File Format

    The Custom Categories File format is optionally used in Step 5 of the LeFE web application. The file is tab-delimited and is used to describe customized sets of genes.

    The gene identifiers in the custom categories must match the gene identifiers in your gene expression file.


    The file contains a row for each gene set:
    • Line format: (gene set name)(tab)(gene 1)(tab)(gene 2)(tab) ... (gene N)
      For example: cell_cycle_arrest EIF4G2 GAS7 CUL5 MAP2K6 CUL4A ... IFNW1
    • The first column contains the gene set name. Duplicate names are not allowed.
    • The remaining columns list the genes in the gene set.

    Sample Custom Categories file: export_gnf.GENE_SYMBOL.gmt

    The MSigDB .gmt files can be uploaded as custom categories as long as the gene expression data set uploaded in step 3 is annotated with gene symbols.
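A custom categories file in the row layout described above can be checked with the following Python sketch. The name `read_categories` and the second demo row are hypothetical, and the first demo row is a shortened version of the example above. Reporting genes that are absent from the expression data, rather than rejecting the file, is an assumption about what is useful; it is not documented LeFE behavior.

```python
def read_categories(lines, known_genes=None):
    """Parse tab-delimited gene sets into {set_name: [genes]}.

    Raises ValueError on a duplicate gene set name. If known_genes is
    given (for example, the identifiers in the expression file), also
    returns {set_name: [genes not found in known_genes]}.
    """
    categories = {}
    unmatched = {}
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        name, *genes = line.split("\t")
        if name in categories:
            raise ValueError(f"line {line_no}: duplicate gene set name {name!r}")
        categories[name] = genes
        if known_genes is not None:
            missing = [g for g in genes if g not in known_genes]
            if missing:
                unmatched[name] = missing
    return categories, unmatched


# Made-up input; the second gene set is purely illustrative.
demo = [
    "cell_cycle_arrest\tEIF4G2\tGAS7\tCUL5\n",
    "made_up_set\tABCB1\tTP53\n",
]
categories, unmatched = read_categories(
    demo, known_genes={"EIF4G2", "GAS7", "ABCB1", "TP53"}
)
print(unmatched)
```

A gene set in which most or all genes are unmatched usually means the categories file and the expression file use different identifier types.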



    LeFE™ is a development of the Genomics and Bioinformatics Group, Laboratory of Molecular Pharmacology (LMP), Center for Cancer Research (CCR), National Cancer Institute (NCI). Please email us with any problems, questions or feedback on the tool.
