CIMMiner Help

Basic Steps

Prepare Dataset (One Matrix)

The input file must be a tab-delimited text file (saved with a .txt extension).
An example is shown below:

File Name: mydata.txt

         column 1   column 2   column 3
row 1    -2         4          5
row 2    na         2          1
row 3    4          6          -

The file contains the data together with row names and column names. The first column must hold the row names and the first row must hold the column names. The value in the cell in the first row and first column (left blank in the above example) will be ignored.

Missing values are accepted and should be indicated by 'na' (row 2/column 1 in the above example) or a hyphen (row 3/column 3). Empty cells are also accepted.

We use a period (full stop) as a decimal point. Using a comma will result in errors, since we use commas as list separators. For the same reason, please do not use commas to separate digits in large numbers. For example, numbers should be written as "123456.78" not "123456,78" or "123,456.78" or "123,456,78".

With fewer than 3 rows or fewer than 3 columns, the clustering algorithm cannot provide any useful information.
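The format rules above can be illustrated with a short Python sketch. It is not CIMMiner's own parser; the function name read_matrix and the exact handling of blank lines and short rows are assumptions made for illustration. It treats 'na', a lone hyphen and an empty cell as missing, ignores the top-left cell, and lets a number written with commas fail to parse.

```python
import csv
import io

# Markers the help text lists for missing values: 'na', a hyphen, or an empty cell.
MISSING = {"na", "-", ""}


def read_matrix(handle):
    """Read a tab-delimited matrix; return (column_names, row_names, rows)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    column_names = header[1:]  # the top-left cell is ignored
    row_names, rows = [], []
    for record in reader:
        if not record:
            continue  # assumption: blank lines are skipped
        row_names.append(record[0])
        cells = record[1:]
        # assumption: a short record means its trailing cells are empty
        cells += [""] * (len(column_names) - len(cells))
        # float() raises ValueError for values such as "123,456.78"
        rows.append([None if c.strip() in MISSING else float(c) for c in cells])
    return column_names, row_names, rows


sample = (
    "\tcolumn 1\tcolumn 2\tcolumn 3\n"
    "row 1\t-2\t4\t5\n"
    "row 2\tna\t2\t1\n"
    "row 3\t4\t6\t-\n"
)
columns, names, data = read_matrix(io.StringIO(sample))
print(data)
# [[-2.0, 4.0, 5.0], [None, 2.0, 1.0], [4.0, 6.0, None]]
```

Missing values are represented as None here so that later steps can decide how to treat them.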

Prepare Dataset (Two Matrix)

Two matrices are used as input in this case, one NxP and the other PxM. From these, a third matrix (the product matrix) of size NxM is created, where element (i,j) is the correlation between the ith row of the first matrix and the jth column of the second matrix. A CIM is then created for the product matrix, colored according to the values of the elements in its rows and columns.
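As a rough illustration of how such a product matrix can be built, the Python sketch below correlates every row of the first matrix with every column of the second. The help text does not say which correlation CIMMiner uses or how missing values enter the calculation, so the Pearson correlation and the skipping of incomplete pairs are both assumptions, as are the function names.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (None marks a missing value)."""
    # assumption: pairs with a missing value on either side are skipped
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


def product_matrix(first, second):
    """first is N x P, second is P x M; the result is N x M."""
    if any(len(row) != len(second) for row in first):
        raise ValueError("columns of the first matrix must match rows of the second")
    columns = list(zip(*second))  # the M columns of the second matrix
    return [[pearson(row, col) for col in columns] for row in first]


first = [[1, 2, 3],
         [3, 2, 1]]          # 2 x 3
second = [[1, 6],
          [2, 4],
          [3, 2]]            # 3 x 2
result = product_matrix(first, second)
print(result)
# [[1.0, -1.0], [-1.0, 1.0]]
```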

The rows and/or columns of the product matrix can be clustered to bring out patterns, but here the clustering is done based on the rows of the first input matrix and the columns of the second input matrix. The rows are reordered by clustering the rows of the first input matrix and this reordering is used for the product matrix. Similarly the clustering of the columns of the second input matrix gives the reordering of the columns of the product matrix.
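The reordering step can be pictured with a small Python sketch. The two orderings below are hypothetical stand-ins for the results of clustering the rows of the first input matrix and the columns of the second input matrix; the clustering itself is not shown, and the function name reorder is an assumption.

```python
def reorder(product, row_order, column_order):
    """Apply externally computed row and column orderings to the product matrix."""
    return [[product[i][j] for j in column_order] for i in row_order]


product = [[0.1, 0.2, 0.3],
           [0.4, 0.5, 0.6]]
row_order = [1, 0]        # hypothetical order from clustering rows of the first matrix
column_order = [2, 0, 1]  # hypothetical order from clustering columns of the second matrix
reordered = reorder(product, row_order, column_order)
print(reordered)
# [[0.6, 0.4, 0.5], [0.3, 0.1, 0.2]]
```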

The Two Matrix algorithm takes two files as input: the row data file and the column data file. The columns of the first data file must match the rows of the second data file in both number and order. If the first data file has N rows and P columns, the second data file must have P rows. In the example below, the columns of the first data file match the rows of the second data file in number (3) and order.

First data file

         column 1   column 2   column 3
row 1    -2         4          5
row 2    na         2          1
row 3    4          6          -
row 4    5.3        3.4        -2.3

Second data file

         column 1   column 2   column 3   column 4   column 5
row 1    -1.2       3.8        5.1        4.6        5.6
row 2    -3.4       na         6.7        1.4        2.6
row 3    -4.3       3.4        -          3.9        5

The input files must be tab-delimited text files (saved with a .txt extension). The files contain the data together with row names and column names. The first column must hold the row names and the first row must hold the column names. The value in the cell in the first row and first column (left blank in the above examples) will be ignored.

Missing values are accepted and should be indicated by 'na' (row 2/column 1 in the above example) or a hyphen (row 3/column 3). Empty cells are also accepted.

We use a period (full stop) as a decimal point. Using a comma will result in errors, since we use commas as list separators. For the same reason, please do not use commas to separate digits in large numbers. For example, numbers should be written as "123456.78" not "123456,78" or "123,456.78" or "123,456,78".

With fewer than 3 rows or fewer than 3 columns, the clustering algorithm cannot provide any useful information.

Default values

For ease of use, the system preselects the most common user choices as the default values for the order choice, distance method, cluster algorithm and binning algorithm. The default is to cluster both rows and columns using the Euclidean distance method and the average linkage cluster algorithm. The default binning method is equal width. To change these options, click the advanced options radio button and the options will appear. The various choices are explained in more detail below.

Selecting appropriate options

The order choice determines the order in which the output appears. If you want similar data to be grouped together, choose "Cluster". To have the computer order your data randomly, choose "Randomize". To have your results appear in the order specified in your original file, select "No cluster". You must specify the order for each axis.

If you select "Cluster" as the order choice, you must also select a cluster algorithm and a distance method. Otherwise, skip this section.

The distance method defines how the dissimilarity between two data vectors is measured.
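As an example of a distance method, the Python sketch below computes the Euclidean distance (the default) between two data vectors. It assumes both vectors are complete; how CIMMiner treats missing values inside a distance calculation is not described here, and the function name is illustrative.

```python
import math


def euclidean_distance(u, v):
    """Square root of the summed squared differences between two vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


print(euclidean_distance([0, 0], [3, 4]))
# 5.0
```

A distance of 0 means the two vectors are identical; larger values mean the vectors are less alike.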

The cluster algorithm specifies the linkage method used by the hierarchical clustering algorithm to determine the distance between cluster groups.
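For instance, average linkage (the default) takes the distance between two cluster groups to be the mean of all pairwise distances between their members. The Python sketch below shows that definition with a Euclidean distance; it illustrates the linkage rule only, not CIMMiner's full clustering procedure, and the function names are assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(cluster_a, cluster_b):
    """Mean of the pairwise distances between members of two clusters."""
    distances = [euclidean(u, v) for u in cluster_a for v in cluster_b]
    return sum(distances) / len(distances)


cluster_a = [[0, 0], [6, 0]]
cluster_b = [[0, 8]]
# pairwise distances are 8.0 and 10.0, so the average is 9.0
print(average_linkage(cluster_a, cluster_b))
# 9.0
```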

The distance method and cluster algorithm can be chosen separately for each axis.

The binning method specifies how the data values are mapped to colors for displaying the CIM.
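To give an idea of what equal width binning (the default) means, the Python sketch below splits the range of the data into intervals of the same width and assigns each value the index of its interval; each index would then correspond to one color. The number of bins, the treatment of the maximum value and the handling of missing values are assumptions made for illustration, not a description of CIMMiner's internals.

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index in 0..n_bins-1; None stays None."""
    present = [v for v in values if v is not None]
    low, high = min(present), max(present)
    width = (high - low) / n_bins
    bins = []
    for v in values:
        if v is None:
            bins.append(None)   # assumption: missing values get no bin
        elif width == 0:
            bins.append(0)      # all values identical: a single bin
        else:
            # the maximum value falls into the last bin rather than a new one
            bins.append(min(int((v - low) / width), n_bins - 1))
    return bins


binned = equal_width_bins([-2, 4, 5, None, 2, 1, 4, 6], 4)
print(binned)
# [0, 3, 3, None, 2, 1, 3, 3]
```

Here the range from -2 to 6 is split into four bins of width 2, so a value of 1 lands in the second bin (index 1).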

Understanding the result

The result has four frames. The left frame contains a list of the X axis elements, in the order that they appear on the X axis of the image (from left to right). The right frame contains a list of the Y axis elements, in the order that they appear on the Y axis of the image (from top to bottom). These two frames also contain links to display a separate image of the merge height plot. The main frame, in the middle, contains your input file name(s), a link to download a data file containing the raw data used to create the image, the image itself and a row of buttons that allow you to select various ways to update your image. The image is a GIF file.

You may reformat the image by clicking the "Color", "Binning", "Zoom", "Axes" or "Page Layout" button. Each button opens a new section of the page where you can select options for that category.