CellMinerCDB is an interactive web application that simplifies access to and exploration of cancer cell line pharmacogenomic data across different sources. The current version is dedicated to Small Cell Lung Cancer (SCLC) cell lines (see the Metadata section for more details). Navigation is through the main menu tabs (see figure below). There are 6 tabs: Univariate Analyses, Multivariate Analysis, Metadata, Search, Help and Video tutorial. Univariate Analyses is selected by default when entering the site. Each tab includes a side bar menu (to choose inputs) and an output panel that displays results. Analysis options are available in a sub-menu at the top of both the Univariate Analyses and Multivariate Analysis (regression model) tabs (see the sub-menu in the figure). The result of the first sub-menu option is displayed by default (Figure 1).
Figure 1: Main application interface
Molecular and/or drug response patterns across sets of cell lines can be compared to look for possible associations. The univariate analysis panel includes 4 options: Plot Data, Download Data, Compare Patterns and Tissue Correlation. Almost all options share the same input selections in the left side panel.
Any pair of features from different sources across common cell lines can be plotted (as a scatterplot) including the resultant Pearson correlation and p-value. The p-value estimates assume multivariate normal data, and are less reliable as the data deviate from this. Please use the scatter plot to check the data distribution (e.g., for outlying points outside of a more elliptically concentrated set).
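The sensitivity of the Pearson correlation to outlying points can be illustrated with a minimal Python sketch. This is not the application's own code: the function names `pearson_r` and `t_statistic` are hypothetical, the numbers are made up for illustration, and the t statistic shown is the standard one for testing zero correlation under bivariate normality, which is assumed here to correspond to the reported p-value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: zero correlation (Student's t, n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Made-up values for illustration only (not CellMinerCDB data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0]   # essentially flat: weak correlation
r_clean = pearson_r(x, y)

# The same data plus one outlying point far from the main cluster.
x_out = x + [20.0]
y_out = y + [9.0]
r_outlier = pearson_r(x_out, y_out)
t_outlier = t_statistic(r_outlier, len(x_out))

print(f"without outlier: r = {r_clean:.3f}")
print(f"with outlier:    r = {r_outlier:.3f}, t = {t_outlier:.2f}")
```

A single distant point turns a weak correlation into a strong one here, which is why a small p-value should be read alongside the scatterplot rather than on its own.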
Several options for manipulating the plot image are available through the icons at the top of the plot. From left to right:
Downloads the plot as a PNG.
Allows the user to zoom in on an area of interest by clicking and dragging with the pointer.
Autoscales the image.
Allows the user to create horizontal and vertical lines from either a cell line dot or the regression line, by hovering over them.
Figure 2: An example scatterplot of SLFN11 gene expression (x-axis) versus Topotecan drug activity (y-axis), both from the NCI/DTP SCLC. Since Topotecan has 2 different drug IDs in the NCI/DTP SCLC, the one with the fewest missing values is selected (here 609699). However, the user can type in a specific drug ID of interest. The Pearson correlation value and p-value appear at the top of the plot. A linear fit is included. This is an interactive plot: whenever the user changes any input value, the plot is updated. Hovering over any point in the plot provides additional information about the cell line, tissue, OncoTree designation, and x and y coordinate values. Here the points (cell lines) are colored according to their NAPY status.
This option displays the data selected in the Plot Data tab in tabular form and provides a "Download selected x and y axis data as Tab-Delimited File" option. The user can change the input data in the left selection panel as described for Plot Data. The displayed table includes the cell line, the x-axis value, the y-axis value, the tissue of origin, the 4 OncoTree levels, the NAPY status and the Neuro-Endocrine score. In the header, the selected features are prefixed by the data type abbreviation and suffixed by the data source.
Figure 3: The selected values for SLFN11 gene expression (x-axis) and Topotecan (ID 609699) drug activity (y-axis) from the NCI/DTP SCLC across all common cell lines. The features are coded as expSLFN11_nciSclc and act609699_nciSclc, where the prefixes “exp” and “act” denote gene expression (based on z-score) and drug activity, respectively.
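The header naming convention (data type prefix, identifier, then data source after an underscore) can be unpacked with a few lines of Python when working with a downloaded table. The sketch below is illustrative only: `parse_feature_label` and `KNOWN_PREFIXES` are hypothetical names, only the two prefixes mentioned above are covered, and it assumes the source suffix follows the last underscore in the label.

```python
# Only the two prefixes named in the text are listed here; other data types
# would need their own entries (assumption: the full prefix list is not shown).
KNOWN_PREFIXES = {
    "exp": "gene expression (z-score)",
    "act": "drug activity",
}

def parse_feature_label(label):
    """Split a label such as 'expSLFN11_nciSclc' into (prefix, identifier, source)."""
    # Assumption: the data source is everything after the last underscore.
    body, sep, source = label.rpartition("_")
    if not sep or not body or not source:
        raise ValueError(f"no source suffix in {label!r}")
    for prefix in KNOWN_PREFIXES:
        if body.startswith(prefix) and len(body) > len(prefix):
            return prefix, body[len(prefix):], source
    raise ValueError(f"unknown data type prefix in {label!r}")

for label in ("expSLFN11_nciSclc", "act609699_nciSclc"):
    prefix, identifier, source = parse_feature_label(label)
    print(f"{label}: {KNOWN_PREFIXES[prefix]}, id={identifier}, source={source}")
```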
This option computes the correlation between the selected feature (defined by the specified Cell Line Set, Data Type, and Identifier from either the x-axis or y-axis selection) and either all drug or all molecular data from the same source. The “Cross cell line sets” button gives the user the option to compare against all drug or molecular data from the other source.
Pearson’s correlations are reported in tabular form, with p-values that are not adjusted for multiple comparisons. The displayed features are ordered by level of correlation, and the table includes the target pathway for genes and the mechanism of action (MOA) for drugs (if available).
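The idea behind this comparison, correlating one query feature against many candidate features and ordering the results, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the application's implementation: `compare_patterns` and `bonferroni` are hypothetical names, the feature values are made up, descending order is assumed, and the Bonferroni step is just one simple way a reader might adjust the unadjusted p-values themselves.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def compare_patterns(query, candidates):
    """Return (name, r) pairs ordered from highest to lowest correlation with query."""
    results = [(name, pearson_r(query, values)) for name, values in candidates.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)

def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Made-up values across five cell lines, for illustration only.
query = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "featA": [2.0, 4.0, 6.0, 8.0, 10.0],
    "featB": [5.0, 4.0, 3.0, 2.0, 1.0],
    "featC": [1.0, 3.0, 2.0, 5.0, 4.0],
}
ranked = compare_patterns(query, candidates)
for name, r in ranked:
    print(f"{name}: r = {r:.2f}")

# Hypothetical unadjusted p-values, adjusted for three comparisons.
adjusted = bonferroni([0.01, 0.04, 0.5])
print(adjusted)
```

Because every candidate feature is tested against the same query, the number of comparisons can be large, so some adjustment of this kind is worth considering before treating any single p-value in the table as significant.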