SclcCellMinerCDB enables exploration and analysis of small cell lung cancer (SCLC) cell line pharmacogenomic data across different sources. If publishing results based on this site, please cite: Tlemsani C, Pongor L, Elloumi F, et al. Cell Rep. 2020 Oct 20;33(3):108296.

CellMinerCDB is an interactive web application that simplifies access to and exploration of cancer cell line pharmacogenomic data across different sources. The current version is dedicated to Small Cell Lung Cancer (SCLC) cell lines (see the Metadata section for more details). Navigation in the application uses the main menu tabs (see figure below). There are 6 tabs: Univariate Analyses, Multivariate Analysis, Metadata, Search, Help and Video tutorial. Univariate Analyses is selected by default when entering the site. Each tab includes a side bar menu (to choose input) and an output area to display results. Analysis options are available at the top of both the Univariate Analyses and Regression model tabs (see the sub-menu in the figure). The result of the first sub-menu option is displayed by default (Figure 1).

Screenshot of CellMinerCDB Application

Figure 1: Main application interface

Univariate Analyses

Molecular and/or drug response patterns across sets of cell lines can be compared to look for possible associations. The univariate analysis panel includes 4 options: Plot Data, View Data, Compare Patterns and Tissue Correlation. Almost all options share the same inputs in the left side panel.

Input data

  1. The x-axis data choices include 4 fields to be filled in by the user:
    • x-Axis Cell Line Set selects the data source. The user can choose: NCI/DTP SCLC, CCLE, GDSC, CTRP or UTSW (see Data Sources for more details).
    • x-Axis Data Type selects the data type to query. The options vary depending on the source selected above and appear in the x-Axis Data Type dropdown. See the Metadata tab for descriptions and abbreviations.
    • Identifier selects the identifier of interest for the data type selected above. For instance, if drug activity for the NCI/DTP SCLC is selected, the user can enter a single drug name, a drug ID (NSC number) or a paired drug ID (NSC1_NSC2). The Search IDs tab can be used to explore potential identifiers interactively or to download datasets of interest.
    • x-Axis Range allows the user to control the x-axis range for better visualization.

  2. The y-axis data choices are as explained above for the x-axis.

  3. Selected tissues: by default, all tissues are selected and included in the scatter plot. To include or exclude cell lines from specific tissues, the user should specify:
    • Select Tissues to include or exclude specific tissues
    • Select Tissues of Origin Subset/s, at the bottom of the left-hand panel, to choose the tissues. On Macs, more than one tissue of origin may be selected by holding the “command” key; on PCs, use the “control” key. All cell lines were mapped to the four-level OncoTree cancer tissue type hierarchy developed at Memorial Sloan-Kettering Cancer Center. In the CellMinerCDB application, a tissue value is coded as an OncoTree node that can include elements from level 1 to level 4 separated by the “:” character. For instance, the cell line DMS-79 is a “Lung” cell line and, more specifically, a Small Cell Lung Cancer cell line. DMS-79 therefore belongs to two cancer tissue types (or hierarchical nodes): “Lung” (level 1) and “Lung: Small Cell Lung Cancer (SCLC)” (level 2). There is no further sub-categorization for DMS-79.

  4. Color selection
    • Tissues to Color highlights the cell lines from the chosen tissues within the scatter plot. By default, cell lines are colored by the pre-assigned color of their OncoTree level 1 cancer tissue. Selecting a tissue makes the related cell lines appear in red while the remaining cell lines are colored in blue. The Show Color checkbox must be active.
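
The OncoTree tissue coding described in item 3 can be illustrated with a short Python sketch. It assumes that a tissue value is a plain string whose levels are separated by “:” (as in the DMS-79 example) and expands it into every hierarchical node the cell line belongs to. The function names and the whitespace handling are illustrative assumptions, not part of the application.

```python
def oncotree_nodes(tissue_value):
    """Return every OncoTree node, from level 1 down to the deepest level given."""
    # Split on ":" and ignore surrounding whitespace (assumed formatting).
    levels = [part.strip() for part in tissue_value.split(":") if part.strip()]
    if len(levels) > 4:
        raise ValueError("OncoTree tissue values have at most four levels")
    return [": ".join(levels[:depth]) for depth in range(1, len(levels) + 1)]


def matches_selection(tissue_value, selected_node):
    """True if a cell line with this tissue value falls under the selected node."""
    return selected_node in oncotree_nodes(tissue_value)


nodes = oncotree_nodes("Lung: Small Cell Lung Cancer (SCLC)")
print(nodes)  # ['Lung', 'Lung: Small Cell Lung Cancer (SCLC)']
```

With this representation, selecting the level 1 node “Lung” keeps DMS-79, and so does selecting the more specific level 2 node.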

Plot Data

Any pair of features, possibly from different sources, can be plotted as a scatterplot across their common cell lines, together with the resulting Pearson correlation and p-value. The p-value estimates assume multivariate normal data, and are less reliable as the data deviate from this. Please use the scatter plot to check the data distribution (e.g., for outlying points outside of a more elliptically concentrated set).
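
As a rough illustration of the statistics reported on the plot, the following Python sketch computes a Pearson correlation and a two-sided p-value over the cell lines shared by two features. It is not the application's own code: the function name, the dictionary layout, and the cell line labels and values are made up for the example. The p-value is obtained by numerically integrating the null distribution of r under bivariate normality, which is the same assumption noted above. Library routines such as SciPy's scipy.stats.pearsonr report the same two quantities.

```python
import math


def pearson_with_p(x_by_line, y_by_line, steps=20000):
    """Pearson r, two-sided p-value and n over cell lines present in both inputs."""
    # Keep only cell lines measured (non-missing) for both features.
    common = sorted(
        c for c in x_by_line
        if c in y_by_line and x_by_line[c] is not None and y_by_line[c] is not None
    )
    n = len(common)
    if n < 4:
        raise ValueError("need at least 4 common cell lines")
    xs = [x_by_line[c] for c in common]
    ys = [y_by_line[c] for c in common]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a feature is constant across the common cell lines")
    r = max(-1.0, min(1.0, sxy / math.sqrt(sxx * syy)))

    # Null density of r for bivariate normal data:
    #   f(r) = (1 - r^2)^((n - 4) / 2) / B(1/2, (n - 2) / 2)
    log_beta = math.lgamma(0.5) + math.lgamma((n - 2) / 2) - math.lgamma((n - 1) / 2)
    lo = abs(r)
    width = (1.0 - lo) / steps
    area = 0.0
    for i in range(steps):  # midpoint rule on [|r|, 1]
        mid = lo + (i + 0.5) * width
        area += max(0.0, 1.0 - mid * mid) ** ((n - 4) / 2)
    p = min(1.0, 2.0 * area * width / math.exp(log_beta))
    return r, p, n


# Made-up values for hypothetical cell lines; only lines A-D are shared.
x = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 9.0}
y = {"A": 1.0, "B": 3.0, "C": 2.0, "D": 4.0, "F": 0.0}
r, p, n = pearson_with_p(x, y)
print(f"r={r:.3f} p={p:.3f} n={n}")  # r=0.800 p=0.200 n=4
```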

The icons at the top of the plot provide the following options, from left to right:

icon Downloads the plot as a png.
icon Allows the user to zoom in on an area of interest by clicking and dragging with the pointer.
icon Autoscales the image.
icon Allows the user to create horizontal and vertical lines from either a cell line dot or the regression line by hovering over it.

Screenshot of CellMinerCDB Application

Figure 2: An example scatterplot of SLFN11 gene expression (x-axis) versus Topotecan drug activity (y-axis), both from the NCI/DTP SCLC. Since Topotecan has 2 different drug IDs in the NCI/DTP SCLC, the one with the fewest missing values is selected (here 609699). However, users can type in the specific drug ID of interest. The Pearson correlation value and p-value appear at the top of the plot. A linear fit line is included. This is an interactive plot: whenever the user changes any input value, the plot is updated. Hovering over any point in the plot shows additional information about the cell line, tissue, OncoTree designation, and x and y coordinate values. Here the points (cell lines) are colored according to their NAPY status.

View Data

This option displays the data selected in the Plot Data tab in tabular form and provides a Download selected x and y axis data as Tab-Delimited File option. The user can change the input data in the left selection panel as described for Plot Data. The displayed table includes the cell line, the x-axis value, the y-axis value, the tissue of origin, the 4 OncoTree levels, the NAPY status and the Neuro-Endocrine score. In the header, the selected features are prefixed by the data type abbreviation and suffixed by the data source.

Screenshot of CellMinerCDB Application

Figure 3: The selected values for SLFN11 gene expression (x-axis) and Topotecan (ID 609699) drug activity (y-axis) from the NCI/DTP SCLC across all common cell lines. The features are coded as expSLFN11_nciSclc and act609699_nciSclc, where “exp” and “act” are the prefixes for gene expression (based on z-score) and drug activity, respectively.
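
The header coding shown in Figure 3 (data type prefix, identifier, then the data source after an underscore) can be unpacked with a small Python sketch. Only the two prefixes named in the caption are listed; the application defines others (see the Metadata tab), and the function name and error handling here are illustrative assumptions.

```python
# Only the prefixes mentioned in the Figure 3 caption; the full list is longer.
KNOWN_PREFIXES = {"exp": "gene expression (z-score)", "act": "drug activity"}


def parse_feature(name):
    """Split a header such as 'expSLFN11_nciSclc' into (prefix, identifier, source)."""
    # The source follows the last underscore, so identifiers that themselves
    # contain "_" stay intact.
    body, sep, source = name.rpartition("_")
    if not sep or not source:
        raise ValueError(f"no data source suffix in {name!r}")
    for prefix in KNOWN_PREFIXES:
        if body.startswith(prefix) and len(body) > len(prefix):
            return prefix, body[len(prefix):], source
    raise ValueError(f"unknown data type prefix in {name!r}")


print(parse_feature("expSLFN11_nciSclc"))  # ('exp', 'SLFN11', 'nciSclc')
print(parse_feature("act609699_nciSclc"))  # ('act', '609699', 'nciSclc')
```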

Compare Patterns

This option computes the correlation between a selected feature, defined by the Cell Line Set, Data Type, and Identifier from either the x-axis or y-axis selection, and either all drug data or all molecular data from the same source. With the “Cross cell line sets” button, the user can instead compare against all drug or molecular data from the other source.

Pearson’s correlations are provided in tabular form, with reported p-values (not adjusted for multiple comparisons). The displayed features are ordered by level of correlation, and the table includes the target pathway for genes and the mechanism of action (MOA) for drugs (if available).
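
The idea behind Compare Patterns can be sketched in a few lines of Python: correlate one selected feature against every candidate feature over their shared cell lines, then rank the results. This is a simplified illustration under stated assumptions, not the application's implementation. The function names, the minimum number of shared cell lines, the descending sort on signed correlation, and all values are hypothetical, and p-values and annotations are omitted.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def compare_patterns(target, candidates, min_lines=3):
    """Rank candidate features by their correlation with the target feature."""
    rows = []
    for name, values in candidates.items():
        common = sorted(set(target) & set(values))
        if len(common) < min_lines:
            continue  # too few shared cell lines
        r = pearson_r([target[c] for c in common], [values[c] for c in common])
        if r is not None:
            rows.append((name, r, len(common)))
    # Highest correlation first (assumed ordering).
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up values for hypothetical cell lines and features.
target = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
candidates = {
    "feature_down": {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0},
    "feature_up": {"A": 2.0, "B": 4.0, "C": 6.0, "D": 8.0},
    "feature_mixed": {"A": 1.0, "B": 3.0, "C": 2.0, "D": 4.0},
}
for name, r, n in compare_patterns(target, candidates):
    print(f"{name}\t{r:+.2f}\t{n}")
```

Because one p-value is reported per compared feature and none are adjusted, small p-values in a long table should be read with the number of comparisons in mind.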