SclcCellMinerCDB




SclcCellMinerCDB enables exploration and analysis of SCLC cell line pharmacogenomic data across different sources. If you publish results based on this site, please cite: Tlemsani C, Pongor L, Elloumi F et al. Cell Reports. 2020 Oct 20.

Introduction

CellMinerCDB is an interactive web application that simplifies access to and exploration of cancer cell line pharmacogenomic data across different sources. The current version is dedicated to small cell cancer cell lines (see the Metadata section for more details). Navigation in the application is done using the main menu tabs (see Figure 1). There are 6 tabs: Univariate Analyses, Multivariate Analysis, Metadata, Search, Help and Video tutorial. Univariate Analyses is selected by default when entering the site. Each tab includes a sidebar menu for choosing inputs and an output panel that displays results. For the Univariate Analyses and Multivariate Analysis tabs, analysis options are available as a sub-menu at the top (see Figure 1); the result of the first sub-menu option is displayed by default.


Figure 1: Main application interface

Univariate Analyses

Molecular and/or drug response patterns across sets of cell lines can be compared to look for possible associations. The Univariate Analyses panel includes 4 options: Plot Data, View Data, Compare Patterns and Tissue Correlation. Almost all options share the same inputs in the left side panel.

Input data

  1. The x-axis data choices include 4 fields to be filled in by the user:
    • x-Axis Cell Line Set selects the data source. The user can choose: NCI/DTP SCLC, CCLE, GDSC, CTRP or UTSW (see Data Sources for more details).
    • x-Axis Data Type selects the data type to query. The options vary depending on the source selected above and appear in the x-Axis Data Type dropdown. See the Metadata tab for descriptions and abbreviations.
    • Identifier selects the identifier of interest for the selected data type. For instance, if drug activity for the NCI/DTP SCLC is selected, the user can enter a single drug name, a drug ID (NSC number) or a paired drug ID (NSC1_NSC2). The Search IDs tab can be used to explore potential identifiers interactively or to download datasets of interest.
    • x-Axis Range allows the user to control the x-axis range for better visualization.

  2. The y-axis data choices are as explained above for the x-axis.

  3. Selected tissues: by default, all tissues are selected and included in the scatter plot. To include or exclude cell lines from specific tissues, the user should specify:
    • Select Tissues to include or exclude specific tissues
    • Select Tissues of Origin Subset/s at the bottom of the left-hand panel. On Macs, more than one tissue of origin may be selected using the “command” key; on PCs, use the “control” key. All cell lines were mapped to the four-level OncoTree cancer tissue type hierarchy developed at Memorial Sloan-Kettering Cancer Center. In the CellMinerCDB application, a tissue value is coded as an OncoTree node that can include elements from level 1 to level 4 separated by the “:” character. For instance, the cell line DMS-79 is a “Lung” cell line but, more specifically, a small cell lung cancer one. DMS-79 therefore belongs to two cancer tissue types (hierarchical nodes): “Lung” (level 1) and “Lung: Small Cell Lung Cancer (SCLC)” (level 2). There is no further sub-categorization for DMS-79.

  4. Color selection
    • Tissues to Color highlights the cell lines from the selected tissues within the scatter plot. By default, the cell lines are colored by the pre-assigned color of their OncoTree level 1 tissue. Selecting a tissue makes the related cell lines appear in red while the remaining cell lines are colored blue. The Show Color checkbox must be checked for coloring to apply.
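To illustrate how the “:”-separated OncoTree coding described in item 3 lets one cell line match tissue filters at several levels, here is a minimal Python sketch (the application itself is written in R). It is not the application's code: the function names are hypothetical, and it assumes tissue strings are formatted like the DMS-79 example above.

```python
def oncotree_nodes(tissue):
    """Return the cumulative OncoTree nodes of a ":"-separated tissue string.

    "Lung: Small Cell Lung Cancer (SCLC)" yields the level 1 node "Lung" and the
    level 2 node "Lung: Small Cell Lung Cancer (SCLC)".
    """
    levels = [part.strip() for part in tissue.split(":") if part.strip()]
    # Node k is the path from level 1 down to level k, re-joined with ": ".
    return [": ".join(levels[: k + 1]) for k in range(len(levels))]


def in_selected_tissues(tissue, selected):
    """True if any node on the cell line's OncoTree path is in the selection."""
    return any(node in selected for node in oncotree_nodes(tissue))


dms79 = "Lung: Small Cell Lung Cancer (SCLC)"  # assumed string format
print(oncotree_nodes(dms79))  # ['Lung', 'Lung: Small Cell Lung Cancer (SCLC)']
print(in_selected_tissues(dms79, {"Lung"}))  # True
```

Selecting the broader “Lung” node therefore also captures every deeper Lung node, while selecting a level 2 node does not capture cell lines coded only at level 1.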

Plot Data

Any pair of features from different sources across common cell lines can be plotted (as a scatterplot) including the resultant Pearson correlation and p-value. The p-value estimates assume multivariate normal data, and are less reliable as the data deviate from this. Please use the scatter plot to check the data distribution (e.g., for outlying points outside of a more elliptically concentrated set).
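A minimal Python sketch of the statistic reported on the plot, assuming missing values are marked with None or NaN and that only cell lines measured for both features are used. The function name is hypothetical. To stay within the standard library, the p-value uses the Fisher z approximation rather than the exact t-distribution test, so it is only approximate for small numbers of cell lines; both rely on the normality assumption noted above.

```python
import math


def pearson_with_p(x, y):
    """Pearson r, approximate two-sided p-value and n over complete pairs.

    Pairs where either value is missing (None or NaN) are dropped, so only
    cell lines measured for both features are used.
    """
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    pairs = [(a, b) for a, b in zip(x, y) if not (missing(a) or missing(b))]
    n = len(pairs)
    if n < 4:
        raise ValueError("need at least 4 complete pairs")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    r = sxy / math.sqrt(sxx * syy)
    # Fisher z-transform: atanh(r) * sqrt(n - 3) is roughly standard normal
    # when the true correlation is 0 and the data are bivariate normal.
    r_safe = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite at |r| = 1
    z = math.atanh(r_safe) * math.sqrt(n - 3)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail
    return r, p, n


# Six toy cell lines; the last one lacks an x value and is dropped.
r, p, n = pearson_with_p([1, 2, 3, 4, 5, None], [2, 4, 5, 4, 5, 3.0])
print(round(r, 3), n)  # 0.775 5
```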

The following icons, at the top of the plot from left to right, control the plot image:

  • Downloads the plot as a PNG.
  • Zooms in on an area of interest: click and drag with the pointer.
  • Autoscales the image.
  • Draws horizontal and vertical lines from a cell line dot or from the regression line when the pointer hovers over them.


Figure 2: An example scatterplot of SLFN11 gene expression (x-axis) versus Topotecan drug activity (y-axis), both from the NCI/DTP SCLC. Since Topotecan has 2 different drug IDs in the NCI/DTP SCLC, the one with the fewest missing values is selected (here 609699); however, the user can type in a specific drug ID of interest. The Pearson correlation value and p-value appear at the top of the plot, and a linear regression line is included. The plot is interactive: whenever the user changes any input value, it is updated. Hovering over any point shows additional information about the cell line, tissue, OncoTree designation, and x and y coordinate values. Here the points (cell lines) are colored according to their NAPY status.
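The rule applied when one drug name maps to several IDs (keep the ID with the fewest missing values) can be sketched in Python as follows. The helper name, the None-for-missing convention and the tie-breaking rule are assumptions for illustration, and the second Topotecan ID is a hypothetical placeholder because the text does not give it.

```python
def pick_least_missing(activity_by_id):
    """Return the drug ID whose activity vector has the fewest missing values.

    activity_by_id maps each candidate ID that shares a drug name to its
    per-cell-line activity values, with None marking a missing value.
    Ties go to the smallest ID string (an assumption, for determinism).
    """
    def n_missing(values):
        return sum(v is None for v in values)

    # min() keeps the first of several equal keys, so sorting first breaks ties.
    return min(sorted(activity_by_id), key=lambda i: n_missing(activity_by_id[i]))


# "NSC_OTHER" is a hypothetical placeholder for Topotecan's second NSC number.
topotecan = {
    "609699": [0.8, -1.2, None, 0.4],
    "NSC_OTHER": [None, -0.9, None, None],
}
print(pick_least_missing(topotecan))  # 609699
```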

View Data

This option displays the data selected in the Plot Data tab in tabular form and provides a Download selected x and y axis data as Tab-Delimited File option. The user can change the input data in the left selection panel as described for Plot Data. The displayed table includes the cell line, the x-axis value, the y-axis value, the tissue of origin, the 4 OncoTree levels, the NAPY status and the neuroendocrine score. In the header, each selected feature is prefixed by the data type abbreviation and suffixed by the data source.


Figure 3: Shows the selected values for SLFN11 gene expression (x-axis) and Topotecan (ID 609699) drug activity (y-axis) from the NCI/DTP SCLC across all common lines. The features are coded as expSLFN11_nciSclc and act609699_nciSclc, where the prefixes “exp” and “act” denote gene expression (z-score) and drug activity, respectively.
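A small Python sketch of this header convention (data type prefix, identifier, then “_” and the cell line set). The helpers are hypothetical; they only know the “exp” and “act” prefixes named here and assume the data source suffix itself contains no “_”.

```python
# Only the two prefixes named in this help text; the Metadata tab lists the rest.
KNOWN_PREFIXES = ("exp", "act")


def make_label(prefix, identifier, source):
    """Build a column header such as expSLFN11_nciSclc."""
    return f"{prefix}{identifier}_{source}"


def parse_label(label, prefixes=KNOWN_PREFIXES):
    """Split a header into (prefix, identifier, source).

    The source is taken after the last "_", so paired drug IDs such as
    NSC1_NSC2 stay intact in the identifier.
    """
    body, source = label.rsplit("_", 1)
    # Try longer prefixes first so that one prefix cannot shadow another.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if body.startswith(prefix):
            return prefix, body[len(prefix):], source
    raise ValueError(f"unknown data type prefix in {label!r}")


print(make_label("exp", "SLFN11", "nciSclc"))  # expSLFN11_nciSclc
print(parse_label("act609699_nciSclc"))  # ('act', '609699', 'nciSclc')
```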

Compare Patterns

This option computes the correlation between the selected feature, defined by the Cell Line Set, Data Type and Identifier of either the x-axis or the y-axis selection, and either all drug or all molecular data from the same source. With the “Cross cell line sets” button, the user can instead compare against all drug or molecular data from the other source.

Pearson’s correlations are provided in tabular form, with p-values (not adjusted for multiple comparisons). Features are sorted by correlation, and the table includes the target pathway for genes and the mechanism of action (MOA) for drugs (when available).
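Conceptually, Compare Patterns correlates one vector against every feature of a dataset and sorts the results. A minimal Python sketch with hypothetical names and toy gene values; it assumes missing values are None and sorts from highest to lowest correlation, and it does not reproduce p-values or annotations.

```python
import math


def pearson(x, y):
    """Pearson r and n over pairs where both values are present (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return None, n
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None, n  # a constant feature has no defined correlation
    return sxy / math.sqrt(sxx * syy), n


def compare_patterns(target, features):
    """Correlate one feature with every feature in `features`, highest r first."""
    results = []
    for name, values in features.items():
        r, n = pearson(target, values)
        if r is not None:
            results.append((name, r, n))
    return sorted(results, key=lambda item: item[1], reverse=True)


slfn11 = [1.0, 2.0, 3.0, 4.0]
candidates = {  # hypothetical features
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [5.0, 5.0, 5.0, 5.0],
}
for name, r, n in compare_patterns(slfn11, candidates):
    print(name, round(r, 2), n)
```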


Figure 4: Shows correlation results for SLFN11 gene with all other molecular features for all NCI/DTP SCLC datasets sorted by correlation value with gene location and target pathways (annotation field).

Tissue Correlation

This option displays, for each tissue of origin (OncoTree level 1), the number of cell lines with complete observations (no missing values), the correlation between the two selected features, and its p-value.
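A minimal Python sketch of the per-tissue table: cell lines are grouped by OncoTree level 1, only complete pairs are counted, and an extra “ALL” group pools every cell line. Function names and toy values are hypothetical, and p-values are omitted for brevity.

```python
import math
from collections import defaultdict


def pearson_complete(x, y):
    """Return (n, r) over complete pairs; r is None when it is undefined."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return n, None
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return n, None
    return n, sxy / math.sqrt(sxx * syy)


def tissue_correlations(rows):
    """rows: (oncotree_tissue, x, y) per cell line -> {tissue: (n, r)}."""
    xs, ys = defaultdict(list), defaultdict(list)
    for tissue, x, y in rows:
        level1 = tissue.split(":")[0].strip()  # OncoTree level 1
        for key in (level1, "ALL"):
            xs[key].append(x)
            ys[key].append(y)
    return {key: pearson_complete(xs[key], ys[key]) for key in xs}


rows = [  # toy values
    ("Lung: Small Cell Lung Cancer (SCLC)", 1.0, 2.0),
    ("Lung: Small Cell Lung Cancer (SCLC)", 2.0, 4.0),
    ("Lung", 3.0, 6.0),
    ("Lung", 4.0, None),  # missing y: not a complete observation
    ("Skin", 1.0, 1.0),
    ("Skin", 2.0, 0.0),
    ("Skin", 3.0, 1.0),
]
for tissue, (n, r) in tissue_correlations(rows).items():
    print(tissue, n, r)
```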


Figure 5: Shows the correlation between the selected values for SLFN11 gene expression (x-axis) and Topotecan (id 609699) drug activity (y-axis) from the NCI/DTP SCLC across all common lines by tissue of origin. Note: The value “ALL” means all available common tissues between the 2 selected features.

Multivariate Analysis

The ‘Multivariate Analysis’ option (or module) has multiple tabs, including Heatmap, Data, Plot, Cross-Validation, Technical Details and Partial Correlation (described below), and allows construction and assessment of multivariate linear response prediction models within a single cell line set. For instance, one can assess the prediction of a drug's activity from the expression of a few genes. To construct a regression model, specify the input data in the left side panel.

Input data

  1. Cell Line Set selects the data source. The user can choose: NCI/DTP SCLC, CCLE, GDSC, CTRP or UTSW (see Data Sources for more details).

  2. The response variable by selecting
    • Response Data Type (for example, a drug or a molecular dataset). The options vary depending on the source selected above and appear in the Response Data Type dropdown. See the Metadata tab for data type descriptions.
    • Response Identifier (e.g., a specific drug or gene identifier)

  3. The predictor variables from the same data source by selecting
    • Predictor Data Type/s (as explained for the response data type). Use the “command” key on Macs or the “control” key on PCs to select more than one dataset.
    • Minimal Range Value sets the minimum range of values that a feature from the first listed data type must have to be included. The default is 0. One may increase this value to eliminate predictors considered to have too small a range to be biologically meaningful.
    • Predictor identifiers from the selected data types are required for the Linear Regression algorithm. In Figure 6, we explore linear model prediction of Topotecan drug activity in the NCI/DTP SCLC using SLFN11 and BPTF gene expression. Identifiers from different data types may be combined in 2 ways. In the first, select multiple Data Types as desired and enter your identifiers; the model is built automatically using all combinations of those Data Types and Identifiers. For example, if expression and mutation are selected as Data Types and SLFN11 and BPTF are entered as identifiers, the model is built using 4 identifiers: expSLFN11, expBPTF, mutSLFN11 and mutBPTF. In the second, more specific approach, enter each identifier with its data type prefix. For example, if your predictor variables are specifically the expression value for SLFN11 and the mutation value for BPTF, enter the identifiers expSLFN11 and mutBPTF. Predictors are optional for the Lasso algorithm (see point 5), since it automatically selects those that best fit the Lasso model.

  4. Selected tissues: by default, all cell lines are included; however, you can select a subset based on tissue:
    • Select Tissues to include or exclude specific tissues
    • Select Tissues of Origin Subset/s: by default, all tissues are selected and included. The user may also select specific tissues (to include or exclude). On Macs, more than one tissue of origin may be selected using the “command” key; on PCs, use the “control” key.

  5. Algorithm: by default, the basic linear regression model is selected; however, you can select the Lasso model (penalized linear regression). If the Lasso algorithm is selected, you have to specify:
    • Select Gene Sets: the gene selection is based on curated gene sets such as DNA Damage Repair (DDR) or Apoptosis. The user can select one or more gene sets.
    • Maximum Number of Predictors (default 4)
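The two ways of entering predictor identifiers (point 3), together with the Minimal Range Value filter, can be sketched in Python as below. This is a hypothetical illustration, not the app's code: it assumes “range” means maximum minus minimum over non-missing values, and that a prefixed identifier is recognized by starting with one of the selected data type prefixes (for example “exp” or “mut”).

```python
def expand_predictors(data_types, identifiers):
    """Turn selected data types plus identifiers into model predictor names.

    Identifiers that already carry one of the selected prefixes (e.g. "mutBPTF")
    are kept as-is; bare identifiers (e.g. "SLFN11") are paired with every
    selected data type prefix.
    """
    prefixed = [i for i in identifiers if any(i.startswith(p) for p in data_types)]
    bare = [i for i in identifiers if i not in prefixed]
    return [p + i for p in data_types for i in bare] + prefixed


def passes_min_range(values, min_range=0.0):
    """True if the feature's range (max - min, ignoring None) is >= min_range."""
    present = [v for v in values if v is not None]
    return bool(present) and max(present) - min(present) >= min_range


print(expand_predictors(["exp", "mut"], ["SLFN11", "BPTF"]))
# ['expSLFN11', 'expBPTF', 'mutSLFN11', 'mutBPTF']
print(expand_predictors(["exp", "mut"], ["expSLFN11", "mutBPTF"]))
# ['expSLFN11', 'mutBPTF']
print(passes_min_range([0.1, 0.5, None], min_range=0.3))  # True
```

Because gene symbols are upper case and the prefixes are lower case, a bare symbol is unlikely to be mistaken for a prefixed identifier, but this remains an assumption of the sketch.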

Once all the above information is entered, a regression model is built and the results are shown in several ways, such as the technical details of the model, plots of observed vs. predicted responses, or a heatmap of the variables. The different outputs of the regression model module are explained below.

Heatmap

This option provides the observed response and predictor variables across all source cell lines as an interactive heatmap. For the heatmap visualization, data are range standardized (subtract the minimum, and divide by the range) to values between 0 and 1, based on the value range within all rows of a given data type (by default) or within each row of data (if ‘Use Row Color Scale’ is selected). For data types other than mutation data, the range is trimmed to the difference between the 95th and 5th percentiles; values below or above the 5th and 95th percentile values are scaled to 0 and 1, respectively. In the case of mutation data, the range used for scaling is the difference between the maximum and minimum values. If the values within a data type (or data row if ‘Use Row Color Scale’ is selected) are constant, the scaled value for heatmap visualization is set to 0.5.
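A minimal Python sketch of the heatmap scaling rules above, assuming missing values are None. Percentiles use linear interpolation (the same rule as R's default quantile type 7); whether the app uses that exact percentile definition is an assumption. The sketch also maps values to 0.5 when the trimmed 5th to 95th percentile range is zero, which slightly extends the constant-data rule stated above in order to avoid dividing by zero.

```python
import math


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of a sorted list (R's default, type 7)."""
    h = (len(sorted_vals) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def scale_for_heatmap(values, is_mutation=False, reference=None):
    """Range-standardize `values` to [0, 1] for heatmap display.

    `reference` holds the values that define the range: all rows of the data
    type by default, or just this row for 'Use Row Color Scale'. None = missing.
    """
    source = reference if reference is not None else values
    pool = sorted(v for v in source if v is not None)
    if is_mutation:
        low, high = pool[0], pool[-1]  # full min-to-max range
    else:
        low, high = percentile(pool, 0.05), percentile(pool, 0.95)  # trimmed
    if high == low:
        # Constant data (or a zero-width trimmed range) is shown mid-scale.
        return [None if v is None else 0.5 for v in values]
    # Values beyond the 5th/95th percentiles are clipped to 0 and 1.
    return [None if v is None else min(max((v - low) / (high - low), 0.0), 1.0)
            for v in values]


row = list(range(101))  # toy values 0..100
scaled = scale_for_heatmap(row)
print(scaled[0], round(scaled[50], 3), scaled[100])  # 0.0 0.5 1.0
```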

The user can restrict the number of cell lines to those that have the highest or lowest response values by selecting Number of High/Low Response Lines to Display. The user can download the heatmap related data by clicking on Download Heatmap Data.
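Selecting the high/low response lines amounts to ranking the response and keeping both tails. A Python sketch, assuming the selector value is the number of lines taken from each end (30 each in the Figure 6 example) and that lines without a response value are dropped; the function name and toy values are hypothetical.

```python
def high_low_lines(response, n_each):
    """Cell lines with the n_each lowest and n_each highest response values.

    response maps cell line name -> value (None = missing). Lines are returned
    ordered from lowest to highest response.
    """
    if n_each <= 0:
        return []
    ranked = sorted((v, line) for line, v in response.items() if v is not None)
    if 2 * n_each >= len(ranked):
        return [line for _, line in ranked]  # fewer lines than requested
    chosen = ranked[:n_each] + ranked[-n_each:]
    return [line for _, line in chosen]


topotecan = {"A": 1.0, "B": 5.0, "C": 3.0, "D": None, "E": 2.0, "F": 4.0}
print(high_low_lines(topotecan, 1))  # ['A', 'B']
```

The explicit `n_each <= 0` check matters in Python, since `ranked[-0:]` would return the whole list rather than an empty one.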


Figure 6: An example heatmap where topotecan is selected as the response variable and SLFN11 and BPTF gene expression as predictor variables. In this example, only the 60 cell lines with the 30 highest and 30 lowest topotecan activity values are displayed.

When the Lasso algorithm is selected, additional predictor variables chosen by the model are shown (here, the genes UBE2B, PAK6 and CD4 are added), as in Figure 7.


Figure 7: Same example as Figure 6, using the Lasso algorithm.

Data

This option shows the detailed data for the model variables for each cell line. Both the 10-fold cross-validation (CV) predicted responses and the full-model predicted responses are given. The data are displayed as a table with filtering options for each column.


Figure 8: Data related to the simple linear regression model presented in the previous section.

Plot

This option enables one to plot and compare the observed response values (y-axis) versus the predicted response values (x-axis). The predicted response values are derived from a linear regression model fit to the full data set.


Figure 9: Plot comparing Topotecan observed vs. predicted activity, with a correlation of 0.53

Cross-Validation

This option plots the observed response values (y-axis) versus the 10-fold cross-validation predicted response values (x-axis). With this approach, the predicted response values are obtained (over 10 iterations) by successively holding out 10% of the cell lines and predicting their response using a linear regression model fit to the remaining 90% of the data. Cross-validation is widely used in statistics to assess model generalization to independent data, with the caveat that the independent data must still share the same essential structure (i.e., probability distribution) as the training data. It can also indicate possible overfitting of the training data, such as when the observed versus full data set model-predicted correlation (shown in ‘Plot’) is substantially better than the observed versus cross-validation predicted correlation (shown in ‘Cross-Validation’).
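The difference between ‘Plot’ and ‘Cross-Validation’ can be made concrete with a small Python sketch: an ordinary least-squares fit on all cell lines versus predictions from 10 folds, each holding out about 10% of the lines. It solves the normal equations directly to stay within the standard library (the app uses R's lm()), assigns folds by a seeded shuffle (an assumption), and runs on simulated data for hypothetical cell lines.

```python
import math
import random


def fit_ols(X, y):
    """Least-squares [intercept, b1, ..., bk] by solving the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(A[i][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        d = A[col][col]
        if abs(d) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col] = [v / d for v in A[col]]
        for i in range(k):
            if i != col:
                f = A[i][col]
                A[i] = [a - f * b for a, b in zip(A[i], A[col])]
    return [A[i][k] for i in range(k)]


def predict(beta, X):
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]


def cv_predict(X, y, folds=10, seed=0):
    """Predict each cell line from a model fit without its fold (~10% held out)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [None] * len(y)
    for f in range(folds):
        held_out = idx[f::folds]
        held = set(held_out)
        train = [i for i in idx if i not in held]
        beta = fit_ols([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(held_out, predict(beta, [X[i] for i in held_out])):
            preds[i] = p
    return preds


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


# Simulated data: response = 2*x1 - x2 + noise for 40 hypothetical cell lines.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
y = [2 * a - b + rng.gauss(0, 1) for a, b in X]
r_full = pearson(y, predict(fit_ols(X, y), X))  # analogous to 'Plot'
r_cv = pearson(y, cv_predict(X, y))             # analogous to 'Cross-Validation'
print(round(r_full, 2), round(r_cv, 2))
```

A large gap between `r_full` and `r_cv` is the overfitting signal described above; with only two predictors and 40 lines, the two values should be close.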


Figure 10: Plot comparing Topotecan observed vs. cross-validation predicted activity, with a still relatively high correlation of 0.48

Technical Details

This option displays the R statistical output and other technical details of the predicted response model. To save them, these results may be copied and pasted into a document or spreadsheet of your choice.


Figure 11: Example of regular regression model fitting results

Partial correlations

This function is used to identify additional predictive variables for a multivariate linear model. Conceptually, the aim is to identify additional predictive variables that are independently correlated with the response variable, after accounting for the influence of the existing predictor set. Computationally, a linear model is fit, with respect to the existing predictor set, for both the response variable and each candidate predictor variable. The partial correlation is then computed as the Pearson’s correlation between the resulting pairs of model residual vectors (which capture the variation not explained by the existing predictor set). The p-values reported for the correlation and linear modeling analyses assume multivariate normal data. The two-variable plot feature of CellMinerCDB allows informal assessment of this assumption, with clear indication of outlying observations. The reported p-values are less reliable as the data deviate from multivariate normality.
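The residual-based computation described above can be sketched in Python as follows, with hypothetical names and toy data. Least squares is done via the normal equations to stay within the standard library; this illustrates the technique rather than reproducing the app's implementation, and it omits p-values.

```python
import math


def fit_ols(X, y):
    """Least-squares [intercept, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        d = A[col][col]
        A[col] = [v / d for v in A[col]]
        for i in range(k):
            if i != col:
                f = A[i][col]
                A[i] = [a - f * b for a, b in zip(A[i], A[col])]
    return [A[i][k] for i in range(k)]


def residuals(X, y):
    """Part of y not explained by a linear model on the existing predictors X."""
    beta = fit_ols(X, y)
    return [t - (beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
            for row, t in zip(X, y)]


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return None if den == 0 else sab / den


def partial_correlation(existing, response, candidate):
    """Correlate response and candidate after removing the existing predictors."""
    return pearson(residuals(existing, response), residuals(existing, candidate))


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [0.5, -1.0, 0.3, 0.9, -0.2, 0.1]  # variation not explained by x1
existing = [[v] for v in x1]
response = [a + b for a, b in zip(x1, z)]
candidate = [2 * a + b for a, b in zip(x1, z)]
print(round(partial_correlation(existing, response, candidate), 6))  # 1.0
```

Here the response and the candidate share the same variation beyond x1, so their partial correlation given x1 is 1.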

In order to run a partial correlation analysis, the user should first construct a linear model (providing response and predictor variables as explained earlier; steps 1 to 5 in Figure 12) and then:

  • Select Gene Sets: the gene selection is based on curated gene sets. Here the user can select one or more gene sets, or even all genes (step 6 in Figure 12)
  • Select Data Types: the user can select one or more data types, such as gene expression, methylation or copy number variation (step 7 in Figure 12)
  • Optionally, specify the Minimum Range for the first listed data type (step 8 in Figure 12)
  • Finally, click the Run button (step 9 in Figure 12).


Figure 12: An example of partial correlation results for selected gene expression data using all gene sets.

Metadata

This option lists, for each cell line set, the data types that can be queried within the app, with the data type abbreviation (prefix), description, feature value unit (z-score, intensity, probability, …), platform or experiment, and related publication reference (PubMed). First, the user specifies the Cell Line Set (data source) to view all associated data types. The user can then download data by choosing Select Data Type to Download and clicking Download Data Type and/or Download Data Footnotes for the selected cell line set. Finally, the user can download the current cell line set information and the drug synonyms table with matching IDs for all cell line sets by clicking Download cell line annotation and Download table, respectively.


Figure 13: Shows all data types for NCI/DTP SCLC

Search

This page lists the identifiers (IDs) available in the selected data source for use in the Univariate Analyses or Multivariate Analysis. The user chooses:

  • Cell Line Set selects the data source. The user can choose: NCI/DTP SCLC, CCLE, GDSC, CTRP or UTSW (see Data Sources for more details).
  • Select Data Type selects the data type to query. The options vary depending on the source selected above. See the Metadata tab for descriptions and abbreviations.

All IDs are then listed for the chosen combination. For molecular data, the gene names (IDs) and specific data type information are provided. For drugs and compounds, the identifiers (IDs), drug names (when available) and drug MOA (when available) are displayed. The user can scroll through the whole list of IDs or search for specific IDs by entering a value in the header of any column.

Drug IDs

For the NCI/DTP SCLC, the drug identifiers (IDs) are NSC numbers or drug names. For the CCLE, GDSC and CTRP, the drug identifiers are the drug names.


Figure 14: Example of a search: to look for a drug ID in the NCI/DTP SCLC, select “NCI/DTP SCLC” as the cell line source and “Drug Activity” as the data type. You can then type in the search box of the “Drug name” or “MOA” column.

Gene IDs

For all data sources, the gene ID is the HUGO gene symbol; however, the application also recognizes any synonym or previous symbol (alias) included in the HUGO database.


Figure 15: Example of a search: to look for a gene ID in the NCI/DTP SCLC, select “NCI/DTP SCLC” as the cell line source and “gene expression” as the data type. You can then type in the search box of the “gene name”, “entrez gene id” or “Chromosome” column, among others.

Multiple selection

To select multiple choices from a list, hold the “command” key on a Mac or the “control” key on a PC while clicking.

X-axis or Y-axis range

You can change the lower or upper limit of the x-axis or y-axis to get different views of the displayed plot.

Show color

This checkbox enables or disables coloring in the scatter plots.

Exploratory workflow

Multiple data analysis workflows may be used, depending on the question being asked. A typical workflow:

  1. Check the relationship between two variables [Plot Data]. Example: SLFN11 transcript expression and topotecan drug activity.
  2. Examine what else might be associated with either the x-axis or y-axis variable [Compare Patterns]. Example: considering potential biological effects, TGFBR3 (an apoptosis factor) and BPTF (a chromatin factor) transcript expression might be considered candidates for affecting topotecan activity.
  3. Upon finding two or more associations with a single 'response' variable through [Compare Patterns/Plot Data], check whether they complement one another in a multivariate model [Multivariate Analysis]. Example: starting with the dominant SLFN11, adding TGFBR3 does not improve the regression model, but adding BPTF does.
  4. Repeat the above steps as needed.

Methods

  • Linear regression

    Basic linear regression models are implemented using the lm() function from the R stats package.

  • Lasso Model

    Lasso models (penalized linear regression) are implemented using the glmnet R package. The lasso performs both variable selection and fitting of linear model coefficients. The lasso lambda parameter controls the tradeoff between model fit and variable set size; lambda is set to the value giving the minimum error in 10-fold cross-validation. For both standard linear regression and lasso models, 10-fold cross-validation is applied to fit model coefficients and predict responses while withholding portions of the data, to better estimate robustness.
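To show how the lambda penalty trades model fit against the number of selected variables, here is a minimal coordinate-descent lasso in Python. It minimizes (1/(2n))·RSS + λ·Σ|βj| on predictors standardized to mean 0 and variance 1 (using 1/n), which is the form of glmnet's Gaussian lasso objective; it is a sketch of the technique, not glmnet itself. It takes a fixed λ instead of choosing it by 10-fold cross-validation, does not enforce the Maximum Number of Predictors setting, and assumes no predictor is constant.

```python
import math


def soft_threshold(z, lam):
    """Shrink z toward 0 by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(X, y, lam, n_iter=1000, tol=1e-10):
    """Return (intercept, coefficients) on the original predictor scale."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    ybar = sum(y) / n
    r = [t - ybar for t in y]  # residuals while all coefficients are 0
    b = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual (excluding j).
            rho = sum(Z[i][j] * r[i] for i in range(n)) / n + b[j]
            new = soft_threshold(rho, lam)
            delta = new - b[j]
            if delta:
                for i in range(n):
                    r[i] -= Z[i][j] * delta
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    coefs = [b[j] / sds[j] for j in range(p)]  # undo the standardization
    intercept = ybar - sum(c * m for c, m in zip(coefs, means))
    return intercept, coefs


X = [[1, 3], [2, 1], [3, 4], [4, 1], [5, 5], [6, 9], [7, 2], [8, 6]]
y = [3 * a + 0.5 * b for a, b in X]
for lam in (0.0, 1.0, 10.0):
    intercept, coefs = lasso(X, y, lam)
    print(lam, [round(c, 3) for c in coefs])
```

Larger λ shrinks coefficients and sets more of them exactly to zero, which is how the lasso performs variable selection.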

Contact/Feedback

Please send comments and feedback to

  • fathi.elloumi AT nih.gov
  • aluna AT jimmy.harvard.edu
  • vinodh.rajapakse AT nih.gov

Data Sources

CellMinerCDB for small cell cancer (SCLC) integrates data from the following sources, which provide additional data and specialized analyses.

Figure 16: Summary of molecular and drug activity data for the nine data sources currently included in SCLC CellMinerCDB. For molecular data types, the numbers indicate the number of genes with a particular data type. Gene-level mutation and methylation were computed using specific scripts described in the CellMinerCDB paper (see References). A grey cell indicates that no data are available. Numbers highlighted in blue indicate a change in the number of features compared with the previous release, whereas numbers highlighted in red indicate new features.

Figure 17: Cell line overlaps between data sources.

About the Data

For specific information about the data made available for particular sources, please refer to the 'Metadata' navbar tab.

Drug mechanism of action details:

Gene sets used for annotation of analysis results or algorithm input filtering were curated by the NCI/DTB CellMiner team, based on surveys of the applicable research literature.

About SclcCellMinerCDB

The SclcCellMinerCDB application is developed and maintained using R and Shiny by:

  • Fathi Elloumi; Staff Scientist, Developmental Therapeutics Branch, National Cancer Institute
  • Augustin Luna; Research Fellow, Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard Medical School
  • Vinodh N. Rajapakse; Postdoctoral Fellow, Developmental Therapeutics Branch, National Cancer Institute

NCI-DTB Genomics and Bioinformatics Group

  • William C. Reinhold
  • Sudhir Varma
  • Margot Sunshine
  • Fathi Elloumi
  • Lisa Loman (Special Volunteer)
  • Fabricio G. Sousa
  • Kurt W. Kohn
  • Yves Pommier

Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard Medical School

  • Chris Sander

MSKCC Computational Biology

  • Jianjiong Gao
  • Nikolaus Schultz

References

Shankavaram UT, Varma S, Kane D, Sunshine M, Chary KK, Reinhold WC, Pommier Y, Weinstein JN. CellMiner: a relational database and query tool for the NCI-60 cancer cell lines. BMC Genomics. 2009 Jun 23;10:277. doi: 10.1186/1471-2164-10-277.

Reinhold WC, Sunshine M, Liu H, Varma S, Kohn KW, Morris J, Doroshow J, Pommier Y. CellMiner: a web-based suite of genomic and pharmacologic tools to explore transcript and drug patterns in the NCI-60 cell line set. Cancer Res. 2012 Jul 15;72(14):3499-511. doi: 10.1158/0008-5472.CAN-12-1370.

Reinhold WC, Sunshine M, Varma S, Doroshow JH, Pommier Y. Using CellMiner 1.6 for Systems Pharmacology and Genomic Analysis of the NCI-60. Clin Cancer Res. 2015 Sep 1;21(17):3841-52. doi: 10.1158/1078-0432.CCR-15-0335. Epub 2015 Jun 5.

Luna A, Rajapakse VN, Sousa FG, Gao J, Schultz N, Varma S, Reinhold W, Sander C, Pommier Y. rcellminer: exploring molecular profiles and drug response of the NCI-60 cell lines in R. Bioinformatics. 2015 Dec 3. pii: btv701.

Rajapakse VN, Luna A, Yamade M, Loman L, Varma S, Sunshine M, Iorio F, Elloumi F, Aladjem MI, Thomas A, Sander C, Kohn KW, Benes CH, Garnett M, Reinhold WC, Pommier Y. CellMinerCDB for Integrative Cross-Database Genomics and Pharmacogenomics Analyses of Cancer Cell Lines. iScience, Cell Press. 2018 Dec 12.

Reinhold WC, Varma S, Sunshine M, Elloumi F, Ofori-Atta K, Lee S, Trepel JB, Meltzer PS, Doroshow JH, Pommier Y. RNA sequencing of the NCI-60: Integration into CellMiner and CellMiner CDB. Cancer Res. 2019 May 21. pii: canres.2047.2018. doi: 10.1158/0008-5472.CAN-18-2047.

Tlemsani C, Pongor L, Elloumi F, Girard L, Huffman KE, Roper N, Varma S, Luna A, Rajapakse VN, Sebastian R, Kohn KW, Krushkal J, Aladjem MI, Teicher BA, Meltzer PS, Reinhold WC, Minna JD, Thomas A, Pommier Y. SCLC-CellMiner: A Resource for Small Cell Lung Cancer Cell Line Genomics and Pharmacology Based on Genomic Signatures. Cell Rep. 2020 Oct 20;33(3):108296. doi: 10.1016/j.celrep.2020.108296. PMID: 33086069; PMCID: PMC7643325.

Pongor LS, Tlemsani C, Elloumi F, Arakawa Y, Jo U, Gross JM, Mosavarpour S, Varma S, Kollipara RK, Roper N, Teicher BA, Aladjem MI, Reinhold W, Thomas A, Minna JD, Johnson JE, Pommier Y. Integrative epigenomic analyses of small cell lung cancer cells demonstrates the clinical translational relevance of gene body methylation. iScience. 2022 Oct 12;25(11):105338. doi: 10.1016/j.isci.2022.105338. PMID: 36325065; PMCID: PMC9619308.

Related links

Introduction to CellMinerCDB

Release notes

Version 1.2

December 2022:

  • New GUI interface
  • New search cell line within univariate scatter plot
  • Updated MD Anderson and GDSC data with new SCLC cell lines (see the Data Sources section for details)

Version 1.1

March 2022: [pubmed: 36325065]

  • New NCI SCLC body methylation dataset
  • New NCI SCLC protein data based on Western Blot
  • New SCLC RPPA protein data from MD Anderson
  • New CCLE SCLC protein data
  • New CCLE SCLC metabolites data
  • New CCLE SCLC RRBS methylation data
  • New SCLC Histone data from Cold Spring Harbor Lab (CSHL)
  • New global SCLC Histone data computed from CSHL and UTSW
  • New SCLC CRISPR data from the Achilles project
  • Incorporated replication stress score based on gene expression

Version 1.0

March 2020: [pubmed: 33086069]

  • Official launch of the CellMinerCDB website dedicated to SCLC
  • Data sources are NCI/DTP SCLC, CCLE, CTRP, GDSC and UTSW
  • Added a new Global SCLC gene expression dataset from all data sources
  • Integrated NAPY subtypes for all SCLC cell lines
  • Incorporated APM enrichment score
  • Improved Pattern comparison to allow across data source comparison

discover.nci.nih.gov

An official website of the National Institutes of Health
