An Information-Intensive Approach to the Molecular Pharmacology of Cancer
Science 275:343-349 (1997) (January 17 issue).
Abstract. Since 1990, the National Cancer Institute (NCI) has screened more than 60,000 compounds against a panel of 60 human cancer cell lines. The 50 percent growth-inhibitory concentration (GI50) for any single cell line is simply an index of cytotoxicity or cytostasis, but the patterns of 60 such GI50 values encode unexpectedly rich, detailed information on mechanisms of drug action and drug resistance. Each compound's pattern is like a fingerprint, essentially unique among the many billions of distinguishable possibilities. These activity patterns are being used in conjunction with molecular structural features of the tested agents to explore the NCI's database of more than 460,000 compounds, and they are providing insight into potential target molecules and modulators of activity in the 60 cell lines. For example, the information is being used to search for candidate anticancer drugs that are not dependent on intact p53 suppressor gene function for their activity. It remains to be seen how effective this information-intensive strategy will be at generating new clinically active agents.
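A minimal sketch of how one 60-value fingerprint can be matched against others, assuming each pattern is the mean-centered log10 GI50 vector and that similarity is measured by the Pearson correlation coefficient (one of the measures named in the Kohonen-map abstract below). The function names and all GI50 values are hypothetical, and the sketch ignores missing or censored measurements that real screening data would contain.

```python
import numpy as np

def activity_pattern(gi50_molar):
    """Mean-centered log10 GI50 pattern; centering removes overall potency."""
    log_gi50 = np.log10(np.asarray(gi50_molar, dtype=float))
    return log_gi50 - log_gi50.mean()

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    x = x - x.mean()
    y = y - y.mean()
    return float(np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y)))

def rank_by_similarity(seed_gi50, database):
    """Return (name, r) pairs sorted from most to least similar to the seed pattern."""
    seed_pattern = activity_pattern(seed_gi50)
    scores = [(name, pearson(seed_pattern, activity_pattern(gi50)))
              for name, gi50 in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical data: 60 GI50 values (molar) per compound.
rng = np.random.default_rng(0)
seed = 10.0 ** rng.uniform(-8, -4, size=60)
database = {
    "analog": seed * 3.0,                  # uniformly 3x less potent: same pattern
    "unrelated": 10.0 ** rng.uniform(-8, -4, size=60),
    "mirror": 1e-12 / seed,                # sensitive lines become resistant and vice versa
}
for name, r in rank_by_similarity(seed, database):
    print(f"{name:10s} r = {r:+.3f}")
```

Because the patterns are mean-centered in log space, a compound that is uniformly more or less potent than the seed still scores r = 1; only the shape of the 60-line profile matters.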
A Protein Expression Database for the Molecular Pharmacology of Cancer
Electrophoresis: in press.
ABSTRACT. In the last six years, the Developmental Therapeutics Program (DTP) of the U.S. National Cancer Institute (NCI) has screened over 60,000 chemical compounds and a larger number of natural product extracts for their ability to inhibit the growth of 60 different cancer cell lines representing different organs of origin. Whereas inhibition of the growth of one cancer cell type gives no information on drug specificity, the relative growth-inhibitory activities against 60 different cell lines constitute patterns that have been found to encode detailed information on mechanisms of action and resistance [Paull, 1989; Weinstein, 1992]. To correlate the patterns of activity with properties of the cells, we and other laboratories are characterizing the cells with respect to a large number of factors at the DNA, mRNA, and protein levels. As part of that effort, we have developed a 2-dimensional gel electrophoresis (2-DE) protein expression database covering all 60 cell types [Buolamwini, submitted]. Here we present analyses of the correlations among protein spots (i) in terms of their patterns of expression and (ii) in terms of their apparent relationships to drug activity for a set of 3,989 tested compounds. To our surprise, the correlations were, on average, stronger for the latter than for the former, suggesting that the spots have more robust signatures in terms of pharmacology than in terms of expression levels.
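One possible reading of the two analyses, sketched under stated assumptions: T is a hypothetical spots-by-cell-lines expression matrix and A a hypothetical compounds-by-cell-lines activity matrix. Analysis (i) correlates spots directly through their expression patterns; analysis (ii) first correlates every spot with every compound and then correlates spots through those spot-to-compound profiles. The abstract does not give the exact procedure, so the matrix names, the use of Pearson r at both stages, and the random data are all assumptions.

```python
import numpy as np

def zscore_rows(M):
    """Standardize each row to mean 0 and (population) SD 1."""
    M = np.asarray(M, dtype=float)
    return (M - M.mean(axis=1, keepdims=True)) / M.std(axis=1, keepdims=True)

def cross_correlation(X, Y):
    """Pearson r between every row of X and every row of Y (same number of columns)."""
    return zscore_rows(X) @ zscore_rows(Y).T / X.shape[1]

def mean_abs_offdiag(R):
    """Average |r| over the off-diagonal entries of a square correlation matrix."""
    mask = ~np.eye(R.shape[0], dtype=bool)
    return float(np.abs(R[mask]).mean())

def spot_correlations(T, A):
    expr_r = cross_correlation(T, T)                       # (i) spots x spots, via expression
    spot_drug_r = cross_correlation(T, A)                  # spots x compounds
    pharm_r = cross_correlation(spot_drug_r, spot_drug_r)  # (ii) spots x spots, via drugs
    return expr_r, pharm_r

# Hypothetical data: 5 spots and 40 compounds measured on 60 cell lines.
rng = np.random.default_rng(1)
T = rng.normal(size=(5, 60))
A = rng.normal(size=(40, 60))
expr_r, pharm_r = spot_correlations(T, A)
print("mean |r|, expression:  ", round(mean_abs_offdiag(expr_r), 3))
print("mean |r|, pharmacology:", round(mean_abs_offdiag(pharm_r), 3))
```

The random inputs here only exercise the computation; they are not meant to reproduce the result reported above.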
Patterns of Protein Expression in Cells of the NCI Cancer Drug Discovery Program
submitted
Background: The National Cancer Institute's in vitro anticancer drug screen based on 60 human cancer cell lines has generated a database of growth inhibitory activities for approximately 60,000 chemical compounds and a larger number of natural product extracts. The patterns of drug activity (i.e., patterns of cell sensitivity) have been shown by both classical statistics and neural networks to be rich in information on mechanisms of action and resistance.
Purpose: To create a database resource on protein expression in the 60 cell lines of the screen and to pan for molecular targets (defined here to include modulators of drug action) by examining correlations of protein abundance with chemosensitivity.
Methods: High-resolution 2-dimensional polyacrylamide gel electrophoresis (2-D PAGE) was used to analyze protein expression. Electrophoresis was performed using the ISODALT system on whole cell lysates obtained by rapid solubilization in 2% NP-40 detergent with 9 M urea and 0.5% dithiothreitol. Proteins were detected by Coomassie blue staining. Digitized gel images were indexed, and spots were quantified and matched using the KEPLER program. Protein expression and patterns of drug activity were analyzed using the DISCOVERY programs.
Results: A total of 1,014 resolved protein species were indexed, including 22 of known identity. Cluster analysis revealed considerable similarities within the leukemia, melanoma, and colon cancer panels in terms of protein expression. Renal and central nervous system cancer subpanels showed moderate internal similarity. Ovarian, breast, and non-small cell lung tumor lines did not group coherently. When the patterns of protein expression were correlated with patterns of cytotoxic/cytostatic activity for a set of 86 standard chemotherapeutic agents, some of the indexed proteins were identified as possible targets or modulators of drug activity.
Conclusions: An informationally coherent 2-D PAGE database for protein expression has been created for 60 human cancer cell lines from 9 different organs of origin. This is, to our knowledge, the most extensive protein database on disparate cell types yet developed. The information on patterns of protein expression can be integrated with our databases on activity patterns of tested compounds and on their molecular structures. As progressively more of the spots are identified (by mass spectrometry), this information will become increasingly useful to the drug discovery process, and it will serve to generate hypotheses about the molecular pharmacology of cancer.
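The abstract does not say which clustering method or distance was used. A minimal sketch, assuming average-linkage agglomerative clustering of cell lines with 1 - Pearson r as the distance, run on hypothetical data in which three panels of four cell lines each share a panel-specific expression profile:

```python
import numpy as np

def correlation_distance(X):
    """1 - Pearson r between every pair of rows (cell lines) of X."""
    return 1.0 - np.corrcoef(X)

def average_linkage(D, n_clusters):
    """Naive agglomerative clustering with average linkage; returns one label per row."""
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()  # mean pairwise distance
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]   # merge the closest pair; b > a, so index a stays valid
        del clusters[b]
    labels = np.empty(D.shape[0], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Hypothetical data: 3 panels x 4 cell lines, 200 protein spots each.
rng = np.random.default_rng(1)
panel_profiles = rng.normal(size=(3, 200))
X = np.repeat(panel_profiles, 4, axis=0) + 0.1 * rng.normal(size=(12, 200))
labels = average_linkage(correlation_distance(X), n_clusters=3)
print(labels)
```

The naive double loop is fine for 60 cell lines; a library routine would be the practical choice for larger matrices.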
Neural Computing in Cancer Drug Development: Predicting Mechanism of Action
ABSTRACT. Described here are neural networks capable of predicting a drug's mechanism of action from its pattern of activity against a panel of 60 malignant cell lines in the National Cancer Institute's drug screening program. Given 6 possible classes of mechanism, the network misses the correct category for only 12 out of 141 agents (8.5%), whereas linear discriminant analysis, a standard statistical technique, misses 20 out of 141 (14.2%). The success of the neural net indicates (i) that the cell line response patterns are rich in information about mechanism, (ii) that appropriately designed neural networks can make effective use of that information, and (iii) that trained networks can be used to classify prospectively the more than 10,000 agents per year tested by the screening program. Related networks, in combination with classical statistical tools, will help in a variety of ways to move new agents through the pipeline from in vitro studies to clinical application.
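The abstract does not describe the network architecture or training procedure. A minimal sketch, assuming a single hidden layer of tanh units, a softmax output over six mechanism classes, and full-batch gradient descent on cross-entropy, trained on hypothetical synthetic patterns rather than screening data. The final line only checks the reported miss-rate arithmetic.

```python
import numpy as np

def train_mlp(X, y, n_classes, n_hidden=16, lr=0.1, epochs=1000, seed=0):
    """Full-batch gradient descent for a one-hidden-layer tanh network with softmax output."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                       # one-hot targets
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                   # hidden activations
        logits = H @ W2 + b2
        P = np.exp(logits - logits.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)          # softmax probabilities
        G = (P - Y) / len(X)                       # d(mean cross-entropy)/d(logits)
        dH = (G @ W2.T) * (1.0 - H ** 2)           # back-propagate through tanh
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)

def miss_rate(y_true, y_pred):
    return float(np.mean(y_true != y_pred))

# Hypothetical data: 6 mechanism classes, 20 agents each, 60 cell lines.
rng = np.random.default_rng(42)
n_classes, per_class = 6, 20
centroids = rng.normal(size=(n_classes, 60))   # one mean pattern per mechanism class
y = np.repeat(np.arange(n_classes), per_class)
X = centroids[y] + 0.5 * rng.normal(size=(len(y), 60))
params = train_mlp(X, y, n_classes)
print(f"training miss rate: {miss_rate(y, predict(params, X)):.3f}")
print(f"reported miss rates: {12/141:.1%} (network), {20/141:.1%} (discriminant analysis)")
```

A fair comparison with linear discriminant analysis, as in the abstract, would use held-out agents rather than the training miss rate printed here.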
Use of the Kohonen self-organizing map to study the mechanisms of action of chemotherapeutic agents
Background: Many natural and synthetic compounds might prove to be effective in cancer chemotherapy. To identify potentially useful agents, the National Cancer Institute screens over 10,000 compounds annually against a panel of 60 distinct human tumor cell lines in vitro. This screening program generates large amounts of data that are organized into relational databases. Important questions concern the information content of the data and ways to extract that information. Previously, statistical techniques have revealed that compounds with similar patterns of activity against the 60 cell lines are often similar in structure and mechanism of action. Feed-forward, back-propagation neural networks have been trained on this type of data to predict broadly defined mechanisms of action of chemotherapeutic agents.
Purpose and Method: In this report, we examine the information that can be extracted from the screening data by means of another type of neural network paradigm, the Kohonen self-organizing map. The map is a topology-preserving function, obtained by unsupervised learning, that nonlinearly projects the high-dimensional activity patterns into two dimensions. Our dataset is almost identical to that used in the earlier neural network study.
Results: The self-organizing maps we constructed have several important characteristics. (1) They partition the two-dimensional array into distinct regions, each of which is principally occupied by agents having the same broadly defined mechanism of action. (2) These regions can be resolved into distinct subregions that conform to plausible submechanisms and to chemically defined subgroups of submechanisms. (3) These results (and the exceptions to them) are consistent with those obtained using deterministic measures of similarity among activity patterns, such as the Euclidean distance or the Pearson correlation coefficient.
Conclusions: Our results indicate that the activity patterns obtained from the screen contain detailed information about mechanism of action and its basis in chemical structure. The self-organizing map can be used to suggest the mechanism of action of compounds identified by the screen as potentially useful chemotherapeutic agents and to probe the biology of the cell lines in the cancer screen. Kohonen self-organizing maps, unlike the previously applied neural networks, preserve and reveal the relationships among compounds acting by similar mechanisms and therefore have the potential to identify compounds that act by novel cytotoxic mechanisms.
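A minimal sketch of Kohonen training, assuming a 6 x 6 grid, a Gaussian neighborhood, and linearly decaying learning rate and radius; the abstract does not give these settings, and the two synthetic "mechanism" groups below are hypothetical. Each pattern is assigned to its best-matching node, so compounds with similar patterns land in the same region of the grid.

```python
import numpy as np

def train_som(X, rows=6, cols=6, epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Online Kohonen training on a rows x cols grid; returns the node weight vectors."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(rows, cols, X.shape[1]))
    # grid[i, j] holds the (i, j) coordinates of each node on the 2-D map
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    n_steps, step = epochs * len(X), 0
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5       # shrinking neighborhood radius
            dist = np.linalg.norm(W - x, axis=-1)     # distance from x to every node
            bmu = np.unravel_index(np.argmin(dist), dist.shape)  # best-matching unit
            grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood on the grid
            W += lr * h[..., None] * (x - W)          # pull nearby nodes toward x
            step += 1
    return W

def map_to_nodes(W, X):
    """Grid coordinates of the best-matching node for each row of X."""
    flat = W.reshape(-1, W.shape[-1])
    idx = np.argmin(np.linalg.norm(flat[None, :, :] - X[:, None, :], axis=-1), axis=1)
    return [tuple(int(v) for v in np.unravel_index(i, W.shape[:2])) for i in idx]

# Hypothetical data: two mechanism groups of 10 agents each, 60 cell lines.
rng = np.random.default_rng(3)
centers = rng.normal(size=(2, 60))
X = np.repeat(centers, 10, axis=0) + 0.3 * rng.normal(size=(20, 60))
W = train_som(X)
nodes = map_to_nodes(W, X)
print("group 1 nodes:", sorted(set(nodes[:10])))
print("group 2 nodes:", sorted(set(nodes[10:])))
```

Unlike the supervised network sketched for the previous abstract, no class labels enter training; the grouping emerges from the patterns alone.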
Predictive Statistics and Artificial Intelligence in the U.S. National Cancer Institute's Drug Discovery Program for Cancer and AIDS
ABSTRACT. The National Cancer Institute's drug discovery program screens more than 20,000 compounds a year for activity against a panel of 60 tumor cell lines in vitro. The result is an information-rich database of patterns that form the basis for what we term an "information-intensive" approach to the process of drug discovery. The first step was a demonstration, both by statistical methods and by neural networks, that patterns of activity in the screen can be used to predict a compound's mechanism of action. Given this finding, the overall plan has been to develop three large matrices of information: the first (designated A) gives the pattern of activity for each compound tested against each cell line in the screen; the second (S) encodes any of a number of types of 2-D or 3-D structural motifs for each compound; the third (T) indicates each cell's expression of molecular targets (e.g., from 2-dimensional protein gel electrophoresis). Constructing and updating these matrices is an ongoing process. The matrices can be concatenated in various ways to test a variety of specific hypotheses about the compounds screened, as well as to "prioritize" candidate compounds for testing. To aid in these efforts, we have developed the DISCOVER program package, which integrates the matrix data for visual pattern recognition. The "information-intensive" approach summarized here in some respects serves to bridge the perceived gap between screening and structure-based drug design.
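One way the A, S, and T matrices might be combined to prioritize candidates, sketched under assumptions: rows of A are compounds encoded so that larger values mean greater sensitivity (for example, -log10 GI50), S is a hypothetical boolean compounds-by-motifs matrix, and T is a targets-by-cell-lines expression matrix. The sketch correlates every compound with every target and ranks the compounds carrying a chosen motif. The function names and data are hypothetical and are not taken from the DISCOVER package.

```python
import numpy as np

def zscore_rows(M):
    """Standardize each row to mean 0 and (population) SD 1."""
    M = np.asarray(M, dtype=float)
    return (M - M.mean(axis=1, keepdims=True)) / M.std(axis=1, keepdims=True)

def compound_target_r(A, T):
    """Pearson r between each compound's activity pattern (row of A) and each
    target's expression pattern (row of T); columns are the same cell lines."""
    return zscore_rows(A) @ zscore_rows(T).T / A.shape[1]

def prioritize(A, S, T, target, motif, top_k=3):
    """Rank compounds that carry `motif` (per S) by their correlation with `target`."""
    r = compound_target_r(A, T)[:, target]
    candidates = np.flatnonzero(S[:, motif])          # compounds containing the motif
    ranked = candidates[np.argsort(-r[candidates])]   # highest correlation first
    return ranked[:top_k], r[ranked[:top_k]]

# Hypothetical matrices: 8 compounds, 3 structural motifs, 4 targets, 60 cell lines.
rng = np.random.default_rng(7)
T = rng.normal(size=(4, 60))
A = rng.normal(size=(8, 60))
A[2] = T[1] + 0.1 * rng.normal(size=60)   # compound 2 tracks target 1's expression
S = np.zeros((8, 3), dtype=bool)
S[[0, 2, 5], 0] = True                    # compounds 0, 2, and 5 carry motif 0
top, scores = prioritize(A, S, T, target=1, motif=0)
print(top, np.round(scores, 2))
```

Whether a positive or a negative correlation is the interesting one depends on how A is encoded and on whether the target is expected to sensitize or protect the cells.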