EDGAR: Extraction of Drugs, Genes and Relations from the Biomedical Literature
Ross, D.T., Scherf, U., Eisen, M.B., Perou, C.M., Spellman, P., Iyer, V., Jeffrey, S.S.,
Van de Rijn, M., Waltham, M., Pergamenschikov, A., Lee, J.C.F., Lashkari, D., Shalon, D., Myers, T.G., Weinstein,
J.N., Botstein, D., and Brown, P.O.
Nature Genetics, 2000 March, 24(3):227-234
EDGAR (Extraction of Drugs, Genes and Relations) is a natural language processing
system that extracts assertions of relationships between
drugs and genes relevant to cancer from the biomedical literature. These automatically
generated assertions have remarkable potential to facilitate computational analysis in the
molecular biology of cancer, and the technology is straightforwardly generalizable to many
areas of biomedicine. This paper reports on the mechanisms for automatically generating
such assertions and on a simple application: conceptual clustering of documents. The
system uses a stochastic part-of-speech tagger, generates an underspecified syntactic
parse and then uses semantic and pragmatic information to construct its assertions. The
system builds on two important existing resources: the MEDLINE database of biomedical
citations and abstracts, and the Unified Medical Language System (UMLS), which provides
syntactic and semantic information about the terms found in biomedical abstracts.
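The abstract describes three stages: stochastic part-of-speech tagging, an underspecified (shallow) syntactic parse, and semantic interpretation against UMLS-style term knowledge. The toy sketch below only illustrates how those stages can feed one another, under heavy assumptions. A fixed lookup table stands in for the stochastic tagger, noun-phrase chunking stands in for the underspecified parse, a small hand-written dictionary stands in for UMLS semantic types, and a single "drug VERB gene" rule stands in for EDGAR's semantic and pragmatic processing. All names (`TAGS`, `CONCEPTS`, `RELATION_VERBS`, `extract_assertions`) are hypothetical and are not EDGAR's actual components.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for a stochastic tagger: a fixed word -> tag table.
# Unknown words default to "NN" (noun), a common fallback heuristic.
TAGS = {
    "the": "DT", "a": "DT", "of": "IN", "in": "IN", "and": "CC",
    "inhibits": "VBZ", "induces": "VBZ", "activates": "VBZ",
}
PUNCT = {".", ",", ";"}

# Hypothetical stand-in for UMLS semantic types: term -> concept class.
CONCEPTS = {
    "cisplatin": "drug",
    "5-fluorouracil": "drug",
    "p53": "gene",
    "thymidylate synthase": "gene",
}
RELATION_VERBS = {"inhibits", "induces", "activates"}  # hypothetical


@dataclass(frozen=True)
class Assertion:
    drug: str
    relation: str
    gene: str


def tag(sentence):
    """Tokenize and assign a part-of-speech tag to each token."""
    tokens = re.findall(r"[A-Za-z0-9-]+|[.,;]", sentence.lower())
    return [(t, "PUNCT" if t in PUNCT else TAGS.get(t, "NN")) for t in tokens]


def chunk(tagged):
    """Underspecified parse: merge runs of nouns/adjectives into NP chunks.
    No attachment decisions (e.g. for prepositional phrases) are made."""
    chunks = []
    for word, t in tagged:
        nominal = t.startswith("NN") or t == "JJ"
        if nominal and chunks and chunks[-1][0] == "NP":
            chunks[-1] = ("NP", chunks[-1][1] + " " + word)
        else:
            chunks.append(("NP" if nominal else t, word))
    return chunks


def concept_of(np_text):
    """Longest known term inside an NP, so that
    'thymidylate synthase expression' maps to the gene concept."""
    words = np_text.split()
    for size in range(len(words), 0, -1):
        for start in range(len(words) - size + 1):
            term = " ".join(words[start:start + size])
            if term in CONCEPTS:
                return term, CONCEPTS[term]
    return None


def extract_assertions(sentence):
    """Emit drug-relation-gene assertions for 'NP(drug) VERB ... NP(gene)'."""
    chunks = chunk(tag(sentence))
    found = []
    for i, (label, text) in enumerate(chunks):
        if text not in RELATION_VERBS or i == 0:
            continue
        subj = concept_of(chunks[i - 1][1]) if chunks[i - 1][0] == "NP" else None
        obj = None
        for next_label, next_text in chunks[i + 1:]:
            if next_label == "PUNCT":
                break  # stay within the clause
            if next_label == "NP":
                obj = concept_of(next_text)
                break
        if subj and obj and subj[1] == "drug" and obj[1] == "gene":
            found.append(Assertion(subj[0], text, obj[0]))
    return found


if __name__ == "__main__":
    print(extract_assertions(
        "Cisplatin inhibits thymidylate synthase expression in tumor cells."))
    # [Assertion(drug='cisplatin', relation='inhibits', gene='thymidylate synthase')]
```

Because the parse is deliberately underspecified, the sketch never decides what "in tumor cells" attaches to. It only needs the nearest noun phrase after the verb, which is the kind of partial structure the abstract says semantic information is then layered on.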
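The abstract names conceptual clustering of documents as an application but does not specify the clustering method. As an illustration only, the sketch below groups documents whose sets of extracted concepts overlap. It uses Jaccard similarity and single-link merging via a union-find structure. The threshold of 0.5, the function names `jaccard` and `cluster_documents`, and the example concept sets are all hypothetical, and this is not claimed to be the method EDGAR uses.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two concept sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(doc_concepts, threshold=0.5):
    """Hypothetical single-link clustering: documents end up in the same
    cluster if a chain of pairwise Jaccard similarities >= threshold
    connects them."""
    parent = {doc: doc for doc in doc_concepts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(doc_concepts, 2):
        if jaccard(doc_concepts[a], doc_concepts[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for doc in doc_concepts:
        clusters.setdefault(find(doc), set()).add(doc)
    return list(clusters.values())


if __name__ == "__main__":
    docs = {  # hypothetical concept sets per abstract
        "doc1": {"cisplatin", "p53"},
        "doc2": {"cisplatin", "p53", "apoptosis"},
        "doc3": {"5-fluorouracil", "thymidylate synthase"},
    }
    print(sorted(sorted(c) for c in cluster_documents(docs)))
    # [['doc1', 'doc2'], ['doc3']]
```

Clustering on shared concepts rather than raw words means two abstracts that use different phrasing for the same drug or gene can still fall together, provided the extraction step normalizes them to the same term.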