GDep (GENIA Dependency parser)


A dependency parser for biomedical text developed by Kenji Sagae at Tsujii Lab (University of Tokyo) and the Institute for Creative Technologies (University of Southern California).


This is a version of the KSDep dependency parser trained on the GENIA Treebank for parsing biomedical text. KSDep is described in

Sagae, K., Tsujii, J. 2007. Dependency parsing and domain adaptation with LR models and parser ensembles. Proceedings of the CoNLL 2007 Shared Task. Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL'07). Prague, Czech Republic.

and was used in the experiments in

Miyao, Y., Saetre, R., Sagae, K., Matsuzaki, T. and Tsujii, J. 2008. Task-oriented Evaluation of Syntactic Parsers and Their Representations. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL'08:HLT).

Download

GDep beta2
Sep 10, 2010: The current version now compiles using gcc4.
Source code and models for tagging and parsing biomedical text.

Executables for 32-bit Windows and 64-bit Linux are included. An executable for Mac OS X is planned.

Building GDep

To build GDep on Linux, Mac OS X, or Windows (with Cygwin), you need gcc. Unpack the archive with
tar xzvf gdep-beta1a.tgz

Then type
cd gdep-beta1a
make

This will produce an executable named gdep.

Using GDep

To parse biomedical text, simply type
gdep INPUTFILE
where INPUTFILE is a text file containing one sentence per line.
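
Since the parser expects one sentence per line, it can help to tidy a text file before parsing it. The following is a minimal sketch, not part of GDep: the function name prepare_input and the file names raw.txt and sentences.txt are made up for illustration. This document does not say how GDep treats blank lines or stray whitespace, so removing them here is only a precaution. The sketch does not split sentences; it assumes each sentence is already on its own line.

```shell
#!/bin/sh
# Hypothetical helper (not part of GDep): clean up a text file that already
# has one sentence per line.
prepare_input() {
    # Remove carriage returns, trim leading and trailing whitespace,
    # and drop lines that end up empty.
    tr -d '\r' < "$1" |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Build a small example file containing a blank line and stray spaces.
printf '%s\n' \
    'The p53 protein regulates the cell cycle.  ' \
    '' \
    '  IL-2 gene expression requires NF-kappa B.' > raw.txt

# Write the cleaned sentences to a file that can be passed to gdep.
prepare_input raw.txt > sentences.txt
```

The resulting sentences.txt can then be given to the parser as INPUTFILE.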

Output is written to stdout. To save the output to a file, type
gdep INPUTFILE > OUTPUTFILE

where OUTPUTFILE is the file where the parser output will be written.

If you don't specify an input file, GDep accepts input from stdin.
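
Reading from stdin makes it possible to use GDep in a pipeline. The sketch below wraps that in a small function. The variable GDEP, the function parse_stdin, and the file name output.txt are assumptions introduced for illustration; only the behavior of running gdep without an input file comes from the description above. If the executable is not on your PATH, set GDEP to its location (for example GDEP=./gdep).

```shell
#!/bin/sh
# Assumption: GDEP names the gdep executable built earlier; override it
# when the binary is not on your PATH.
GDEP="${GDEP:-gdep}"

# Hypothetical wrapper: parse sentences read from stdin and write the
# parser output to stdout.
parse_stdin() {
    if command -v "$GDEP" >/dev/null 2>&1; then
        "$GDEP"    # no INPUTFILE argument, so input is taken from stdin
    else
        echo "gdep not found: set GDEP to the path of the executable" >&2
        return 127
    fi
}

# Pipe one sentence in and save the parser output to a file.
printf '%s\n' 'IL-2 gene expression requires NF-kappa B.' |
    parse_stdin > output.txt ||
    echo "parsing skipped" >&2
```

The same effect is available without the wrapper by redirecting a file into the parser, as in gdep < INPUTFILE > OUTPUTFILE.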

Anyone is free to download and use the parser and the models included for research purposes. However, because this is a beta release, I strongly recommend you contact me (sagae+lrdep at cs dot cmu dot edu) if you want to do anything beyond simple testing.

If you publish work in which GDep is used, please cite the 2007 paper above.