Kenji Sagae

Research Assistant Professor
Computer Science Department

Research Scientist
Institute for Creative Technologies

University of Southern California

Phone: +1 310 448 0319

You may send email to sagae+web@cs.cmu.edu
   


This semester (Spring 2012) I'm teaching CSCI 561 Foundations of Artificial Intelligence with Liang Huang.

NEW: I'm looking for an NLP intern this summer (2012). The project involves spoken language understanding, and I'm looking for either a grad student or an advanced undergrad with experience in NLP, speech, or machine learning.

Go to my publications.

Get my dependency parser for child language transcripts (CHILDES parser).

The child language transcripts annotated with grammatical relations are available directly from the CHILDES database (unzip the archive and you will find the annotated files in the Eve directory).

Get my dependency parser (with CoNLL-X input/output format)


June 2008: I am now at the USC Institute for Creative Technologies.

November 2005: I successfully defended my thesis, A multi-strategy approach to parsing of grammatical relations in child language transcripts.

My thesis research focused on syntactic analysis of CHILDES data, but the main parsing issues are applicable to the general problem of parsing natural language. See my thesis summary and defense slides.

Thesis Advisors (while at CMU)

Additional thesis committee members


Research

My primary research interest is natural language processing, and much of my recent work has been on data-driven and linguistically motivated models for syntactic parsing. Topics in my current work include interfacing shallow and deep syntactic analysis, parser ensembles, discriminative disambiguation models, parsing efficiency, and descriptive adequacy of syntactic formalisms. I have applied this research in topics ranging from child language development to bioinformatics. See my list of publications.

My research at CMU involved the identification of grammatical relations, or GRs (such as subjects, objects and adjuncts), in corpora of transcribed dialogs between children and parents. Most of these transcripts came from the CHILDES Database, but I also worked with transcripts from other sources.

A summary of my GR parsing approach for CHILDES appeared in

Sagae, K., Davis, E., Lavie, A., MacWhinney, B. and Wintner, S. 2007. High-accuracy annotation and parsing of CHILDES transcripts. Proceedings of the ACL-2007 Workshop on Cognitive Aspects of Computational Language Acquisition. Prague, Czech Republic.

For an example of how syntactic analysis of child language can be used, look at

Sagae, K., Lavie, A., and MacWhinney, B. 2005. Automatic measurement of syntactic development in child language. Proceedings of the 43rd Meeting of the Association for Computational Linguistics. Ann Arbor, Michigan.

I have also worked on applying discriminative dependency parsing approaches (such as the one I developed for my thesis work) to syntactic analysis based on more linguistically sophisticated models (such as HPSG). For an introduction to this research, see

Sagae, K., Miyao, Y. and Tsujii, J. 2007. HPSG parsing with shallow dependency constraints. Proceedings of the 45th Meeting of the Association for Computational Linguistics (ACL'07). Prague, Czech Republic.

A different (but related) aspect of my dissertation is the combination of several parsers to improve parsing accuracy. My graph-based ensemble approach for dependency parsing was shown to be very effective in the 2007 CoNLL shared task on multilingual dependency parsing. My parser combination work was first published as

Sagae, K. and Lavie, A. 2006. Parser combination by reparsing. Proceedings of the 2006 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics - short papers (HLT-NAACL'06). New York, NY.

Other topics I have worked on include parser evaluation, conversion among syntactic representation formalisms, machine translation evaluation, and identification of protein-protein interactions from text. See my list of publications.