Project details for TurboParser

TurboParser 0.1

by afm - December 3, 2009, 01:50:55 CET

Description:

Dependency parsing is a lightweight syntactic formalism that relies on lexical relationships between words. Nonprojective dependency grammars may generate languages that are not context-free, offering a formalism that is arguably more adequate for some natural languages. Statistical parsers, learned from treebanks, have achieved the best performance in this task. While only local (arc-factored) models allow for exact inference, it has been shown that including non-local features and performing approximate inference can greatly increase performance. To learn the model, we implement a structured SVM with LP-relaxed inference.
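As an illustration of what "arc-factored" means, the following is a minimal sketch (not code from this package) in which the score of a dependency tree is the sum of independent scores of its arcs. The function name, the head-array representation, and the dense score matrix are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not part of TurboParser.
// heads[m] is the index of the head of word m. Index 0 is reserved for an
// artificial root, so heads[0] is ignored.
// arc_scores[h][m] is an assumed model score for the arc h -> m.
double ArcFactoredScore(const std::vector<int>& heads,
                        const std::vector<std::vector<double>>& arc_scores) {
  assert(heads.size() == arc_scores.size());
  double total = 0.0;
  for (std::size_t m = 1; m < heads.size(); ++m) {
    // Each arc contributes independently of all other arcs in the tree.
    total += arc_scores[heads[m]][m];
  }
  return total;
}
```

Non-local features (for example, scores on sibling or grandparent pairs) would add terms that depend on more than one arc at a time, which is why such models generally call for approximate inference.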

This package contains a C++ implementation of an unlabeled dependency parser.

This package allows:

* learning the parser from a treebank,
* running the parser on new data,
* evaluating the results against a gold-standard.
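For an unlabeled parser, evaluation against a gold standard is commonly reported as the unlabeled attachment score: the fraction of words whose predicted head matches the gold head. The sketch below shows that computation under assumed conventions (one head index per word, no special treatment of punctuation); the function name is hypothetical, and the evaluation shipped with this package may differ in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not part of TurboParser.
// Both vectors hold one head index per word, in the same word order.
// Returns the fraction of words whose predicted head equals the gold head.
double UnlabeledAttachmentScore(const std::vector<int>& predicted_heads,
                                const std::vector<int>& gold_heads) {
  assert(predicted_heads.size() == gold_heads.size());
  if (gold_heads.empty()) return 0.0;  // Avoid division by zero.
  std::size_t correct = 0;
  for (std::size_t i = 0; i < gold_heads.size(); ++i) {
    if (predicted_heads[i] == gold_heads[i]) ++correct;
  }
  return static_cast<double>(correct) /
         static_cast<double>(gold_heads.size());
}
```

With predicted heads {2, 0, 2, 3} and gold heads {2, 0, 2, 2}, three of the four words are attached correctly, giving a score of 0.75.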

To run this software, you need to have ILOG CPLEX installed on your system. CPLEX is a commercial LP/MILP solver. For more information regarding ILOG CPLEX, see http://www.ilog.com/products/cplex. You also need to have the Boost C++ libraries installed on your system.

Changes to previous version:

Initial Announcement on mloss.org.

Supported Operating Systems: Linux, Agnostic
Data Formats: ASCII
Tags: Dependency Parser, Large Margin Structured Classifier

Other available revisions

Version 2.0

This version introduces a number of new features:

  • The parser no longer depends on CPLEX (or any other non-free LP solver). Instead, the decoder is now based on AD3, our free library for approximate MAP inference.

  • The parser now outputs dependency labels along with the backbone structure.

  • As a bonus, we now provide a trainable part-of-speech tagger, called TurboTagger, which can be used in standalone mode, or to provide part-of-speech tags as input for the parser. TurboTagger has state-of-the-art accuracy for English (97.3% on section 23 of the Penn Treebank) and is fast (~40,000 tokens per second).

  • The parser is much faster than in previous versions. You may choose among a basic arc-factored parser (~4,300 tokens per second), a standard second-order model with consecutive sibling and grandparent features (the default; ~1,200 tokens per second), and a full model with head bigram and arbitrary sibling features (~900 tokens per second).
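To put the throughput figures above in perspective, the following sketch converts tokens per second into an estimated running time for a corpus of a given size. The rates are the approximate ones quoted in the list above; the struct, constant, and function names are hypothetical, and the corpus size in the note below is an arbitrary example.

```cpp
#include <cassert>

// Hypothetical illustration, not part of TurboParser.
// Approximate speeds as quoted in the changelog, in tokens per second.
struct ModelSpeed {
  const char* name;
  double tokens_per_second;
};

const ModelSpeed kReportedSpeeds[] = {
    {"TurboTagger", 40000.0},
    {"arc-factored parser", 4300.0},
    {"standard second-order parser", 1200.0},
    {"full parser", 900.0},
};

// Estimated wall-clock seconds to process `tokens` tokens at a given rate.
double EstimatedSeconds(double tokens, double tokens_per_second) {
  assert(tokens_per_second > 0.0);
  return tokens / tokens_per_second;
}
```

For an assumed corpus of 1,000,000 tokens, this gives roughly 25 seconds for tagging, about 233 seconds for the arc-factored parser, about 833 seconds for the standard model, and about 1,111 seconds for the full model. Actual times depend on hardware and sentence lengths.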

Note: The runtimes above are approximate and based on experiments on a desktop machine with an Intel Core i7 CPU at 3.4 GHz and 8 GB of RAM.

To run this software, you need a standard C++ compiler. This software has the following external dependencies: AD3, a library for approximate MAP inference; Eigen, a template library for linear algebra; google-glog, a library for logging; and gflags, a library for command-line flag processing. All these libraries are free software and are provided as tarballs in this package.

This software has been tested on Linux, but it should run on other platforms with minor adaptations.

Date: October 11, 2012, 02:59:04

Version 0.1

Initial Announcement on mloss.org.

Date: December 3, 2009, 01:50:55
