About:
TurboParser is a free multilingual dependency parser based on linear programming, developed by André Martins.
It is based on joint work with Noah Smith, Mário Figueiredo, Eric Xing, and Pedro Aguiar.
Changes:
This version introduces a number of new features:
The parser no longer depends on CPLEX (or any other non-free LP solver). Instead, the decoder is now based on AD3, our free library for approximate MAP inference.
The parser now outputs dependency labels along with the backbone structure.
As a bonus, we now provide a trainable part-of-speech tagger, called TurboTagger, which can be used in standalone mode, or to provide part-of-speech tags as input for the parser. TurboTagger has state-of-the-art accuracy for English (97.3% on section 23 of the Penn Treebank) and is fast (~40,000 tokens per second).
The parser is much faster than in previous versions. You may choose among a basic arc-factored parser (~4,300 tokens per second), a standard second-order model with consecutive sibling and grandparent features (the default; ~1,200 tokens per second), and a full model with head bigram and arbitrary sibling features (~900 tokens per second).
Note: The runtimes above are approximate and are based on experiments on a desktop machine with an Intel Core i7 CPU at 3.4 GHz and 8 GB of RAM.
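To give a feel for what the throughput figures above mean in practice, the following is a rough back-of-the-envelope sketch in Python. It is not part of TurboParser: the function names and the corpus size are hypothetical, and the only numbers taken from this document are the approximate tokens-per-second rates quoted above, which will differ on other hardware.

```python
# Approximate parser throughputs (tokens per second) quoted in this README.
# These are rough figures for one desktop machine, not guarantees.
THROUGHPUT = {
    "basic (arc-factored)": 4300,
    "standard (second-order, default)": 1200,
    "full (head bigram + arbitrary siblings)": 900,
}


def estimated_seconds(num_tokens, tokens_per_second):
    """Estimate wall-clock parsing time, assuming constant throughput."""
    return num_tokens / tokens_per_second


def estimate_all(num_tokens):
    """Return estimated parsing time in seconds for each model type."""
    return {
        name: estimated_seconds(num_tokens, tps)
        for name, tps in THROUGHPUT.items()
    }


if __name__ == "__main__":
    corpus_tokens = 1_000_000  # hypothetical corpus size
    for name, secs in estimate_all(corpus_tokens).items():
        print(f"{name}: ~{secs / 60:.1f} min")
```

Under these assumptions, a one-million-token corpus would take roughly 4 minutes with the basic model, 14 minutes with the standard model, and 18.5 minutes with the full model.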
To run this software, you need a standard C++ compiler. This software has the following external dependencies: AD3, a library for approximate MAP inference; Eigen, a template library for linear algebra; google-glog, a library for logging; and gflags, a library for command-line flag processing. All of these libraries are free software and are provided as tarballs in this package.
This software has been tested on Linux, but it should run on other platforms with minor adaptations.