annotation.org

Tools for Natural Language Processing

Annotation.org Blog

Average Perceptron Algorithm Better Than Average.


I recently finished up an initial implementation of an average perceptron trainer and a simple version of applying it to sequences as described in Collins 2002.  There have been a couple of pleasant surprises which have some interesting implications for OpenNLP and the development of models in general.

While this work has been around for a while, I wasn't particularly interested in what seemed like small improvements in sequence tagging.  I've tried alternatives to Maxent-GIS in the past, such as Gaussian smoothing, MALLET's parameter estimation, conjugate gradient methods, etc., and have typically found that the improvements don't scale to larger data sets, either in terms of run-time performance or accuracy.  This was also my reaction to the feature set used in Toutanova and Manning, 2000 and 2003 (with Klein and Singer): an extreme trade-off of run-time performance for accuracy.  Matthew Wilkens reports results indicating that the tagger built on it is comparatively slow.  This occurs a lot in academic work, where the currency for success is publication rather than working systems.

Fortunately, a number of people started using average perceptron for ranking and mentioned that its implementation was pretty trivial.  A parse re-ranker is something that OpenNLP could use.  Its parser works OK for one without re-ranking, but most modern parsers use re-ranking to improve parsing accuracy.  This made looking into average perceptron worthwhile.  I figured I'd start with POS tagging as I already had something to compare it to.

When I took the time to decode the math in the paper, I discovered that the implementation was in fact pretty easy and that even without the sequence modifications, accuracy is decent.  This would be uninteresting except that it is also balls fast to train.  For instance, it takes the better part of a day to train the maxent version of the POS tagger on 1.5 million words.  It takes under 6 minutes to train an average-perceptron version.  The difference in accuracy on section 00 of the WSJ for your day of CPU cycles is about 0.2% when a tag dictionary is used.

                      No tag dictionary   Tag dictionary
  maxent              96.50%              96.58%
  average-perceptron  95.22%              96.38%
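
To give a feel for how little code the non-sequence version needs, here is a minimal sketch of a multiclass average perceptron over sparse features. It is not the OpenNLP implementation; the class name, the feature strings, and the `train` helper are all made up for illustration, and the lazy averaging trick (time-stamping each weight instead of summing every weight at every step) is just one common way to do the averaging.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Multiclass perceptron over sparse string features, with weight averaging."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = defaultdict(float)  # (feature, label) -> current weight
        self._totals = defaultdict(float)  # (feature, label) -> weight summed over steps
        self._stamps = defaultdict(int)    # (feature, label) -> step of last change
        self._step = 0                     # number of examples seen so far

    def score(self, features, label):
        # Raw score: just a sum of weights, no normalization.
        return sum(self.weights.get((f, label), 0.0) for f in features)

    def predict(self, features):
        # Ties go to the label that comes first in self.labels.
        return max(self.labels, key=lambda label: self.score(features, label))

    def _bump(self, key, delta):
        # Credit the old value for every step it was in effect, then change it.
        self._totals[key] += (self._step - self._stamps[key]) * self.weights[key]
        self._stamps[key] = self._step
        self.weights[key] += delta

    def update(self, features, gold):
        guess = self.predict(features)
        if guess != gold:
            for f in features:
                self._bump((f, gold), 1.0)    # reward the correct label
                self._bump((f, guess), -1.0)  # penalize the wrong guess
        self._step += 1

    def average(self):
        # Replace each weight by its mean over all steps. Call once, after training.
        if self._step == 0:
            return
        for key, w in list(self.weights.items()):
            total = self._totals[key] + (self._step - self._stamps[key]) * w
            self.weights[key] = total / self._step


def train(examples, labels, epochs=5):
    """Hypothetical helper: examples is a list of (features, gold_label) pairs."""
    model = AveragedPerceptron(labels)
    for _ in range(epochs):
        for features, gold in examples:
            model.update(features, gold)
    model.average()
    return model
```

Each training step is one prediction plus, on a mistake, a handful of additions, which fits with the short training times above; a real tagger would of course use much richer context features than this toy setup suggests.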


It's also much faster to run, as there is no converting to and from the log domain as there is in determining best sequences in maxent models.

                      No tag dictionary   Tag dictionary
  maxent              8705 w/s            9750 w/s
  average-perceptron  10600 w/s           13500 w/s
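
A rough sketch of where the extra work comes from, under my reading of the log-domain remark: a maxent tagger exponentiates and normalizes the per-tag scores to get probabilities and then takes logs again to add them up along a sequence, while a perceptron tagger can add the raw scores directly. Both functions and the score dictionaries below are hypothetical and only illustrate the arithmetic, not the OpenNLP code.

```python
import math


def maxent_sequence_logprob(token_scores, tags):
    """Sum of log P(tag | context) over a sequence.

    token_scores[i] maps each candidate tag to its summed feature weights
    at position i. Each token needs exp, a normalizer, and a log.
    """
    total = 0.0
    for scores, tag in zip(token_scores, tags):
        normalizer = sum(math.exp(s) for s in scores.values())
        total += math.log(math.exp(scores[tag]) / normalizer)
    return total


def perceptron_sequence_score(token_scores, tags):
    """Sum of raw scores over a sequence; no exp, normalizer, or log."""
    return sum(scores[tag] for scores, tag in zip(token_scores, tags))
```

The two functions rank candidate tags the same way at a single position, but only the first one pays for the `exp` and `log` calls on every token.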

This is nice because this should also apply to models built using the sequence-based updating scheme.  Unfortunately, these models currently take a solid day to train as they re-tag every sentence for every iteration, but there is hope for optimization.
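
For reference, the sequence-based updating scheme can be sketched roughly as follows: decode each sentence with the current weights, and if the result differs from the gold tags, add the gold sequence's features and subtract the guessed sequence's features. This is a simplified illustration rather than my trainer: the feature set (word identity and previous tag), the `START` symbol, and the function names are assumptions, and the parameter averaging that Collins 2002 also uses is left out to keep it short.

```python
from collections import defaultdict

START = "<s>"  # hypothetical symbol for the position before the first word


def local_features(words, i, prev_tag, tag):
    # Assumed minimal feature set: word identity and previous tag.
    return [("word", words[i], tag), ("prev", prev_tag, tag)]


def viterbi(words, tags, weights):
    """Best-scoring tag sequence under the current weights."""
    if not words:
        return []
    best = []  # best[i][tag] = (score of best path ending in tag at i, previous tag)
    for i in range(len(words)):
        column = {}
        prev_tags = [START] if i == 0 else tags
        for tag in tags:
            candidates = []
            for prev in prev_tags:
                prev_score = 0.0 if i == 0 else best[i - 1][prev][0]
                local = sum(weights.get(f, 0.0)
                            for f in local_features(words, i, prev, tag))
                candidates.append((prev_score + local, prev))
            column[tag] = max(candidates, key=lambda c: c[0])
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def sequence_features(words, tag_seq):
    """Feature counts for a whole tagged sentence."""
    counts = defaultdict(int)
    prev = START
    for i, tag in enumerate(tag_seq):
        for f in local_features(words, i, prev, tag):
            counts[f] += 1
        prev = tag
    return counts


def train(sentences, tags, epochs=5):
    """sentences is a list of (words, gold_tags) pairs, both lists."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in sentences:
            guess = viterbi(words, tags, weights)  # re-tag with current weights
            if guess != gold:
                for f, c in sequence_features(words, gold).items():
                    weights[f] += c
                for f, c in sequence_features(words, guess).items():
                    weights[f] -= c
    return weights
```

The `viterbi` call inside the training loop is the per-iteration re-tagging mentioned above, and it is the obvious place to look for optimizations.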

In the future I'll probably use a regular average perceptron model when I'm trying to figure out how to model stuff, and then try several learners once I'm pretty happy with my features.  This should make development easier, but also has interesting implications for feature selection.

Last Updated on Tuesday, 11 August 2009 19:59
 

I could have been a contender


Reading Bob Carpenter's blog today, I was introduced to a series of posts about part-of-speech evaluation by Matthew Wilkens on his blog, Work Product.  He looked at LingPipe, Stanford's tagger, MorphAdorner, and TreeTagger, and I couldn't help but be a little disappointed that he didn't include OpenNLP Tools in his POS tagger evaluation.  This is especially the case since in many ways this guy is the target audience for OpenNLP: someone who's willing to do a little programming, but wants the tools to work on most domains out-of-the-box with minimal amounts of futzing, and who wants a true open-source solution.

From the looks of his conclusions and evaluations we would have fared pretty well.  The criteria he is interested in are: Accuracy, Tagset, Training Data, Speed, License/Source Code/Cost, Thread Safety and Input/Output.  I'm not entirely clear what his evaluation procedures were for accuracy, so it's hard to say for sure, but our accuracy numbers on Wall Street Journal (96.8%) and Brown (98.3%) seem to put us in the ballpark of the 97.0% number his evaluation showed.  The OpenNLP tagger is trained on Wall Street Journal, Brown Corpus, and about 8k words of narrative text, so it should be a reasonable match for Wilkens' target domain of literary text.  Our tagset is the Penn Treebank tagset.  Matthew says he prefers the Brown or MorphAdorner tagset:

Out of the box, then, I think MorphAdorner and NUPOS win for literary work, with LingPipe/Brown a reasonably close second. Stanford and TreeTagger use the significantly smaller Penn tagset, which seems less suitable for my needs.

While I can't speak for the MorphAdorner tagset, the Brown set has always struck me as just a more lexicalized version of the Penn Treebank tagset.  I suspect that a classifier could map word/brown_tag to ptb_tag quite easily, but I'll have to try that for myself.  Perhaps I can get a new source of nearly human-tagged Brown data out of that work.  Speed-wise I think we do quite well.  On my home machine, which is slower than the one used in the evaluation, a quick test shows the tagger going through 8500 words a second.  This would place us second in his evaluation behind TreeTagger.  License, code, and TCO-wise I think we win hands down.  We're LGPL, have regular releases, are not tied to a research grant or a particular grad student, and have been around for a while.  As of the 1.4 release the underlying model code is thread-safe, so you can re-use the same model, which accounts for most of the memory usage for any of our tools, in multiple threads.  Finally, on the input/output front, we support setting the encoding of the data and have a pretty simple format.

I think the bigger question this poses for OpenNLP is how to advertise better.  OpenNLP has a decent-sized user base.  In the last 12 months, we had just under 12,000 downloads, but this post lets us know we're still missing some core members of our audience.  Part of what I think is missing is documentation and other writing about OpenNLP.  While there are a number of things in the works, including a white paper for the research types with evaluation data and a book targeted at developers, I'll try to make an immediate impact with this blog post.

Last Updated on Monday, 10 August 2009 22:57