Projects that are tagged with information extraction.


Aika 0.17

by molzberger - May 14, 2018, 15:42:00 CET [ Project Homepage BibTeX Download ] 30592 views, 7975 downloads, 0 subscriptions

About: Aika is an open-source text mining engine. It automatically extracts and annotates semantic information in text. If this information is ambiguous, Aika generates several hypothetical interpretations of the text's meaning and retrieves the most likely one.

Changes:

Aika Version 0.17 2018-05-14

  • Introduction of synapse relations. Previously, the relation between synapses was modeled implicitly through word positions (RIDs). Now relations can be modeled explicitly, for example: the end position of input activation 1 equals the begin position of input activation 2. Two types of relations are currently supported: range relations and instance relations. Range relations compare the input activation range of a given synapse with that of the linked synapse. Instance relations also compare the input activations of two synapses, but compare the dependency relations of these activations instead of their ranges.
  • Removed the norm term from the interpretation objective function.
  • Introduction of an optional distance function for synapses. It allows modeling a signal that weakens with the distance between the activation ranges.
  • Example implementation of a context free grammar.
  • Example implementation for co-reference resolution.
  • Work on a syllable identification experiment based on the meta network implementation.
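
The range relation and the distance function described in the list above can be pictured with a small sketch. This is plain Python, not Aika's Java API: the `Activation` class, the function names, and the exponential decay with its `scale` parameter are all assumptions made for illustration, since the release notes do not say which distance function Aika uses.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Activation:
    """Hypothetical activation covering the character range [begin, end)."""
    label: str
    begin: int
    end: int


def ends_where_other_begins(first: Activation, second: Activation) -> bool:
    """Range relation: the end position of `first` equals the begin position of `second`."""
    return first.end == second.begin


def distance_weight(first: Activation, second: Activation, scale: float = 5.0) -> float:
    """Assumed distance function: the signal decays exponentially with the gap
    between the two ranges; touching or overlapping ranges get weight 1.0."""
    gap = max(0, second.begin - first.end, first.begin - second.end)
    return math.exp(-gap / scale)


# "blackbird": "black" covers [0, 5), "bird" covers [5, 9).
black = Activation("black", 0, 5)
bird = Activation("bird", 5, 9)
far = Activation("far", 15, 20)

adjacent = ends_where_other_begins(black, bird)   # the relation holds
near_weight = distance_weight(black, bird)        # no gap between the ranges
far_weight = distance_weight(black, far)          # gap of 10 characters
```

An explicit predicate like this replaces the implicit word-position bookkeeping: the relation is checked directly on the two ranges, and the weight shrinks as the ranges move apart.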

Aika Version 0.15 2018-03-16

  • Simplified interpretation handling by removing the InterpretationNode class and moving the remaining logic to the Activation class.
  • Moved the activation linking and activation selection code to separate classes.
  • Ongoing work on the training algorithms.

Pattern 2.4

by tomdesmedt - August 31, 2012, 02:26:01 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 23067 views, 7157 downloads, 0 subscriptions

About: "Pattern" is a web mining module for Python. It bundles tools for data retrieval, text analysis, clustering and classification, and data visualization.

Changes:
  • Small overall bug fixes and performance improvements.
  • Module pattern.web: updated to the new Bing API (the Bing API is a paid service now).
  • Module pattern.en: now includes Norvig's spell checking algorithm.
  • Module pattern.de: new German tagger/chunker, courtesy of Schneider & Volk (1998) who kindly agreed to release their work in Pattern under BSD.
  • Module pattern.search: the search syntax now includes { } syntax to define match groups.
  • Module pattern.vector: fast implementation of information gain for feature selection.
  • Module pattern.graph: now includes a toy semantic network of commonsense (see examples).
  • Module canvas.js: image pixel effects, and the editor now supports live editing.
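
For readers unfamiliar with information gain as a feature selection criterion, mentioned in the list above, here is a minimal sketch of the textbook definition: the class entropy minus the weighted entropy after splitting on whether a word is present. It is plain Python and does not use or reproduce the pattern.vector API; the function names and the toy documents are assumptions for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(documents, labels, feature):
    """Entropy reduction from splitting the documents on presence of `feature`.

    `documents` is a list of word sets; `labels` holds the matching classes.
    """
    with_feature = [y for doc, y in zip(documents, labels) if feature in doc]
    without_feature = [y for doc, y in zip(documents, labels) if feature not in doc]
    # Weighted entropy of the two partitions; empty partitions contribute nothing.
    remainder = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_feature, without_feature)
        if part
    )
    return entropy(labels) - remainder


docs = [
    {"cheap", "pills"},
    {"cheap", "offer"},
    {"meeting", "notes"},
    {"meeting", "offer"},
]
labels = ["spam", "spam", "ham", "ham"]

gain_cheap = information_gain(docs, labels, "cheap")   # splits the classes perfectly
gain_offer = information_gain(docs, labels, "offer")   # tells nothing about the class
```

Features with the highest gain are the ones worth keeping: here "cheap" separates the two classes completely, while "offer" appears equally often in both.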

MALLET 2.0-rc4

by jacktanner - August 24, 2009, 23:10:14 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 28851 views, 5205 downloads, 0 subscriptions

About: MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to [...]

Changes:

MALLET 2.0 RC4 Release Notes July 16, 2009

Major updates:

An implementation of generalized expectation criteria training of MaxEnt classifiers and methods for obtaining constraints (cf. Gregory Druck, Gideon Mann, Andrew McCallum, "Learning from Labeled Features using Generalized Expectation Criteria").

PagedInstanceList has been substantially rewritten by Mike Bond.

Bug fixes to topic model hyperparameter optimization and topic inference.


Aleph 0.6

by jiria - January 12, 2009, 20:52:12 CET [ Project Homepage BibTeX Download ] 18015 views, 4420 downloads, 0 subscriptions

About: Aleph is both a multi-platform machine learning framework aimed at simplicity and performance, and a library of selected state-of-the-art algorithms.

Changes:

Initial Announcement on mloss.org.


Ngram Statistics Package 1.09

by tpederse - August 12, 2008, 18:21:52 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 14027 views, 3008 downloads, 0 comments, 0 subscriptions

About: The Ngram Statistics Package is a suite of Perl modules that identifies significant multi-word units (collocations) in written text using many different tests of association. NSP allows a user to [...]
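
As a rough illustration of what a test of association for word pairs computes, the sketch below scores adjacent word pairs with pointwise mutual information (PMI), one common measure. It is plain Python, not NSP's Perl interface, and the function name and sample sentence are assumptions for illustration.

```python
from collections import Counter
from math import log2


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI in bits} for every adjacent word pair in `tokens`."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        # PMI compares the observed pair probability with what independence predicts.
        scores[(w1, w2)] = log2(p_pair / (p_w1 * p_w2))
    return scores


tokens = "new york is big and new york is busy".split()
scores = bigram_pmi(tokens)
new_york_score = scores[("new", "york")]
```

A positive score means the pair occurs more often than independent words would. PMI tends to overrate pairs of rare words, which is one reason to compare several association tests instead of relying on a single one.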

Changes:

Initial Announcement on mloss.org.


MinorThird 20080414

by frank - June 9, 2008, 09:08:30 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 14607 views, 3859 downloads, 0 subscriptions

About: MinorThird is a collection of Java classes for storing text, annotating text, and learning to extract entities and categorize text. It was written primarily by William W. Cohen, a professor at [...]

Changes:

Initial Announcement on mloss.org.