Projects that are tagged with data cleaning.


DCABags 0.7

by wbuntine - June 5, 2014, 05:34:44 CET [ Project Homepage BibTeX Download ] 18221 views, 4358 downloads, 0 subscriptions

About: Document/Text preprocessing for topic models: suite of Perl scripts for preprocessing text collections to create dictionaries and bag/list files for use by topic modelling software.

Changes:

Moved distribution and code to GitHub. Changed the "ldac" format to use 0-offset word indices. Added "document frequency" (df) filtering when selecting tokens for linkTables. Experimenting with linkParse, but it is still generally unusable.
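
To illustrate the two format-related changes noted above, here is a minimal Python sketch of document-frequency filtering followed by output in an LDA-C style layout with 0-offset word indices. It is not taken from DCABags, which is written in Perl; the function names `build_vocab` and `to_ldac_line`, the `min_df` parameter, and the sample documents are all hypothetical, and the line layout assumed is the usual "M index:count ..." form where M is the number of distinct terms in the document.

```python
from collections import Counter


def build_vocab(docs, min_df=2):
    """Map tokens to 0-based indices, keeping only tokens that occur
    in at least min_df documents (document-frequency filtering)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each token once per document
    kept = sorted(tok for tok, n in df.items() if n >= min_df)
    return {tok: i for i, tok in enumerate(kept)}


def to_ldac_line(tokens, vocab):
    """Format one document as 'M idx:count ...', where M is the number
    of distinct in-vocabulary terms and idx starts at 0."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    pairs = " ".join(f"{i}:{c}" for i, c in sorted(counts.items()))
    return f"{len(counts)} {pairs}".rstrip()


# Hypothetical toy collection, already tokenized.
docs = [
    ["topic", "model", "text"],
    ["topic", "model", "model"],
    ["text", "cleaning"],
]
vocab = build_vocab(docs, min_df=2)
lines = [to_ldac_line(d, vocab) for d in docs]
print("\n".join(lines))
```

In this toy collection "cleaning" appears in only one document, so it is dropped by the df filter, and the remaining three tokens receive indices 0 to 2.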


GritBot 2.01

by zenog - September 2, 2011, 14:56:26 CET [ Project Homepage BibTeX Download ] 8590 views, 2231 downloads, 0 subscriptions

About: GritBot is a data cleaning and outlier/anomaly detection program.

Changes:

Initial Announcement on mloss.org.