Project details for mldata-utils

mldata-utils 0.4.1

by sonne - December 7, 2010, 03:06:42 CET


Tools to convert data and task files to and from HDF5, and to retrieve extracts from the generated files. In addition, it provides various performance measures (e.g. for computing ROC curves) and a client API for retrieving data sets, running experiments on remotely defined tasks, and uploading results.
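
As an illustration of the kind of performance measure mentioned above, the following is a minimal sketch of computing ROC curve points and the area under the curve in plain Python. It is not taken from mldata-utils; the function names `roc_curve` and `auc` are hypothetical and chosen only for this example, and binary labels are assumed to be encoded as 1 (positive) and 0 (negative).

```python
def roc_curve(labels, scores):
    """Return (fpr, tpr) lists for binary labels (1 = positive, 0 = negative).

    Hypothetical helper for illustration; not part of mldata-utils.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Visit examples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    prev = None
    for i in order:
        # Emit a point only when the score changes, so tied scores
        # are treated as a single threshold.
        if prev is not None and scores[i] != prev:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        prev = scores[i]

    # Final point: every example is predicted positive.
    fpr.append(fp / neg)
    tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
        for k in range(1, len(fpr))
    )


if __name__ == "__main__":
    fpr, tpr = roc_curve([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6])
    print(list(zip(fpr, tpr)))
    print(auc(fpr, tpr))
```

The curve is built by lowering the decision threshold one distinct score at a time, which keeps the sketch dependency-free; an actual implementation would more likely operate on NumPy arrays.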

Changes to previous version:
  • Various bugfixes (sparse matrix, data extraction).
  • The client API now works with the live website.
Supported Operating Systems: Posix
Data Formats: Svmlight, Matlab, Arff, Octave, Hdf, Csv
Tags: Matlab, Octave, Python, Arff, Data Formats, Weka, Libsvm, Csv, H5, Hdf5, Performance Measures

Other available revisions

Version Changelog Date
  • Changed the task file format so that data splits can have a variable number of items, which can be put into up to 256 categories of training/validation/test/not used/...
  • Various bugfixes.
April 8, 2011, 10:02:44
  • Various bugfixes (sparse matrix, data extraction).
  • The client API now works with the live website.
December 7, 2010, 03:06:42
  • Finally reliably converts sparse and dense matrices of floating point or integer types, as well as string lists, from/to .hdf5, octave, matlab, csv, arff.
  • Added examples and a small test-suite.
November 7, 2010, 14:39:56
  • Added a fix for the case where data.get_correct internally receives an array of arrays of values instead of an array of values.
  • Added support for sparse matrices in data.get_correct.
August 27, 2010, 15:31:58
  • Introduced task.get_test_output to get test_idx and output_variables from Task file.
  • Introduced data.get_correct() to get the 'correct' results from Data file.
  • Fixed minor issues when converting to octave/matlab.
August 25, 2010, 18:56:53
  • Fixed an issue with data extracts.
  • Fixed an issue when updating Task files.
  • Fixed a few issues when converting to arff/octave/matlab.
August 24, 2010, 16:01:27
  • task.create now includes handling of input/output_variables and train/test_idx.
  • Fixed a minor error in handling octave files.
August 21, 2010, 13:02:07
  • Restored code in data.get_extract that had been removed by mistake.
  • Added safeguard for illegal task files with no output_variables.
August 20, 2010, 10:35:46
  • Restructured the package into more modules.
  • Revamped conversion structure.
  • Bugfix re Task vs output variables.
August 19, 2010, 12:42:57
  • Caught a few more error conditions when handling Tasks.
  • Temporarily removed the author from the package information because it threw an ugly error message on older Python installations.
  • Removed label_dims and improved support for input/output variables for Tasks.
  • Created new module 'data' for better encapsulation.
August 17, 2010, 12:00:51
  • Added extract function (and script) for Task datasets.
  • Moved extract function for Data from website to this tool.
  • Improved handling of Task files.
August 16, 2010, 11:52:18

Initial Announcement on

July 21, 2010, 15:03:24

Initial Announcement on

July 12, 2010, 13:33:04


Yaroslav Halchenko (on December 16, 2010, 05:47:51)

any plans for furnishing Debian package, Soeren? I see no ITP ;)
