Projects supporting the ARFF data format.
Showing items 1-20 of 26 (page 1 of 2).

python weka wrapper 0.3.1

by fracpete - April 23, 2015, 00:06:57 CET [ Project Homepage BibTeX Download ] 12129 views, 2521 downloads, 3 subscriptions

About: A thin Python wrapper that uses the javabridge Python library to communicate with a Java Virtual Machine executing Weka API calls.
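
Under the hood the wrapper drives the standard Weka Java API inside the JVM. As a rough sketch of the kind of call sequence being wrapped -- loading an ARFF file and cross-validating a classifier -- using Weka's documented DataSource and Evaluation classes (the file name is a placeholder):

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class WekaArffSketch {
        public static void main(String[] args) throws Exception {
            // Load an ARFF file and use the last attribute as the class.
            Instances data = DataSource.read("iris.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // 10-fold cross-validate a J48 decision tree and print the summary.
            J48 tree = new J48();
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(tree, data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }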

Changes:
  • added "get_tags" class method to "Tags" class for easier instantiation of Tag arrays
  • added "find" method to "Tags" class to locate "Tag" object that matches the string
  • fixed "getitem" and "setitem" methods of the "Tags" class
  • added "GridSearch" meta-classifier with convenience properties to module "weka.classifiers"
  • added "SetupGenerator" and various parameter classes to "weka.core.classes"
  • added "MultiSearch" meta-classifier with convenience properties to module "weka.classifiers"
  • added "quote"/"unquote" and "backquote"/"unbackquote" methods to "weka.core.classes" module
  • added "main" method to "weka.core.classes" for operations on options: join, split, code
  • added support for option handling to "weka.core.classes" module

ADAMS 0.4.8

by fracpete - March 4, 2015, 00:54:04 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 10982 views, 2387 downloads, 3 subscriptions

About: The Advanced Data mining And Machine learning System (ADAMS) is a novel, flexible workflow engine aimed at quickly building and maintaining real-world, complex knowledge workflows.

Changes:
  • 13 new actors
  • 1 new conversion
  • new module adams-access: for accessing MS Access databases (read/write)
  • adams-heatmap module overhaul
  • adams-imaging: barcode (QR Code, etc.) encoding/decoding, multi-image operations (and, or, xor)
  • Flow editor gets a "quick edit" tab
  • MEKA upgraded to 1.7.5
  • Weka filter "Scale" (unsupervised/instance) allows you to scale the values of a row eg to interval 0 to 1
  • SimplePlot sink is a "dumbed down" version of the SequencePlotter with only basic options -- enough to create good looking plots quickly
  • Upper/LowerCase conversion take the locale into account now
  • added print support for PDFs
  • fixed sluggish behavior in Flow editor (open/save/undo)
  • TryCatch correctly flushes token now
  • spreadsheet column range/index sometimes failed in conjunction with variables
  • fixed memory leak in Weka Explorer plugins FixedClassifierErrorPlot, ThresholdCurve
  • WekaExcel upgraded to 1.0.5 (no longer omits last row in sheets)
  • WhileLoop did not react to changes in variables once looping, i.e. conditions couldn't make use of variables
  • ImageProcessor now works again with the improved ImageFileChooser dialog
  • PreviewBrowser displays arrays in a more meaningful way
  • WekaFileReader didn't output empty datasets in DATASET mode
  • obtaining subsets from Notes objects resulted in only the first element being retrieved

JMLR Mulan 1.5.0

by lefman - February 23, 2015, 21:19:05 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 17298 views, 6673 downloads, 2 subscriptions

About: Mulan is an open-source Java library for learning from multi-label datasets. Multi-label datasets consist of training examples of a target function that has multiple binary target variables. This means that each item of a multi-label dataset can be a member of multiple categories or annotated by many labels (classes). This is the nature of many real-world problems, such as semantic annotation of images and video, web page categorization, direct marketing, functional genomics, and music categorization into genres and emotions.
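
A Mulan dataset pairs an ARFF file (features plus label attributes) with an XML file declaring which attributes are the labels. A minimal usage sketch along the lines of Mulan's getting-started example, with placeholder file names:

    import mulan.classifier.meta.RAkEL;
    import mulan.classifier.transformation.LabelPowerset;
    import mulan.data.MultiLabelInstances;
    import mulan.evaluation.Evaluator;
    import mulan.evaluation.MultipleEvaluation;
    import weka.classifiers.trees.J48;

    public class MulanSketch {
        public static void main(String[] args) throws Exception {
            // ARFF file with features and labels, XML file naming the label attributes.
            MultiLabelInstances dataset =
                new MultiLabelInstances("emotions.arff", "emotions.xml");

            // RAkEL ensemble over label-powerset models built on J48 trees.
            RAkEL learner = new RAkEL(new LabelPowerset(new J48()));

            // 10-fold cross-validation with Mulan's evaluator.
            Evaluator evaluator = new Evaluator();
            MultipleEvaluation results = evaluator.crossValidate(learner, dataset, 10);
            System.out.println(results);
        }
    }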

Changes:

Learners

  • MLCSSP.java: Added the MLCSSP algorithm (from ICML 2013)
  • Enhancements of multi-target regression capabilities
  • Improved CLUS support
  • Added pairwise classifier and pairwise transformation

Measures/Evaluation

  • Providing training data in the Evaluator is unnecessary in the case of specific measures.
  • Examples with missing ground truth are not skipped for measures that handle missing values.
  • Added logistic and squared error losses and measures

Bug fixes

  • IndexOutOfBounds in calculation of MiAP and GMiAP
  • Bug fix in Rcut.java
  • When in rank/score mode the meta-data contained additional unnecessary attributes. (Newton Spolaor)

API changes

  • Upgrade to Java 7
  • Upgrade to Weka 3.7.10

Miscellaneous

  • Small changes and improvements in the wrapper classes for the CLUS library
  • ENTCS13FeatureSelection.java (new experiment)
  • Enumeration is now used for specifying the type of meta-data. (Newton Spolaor)

NaN toolbox 2.7.1

by schloegl - February 3, 2015, 19:03:36 CET [ Project Homepage BibTeX Download ] 34059 views, 6943 downloads, 2 subscriptions

About: NaN-toolbox is a statistics and machine learning toolbox for handling data with and without missing values.

Changes:

Changes in v2.7.1:

  • API compatibility of mahal, zscore
  • improved support for Cygwin and Mac OS X/Homebrew
  • a number of minor improvements

For details see the CHANGELOG at http://pub.ist.ac.at/~schloegl/matlab/NaN/CHANGELOG


Hub Miner 1.1

by nenadtomasev - January 22, 2015, 16:33:51 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 1670 views, 306 downloads, 2 subscriptions

About: Hubness-aware Machine Learning for High-dimensional Data

Changes:
  • BibTex support for all algorithm implementations, making all of them easy to reference (via algref package).

  • Two more hubness-aware approaches (meta-metric-learning and feature construction)

  • An implementation of Hit-Miss networks for analysis.

  • Several minor bug fixes.

  • The following instance selection methods were added: HMScore, Carving, Iterative Case Filtering, ENRBF.

  • The following clustering quality indexes were added: Fowlkes-Mallows, Calinski-Harabasz, PBM, G+, Tau, Point-Biserial, Hubert's statistic, McClain-Rao, C-root-k.

  • Some more experimental scripts have been included.

  • Extensions in the estimation of hubness risk.

  • Alias and weighted reservoir methods for weight-proportional random selection.


JEMLA 1.0

by bathaeian - January 4, 2015, 08:34:49 CET [ Project Homepage BibTeX Download ] 657 views, 184 downloads, 3 subscriptions

About: A Java package for calculating entropy for machine learning applications. It implements several methods for handling missing values, so it can be used as a lab for examining missing-value handling.

Changes:

Discretization of numerical values was added in order to calculate the mode of values and the fractional replacement of missing ones. The class diagram is available at http://profs.basu.ac.ir/bathaeian/free_space/jemla.rar


JMLR JKernelMachines 2.5

by dpicard - December 11, 2014, 17:51:42 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 17801 views, 4174 downloads, 4 subscriptions

Rating: 4.5/5
(based on 4 votes)

About: A machine learning library in Java for easy development of new kernels.
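
For orientation, a minimal training sketch. The package and class names below (including the ArffImporter reader and the public sample field of TrainingSample) are taken from the 2.x documentation and should be treated as assumptions rather than a definitive API reference:

    import java.util.List;
    import fr.lip6.jkernelmachines.classifier.LaSVM;
    import fr.lip6.jkernelmachines.io.ArffImporter;
    import fr.lip6.jkernelmachines.kernel.typed.DoubleGaussL2;
    import fr.lip6.jkernelmachines.type.TrainingSample;

    public class JKernelMachinesSketch {
        public static void main(String[] args) throws Exception {
            // Read a two-class ARFF dataset into a list of training samples.
            List<TrainingSample<double[]>> train = ArffImporter.importFromFile("train.arff");

            // Gaussian (RBF) kernel on double[] features.
            DoubleGaussL2 kernel = new DoubleGaussL2(1.0);

            // Train a LaSVM classifier using that kernel.
            LaSVM<double[]> svm = new LaSVM<double[]>(kernel);
            svm.train(train);

            // Score the first training sample; the sign gives the predicted class.
            System.out.println(svm.valueOf(train.get(0).sample));
        }
    }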

Changes:

Version 2.5

  • New active learning algorithms
  • Better threading management
  • New multiclass SVM algorithm based on SDCA
  • Handle class balancing in cross-validation
  • Optional EJML support switched to version 0.26
  • Various bugfixes and improvements

pySPACE 1.2

by krell84 - October 29, 2014, 15:36:28 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 3169 views, 667 downloads, 1 subscription

About: pySPACE is the abbreviation for "Signal Processing and Classification Environment in Python using YAML and supporting parallelization". It is modular software for processing large data streams that has been specifically designed to enable distributed execution and empirical evaluation of signal processing chains. Various signal processing algorithms (so-called nodes) are available within the software, ranging from finite impulse response filters through data-dependent spatial filters (e.g. CSP, xDAWN) to established classifiers (e.g. SVM, LDA). pySPACE incorporates the concept of nodes and node chains from the MDP framework. Due to its modular architecture, the software can easily be extended with new processing nodes and more general operations. Large-scale empirical investigations can be configured using simple text configuration files in the YAML format, executed on different (distributed) computing modalities, and evaluated using an interactive graphical user interface.

Changes:

Improved testing, improved documentation, Windows compatibility, more algorithms.


JMLR Waffles 2014-07-05

by mgashler - July 20, 2014, 04:53:54 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 28297 views, 8055 downloads, 2 subscriptions

About: Script-friendly command-line tools for machine learning and data mining tasks. (The command-line tools wrap functionality from a public domain C++ class library.)

Changes:

Added support for CUDA GPU-parallelized neural network layers, and several other new features. Full list of changes at http://waffles.sourceforge.net/docs/changelog.html


JMLR MOA Massive Online Analysis Nov-13

by abifet - April 4, 2014, 03:50:20 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 13380 views, 5212 downloads, 1 subscription

About: Massive Online Analysis (MOA) is a real-time analytics tool for data streams. It is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA includes a collection of offline and online methods as well as tools for evaluation. In particular, it implements boosting, bagging, and Hoeffding Trees, all with and without Naive Bayes classifiers at the leaves. MOA supports bi-directional interaction with WEKA, the Waikato Environment for Knowledge Analysis, and it is released under the GNU GPL license.
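
A minimal sketch of a prequential (test-then-train) loop over an ARFF file, following the API walk-through in the MOA manual. For releases of this vintage the stream yields weka.core.Instance objects; later releases wrap instances differently, so treat the exact types as an assumption:

    import moa.classifiers.Classifier;
    import moa.classifiers.trees.HoeffdingTree;
    import moa.streams.ArffFileStream;
    import weka.core.Instance;

    public class MoaPrequentialSketch {
        public static void main(String[] args) {
            // Stream instances from an ARFF file; -1 selects the last attribute as the class.
            ArffFileStream stream = new ArffFileStream("data.arff", -1);
            stream.prepareForUse();

            Classifier learner = new HoeffdingTree();
            learner.setModelContext(stream.getHeader());
            learner.prepareForUse();

            // Test on each instance first, then train on it.
            int correct = 0, seen = 0;
            while (stream.hasMoreInstances()) {
                Instance inst = stream.nextInstance();
                if (learner.correctlyClassifies(inst)) {
                    correct++;
                }
                learner.trainOnInstance(inst);
                seen++;
            }
            System.out.println("Prequential accuracy: " + ((double) correct / seen));
        }
    }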

Changes:

New version November 2013


SAMOA 0.0.1

by gdfm - April 2, 2014, 17:09:08 CET [ Project Homepage BibTeX Download ] 1094 views, 335 downloads, 1 subscription

About: SAMOA is a platform for mining big data streams. It is a distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms.

Changes:

Initial Announcement on mloss.org.


JMLR MultiBoost 1.2.02

by busarobi - March 31, 2014, 16:13:04 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 28249 views, 4873 downloads, 1 subscription

About: MultiBoost is a multi-purpose boosting package implemented in C++. It is based on the multi-class/multi-task AdaBoost.MH algorithm [Schapire-Singer, 1999]. Basic base learners (stumps, trees, products, Haar filters for image processing) can be easily complemented by new data representations and the corresponding base learners, without interfering with the main boosting engine.

Changes:

Major changes :

  • The "early stopping" feature can now be based on any metric output with the --outputinfo command line argument.

  • Early stopping now works with --slowresume command line argument.

Minor fixes:

  • More informative output when testing.

  • Fixed various compilation glitches with recent clang (OS X/Linux).


Chordalysis 1.0

by fpetitjean - March 24, 2014, 01:22:06 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 1526 views, 381 downloads, 1 subscription

About: Log-linear analysis for high-dimensional data

Changes:

Initial Announcement on mloss.org.


ELKI 0.6.0

by erich - January 10, 2014, 18:32:28 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 12805 views, 2323 downloads, 3 subscriptions

About: ELKI is a framework for implementing data-mining algorithms with support for index structures; it includes a wide variety of clustering and outlier detection methods.

Changes:

Additions and Improvements from ELKI 0.5.5:

Algorithms

Clustering:

  • Hierarchical Clustering - the slower naive variants were added, and the code was refactored
  • Partition extraction from hierarchical clusterings - different linkage strategies (e.g. Ward)
  • Canopy pre-Clustering
  • Naive Mean-Shift Clustering
  • Affinity propagation clustering (both with distances and similarities / kernel functions)
  • K-means variations: Best-of-multiple-runs, bisecting k-means
  • New k-means initialization: farthest points, sample initialization
  • Cheng and Church Biclustering
  • P3C Subspace Clustering
  • One-dimensional clustering algorithm based on kernel density estimation

Outlier detection

  • COP - correlation outlier probabilities
  • LDF - a kernel density based LOF variant
  • Simplified LOF - a simpler version of LOF (not using reachability distance)
  • Simple Kernel Density LOF - a simple LOF using kernel density (more consistent than LDF)
  • Simple outlier ensemble algorithm
  • PINN - projection indexed nearest neighbors, via projected indexes.
  • ODIN - kNN graph based outlier detection
  • DWOF - Dynamic-Window Outlier Factor (contributed by Omar Yousry)
  • ABOD refactored, into ABOD, FastABOD and LBABOD

Distances

  • Geodetic distances now support different world models (WGS84 etc.) and are substantially faster.
  • Levenshtein distances for processing strings, e.g. for analyzing phonemes (contributed code, see "Word segmentation through cross-lingual word-to-phoneme alignment", SLT2013, Stahlberg et al.)
  • Bray-Curtis, Clark, Kulczynski1 and Lorentzian distances with R-tree indexing support
  • Histogram matching distances
  • Probabilistic divergence distances (Jeffrey, Jensen-Shannon, Chi2, Kullback-Leibler)
  • Kulczynski2 similarity
  • Kernel similarity code has been refactored, and additional kernel functions have been added

Database Layer and Data Types

  • Projection layer
  • Parser for simple textual data (for use with Levenshtein distance)
  • Various random projection families (including Feature Bagging, Achlioptas, and p-stable)
  • Latitude+Longitude to ECEF
  • Sparse vector improvements and bug fixes
  • New filter: remove NaN values and missing values
  • New filter: add histogram-based jitter
  • New filter: normalize using statistical distributions
  • New filter: robust standardization using Median and MAD
  • New filter: Linear discriminant analysis (LDA)

Index Layer

  • Another speed up in R-trees
  • Refactoring of M- and R-trees: support for different strategies in the M-tree, new strategies for M-tree splits, and speedups in the M-tree
  • New index structure: in-memory k-d-tree
  • New index structure: in-memory Locality Sensitive Hashing (LSH)
  • New index structure: approximate projected indexes, such as PINN
  • Index support for geodetic data - (Details: Geodetic Distance Queries on R-Trees for Indexing Geographic Data, SSTD13)
  • Sampled k nearest neighbors: reference KDD13 "Subsampling for Efficient and Effective Unsupervised Outlier Detection Ensembles"
  • Cached (precomputed) k-nearest neighbors to share across multiple runs
  • Benchmarking "algorithms" for indexes

Mathematics and Statistics

  • Many new distributions have been added, now 28 different distributions are supported
  • Additional estimation methods (using advanced statistics such as L-Moments), now 44 estimators are available
  • Trimming and Winsorizing
  • Automatic best-fit distribution estimation
  • Preprocessor using these distributions for rescaling data sets
  • API changes related to the new distributions support
  • More kernel density functions
  • RANSAC covariance matrix builder (unfortunately rather slow)

Visualization

  • 3D projected coordinates (Details: Interactive Data Mining with 3D-Parallel-Coordinate-Trees, SIGMOD2013)
  • Convex hulls now also include nested hierarchical clusters

Other

  • Parser speedups
  • Sparse vector bug fixes and improvements
  • Various bug fixes
  • PCA, MDS and LDA filters
  • Text output was slightly improved (but still needs to be redesigned from scratch - please contribute!)
  • Refactoring of hierarchy classes
  • New heap classes and infrastructure enhancements
  • Classes can have aliases, e.g. "l2" for Euclidean distance.
  • Some error messages were made more informative.
  • Benchmarking classes, also for approximate nearest neighbor search.

CIlib Computational Intelligence Library 0.8

by gpampara - August 22, 2013, 08:34:21 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 1899 views, 576 downloads, 1 subscription

About: CIlib is a library of computational intelligence algorithms and supporting components that allows simple extension and experimentation. The library is peer reviewed and is backed by a leading research group in the field. The library is under active development.

Changes:

Initial Announcement on mloss.org.


Apache Mahout 0.8

by gsingers - July 27, 2013, 15:52:32 CET [ Project Homepage BibTeX Download ] 17004 views, 4586 downloads, 2 subscriptions

About: Apache Mahout is an Apache Software Foundation project with the goal of creating both a community of users and a scalable, Java-based framework consisting of many machine learning algorithm [...]

Changes:

Apache Mahout 0.8 contains, amongst a variety of performance improvements and bug fixes, an implementation of Streaming K-Means, deeper Lucene/Solr integration and new scalable recommender algorithms. For a full description of the newest release, see http://mahout.apache.org/.
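
As a small illustration of the non-distributed Java API, here is a user-based recommender assembled from Mahout's Taste classes. This is a sketch only: the ratings file name and its userID,itemID,rating CSV layout are assumptions, and the new scalable recommenders mentioned above run on Hadoop rather than through this in-memory API:

    import java.io.File;
    import java.util.List;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class TasteSketch {
        public static void main(String[] args) throws Exception {
            // ratings.csv: one "userID,itemID,rating" triple per line.
            DataModel model = new FileDataModel(new File("ratings.csv"));
            UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
            Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

            // Top-3 recommendations for user 1.
            List<RecommendedItem> items = recommender.recommend(1, 3);
            for (RecommendedItem item : items) {
                System.out.println(item);
            }
        }
    }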


PREA Personalized Recommendation Algorithms Toolkit 1.1

by srcw - September 1, 2012, 22:53:37 CET [ Project Homepage BibTeX Download ] 9229 views, 2346 downloads, 2 subscriptions

About: Open-source Java software providing collaborative filtering algorithms.

Changes:

Initial Announcement on mloss.org.


MLWizard 5.2

by remat - July 26, 2012, 15:04:14 CET [ Project Homepage BibTeX Download ] 3446 views, 871 downloads, 1 subscription

About: MLWizard recommends and optimizes classification algorithms based on meta-learning. It is a software wizard fully integrated into RapidMiner, but it can also be used as a library.

Changes:

Faster parameter optimization using genetic algorithm with predefined start population.


WebEnsemble 1.0

by jungc005 - May 8, 2012, 22:24:44 CET [ BibTeX Download ] 1642 views, 581 downloads, 1 subscription

About: Use the power of crowdsourcing to create ensembles.

Changes:

Initial Announcement on mloss.org.


MLFlex 02-21-2012-00-12

by srp33 - April 3, 2012, 16:44:43 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 2528 views, 539 downloads, 1 subscription

About: Motivated by a need to classify high-dimensional, heterogeneous data from the bioinformatics domain, we developed ML-Flex, a machine-learning toolbox that enables users to perform two-class and multi-class classification analyses in a systematic yet flexible manner. ML-Flex was written in Java but is capable of interfacing with third-party packages written in other programming languages. It can handle multiple input-data formats and supports a variety of customizations. ML-Flex provides implementations of various validation strategies, which can be executed in parallel across multiple computing cores, processors, and nodes. Additionally, ML-Flex supports aggregating evidence across multiple algorithms and data sets via ensemble learning. (See http://jmlr.csail.mit.edu/papers/volume13/piccolo12a/piccolo12a.pdf.)

Changes:

Initial Announcement on mloss.org.

