Project details for Hivemall

Hivemall 0.3

by myui - March 13, 2015, 17:08:22 CET

Hivemall provides machine learning functionality, as well as feature engineering functions, through Hive UDFs, UDAFs, and UDTFs. It is designed to scale with both the number of training instances and the number of training features.

We believe that Hivemall is much easier to use and more scalable than Mahout for classification and regression tasks, but please check this for yourself. If you have a Hive environment, you can evaluate Hivemall in about five minutes.

Hivemall is very easy to use because every machine learning step is done within HiveQL.

-- Installation is as simple as:
add jar /tmp/hivemall.jar;
source /tmp/define-all.hive;

-- Logistic regression is performed by a query.
SELECT feature, avg(weight) AS weight
FROM (
  SELECT logress(features, label) AS (feature, weight)
  FROM training_features
) t
GROUP BY feature;
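The query trains logistic regression inside the `logress` UDTF, which emits `(feature, weight)` rows for its partial model. It then combines the partial models by averaging each feature's weight with `GROUP BY feature`. The Python sketch below shows that idea in a rough, non-authoritative way; it is not Hivemall's actual implementation. Each data partition stands in for one Hive task and is trained with plain SGD, and the per-feature weights are then averaged. The function names, the learning rate, and the input format (a `{feature: value}` dict with a 0/1 label) are assumptions made for illustration.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_partition(rows, eta=0.1, epochs=1):
    """Plain SGD logistic regression over one partition (one Hive task).

    rows: list of (features, label), where features is a {name: value}
    dict and label is 0 or 1 (assumed input format).
    """
    w = defaultdict(float)
    for _ in range(epochs):
        for features, label in rows:
            p = sigmoid(sum(w[f] * v for f, v in features.items()))
            for f, v in features.items():
                # Gradient ascent step on the log-likelihood.
                w[f] += eta * (label - p) * v
    # One (feature, weight) pair per feature seen in this partition.
    return dict(w)


def average_models(models):
    """Mimics: SELECT feature, avg(weight) ... GROUP BY feature."""
    sums, counts = defaultdict(float), defaultdict(int)
    for model in models:
        for f, weight in model.items():
            sums[f] += weight
            counts[f] += 1
    # Each feature is averaged only over the partitions that emitted it.
    return {f: sums[f] / counts[f] for f in sums}


# Two hypothetical partitions of a toy training table.
partitions = [
    [({"a": 1.0}, 1), ({"b": 1.0}, 0)],
    [({"a": 1.0}, 1), ({"a": 1.0, "b": 1.0}, 0)],
]
models = [train_partition(rows, eta=0.5, epochs=10) for rows in partitions]
model = average_models(models)
```

Like SQL `avg()` after `GROUP BY`, a feature that appears in only some partitions is averaged over those partitions alone. It is not treated as having weight zero in the others.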

You can find detailed examples on our wiki pages.

Changes from the previous version:
  • Added support for matrix factorization
  • Added support for TF-IDF computation
  • Added support for AdaGrad and AdaDelta
  • Added AdaGradRDA classification
  • Added a normalization scheme
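Matrix factorization, one of the additions in this release, approximates a sparse user-item rating matrix R by the product of two low-rank factor matrices, R[u][i] ≈ P[u] · Q[i]. The Python sketch below shows the textbook SGD formulation with L2 regularization on the observed entries only. Hivemall's own matrix factorization functions may add bias terms or use different update rules. The function names, hyperparameters, and toy ratings here are all assumptions.

```python
import math
import random


def predict(P, Q, u, i):
    # Predicted rating: dot product of user and item latent factors.
    return sum(p * q for p, q in zip(P[u], Q[i]))


def train_mf(ratings, n_users, n_items, k=2, eta=0.05, lam=0.01,
             epochs=500, seed=0):
    """SGD matrix factorization fitted on observed (user, item, rating) triples.

    ratings: list of (user, item, rating) with 0-based indices.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Squared-error gradient step with L2 regularization.
                P[u][f] += eta * (err * qi - lam * pu)
                Q[i][f] += eta * (err * pu - lam * qi)
    return P, Q


def rmse(ratings, P, Q):
    # Root-mean-square error over the given ratings.
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2
                         for u, i, r in ratings) / len(ratings))


ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = train_mf(ratings, n_users=3, n_items=3)
unseen = predict(P, Q, 0, 2)  # user 0 has not rated item 2
```

The missing entries are what a recommender predicts. The last line scores an item that user 0 never rated.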
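TF-IDF, also new in this release, weights a term by how often it occurs in a document and discounts terms that occur in many documents. The Python sketch below uses one common textbook definition: tf(t, d) = count(t, d) / |d| and idf(t) = ln(N / df(t)), where N is the number of documents. Hivemall's TF-IDF support may use a different logarithm base, smoothing, or normalization, so treat this as an illustration only. The function names are hypothetical.

```python
import math
from collections import Counter


def tf(doc):
    """Term frequency: occurrences of each term divided by document length.

    doc: non-empty list of terms (assumed).
    """
    counts = Counter(doc)
    n = len(doc)
    return {t: c / n for t, c in counts.items()}


def idf(docs):
    """Inverse document frequency: ln(N / df(t))."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    n_docs = len(docs)
    return {t: math.log(n_docs / d) for t, d in df.items()}


def tfidf(docs):
    # Multiply each term's frequency in a document by its corpus-wide idf.
    idfs = idf(docs)
    return [{t: w * idfs[t] for t, w in tf(doc).items()} for doc in docs]


docs = [["hive", "udf", "hive"], ["hive", "udtf"]]
weights = tfidf(docs)
```

Because "hive" appears in every document, its idf is ln(2/2) = 0 and its TF-IDF weight is zero in both documents.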
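AdaGrad, another addition in this release, gives each feature its own learning rate. Each step is divided by the square root of that feature's accumulated squared gradients, so features with large past gradients take smaller steps. The Python sketch below applies the standard AdaGrad update to logistic regression. It is not Hivemall's implementation, and the function name, `eta`, `eps`, and input format are assumptions.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_adagrad(rows, eta=0.5, eps=1e-6, epochs=1):
    """Logistic regression trained with per-feature AdaGrad step sizes.

    rows: list of ({feature: value}, label) pairs with label in {0, 1}.
    """
    w = defaultdict(float)
    g2 = defaultdict(float)  # running sum of squared gradients per feature
    for _ in range(epochs):
        for features, label in rows:
            p = sigmoid(sum(w[f] * v for f, v in features.items()))
            for f, v in features.items():
                g = (p - label) * v  # gradient of the log loss w.r.t. w[f]
                g2[f] += g * g
                # Features with large accumulated gradients get smaller steps.
                w[f] -= eta * g / (math.sqrt(g2[f]) + eps)
    return dict(w)


rows = [({"a": 1.0}, 1), ({"b": 1.0}, 0)]
weights = train_adagrad(rows, epochs=5)
```

On a feature's first update, the step is about `eta` in magnitude regardless of the gradient's scale. This insensitivity to feature scaling is one reason adaptive methods like AdaGrad are popular for sparse, high-dimensional data.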
Supported Operating Systems: Linux
Data Formats: CSV, TSV, Any
Tags: Classification, Regression, Online Learning, Matrix Factorization, Logistic Regression, Multiclass Classification, Recommendation, Hadoop, Hive, Passive Aggressive, Confidence Weighted, AdaGrad