- Description:
Loom is a streaming inference and query engine for the Cross-Categorization model.
Data Types
Loom learns models of sparse heterogeneous tabular data, with hundreds of features and millions of rows. Loom currently supports the following feature types and models:
- boolean fields as Beta-Bernoulli
- categorical fields with up to 256 values as Dirichlet-Discrete
- unbounded categorical fields as Dirichlet-Process-Discrete
- count fields as Gamma-Poisson
- real fields as Normal-Inverse-Chi-Squared-Normal
- sparse real fields as a mixture of a degenerate (point-mass) component and a dense real component
- text and keyword fields as booleans for word absence/presence
- date fields as a combination of absolute, relative, and cyclic parts
- optional fields as a boolean plus one of the above feature models
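To make the conjugate pairs in the list above concrete, here is a minimal sketch of the posterior predictive for two of them, Beta-Bernoulli and Dirichlet-Discrete. It is a standalone illustration and not Loom's API: the function names, the hyperparameter arguments, and the example priors are all assumptions chosen for the example.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part of Loom.

def beta_bernoulli_predictive(alpha, beta, observations):
    """Probability that the next boolean is True, given a Beta(alpha, beta)
    prior and a list of observed booleans."""
    trues = sum(1 for x in observations if x)
    falses = len(observations) - trues
    return (alpha + trues) / (alpha + beta + trues + falses)


def dirichlet_discrete_predictive(alphas, observations):
    """Probability of each category for the next value, given a Dirichlet
    prior with concentration `alphas` and observed category indices."""
    counts = [0] * len(alphas)
    for k in observations:
        counts[k] += 1
    total = sum(alphas) + len(observations)
    return [(a + c) / total for a, c in zip(alphas, counts)]


# Uniform prior, then two True values and one False value observed.
p_true = beta_bernoulli_predictive(1.0, 1.0, [True, True, False])

# Symmetric prior over three categories, then categories 0, 0, 2 observed.
p_cat = dirichlet_discrete_predictive([1.0, 1.0, 1.0], [0, 0, 2])
```

In both cases the update only needs running counts, which is what makes conjugate models of this kind a natural fit for streaming over rows.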
Data Scale
Loom targets tabular datasets of 100-1000 columns by 10^3-10^9 rows. To handle large datasets, Loom implements subsample annealing with an accelerating annealing schedule and adaptively turns off ineffective inference strategies. The annealing schedule is tuned to learn datasets of 10^8 cells in under an hour and datasets of 10^10 cells in under a day, depending on feature type and sparsity.
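As a rough worked example of what those targets imply, the sketch below converts the two stated time budgets into minimum average throughput in cells per second. It is plain arithmetic on the figures quoted above; the helper name and the example table shape are assumptions for illustration, and real throughput will vary with feature type and sparsity.

```python
# Back-of-envelope arithmetic on the stated targets; not a benchmark.

SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def min_cells_per_second(cells, seconds):
    """Average throughput needed to process `cells` within `seconds`."""
    return cells / seconds


# A hypothetical table of 100 columns by 10^6 rows has 10^8 cells.
example_cells = 100 * 10**6

hour_rate = min_cells_per_second(10**8, SECONDS_PER_HOUR)
day_rate = min_cells_per_second(10**10, SECONDS_PER_DAY)

print(f"10^8 cells in an hour needs at least {hour_rate:,.0f} cells/s")
print(f"10^10 cells in a day needs at least {day_rate:,.0f} cells/s")
```

The larger target therefore implies an average rate about 4.2 times that of the smaller one (100/24).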
- Changes to previous version:
Initial Announcement on mloss.org.
- Supported Operating Systems: Ubuntu
- Data Formats: CSV
- Tags: MCMC, Bayesian, Nonparametric