Project details for Sally

Sally 1.0.0

by konrad - March 26, 2015, 17:01:35 CET

Description:

Sally is a small tool for mapping a set of strings to a set of vectors. This mapping is referred to as an embedding and makes it possible to apply machine learning and data mining techniques to string data. Sally can be applied to several types of string data, such as text documents, DNA sequences or log files, and it can read this data from common sources such as directories, archives and text files.

Sally implements a standard technique for mapping strings to a vector space that is often referred to as the vector space model or bag-of-words model. The strings are characterized by a set of features, where each feature is associated with one dimension of the vector space. The following types of features are supported by Sally: bytes, tokens, n-grams of bytes and n-grams of tokens.
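
To make the four feature types concrete, here is a minimal Python sketch of n-gram extraction. It is an illustration only, not Sally's code: the function names `byte_ngrams` and `token_ngrams` are hypothetical, and tokens are assumed to be separated by whitespace.

```python
from typing import List


def byte_ngrams(data: bytes, n: int) -> List[bytes]:
    """Return all contiguous n-grams of bytes; n = 1 yields single bytes."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def token_ngrams(text: str, n: int) -> List[str]:
    """Return all contiguous n-grams of tokens; n = 1 yields single tokens.

    Assumption: tokens are delimited by whitespace.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    print(byte_ngrams(b"abcd", 2))
    print(token_ngrams("the quick brown fox", 2))
```

Each distinct n-gram then corresponds to one dimension of the vector space. A string shorter than n produces no features.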

Sally proceeds by counting the occurrences of the specified features in each string and generating a sparse vector of count values. Alternatively, binary or TF-IDF values can be computed and stored in the vectors. Sally then normalizes the vector, for example using the L1 or L2 norm, and outputs it in a specified format such as plain text, LibSVM or Matlab.
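
The counting, weighting and normalization steps can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not Sally's implementation: all function names are hypothetical, the TF-IDF weighting shown (count times log(N / document frequency)) is just one common variant, and the explicit `index` dictionary mapping features to dimensions is an assumption made for the example.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List

Vector = Dict[Hashable, float]


def count_features(features: List[Hashable]) -> Vector:
    """Sparse vector of occurrence counts, keyed by feature."""
    return dict(Counter(features))


def to_binary(vec: Vector) -> Vector:
    """Replace every non-zero count with 1."""
    return {f: 1.0 for f in vec}


def tfidf(vectors: List[Vector]) -> List[Vector]:
    """Weight counts by log(N / df); one common TF-IDF variant (assumption)."""
    n = len(vectors)
    df = Counter(f for vec in vectors for f in vec)
    return [{f: c * math.log(n / df[f]) for f, c in vec.items()} for vec in vectors]


def normalize(vec: Vector, norm: str = "l2") -> Vector:
    """Scale the vector to unit L1 or L2 norm; zero vectors are returned as-is."""
    if norm == "l1":
        total = sum(abs(v) for v in vec.values())
    elif norm == "l2":
        total = math.sqrt(sum(v * v for v in vec.values()))
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    if total == 0:
        return dict(vec)
    return {f: v / total for f, v in vec.items()}


def to_libsvm_line(vec: Vector, index: Dict[Hashable, int], label: int = 0) -> str:
    """Format as 'label dim:value ...' with dimensions in ascending order."""
    items = sorted((index[f], v) for f, v in vec.items())
    return " ".join([str(label)] + [f"{i}:{v:g}" for i, v in items])


if __name__ == "__main__":
    vec = normalize(count_features(["ab", "bc", "ab"]), "l1")
    print(to_libsvm_line(vec, {"ab": 1, "bc": 2}))
```

Storing only the non-zero entries in a dictionary keeps the vectors sparse, which matters because the number of possible n-grams is usually far larger than the number that occur in any single string.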

Changes to previous version:

Added support for explicit selection of granularity. Several minor bug fixes. We have reached 1.0.

Supported Operating Systems: Linux, Unix, Posix, Mac OS X
Data Formats: Svmlight, Binary, Matlab, Fasta, Txt, Cluto
Tags: Sequence Analysis, Feature Extraction
