Project details for SketchSort

SketchSort 0.0.6

by ytabei - October 11, 2010, 18:33:21 CET

Description:

SketchSort is software for all-pairs similarity search. It takes a set of data points as input and outputs approximate neighbor pairs within a given distance. First, the input data points are mapped to binary bit strings (sketches) by locality-sensitive hashing. Next, pairs of sketches within a given Hamming distance are enumerated by the multiple sorting method. Finally, the cosine distance is calculated for each such pair, and the pair is output if its cosine distance is no more than a user-specified threshold. Since the search is approximate, some true neighbor pairs may be missed. A theoretical bound on the expected missing edge ratio is derived, which makes it possible to choose parameters that keep the empirical missing edge ratio small.
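The three stages described above can be illustrated with a minimal Python sketch. This is not SketchSort's own code, and every function and parameter name here (`make_sketches`, `candidate_pairs`, `all_pairs`, `n_bits`, `max_hamming`, `max_cosine`) is hypothetical. It assumes random-hyperplane hashing as the locality-sensitive hash, and it replaces the multiple sorting method with a simpler pigeonhole variant: each sketch is split into `max_hamming + 1` blocks, so any two sketches within `max_hamming` bits must agree exactly on at least one block, and sorting on each block in turn brings such pairs together. Input vectors are assumed to be non-zero.

```python
import itertools
import math
import random


def make_sketches(vectors, n_bits, rng):
    """Map each vector to a tuple of n_bits sign bits (random-hyperplane LSH)."""
    dim = len(vectors[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    return [
        tuple(1 if sum(p * x for p, x in zip(plane, v)) >= 0 else 0
              for plane in planes)
        for v in vectors
    ]


def hamming(a, b):
    """Number of positions at which two equal-length sketches differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_pairs(sketches, max_hamming):
    """Return index pairs (i, j), i < j, whose sketches differ in at most
    max_hamming bits, using sort-and-group on each of max_hamming + 1 blocks."""
    n_bits = len(sketches[0])
    n_blocks = max_hamming + 1
    bounds = [k * n_bits // n_blocks for k in range(n_blocks + 1)]
    found = set()
    for lo, hi in zip(bounds, bounds[1:]):
        def key(i):
            return sketches[i][lo:hi]
        # Sorting on this block places sketches with equal block bits side by side.
        order = sorted(range(len(sketches)), key=key)
        for _, group in itertools.groupby(order, key=key):
            for i, j in itertools.combinations(sorted(group), 2):
                if hamming(sketches[i], sketches[j]) <= max_hamming:
                    found.add((i, j))
    return found


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def all_pairs(vectors, n_bits=32, max_hamming=3, max_cosine=0.05, seed=0):
    """Sketch, enumerate candidates, then keep pairs under the cosine threshold."""
    rng = random.Random(seed)
    sketches = make_sketches(vectors, n_bits, rng)
    result = []
    for i, j in sorted(candidate_pairs(sketches, max_hamming)):
        d = cosine_distance(vectors[i], vectors[j])
        if d <= max_cosine:
            result.append((i, j, d))
    return result
```

Calling `all_pairs(vectors)` on a list of equal-length numeric lists returns `(i, j, distance)` triples. As in the description, a true neighbor pair is missed whenever its sketches end up more than `max_hamming` bits apart, so `n_bits` and `max_hamming` trade recall against the number of candidates checked.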

Changes to previous version:

Initial Announcement on mloss.org.

Supported Operating Systems: Linux
Data Formats: Vector Separated By Space
Tags: Allpairssimilaritysearch, Multiplesorting, Nearduplicationdetection, Neighborsearch, Sketchsort
