mloss.org SketchSort
http://mloss.org
Updates and additions to SketchSort
en
Mon, 11 Oct 2010 18:33:21 -0000
SketchSort 0.0.6
http://mloss.org/software/view/277/
<html><p>SketchSort is software for all-pairs similarity search. It takes data points as input and outputs approximate neighbor pairs within a given distance. First, the input data points are mapped to binary bit strings (sketches) by locality-sensitive hashing; then neighbor pairs of strings within a given Hamming distance are enumerated by the multiple sorting method. Finally, the cosine distance is calculated for each such pair, and the pair is output if its cosine distance is no more than a user-specified threshold. Because the method is approximate, some true neighbor pairs may be missed. A theoretical bound on the expected missing edge ratio is derived, which makes it possible to set the parameters so as to keep the empirical missing edge ratio as small as possible.
</p></html>
Yasuo Tabei
Mon, 11 Oct 2010 18:33:21 -0000
http://mloss.org/software/rss/comments/277
http://mloss.org/software/view/277/
all pairs similarity search, multiple sorting, near duplication detection, neighbor search, sketchsort
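
The three stages named in the SketchSort description (hashing points to binary sketches, enumerating sketch pairs within a Hamming distance by sorting, then filtering by cosine distance) can be illustrated with a minimal Python sketch. This is not SketchSort's code or interface. All function and parameter names are hypothetical, random-hyperplane hashing is assumed as the locality-sensitive hash, cosine distance is assumed to mean one minus cosine similarity, and the block-wise sort below is a simplified stand-in for the multiple sorting method rather than a reproduction of it.

```python
import math
import random
from itertools import combinations, groupby


def make_sketches(points, n_bits, seed=0):
    """Map each point to a tuple of bits using random hyperplanes (assumed LSH)."""
    rng = random.Random(seed)
    dim = len(points[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    return [
        tuple(1 if sum(w * x for w, x in zip(plane, p)) >= 0 else 0
              for plane in planes)
        for p in points
    ]


def hamming(a, b):
    """Number of positions at which two equal-length sketches differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_pairs(sketches, max_hamming):
    """Enumerate index pairs whose sketches differ in at most max_hamming bits.

    Simplified stand-in for multiple sorting: split each sketch into
    max_hamming + 1 blocks. Two sketches within the Hamming bound must agree
    exactly on at least one block, so sorting by each block in turn and
    checking pairs inside runs of equal blocks finds every such pair.
    """
    n_bits = len(sketches[0])
    n_blocks = max_hamming + 1
    bounds = [round(i * n_bits / n_blocks) for i in range(n_blocks + 1)]
    found = set()
    for lo, hi in zip(bounds, bounds[1:]):
        block = lambda i: sketches[i][lo:hi]
        order = sorted(range(len(sketches)), key=block)
        for _, group in groupby(order, key=block):
            for i, j in combinations(sorted(group), 2):
                if hamming(sketches[i], sketches[j]) <= max_hamming:
                    found.add((i, j))
    return found


def cosine_distance(u, v):
    """One minus cosine similarity (assumed definition; vectors must be nonzero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def neighbor_pairs(points, n_bits=32, max_hamming=3, max_cos_dist=0.05, seed=0):
    """Return (i, j, distance) for candidate pairs that pass the cosine threshold."""
    sketches = make_sketches(points, n_bits, seed)
    result = []
    for i, j in sorted(candidate_pairs(sketches, max_hamming)):
        d = cosine_distance(points[i], points[j])
        if d <= max_cos_dist:
            result.append((i, j, d))
    return result
```

Calling `neighbor_pairs(points)` on a list of equal-length numeric vectors returns the pairs that survive both filters. In this toy version, as in the description above, a true neighbor pair is missed whenever its two sketches differ in more than `max_hamming` bits, which is why the choice of `n_bits` and `max_hamming` governs the missing edge ratio.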