SketchSort is software for all-pairs similarity search. It takes a set of data points as input and outputs approximate neighbor pairs within a given distance. First, the input data points are mapped to binary bit strings (sketches) by locality-sensitive hashing; then neighbor pairs of strings within a given Hamming distance are enumerated by the multiple sorting method. Finally, the cosine distance of each such pair is calculated, and the pair is output if its cosine distance is no more than a user-specified threshold. Because the method is approximate, some true neighbor pairs may be missed. A theoretical bound on the expected missing edge ratio is derived, which makes it possible to set parameters so that the empirical missing edge ratio is kept small.
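The three stages described above can be illustrated with a minimal Python sketch. It is not SketchSort's actual implementation: the function names, parameter names, and default values are hypothetical, random-hyperplane hashing is assumed as the locality-sensitive hash, cosine distance is assumed to mean one minus cosine similarity, and the multiple sorting method is reduced to its simplest pigeonhole form (two strings within Hamming distance d must agree exactly on at least one of d + 1 blocks).

```python
import math
import random
from itertools import groupby


def make_sketches(points, n_bits, seed=0):
    """Map each vector to a tuple of n_bits 0/1 values via random hyperplanes."""
    rng = random.Random(seed)
    dim = len(points[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    return [
        tuple(1 if sum(w * x for w, x in zip(plane, p)) >= 0 else 0
              for plane in planes)
        for p in points
    ]


def hamming(a, b):
    """Number of positions at which two equal-length sketches differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_pairs(sketches, max_hamming):
    """Enumerate index pairs whose sketches are within max_hamming.

    Simplified block-wise sorting: split each sketch into max_hamming + 1
    blocks, sort by one block at a time, and compare only the sketches
    that share that block exactly.
    """
    n_bits = len(sketches[0])
    n_blocks = max_hamming + 1
    bounds = [n_bits * b // n_blocks for b in range(n_blocks + 1)]
    found = set()
    for lo, hi in zip(bounds, bounds[1:]):
        def block(i, lo=lo, hi=hi):
            return sketches[i][lo:hi]
        order = sorted(range(len(sketches)), key=block)
        for _, group in groupby(order, key=block):
            ids = list(group)
            for a in range(len(ids)):
                for b in range(a + 1, len(ids)):
                    i, j = sorted((ids[a], ids[b]))
                    if hamming(sketches[i], sketches[j]) <= max_hamming:
                        found.add((i, j))
    return found


def cosine_distance(u, v):
    """Assumed definition: 1 - cosine similarity (vectors must be nonzero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def neighbor_pairs(points, n_bits=32, max_hamming=3, max_cos_dist=0.1, seed=0):
    """Run the pipeline: sketch, enumerate candidates, verify by cosine distance."""
    sketches = make_sketches(points, n_bits, seed)
    result = []
    for i, j in sorted(candidate_pairs(sketches, max_hamming)):
        dist = cosine_distance(points[i], points[j])
        if dist <= max_cos_dist:
            result.append((i, j, dist))
    return result
```

Calling `neighbor_pairs(points)` on a list of equal-length, nonzero numeric vectors returns `(i, j, distance)` triples. Pairs whose sketches happen to differ in more than `max_hamming` bits are never verified, which is the source of the missed neighbor pairs mentioned above; repeating the procedure with independent sketches is one way to lower that miss rate.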
- Changes to previous version:
Initial Announcement on mloss.org.