Fast and Easy Levenshtein distance using a Trie (2011)

(stevehanov.ca)

44 points | by sebg 3 days ago

6 comments

localhoster 14 minutes ago
This article surface every once in a while, and I love it. What the author suggests is very clever. I have implemented an extended version of that in Go as an experiment. Instead of using a trie, I used a radix tree. Functions the same, but it's much more compressed (and faster).
kelseydh 51 minutes ago
I needed a fuzzy string matching algorithm for finding best name matches among a candidate list. Considered Normalized Levenshtein Distance but ended up using Jaro-Winkler. I'm curious if anybody has good resources on when to use each fuzzy string matching algorithm and when.
dvh 1 hour ago
I made myself plugin that shows new news in wikipedia's current event page and I was using levenshtein originally (they often edit couple of words in article over span of few days so I compare each new article with previous ones for similarity) but after few days it became too slow (~20s) because O(m*n), so I switched to sorensen-dice instead which is O(m+n) and it's much faster and works very similar way, even tho they do slightly different thing.
gregman1 24 minutes ago
2011
[-]
- saagarjha 23 minutes ago
  Added
fergie 1 hour ago
Very cool and satisfying.