Fast and Easy Levenshtein distance using a Trie (2011)

(stevehanov.ca)

44 points | by sebg 3 days ago

6 comments

  • localhoster 14 minutes ago
    This article surface every once in a while, and I love it. What the author suggests is very clever. I have implemented an extended version of that in Go as an experiment. Instead of using a trie, I used a radix tree. Functions the same, but it's much more compressed (and faster).
  • kelseydh 51 minutes ago
    I needed a fuzzy string matching algorithm for finding best name matches among a candidate list. Considered Normalized Levenshtein Distance but ended up using Jaro-Winkler. I'm curious if anybody has good resources on when to use each fuzzy string matching algorithm and when.
  • dvh 1 hour ago
    I made myself plugin that shows new news in wikipedia's current event page and I was using levenshtein originally (they often edit couple of words in article over span of few days so I compare each new article with previous ones for similarity) but after few days it became too slow (~20s) because O(m*n), so I switched to sorensen-dice instead which is O(m+n) and it's much faster and works very similar way, even tho they do slightly different thing.
  • gregman1 24 minutes ago
    2011
  • fergie 1 hour ago
    Very cool and satisfying.