Word similarity considering special characteristics

I'm looking for an algorithm that computes the similarity between two strings, much like the Levenshtein distance. However, I want it to take the following into account. The Levenshtein distance gives me the same result for all of these cases:

distance(apple, appli) #1
distance(apple, appel) #1
distance(apple, applr) #1

However, I want the second and third examples to have a smaller distance, for the following reasons:

  • second example: all the correct letters are used in the second word
  • third example: r is much more likely to be a typo for the letter e because of its keyboard placement.

Are you familiar with any algorithm that weights such characteristics?

Topic nlp similarity

Category Data Science


There are plenty of versions of edit distance and plenty of possible extensions.

The edit operations can have different weights. By default, every operation has a weight of 1, but you can reassign the substitution weights based on the keys' distance on the keyboard and assign a lower weight to substitutions of neighboring characters. In Python, you can use e.g. the weighted-levenshtein package.
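As a minimal sketch of the idea (not using any particular package): the standard dynamic-programming edit distance, with a substitution-cost function that discounts keys adjacent on a QWERTY keyboard. The keyboard layout table and the 0.5 discount are illustrative assumptions.

```python
# Illustrative keyboard layout; only lowercase letters are covered here.
KEYBOARD_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Map each character to its (row, column) position on the keyboard.
KEY_POS = {c: (r, col)
           for r, row in enumerate(KEYBOARD_ROWS)
           for col, c in enumerate(row)}

def sub_cost(a, b):
    """Cheaper substitution for keys that are adjacent on the keyboard."""
    if a == b:
        return 0.0
    pa, pb = KEY_POS.get(a), KEY_POS.get(b)
    if pa and pb and abs(pa[0] - pb[0]) <= 1 and abs(pa[1] - pb[1]) <= 1:
        return 0.5  # assumed discount for neighboring keys
    return 1.0

def weighted_levenshtein(s, t):
    """Standard DP edit distance with weighted substitutions."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,             # deletion
                          d[i][j - 1] + 1,             # insertion
                          d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]))
    return d[m][n]
```

With these weights, `weighted_levenshtein("apple", "applr")` comes out at 0.5 (e and r are neighbors) while `weighted_levenshtein("apple", "appli")` stays at 1.0, which is exactly the asymmetry asked for in the third example.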

There is also an extension of the standard edit distance called the Damerau-Levenshtein distance. It allows an additional operation: transposing two adjacent characters. If you set a lower cost for transpositions, swapped characters in the string are penalized less.
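A sketch of the (restricted) Damerau-Levenshtein distance with a reduced transposition cost; the 0.5 weight is an illustrative assumption:

```python
def damerau_levenshtein(s, t, transpose_cost=0.5):
    """Restricted Damerau-Levenshtein distance (optimal string alignment)
    with a configurable cost for transposing adjacent characters."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = float(i)
    for j in range(n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if s[i - 1] == t[j - 1] else 1.0
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            # Transposition of two adjacent characters.
            if (i > 1 and j > 1 and s[i - 1] == t[j - 2]
                    and s[i - 2] == t[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + transpose_cost)
    return d[m][n]
```

Here `damerau_levenshtein("apple", "appel")` is 0.5 (one cheap swap of l and e), while `damerau_levenshtein("apple", "appli")` remains 1.0, matching the second example's requirement.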
