Package org.basex.util.similarity
Class Levenshtein
java.lang.Object
org.basex.util.similarity.Levenshtein
Damerau-Levenshtein algorithm. Based on the publications by Levenshtein (1965), "Binary codes capable of correcting spurious insertions and deletions of ones", and Damerau (1964), "A technique for computer detection and correction of spelling errors".
- Author:
- BaseX Team 2005-21, BSD License, Christian Gruen
-
Constructor Summary
Constructors
- Levenshtein(): Constructor.
- Levenshtein(int error): Constructor.
Method Summary
Modifier and Type, Method, Description
- static double distance(int[] cps1, int[] cps2): Computes the full Damerau-Levenshtein distance for two codepoint arrays and returns a double value (0.0 - 1.0), which represents the distance.
- boolean similar(byte[] token, byte[] compare): Compares two tokens for similarity.
- boolean similar(byte[] token, byte[] compare, int err): Compares two tokens for similarity.
- static Object similar (parameters: token, objects): Returns the most similar entry.
- static Object similar (parameters: token, objects, prepare): Returns the most similar entry.
-
Constructor Details
-
Levenshtein
public Levenshtein()
Constructor.
Levenshtein
public Levenshtein(int error)
Constructor.
- Parameters:
error - allowed errors
-
-
Method Details
-
similar
Returns the most similar entry.
- Parameters:
token - token to be compared
objects - objects to be compared
- Returns:
- most similar token or null
-
similar
Returns the most similar entry.
- Parameters:
token - token to be compared
objects - objects to be compared
prepare - function for preparing the object to be compared for the comparison
- Returns:
- most similar token or null
-
similar
public boolean similar(byte[] token, byte[] compare)
Compares two tokens for similarity.
- Parameters:
token - token to be compared
compare - second token to be compared
- Returns:
- true if the arrays are similar
-
similar
public boolean similar(byte[] token, byte[] compare, int err)
Compares two tokens for similarity.
- Parameters:
token - token to be compared
compare - second token to be compared
err - number of allowed errors; dynamic calculation if value is 0
- Returns:
- true if the arrays are similar
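The idea of an error budget can be illustrated with a minimal sketch. This is not the BaseX implementation: the class `SimilarSketch` and the method `withinErrors` are hypothetical names, the distance is plain Levenshtein over bytes (no transpositions), and a budget of 0 here simply means "exact match", because the documentation above does not say how the dynamic calculation works.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of BaseX. */
final class SimilarSketch {
  /** Returns true if a and b differ by at most err single-byte edits. */
  static boolean withinErrors(byte[] a, byte[] b, int err) {
    // A length gap larger than the budget can never be bridged.
    if (Math.abs(a.length - b.length) > err) return false;
    // Two rolling rows of the classic edit-distance table.
    int[] prev = new int[b.length + 1];
    int[] curr = new int[b.length + 1];
    for (int j = 0; j <= b.length; j++) prev[j] = j;
    for (int i = 1; i <= a.length; i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length; j++) {
        int cost = a[i - 1] == b[j - 1] ? 0 : 1;
        curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
      }
      // Swap rows so that prev always holds the most recent row.
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length] <= err;
  }

  public static void main(String[] args) {
    byte[] a = "database".getBytes(StandardCharsets.UTF_8);
    byte[] b = "databse".getBytes(StandardCharsets.UTF_8);
    System.out.println(withinErrors(a, b, 1)); // true: one deletion
    System.out.println(withinErrors(a, b, 0)); // false: lengths differ
  }
}
```

With the actual class, the corresponding call would presumably be `new Levenshtein().similar(token, compare, err)`, using the signature documented above.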
-
distance
public static double distance(int[] cps1, int[] cps2)
Computes the full Damerau-Levenshtein distance for two codepoint arrays and returns it as a normalized double value (0.0 - 1.0). The value is computed as follows:
1.0 - distance / max(length of strings)
1.0 is returned if the strings are equal; 0.0 is returned if the strings are completely different. Despite the method name, higher values therefore indicate more similar inputs.
- Parameters:
cps1 - first array
cps2 - second array
- Returns:
- normalized distance value (0.0 - 1.0)
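The normalization formula can be sketched as follows. This is an illustration, not the BaseX source: the class `NormalizedScoreSketch` and its methods are hypothetical, the edit distance is the optimal-string-alignment variant of Damerau-Levenshtein (adjacent transpositions count as one edit), and returning 1.0 for two empty arrays is an assumption.

```java
/** Hypothetical helper for illustration; not part of BaseX. */
final class NormalizedScoreSketch {
  /** Counts insertions, deletions, substitutions and adjacent transpositions. */
  static int editDistance(int[] a, int[] b) {
    int[][] d = new int[a.length + 1][b.length + 1];
    for (int i = 0; i <= a.length; i++) d[i][0] = i;
    for (int j = 0; j <= b.length; j++) d[0][j] = j;
    for (int i = 1; i <= a.length; i++) {
      for (int j = 1; j <= b.length; j++) {
        int cost = a[i - 1] == b[j - 1] ? 0 : 1;
        int best = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
        // Adjacent transposition, e.g. "cd" versus "dc".
        if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
          best = Math.min(best, d[i - 2][j - 2] + 1);
        }
        d[i][j] = best;
      }
    }
    return d[a.length][b.length];
  }

  /** Applies 1.0 - distance / max(length of strings). */
  static double score(int[] a, int[] b) {
    int max = Math.max(a.length, b.length);
    if (max == 0) return 1.0; // assumption: two empty inputs count as equal
    return 1.0 - (double) editDistance(a, b) / max;
  }

  public static void main(String[] args) {
    int[] a = "abcd".codePoints().toArray();
    int[] b = "abdc".codePoints().toArray();
    System.out.println(score(a, a)); // prints 1.0
    System.out.println(score(a, b)); // prints 0.75 (one transposition over length 4)
  }
}
```

With the actual class, the equivalent call would be `Levenshtein.distance(cps1, cps2)`; whether it handles edge cases such as empty arrays the same way is not stated in this documentation.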
-