Class Levenshtein

java.lang.Object
org.basex.util.similarity.Levenshtein

public final class Levenshtein extends Object

Damerau-Levenshtein algorithm, based on the publications by Levenshtein (1965), "Binary codes capable of correcting spurious insertions and deletions of ones", and Damerau (1964), "A technique for computer detection and correction of spelling errors".
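The following is a minimal sketch of the restricted Damerau-Levenshtein distance (also known as optimal string alignment). In this variant, insertions, deletions, substitutions, and transpositions of two adjacent codepoints each cost 1. It is an illustration written for this page, not the BaseX source. The class name `DamerauSketch` is hypothetical.

```java
/**
 * Sketch of the restricted Damerau-Levenshtein distance ("optimal string
 * alignment"): insertion, deletion, substitution and transposition of two
 * adjacent codepoints each cost 1. Illustration only, not BaseX code.
 */
class DamerauSketch {
  static int osaDistance(int[] a, int[] b) {
    int n = a.length, m = b.length;
    int[][] d = new int[n + 1][m + 1];
    for (int i = 0; i <= n; i++) d[i][0] = i; // delete the first i codepoints of a
    for (int j = 0; j <= m; j++) d[0][j] = j; // insert the first j codepoints of b
    for (int i = 1; i <= n; i++) {
      for (int j = 1; j <= m; j++) {
        int cost = a[i - 1] == b[j - 1] ? 0 : 1;
        int best = Math.min(d[i - 1][j] + 1,          // deletion
                   Math.min(d[i][j - 1] + 1,          // insertion
                            d[i - 1][j - 1] + cost)); // substitution or match
        if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
          best = Math.min(best, d[i - 2][j - 2] + 1); // swap of two adjacent codepoints
        }
        d[i][j] = best;
      }
    }
    return d[n][m];
  }
}
```

For example, `osaDistance` returns 1 for "ab" and "ba" (one transposition), whereas the plain Levenshtein distance would be 2. The restricted variant never edits a substring twice, so the recurrence needs only one extra case.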

Author:
BaseX Team 2005-21, BSD License, Christian Gruen
  • Constructor Summary

    Constructors
    Constructor              Description
    Levenshtein()            Constructor.
    Levenshtein(int error)   Constructor.
  • Method Summary

    Modifier and Type   Method and Description
    static double       distance(int[] cps1, int[] cps2)
                        Computes the full Damerau-Levenshtein distance for two codepoint arrays and returns a normalized value between 0.0 and 1.0 (1.0 for equal arrays).
    boolean             similar(byte[] token, byte[] compare)
                        Compares two tokens for similarity.
    boolean             similar(byte[] token, byte[] compare, int err)
                        Compares two tokens for similarity.
    static Object       similar(byte[] token, Object[] objects)
                        Returns the most similar entry.
    static Object       similar(byte[] token, Object[] objects, Function<Object,Object> prepare)
                        Returns the most similar entry.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • Levenshtein

      public Levenshtein()
      Constructor.
    • Levenshtein

      public Levenshtein(int error)
      Constructor.
      Parameters:
      error - number of allowed errors
  • Method Details

    • similar

      public static Object similar(byte[] token, Object[] objects)
      Returns the most similar entry.
      Parameters:
      token - token to be compared
      objects - objects to be compared
      Returns:
      most similar entry, or null
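A lookup for the most similar entry can be sketched as a linear scan that keeps the closest candidate. This sketch is hypothetical. It assumes that the token is UTF-8 and that candidates are compared through `String.valueOf`. It also assumes that a fixed budget, `MAX_ERRORS = 2`, decides when `null` is returned. The real method may use a different threshold or tie-breaking rule.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a "most similar entry" lookup; not the BaseX implementation. */
class BestMatchSketch {
  // Assumption: a fixed budget of two edits; the real method may use another rule.
  static final int MAX_ERRORS = 2;

  static Object mostSimilar(byte[] token, Object[] objects) {
    int[] cps = new String(token, StandardCharsets.UTF_8).codePoints().toArray();
    Object best = null;
    int bestDist = Integer.MAX_VALUE;
    for (Object o : objects) {
      // Assumption: each candidate is compared via its string representation.
      int d = osa(cps, String.valueOf(o).codePoints().toArray());
      if (d <= MAX_ERRORS && d < bestDist) { // on ties, the earlier candidate is kept
        best = o;
        bestDist = d;
      }
    }
    return best; // null if no candidate is within the budget
  }

  // Restricted Damerau-Levenshtein (optimal string alignment) distance.
  static int osa(int[] a, int[] b) {
    int[][] d = new int[a.length + 1][b.length + 1];
    for (int i = 0; i <= a.length; i++) d[i][0] = i;
    for (int j = 0; j <= b.length; j++) d[0][j] = j;
    for (int i = 1; i <= a.length; i++) {
      for (int j = 1; j <= b.length; j++) {
        int cost = a[i - 1] == b[j - 1] ? 0 : 1;
        d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
        if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
          d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
        }
      }
    }
    return d[a.length][b.length];
  }
}
```

With the candidates "fraction", "function", and "junction", the misspelled token "fucntion" resolves to "function", which is a single transposition away.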
    • similar

      public static Object similar(byte[] token, Object[] objects, Function<Object,Object> prepare)
      Returns the most similar entry.
      Parameters:
      token - token to be compared
      objects - objects to be compared
      prepare - function that converts each object into the form used for the comparison
      Returns:
      most similar entry, or null
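The `prepare` function lets callers compare a derived form of each object while still receiving the original object back. The sketch below is hypothetical and uses the same assumptions as a plain lookup: a UTF-8 token, string conversion via `String.valueOf`, and a fixed budget `MAX_ERRORS = 2`. The name `PreparedMatchSketch` is illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Hypothetical sketch: compare prepared forms, but return the original object. */
class PreparedMatchSketch {
  static final int MAX_ERRORS = 2; // assumed budget, not taken from BaseX

  static Object mostSimilar(byte[] token, Object[] objects, Function<Object, Object> prepare) {
    int[] cps = new String(token, StandardCharsets.UTF_8).codePoints().toArray();
    Object best = null;
    int bestDist = Integer.MAX_VALUE;
    for (Object o : objects) {
      Object prepared = prepare.apply(o); // e.g. strip a prefix or normalize case
      int d = osa(cps, String.valueOf(prepared).codePoints().toArray());
      if (d <= MAX_ERRORS && d < bestDist) {
        best = o; // the caller gets the original entry, not the prepared form
        bestDist = d;
      }
    }
    return best; // null if no prepared candidate is within the budget
  }

  // Restricted Damerau-Levenshtein (optimal string alignment) distance.
  static int osa(int[] a, int[] b) {
    int[][] d = new int[a.length + 1][b.length + 1];
    for (int i = 0; i <= a.length; i++) d[i][0] = i;
    for (int j = 0; j <= b.length; j++) d[0][j] = j;
    for (int i = 1; i <= a.length; i++) {
      for (int j = 1; j <= b.length; j++) {
        int cost = a[i - 1] == b[j - 1] ? 0 : 1;
        d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
        if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
          d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
        }
      }
    }
    return d[a.length][b.length];
  }
}
```

For instance, passing `o -> String.valueOf(o).replaceFirst("^\\w+:", "")` lets the misspelled token "cuont" match the entry "fn:count", because the prefix is stripped before the comparison. With `Function.identity()`, the prefix counts as extra edits, and the sketch returns `null`.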
    • similar

      public boolean similar(byte[] token, byte[] compare)
      Compares two tokens for similarity.
      Parameters:
      token - token to be compared
      compare - second token to be compared
      Returns:
      true if the arrays are similar
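Because `similar` only needs a yes/no answer, a bounded check can stop early instead of always computing the full distance. The following is a minimal sketch. It assumes codepoint arrays (rather than the `byte[]` tokens of the real method) and an explicit error budget. Whether BaseX uses this exact cutoff is not stated here. The sketch keeps only three rows of the restricted Damerau-Levenshtein matrix and returns `false` as soon as every cell in a row exceeds the budget.

```java
/** Hypothetical sketch of a bounded similarity test with early termination. */
class BoundedSimilaritySketch {
  static boolean similar(int[] a, int[] b, int maxErrors) {
    // The length difference alone is a lower bound on the distance.
    if (Math.abs(a.length - b.length) > maxErrors) return false;
    int m = b.length;
    int[] prev2 = new int[m + 1], prev = new int[m + 1], cur = new int[m + 1];
    for (int j = 0; j <= m; j++) prev[j] = j; // row 0: insert j codepoints
    for (int i = 1; i <= a.length; i++) {
      cur[0] = i;
      int rowMin = cur[0];
      for (int j = 1; j <= m; j++) {
        int cost = a[i - 1] == b[j - 1] ? 0 : 1;
        int v = Math.min(Math.min(prev[j] + 1, cur[j - 1] + 1), prev[j - 1] + cost);
        if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
          v = Math.min(v, prev2[j - 2] + 1); // adjacent transposition
        }
        cur[j] = v;
        rowMin = Math.min(rowMin, v);
      }
      // Row minima never decrease, so the final distance is at least rowMin.
      if (rowMin > maxErrors) return false;
      int[] tmp = prev2; prev2 = prev; prev = cur; cur = tmp; // rotate rows
    }
    return prev[m] <= maxErrors; // prev now holds the last computed row
  }
}
```

The cutoff is safe because the minimum of a row never decreases in later rows, including through the transposition case. As a result, the final distance cannot fall back under the budget once a row has exceeded it.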
    • similar

      public boolean similar(byte[] token, byte[] compare, int err)
      Compares two tokens for similarity.
      Parameters:
      token - token to be compared
      compare - second token to be compared
      err - number of allowed errors; if the value is 0, the number is calculated dynamically
      Returns:
      true if the arrays are similar
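The convention that `err = 0` requests a dynamically calculated budget can be sketched as shown below. The concrete rule used here is a hypothetical placeholder, not the formula BaseX uses: it allows one error per four codepoints of the shorter token, with a minimum of one.

```java
/** Hypothetical sketch of resolving an error budget where 0 means "compute it". */
class ErrorBudgetSketch {
  // Hypothetical rule (not taken from BaseX): one error per four codepoints
  // of the shorter token, but at least one.
  static int allowedErrors(int length1, int length2, int err) {
    if (err != 0) return err; // an explicit budget is used as given
    return Math.max(1, Math.min(length1, length2) / 4);
  }
}
```

A length-dependent budget keeps short tokens strict (one typo) while tolerating more errors in long tokens. The divisor is the part most likely to differ in a real implementation.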
    • distance

      public static double distance(int[] cps1, int[] cps2)

      Computes the full Damerau-Levenshtein distance for two codepoint arrays and returns a normalized value between 0.0 and 1.0 that is derived from it. The value is computed as follows:

        1.0 - distance / max(length of cps1, length of cps2)

      1.0 is returned if the arrays are equal; 0.0 is returned if the distance equals the length of the longer array, i.e., if the arrays are completely different.
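The normalization formula can be sketched as follows. The distance is computed with the Lowrance-Wagner algorithm for the unrestricted ("full") Damerau-Levenshtein distance. Unlike the restricted variant, this algorithm allows a transposed pair to be edited further. For example, turning "ca" into "abc" costs 2 instead of 3. This page does not state whether BaseX uses this exact algorithm. The class name is hypothetical. Returning 1.0 for two empty arrays is an assumption, because the formula would otherwise divide by zero.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: full Damerau-Levenshtein distance, normalized to 0.0 - 1.0. */
class NormalizedDistanceSketch {
  /** Unrestricted Damerau-Levenshtein distance (Lowrance-Wagner). */
  static int damerauLevenshtein(int[] a, int[] b) {
    int n = a.length, m = b.length, inf = n + m;
    // d is shifted by one: d[i + 1][j + 1] holds the distance of a[0..i) and b[0..j).
    int[][] d = new int[n + 2][m + 2];
    d[0][0] = inf;
    for (int i = 0; i <= n; i++) { d[i + 1][0] = inf; d[i + 1][1] = i; }
    for (int j = 0; j <= m; j++) { d[0][j + 1] = inf; d[1][j + 1] = j; }
    Map<Integer, Integer> lastRow = new HashMap<>(); // last row of a containing each codepoint
    for (int i = 1; i <= n; i++) {
      int lastCol = 0; // last column in this row where a[i - 1] matched b
      for (int j = 1; j <= m; j++) {
        int i1 = lastRow.getOrDefault(b[j - 1], 0);
        int j1 = lastCol;
        int cost = 1;
        if (a[i - 1] == b[j - 1]) { cost = 0; lastCol = j; }
        d[i + 1][j + 1] = Math.min(
            Math.min(d[i][j] + cost,        // substitution or match
                     d[i + 1][j] + 1),      // insertion
            Math.min(d[i][j + 1] + 1,       // deletion
                     d[i1][j1] + (i - i1 - 1) + 1 + (j - j1 - 1))); // transposition
      }
      lastRow.put(a[i - 1], i);
    }
    return d[n + 1][m + 1];
  }

  /** 1.0 - distance / max(length of a, length of b). */
  static double similarity(int[] a, int[] b) {
    int max = Math.max(a.length, b.length);
    if (max == 0) return 1.0; // assumption: two empty arrays count as equal
    return 1.0 - (double) damerauLevenshtein(a, b) / max;
  }
}
```

For "kitten" and "sitting", the distance is 3 and the longer array has 7 codepoints, so `similarity` yields 1.0 - 3/7, about 0.571.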

      Parameters:
      cps1 - first array
      cps2 - second array
      Returns:
      normalized value derived from the distance (0.0 - 1.0); 1.0 for equal arrays