Class FTLexer

All Implemented Interfaces:
Iterator<FTSpan>, IndexSearch

public final class FTLexer extends FTIterator implements IndexSearch
Performs full-text lexing on a token. To do so, it calls the tokenizers and stemmers that match the full-text options.
Author:
BaseX Team 2005-21, BSD License, Jens Erat
  • Constructor Summary

    Constructors
    Constructor
    Description
    FTLexer()
    Constructor, using the default full-text options.
    FTLexer(FTOpt ftOpt)
    Constructor, using the specified full-text options.
  • Method Summary

    Modifier and Type
    Method
    Description
    FTLexer
    all()
    If called, all tokens will be returned (including non-fulltext tokens).
    FTLexer
    copy(FTOpt opt)
    Returns a new lexer, adopting the tokenizer options.
    int
    count()
    Returns the total number of tokens.
    int
    errors(byte[] token)
    Returns the Levenshtein error for the specified token.
    FTLexer
    errors(int err)
    Sets the Levenshtein error if it hasn't been assigned yet.
    FTOpt
    ftOpt()
    Returns the full-text options.
    boolean
    hasNext()
     
    int[][]
    info()
    Gets full-text info for the specified token.
    FTLexer
    init()
    Initializes the iterator.
    FTLexer
    init(byte[] txt)
    Initializes the iterator.
    static StringList
    languages()
    Lists all languages for which tokenizers and stemmers are available.
    FTSpan
    next()
     
    byte[]
    nextToken()
    Returns the next token.
    FTLexer
    original()
    If called, the original tokens will be returned (including non-fulltext tokens).
    boolean
    paragraph()
    Returns whether the current token starts a new paragraph.
    int
    pos(int word, FTUnit unit)
    Calculates a position value, dependent on the specified unit.
    byte[]
    token()
    Returns the original token.
    IndexType
    type()
    Returns the index type.

    Methods inherited from class org.basex.util.ft.FTIterator

    remove

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface java.util.Iterator

    forEachRemaining
  • Constructor Details

    • FTLexer

      public FTLexer()
      Constructor, using the default full-text options. Called by the serializer, FTFilter, and the map visualizations.
    • FTLexer

      public FTLexer(FTOpt ftOpt)
      Constructor, using the specified full-text options.
      Parameters:
      ftOpt - full-text options (can be null)
  • Method Details

    • original

      public FTLexer original()
      If called, the original tokens will be returned (including non-fulltext tokens).
      Returns:
      self reference
    • all

      public FTLexer all()
      If called, all tokens will be returned (including non-fulltext tokens).
      Returns:
      self reference
    • init

      public FTLexer init()
      Initializes the iterator.
      Returns:
      self reference
    • errors

      public FTLexer errors(int err)
      Sets the Levenshtein error if it hasn't been assigned yet.
      Parameters:
      err - error
      Returns:
      self reference
    • errors

      public int errors(byte[] token)
      Returns the Levenshtein error for the specified token.
      Parameters:
      token - token
      Returns:
      error
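      The two errors methods pair a set-once setter with a per-token getter. The following is a minimal sketch of that pairing, not the BaseX implementation: the class ErrorBudget is hypothetical, treating 0 as "not assigned yet" is an assumption, and so is the fallback of a quarter of the token length when no explicit value was set.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration; not the BaseX FTLexer.
final class ErrorBudget {
  /** Explicit error count; 0 means "not assigned yet" (an assumption of this sketch). */
  private int errors;

  /** Sets the error count only if none has been assigned so far; returns this for chaining. */
  ErrorBudget errors(final int err) {
    if (errors == 0) errors = err;
    return this;
  }

  /** Returns the explicit error count, or an assumed fallback of a quarter of the token length. */
  int errors(final byte[] token) {
    return errors != 0 ? errors : token.length / 4;
  }

  public static void main(final String[] args) {
    final byte[] token = "database".getBytes(StandardCharsets.UTF_8);
    // No explicit value: the fallback applies (8 bytes / 4).
    System.out.println(new ErrorBudget().errors(token)); // prints 2
    // The first assignment wins; the second call is ignored.
    System.out.println(new ErrorBudget().errors(1).errors(3).errors(token)); // prints 1
  }
}
```

      Because the setter only takes effect once, a caller can pass a value along without overriding one that was assigned earlier.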
    • init

      public FTLexer init(byte[] txt)
      Description copied from class: FTIterator
      Initializes the iterator.
      Specified by:
      init in class FTIterator
      Parameters:
      txt - text
      Returns:
      self reference
    • hasNext

      public boolean hasNext()
      Specified by:
      hasNext in interface Iterator<FTSpan>
    • next

      public FTSpan next()
      Specified by:
      next in interface Iterator<FTSpan>
    • nextToken

      public byte[] nextToken()
      Description copied from class: FTIterator
      Returns the next token. May be called as an alternative to Iterator.next() to avoid the creation of new FTSpan instances.
      Specified by:
      nextToken in class FTIterator
      Returns:
      token
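      The init, hasNext and nextToken methods form the usual iteration loop: initialize with a text, then pull raw byte[] tokens until none are left. The sketch below illustrates that loop with a hypothetical whitespace lexer named SimpleLexer. It is a stand-in for illustration only and does not reproduce the tokenizing or stemming that FTLexer delegates to.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical whitespace lexer for illustration; not the BaseX FTLexer.
final class SimpleLexer {
  private byte[] text = {};
  private int pos;

  /** Sets the input and resets the cursor; returns this for chaining. */
  SimpleLexer init(final byte[] txt) {
    text = txt;
    pos = 0;
    return this;
  }

  /** Treats ASCII control characters and the space character as separators. */
  private static boolean isSpace(final byte b) {
    return (b & 0xFF) <= ' ';
  }

  /** Skips leading separators and reports whether another token follows. */
  boolean hasNext() {
    while (pos < text.length && isSpace(text[pos])) pos++;
    return pos < text.length;
  }

  /** Returns the next token as raw bytes, without a wrapper object; call hasNext() first. */
  byte[] nextToken() {
    final int start = pos;
    while (pos < text.length && !isSpace(text[pos])) pos++;
    return Arrays.copyOfRange(text, start, pos);
  }

  /** Collects all tokens of the input as strings. */
  static List<String> tokens(final String input) {
    final SimpleLexer lexer = new SimpleLexer().init(input.getBytes(StandardCharsets.UTF_8));
    final List<String> result = new ArrayList<>();
    while (lexer.hasNext()) result.add(new String(lexer.nextToken(), StandardCharsets.UTF_8));
    return result;
  }

  public static void main(final String[] args) {
    System.out.println(tokens("full text  lexing")); // prints [full, text, lexing]
  }
}
```

      Returning this from init allows construction and initialization in one expression, which mirrors the "self reference" return values documented above.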
    • count

      public int count()
      Returns the total number of tokens.
      Returns:
      token count
    • type

      public IndexType type()
      Description copied from interface: IndexSearch
      Returns the index type.
      Specified by:
      type in interface IndexSearch
      Returns:
      type
    • token

      public byte[] token()
      Returns the original token. Inherited from IndexSearch; use next() or nextToken() if not using this interface.
      Specified by:
      token in interface IndexSearch
      Returns:
      current token
    • ftOpt

      public FTOpt ftOpt()
      Returns the full-text options.
      Returns:
      full-text options (may be null)
    • paragraph

      public boolean paragraph()
      Returns whether the current token starts a new paragraph. Needed for visualizations. Not every tokenizer has to implement this; returns false if not implemented.
      Returns:
      true if the current token starts a new paragraph
    • pos

      public int pos(int word, FTUnit unit)
      Calculates a position value, dependent on the specified unit. Not every tokenizer has to implement this.
      Parameters:
      word - word position
      unit - unit
      Returns:
      new position (0 if not implemented)
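      One way to read "a position value, dependent on the specified unit" is that a word index is mapped to the index of the sentence or paragraph containing it. The sketch below works under that assumption. The Unit enum, the class UnitPositions and the boundary rules (sentence-ending punctuation, empty lines between paragraphs) are hypothetical and are not taken from FTUnit or any BaseX tokenizer.

```java
// Hypothetical sketch; the enum and the counting rules are assumptions of this example.
final class UnitPositions {
  enum Unit { WORDS, SENTENCES, PARAGRAPHS }

  /**
   * Maps a zero-based word index to a zero-based position in the requested unit.
   * Assumed rules: a word ending in '.', '!' or '?' closes a sentence,
   * and an empty line closes a paragraph.
   */
  static int pos(final String text, final int word, final Unit unit) {
    if (unit == Unit.WORDS) return word;
    int position = 0;
    int seen = 0;
    for (final String paragraph : text.split("\\n\\s*\\n")) {
      for (final String w : paragraph.trim().split("\\s+")) {
        if (w.isEmpty()) continue;
        // The requested word is reached: report the unit it belongs to.
        if (seen++ == word) return position;
        if (unit == Unit.SENTENCES && w.matches(".*[.!?]")) position++;
      }
      if (unit == Unit.PARAGRAPHS) position++;
    }
    return position;
  }

  public static void main(final String[] args) {
    final String text = "One two. Three four.\n\nFive six.";
    System.out.println(pos(text, 2, Unit.SENTENCES));  // prints 1 ("Three" is in the second sentence)
    System.out.println(pos(text, 4, Unit.PARAGRAPHS)); // prints 1 ("Five" is in the second paragraph)
  }
}
```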
    • info

      public int[][] info()
      Gets full-text info for the specified token. Needed for visualizations; see Tokenizer.info() for more info.
      Returns:
      int arrays or empty array if not implemented
    • copy

      public FTLexer copy(FTOpt opt)
      Returns a new lexer, adopting the tokenizer options.
      Parameters:
      opt - full-text options
      Returns:
      lexer
    • languages

      public static StringList languages()
      Lists all languages for which tokenizers and stemmers are available.
      Returns:
      supported languages