Package org.basex.util.ft
Class FTLexer
java.lang.Object
org.basex.util.ft.FTIterator
org.basex.util.ft.FTLexer
- All Implemented Interfaces:
Iterator<FTSpan>, IndexSearch
Performs full-text lexing on a token. To do so, it calls the tokenizers and stemmers
that match the full-text options.
- Author:
- BaseX Team 2005-21, BSD License, Jens Erat
-
Constructor Summary
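Constructors
- FTLexer(): Constructor, using the default full-text options.
- FTLexer(FTOpt ftOpt): Default constructor.
-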
Method Summary
Modifier and Type, Method, Description
- FTLexer all(): If called, all tokens will be returned (including non-fulltext tokens).
- FTLexer copy(FTOpt opt): Returns a new lexer, adopting the tokenizer options.
- int count(): Returns total number of tokens.
- int errors(byte[] token): Returns the Levenshtein error for the specified token.
- FTLexer errors(int err): Sets the Levenshtein error if it hasn't been assigned yet.
- FTOpt ftOpt(): Returns the full-text options.
- boolean hasNext()
- int[][] info(): Gets full-text info for the specified token.
- FTLexer init(): Initializes the iterator.
- FTLexer init(byte[] txt): Initializes the iterator.
- static StringList languages(): Lists all languages for which tokenizers and stemmers are available.
- FTSpan next()
- byte[] nextToken(): Returns the next token.
- FTLexer original(): If called, the original tokens will be returned (including non-fulltext tokens).
- boolean paragraph(): Returns if the current token starts a new paragraph.
- int pos(int word, FTUnit unit): Calculates a position value, dependent on the specified unit.
- byte[] token(): Returns the original token.
- IndexType type(): Returns the index type.
Methods inherited from class org.basex.util.ft.FTIterator
remove
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.util.Iterator
forEachRemaining
-
Constructor Details
-
FTLexer
public FTLexer()
Constructor, using the default full-text options. Called by the serializer, FTFilter, and the map visualizations.
-
FTLexer
Default constructor.
- Parameters:
- ftOpt - full-text options (can be null)
-
-
Method Details
-
original
If called, the original tokens will be returned (including non-fulltext tokens).
- Returns:
- self reference
-
all
If called, all tokens will be returned (including non-fulltext tokens).
- Returns:
- self reference
-
init
Initializes the iterator.
- Returns:
- self reference
-
errors
Sets the Levenshtein error if it hasn't been assigned yet.
- Parameters:
- err - error
- Returns:
- self reference
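The "set only if not yet assigned" behaviour combined with a self reference can be illustrated with a minimal, self-contained sketch. It is not the BaseX implementation: the class `LexerConfigSketch`, its accessors, and the use of `-1` as the "unassigned" marker are assumptions made for illustration only.

```java
/** Hypothetical stand-in for a lexer configuration; not part of BaseX. */
final class LexerConfigSketch {
  /** Assumption: -1 marks an error value that has not been assigned yet. */
  private int error = -1;
  /** Flag toggled by all(). */
  private boolean all;

  /** Requests all tokens and returns this instance to allow chaining. */
  LexerConfigSketch all() {
    all = true;
    return this;
  }

  /** Assigns the error only if no value was assigned before; returns this instance. */
  LexerConfigSketch errors(final int err) {
    if(error == -1) error = err;
    return this;
  }

  /** Returns the assigned error, or -1 if none was assigned. */
  int error() {
    return error;
  }

  /** Returns whether all tokens were requested. */
  boolean returnsAll() {
    return all;
  }
}
```

Because each setter returns the instance itself, calls can be chained, for example `new LexerConfigSketch().all().errors(2).errors(5)`. In this sketch the second `errors` call has no effect, since a value was already assigned.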
-
errors
public int errors(byte[] token)
Returns the Levenshtein error for the specified token.
- Parameters:
- token - token
- Returns:
- error
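For readers unfamiliar with the term, a Levenshtein error presumably bounds the number of single-character edits tolerated when two tokens are compared. The following sketch shows the classic edit-distance computation on byte tokens. It is independent of BaseX: the class `LevenshteinSketch` and the helpers `withinError` and `bytes` are hypothetical names, and how BaseX derives the error for a given token is not shown here.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of BaseX: Levenshtein distance on byte tokens. */
final class LevenshteinSketch {
  /** Returns the number of single-byte insertions, deletions and substitutions that turn a into b. */
  static int distance(final byte[] a, final byte[] b) {
    int[] prev = new int[b.length + 1];
    int[] curr = new int[b.length + 1];
    // distance from the empty prefix of a to each prefix of b
    for(int j = 0; j <= b.length; j++) prev[j] = j;
    for(int i = 1; i <= a.length; i++) {
      curr[0] = i;
      for(int j = 1; j <= b.length; j++) {
        final int cost = a[i - 1] == b[j - 1] ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      // reuse the two rows instead of allocating a full matrix
      final int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length];
  }

  /** Returns true if b can be reached from a with at most err edits. */
  static boolean withinError(final byte[] a, final byte[] b, final int err) {
    return distance(a, b) <= err;
  }

  /** Converts a string to UTF-8 bytes (convenience for the examples). */
  static byte[] bytes(final String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }
}
```

With these helpers, `LevenshteinSketch.distance(bytes("kitten"), bytes("sitting"))` evaluates to 3, so the two tokens would match under an error of 3 but not under an error of 2.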
-
init
Description copied from class: FTIterator
Initializes the iterator.
- Specified by:
- init in class FTIterator
- Parameters:
- txt - text
- Returns:
- self reference
-
hasNext
public boolean hasNext()
-
next
-
nextToken
public byte[] nextToken()
Description copied from class: FTIterator
Returns the next token. May be called as an alternative to Iterator.next() to avoid the creation of new FTSpan instances.
- Specified by:
- nextToken in class FTIterator
- Returns:
- token
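The trade-off between the two access styles can be sketched with a small, self-contained iterator. This is an illustration under stated assumptions, not BaseX code: `SpanSketch` and `TokenIteratorSketch` are hypothetical classes, and splitting on single spaces stands in for real tokenization.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a span: token bytes plus token position. */
final class SpanSketch {
  final byte[] text;
  final int pos;

  SpanSketch(final byte[] text, final int pos) {
    this.text = text;
    this.pos = pos;
  }
}

/** Hypothetical space-separated tokenizer that offers both access styles. */
final class TokenIteratorSketch implements Iterator<SpanSketch> {
  private final byte[] input;
  private int offset;
  private int pos;

  TokenIteratorSketch(final String text) {
    input = text.getBytes(StandardCharsets.UTF_8);
  }

  @Override
  public boolean hasNext() {
    // skip separators so that offset points at the next token, if there is one
    while(offset < input.length && input[offset] == ' ') offset++;
    return offset < input.length;
  }

  /** Returns the next token as raw bytes, without creating a span object. */
  public byte[] nextToken() {
    if(!hasNext()) throw new NoSuchElementException();
    final int start = offset;
    while(offset < input.length && input[offset] != ' ') offset++;
    pos++;
    return Arrays.copyOfRange(input, start, offset);
  }

  @Override
  public SpanSketch next() {
    // wraps the raw token in a span; this is the extra allocation nextToken() avoids
    final byte[] token = nextToken();
    return new SpanSketch(token, pos - 1);
  }
}
```

A caller that only needs the token bytes can loop with `while(it.hasNext()) it.nextToken();`, while a caller that also needs the position uses `next()` and accepts one additional object per token.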
-
count
public int count()
Returns the total number of tokens.
- Returns:
- token count
-
type
Description copied from interface: IndexSearch
Returns the index type.
- Specified by:
- type in interface IndexSearch
- Returns:
- type
-
token
public byte[] token()
Returns the original token. Inherited from IndexSearch; use next() or nextToken() if not using this interface.
- Specified by:
- token in interface IndexSearch
- Returns:
- current token
-
ftOpt
Returns the full-text options.
- Returns:
- full-text options (may be null)
-
paragraph
public boolean paragraph()
Returns whether the current token starts a new paragraph. Needed for visualizations. Does not have to be implemented by all tokenizers; returns false if not implemented.
- Returns:
- boolean
-
pos
Calculates a position value, dependent on the specified unit. Does not have to be implemented by all tokenizers.
- Parameters:
- word - word position
- unit - unit
- Returns:
- new position (0 if not implemented)
-
info
public int[][] info()
Gets full-text info for the specified token. Needed for visualizations; see Tokenizer.info() for more info.
- Returns:
- int arrays or empty array if not implemented
-
copy
Returns a new lexer, adopting the tokenizer options.
- Parameters:
- opt - full-text options
- Returns:
- lexer
-
languages
Lists all languages for which tokenizers and stemmers are available.
- Returns:
- supported languages
-