Class Token

java.lang.Object
org.basex.util.Token

public final class Token extends Object

This class provides convenience operations for handling 'Tokens'. A token is a UTF-8 encoded string. It is represented as a byte array.

In order to ensure a consistent representation of tokens in the project, all string conversions should be done via the methods of this class.

Author:
BaseX Team 2005-21, BSD License, Christian Gruen
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final byte[]
    COLON
    Colon.
    static final Comparator<byte[]>
    COMPARATOR
    Comparator for byte arrays.
    static final DecimalFormat
    DD
    Decimal double output.
    static final DecimalFormat
    DF
    Decimal float output.
    static final byte[]
    DOLLAR
    Dollar.
    static final byte[]
    EMPTY
    Empty token.
    static final byte[]
    FALSE
    Token 'false'.
    static final byte[]
    HEX_TABLE
    Hex codes.
    static final byte[]
    ID
    ID token.
    static final byte[]
    INF
    Token 'INF'.
    static final byte[]
    INFINITY
    Token 'Infinity'.
    static final Comparator<byte[]>
    LC_COMPARATOR
    Case-insensitive comparator for byte arrays.
    static final DecimalFormatSymbols
    LOC
    US charset.
    static final byte[]
    MIN_INT
    Minimum integer.
    static final byte[]
    MIN_LONG
    Minimum long value.
    static final byte[]
    NAN
    Token 'NaN'.
    static final byte[]
    NEGATIVE_INFINITY
    Token '-Infinity'.
    static final byte[]
    NEGATIVE_ZERO
    Number '-0'.
    static final byte[]
    NEGATVE_INF
    Token '-INF'.
    static final byte[]
    ONE
    Number '1'.
    static final byte[]
    REF
    IDRef token.
    static final char
    REPLACEMENT
    Unicode replacement codepoint (\uFFFD).
    static final DecimalFormat
    SD
    Scientific double output.
    static final DecimalFormat
    SF
    Scientific float output.
    static final byte[]
    SLASH
    Slash.
    static final byte[]
    SPACE
    Space.
    static final byte[]
    TRUE
    Token 'true'.
    static final byte[]
    XML
    XML token.
    static final byte[]
    XML_COLON
    XML token with colon.
    static final byte[]
    XMLNS
    XMLNS token.
    static final byte[]
    XMLNS_COLON
    XMLNS token with colon.
    static final byte[]
    ZERO
    Number '0'.
  • Method Summary

    Modifier and Type
    Method
    Description
    static boolean
    ascii(byte[] token)
    Checks if the specified token only consists of ASCII characters.
    static byte[]
    chop(byte[] token, int max)
    Chops a token to the specified length.
    static byte[]
    chopNumber(byte[] token)
    Finishes the numeric token, removing trailing zeroes.
    static int
    cl(byte cp)
    Returns the byte length of the UTF-8 character that starts with the specified byte.
    static int
    cl(byte[] token, int pos)
    Returns the byte length of a codepoint at the specified position.
    static byte[]
    concat(byte[]... tokens)
    Concatenates multiple tokens.
    static byte[]
    concat(Object... objects)
    Concatenates multiple objects.
    static boolean
    contains(byte[] token, byte[] sub)
    Checks if the first token contains the second token.
    static boolean
    contains(byte[] token, byte[] sub, int pos)
    Checks if the first token contains the second token.
    static boolean
    contains(byte[] token, int ch)
    Checks if the first token contains the specified character.
    static int
    cp(byte[] token, int pos)
    Returns the codepoint (Unicode value) of the specified token, starting at the specified position.
    static int
    cpLength(int cp)
    Returns the byte length of a codepoint.
    static int[]
    cps(byte[] token)
    Converts a token to a sequence of codepoints.
    static byte[]
    cpToken(int cp)
    Converts a codepoint to a token.
    static int
    dec(int ch)
    Converts a hex character to an integer value.
    static int
    dec(int ch1, int ch2)
    Converts hex characters to an integer value.
    static byte[]
    decodeUri(byte[] token)
    Returns a URI decoded token.
    static byte[]
    delete(byte[] token, int ch)
    Deletes a character from a token.
    static int
    diff(byte[] token, byte[] compare)
    Compares two tokens lexicographically.
    static boolean
    digit(int ch)
    Checks if the specified character is a digit (0 - 9).
    static byte[][]
    distinctTokens(byte[] token)
    Normalizes the specified input and returns its distinct tokens.
    static byte[]
    encodeUri(byte[] token, boolean iri)
    Returns a URI encoded token.
    static boolean
    endsWith(byte[] token, byte[] sub)
    Checks if the first token ends with the second token.
    static boolean
    endsWith(byte[] token, int ch)
    Checks if the first token ends with the specified character.
    static boolean
    eq(byte[] token1, byte[] token2)
    Compares two tokens for equality.
    static boolean
    eq(byte[] token, byte[]... tokens)
    Compares several tokens for equality.
    static byte[]
    escape(byte[] token)
    Escapes the specified token.
    static int
    hash(byte[] token)
    Calculates a hash code for the specified token.
    static byte[]
    hex(byte[] value, boolean uc)
    Returns a hex representation of the specified byte array.
    static int
    indexOf(byte[] token, byte[] sub)
    Returns the position of the specified token or -1.
    static int
    indexOf(byte[] token, byte[] sub, int pos)
    Returns the position of the specified token or -1.
    static int
    indexOf(byte[] token, int ch)
    Returns the position of the specified character or -1.
    static int
    lastIndexOf(byte[] token, int ch)
    Returns the last position of the specified character or -1.
    static byte[]
    lc(byte[] token)
    Converts the specified token to lower case.
    static int
    lc(int ch)
    Converts a character to lower case.
    static int
    length(byte[] token)
    Returns the number of codepoints in the token.
    static boolean
    letter(int ch)
    Checks if the specified character is a computer letter (A - Z, a - z, _).
    static boolean
    letterOrDigit(int ch)
    Checks if the specified character is a computer letter or digit.
    static byte[]
    local(byte[] name)
    Returns the local name of the specified name.
    static byte[]
    max(byte[] token, byte[] compare)
    Returns the bigger token.
    static byte[]
    min(byte[] token, byte[] compare)
    Returns the smaller token.
    static byte[]
    normalize(byte[] token)
    Normalizes all whitespace occurrences in the specified token.
    static int
    numDigits(int integer)
    Returns the number of digits of the specified integer.
    static byte[]
    prefix(byte[] name)
    Returns the prefix of the specified token.
    static byte[]
    replace(byte[] token, int search, int replace)
    Replaces the specified character and returns the result token.
    static byte[][]
    split(byte[] token, int sep)
    Splits a token around matches of the given separator.
    static boolean
    startsWith(byte[] token, byte[] sub)
    Checks if the first token starts with the second token.
    static boolean
    startsWith(byte[] token, byte[] sub, int pos)
    Checks if the first token starts with the second token.
    static boolean
    startsWith(byte[] token, int ch)
    Checks if the first token starts with the specified character.
    static String
    string(byte[] token)
    Returns the specified token as string.
    static String
    string(byte[] token, int start, int length)
    Converts the specified token to a string.
    static byte[]
    substring(byte[] token, int start)
    Returns a substring of the specified token.
    static byte[]
    substring(byte[] token, int start, int end)
    Returns a substring of the specified token.
    static byte[]
    subtoken(byte[] token, int start)
    Returns a partial token.
    static byte[]
    subtoken(byte[] token, int start, int end)
    Returns a partial token.
    static byte[]
    tc(byte[] token)
    Converts the specified token to title case.
    static double
    toDouble(byte[] token)
    Converts the specified token into a double value.
    static int
    toInt(byte[] token)
    Converts the specified token into an integer value.
    static byte[]
    token(boolean bool)
    Creates a byte array representation of the specified boolean value.
    static byte[]
    token(double dbl)
    Creates a byte array representation from the specified double value.
    static byte[]
    token(float flt)
    Creates a byte array representation from the specified float value.
    static byte[]
    token(int integer)
    Creates a byte array representation of the specified integer value.
    static byte[]
    token(long integer)
    Creates a byte array representation from the specified long value, using Java's standard method.
    static byte[]
    token(Object object)
    Returns a token representation of the specified object.
    static byte[]
    token(String string)
    Converts a string to a byte array.
    static byte[][]
    tokens(String... strings)
    Converts the specified strings to tokens.
    static long
    toLong(byte[] token)
    Converts the specified token into a long value.
    static long
    toLong(byte[] token, int start, int end)
    Converts the specified token into a long value.
    static byte[]
    trim(byte[] token)
    Removes leading and trailing whitespaces from the specified token.
    static byte[]
    uc(byte[] token)
    Converts the specified token to upper case.
    static int
    uc(int ch)
    Converts a character to upper case.
    static byte[]
    utf8(byte[] token, String encoding)
    Converts a token from the input encoding to UTF-8.
    static boolean
    ws(byte[] token)
    Checks if the specified token has only whitespaces.
    static boolean
    ws(int ch)
    Checks if the specified character is a whitespace.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • EMPTY

      public static final byte[] EMPTY
      Empty token.
    • XML

      public static final byte[] XML
      XML token.
    • XML_COLON

      public static final byte[] XML_COLON
      XML token with colon.
    • XMLNS

      public static final byte[] XMLNS
      XMLNS token.
    • XMLNS_COLON

      public static final byte[] XMLNS_COLON
      XMLNS token with colon.
    • ID

      public static final byte[] ID
      ID token.
    • REF

      public static final byte[] REF
      IDRef token.
    • TRUE

      public static final byte[] TRUE
      Token 'true'.
    • FALSE

      public static final byte[] FALSE
      Token 'false'.
    • NAN

      public static final byte[] NAN
      Token 'NaN'.
    • INF

      public static final byte[] INF
      Token 'INF'.
    • NEGATVE_INF

      public static final byte[] NEGATVE_INF
      Token '-INF'.
    • INFINITY

      public static final byte[] INFINITY
      Token 'Infinity'.
    • NEGATIVE_INFINITY

      public static final byte[] NEGATIVE_INFINITY
      Token '-Infinity'.
    • MIN_LONG

      public static final byte[] MIN_LONG
      Minimum long value.
    • MIN_INT

      public static final byte[] MIN_INT
      Minimum integer.
    • SPACE

      public static final byte[] SPACE
      Space.
    • ZERO

      public static final byte[] ZERO
      Number '0'.
    • NEGATIVE_ZERO

      public static final byte[] NEGATIVE_ZERO
      Number '-0'.
    • ONE

      public static final byte[] ONE
      Number '1'.
    • SLASH

      public static final byte[] SLASH
      Slash.
    • COLON

      public static final byte[] COLON
      Colon.
    • DOLLAR

      public static final byte[] DOLLAR
      Dollar.
    • COMPARATOR

      public static final Comparator<byte[]> COMPARATOR
      Comparator for byte arrays.
    • LC_COMPARATOR

      public static final Comparator<byte[]> LC_COMPARATOR
      Case-insensitive comparator for byte arrays.
    • REPLACEMENT

      public static final char REPLACEMENT
      Unicode replacement codepoint (\uFFFD).
    • LOC

      public static final DecimalFormatSymbols LOC
      US charset.
    • SD

      public static final DecimalFormat SD
      Scientific double output.
    • DD

      public static final DecimalFormat DD
      Decimal double output.
    • SF

      public static final DecimalFormat SF
      Scientific float output.
    • DF

      public static final DecimalFormat DF
      Decimal float output.
    • HEX_TABLE

      public static final byte[] HEX_TABLE
      Hex codes.
  • Method Details

    • string

      public static String string(byte[] token)
      Returns the specified token as string.
      Parameters:
      token - token
      Returns:
      string
    • string

      public static String string(byte[] token, int start, int length)
      Converts the specified token to a string.
      Parameters:
      token - token
      start - start position
      length - length
      Returns:
      string
    • ascii

      public static boolean ascii(byte[] token)
      Checks if the specified token only consists of ASCII characters.
      Parameters:
      token - token
      Returns:
      result of check
    • token

      public static byte[] token(String string)
      Converts a string to a byte array. All strings should be converted by this function to guarantee a consistent character conversion.
      Parameters:
      string - string to be converted
      Returns:
      byte array
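
The round trip between strings and tokens can be pictured with plain JDK calls. This is only a sketch, assuming that token(String) and string(byte[]) behave like standard UTF-8 encoding and decoding; the class Utf8RoundTrip and its methods toToken and fromToken are hypothetical names, not part of BaseX.

```java
import java.nio.charset.StandardCharsets;

final class Utf8RoundTrip {
  // Encodes a string as UTF-8 bytes (hypothetical stand-in for Token.token(String)).
  static byte[] toToken(String string) {
    return string.getBytes(StandardCharsets.UTF_8);
  }

  // Decodes UTF-8 bytes into a string (hypothetical stand-in for Token.string(byte[])).
  static String fromToken(byte[] token) {
    return new String(token, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] token = toToken("caf\u00e9");
    // Three ASCII characters plus a two-byte character.
    System.out.println(token.length);
    System.out.println(fromToken(token).equals("caf\u00e9"));
  }
}
```

Routing every conversion through one pair of methods, as the class description asks, avoids accidental use of the platform default charset.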
    • tokens

      public static byte[][] tokens(String... strings)
      Converts the specified strings to tokens.
      Parameters:
      strings - strings
      Returns:
      tokens
    • utf8

      public static byte[] utf8(byte[] token, String encoding)
      Converts a token from the input encoding to UTF-8.
      Parameters:
      token - token to be converted
      encoding - input encoding
      Returns:
      byte array
    • token

      public static byte[] token(Object object)
      Returns a token representation of the specified object.
      Parameters:
      object - object
      Returns:
      token
    • cp

      public static int cp(byte[] token, int pos)
      Returns the codepoint (Unicode value) of the specified token, starting at the specified position. Returns the Unicode replacement character for invalid values.
      Parameters:
      token - token
      pos - character position
      Returns:
      current character
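
To make the decoding step concrete, here is a minimal sketch of reading one codepoint from a UTF-8 byte array at a byte position. It is not the BaseX implementation: the class CodepointDecoder and the method codepointAt are hypothetical, and the sketch assumes that malformed or truncated sequences map to U+FFFD. It does not reject overlong encodings or surrogate values.

```java
final class CodepointDecoder {
  static final int REPLACEMENT = 0xFFFD;

  // Decodes the codepoint that starts at byte offset pos.
  static int codepointAt(byte[] token, int pos) {
    int b = token[pos] & 0xFF;
    // Sequence length derived from the lead byte; 0 marks an invalid lead byte.
    int len = b < 0x80 ? 1 : (b & 0xE0) == 0xC0 ? 2
        : (b & 0xF0) == 0xE0 ? 3 : (b & 0xF8) == 0xF0 ? 4 : 0;
    if (len == 0 || pos + len > token.length) return REPLACEMENT;
    if (len == 1) return b;
    // Keep the payload bits of the lead byte, then append 6 bits per continuation byte.
    int cp = b & (0xFF >> (len + 1));
    for (int i = 1; i < len; i++) {
      int c = token[pos + i] & 0xFF;
      if ((c & 0xC0) != 0x80) return REPLACEMENT;
      cp = cp << 6 | (c & 0x3F);
    }
    return cp;
  }

  public static void main(String[] args) {
    byte[] euro = { (byte) 0xE2, (byte) 0x82, (byte) 0xAC };
    System.out.println(Integer.toHexString(codepointAt(euro, 0))); // 20ac
  }
}
```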
    • cl

      public static int cl(byte cp)
      Returns the byte length of the UTF-8 character that starts with the specified byte.
      Parameters:
      cp - first byte of the character
      Returns:
      character length
    • cl

      public static int cl(byte[] token, int pos)
      Returns the byte length of a codepoint at the specified position.
      Parameters:
      token - token
      pos - position
      Returns:
      character length
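
The length of a UTF-8 character is determined by its first byte alone. The sketch below shows that rule with bit masks; the class Utf8Length and the method charLength are hypothetical, and returning 1 for continuation or invalid lead bytes is an assumption of this sketch, not documented behavior of cl.

```java
final class Utf8Length {
  // Returns the number of bytes in the UTF-8 sequence that starts with the given byte.
  static int charLength(byte first) {
    int b = first & 0xFF;
    if (b < 0x80) return 1;             // 0xxxxxxx: ASCII
    if ((b & 0xE0) == 0xC0) return 2;   // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3;   // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4;   // 11110xxx
    return 1;                           // continuation or invalid byte (assumed fallback)
  }

  public static void main(String[] args) {
    System.out.println(charLength((byte) 0xE2)); // 3
  }
}
```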
    • cps

      public static int[] cps(byte[] token)
      Converts a token to a sequence of codepoints.
      Parameters:
      token - token
      Returns:
      codepoints
    • cpToken

      public static byte[] cpToken(int cp)
      Converts a codepoint to a token.
      Parameters:
      cp - codepoint of the character
      Returns:
      token
    • cpLength

      public static int cpLength(int cp)
      Returns the byte length of a codepoint.
      Parameters:
      cp - codepoint of the character
      Returns:
      length
    • length

      public static int length(byte[] token)
      Returns the number of codepoints in the token.
      Parameters:
      token - token
      Returns:
      number of codepoints
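
Counting codepoints differs from taking the array length as soon as a token contains non-ASCII characters. A minimal sketch, assuming well-formed UTF-8 input: every byte that is not a continuation byte (10xxxxxx) starts a new codepoint. The class CodepointCount is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

final class CodepointCount {
  // Counts codepoints by counting all bytes that are not continuation bytes.
  static int count(byte[] token) {
    int n = 0;
    for (byte b : token) {
      if ((b & 0xC0) != 0x80) n++;
    }
    return n;
  }

  public static void main(String[] args) {
    // 'a' (1 byte), euro sign (3 bytes), emoji (4 bytes).
    byte[] token = "a\u20ac\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
    System.out.println(token.length + " bytes, " + count(token) + " codepoints");
  }
}
```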
    • token

      public static byte[] token(boolean bool)
      Creates a byte array representation of the specified boolean value.
      Parameters:
      bool - boolean value to be converted
      Returns:
      boolean value in byte array
    • token

      public static byte[] token(int integer)
      Creates a byte array representation of the specified integer value.
      Parameters:
      integer - int value to be converted
      Returns:
      integer value in byte array
    • numDigits

      public static int numDigits(int integer)
      Returns the number of digits of the specified integer.
      Parameters:
      integer - number to be checked
      Returns:
      number of digits
    • token

      public static byte[] token(long integer)
      Creates a byte array representation from the specified long value, using Java's standard method.
      Parameters:
      integer - value to be converted
      Returns:
      byte array
    • token

      public static byte[] token(double dbl)
      Creates a byte array representation from the specified double value.
      Parameters:
      dbl - double value to be converted
      Returns:
      byte array
    • token

      public static byte[] token(float flt)
      Creates a byte array representation from the specified float value.
      Parameters:
      flt - float value to be converted
      Returns:
      byte array
    • chopNumber

      public static byte[] chopNumber(byte[] token)
      Finishes the numeric token, removing trailing zeroes.
      Parameters:
      token - token to be modified
      Returns:
      token
    • toDouble

      public static double toDouble(byte[] token)
      Converts the specified token into a double value.
      Parameters:
      token - token to be converted
      Returns:
      resulting double value, or Double.NaN if the input is invalid
    • toLong

      public static long toLong(byte[] token)
      Converts the specified token into a long value. Long.MIN_VALUE is returned if the input is invalid. Note that this may also be the actual value (MIN_LONG).
      Parameters:
      token - token to be converted
      Returns:
      resulting long value
    • toLong

      public static long toLong(byte[] token, int start, int end)
      Converts the specified token into a long value. Long.MIN_VALUE is returned if the input is invalid. Note that this may also be the actual value (MIN_LONG).
      Parameters:
      token - token to be converted
      start - first byte to be parsed
      end - last byte to be parsed - exclusive
      Returns:
      resulting long value
    • toInt

      public static int toInt(byte[] token)
      Converts the specified token into an integer value. Integer.MIN_VALUE is returned if the input is invalid.
      Parameters:
      token - token to be converted
      Returns:
      resulting integer value
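
Because the conversion methods signal failure with a sentinel value rather than an exception, callers may want to tell "invalid input" apart from a token that really spells the minimum value. The following sketch illustrates that pattern with JDK parsing only. SentinelParse, toLongOrMin and isInvalid are hypothetical names, and the set of inputs accepted by Long.parseLong may differ from what toLong accepts.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SentinelParse {
  // Token spelling of Long.MIN_VALUE, comparable to the MIN_LONG field.
  static final byte[] MIN_LONG_TOKEN =
      Long.toString(Long.MIN_VALUE).getBytes(StandardCharsets.US_ASCII);

  // Parses the token, returning Long.MIN_VALUE as a sentinel for invalid input.
  static long toLongOrMin(byte[] token) {
    try {
      return Long.parseLong(new String(token, StandardCharsets.US_ASCII));
    } catch (NumberFormatException ex) {
      return Long.MIN_VALUE;
    }
  }

  // True if the sentinel was returned and the token is not the canonical minimum.
  // Other spellings of the minimum (for example with leading zeros) are not handled here.
  static boolean isInvalid(byte[] token) {
    return toLongOrMin(token) == Long.MIN_VALUE && !Arrays.equals(token, MIN_LONG_TOKEN);
  }

  public static void main(String[] args) {
    System.out.println(isInvalid("abc".getBytes(StandardCharsets.US_ASCII))); // true
    System.out.println(isInvalid(MIN_LONG_TOKEN));                            // false
  }
}
```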
    • hash

      public static int hash(byte[] token)
      Calculates a hash code for the specified token.
      Parameters:
      token - specified token
      Returns:
      hash code
    • eq

      public static boolean eq(byte[] token1, byte[] token2)
      Compares two tokens for equality.
      Parameters:
      token1 - first token (can be null)
      token2 - token to be compared (can be null)
      Returns:
      true if the tokens are equal
    • eq

      public static boolean eq(byte[] token, byte[]... tokens)
      Compares several tokens for equality.
      Parameters:
      token - token (can be null)
      tokens - tokens to be compared (single tokens can be null)
      Returns:
      true if one test is successful
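
The varargs variant returns true as soon as any candidate matches. A small sketch of that any-match logic with java.util.Arrays follows; AnyEqual and eqAny are hypothetical names, and treating two null references as equal is an assumption based on the null-tolerant parameters documented above.

```java
import java.util.Arrays;

final class AnyEqual {
  // Returns true if the token equals at least one of the candidates.
  // Arrays.equals treats two null references as equal.
  static boolean eqAny(byte[] token, byte[]... tokens) {
    for (byte[] candidate : tokens) {
      if (Arrays.equals(token, candidate)) return true;
    }
    return false;
  }

  public static void main(String[] args) {
    byte[] a = { 'a' }, x = { 'x' };
    System.out.println(eqAny(a, x, new byte[] { 'a' })); // true
    System.out.println(eqAny(a, x));                     // false
  }
}
```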
    • diff

      public static int diff(byte[] token, byte[] compare)
      Compares two tokens lexicographically.
      Parameters:
      token - first token
      compare - token to be compared
      Returns:
      0 if tokens are equal, negative if first token is smaller, positive if first token is bigger
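
A lexicographic comparison of byte arrays can be sketched as follows. The sketch assumes that bytes are compared as unsigned values, which makes the byte order of UTF-8 tokens agree with codepoint order; whether diff does exactly this is not stated above. LexCompare is a hypothetical class name.

```java
final class LexCompare {
  // Compares two byte arrays byte by byte, treating bytes as unsigned values.
  static int diff(byte[] token, byte[] compare) {
    int n = Math.min(token.length, compare.length);
    for (int i = 0; i < n; i++) {
      int d = (token[i] & 0xFF) - (compare[i] & 0xFF);
      if (d != 0) return d;
    }
    // All shared bytes are equal: the shorter array is the smaller one.
    return token.length - compare.length;
  }

  public static void main(String[] args) {
    System.out.println(diff(new byte[] { 'a' }, new byte[] { 'b' }) < 0); // true
  }
}
```

On Java 9 or later, Arrays.compareUnsigned(byte[], byte[]) provides the same ordering.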
    • min

      public static byte[] min(byte[] token, byte[] compare)
      Returns the smaller token.
      Parameters:
      token - first token
      compare - token to be compared
      Returns:
      smaller token
    • max

      public static byte[] max(byte[] token, byte[] compare)
      Returns the bigger token.
      Parameters:
      token - first token
      compare - token to be compared
      Returns:
      bigger token
    • contains

      public static boolean contains(byte[] token, byte[] sub)
      Checks if the first token contains the second token.
      Parameters:
      token - token
      sub - token to be found
      Returns:
      result of test
    • contains

      public static boolean contains(byte[] token, byte[] sub, int pos)
      Checks if the first token contains the second token.
      Parameters:
      token - token
      sub - token to be found
      pos - start position
      Returns:
      result of test
    • contains

      public static boolean contains(byte[] token, int ch)
      Checks if the first token contains the specified character.
      Parameters:
      token - token
      ch - character to be found
      Returns:
      result of test
    • indexOf

      public static int indexOf(byte[] token, int ch)
      Returns the position of the specified character or -1.
      Parameters:
      token - token
      ch - character to be found
      Returns:
      position or -1
    • lastIndexOf

      public static int lastIndexOf(byte[] token, int ch)
      Returns the last position of the specified character or -1.
      Parameters:
      token - token
      ch - character to be found
      Returns:
      position or -1
    • indexOf

      public static int indexOf(byte[] token, byte[] sub)
      Returns the position of the specified token or -1.
      Parameters:
      token - token
      sub - token to be found
      Returns:
      position or -1
    • indexOf

      public static int indexOf(byte[] token, byte[] sub, int pos)
      Returns the position of the specified token or -1.
      Parameters:
      token - token
      sub - token to be found
      pos - start position
      Returns:
      position or -1
    • startsWith

      public static boolean startsWith(byte[] token, int ch)
      Checks if the first token starts with the specified character.
      Parameters:
      token - token
      ch - character to be found
      Returns:
      result of test
    • startsWith

      public static boolean startsWith(byte[] token, byte[] sub)
      Checks if the first token starts with the second token.
      Parameters:
      token - token
      sub - token to be found
      Returns:
      result of test
    • startsWith

      public static boolean startsWith(byte[] token, byte[] sub, int pos)
      Checks if the first token starts with the second token.
      Parameters:
      token - token
      sub - token to be found
      pos - start position
      Returns:
      result of test
    • endsWith

      public static boolean endsWith(byte[] token, int ch)
      Checks if the first token ends with the specified character.
      Parameters:
      token - token
      ch - character to be found
      Returns:
      result of test
    • endsWith

      public static boolean endsWith(byte[] token, byte[] sub)
      Checks if the first token ends with the second token.
      Parameters:
      token - token
      sub - token to be found
      Returns:
      result of test
    • substring

      public static byte[] substring(byte[] token, int start)
      Returns a substring of the specified token. Note that this method ignores Unicode codepoints; use subtoken(byte[], int) instead.
      Parameters:
      token - input token
      start - start position
      Returns:
      substring
    • substring

      public static byte[] substring(byte[] token, int start, int end)
      Returns a substring of the specified token. Note that this method ignores Unicode codepoints; use subtoken(byte[], int, int) instead.
      Parameters:
      token - input token
      start - start position
      end - end position
      Returns:
      substring
    • subtoken

      public static byte[] subtoken(byte[] token, int start)
      Returns a partial token.
      Parameters:
      token - input token
      start - start position
      Returns:
      resulting text
    • subtoken

      public static byte[] subtoken(byte[] token, int start, int end)
      Returns a partial token.
      Parameters:
      token - input token
      start - start position
      end - end position
      Returns:
      resulting text
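
The difference between slicing by bytes and slicing by codepoints shows up as soon as a token contains multi-byte characters. The sketch below contrasts the two with JDK calls only. It assumes that substring works on byte offsets and that subtoken counts positions in codepoints; SliceSketch, byteSlice and codepointSlice are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SliceSketch {
  // Byte-based slice: start is a byte offset, so it may cut a character in half.
  static byte[] byteSlice(byte[] token, int start) {
    return Arrays.copyOfRange(token, start, token.length);
  }

  // Codepoint-based slice: start counts codepoints, not bytes.
  // Throws IndexOutOfBoundsException if start exceeds the number of codepoints.
  static byte[] codepointSlice(byte[] token, int start) {
    String s = new String(token, StandardCharsets.UTF_8);
    return s.substring(s.offsetByCodePoints(0, start)).getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Two-byte character, three-byte character, one ASCII character: 6 bytes.
    byte[] token = "\u00e9\u20acx".getBytes(StandardCharsets.UTF_8);
    System.out.println(byteSlice(token, 1).length);      // 5, begins inside the first character
    System.out.println(codepointSlice(token, 1).length); // 4, begins at the second character
  }
}
```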
    • split

      public static byte[][] split(byte[] token, int sep)
      Splits a token around matches of the given separator.
      Parameters:
      token - token to be split
      sep - separation character
      Returns:
      array
    • distinctTokens

      public static byte[][] distinctTokens(byte[] token)
      Normalizes the specified input and returns its distinct tokens. Optimized for small number of tokens.
      Parameters:
      token - token
      Returns:
      distinct tokens
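
As a rough illustration of "normalize, then return distinct tokens", the sketch below splits a string on whitespace and keeps the first occurrence of each part. It is an assumption-laden stand-in: it works on String rather than byte[], treats space, tab, CR and LF as whitespace, and uses a LinkedHashSet, whereas the method above is described as optimized for small inputs and may work differently. DistinctSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DistinctSketch {
  // Splits on runs of whitespace and returns the distinct parts in first-seen order.
  static List<String> distinct(String input) {
    Set<String> seen = new LinkedHashSet<>();
    for (String part : input.split("[ \\t\\r\\n]+")) {
      // Leading whitespace yields one empty part, which is skipped.
      if (!part.isEmpty()) seen.add(part);
    }
    return new ArrayList<>(seen);
  }

  public static void main(String[] args) {
    System.out.println(distinct(" a  b a\tc ")); // [a, b, c]
  }
}
```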
    • ws

      public static boolean ws(byte[] token)
      Checks if the specified token has only whitespaces.
      Parameters:
      token - token
      Returns:
      true if all characters are whitespaces
    • replace

      public static byte[] replace(byte[] token, int search, int replace)
      Replaces the specified character and returns the result token.
      Parameters:
      token - token to be checked
      search - the character to be replaced
      replace - the new character
      Returns:
      resulting token
    • trim

      public static byte[] trim(byte[] token)
      Removes leading and trailing whitespaces from the specified token.
      Parameters:
      token - token to be trimmed
      Returns:
      trimmed token
    • chop

      public static byte[] chop(byte[] token, int max)
      Chops a token to the specified length. Appends trailing dots if the string is too long.
      Parameters:
      token - token to be chopped
      max - maximum length
      Returns:
      chopped token
    • concat

      public static byte[] concat(byte[]... tokens)
      Concatenates multiple tokens.
      Parameters:
      tokens - tokens
      Returns:
      resulting token
    • concat

      public static byte[] concat(Object... objects)
      Concatenates multiple objects.
      Parameters:
      objects - objects
      Returns:
      resulting token
    • delete

      public static byte[] delete(byte[] token, int ch)
      Deletes a character from a token.
      Parameters:
      token - token
      ch - character to be removed
      Returns:
      resulting token
    • normalize

      public static byte[] normalize(byte[] token)
      Normalizes all whitespace occurrences in the specified token.
      Parameters:
      token - token
      Returns:
      normalized token
    • ws

      public static boolean ws(int ch)
      Checks if the specified character is a whitespace.
      Parameters:
      ch - the character to be checked
      Returns:
      result of check
    • letter

      public static boolean letter(int ch)
      Checks if the specified character is a computer letter (A - Z, a - z, _).
      Parameters:
      ch - the character to be checked
      Returns:
      result of check
    • digit

      public static boolean digit(int ch)
      Checks if the specified character is a digit (0 - 9).
      Parameters:
      ch - the character to be checked
      Returns:
      result of check
    • letterOrDigit

      public static boolean letterOrDigit(int ch)
      Checks if the specified character is a computer letter or digit.
      Parameters:
      ch - the character to be checked
      Returns:
      result of check
    • uc

      public static byte[] uc(byte[] token)
      Converts the specified token to upper case.
      Parameters:
      token - token to be converted
      Returns:
      resulting token
    • tc

      public static byte[] tc(byte[] token)
      Converts the specified token to title case.
      Parameters:
      token - token to be converted
      Returns:
      resulting token
    • uc

      public static int uc(int ch)
      Converts a character to upper case.
      Parameters:
      ch - character to be converted
      Returns:
      resulting character
    • lc

      public static byte[] lc(byte[] token)
      Converts the specified token to lower case.
      Parameters:
      token - token to be converted
      Returns:
      resulting token
    • lc

      public static int lc(int ch)
      Converts a character to lower case.
      Parameters:
      ch - character to be converted
      Returns:
      resulting character
    • prefix

      public static byte[] prefix(byte[] name)
      Returns the prefix of the specified token.
      Parameters:
      name - name
      Returns:
      prefix or empty token if no prefix exists
    • local

      public static byte[] local(byte[] name)
      Returns the local name of the specified name.
      Parameters:
      name - name
      Returns:
      local name
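
Splitting a qualified name into prefix and local name comes down to finding the first colon. The sketch below shows one way to do this on byte arrays; QNameParts and indexOfColon are hypothetical names, and returning the whole name as the local name when no colon is present is an assumption of the sketch.

```java
import java.util.Arrays;

final class QNameParts {
  // Returns the index of the first ':' byte, or -1 if there is none.
  static int indexOfColon(byte[] name) {
    for (int i = 0; i < name.length; i++) {
      if (name[i] == ':') return i;
    }
    return -1;
  }

  // Returns the part before the colon, or an empty array if there is no prefix.
  static byte[] prefix(byte[] name) {
    int i = indexOfColon(name);
    return i == -1 ? new byte[0] : Arrays.copyOf(name, i);
  }

  // Returns the part after the colon, or the whole name if there is no prefix.
  static byte[] local(byte[] name) {
    int i = indexOfColon(name);
    return i == -1 ? name : Arrays.copyOfRange(name, i + 1, name.length);
  }

  public static void main(String[] args) {
    byte[] name = { 'x', 's', ':', 'i', 'n', 't' };
    System.out.println(new String(prefix(name)) + " / " + new String(local(name))); // xs / int
  }
}
```

Scanning bytes is safe here because ':' is an ASCII character and never occurs inside a multi-byte UTF-8 sequence.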
    • encodeUri

      public static byte[] encodeUri(byte[] token, boolean iri)
      Returns a URI encoded token.
      Parameters:
      token - token
      iri - whether the input is treated as an IRI
      Returns:
      encoded token
    • escape

      public static byte[] escape(byte[] token)
      Escapes the specified token.
      Parameters:
      token - token
      Returns:
      escaped token
    • hex

      public static byte[] hex(byte[] value, boolean uc)
      Returns a hex representation of the specified byte array.
      Parameters:
      value - values to be mapped
      uc - upper case
      Returns:
      hex representation
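
A hex representation maps every input byte to two hex digits. The sketch below is a minimal version of that mapping with a lookup table, loosely comparable to the HEX_TABLE field; HexSketch is a hypothetical name and the digit tables are defined locally.

```java
import java.nio.charset.StandardCharsets;

final class HexSketch {
  // Maps each byte to two hex digits, upper case if uc is true.
  static byte[] hex(byte[] value, boolean uc) {
    byte[] digits = (uc ? "0123456789ABCDEF" : "0123456789abcdef")
        .getBytes(StandardCharsets.US_ASCII);
    byte[] out = new byte[value.length * 2];
    for (int i = 0; i < value.length; i++) {
      int b = value[i] & 0xFF;
      out[2 * i] = digits[b >> 4];        // high nibble
      out[2 * i + 1] = digits[b & 0x0F];  // low nibble
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] value = { 0x0F, (byte) 0xA0 };
    System.out.println(new String(hex(value, true), StandardCharsets.US_ASCII)); // 0FA0
  }
}
```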
    • dec

      public static int dec(int ch)
      Converts a hex character to an integer value.
      Parameters:
      ch - character
      Returns:
      integer value, or -1 if the input is invalid
    • dec

      public static int dec(int ch1, int ch2)
      Converts hex characters to an integer value.
      Parameters:
      ch1 - first character
      ch2 - second character
      Returns:
      integer value, or -1 if the input is invalid
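
The two hex conversions can be sketched with plain range checks. This is an illustration under the documented contract (-1 for invalid input), not the original code; HexDigit is a hypothetical class name, and combining two digits as high and low nibble is an assumption.

```java
final class HexDigit {
  // Converts one hex character to its value, or -1 if it is not a hex digit.
  static int dec(int ch) {
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
  }

  // Converts two hex characters (high nibble first) to a value in 0..255, or -1.
  static int dec(int ch1, int ch2) {
    int hi = dec(ch1), lo = dec(ch2);
    return hi < 0 || lo < 0 ? -1 : hi << 4 | lo;
  }

  public static void main(String[] args) {
    System.out.println(dec('4', '1')); // 65
    System.out.println(dec('4', 'x')); // -1
  }
}
```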
    • decodeUri

      public static byte[] decodeUri(byte[] token)
      Returns a URI decoded token.
      Parameters:
      token - token
      Returns:
      decoded token, or null if input was invalid
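
Percent-decoding with a null result for malformed input might look like the sketch below. It only illustrates the documented contract: PercentDecode and its methods are hypothetical names, and details such as the handling of '+' are deliberately left out because they are not specified above.

```java
import java.io.ByteArrayOutputStream;

final class PercentDecode {
  // Returns the value of a hex digit, or -1 if the byte is not a hex digit.
  static int hexValue(int ch) {
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
  }

  // Replaces every %XX escape with the byte it encodes; returns null for malformed escapes.
  static byte[] decode(byte[] token) {
    ByteArrayOutputStream out = new ByteArrayOutputStream(token.length);
    for (int i = 0; i < token.length; i++) {
      int b = token[i];
      if (b == '%') {
        // Two hex digits must follow the percent sign.
        if (i + 2 >= token.length) return null;
        int hi = hexValue(token[i + 1]), lo = hexValue(token[i + 2]);
        if (hi < 0 || lo < 0) return null;
        b = hi << 4 | lo;
        i += 2;
      }
      out.write(b);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] decoded = decode(new byte[] { 'a', '%', '2', '0', 'b' });
    System.out.println(decoded.length); // 3
  }
}
```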