Package org.basex.util
Class Token
java.lang.Object
org.basex.util.Token
This class provides convenience operations for handling tokens. A token is a UTF-8 encoded string, represented as a byte array.
In order to ensure a consistent representation of tokens in the project, all string conversions should be done via the methods of this class.
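The token representation can be illustrated with the JDK alone. The sketch below uses a hypothetical class TokenBasics, which is not part of BaseX. It assumes that token(String) and string(byte[]) are equivalent to plain UTF-8 encoding and decoding; the BaseX implementations may differ in detail.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of tokens as UTF-8 byte arrays (not BaseX code). */
final class TokenBasics {
  private TokenBasics() { }

  /** String to token: assumed to be plain UTF-8 encoding. */
  static byte[] token(final String string) {
    return string.getBytes(StandardCharsets.UTF_8);
  }

  /** Token to string: assumed to be plain UTF-8 decoding. */
  static String string(final byte[] token) {
    return new String(token, StandardCharsets.UTF_8);
  }

  public static void main(final String[] args) {
    final String s = "Gr\u00fcn";              // 4 characters, one of them non-ASCII
    final byte[] tok = token(s);
    System.out.println(s.length());             // 4
    System.out.println(tok.length);             // 5: the u-umlaut needs two UTF-8 bytes
    System.out.println(string(tok).equals(s));  // true
  }
}
```

The example shows why the byte length of a token and its character count can differ: the length(byte[]) method counts codepoints, whereas the array length counts bytes.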
- Author: BaseX Team 2005-21, BSD License, Christian Gruen

Field Summary
- static final byte[] COLON: Colon.
- static final Comparator<byte[]> COMPARATOR: Comparator for byte arrays.
- static final DecimalFormat DD: Decimal double output.
- static final DecimalFormat DF: Decimal float output.
- static final byte[] DOLLAR: Dollar.
- static final byte[] EMPTY: Empty token.
- static final byte[] FALSE: Token 'false'.
- static final byte[] HEX_TABLE: Hex codes.
- static final byte[] ID: ID token.
- static final byte[] INF: Token 'INF'.
- static final byte[] INFINITY: Token 'Infinity'.
- static final Comparator<byte[]> LC_COMPARATOR: Case-insensitive comparator for byte arrays.
- static final DecimalFormatSymbols LOC: Decimal format symbols for the US locale.
- static final byte[] MIN_INT: Minimum integer value.
- static final byte[] MIN_LONG: Minimum long value.
- static final byte[] NAN: Token 'NaN'.
- static final byte[] NEGATIVE_INFINITY: Token '-Infinity'.
- static final byte[] NEGATIVE_ZERO: Number '-0'.
- static final byte[] NEGATVE_INF: Token '-INF'.
- static final byte[] ONE: Number '1'.
- static final byte[] REF: IDRef token.
- static final char REPLACEMENT: Unicode replacement codepoint (\uFFFD).
- static final DecimalFormat SD: Scientific double output.
- static final DecimalFormat SF: Scientific float output.
- static final byte[] SLASH: Slash.
- static final byte[] SPACE: Space.
- static final byte[] TRUE: Token 'true'.
- static final byte[] XML: XML token.
- static final byte[] XML_COLON: XML token with colon.
- static final byte[] XMLNS: XMLNS token.
- static final byte[] XMLNS_COLON: XMLNS token with colon.
- static final byte[] ZERO: Number '0'.

Method Summary
- static boolean ascii(byte[] token): Checks if the specified token only consists of ASCII characters.
- static byte[] chop(byte[] token, int max): Chops a token to the specified length.
- static byte[] chopNumber(byte[] token): Finishes the numeric token, removing trailing zeroes.
- static int cl(byte cp): Returns the byte length of the UTF-8 character that starts with the specified byte.
- static int cl(byte[] token, int pos): Returns the byte length of the codepoint at the specified position.
- static byte[] concat(byte[]... tokens): Concatenates multiple tokens.
- static byte[] concat: Concatenates multiple objects.
- static boolean contains(byte[] token, byte[] sub): Checks if the first token contains the second token.
- static boolean contains(byte[] token, byte[] sub, int pos): Checks if the first token contains the second token.
- static boolean contains(byte[] token, int ch): Checks if the first token contains the specified character.
- static int cp(byte[] token, int pos): Returns the codepoint (Unicode value) of the specified token, starting at the specified position.
- static int cpLength(int cp): Returns the byte length of a codepoint.
- static int[] cps(byte[] token): Converts a token to a sequence of codepoints.
- static byte[] cpToken(int cp): Converts a codepoint to a token.
- static int dec(int ch): Converts a hex character to an integer value.
- static int dec(int ch1, int ch2): Converts hex characters to an integer value.
- static byte[] decodeUri(byte[] token): Returns a URI-decoded token.
- static byte[] delete(byte[] token, int ch): Deletes a character from a token.
- static int diff(byte[] token, byte[] compare): Compares two tokens lexicographically.
- static boolean digit(int ch): Checks if the specified character is a digit (0-9).
- static byte[][] distinctTokens(byte[] token): Normalizes the specified input and returns its distinct tokens.
- static byte[] encodeUri(byte[] token, boolean iri): Returns a URI-encoded token.
- static boolean endsWith(byte[] token, byte[] sub): Checks if the first token ends with the second token.
- static boolean endsWith(byte[] token, int ch): Checks if the first token ends with the specified character.
- static boolean eq(byte[] token1, byte[] token2): Compares two tokens for equality.
- static boolean eq(byte[] token, byte[]... tokens): Compares a token with several tokens for equality.
- static byte[] escape(byte[] token): Escapes the specified token.
- static int hash(byte[] token): Calculates a hash code for the specified token.
- static byte[] hex(byte[] value, boolean uc): Returns a hex representation of the specified byte array.
- static int indexOf(byte[] token, byte[] sub): Returns the position of the specified token, or -1.
- static int indexOf(byte[] token, byte[] sub, int pos): Returns the position of the specified token, or -1.
- static int indexOf(byte[] token, int ch): Returns the position of the specified character, or -1.
- static int lastIndexOf(byte[] token, int ch): Returns the last position of the specified character, or -1.
- static byte[] lc(byte[] token): Converts the specified token to lower case.
- static int lc(int ch): Converts a character to lower case.
- static int length(byte[] token): Returns the number of codepoints in the token.
- static boolean letter(int ch): Checks if the specified character is a computer letter (A-Z, a-z, _).
- static boolean letterOrDigit(int ch): Checks if the specified character is a computer letter or digit.
- static byte[] local(byte[] name): Returns the local name of the specified name.
- static byte[] max(byte[] token, byte[] compare): Returns the bigger token.
- static byte[] min(byte[] token, byte[] compare): Returns the smaller token.
- static byte[] normalize(byte[] token): Normalizes all whitespace occurrences in the specified token.
- static int numDigits(int integer): Returns the number of digits of the specified integer.
- static byte[] prefix(byte[] name): Returns the prefix of the specified name.
- static byte[] replace(byte[] token, int search, int replace): Replaces the specified character and returns the resulting token.
- static byte[][] split(byte[] token, int sep): Splits a token around matches of the given separator.
- static boolean startsWith(byte[] token, byte[] sub): Checks if the first token starts with the second token.
- static boolean startsWith(byte[] token, byte[] sub, int pos): Checks if the first token starts with the second token.
- static boolean startsWith(byte[] token, int ch): Checks if the first token starts with the specified character.
- static String string(byte[] token): Returns the specified token as a string.
- static String string(byte[] token, int start, int length): Converts the specified token to a string.
- static byte[] substring(byte[] token, int start): Returns a substring of the specified token.
- static byte[] substring(byte[] token, int start, int end): Returns a substring of the specified token.
- static byte[] subtoken(byte[] token, int start): Returns a partial token.
- static byte[] subtoken(byte[] token, int start, int end): Returns a partial token.
- static byte[] tc(byte[] token): Converts the specified token to title case.
- static double toDouble(byte[] token): Converts the specified token into a double value.
- static int toInt(byte[] token): Converts the specified token into an integer value.
- static byte[] token(boolean bool): Creates a byte array representation of the specified boolean value.
- static byte[] token(double dbl): Creates a byte array representation of the specified double value.
- static byte[] token(float flt): Creates a byte array representation of the specified float value.
- static byte[] token(int integer): Creates a byte array representation of the specified integer value.
- static byte[] token(long integer): Creates a byte array representation of the specified long value, using Java's standard method.
- static byte[] token: Returns a token representation of the specified object.
- static byte[] token: Converts a string to a byte array.
- static byte[][] tokens: Converts the specified strings to tokens.
- static long toLong(byte[] token): Converts the specified token into a long value.
- static long toLong(byte[] token, int start, int end): Converts the specified token into a long value.
- static byte[] trim(byte[] token): Removes leading and trailing whitespace from the specified token.
- static byte[] uc(byte[] token): Converts the specified token to upper case.
- static int uc(int ch): Converts a character to upper case.
- static byte[] utf8: Converts a token from the input encoding to UTF-8.
- static boolean ws(byte[] token): Checks if the specified token consists only of whitespace characters.
- static boolean ws(int ch): Checks if the specified character is a whitespace character.

Field Details

EMPTY
public static final byte[] EMPTY
Empty token.

XML
public static final byte[] XML
XML token.

XML_COLON
public static final byte[] XML_COLON
XML token with colon.

XMLNS
public static final byte[] XMLNS
XMLNS token.

XMLNS_COLON
public static final byte[] XMLNS_COLON
XMLNS token with colon.

ID
public static final byte[] ID
ID token.

REF
public static final byte[] REF
IDRef token.

TRUE
public static final byte[] TRUE
Token 'true'.

FALSE
public static final byte[] FALSE
Token 'false'.

NAN
public static final byte[] NAN
Token 'NaN'.

INF
public static final byte[] INF
Token 'INF'.

NEGATVE_INF
public static final byte[] NEGATVE_INF
Token '-INF'.

INFINITY
public static final byte[] INFINITY
Token 'Infinity'.

NEGATIVE_INFINITY
public static final byte[] NEGATIVE_INFINITY
Token '-Infinity'.

MIN_LONG
public static final byte[] MIN_LONG
Minimum long value.

MIN_INT
public static final byte[] MIN_INT
Minimum integer value.

SPACE
public static final byte[] SPACE
Space.

ZERO
public static final byte[] ZERO
Number '0'.

NEGATIVE_ZERO
public static final byte[] NEGATIVE_ZERO
Number '-0'.

ONE
public static final byte[] ONE
Number '1'.

SLASH
public static final byte[] SLASH
Slash.

COLON
public static final byte[] COLON
Colon.

DOLLAR
public static final byte[] DOLLAR
Dollar.

COMPARATOR
public static final Comparator<byte[]> COMPARATOR
Comparator for byte arrays.

LC_COMPARATOR
public static final Comparator<byte[]> LC_COMPARATOR
Case-insensitive comparator for byte arrays.

REPLACEMENT
public static final char REPLACEMENT
Unicode replacement codepoint (\uFFFD).

LOC
public static final DecimalFormatSymbols LOC
Decimal format symbols for the US locale.

SD
public static final DecimalFormat SD
Scientific double output.

DD
public static final DecimalFormat DD
Decimal double output.

SF
public static final DecimalFormat SF
Scientific float output.

DF
public static final DecimalFormat DF
Decimal float output.

HEX_TABLE
public static final byte[] HEX_TABLE
Hex codes.

Method Details
string
public static String string(byte[] token)
Returns the specified token as a string.
- Parameters: token - token
- Returns: string

string
public static String string(byte[] token, int start, int length)
Converts the specified token to a string.
- Parameters: token - token; start - start position; length - length
- Returns: string

ascii
public static boolean ascii(byte[] token)
Checks if the specified token only consists of ASCII characters.
- Parameters: token - token
- Returns: result of check

token
Converts a string to a byte array. All strings should be converted by this function to guarantee a consistent character conversion.
- Parameters: string - string to be converted
- Returns: byte array

tokens
Converts the specified strings to tokens.
- Parameters: strings - strings
- Returns: tokens

utf8
Converts a token from the input encoding to UTF-8.
- Parameters: token - token to be converted; encoding - input encoding
- Returns: byte array

token
Returns a token representation of the specified object:
  - byte arrays are returned as-is.
  - null references are replaced by the string "null".
  - objects of type Throwable are converted to a string representation via Util.message(java.lang.Throwable).
  - objects of type Class are converted via Util.className(Class).
  - for all other types, Object.toString() is called.
- Parameters: object - object
- Returns: token

cp
public static int cp(byte[] token, int pos)
Returns the codepoint (Unicode value) of the specified token, starting at the specified position. Returns a Unicode replacement character for invalid values.
- Parameters: token - token; pos - character position
- Returns: current character

cl
public static int cl(byte cp)
Returns the byte length of the UTF-8 character that starts with the specified byte.
- Parameters: cp - first byte of the character
- Returns: character length

cl
public static int cl(byte[] token, int pos)
Returns the byte length of the codepoint at the specified position.
- Parameters: token - token; pos - position
- Returns: character length

cps
public static int[] cps(byte[] token)
Converts a token to a sequence of codepoints.
- Parameters: token - token
- Returns: codepoints

cpToken
public static byte[] cpToken(int cp)
Converts a codepoint to a token.
- Parameters: cp - codepoint of the character
- Returns: token

cpLength
public static int cpLength(int cp)
Returns the byte length of a codepoint.
- Parameters: cp - codepoint of the character
- Returns: length

length
public static int length(byte[] token)
Returns the number of codepoints in the token.
- Parameters: token - token
- Returns: number of codepoints
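The codepoint methods cl, cp, length and cps all walk a token along UTF-8 sequence boundaries. The following sketch shows one way to do this. CodepointSketch is a hypothetical name, and mapping malformed or truncated input to U+FFFD (as cp is described to do) is an assumption about behavior, not a copy of the BaseX code.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of UTF-8 codepoint handling; not the BaseX implementation. */
final class CodepointSketch {
  /** Unicode replacement character, used for malformed input (assumption). */
  static final int REPLACEMENT = 0xFFFD;

  private CodepointSketch() { }

  /** Byte length of the UTF-8 sequence that starts with the given lead byte. */
  static int cl(final byte lead) {
    final int b = lead & 0xFF;
    if (b < 0xC0) return 1; // ASCII, or a stray continuation byte treated as one unit
    if (b < 0xE0) return 2; // 110xxxxx
    if (b < 0xF0) return 3; // 1110xxxx
    if (b < 0xF8) return 4; // 11110xxx
    return 1;               // invalid lead byte
  }

  /** Decodes the codepoint that starts at byte position pos. */
  static int cp(final byte[] token, final int pos) {
    final int b = token[pos] & 0xFF;
    final int len = cl(token[pos]);
    if (len == 1) return b < 0x80 ? b : REPLACEMENT;
    if (pos + len > token.length) return REPLACEMENT; // truncated sequence
    int cp = b & (0xFF >> (len + 1));                  // payload bits of the lead byte
    for (int i = 1; i < len; i++) {
      final int c = token[pos + i] & 0xFF;
      if ((c & 0xC0) != 0x80) return REPLACEMENT;      // expected 10xxxxxx
      cp = cp << 6 | c & 0x3F;
    }
    return cp;
  }

  /** Number of codepoints: one step per UTF-8 sequence. */
  static int length(final byte[] token) {
    int n = 0;
    for (int i = 0; i < token.length; i += cl(token[i])) n++;
    return n;
  }

  /** All codepoints of a token, using the same stepping as length(). */
  static int[] cps(final byte[] token) {
    final int[] cps = new int[length(token)];
    for (int i = 0, p = 0; i < token.length; i += cl(token[i])) cps[p++] = cp(token, i);
    return cps;
  }

  public static void main(final String[] args) {
    final byte[] token = "a\u00e9\u20ac".getBytes(StandardCharsets.UTF_8); // a, e-acute, euro sign
    System.out.println(token.length);  // 6 bytes
    System.out.println(length(token)); // 3 codepoints
  }
}
```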

token
public static byte[] token(boolean bool)
Creates a byte array representation of the specified boolean value.
- Parameters: bool - boolean value to be converted
- Returns: boolean value in byte array

token
public static byte[] token(int integer)
Creates a byte array representation of the specified integer value.
- Parameters: integer - int value to be converted
- Returns: integer value in byte array

numDigits
public static int numDigits(int integer)
Returns the number of digits of the specified integer.
- Parameters: integer - number to be checked
- Returns: number of digits

token
public static byte[] token(long integer)
Creates a byte array representation of the specified long value, using Java's standard method.
- Parameters: integer - value to be converted
- Returns: byte array

token
public static byte[] token(double dbl)
Creates a byte array representation of the specified double value.
- Parameters: dbl - double value to be converted
- Returns: byte array

token
public static byte[] token(float flt)
Creates a byte array representation of the specified float value.
- Parameters: flt - float value to be converted
- Returns: byte array

chopNumber
public static byte[] chopNumber(byte[] token)
Finishes the numeric token, removing trailing zeroes.
- Parameters: token - token to be modified
- Returns: token
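A possible reading of chopNumber is sketched below under explicit assumptions: only plain decimal notation containing a '.' is touched, trailing zeroes after the point are stripped, and the point itself is dropped when no fraction digits remain. ChopNumberSketch is hypothetical; the exact BaseX rules may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch of chopNumber(); the exact BaseX rules may differ. */
final class ChopNumberSketch {
  private ChopNumberSketch() { }

  static byte[] chopNumber(final byte[] token) {
    boolean dot = false;
    for (final byte b : token) {
      if (b == 'e' || b == 'E') return token; // leave scientific notation untouched
      if (b == '.') dot = true;
    }
    if (!dot) return token;                   // integers keep their zeroes
    int end = token.length;
    while (token[end - 1] == '0') end--;      // stops at the '.' at the latest
    if (token[end - 1] == '.') end--;         // no fraction digits left
    return Arrays.copyOf(token, end);
  }

  public static void main(final String[] args) {
    for (final String s : new String[] {"1.500", "2.000", "100", "1.0E10"}) {
      final byte[] chopped = chopNumber(s.getBytes(StandardCharsets.US_ASCII));
      System.out.println(s + " -> " + new String(chopped, StandardCharsets.US_ASCII));
    }
  }
}
```

Skipping tokens with an exponent matters: naively stripping zeroes from "1.0E10" would change its value.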

toDouble
public static double toDouble(byte[] token)
Converts the specified token into a double value.
- Parameters: token - token to be converted
- Returns: resulting double value, or Double.NaN if the input is invalid

toLong
public static long toLong(byte[] token)
Converts the specified token into a long value. Long.MIN_VALUE is returned if the input is invalid. Note that this may also be the actual value (MIN_LONG).
- Parameters: token - token to be converted
- Returns: resulting long value

toLong
public static long toLong(byte[] token, int start, int end)
Converts the specified token into a long value. Long.MIN_VALUE is returned if the input is invalid. Note that this may also be the actual value (MIN_LONG).
- Parameters: token - token to be converted; start - first byte to be parsed; end - last byte to be parsed (exclusive)
- Returns: resulting long value

toInt
public static int toInt(byte[] token)
Converts the specified token into an integer value. Integer.MIN_VALUE is returned if the input is invalid.
- Parameters: token - token to be converted
- Returns: resulting integer value
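The sentinel convention of toLong and toInt (returning Long.MIN_VALUE or Integer.MIN_VALUE for invalid input) can be sketched as follows. ParseSketch is a hypothetical class; accepting an optional sign, rejecting whitespace and treating overflow as invalid are assumptions. Digits are accumulated as a negative number so that Long.MIN_VALUE itself can be parsed.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the toLong/toInt sentinel convention. */
final class ParseSketch {
  private ParseSketch() { }

  /** Parses bytes [start, end) as a decimal long; Long.MIN_VALUE signals invalid input. */
  static long toLong(final byte[] token, final int start, final int end) {
    int i = start;
    boolean neg = false;
    if (i < end && (token[i] == '-' || token[i] == '+')) neg = token[i++] == '-';
    if (i >= end) return Long.MIN_VALUE;                        // empty, or sign without digits
    long v = 0;                                                 // accumulated as a negative value
    for (; i < end; i++) {
      final int d = token[i] - '0';
      if (d < 0 || d > 9) return Long.MIN_VALUE;                // not a digit
      if (v < (Long.MIN_VALUE + d) / 10) return Long.MIN_VALUE; // would overflow
      v = v * 10 - d;
    }
    // For positive input, negating Long.MIN_VALUE overflows back to Long.MIN_VALUE,
    // which is also the "invalid" sentinel.
    return neg ? v : -v;
  }

  static long toLong(final byte[] token) {
    return toLong(token, 0, token.length);
  }

  /** Integer.MIN_VALUE signals invalid or out-of-range input. */
  static int toInt(final byte[] token) {
    final long v = toLong(token);
    return v < Integer.MIN_VALUE || v > Integer.MAX_VALUE ? Integer.MIN_VALUE : (int) v;
  }

  public static void main(final String[] args) {
    System.out.println(toLong("-42".getBytes(StandardCharsets.US_ASCII)));                   // -42
    System.out.println(toLong("12a".getBytes(StandardCharsets.US_ASCII)) == Long.MIN_VALUE); // true
  }
}
```

As the documentation notes, the return value alone cannot distinguish a parsed "-9223372036854775808" from invalid input.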

hash
public static int hash(byte[] token)
Calculates a hash code for the specified token.
- Parameters: token - specified token
- Returns: hash code

eq
public static boolean eq(byte[] token1, byte[] token2)
Compares two tokens for equality.
- Parameters: token1 - first token (can be null); token2 - token to be compared (can be null)
- Returns: true if the tokens are equal

eq
public static boolean eq(byte[] token, byte[]... tokens)
Compares a token with several tokens for equality.
- Parameters: token - token (can be null); tokens - tokens to be compared (single tokens can be null)
- Returns: true if the token equals at least one of the specified tokens

diff
public static int diff(byte[] token, byte[] compare)
Compares two tokens lexicographically.
- Parameters: token - first token; compare - token to be compared
- Returns: 0 if the tokens are equal, a negative value if the first token is smaller, a positive value if it is bigger
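The comparison methods can be sketched with a byte-by-byte comparison. Treating bytes as unsigned is an assumption here, chosen because it orders valid UTF-8 tokens by codepoint. CompareSketch and its BYTES comparator are hypothetical stand-ins for diff and COMPARATOR, not the BaseX definitions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of diff/eq/min/max and a byte-array comparator. */
final class CompareSketch {
  /** Stand-in for a COMPARATOR-like field (hypothetical name). */
  static final Comparator<byte[]> BYTES = CompareSketch::diff;

  private CompareSketch() { }

  /** Unsigned lexicographic comparison; a proper prefix sorts first. */
  static int diff(final byte[] token, final byte[] compare) {
    final int l = Math.min(token.length, compare.length);
    for (int i = 0; i < l; i++) {
      final int c = (token[i] & 0xFF) - (compare[i] & 0xFF);
      if (c != 0) return c;
    }
    return token.length - compare.length;
  }

  /** Null-safe equality: two nulls are equal, null and non-null are not. */
  static boolean eq(final byte[] token1, final byte[] token2) {
    return Arrays.equals(token1, token2);
  }

  static byte[] min(final byte[] token, final byte[] compare) {
    return diff(token, compare) <= 0 ? token : compare;
  }

  static byte[] max(final byte[] token, final byte[] compare) {
    return diff(token, compare) >= 0 ? token : compare;
  }

  public static void main(final String[] args) {
    final byte[][] tokens = {
      "pear".getBytes(StandardCharsets.UTF_8),
      "apple".getBytes(StandardCharsets.UTF_8),
      "app".getBytes(StandardCharsets.UTF_8)
    };
    Arrays.sort(tokens, BYTES);
    for (final byte[] t : tokens) System.out.println(new String(t, StandardCharsets.UTF_8));
    // app, apple, pear
  }
}
```

On Java 9 and later, Arrays.compareUnsigned(byte[], byte[]) yields results with the same sign as this diff sketch.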

min
public static byte[] min(byte[] token, byte[] compare)
Returns the smaller token.
- Parameters: token - first token; compare - token to be compared
- Returns: smaller token

max
public static byte[] max(byte[] token, byte[] compare)
Returns the bigger token.
- Parameters: token - first token; compare - token to be compared
- Returns: bigger token

contains
public static boolean contains(byte[] token, byte[] sub)
Checks if the first token contains the second token.
- Parameters: token - token; sub - token to be found
- Returns: result of test

contains
public static boolean contains(byte[] token, byte[] sub, int pos)
Checks if the first token contains the second token.
- Parameters: token - token; sub - token to be found; pos - start position
- Returns: result of test

contains
public static boolean contains(byte[] token, int ch)
Checks if the first token contains the specified character.
- Parameters: token - token; ch - character to be found
- Returns: result of test

indexOf
public static int indexOf(byte[] token, int ch)
Returns the position of the specified character, or -1.
- Parameters: token - token; ch - character to be found
- Returns: position or -1

lastIndexOf
public static int lastIndexOf(byte[] token, int ch)
Returns the last position of the specified character, or -1.
- Parameters: token - token; ch - character to be found
- Returns: position or -1

indexOf
public static int indexOf(byte[] token, byte[] sub)
Returns the position of the specified token, or -1.
- Parameters: token - token; sub - token to be found
- Returns: position or -1

indexOf
public static int indexOf(byte[] token, byte[] sub, int pos)
Returns the position of the specified token, or -1.
- Parameters: token - token; sub - token to be found; pos - start position
- Returns: position or -1

startsWith
public static boolean startsWith(byte[] token, int ch)
Checks if the first token starts with the specified character.
- Parameters: token - token; ch - character to be found
- Returns: result of test

startsWith
public static boolean startsWith(byte[] token, byte[] sub)
Checks if the first token starts with the second token.
- Parameters: token - token; sub - token to be found
- Returns: result of test

startsWith
public static boolean startsWith(byte[] token, byte[] sub, int pos)
Checks if the first token starts with the second token.
- Parameters: token - token; sub - token to be found; pos - start position
- Returns: result of test

endsWith
public static boolean endsWith(byte[] token, int ch)
Checks if the first token ends with the specified character.
- Parameters: token - token; ch - character to be found
- Returns: result of test

endsWith
public static boolean endsWith(byte[] token, byte[] sub)
Checks if the first token ends with the second token.
- Parameters: token - token; sub - token to be found
- Returns: result of test
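The search methods can be sketched as naive scans over byte positions. SearchSketch is hypothetical; it interprets the pos parameter of startsWith as the position at which sub must occur, and compares single characters with token bytes only for ASCII characters. Both are assumptions, and BaseX may use a different search strategy.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical naive search helpers working on byte positions. */
final class SearchSketch {
  private SearchSketch() { }

  /** First byte position >= pos at which sub occurs in token, or -1. */
  static int indexOf(final byte[] token, final byte[] sub, final int pos) {
    final int last = token.length - sub.length;
    outer:
    for (int i = Math.max(pos, 0); i <= last; i++) {
      for (int j = 0; j < sub.length; j++) {
        if (token[i + j] != sub[j]) continue outer;
      }
      return i;
    }
    return -1;
  }

  static boolean contains(final byte[] token, final byte[] sub) {
    return indexOf(token, sub, 0) != -1;
  }

  /** True if sub occurs in token exactly at byte position pos. */
  static boolean startsWith(final byte[] token, final byte[] sub, final int pos) {
    if (pos < 0 || pos + sub.length > token.length) return false;
    for (int j = 0; j < sub.length; j++) {
      if (token[pos + j] != sub[j]) return false;
    }
    return true;
  }

  static boolean endsWith(final byte[] token, final byte[] sub) {
    return startsWith(token, sub, token.length - sub.length);
  }

  /** Last position of an ASCII character (ch < 0x80 assumed), or -1. */
  static int lastIndexOf(final byte[] token, final int ch) {
    for (int i = token.length - 1; i >= 0; i--) {
      if (token[i] == ch) return i;
    }
    return -1;
  }

  public static void main(final String[] args) {
    final byte[] t = "banana".getBytes(StandardCharsets.US_ASCII);
    final byte[] na = "na".getBytes(StandardCharsets.US_ASCII);
    System.out.println(indexOf(t, na, 0));   // 2
    System.out.println(indexOf(t, na, 3));   // 4
    System.out.println(lastIndexOf(t, 'a')); // 5
  }
}
```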

substring
public static byte[] substring(byte[] token, int start)
Returns a substring of the specified token. Note that this method ignores Unicode codepoints; use subtoken(byte[], int) instead.
- Parameters: token - input token; start - start position
- Returns: substring

substring
public static byte[] substring(byte[] token, int start, int end)
Returns a substring of the specified token. Note that this method ignores Unicode codepoints; use subtoken(byte[], int, int) instead.
- Parameters: token - input token; start - start position; end - end position
- Returns: substring

subtoken
public static byte[] subtoken(byte[] token, int start)
Returns a partial token.
- Parameters: token - input token; start - start position
- Returns: resulting token

subtoken
public static byte[] subtoken(byte[] token, int start, int end)
Returns a partial token.
- Parameters: token - input token; start - start position; end - end position
- Returns: resulting token
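The note on substring matters because cutting a UTF-8 token at an arbitrary byte offset can split a multi-byte character. The sketch below contrasts a byte-based cut with a codepoint-based one. The helper subtokenByCodepoints is hypothetical and deliberately does not claim to reproduce the exact position semantics of subtoken.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical contrast of byte-based and codepoint-based cutting. */
final class SubtokenSketch {
  private SubtokenSketch() { }

  /** Byte length of the UTF-8 sequence starting with the given lead byte. */
  private static int cl(final byte lead) {
    final int b = lead & 0xFF;
    return b < 0xC0 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
  }

  /** Byte offset of the n-th codepoint, capped at token.length. */
  private static int offset(final byte[] token, final int n) {
    int i = 0;
    for (int c = 0; c < n && i < token.length; c++) i += cl(token[i]);
    return Math.min(i, token.length);
  }

  /** Hypothetical helper: codepoints [start, end) of a UTF-8 token. */
  static byte[] subtokenByCodepoints(final byte[] token, final int start, final int end) {
    final int s = offset(token, Math.max(start, 0));
    final int e = offset(token, Math.max(end, start));
    return Arrays.copyOfRange(token, s, Math.max(s, e));
  }

  public static void main(final String[] args) {
    final byte[] token = "\u00e9t\u00e9".getBytes(StandardCharsets.UTF_8); // 5 bytes, 3 codepoints
    final byte[] byBytes = Arrays.copyOfRange(token, 0, 1);  // cuts the first character in half
    final byte[] byCps = subtokenByCodepoints(token, 0, 1);  // keeps both of its bytes
    System.out.println(byBytes.length + " " + byCps.length); // 1 2
  }
}
```

Copying the first byte of "été" yields an incomplete UTF-8 sequence, while the codepoint-based cut returns both bytes of "é".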

split
public static byte[][] split(byte[] token, int sep)
Splits a token around matches of the given separator.
- Parameters: token - token to be split; sep - separation character
- Returns: array
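A straightforward split for a single-byte (ASCII) separator is sketched below. Whether empty parts are kept, for example between two adjacent separators, is an assumption of this hypothetical SplitSketch, not documented behavior of split.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical split on a single-byte (ASCII) separator; empty parts are kept. */
final class SplitSketch {
  private SplitSketch() { }

  static byte[][] split(final byte[] token, final int sep) {
    final List<byte[]> parts = new ArrayList<>();
    int start = 0;
    for (int i = 0; i < token.length; i++) {
      if (token[i] == sep) {
        parts.add(Arrays.copyOfRange(token, start, i));
        start = i + 1;
      }
    }
    parts.add(Arrays.copyOfRange(token, start, token.length)); // last part
    return parts.toArray(new byte[0][]);
  }

  public static void main(final String[] args) {
    for (final byte[] part : split("a,b,,c".getBytes(StandardCharsets.US_ASCII), ',')) {
      System.out.println("[" + new String(part, StandardCharsets.US_ASCII) + "]");
    }
    // [a] [b] [] [c]
  }
}
```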

distinctTokens
public static byte[][] distinctTokens(byte[] token)
Normalizes the specified input and returns its distinct tokens. Optimized for a small number of tokens.
- Parameters: token - token
- Returns: distinct tokens

ws
public static boolean ws(byte[] token)
Checks if the specified token consists only of whitespace characters.
- Parameters: token - token
- Returns: true if all characters are whitespace

replace
public static byte[] replace(byte[] token, int search, int replace)
Replaces the specified character and returns the resulting token.
- Parameters: token - token to be checked; search - character to be replaced; replace - new character
- Returns: resulting token

trim
public static byte[] trim(byte[] token)
Removes leading and trailing whitespace from the specified token.
- Parameters: token - token to be trimmed
- Returns: trimmed token

chop
public static byte[] chop(byte[] token, int max)
Chops a token to the specified length. Appends trailing dots if the token is too long.
- Parameters: token - token to be chopped; max - maximum length
- Returns: chopped token
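chop can be sketched as follows, assuming that max counts bytes and that the dots overwrite the last (up to) three positions of the shortened token, so the result never exceeds max bytes. Both points are assumptions, and ChopSketch is hypothetical. Because the cut is byte-based, it could split a multi-byte character.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch of chop(); max is assumed to count bytes, dots included. */
final class ChopSketch {
  private ChopSketch() { }

  static byte[] chop(final byte[] token, final int max) {
    if (token.length <= max) return token; // short enough: unchanged
    final byte[] out = Arrays.copyOf(token, max);
    for (int i = Math.max(0, max - 3); i < max; i++) out[i] = '.';
    return out;
  }

  public static void main(final String[] args) {
    final byte[] t = "abcdefgh".getBytes(StandardCharsets.US_ASCII);
    System.out.println(new String(chop(t, 6), StandardCharsets.US_ASCII)); // abc...
  }
}
```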

concat
public static byte[] concat(byte[]... tokens)
Concatenates multiple tokens.
- Parameters: tokens - tokens
- Returns: resulting token

concat
Concatenates multiple objects.
- Parameters: objects - objects
- Returns: resulting token

delete
public static byte[] delete(byte[] token, int ch)
Deletes a character from a token.
- Parameters: token - token; ch - character to be removed
- Returns: resulting token

normalize
public static byte[] normalize(byte[] token)
Normalizes all whitespace occurrences in the specified token.
- Parameters: token - token
- Returns: normalized token
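trim, normalize and ws can be sketched together. The whitespace set used here (space, tab, line feed, carriage return, as in XML) is an assumption, as is the normalization rule: leading and trailing whitespace is removed and each inner run is collapsed into a single space. WhitespaceSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical whitespace helpers; the whitespace set is an assumption. */
final class WhitespaceSketch {
  private WhitespaceSketch() { }

  /** XML whitespace: space, tab, line feed, carriage return. */
  static boolean ws(final int ch) {
    return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r';
  }

  /** True if the token consists only of whitespace (also for an empty token). */
  static boolean ws(final byte[] token) {
    for (final byte b : token) if (!ws(b)) return false;
    return true;
  }

  static byte[] trim(final byte[] token) {
    int s = 0, e = token.length;
    while (s < e && ws(token[s])) s++;
    while (e > s && ws(token[e - 1])) e--;
    return Arrays.copyOfRange(token, s, e);
  }

  /** Trims the token and collapses each inner whitespace run into one space. */
  static byte[] normalize(final byte[] token) {
    final byte[] out = new byte[token.length];
    int n = 0;
    boolean pendingSpace = false;
    for (final byte b : token) {
      if (ws(b)) {
        pendingSpace = n > 0;          // whitespace before the first character is dropped
      } else {
        if (pendingSpace) out[n++] = ' ';
        pendingSpace = false;
        out[n++] = b;
      }
    }
    return Arrays.copyOf(out, n);      // a trailing whitespace run is never written
  }

  public static void main(final String[] args) {
    final byte[] t = "  a \t b\n".getBytes(StandardCharsets.US_ASCII);
    System.out.println("[" + new String(normalize(t), StandardCharsets.US_ASCII) + "]"); // [a b]
  }
}
```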

ws
public static boolean ws(int ch)
Checks if the specified character is a whitespace character.
- Parameters: ch - character to be checked
- Returns: result of check

letter
public static boolean letter(int ch)
Checks if the specified character is a computer letter (A-Z, a-z, _).
- Parameters: ch - character to be checked
- Returns: result of check

digit
public static boolean digit(int ch)
Checks if the specified character is a digit (0-9).
- Parameters: ch - character to be checked
- Returns: result of check

letterOrDigit
public static boolean letterOrDigit(int ch)
Checks if the specified character is a computer letter or digit.
- Parameters: ch - character to be checked
- Returns: result of check
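The character classes described for letter, digit and letterOrDigit translate directly into range checks. The sketch below (hypothetical class CharClassSketch) follows the documented ranges: A-Z, a-z and '_' for letters, 0-9 for digits. Non-ASCII letters are not letters in this sense.

```java
/** Hypothetical sketch of the documented ASCII character classes. */
final class CharClassSketch {
  private CharClassSketch() { }

  /** Computer letter: A-Z, a-z or underscore. */
  static boolean letter(final int ch) {
    return ch >= 'a' && ch <= 'z' || ch >= 'A' && ch <= 'Z' || ch == '_';
  }

  /** Digit: 0-9. */
  static boolean digit(final int ch) {
    return ch >= '0' && ch <= '9';
  }

  static boolean letterOrDigit(final int ch) {
    return letter(ch) || digit(ch);
  }

  public static void main(final String[] args) {
    for (final char ch : "a_9-".toCharArray()) {
      System.out.println(ch + ": " + letterOrDigit(ch)); // a, _ and 9 are true; - is false
    }
  }
}
```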

uc
public static byte[] uc(byte[] token)
Converts the specified token to upper case.
- Parameters: token - token to be converted
- Returns: resulting token

tc
public static byte[] tc(byte[] token)
Converts the specified token to title case.
- Parameters: token - token to be converted
- Returns: resulting token

uc
public static int uc(int ch)
Converts a character to upper case.
- Parameters: ch - character to be converted
- Returns: resulting character

lc
public static byte[] lc(byte[] token)
Converts the specified token to lower case.
- Parameters: token - token to be converted
- Returns: resulting token

lc
public static int lc(int ch)
Converts a character to lower case.
- Parameters: ch - character to be converted
- Returns: resulting character

prefix
public static byte[] prefix(byte[] name)
Returns the prefix of the specified name.
- Parameters: name - name
- Returns: prefix, or an empty token if no prefix exists

local
public static byte[] local(byte[] name)
Returns the local name of the specified name.
- Parameters: name - name
- Returns: local name
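For prefixed names such as xml:lang, prefix and local can be sketched by splitting at the first colon. QNameSketch is hypothetical, and splitting at the first (rather than the last) colon is an assumption that only matters for malformed names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical prefix/local-name split at the first colon. */
final class QNameSketch {
  private static final byte[] EMPTY = {};

  private QNameSketch() { }

  /** Position of the first colon, or -1. */
  private static int colon(final byte[] name) {
    for (int i = 0; i < name.length; i++) if (name[i] == ':') return i;
    return -1;
  }

  /** Part before the first colon, or an empty token if there is no prefix. */
  static byte[] prefix(final byte[] name) {
    final int i = colon(name);
    return i == -1 ? EMPTY : Arrays.copyOf(name, i);
  }

  /** Part after the first colon, or the name itself if it has no prefix. */
  static byte[] local(final byte[] name) {
    final int i = colon(name);
    return i == -1 ? name : Arrays.copyOfRange(name, i + 1, name.length);
  }

  public static void main(final String[] args) {
    final byte[] name = "xml:lang".getBytes(StandardCharsets.US_ASCII);
    System.out.println(new String(prefix(name), StandardCharsets.US_ASCII)); // xml
    System.out.println(new String(local(name), StandardCharsets.US_ASCII));  // lang
  }
}
```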

encodeUri
public static byte[] encodeUri(byte[] token, boolean iri)
Returns a URI-encoded token.
- Parameters: token - token; iri - input
- Returns: encoded token

escape
public static byte[] escape(byte[] token)
Escapes the specified token.
- Parameters: token - token
- Returns: escaped token

hex
public static byte[] hex(byte[] value, boolean uc)
Returns a hex representation of the specified byte array.
- Parameters: value - values to be mapped; uc - whether upper-case hex digits are used
- Returns: hex representation

dec
public static int dec(int ch)
Converts a hex character to an integer value.
- Parameters: ch - character
- Returns: integer value, or -1 if the input is invalid

dec
public static int dec(int ch1, int ch2)
Converts hex characters to an integer value.
- Parameters: ch1 - first character; ch2 - second character
- Returns: integer value, or -1 if the input is invalid
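hex and dec are inverse building blocks: hex writes two hex digits per byte, and dec reads them back. The sketch (hypothetical class HexSketch) assumes that dec accepts both upper- and lower-case digits and that the first character passed to dec(ch1, ch2) is the high nibble.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical hex encoding/decoding helpers. */
final class HexSketch {
  private HexSketch() { }

  /** Two hex digits per byte; uc selects upper-case letters. */
  static byte[] hex(final byte[] value, final boolean uc) {
    final byte[] digits = (uc ? "0123456789ABCDEF" : "0123456789abcdef")
        .getBytes(StandardCharsets.US_ASCII);
    final byte[] out = new byte[value.length * 2];
    for (int i = 0; i < value.length; i++) {
      out[i * 2] = digits[(value[i] >> 4) & 0x0F]; // high nibble
      out[i * 2 + 1] = digits[value[i] & 0x0F];    // low nibble
    }
    return out;
  }

  /** Value of a single hex character, or -1 if it is not one. */
  static int dec(final int ch) {
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
  }

  /** Value of two hex characters (high nibble first), or -1 if either is invalid. */
  static int dec(final int ch1, final int ch2) {
    final int h = dec(ch1), l = dec(ch2);
    return h == -1 || l == -1 ? -1 : h << 4 | l;
  }

  public static void main(final String[] args) {
    final byte[] hex = hex(new byte[] {(byte) 0xCA, (byte) 0xFE}, true);
    System.out.println(new String(hex, StandardCharsets.US_ASCII)); // CAFE
    System.out.println(dec(hex[0], hex[1]));                         // 202 (0xCA)
  }
}
```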

decodeUri
public static byte[] decodeUri(byte[] token)
Returns a URI-decoded token.
- Parameters: token - token
- Returns: decoded token, or null if the input was invalid
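decodeUri can be sketched as percent-decoding that returns null on malformed escapes, matching the documented null result. In this hypothetical UriSketch, '+' is left unchanged (treating it as a space belongs to HTML form encoding, not to URIs); whether BaseX does the same is an assumption, as is the exact set of inputs it rejects.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical percent-decoding sketch; returns null on malformed escapes. */
final class UriSketch {
  private UriSketch() { }

  /** Value of a hex digit, or -1. */
  private static int dec(final int ch) {
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
  }

  static byte[] decodeUri(final byte[] token) {
    final ByteArrayOutputStream out = new ByteArrayOutputStream(token.length);
    for (int i = 0; i < token.length; i++) {
      final byte b = token[i];
      if (b != '%') {
        out.write(b);                         // copied unchanged, '+' included
        continue;
      }
      if (i + 2 >= token.length) return null; // '%' needs two hex digits
      final int h = dec(token[i + 1]), l = dec(token[i + 2]);
      if (h == -1 || l == -1) return null;
      out.write(h << 4 | l);                  // one decoded byte
      i += 2;
    }
    return out.toByteArray();
  }

  public static void main(final String[] args) {
    final byte[] decoded = decodeUri("caf%C3%A9%20au%20lait".getBytes(StandardCharsets.US_ASCII));
    System.out.println(new String(decoded, StandardCharsets.UTF_8)); // cafe au lait, with an accented e
  }
}
```

Decoded bytes are returned as-is, so a UTF-8 sequence such as %C3%A9 only becomes a single character once the result is interpreted as UTF-8.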