Returns a sequence of strings constructed by splitting the input wherever a separator is found; the separator is any substring that matches a given regular expression.
fn:tokenize($input as xs:string?) as xs:string*
fn:tokenize($input as xs:string?, $pattern as xs:string) as xs:string*
fn:tokenize($input as xs:string?, $pattern as xs:string, $flags as xs:string) as xs:string*
The one-argument form of this function splits the supplied string at whitespace boundaries. More specifically, calling fn:tokenize($input) is equivalent to calling fn:tokenize(fn:normalize-space($input), ' '), where the second argument is a single space character (x20).
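The equivalence above can be modeled outside XPath. The following Python sketch is illustrative only and is not part of the function definition. The names normalize_space and tokenize_whitespace are hypothetical, None stands in for the empty sequence, and whitespace is assumed to mean tab (x09), newline (x0A), carriage return (x0D) and space (x20).

```python
import re


def normalize_space(value):
    """Model of fn:normalize-space: collapse whitespace runs, trim both ends."""
    # Only tab, newline, carriage return and space count as whitespace here.
    return re.sub(r"[ \t\n\r]+", " ", value).strip(" ")


def tokenize_whitespace(value):
    """Model of the one-argument form fn:tokenize($input)."""
    if value is None:  # the empty sequence is modeled as None
        return []
    normalized = normalize_space(value)
    # A zero-length string yields the empty sequence, not [""].
    return normalized.split(" ") if normalized else []


print(tokenize_whitespace(" red green blue "))  # ['red', 'green', 'blue']
```

Because normalization happens before splitting, leading and trailing whitespace never produces zero-length tokens in this model.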
The effect of calling the two-argument form of this function (omitting the argument $flags) is the same as the effect of calling the three-argument version with the $flags
argument set to a zero-length string. Flags are defined in
.
The following rules apply to the three-argument form of the function:
The $flags argument is interpreted in the same way as for the fn:matches function.
If $input is the empty sequence, or if $input is the zero-length string, the function returns the empty sequence.
The function returns a sequence of strings formed by breaking the $input string into a sequence of strings, treating any substring that matches $pattern as a separator. The separators themselves are not returned.
Except with the one-argument form of the function, if a separator occurs at the start of the $input string, the result sequence will start with a zero-length string. Similarly, zero-length strings will also occur in the result sequence if a separator occurs at the end of the $input string, or if two adjacent substrings match the supplied $pattern.
If two alternatives within the supplied $pattern both match at the same position in the $input string, then the match that is chosen is the first.
For example:
The expression fn:tokenize(" red green blue ") returns ("red", "green", "blue").
The expression fn:tokenize("The cat sat on the mat", "\s+") returns ("The", "cat", "sat", "on", "the", "mat").
The expression fn:tokenize(" red green blue ", "\s+") returns ("", "red", "green", "blue", "").
The expression fn:tokenize("1, 15, 24, 50", ",\s*") returns ("1", "15", "24", "50").
The expression fn:tokenize("1,15,,24,50,", ",") returns ("1", "15", "", "24", "50", "").
fn:tokenize("abba", ".?")
raises the dynamic error .
The expression fn:tokenize("Some unparsed <br> HTML <BR> text", "\s*<br>\s*", "i") returns ("Some unparsed", "HTML", "text").
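The splitting rules for the two- and three-argument forms can be modeled as a scan over successive separator matches. The Python sketch below is an approximation under stated assumptions, not a conforming implementation. The function name tokenize and the table _FLAG_MAP are hypothetical, None stands in for the empty sequence, only the "i", "m" and "s" flags are modeled, error conditions are not checked, and Python's re module is used even though its regular expression dialect differs from the one used by XPath (for example, Python's \s also matches form-feed).

```python
import re

# Only the "i", "m" and "s" flags are modeled; "x" and "q" are left out.
_FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def tokenize(value, pattern, flags=""):
    """Model of fn:tokenize($input, $pattern, $flags), without error checks."""
    if value is None or value == "":
        return []  # empty sequence or zero-length input gives the empty sequence
    re_flags = 0
    for flag in flags:
        if flag not in _FLAG_MAP:
            raise ValueError(f"flag {flag!r} is not modeled in this sketch")
        re_flags |= _FLAG_MAP[flag]
    tokens = []
    position = 0
    for match in re.finditer(pattern, value, re_flags):
        tokens.append(value[position:match.start()])  # text before the separator
        position = match.end()                        # the separator is dropped
    tokens.append(value[position:])                   # text after the last separator
    return tokens


print(tokenize(" red green blue ", r"\s+"))  # ['', 'red', 'green', 'blue', '']
print(tokenize("abracadabra", "(ab)|(a)"))   # ['', 'r', 'c', 'd', 'r', '']
```

The second call illustrates the rule about alternatives: at positions where both "ab" and "a" could match, the first alternative is taken, so the "b" is consumed as part of the separator.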
A dynamic error is raised if the value of
$pattern
is invalid according to the rules described in section .
A dynamic error is raised if the value of
$flags
is invalid according to the rules described in section .
A dynamic error is raised if the supplied $pattern matches a zero-length string, that is, if fn:matches("", $pattern, $flags) returns true.
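The zero-length test can be sketched as a check that runs before any splitting is attempted. In the Python model below, the names matches_zero_length and check_separator_pattern are hypothetical, a ValueError stands in for the dynamic error, and the flags parameter takes Python re flag constants rather than an XPath flags string.

```python
import re


def matches_zero_length(pattern, flags=0):
    """True if the pattern matches "", modeling fn:matches("", $pattern, $flags)."""
    return re.search(pattern, "", flags) is not None


def check_separator_pattern(pattern, flags=0):
    """Reject separator patterns that would match a zero-length string."""
    if matches_zero_length(pattern, flags):
        raise ValueError(f"pattern {pattern!r} matches a zero-length string")
    return pattern


print(matches_zero_length(".?"))    # True: ".?" can match nothing at all
print(matches_zero_length(r"\s+"))  # False: at least one character is required
```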
If the input string is not zero length, and no separators are found in the input string, the result of the function is a single string identical to the input string.
The one-argument form of the function has a similar effect to the two-argument form with \s+ as the separator pattern, except that the one-argument form strips leading and trailing whitespace, whereas the two-argument form delivers an extra zero-length token if leading or trailing whitespace is present.
The function returns no information about the separators that were found in the string. If this information is required, the fn:analyze-string function can be used instead.
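The idea of keeping the separators alongside the tokens can be illustrated with a scan that labels each part of the input. The Python sketch below is only loosely analogous to fn:analyze-string and does not reproduce its result structure. The name split_with_separators and the "match" and "non-match" labels are assumptions made for illustration, and zero-length non-matching parts are skipped as a design choice of the sketch.

```python
import re


def split_with_separators(value, pattern):
    """Return ("non-match", text) and ("match", text) pairs in input order."""
    parts = []
    position = 0
    for match in re.finditer(pattern, value):
        if match.start() > position:
            parts.append(("non-match", value[position:match.start()]))
        parts.append(("match", match.group(0)))  # the separator is kept
        position = match.end()
    if position < len(value):
        parts.append(("non-match", value[position:]))
    return parts


print(split_with_separators("1, 15,24", r",\s*"))
# [('non-match', '1'), ('match', ', '), ('non-match', '15'), ('match', ','), ('non-match', '24')]
```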
The separator used by the one-argument form of the function is any sequence of tab (x09), newline (x0A), carriage return (x0D) or space (x20) characters. This is the same as the separator recognized by list-valued attributes as defined in XSD. It is not the same as the separator recognized by list-valued attributes in HTML5, which also treats form-feed (x0C) as whitespace. If it is necessary to treat form-feed as a separator, an explicit separator pattern should be used.
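A separator that also covers form-feed can be sketched as an explicit character class. In the Python model below, the names HTML5_WHITESPACE and tokenize_html5_list are hypothetical, and the separator set is assumed to be tab, newline, form-feed, carriage return and space. Zero-length tokens are filtered out so that the result resembles the trimming behavior of the one-argument form.

```python
import re

# Assumed separator set: space, tab, newline, form-feed and carriage return.
HTML5_WHITESPACE = r"[ \t\n\f\r]+"


def tokenize_html5_list(value):
    """Split a list-valued attribute, treating form-feed as whitespace too."""
    # Dropping zero-length tokens mimics the trimming of the one-argument form.
    return [token for token in re.split(HTML5_WHITESPACE, value) if token]


print(tokenize_html5_list(" a\x0cb  c "))  # ['a', 'b', 'c']
```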