Altova MapForce 2024 Enterprise Edition

Splits the input string into a sequence of strings. Any substring that matches the regular expression pattern supplied as argument defines the separator. The matched (separator) strings are not included in the result returned by the function.
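The splitting behavior described above can be sketched outside MapForce. The following is a minimal illustration in Python, not MapForce's own implementation. The helper name tokenize_regexp_sketch and the sample date string are assumptions made for this example only.

```python
import re

def tokenize_regexp_sketch(input_string, pattern):
    """Hypothetical stand-in for tokenize-regexp: split on every regex match.

    Assumes the pattern contains no capturing groups. With capturing groups,
    Python's re.split would also return the matched separator text.
    """
    return re.split(pattern, input_string)

tokens = tokenize_regexp_sketch("2024-01-15", "-")
print(tokens)  # ['2024', '01', '15']
```

The separator "-" does not appear in any of the returned items, which mirrors the rule that matched strings are excluded from the result.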

 

Note: When generating C++, C#, or Java code, the advanced features of the regular expression syntax might differ slightly. See the regex documentation of each language for more information.
[Figure: mf-func-tokenize-regexp]

 

Languages

Built-in, C++, C#, Java, XQuery, XSLT 2.0, XSLT 3.0.

 

Parameters

Name       Description

input      The input string.

pattern    A regular expression pattern. Any substring that matches the pattern is treated as a delimiter. For more information, see Regular expressions.

flags      Optional parameter. The regular expression flags to use. For example, the flag "i" instructs the mapping process to operate in case-insensitive mode.
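As a rough illustration of the effect of the "i" flag, here is a minimal Python sketch. The helper split_with_flags and the sample string are hypothetical, and only the "i" flag is handled. How MapForce processes other flag letters is not covered here.

```python
import re

def split_with_flags(input_string, pattern, flags=""):
    """Hypothetical helper: split on a regex, honoring only the "i" flag."""
    # Only "i" (case-insensitive) is mapped; other flag letters are ignored
    # in this sketch.
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.split(pattern, input_string, flags=re_flags)

print(split_with_flags("1and2AND3", "and"))       # ['1', '2AND3']
print(split_with_flags("1and2AND3", "and", "i"))  # ['1', '2', '3']
```

Without the flag, only the lowercase "and" acts as a delimiter. With "i", the uppercase "AND" is matched as well.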

 

Example

The goal of the mapping illustrated below is to split the string "a ,  b c,d" into a sequence of strings, where each alphabetic character is a separate item in the sequence. All whitespace and commas between the letters must be removed.

[Figure: mf-func-tokenize-regexp-example3]

To achieve this goal, the regular expression pattern [ ,]+ was supplied as the pattern parameter of the tokenize-regexp function. This pattern has the following meaning:

 

It matches any of the characters inside the character class [ ,]. Therefore, a split will occur whenever a comma or a space is encountered in the input string.

The quantifier + specifies that one or more occurrences of the preceding character class are to be matched. Without this quantifier, each occurrence of space or comma would create a separate item in the resulting sequence of strings, which is not the intended result.
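The difference the + quantifier makes can be checked with a short Python sketch. It uses Python's re.split as a stand-in for the MapForce function, and it assumes the input string contains exactly two spaces after the first comma.

```python
import re

# Assumed literal input: two spaces follow the first comma.
text = "a ,  b c,d"

# Without the quantifier, every single space or comma is its own separator,
# so adjacent separators leave empty strings between them.
without_quantifier = re.split(r"[ ,]", text)

# With +, a whole run of spaces and commas counts as one separator.
with_quantifier = re.split(r"[ ,]+", text)

print(without_quantifier)  # ['a', '', '', '', 'b', 'c', 'd']
print(with_quantifier)     # ['a', 'b', 'c', 'd']
```

In this sketch the extra items produced without the quantifier are empty strings, one for each pair of adjacent separator characters.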

 

The mapping output is as follows:

 

<items>
  <item>a</item>
  <item>b</item>
  <item>c</item>
  <item>d</item>
</items>

© 2017-2023 Altova GmbH