XPath/XQuery json-to-xml function

Summary

Parses a string supplied in the form of a JSON text, returning the results in the form of an XML document node.

Signatures

fn:json-to-xml(
$json-text as xs:string?
) as document-node()?
fn:json-to-xml(
$json-text as xs:string?,
$options as map(*)
) as document-node()?

Properties

This function is nondeterministic, context-dependent, and focus-independent. It depends on the static base URI.

Rules

The effect of the one-argument form of this function is the same as calling the two-argument form with an empty map as the value of the $options argument.

The first argument is a JSON-text as defined in [RFC 7159], in the form of a string. The function parses this string to return an XDM node.

If $json-text is an empty sequence, the function returns the empty sequence.
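The two rules above (the one-argument form behaves like the two-argument form with an empty map, and an empty sequence yields an empty sequence) can be checked with a short sketch. The JSON literal is made up for illustration, and an XQuery 3.1 processor is assumed.

```xquery
(: Sketch only: the sample JSON text is an arbitrary illustration. :)
(
  (: an empty sequence as input gives an empty sequence as output :)
  empty(json-to-xml(())),

  (: one-argument form compared with the two-argument form and an empty options map :)
  deep-equal(
    json-to-xml('[1, true, null]'),
    json-to-xml('[1, true, null]', map { })
  )
)
```

Both items of the resulting sequence should be true. The comparison uses deep-equal rather than the is operator because the function makes no promise about node identity across calls.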

The $options argument can be used to control the way in which the parsing takes place. The option parameter conventions apply.

The entries that may appear in the $options map are as follows:

Key: liberal
Meaning: Determines whether deviations from the syntax of RFC 7159 are permitted.
Type: xs:boolean
Default: false
Value false: The input must consist of an optional byte order mark (which is ignored) followed by a string that conforms to the grammar of JSON-text in [RFC 7159]. An error must be raised (see below) if the input does not conform to the grammar.
Value true: The input may contain deviations from the grammar of [RFC 7159], which are handled in an implementation-defined way. (Note: some popular extensions include allowing quotes on keys to be omitted, allowing a comma to appear after the last item in an array, allowing leading zeroes in numbers, and allowing control characters such as tab and newline to be present in unescaped form.) Since the extensions accepted are implementation-defined, an error may be raised (see below) if the input does not conform to the grammar.

Key: duplicates
Meaning: Determines the policy for handling duplicate keys in a JSON object. To determine whether keys are duplicates, they are compared using the Unicode codepoint collation, after expanding escape sequences, unless the escape option is set to true, in which case keys are compared in escaped form.
Type: xs:string
Default: If validate is true then reject, otherwise retain.
Value reject: An error is raised if duplicate keys are encountered.
Value use-first: If duplicate keys are present in a JSON object, all but the first of a set of duplicates are ignored.
Value retain: If duplicate keys are present in a JSON object, the XML result of the function will also contain duplicates (making it invalid against the schema). This value is therefore incompatible with the option validate=true.

Key: validate
Meaning: Determines whether the generated XML tree is schema-validated.
Type: xs:boolean
Default: Implementation-defined.
Value true: Indicates that the resulting XDM instance must be typed; that is, the element and attribute nodes must carry the type annotations that result from validation against the schema for the result of fn:json-to-xml, or against an implementation-defined schema if the liberal option has the value true.
Value false: Indicates that the resulting XDM instance must be untyped.
Key: escape
Meaning: Determines whether special characters are represented in the XDM output in backslash-escaped form.
Type: xs:boolean
Default: false
Value false: All characters in the input that are valid in the version of XML supported by the implementation, whether or not they are represented in the input by means of an escape sequence, are represented as unescaped characters in the result. Any characters or codepoints that are not valid XML characters (for example, unpaired surrogates) are passed to the fallback function as described below; in the absence of a fallback function, they are replaced by the Unicode REPLACEMENT CHARACTER (xFFFD). The attributes escaped and escaped-key will not be present in the XDM output.
Value true: JSON escape sequences are used in the result to represent special characters in the JSON input, as defined below, whether or not they were represented using JSON escape sequences in the input. The characters that are considered "special" for this purpose are:

all codepoints in the range x00 to x1F or x7F to x9F;

all codepoints that do not represent characters that are valid in the version of XML supported by the processor, including codepoints representing unpaired surrogates;

the backslash character itself (x5C).

Such characters are represented using a two-character escape sequence where available (for example, \t), or a six-character escape sequence otherwise (for example \uDEAD). Characters other than these will not be escaped in the result, even if they were escaped in the input. In the result:

Any string element whose string value contains a backslash character must have the attribute value escaped="true".

Any element that contains a key attribute whose string value contains a backslash character must have the attribute escaped-key="true".

The values of the escaped and escaped-key attributes are immaterial when there is no backslash present, and it is never necessary to include either attribute when its value is false.

Key: fallback
Meaning: Provides a function which is called when the input contains an escape sequence that represents a character that is not valid in the version of XML supported by the implementation. It is an error to supply the fallback option if the escape option is present with the value true.
Type: function(xs:string) as xs:string
Default: The default is effectively the function function($s){"&#xFFFD;"}: that is, a function that replaces the escape sequence with the Unicode REPLACEMENT CHARACTER.
Value (user-supplied function): The function is called when the JSON input contains an escape sequence that is valid according to the JSON grammar, but which does not represent a character that is valid in the version of XML supported by the processor. In the case of surrogates, the function is called once for any six-character escape sequence that is not properly paired with another surrogate. The string supplied as the argument will always be a two- or six-character escape sequence, starting with a backslash, that conforms to the rules in the JSON grammar (as extended by the implementation if liberal:true() is specified): for example \b or \uFFFF or \uDEAD. The function is not called for an escape sequence that is invalid against the grammar (for example \x0A). The function returns a string which is inserted into the result in place of the invalid character. The function also has the option of raising a dynamic error by calling fn:error.
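As an illustration of the options described above, the following sketch makes three independent calls, each setting a single option. The JSON inputs and the '?' marker are invented for the example, and the expected results noted afterwards assume a processor that returns untyped results.

```xquery
(: Sketch only: inputs are hypothetical; each call sets one option. :)
(
  (: duplicates = 'use-first': only the first "a" entry is kept :)
  json-to-xml('{"a": 1, "a": 2}', map { 'duplicates': 'use-first' }),

  (: escape = true(): the backslash in the key stays in escaped form :)
  json-to-xml('{"a\\b": 1}', map { 'escape': true() }),

  (: fallback: \u0000 is not a valid XML character, so the function
     is called with the six-character escape sequence as its argument :)
  json-to-xml('"x\u0000y"', map {
    'fallback': function($s as xs:string) as xs:string { '?' }
  })
)
```

Under those assumptions, the first call should yield a map element with a single number child whose key is a and whose value is 1; the second a number element carrying key="a\\b" and escaped-key="true"; and the third a string element containing x?y. Note that XQuery string literals do not interpret backslashes, so the JSON text reaches the function exactly as written.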

The various structures that can occur in JSON are transformed recursively to XDM values according to the rules defining the XML representation of JSON.

The function returns a document node, whose only child is the element node representing the outermost construct in the JSON text.
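A minimal sketch of navigating the returned tree follows. It assumes XQuery, where the prefix fn is predeclared for the namespace http://www.w3.org/2005/xpath-functions; in other host languages the prefix may need an explicit declaration. The sample JSON is the one used in the Examples section.

```xquery
(: Sketch only: inspects the document node and walks into the array. :)
let $doc := json-to-xml('{"x": 1, "y": [3, 4, 5]}')
return (
  $doc instance of document-node(),   (: the result is a document node :)
  count($doc/node()),                 (: it has exactly one child :)
  local-name($doc/*),                 (: the outermost JSON construct is an object :)
  (: the members of the array stored under key "y", converted to integers :)
  $doc/fn:map/fn:array[@key = 'y']/fn:number ! xs:integer(.)
)
```

The expected result is the sequence true, 1, "map", 3, 4, 5. Because the elements are in a namespace, an unprefixed path such as $doc/map would select nothing unless a default element namespace is declared.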

The function is non-deterministic with respect to node identity: that is, if the function is called twice with the same arguments, it is implementation-dependent whether the same node is returned on both occasions.

The base URI of the returned document node is taken from the static base URI of the function call.

The choice of namespace prefix (or absence of a prefix) in the names of constructed nodes is implementation-dependent.

The XDM tree returned by the function does not contain any unnecessary (albeit valid) nodes such as whitespace text nodes, comments, or processing instructions. It does not include any whitespace in the value of number or boolean element nodes, or in the value of escaped or escaped-key attribute nodes.

If the result is typed, every element named string will have an attribute named escaped whose value is either true or false, and every element having an attribute named key will also have an attribute named escaped-key whose value is either true or false.

If the result is untyped, the attributes escaped and escaped-key will either be present with the value true, or will be absent. They will never be present with the value false.

Examples

The expression json-to-xml('{"x": 1, "y": [3,4,5]}') returns (with whitespace added for legibility):

<map xmlns="http://www.w3.org/2005/xpath-functions">
  <number key="x">1</number>
  <array key="y">
    <number>3</number>
    <number>4</number>
    <number>5</number>
  </array>
</map>

The expression json-to-xml('"abcd"', map{'liberal': false()}) returns <string xmlns="http://www.w3.org/2005/xpath-functions">abcd</string>.

The expression json-to-xml('{"x": "\\", "y": "\u0025"}') returns (with whitespace added for legibility):

<map xmlns="http://www.w3.org/2005/xpath-functions">
  <string key="x">\</string>
  <string key="y">%</string>
</map>

The expression json-to-xml('{"x": "\\", "y": "\u0025"}', map{'escape': true()}) returns (with whitespace added for legibility):

<map xmlns="http://www.w3.org/2005/xpath-functions">
  <string escaped="true" key="x">\\</string>
  <string key="y">%</string>
</map>

The following example illustrates use of the fallback function to handle characters that are invalid in XML.

let $jsonstr := unparsed-text('http://example.com/endpoint'),
    $options := map {
      'liberal': true(),
      'fallback': function($char as xs:string) as xs:string {
        let $c0chars := map {
              '\u0000':'[NUL]',
              '\u0001':'[SOH]',
              '\u0002':'[STX]',
              ...
              '\u001E':'[RS]',
              '\u001F':'[US]'
            },
            $replacement := $c0chars($char)
        return
          if (exists($replacement))
          then $replacement
          else error(xs:QName('err:invalid-char'),
                     'Error: ' || $char || ' is not a C0 control character.')
      }
    }
return json-to-xml($jsonstr, $options)

Error Conditions

An error is raised if the value of $json-text does not conform to the JSON grammar as defined by [RFC 7159], unless the option "liberal":true() is present and the processor chooses to accept the deviation.

An error is raised if the value of the validate option is true and the processor does not support schema validation or typed data.

An error is raised if the value of $options includes an entry whose key is defined in this specification, and whose value is not a permitted value for that key.
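A caller that prefers to recover from these errors can wrap the call in a try/catch expression, as in the sketch below. This assumes an XQuery 3.0 or later processor (try/catch is not available in plain XPath), and the malformed input is invented for the example.

```xquery
(: Sketch only: the input is deliberately malformed JSON. :)
try {
  json-to-xml('{"a": }')
} catch * {
  (: $err:code and $err:description are bound inside every catch clause;
     the description may be empty, which || treats as a zero-length string :)
  'json-to-xml failed: '
    || local-name-from-QName($err:code)
    || ' '
    || $err:description
}
```

The exact message text depends on the processor. Catching * is the conservative choice here; a catch clause naming a specific error code could be used instead when only one condition should be handled.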

Notes

To read a JSON file, this function can be used in conjunction with the fn:unparsed-text function.
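A minimal sketch of that combination follows. The location data/orders.json is hypothetical; a relative URI of this kind is resolved against the static base URI.

```xquery
(: Sketch only: 'data/orders.json' is a made-up location. :)
let $uri := 'data/orders.json'
return
  if (unparsed-text-available($uri))
  then json-to-xml(unparsed-text($uri))   (: read the file as a string, then parse it :)
  else ()                                 (: resource missing or unreadable :)
```

Testing with fn:unparsed-text-available first avoids a dynamic error when the resource cannot be read; it does not guard against the content being invalid JSON.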

Many JSON implementations allow commas to be used after the last item in an object or array, although the specification does not permit it. The option liberal:true() is provided to allow such deviations from the specification to be accepted. Some JSON implementations also allow constructors such as new Date("2000-12-13") to appear as values: specifying liberal:true() allows such extensions to be accepted, but does not guarantee it. If such extensions are accepted, the resulting value is implementation-defined, and will not necessarily conform to the schema for the result of fn:json-to-xml.

If the input starts with a byte order mark, this function ignores it. The byte order mark may have been added to the data stream in order to facilitate decoding of an octet stream to a character string, but since this function takes a character string as input, the byte order mark serves no useful purpose.

The possibility of the input containing characters that are not valid in XML (for example, unpaired surrogates) arises only when such characters are expressed using JSON escape sequences. This is because the input to the function is an instance of xs:string, which by definition can only contain characters that are valid in XML.