XPath/XQuery parse-json function

Summary

Parses a string supplied in the form of a JSON text, returning the results typically in the form of a map or array.

Signatures

fn:parse-json(
$json-text as xs:string?
) as item()?
fn:parse-json(
$json-text as xs:string?,
$options as map(*)
) as item()?

Properties

This function is deterministic, context-independent, and focus-independent.

Rules

The effect of the one-argument form of this function is the same as calling the two-argument form with an empty map as the value of the $options argument.

The first argument is a JSON text as defined in RFC 7159, in the form of a string. The function parses this string to return an XDM value.

If the value of $json-text is the empty sequence, the function returns the empty sequence.

The result will also be an empty sequence if $json-text is the string "null".
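As a brief, non-normative illustration of the rules above, the following XPath 3.1 expression checks that an empty-sequence argument and the JSON text null both yield the empty sequence, and that the one-argument form agrees with the two-argument form when given an empty map. The sample JSON text '[1, 2]' is chosen only for illustration.

```xquery
(: Each member of the resulting array is expected to be true. :)
[
  empty(parse-json(())),
  empty(parse-json('null')),
  deep-equal(parse-json('[1, 2]'), parse-json('[1, 2]', map{}))
]
```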

The $options argument can be used to control the way in which the parsing takes place. The option parameter conventions apply.

The entries that may appear in the $options map are as follows:

liberal: Determines whether deviations from the syntax of RFC 7159 are permitted.

    Type: xs:boolean

    Default: false

    false: The input must consist of an optional byte order mark (which is ignored) followed by a string that conforms to the grammar of JSON-text in RFC 7159. An error must be raised if the input does not conform to the grammar.

    true: The input may contain deviations from the grammar of RFC 7159, which are handled in an implementation-defined way. (Note: some popular extensions include allowing quotes on keys to be omitted, allowing a comma to appear after the last item in an array, allowing leading zeroes in numbers, and allowing control characters such as tab and newline to be present in unescaped form.) Since the extensions accepted are implementation-defined, an error may be raised if the input does not conform to the grammar.

duplicates: Determines the policy for handling duplicate keys in a JSON object. To determine whether keys are duplicates, they are compared using the Unicode codepoint collation, after expanding escape sequences, unless the escape option is set to true, in which case keys are compared in escaped form.

    Type: xs:string

    Default: use-first

    reject: An error is raised if duplicate keys are encountered.

    use-first: If duplicate keys are present in a JSON object, all but the first of a set of duplicates are ignored.

    use-last: If duplicate keys are present in a JSON object, all but the last of a set of duplicates are ignored.

escape: Determines whether special characters are represented in the XDM output in backslash-escaped form.

    Type: xs:boolean

    Default: true

    false: All characters in the input that are valid in the version of XML supported by the implementation, whether or not they are represented in the input by means of an escape sequence, are represented as unescaped characters in the result. Any characters or codepoints that are not valid XML characters (for example, unpaired surrogates) are passed to the fallback function as described below; in the absence of a fallback function, they are replaced by the Unicode REPLACEMENT CHARACTER (xFFFD).

    true: JSON escape sequences are used in the result to represent special characters in the JSON input, as defined below, whether or not they were represented using JSON escape sequences in the input. The characters that are considered "special" for this purpose are:

        all codepoints in the range x00 to x1F or x7F to x9F;

        all codepoints that do not represent characters that are valid in the version of XML supported by the processor, including codepoints representing unpaired surrogates;

        the backslash character itself (x5C).

    Such characters are represented using a two-character escape sequence where available (for example, \t), or a six-character escape sequence otherwise (for example \uDEAD). Characters other than these are not escaped in the result, even if they were escaped in the input.

fallback: Provides a function which is called when the input contains an escape sequence that represents a character that is not valid in the version of XML supported by the implementation. It is an error to supply the fallback option if the escape option is present with the value true.

    Type: function(xs:string) as xs:string

    Default: The default is effectively the function function($s){"�"}: that is, a function that replaces the escape sequence with the Unicode REPLACEMENT CHARACTER.

    The function is called when the JSON input contains a special character (as defined under the escape option) that is valid according to the JSON grammar, whether the special character is represented in the input directly or as an escape sequence. The function is called once for any surrogate that is not properly paired with another surrogate. The string supplied as the argument will always be a two- or six-character escape sequence, starting with a backslash, that conforms to the rules in the JSON grammar (as extended by the implementation if liberal:true() is specified): for example \b or \uFFFF or \uDEAD. The function is not called for an escape sequence that is invalid against the grammar (for example \x0A). The function returns a string which is inserted into the result in place of the invalid character. The function also has the option of raising a dynamic error by calling fn:error.
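The options described above might be exercised as in the following non-normative sketch. The variable $json, the sample JSON texts, and the fallback function that returns '?' are illustrative assumptions. The escape option is set explicitly alongside fallback so that the example does not depend on the default.

```xquery
let $json := '{"a": 1, "a": 2}'
return (
  (: duplicates defaults to use-first, so the first value (the xs:double 1) is kept :)
  parse-json($json)?a,

  (: use-last keeps the final value of the duplicated key (the xs:double 2) :)
  parse-json($json, map{'duplicates': 'use-last'})?a,

  (: \u0000 is not a valid XML character, so the fallback function is called
     with the escape sequence and its result, "?", replaces the character :)
  parse-json('"\u0000"',
             map{'escape': false(), 'fallback': function($s) { '?' }})
)
```

Note that the backslash has no special meaning inside an XPath or XQuery string literal, so \u0000 reaches the function as a six-character JSON escape sequence.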

The various structures that can occur in JSON are transformed recursively to XDM values as follows:

  1. A JSON object is converted to a map. The entries in the map correspond to the key/value pairs in the JSON object. The key is always of type xs:string; the associated value may be of any type, and is the result of converting the JSON value by recursive application of these rules. For example, the JSON text {"x":2, "y":5} is transformed to the value map{"x":2, "y":5}.

    If duplicate keys are encountered in a JSON object, they are handled as determined by the duplicates option defined above.

  2. A JSON array is transformed to an array whose members are the result of converting the corresponding member of the array by recursive application of these rules. For example, the JSON text ["a", "b", null] is transformed to the value ["a", "b", ()].

  3. A JSON string is converted to an xs:string value. The handling of special characters depends on the escape and fallback options, as described in the table above.

  4. A JSON number is converted to an xs:double value using the rules for casting from xs:string to xs:double.

  5. The JSON boolean values true and false are converted to the corresponding xs:boolean values.

  6. The JSON value null is converted to the empty sequence.
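The conversion rules above can be observed with the lookup operator, as in this non-normative sketch. The JSON text and the variable name $v are illustrative assumptions.

```xquery
let $v := parse-json('{"name": "x", "tags": ["a", null], "n": 10, "ok": true}')
return (
  $v?name,                     (: rule 3: the xs:string "x" :)
  $v?n instance of xs:double,  (: rule 4: true, every JSON number becomes an xs:double :)
  empty($v?tags?2),            (: rules 2 and 6: true, null becomes an empty array member :)
  $v?ok                        (: rule 5: the xs:boolean true :)
)
```

Because a JSON null becomes the empty sequence, flattening an array with ?* silently drops null members; testing individual members, as above, preserves the distinction.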

Examples

The expression parse-json('{"x":1, "y":[3,4,5]}') returns map{"x":1e0,"y":[3e0,4e0,5e0]}.

The expression parse-json('"abcd"') returns "abcd".

The expression parse-json('{"x":"\\", "y":"\u0025"}') returns map{"x":"\","y":"%"}.

The expression parse-json('{"x":"\\", "y":"\u0025"}', map{'escape':true()}) returns map{"x":"\\","y":"%"}.

The expression parse-json('{"x":"\\", "y":"\u0000"}') returns map{"x":"\","y":codepoints-to-string(65533)}.

The expression parse-json('{"x":"\\", "y":"\u0000"}', map{'escape':true()}) returns map{"x":"\\","y":"\u0000"}.

The expression parse-json('{"x":"\\", "y":"\u0000"}', map{'fallback':function($s){'['||$s||']'}}) returns map{"x":"\","y":"[\u0000]"}.

Error Conditions

A dynamic error occurs if the value of $json-text does not conform to the JSON grammar, unless the option "liberal":true() is present and the processor chooses to accept the deviation.

A dynamic error occurs if the option "duplicates":"reject" is present and the value of $json-text contains a JSON object with duplicate keys.

A dynamic error occurs if the $options map contains an entry whose key is defined in this specification and whose value is not valid for that key, or if it contains an entry with the key fallback when the option "escape":true() is also present.
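The three conditions above can be caught with an XQuery try/catch expression, as in the following non-normative sketch. The helper local:try-parse is hypothetical. The error codes named in the comments (FOJS0001, FOJS0003, FOJS0005) do not appear in the text above; they are the codes that, to the best of this editor's knowledge, the Functions and Operators 3.1 specification assigns to these conditions, and should be checked against that specification.

```xquery
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: Hypothetical helper: returns the local name of the error code raised,
   or a short success message if the text parses. The result of parse-json
   is used in the message so that the call cannot be optimized away. :)
declare function local:try-parse(
  $text as xs:string,
  $options as map(*)
) as xs:string {
  try {
    'parsed ' || count(parse-json($text, $options)) || ' item(s)'
  } catch * {
    local-name-from-QName($err:code)
  }
};

(
  (: trailing comma is not allowed by the JSON grammar: expected FOJS0001 :)
  local:try-parse('{"a": 1,}', map{'liberal': false()}),

  (: duplicate keys with the reject policy: expected FOJS0003 :)
  local:try-parse('{"a": 1, "a": 2}', map{'duplicates': 'reject'}),

  (: 'sometimes' is not a valid value for duplicates: expected FOJS0005 :)
  local:try-parse('[]', map{'duplicates': 'sometimes'})
)
```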

Notes

The result of the function will be an instance of one of the following types. An instance of test (or in XQuery, typeswitch) can be used to distinguish them:

map(xs:string, item()?) for a JSON object;

array(item()?) for a JSON array;

xs:string for a JSON string;

xs:double for a JSON number;

xs:boolean for a JSON boolean;

empty-sequence() for a JSON null (or for empty input).
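A typeswitch over these types might look like the following non-normative XQuery sketch. The function local:json-kind and the labels it returns are hypothetical and chosen only for illustration.

```xquery
(: Hypothetical helper that names the JSON kind of a parsed value. :)
declare function local:json-kind($value as item()?) as xs:string {
  typeswitch ($value)
    case map(*)           return 'object'
    case array(*)         return 'array'
    case xs:string        return 'string'
    case xs:double        return 'number'
    case xs:boolean       return 'boolean'
    case empty-sequence() return 'null'
    default               return 'unexpected'
};

(: Expected to yield: object, array, string, number, boolean, null :)
for $text in ('{}', '[]', '"s"', '1', 'true', 'null')
return local:json-kind(parse-json($text))
```

The 'null' branch also matches the case where the input itself was the empty sequence, since the two cases produce the same result.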

If the input starts with a byte order mark, this function ignores it. The byte order mark may have been added to the data stream in order to facilitate decoding of an octet stream to a character string, but since this function takes a character string as input, the byte order mark serves no useful purpose.

The possibility of the input containing characters that are not valid in XML (for example, unpaired surrogates) arises only when such characters are expressed using JSON escape sequences. This is because the input to the function is an instance of xs:string, which by definition can only contain characters that are valid in XML.