XPath/XQuery parse-xml function

Summary

This function takes as input an XML document represented as a string, and returns the document node at the root of an XDM tree representing the parsed document.

Signature

fn:parse-xml(
$arg as xs:string?
) as document-node(element(*))?

Properties

This function is nondeterministic, context-dependent, and focus-independent. It depends on the static base URI.

Rules

If $arg is the empty sequence, the function returns the empty sequence.
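As a minimal illustration of this rule, the following sketch passes the empty sequence and checks that nothing is returned. It assumes a processor that supports XQuery 3.0 or later.

```xquery
xquery version "3.0";

(: An empty-sequence argument yields an empty-sequence result. :)
fn:empty(fn:parse-xml(()))   (: true :)
```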

The precise process used to construct the XDM instance is implementation-defined. In particular, it is implementation-defined whether DTD and/or schema validation is invoked, and it is implementation-defined whether an XML 1.0 or XML 1.1 parser is used.

The static base URI property from the static context of the fn:parse-xml function call is used both as the base URI used by the XML parser to resolve relative entity references within the document, and as the base URI of the document node that is returned.

The document URI of the returned node is absent.
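The two URI rules above can be observed with a short query. This is a sketch that assumes the processor makes a static base URI available; if the static base URI is absent, the second comparison yields the empty sequence rather than a boolean.

```xquery
xquery version "3.0";

let $doc := fn:parse-xml("<alpha/>")
return (
  (: The document URI of the returned node is absent. :)
  fn:empty(fn:document-uri($doc)),
  (: The base URI of the node is taken from the static context;
     expected to be true when a static base URI is available. :)
  fn:base-uri($doc) eq fn:static-base-uri()
)
```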

The function is not deterministic: that is, if the function is called twice with the same arguments, it is implementation-dependent whether the same node is returned on both occasions.
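The following sketch shows what this means for a query author: node identity between two calls cannot be relied on, so comparisons should be made on content instead. The variable names are illustrative only.

```xquery
xquery version "3.0";

let $s := "<alpha>abcd</alpha>"
let $a := fn:parse-xml($s)
let $b := fn:parse-xml($s)
return (
  (: Identity comparison: the result is implementation-dependent,
     so portable queries should not depend on it. :)
  $a is $b,
  (: Content comparison: does not depend on node identity. :)
  fn:deep-equal($a, $b)
)
```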

Examples

The expression fn:parse-xml("<alpha>abcd</alpha>") returns a newly created document node, having an alpha element as its only child; the alpha element in turn is the parent of a text node whose string value is "abcd".
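The structure described in this example can be checked step by step. The query below is a sketch that inspects the returned tree; the comments show the result expected for each item under the description above.

```xquery
xquery version "3.0";

let $doc := fn:parse-xml("<alpha>abcd</alpha>")
return (
  $doc instance of document-node(element(alpha)),  (: true :)
  fn:count($doc/node()),                           (: 1 :)
  fn:name($doc/*),                                 (: "alpha" :)
  fn:string($doc/alpha/text())                     (: "abcd" :)
)
```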

Error Conditions

A dynamic error is raised [err:FODC0006] if the content of $arg is not a well-formed and namespace-well-formed XML document.

A dynamic error is raised [err:FODC0006] if DTD-based validation is carried out and the content of $arg is not valid against its DTD.
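A query can recover from these errors with the try/catch expression introduced in XQuery 3.0. The sketch below catches err:FODC0006, the code the Functions and Operators specification assigns to this condition. The parse-error element is a hypothetical reporting format, and the err prefix is declared explicitly because it is not guaranteed to be predeclared.

```xquery
xquery version "3.0";

declare namespace err = "http://www.w3.org/2005/xqt-errors";

try {
  (: Mismatched start and end tags: not well-formed. :)
  fn:parse-xml("<alpha>abcd</beta>")
} catch err:FODC0006 {
  (: Hypothetical error report; $err:code and $err:description
     are bound automatically inside the catch clause. :)
  <parse-error code="{$err:code}">{$err:description}</parse-error>
}
```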

Notes

Since the XML document is presented to the parser as a string, rather than as a sequence of octets, the encoding specified within the XML declaration has no meaning. If the XML parser accepts input only in the form of a sequence of octets, then the processor must ensure that the string is encoded as octets in a way that is consistent with the rules used by the XML parser to detect the encoding.

The primary use case for this function is to handle input documents that contain nested XML documents embedded within CDATA sections. Since the content of the CDATA section is exposed as text, the receiving query or stylesheet may pass this text to the fn:parse-xml function to create a tree representation of the nested document.

Similarly, nested XML within comments is sometimes encountered, and lexical XML is sometimes returned by extension functions, for example, functions that access web services or read from databases.
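The CDATA use case described above might look like the following sketch. The message, payload, order, and item element names are hypothetical, and the outer document is constructed inline only to keep the example self-contained.

```xquery
xquery version "3.0";

(: Hypothetical outer document carrying a nested document as text. :)
let $outer :=
  <message>
    <payload><![CDATA[<order id="42"><item>widget</item></order>]]></payload>
  </message>

(: The CDATA content is exposed as the string value of payload. :)
let $inner := fn:parse-xml(fn:string($outer/payload))

return fn:string($inner/order/item)   (: "widget" :)
```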

A use case arises in XSLT where there is a need to preprocess an input document before parsing. For example, an application might wish to edit the document to remove its DOCTYPE declaration. This can be done by reading the raw text using the fn:unparsed-text function, editing the resulting string, and then passing it to the fn:parse-xml function.
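The preprocessing step can be sketched in XQuery as well. To stay self-contained, the example below starts from a literal string; in practice the text would typically come from fn:unparsed-text, and the file name shown in the comment is hypothetical. The regular expression is deliberately simplistic: it assumes a DOCTYPE declaration with no internal subset and no ">" character inside its quoted literals.

```xquery
xquery version "3.0";

(: In practice: let $raw := fn:unparsed-text("input.xml") :)
let $raw := '<!DOCTYPE alpha SYSTEM "alpha.dtd"><alpha>abcd</alpha>'

(: Remove the DOCTYPE declaration (simple form only, see above). :)
let $stripped := fn:replace($raw, '<!DOCTYPE[^>]*>', '')

return fn:string(fn:parse-xml($stripped)/alpha)   (: "abcd" :)
```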