XPath/XQuery unparsed-text function

Summary

The fn:unparsed-text function reads an external resource (for example, a file) and returns a string representation of the resource.

Signatures

fn:unparsed-text(
  $href as xs:string?
) as xs:string?

fn:unparsed-text(
  $href as xs:string?,
  $encoding as xs:string
) as xs:string?
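
As a minimal sketch of the two signatures, the following XQuery calls the function with and without an explicit encoding. The file names notes.txt and legacy.txt are hypothetical, and whether they can be read depends on the static base URI and on the resources the processor makes available. Note also that iso-8859-1 is not among the encodings every implementation must recognize.

```xquery
xquery version "3.0";

(: One-argument form: no encoding is suggested by the caller. :)
let $notes := fn:unparsed-text("notes.txt")

(: Two-argument form: the caller names an encoding for the resource. :)
let $legacy := fn:unparsed-text("legacy.txt", "iso-8859-1")

(: Both results are xs:string values, so string functions apply directly. :)
return (fn:string-length($notes), fn:string-length($legacy))
```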

Properties

This function is deterministic, context-dependent, and focus-independent. It depends on static-base-uri.

Rules

The $href argument must be a string in the form of a URI reference, which must contain no fragment identifier, and must identify a resource for which a string representation is available. If the URI is a relative URI reference, then it is resolved relative to the static base URI property from the static context.

The mapping of URIs to the string representation of a resource is the mapping defined in the available text resources component of the dynamic context.

If the value of the $href argument is an empty sequence, the function returns an empty sequence.
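
The empty-sequence rule can be illustrated with a short sketch. The variable name $missing is an assumption made for the example; it stands for any optional value, such as an absent attribute, that is passed as $href.

```xquery
xquery version "3.0";

(: $href is declared as xs:string?, so an empty sequence is a legal argument. :)
let $missing as xs:string? := ()

(: No resource is read; the call simply yields an empty sequence. :)
return fn:empty(fn:unparsed-text($missing))
```

Under the rule above this query evaluates to true().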

The $encoding argument, if present, is the name of an encoding. The values for this argument follow the same rules as for the encoding attribute in an XML declaration. The only values that every implementation is required to recognize are utf-8 and utf-16.

The encoding of the external resource is determined as follows:

  1. external encoding information is used if available, otherwise

  2. if the media type of the resource is text/xml or application/xml, or if it matches the conventions text/*+xml or application/*+xml, then the encoding is recognized as specified in the XML specification, otherwise

  3. the value of the $encoding argument is used if present, otherwise

  4. the processor may use implementation-defined heuristics to determine the likely encoding, otherwise

  5. UTF-8 is assumed.
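
The precedence order in the list above can be modeled as "first available candidate wins". The sketch below is only an illustration of that ordering: the function local:effective-encoding and its parameter names are hypothetical, and a real processor gathers these inputs internally rather than receiving them as arguments.

```xquery
xquery version "3.0";

(: Hypothetical model of the precedence order. Each parameter holds the
   encoding name produced by one step, or the empty sequence if that step
   yields nothing. :)
declare function local:effective-encoding(
  $external  as xs:string?,  (: step 1: external encoding information :)
  $xml-rules as xs:string?,  (: step 2: XML detection, XML media types only :)
  $requested as xs:string?,  (: step 3: the $encoding argument :)
  $heuristic as xs:string?   (: step 4: implementation-defined guess :)
) as xs:string {
  (: Empty candidates vanish from the sequence, so [1] selects the first
     step that produced a value; step 5 supplies the default. :)
  ($external, $xml-rules, $requested, $heuristic, "utf-8")[1]
};

(
  local:effective-encoding((), (), "iso-8859-1", ()),
  local:effective-encoding("utf-16", (), "iso-8859-1", ()),
  local:effective-encoding((), (), (), ())
)
```

The three calls evaluate to "iso-8859-1", "utf-16", and "utf-8" respectively, which shows that the $encoding argument is a fallback rather than an override: external information takes priority over it.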

The result of the function is a string containing the string representation of the resource retrieved using the URI.

Examples

This XSLT example attempts to read a file containing 'boilerplate' HTML and copy it directly to the serialized output file:

  <xsl:output method="html"/>

  <xsl:template match="/">
    <xsl:value-of select="unparsed-text('header.html', 'iso-8859-1')"
                  disable-output-escaping="yes"/>
    <xsl:apply-templates/>
    <xsl:value-of select="unparsed-text('footer.html', 'iso-8859-1')"
                  disable-output-escaping="yes"/>
  </xsl:template>

Error Conditions

A dynamic error is raised if $href contains a fragment identifier, or if it cannot be resolved to an absolute URI (for example, because the base-URI property in the static context is absent), or if it cannot be used to retrieve the string representation of a resource.

A dynamic error is raised if the value of the $encoding argument is not a valid encoding name, if the processor does not support the specified encoding, if the string representation of the retrieved resource contains octets that cannot be decoded into Unicode characters using the specified encoding, or if the resulting characters are not permitted XML characters.

A dynamic error is raised if $encoding is absent and the processor cannot infer the encoding using external information and the encoding is not UTF-8.
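
A caller that prefers a fallback value to a failed query can wrap the call in an XQuery 3.0 try/catch expression. This is a minimal sketch, assuming a processor that supports try/catch; the helper name local:text-or-default and the URI header.html#top are hypothetical.

```xquery
xquery version "3.0";
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: Hypothetical helper: returns the text of the resource, or $default if
   any of the dynamic errors described above is raised. :)
declare function local:text-or-default(
  $href     as xs:string,
  $encoding as xs:string,
  $default  as xs:string
) as xs:string {
  try {
    fn:unparsed-text($href, $encoding)
  } catch * {
    (: Inside the catch clause, $err:code and $err:description identify
       which condition failed; this sketch ignores them. :)
    $default
  }
};

(: The fragment identifier makes $href invalid, so the fallback is used. :)
local:text-or-default("header.html#top", "utf-8", "")
```

Catching every error with the wildcard keeps the sketch short; production code would more likely inspect $err:code and recover only from the failures it expects.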

Notes

If it is appropriate to use a base URI other than the static base URI (for example, when resolving a relative URI reference read from a source document), then it is advisable to resolve the relative URI reference using the fn:resolve-uri function before passing it to the fn:unparsed-text function.
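
The advice about fn:resolve-uri might look as follows in practice. The external variable $doc and the include element with its href attribute are assumptions made for the example, and the sketch presumes that every include element has a base URI, since fn:resolve-uri requires a non-empty base.

```xquery
xquery version "3.0";

(: Hypothetical source document supplied by the caller. Each <include>
   element carries a relative href that should be resolved against the
   document containing it, not against the base URI of this query. :)
declare variable $doc as document-node() external;

for $inc in $doc//include[@href]
(: Resolve against the base URI of the element that holds the reference. :)
let $absolute := fn:resolve-uri($inc/@href, fn:base-uri($inc))
return fn:unparsed-text($absolute)
```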

There is no essential relationship between the sets of URIs accepted by the two functions fn:unparsed-text and fn:doc (a URI accepted by one may or may not be accepted by the other), and if a URI is accepted by both there is no essential relationship between the results (different resource representations are permitted by the architecture of the web).

There are no constraints on the MIME type of the resource.

The fact that the resolution of URIs is defined by a mapping in the dynamic context means that in effect, various aspects of the behavior of this function are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:

  - The set of URI schemes that the implementation recognizes is implementation-defined. Implementations may allow the mapping of URIs to resources to be configured by the user, using mechanisms such as catalogs or user-written URI handlers.

  - The handling of media types is implementation-defined.

  - Implementations may provide user-defined error handling options that allow processing to continue following an error in retrieving a resource, or in reading its content. When errors have been handled in this way, the function may return a fallback document provided by the error handler.

  - Implementations may provide user options that relax the requirement for the function to return deterministic results.

The rules for determining the encoding are chosen for consistency with XInclude. Files with an XML media type are treated specially because there are use cases for this function where the retrieved text is to be included as unparsed XML within a CDATA section of a containing document, and because processors are likely to be able to reuse the code that performs encoding detection for XML external entities.
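
The CDATA use case mentioned above could be sketched as follows, assuming a processor that honors serialization parameters declared in the query prolog. The element name listing and the resource fragment.xml are hypothetical.

```xquery
xquery version "3.0";
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

(: Ask the serializer to write the text content of <listing> as CDATA. :)
declare option output:method "xml";
declare option output:cdata-section-elements "listing";

(: The resource is read as plain text and never parsed, so any markup it
   contains ends up as character data inside the <listing> element. :)
<listing>{ fn:unparsed-text("fragment.xml") }</listing>
```

Whether the output actually contains a CDATA section depends on how the result is serialized; a processor that hands the result tree to an application without serializing it will ignore these options.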

If the text file contains characters such as < and &, these will typically be output as &lt; and &amp; if the string is serialized as XML or HTML. If these characters actually represent markup (for example, if the text file contains HTML), then an XSLT stylesheet can attempt to write them as markup to the output file using the disable-output-escaping attribute of the xsl:value-of instruction. Note, however, that XSLT implementations are not required to support this feature.