1. Location Path Expressions

    In chapter 1 we briefly touched upon location path expressions, noting their similarity to filesystem paths. In this chapter we delve deeper. Several interactive examples are provided to facilitate understanding of this fundamental type of XPath expression.

    1. Absolute and Relative Location Paths

      As with filesystem paths, location paths can also be absolute or relative.

      An absolute location path is always evaluated from the document root.

      A relative location path is always evaluated from a context node.

      1. Absolute Location Path

        An absolute location path begins with a '/' to signify that it is starting from the document root. The document root is the top level of the XML document's node hierarchy and contains all other nodes in the XML document, including the root element (which is the top level of the XML document's element hierarchy).

        An absolute location path can also begin with '//', however '/' and '//' mean different things. The following section includes two examples which illustrate the difference.

        Absolute Location Path Examples

        (Please click on radio buttons to see result of XPath expression evaluation)

        Download example file: company_1.xml

        /
        /comment()
        /company
        /company/office
        /company/office[2]
        /company/office/@location
        /company/office[@location = 'Boston']
        /*
        //*

        This XPath expression selects the document root. The document root is the top of the XML document node hierarchy, it contains all other nodes in the XML document.

        This XPath expression selects all comment nodes which are children of the document root. In this case there is only one comment '<!-- Edited with XML Spy-->'.

        This XPath expression selects the 'company' element node which is a child of the document root. In this XML document 'company' is the root element. The root element is the top level of the element hierarchy in the XML document, i.e. it contains all other elements and attributes in the XML document. The document root is not to be confused with the root element.

        This XPath expression selects all 'office' elements which are children of the 'company' element, which in turn is a child element of the document root.

        This XPath expression selects the second 'office' element which is a child of the 'company' element, which in turn is a child element of the document root. The square brackets indicate a predicate which is used to filter sequences (in this case the sequence of 'office' nodes).'[2]' is short form for '[position() = 2]'.

        This XPath expression selects the 'location' attribute of all 'office' elements which are children of the 'company' element, which in turn is a child element of the document root. '@' is an abbreviated form of the 'attribute::'axis specifier.

        This XPath expression selects the 'office' element(s) which have an attribute named 'location' with a value equal to 'Boston'.

        This XPath expression selects all element children of the document root. '*' is an element wildcard.

        This XPath expression selects all element descendants of the document root.'//' is the abbreviated form of the 'descendant-or-self' axis specifier.

      2. Relative Location Path

        A relative location path is always evaluated from a context node.

        A context node can be thought of as the node that the XPath processor is 'currently processing'.

        The context node can change within an XPath query.

        office/employee/first_name[ . = 'John']
        This example pertains to the 'company_1.xml' file used previously. The context node from which the expression is evaluated is the 'company' element i.e. the parent element of 'office'. The context node can change during evaluation of the expression, as indicated by the dot character '.' in the predicate. In this case '.' is used to denote the context node 'first_name' in order to compare the value of this node with the string 'John'.

        Relative Location Path Examples

        (Please click on radio buttons to see result of XPath expression evaluation)

        In the following examples the 'employee' element (marked in blue) is the context node.

        Download example file: company_1.xml

        .
        ..
        ../..
        first_name
        *
        age/text()
        ../following-sibling::office/@location
        ancestor::*
        ancestor-or-self::*

        This XPath expression selects the context node. A dot i.e. '.' is an abbreviated form of the 'self::' axis specifier.

        This XPath expression selects the parent element of the context node. '..' is an abbreviated form of the 'parent::' axis specifier.

        This XPath expression selects the parent element of the parent element of the context node. '..' is an abbreviated form of the 'parent::' axis specifier.

        This XPath expression selects all 'first_name' elements which are children of the context node.

        This XPath expression selects all child element nodes of the context node.

        This XPath expression selects the text node of the 'age' child element of the context node.

        This XPath expression first navigates to the parent element of the context node (which is the first 'office' element child of the 'company' element). The next step navigates to the sibling 'office' element which follows (in document order) the first 'office' element. The third step access the 'location' attribute of the following sibling 'office' element.

        This XPath expression uses the 'ancestor::' axis specifier and the '*' wildcard to select all ancestor elements (parent, grandparent,etc.) of the context node.

        This XPath expression uses the 'ancestor-or-self::' axis specifier and the '*' wildcard to select the context node and all ancestor elements (parent, grandparent,etc.) of the context node.

    2. Steps

      A location path contains one or more steps.

      A step consists of:

      • an axis
      • a node test
      • zero or more predicates
      axis::node_test[predicate]
      A step consists of an axis followed by a node test and optional predicate(s).
      child::office[@location='Vienna']
      This XPath expression selects (from the context node) all child elements named 'office' which have an attribute named 'location' with a value equal to 'Vienna'.
      1. Axis

        The axis is the first part of a location step, it determines which direction to navigate with respect to a particular node. There are 13 different axes which belong to one of two groups: forward axis or reverse axis.

        The forward axis returns nodes in document order, the reverse axis returns nodes in reverse document order.

        A double colon :: is used to separate the axis specifier from the node test.

        Forward Axis

        • child
        • descendant
        • attribute
        • self
        • descendant-or-self
        • following-sibling
        • following
        • namespace

        Reverse Axis

        • parent
        • ancestor
        • preceding-sibling
        • preceding
        • ancestor-or-self
        child::office
        This XPath expression is a relative location path which selects all 'office' element children of the context node.

        Every axis has a principle node type. For the 'attribute' and 'namespace' axes the principle node type is 'attribute' and 'namespace' respectively, for all other axes the principle node type is 'element'.

        The default axis (i.e. if no other axis has been specified) is the child axis

        Several of the examples that we have encountered have not explicitly specified an axis e.g. 'child::' . The reason for this is because an abbreviated syntax exists for some axis and axis node test combinations e.g. an XPath expression to select all child elements named 'office' from the context node can be written as 'child::office' or simply 'office' in the abbreviated syntax.

      2. Node test

        A node test appears after the axis specifier in a location path step.

        There are three types of node test:

        • by name
        • by kind
        • by type
        1. by name

          A name can be any XML name

          office
          This XPath expression contains a named node test which select all 'office' child elements (of the context node). Because no axis is specified, the 'child' axis is assumed, and because the principle node type of the child axis is 'element' the string 'office' refers to all child elements (of the context node) named 'office'.
          @location
          This XPath expression selects the attribute named 'location' from the context node. The abbreviated syntax for the 'attribute' axis is '@'.
        2. by kind

          The kinds of node that can be tested for are:

          • document-node()
          • element()
          • attribute()
          • schema-element()
          • schema-attribute()
          • processing-instruction()
          • comment()
          • text()
          • namespace-node()
          • node()
          //attribute()
          This XPath expression selects all attribute nodes in the XML document (irrespective of name or type). The double forward slash at the beginning of the XPath expression is short form for the axis 'descendant-or-self'.
          //element()
          This XPath expression selects all elements in the XML document (irrespective of name or type).
          //*
          This XPath expression is equivalent to the previous example. In this abbreviated node test, the wildcard '*' is used to denote element nodes.
        3. by type

          The types of node that can be tested can be any built-in XML Schema datatype as well as any user defined simple or complex types.

          //element(*, xs:date)
          The first argument of the element function in this XPath expression is the node name, and the second argument is the type. The asterisk used in the first argument is used to denote 'any name'. The expression selects all elements in the XML document (irrespective of element name) which are of type 'xs:date'. The XML document must have an associated schema file for the type information to be available to the XPath engine.
      3. Predicates

        Predicates are used to filter nodes in a location path step. They appear within square brackets i.e. '[' and ']' after the node test.

        //first_name[. = 'Andy']
        This XPath expression selects all elements in the XML document named 'first_name' with a value equal to 'Andy'. A dot i.e. '.' indicates the self axis.
        //employee[age > 18 and age < 30]
        This XPath expression selects all 'employee' elements in the XML document which have a child element named 'age' with a value between 18 and 30.
        //employee[age >= 30][4]/first_name
        This XPath expression selects the 'first name' of the fourth 'employee' element which has a child element named 'age' with a value greater than or equal to 30. There are two predicates in this example. The first predicate filters the 'employee' elements which have an 'age' greater than or equal to 30, and the second predicate then filters these results and returns the 'first_name' of the fourth 'employee' with an 'age' greater than or equal to 30. The second predicate [4] is short form for [position()=4].
      4. Abbreviated Syntax

        Because of their frequent use, an abbreviated syntax exists for the following axis specifier and axis specifier node test combinations:

        • child
        • attribute
        • self::node()
        • descendant-or-self::node()
        • parent::node()
        child::If no axis is specified the 'child::' axis is assumed. 'child::' is the default axis.
        @attribute::'@' is the abbreviated form of the 'attribute::' axis.
        .self::node()'.' is the abbreviated form of the 'self::' axis and the 'node' node test.
        //descendant-or-self::node()'//' is the abbreviated form of the 'descendant-or-self::' axis and the 'node' node test.
        ..parent::node()'..' is the abbreviated form of the 'parent::' axis and the 'node' node test.
        [4][position() = 4]A predicate which contains only an integer value i.e. '[4]' is abbreviated form for '[position() = 4]'.