If you want to add a new HTML page source, click the Add Source toolbar button. This brings up the first screen of the Add Page Source dialog (screenshot below). You now have two options:
• To create the structure manually, select New, empty XML, HTML or JSON.
• To import the structure from a file, select New XML, HTML or JSON structure imported from file. If the XPath-based file name option is selected, the Edit XPath/XQuery Expression dialog appears, in which you can build an XPath expression that generates the file URL you need. Otherwise, a dialog box appears in which you can select the file that provides the structure of the page source. You can browse for the file, or use a file URL or global resource.
Click Next to go to the second screen of the dialog. Here, you (i) specify that the data type of the page source must be HTML and (ii) define the other properties of the new page source. If you are not sure how to define these properties, use the default settings. You can always change the settings later by right-clicking the root node of the page source.
When you click Finish, a root node named $HTML is created for the new page source (see screenshot below). To rename the root node, double-click its name and edit it. If you specified that the structure of the page source must be imported from a file, you will be prompted to select an HTML file when you click Finish. In this case, the page source $HTML is created with the structure of the selected file.
You can now (i) create or modify the structure of the page source via the toolbar commands, and (ii) add data to nodes of the page source. How to do this is described in the section, Tree Data.
Note that HTML retrieval is done using a correcting parser. As a result, if an imported HTML structure has an invalid data object model because of missing elements (according to the HTML 5 specification), then these missing elements are added to the page source tree in the Page Sources Pane. For example:
will be corrected to:
Note: You can change the data type of the page source (to XML or JSON) via the root node's context menu command Data Type.
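The correction of missing elements described above can be illustrated with a small Python sketch. It is not the parser used by the product. It is a toy stand-in that assumes the simplest case only: a fragment containing none of the html, head, or body elements is wrapped in that skeleton, and any other input is returned unchanged. The names add_missing_skeleton and _TagCollector are hypothetical. Python's standard html.parser module is used only to detect which tags are present, since it does not correct markup itself.

```python
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Records the (lowercased) names of all start tags in the input."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)


def add_missing_skeleton(fragment: str) -> str:
    """Toy illustration of a correcting step (hypothetical helper).

    If the fragment has no html, head, or body element, wrap it in the
    minimal document skeleton. Any other input is returned unchanged;
    a real correcting parser handles many more cases.
    """
    collector = _TagCollector()
    collector.feed(fragment)
    collector.close()

    if collector.tags & {"html", "head", "body"}:
        return fragment  # simplification: leave partially structured input alone

    return "<html><head></head><body>" + fragment + "</body></html>"


if __name__ == "__main__":
    print(add_missing_skeleton("<p>Hello</p>"))
```

In this sketch, a lone paragraph element ends up inside html and body elements, with an empty head element added before the body, which is the kind of structure you would then expect to see as nodes in the page source tree.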