Altova Mailing List Archives


Re: Get untagged text between elements

From: Peter Flynn <peter.nosp@-.--------.-->
To: NULL
Date: 2/4/2007 5:12:00 PM

FMAS wrote:
> On 4 Feb., 00:03, Peter Flynn <peter.n...@m.silmaril.ie> wrote:
>> FMAS wrote:
>>> I am transforming XML files which sometimes have untagged text between
>>> elements.
>>> It looks like this:
>>> <ut Style="external" DisplayText="bb">&lt;bb &quot;Document Numbering
>>> Formats&quot;&gt;</ut>
>>> <ut Style="external" DisplayText="bb">&lt;bb &quot;Custom Footnote
>>> Numbering&quot;&gt;</ut>*†‡
>>> <ut Style="external" DisplayText="bb">&lt;bb &quot;Volume
>>> Numbering&quot;&gt;</ut>
>>> No problem to transform the content of tags. But I do not find a way
>>> to copy untagged text such as "*†‡" in the example above.
>>> Any suggestion?
>> a) This looks like rather poor design by someone who really
>>     hasn't grasped the idea of XML properly.
>>
>> b) It also appears that they are trying to hide data marked in
>>     another language within the XML.
>>
>> c) Use the text() function to access character data in Mixed Content
>>     (which is what this resembles).
>>
>> ///Peter
>> --
>> XML FAQ: http://xml.silmaril.ie/
> 
> I agree that this is ugly XML but it gets parsed without errors. 

Well-formed documents will parse without errors. Assuming there is an 
outer containing element to your example, it's well-formed. But 
well-formed doesn't mean well-designed. I had a car once that was 
well-formed (it had four wheels and it went forwards and backwards) but 
I wouldn't have called it well-designed :-)

> The
> untagged text varies. It may also be numbers for chapters etc. I also
> thought about the text() function, but it doesn't seem to help here as
> we are talking about input and not output. 

Not relevant. As Dmitre pointed out, text() isn't a function even though 
it looks like one: I was trying to oversimplify (usually a bad idea :-) 
It's actually a node test which will match any text node, that is, any 
stretch of unmarked text.
[That too is an oversimplification, but it will do for the moment.]

> I have a structure like this:
> 
> <tag>text I can get</tag>text I can't get<tag>text2 I can get</tag>

The content of the two tag elements can -- as you have presumably found 
-- be accessed with a template matching "tag". [Please note the 
difference between a tag and an element: it's very difficult to explain 
anything using the wrong terms.]

As you don't give the name of the containing element it's harder to give 
an example of how to access the text between them, but let's assume that 
what you *really* have is approximately:

<root>
  <foo>text I can get</foo>text I can't get<foo>text2 I can get</foo>
</root>

There are three unmarked text nodes in the root element:

a) newline-space-space (before the first foo start-tag)
b) text I can't get (between the foo elements)
c) newline (after the last foo end-tag)

A template matching root/text() will access them. Making it 
root/text()[2] will give you (b) only.
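
Here is a minimal sketch of those two patterns as a complete XSLT 1.0 stylesheet, run against the root/foo document above. It assumes the white-space-only text nodes (a) and (c) survive parsing: no xsl:strip-space, and a parser that doesn't discard them. If your processor does strip them, (b) becomes text()[1] instead. The bracketed labels in the output are made up for illustration. The explicit priority is needed because both root/text() and root/text()[2] otherwise get the same default priority (0.5) and would conflict.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Element content: the part that already works -->
  <xsl:template match="foo">
    <xsl:text>[foo: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>]</xsl:text>
  </xsl:template>

  <!-- Second text node child of root: (b), the text between the foo
       elements. priority="1" makes this win over the template below. -->
  <xsl:template match="root/text()[2]" priority="1">
    <xsl:text>[between: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>]</xsl:text>
  </xsl:template>

  <!-- Every other text node child of root: (a) and (c), dropped here -->
  <xsl:template match="root/text()"/>
</xsl:stylesheet>
```

Under those assumptions the result should be:

[foo: text I can get][between: text I can't get][foo: text2 I can get]

No template for root is needed: the built-in rule for elements applies templates to all child nodes, text nodes included, in document order.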

> I wonder if I have to preprocess the XML file, maybe with perl. But if
> it can be avoided so much the better.

XSLT can do this unaided.
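
For the original ut markup, where the untagged text can turn up anywhere, counting positions is fragile. One possible approach, sketched here under assumptions, is to match every text node that is not inside a ut element and contains something other than white-space. The output element names (out, marker, untagged) are invented for the example, and the name of your containing element is unknown, so the sketch matches the document element as /* instead.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Document element, whatever it is called; "out" is a made-up name -->
  <xsl:template match="/*">
    <out>
      <xsl:apply-templates/>
    </out>
  </xsl:template>

  <!-- Tagged content: replace the body with whatever you do already -->
  <xsl:template match="ut">
    <marker>
      <xsl:value-of select="."/>
    </marker>
  </xsl:template>

  <!-- Untagged text: a text node outside ut holding more than white-space,
       e.g. the *†‡ after the second ut element -->
  <xsl:template match="text()[not(parent::ut)][normalize-space()]">
    <untagged>
      <xsl:value-of select="normalize-space(.)"/>
    </untagged>
  </xsl:template>

  <!-- White-space-only text nodes (line breaks between elements): dropped -->
  <xsl:template match="text()[not(normalize-space())]"/>
</xsl:stylesheet>
```

If the untagged text goes missing in your current stylesheet, a likely cause is that it selects only the elements (for example with xsl:for-each or apply-templates select="ut"), so the text nodes between them are never visited. A plain xsl:apply-templates with no select attribute visits all child nodes.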

///Peter
-- 
XML FAQ: http://xml.silmaril.ie/

