Altova Mailing List Archives: microsoft.public.xsl

Re: Get untagged text between elements
Date: 2/4/2007 5:12:00 PM

FMAS wrote:
> On 4 Feb., 00:03, Peter Flynn <peter.n...@m.silmaril.ie> wrote:
>> FMAS wrote:
>>> I am transforming XML files which sometimes have untagged text between
>>> elements.
>>> It looks like this:
>>> <ut Style="external" DisplayText="bb"><bb "Document Numbering Formats"></ut>
>>> <ut Style="external" DisplayText="bb"><bb "Custom Footnote Numbering"></ut>*†‡
>>> <ut Style="external" DisplayText="bb"><bb "Volume Numbering"></ut>
>>> No problem to transform the content of tags. But I do not find a way
>>> to copy untagged text such as "*†‡" in the example above.
>>> Any suggestion?
>> a) This looks like rather poor design by someone who really
>> hasn't grasped the idea of XML properly.
>>
>> b) It also appears that they are trying to hide data marked in
>> another language within the XML.
>>
>> c) Use the text() function to access character data in Mixed Content
>> (which is what this resembles).
>>
>> ///Peter
>> --
>> XML FAQ: http://xml.silmaril.ie/
>
> I agree that this is ugly XML but it gets parsed without errors.

Well-formed documents will parse without errors. Assuming there is an outer containing element to your example, it's well-formed. But well-formed doesn't mean well-designed. I had a car once that was well-formed (it had four wheels and it went forwards and backwards) but I wouldn't have called it well-designed :-)

> The untagged text varies. It may also be numbers for chapters etc. I also
> thought about the text() function, but it doesn't seem to help here as
> we are talking about input and not output.

Not relevant.
As Dmitre pointed out, text() isn't a function even though it looks like one: I was trying to oversimplify (usually a bad idea :-) It's actually a node type which will match any stretch of unmarked text. [That too is an oversimplification, but it will do for the moment.]

> I have a structure like this:
>
> <tag>text I can get</tag>text I can't get<tag>text2 I can get</tag>

The content of the two tag elements can -- as you have presumably found -- be accessed with a template matching "tag". [Please note the difference between a tag and an element: it's very difficult to explain anything using the wrong terms.]

As you don't give the name of the containing element it's harder to give an example of how to access the text between them, but let's assume that what you *really* have is approximately:

<root>
  <foo>text I can get</foo>text I can't get<foo>text2 I can get</foo>
</root>

There are three unmarked text nodes in the root element:

a) newline-space-space (before the first foo start-tag)
b) text I can't get (between the foo elements)
c) newline (after the last foo end-tag)

A template matching root/text() will access them. Making it root/text()[2] will give you (b) only.

> I wonder if I have to preprocess the XML file, maybe with perl. But if
> it can be avoided so much the better.

XSLT can do this unaided.

///Peter
--
XML FAQ: http://xml.silmaril.ie/
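
The root/text() approach described in the post can be sketched as a complete stylesheet. This is only a minimal XSLT 1.0 sketch, assuming the hypothetical root and foo element names from the example above rather than the poster's real markup. The normalize-space() predicate is an addition that is not in the post: it skips the whitespace-only text nodes (a) and (c), so the result does not rely on the position [2], which can shift if a parser or processor is configured to strip whitespace-only text nodes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- "root" and "foo" are the assumed names from the example.
       In root, visit only the text-node children that contain
       something other than whitespace; the foo elements are skipped. -->
  <xsl:template match="root">
    <xsl:apply-templates select="text()[normalize-space()]"/>
  </xsl:template>

  <!-- Emit each selected text node, with its whitespace normalized,
       followed by a newline. -->
  <xsl:template match="root/text()">
    <xsl:value-of select="normalize-space()"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the assumed root/foo sample, this should output only the text between the two foo elements. For real input, root would need to be replaced by the name of whichever element actually contains the ut elements.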
