Altova Mailing List Archives>Archive Index >microsoft.public.xml Archive Home >Recent entries >Thread Prev - MSXML - LINEAR cost taking a SINGLE step of IXMLDomNodeList enumerator?? [Thread Next] Re: MSXML - LINEAR cost taking a SINGLE step of IXMLDomNodeList enumerator??To: NULL Date: 12/5/2007 7:53:00 PM
"lab27" <lee.benfield@g...> wrote in message
news:795f1bf0-544a-481f-ba38-2649a6f67728@p......
> When walking a large DomNodeList using the IEnumVariant interface
> (I've included VB code below, but obviously it's the same in C++ using
> COM enumerators), the cost of going to the next element seems to be
> proportional to the TOTAL length of the list!! Is this right?
>
> I haven't found this mentioned anywhere, maybe I'm just not googling
> right ;)
>
> i.e. walking the first 1000 nodes of a 2000 node list is massively
> cheaper than walking the first 1000 nodes of a 20000 node list. - in
> practice this means if you're traversing the whole list your cost is
> n^2, which on the large lists I'm dealing with is causing a bit of
> pain....
>
> This is pretty confusing - it doesn't look like the list is being
> generated lazily (and even if it was I'd not expect linear behaviour
> like this for accessing a prefix) - it almost looks like it's re-
> evaluating the predicate each iteration step! ;)
>
> Anyone know anything about this? Behaviour doesn't seem to differ
> between msxml4 / 6...
>
> Thanks,
>
> Lee.
>
> -----8<---------- sample VBA to demo, needs ref to msxml4/6.
>
> Private Declare Function QueryPerformanceCounter& Lib
> "kernel32.dll" (l As pbdw)
> Private Declare Function QueryPerformanceFrequency& Lib
> "kernel32.dll" (l As pbdw)
>
> Private Type pbdw
> bottom As Long
> top As Long
> End Type
>
> Public Function tickspers() As Double
> Static l As Long
> If l = 0 Then
> Dim ll As pbdw
> Call QueryPerformanceFrequency(ll)
> l = ll.bottom
> End If
> ticksperms = l
> End Function
>
> Public Function ticks() As Double
> Dim ll As pbdw
> Call QueryPerformanceCounter(ll)
> ticks = ll.bottom
> End Function
>
> Private Sub add_node(ByRef s As String)
> s = s & "<node><a>1</a><b>2</b></node>"
> End Sub
>
> Private Sub check_timing(ByVal s As String)
> s = s & "</doc>"
>
> Dim d As DOMDocument
> Set d = New DOMDocument40 ' or domdocument60,
> If Not d.loadXML(s) Then Call Err.Raise(10101, , "Didn't load")
>
> Dim n As IXMLDOMNode
> Set n = d.documentElement
> Dim dn As IXMLDOMNodeList
>
> Set dn = n.selectNodes("node")
> If dn.Length < 1000 Then Call Err.Raise(10102, , "Odd...")
>
> Dim dstart As Double, dstop As Double
> dstart = ticks()
>
> Dim i As Long
> i = 0
> Dim count As Long
> For Each n In dn
> i = i + 1
> count = count + Len(n.Text)
> If (i = 1000) Then
> GoTo breakout
> End If
> Next n
> breakout:
> dstop = ticks()
> Debug.Print Join(Array(dn.Length, (dstop - dstart) / tickspers,
> count), " , ")
> End Sub
>
> Public Sub test()
> Dim doc As String
> doc = "<doc>"
> Dim l As Long
> For l = 1 To 1000
> Call add_node(doc)
> Next l
>
> Dim step_ As Long
> step_ = 1000
>
> Dim stepstr As String
> Dim x As Long
> For x = 1 To step_
> Call add_node(stepstr)
> Next x
>
> For l = 1000 To 40000 Step step_
> Call check_timing(doc)
> doc = doc & stepstr
> Next l
> End Sub
>
>
The culprit is this line in your check_timing routine:-
count = count + Len(n.Text)
Specifically the use of n.Text. It also isn't related to the number of
nodes selected try this change:-
Set dn = n.selectNodes("node[position() < 1001]")
You'll notice that dn.Length is consitently 1000 yet the growth in time
taken is still proportional to the full set of nodes. IOW, the cost of
retrieving .Text is proportional to the size of document that the node
belongs to. You can try this change to keep it to a single simple element:-
Set dn = n.selectNodes("node[position() < 1001]/a")
It's still bad. Take it a step further lets get the text node itself:-
Set dn = n.selectNodes("node[position() < 1001]/a/text()")
Nope still just as bad.
Now make this change
count = count + Len(n.nodeValue)
Suddenly it much quicker and consistent. Make these changes:-
Set dn = n.selectNodes("node[position() < 1001]/a")
count = count + Len(n.firstChild.nodeValue)
Again consistent, lets go back to selecting the node and build the text
string ourselves.
Set dn = n.selectNodes("node[position() < 1001]")
count = count + Len(n.firstChild.firstChild.nodeValue &
n.firstChild.nextSibling.nodeValue)
Ugly but still a couple of orders of magnitude faster than .Text and
consistant despite document size.
Finally remove the position predicate. Just as fast.
Why would using the Text property be so costly? I haven't got the foggiest,
very odd and annoying.
--
Anthony Jones - MVP ASP/ASP.NET
| ||||||
| Company | Legal | Press | Partners | Careers | Sitemap | Contact Us | Altova Blog | Mobile | Full Site | |||
|
