Home. 
.

transparent

transparent

transparent

Altova Mailing List Archives


Re: [xsl] NFA to DFA conversion using XSLT

From: "Mukul Gandhi" <gandhi.mukul@--------->
To:
Date: 6/1/2007 5:57:00 PM
Hi Mike,
  Thanks a lot for your reply, and insightful remarks. I now have
some new things to ponder about.

After I have read something more about parsing theory, I'll have more questions.



Perhaps XSLT is not the right language to implement parsers. But to
learn some concepts from the list, I had put my questions in the
context of XSLT.

On 6/1/07, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>
> I am toying with the following idea (a sample state table) to
> represent state table of NFA:

Looks a perfectly reasonable representation.



> I am
> following some examples and algorithms from the book,
> "Principles of compiler design - by Aho, Ullman).
>
> One of the algorithms for NFA to DFA conversion states, that
> we essentially follow the below two steps:
> 1) Compute e-CLOSURE (e stands for epsilon transitions) -
> which involves pushing & popping states from a stack.
> 2) The subset construction - which uses e-CLOSURE function to
> construct a DFA.

I've spent many happy hours studying those three pages of the book, because
Saxon implements those algorithms in its schema processor.

Doing the epsilon-closure of a state is quite easy, I think. Although the
algorithm given by Aho and Ullmann is procedural, it's not difficult to come
up with a (simpler) algorithm that's purely functional.

By contrast, the subset construction algorithm doesn't intuitively translate
into functional form at all, I think you would have to devise a completely
different algorithm, and proving its correctness would not be at all easy.
For what it's worth, I implemented an "optimization" to the Aho and Ullman
algorithm and it took nearly three years before I discovered it was
incorrect - it ran all the 6000 or so test cases in the Schema test suite
without problems, but eventually tripped up on a very innocuous-looking
user-written content model.

I haven't even looked at the state minimization algorithms. I don't see the
point: all the performance issues are to do with the speed of determinizing
the FSA, not with the size of the resulting DFA. Though that might be
because of the lengths XML Schema goes to to disallow ambiguous grammars; if
you were compiling regular expressions it might be more of an issue, I don't
know.

Michael Kay
http://www.saxonica.com/




--
Regards,
Mukul Gandhi


transparent
Print
Mail
Like It
Disclaimer
.

These Archives are provided for informational purposes only and have been generated directly from the Altova mailing list archive system and are comprised of the lists set forth on www.altova.com/list/index.html. Therefore, Altova does not warrant or guarantee the accuracy, reliability, completeness, usefulness, non-infringement of intellectual property rights, or quality of any content on the Altova Mailing List Archive(s), regardless of who originates that content. You expressly understand and agree that you bear all risks associated with using or relying on that content. Altova will not be liable or responsible in any way for any content posted including, but not limited to, any errors or omissions in content, or for any losses or damage of any kind incurred as a result of the use of or reliance on any content. This disclaimer and limitation on liability is in addition to the disclaimers and limitations contained in the Website Terms of Use and elsewhere on the site.

.
.

transparent

transparent