Altova Mailing List Archives


Resolving entities using MSXML

From: mikeidge@-----.---
To: NULL
Date: 5/5/2009 7:17:00 AM
I'm trying to parse an XML document using MSXML v4 in C++, using my
own entity resolver to redirect the parser to local DTDs on my own
hard drive, rather than allowing the parser to go online to fetch the
DTDs from the locations specified in the XML file being parsed.
I've managed to get this working with Xerces, but the behaviour I'm
experiencing in MSXML seems somewhat bizarre.

The XML document I'm reading is completely valid, so I don't expect
any errors to be reported by the parser. Indeed, this is the case if I
let the parser go online to get the DTD files by leaving the pvarInput
VARIANT pointer NULL in the resolveEntity(...) callback. However, as
soon as I try to supply the parser with the text of the identical DTDs/
MODs sourced from my local disk, I get the following error: "incorrect
document syntax", which, apparently, occurs on the first line of the
XML file. This doesn't happen for all of the DTD files though; in the
case I'm trying to debug, the first DTD it asks for works with no
problems, but I get the error as soon as it tries to use the second
MOD file it asks for.

I'm new to COM, so it's quite possible I'm doing something pretty
stupid which is why this isn't working. Essentially, what I'm doing
(using a simplified pseudo String class) is this:

HRESULT __stdcall resolveEntity(unsigned short* pwchPublicId,
                                unsigned short* pwchSystemId,
                                VARIANT* pvarInput)
{
    // Get the file name without its path
    String systemId = pwchSystemId;
    const int idx = systemId.FindLastChar(L'/');

    String fileName = systemId;
    if (idx > -1) {
        fileName = systemId.SubString(idx + 1);
    }

    // All the DTDs/MODs are in UTF-8 format, so load the file into
    // memory and convert it to a Unicode string
    String fileContent = LoadFileAsUTF8ConvertToUnicode(fileName);

    CComBSTR data(fileContent);
    // CopyTo(VARIANT*) stores its own copy of the string as VT_BSTR, so
    // "data" still owns its BSTR and frees it on scope exit. Calling
    // Detach() afterwards would leak that BSTR, so it is left out.
    return data.CopyTo(pvarInput);
}
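For anyone who wants to try the file-name step outside the pseudo String class, here is a minimal sketch in standard C++. It is not the code actually in use: std::wstring stands in for the String class, FileNameFromSystemId and ExampleUsage are hypothetical names, and the URL is made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns everything after the last '/' in a system
// identifier, or the whole identifier if it contains no '/'.
std::wstring FileNameFromSystemId(const std::wstring& systemId)
{
    const std::wstring::size_type idx = systemId.rfind(L'/');
    if (idx == std::wstring::npos) {
        return systemId;
    }
    return systemId.substr(idx + 1);
}

// Usage example (the URL is illustrative only)
void ExampleUsage()
{
    assert(FileNameFromSystemId(L"http://example.org/dtds/module.mod") == L"module.mod");
}
```

Like the original, this only splits on forward slashes, which is what a URL-style system identifier would normally contain.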

The fact that this works with some files but not with others is
particularly baffling. I've made sure that the "fileContent" variable
contains valid Unicode content for all the affected files, so it's
nothing to do with any bugs that might exist in my UTF-8 conversion
code. It's definitely something in the final lines of code, where the
BSTR is handed to MSXML, that the parser is taking offence at, but I
can't work out what it is!
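One way to double-check the "valid Unicode content" claim independently is to run the raw file bytes through a strict decoder. The sketch below is an assumption-laden stand-in, not the real LoadFileAsUTF8ConvertToUnicode: DecodeUtf8 and ExampleDecode are hypothetical names, the input is taken to be the file's bytes already read into a std::string, and a leading UTF-8 byte order mark is skipped so that it does not end up in the converted text. Whether a BOM or a malformed byte has anything to do with the MSXML error is not established here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-16 code units.
// Skips a leading UTF-8 BOM (EF BB BF) and throws on malformed input.
std::u16string DecodeUtf8(const std::string& bytes)
{
    std::size_t i = 0;
    if (bytes.size() >= 3 && bytes.compare(0, 3, "\xEF\xBB\xBF") == 0) {
        i = 3;
    }

    std::u16string out;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= bytes.size()) {
            throw std::runtime_error("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::runtime_error("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        // Reject overlong encodings, surrogates and out-of-range values
        static const std::uint32_t minForLen[] = {0, 0x80, 0x800, 0x10000};
        if (cp < minForLen[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::runtime_error("invalid code point");
        }

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Encode as a UTF-16 surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Usage example: a BOM followed by an ASCII declaration
void ExampleDecode()
{
    const std::u16string text = DecodeUtf8("\xEF\xBB\xBF" "<!ELEMENT a EMPTY>");
    assert(text == u"<!ELEMENT a EMPTY>");
}
```

Comparing the output of a decoder like this against the "fileContent" string for a working DTD and a failing MOD would at least show whether the two differ in anything other than their declarations.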

Any help at all in respect of resolving entities in MSXML would be
greatly appreciated. I can find very little about the subject at all
online.

