Altova Mailing List Archives


Re: newbie: Word2003 -> XML -> SQL Server

From: Kent Tegels <ktegels@-------.--->
To: NULL
Date: 3/18/2006 1:12:00 PM
Hello jockster,

> Apologies for the simplistic newbie question.
> 
> I need to transform many (100s) of Word documents into XML for
> import into SQL Server 2000 (or 2005) in an easy and efficient manner.
> Objective is as follows,
> 
> 1. Sections/paragraphs of the individual word documents can be queried
> and then published from SQL Server.

Your best choice here is to save the Word documents as XML [0]. Office 2003 
can do that, and you should be able to write a little VSTO application that 
does it over a folder full of documents as needed.

> 2. Should be able to (easily if possible) tag sections/paragraphs of
> word documents (from within Word) prior to conversion to XML. Tagging
> information should then be able to be imported and used to facilitate
> search for documents in SQL Server

Not sure what you're really aiming at here. The Office XML format in Office 
2003 (O11) writes paragraphs nicely, and you might be able to mark sections 
and so on using existing features in Word itself. You should then be able to 
use XQuery over that in SQL Server 2005.
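
As a rough sketch of what XQuery over the Word 2003 XML could look like in SQL Server 2005, assuming the saved documents already sit in an xml column: the table dbo.docs with columns id and doc is hypothetical here, as are the column aliases. The namespace URI is the Word 2003 WordprocessingML one. The query returns one row per paragraph.

```sql
-- Assumes a hypothetical table dbo.docs(id int, doc xml) that holds
-- one Word 2003 XML document per row.
WITH XMLNAMESPACES ('http://schemas.microsoft.com/office/word/2003/wordml' AS w)
SELECT d.id,
       p.para.query('.')                   AS paragraph_xml,   -- the w:p element itself
       p.para.value('.', 'nvarchar(max)')  AS paragraph_text   -- concatenated text of the paragraph
FROM dbo.docs AS d
CROSS APPLY d.doc.nodes('/w:wordDocument/w:body//w:p') AS p(para);
```

Using nodes() rather than a single query() call gives each paragraph its own row, which is usually easier to filter or publish from.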

> 3. If it was possible to create a template within Word to better
> enable the above then that would be good for when new documents are
> produced by original authors

That should be completely doable, and you can then make the document itself 
more intelligent by using VSTO as well. More on VSTO at [1].

> If people could suggest methods, tools and or books/articles
> explaining how the above can be achieved I would be very grateful

Aside from the VSTO link, there's an article or two in my blog about loading 
XML documents into SQL Server and querying them. The basic trick is to use 
something like:

insert into dbo.docs(doc) select * from openrowset(bulk 'path-to-file',single_blob) 
as p 

to load

and then use something like:

select doc.query('...') from dbo.docs where doc.exist('...') = 1 

to find whatever it is you're looking for. It's pretty easy to do against the 
document properties. Content queries are helped a lot by combining Full-Text 
Search and XQuery.
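
Putting the load and query steps together, a minimal sketch might look like the following. Only dbo.docs(doc) and the 'path-to-file' placeholder come from the statements above. The id column, the PK_docs and docs_catalog names, and the search term 'warranty' are assumptions made up for illustration. It also assumes SQL Server 2005 with Full-Text Search installed.

```sql
-- Hypothetical table definition; the named primary key is needed
-- as the KEY INDEX for the full-text index below.
CREATE TABLE dbo.docs (
    id  int IDENTITY(1,1) NOT NULL CONSTRAINT PK_docs PRIMARY KEY,
    doc xml NOT NULL
);

-- Load one saved Word XML file ('path-to-file' is a placeholder).
INSERT INTO dbo.docs (doc)
SELECT p.BulkColumn
FROM OPENROWSET(BULK 'path-to-file', SINGLE_BLOB) AS p;

-- Full-text index over the xml column (populated asynchronously).
CREATE FULLTEXT CATALOG docs_catalog AS DEFAULT;
CREATE FULLTEXT INDEX ON dbo.docs (doc) KEY INDEX PK_docs;

-- Full-text search narrows the candidate rows; XQuery then checks
-- the paragraph text and pulls the title from the document properties.
WITH XMLNAMESPACES (
    'http://schemas.microsoft.com/office/word/2003/wordml' AS w,
    'urn:schemas-microsoft-com:office:office' AS o
)
SELECT d.id,
       d.doc.value('(/w:wordDocument/o:DocumentProperties/o:Title)[1]',
                   'nvarchar(255)') AS title
FROM dbo.docs AS d
WHERE CONTAINS(d.doc, 'warranty')
  AND d.doc.exist('//w:t/text()[contains(., "warranty")]') = 1;
```

Note that CONTAINS matches whole words, while the XQuery contains() function is a case-sensitive substring test, so the two predicates are not interchangeable. The full-text predicate is there mainly to cut down the rows the XQuery has to visit.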

[0]: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wordxmlcdk/html/WelcomeWordCDK_HV01147170.asp
[1]: http://msdn.microsoft.com/office/understanding/vsto/default.aspx?pull=/library/en-us/odc_vsto2005_ta/html/officewhatsnewinvsto2005.asp

Thank you,
Kent Tegels
DevelopMentor
http://staff.develop.com/ktegels

