Altova Mailing List Archives


Re: Word 2007 XML merge & PDF conversion on Unix

From: Peter Flynn <peter.nosp@-.--------.-->
To: NULL
Date: 8/4/2007 10:04:00 PM

[Jongware] wrote:
> "Praveen Mohanan" <nospam@n...> wrote in message
> news:_R3ti.1360$qa3.1073@n......
>> I can convert all the Word documents to Word 2007 xml & store on unix
>> platform.
>>
>> The Q I have is If I use xml/xslt to merge the data with the Word XML
>> document & then store it back as an xml on unix ,how do I convert into PDF?
> 
> 1. You don't "convert" something to PDF. Ever. Please repeat for yourself.
> PDF is printer output, just as the paper from your printer. Have you ever
> converted something to paper?
> So, if PDF is printer output, your question becomes: "how do I print an xml on
> unix to pdf?"
> 
> 2. XML is an abstract data format. If you print XML, you'll get lots and lots of
> <this>stuff</this>. "Hey, now I *know* you're wrong! My Word file can be
> converted to XML!" Not so. Your Word file, saved as XML, is not different from
> the Word .DOC file (well, it shouldn't be). It is saved in another output
> format, yes, but you can't print your .DOC file to a printer either. (No you
> can't. You need Word to read the byte codes and interpret them for you.)
> 
> 3. The only way you can print your XSLT'ed file in the format you expect (a
> nice-looking text document, not line after line of <..>'s) is if you ensured
> your output XML format is still readable by Word. Then you can use Word to print
> to PDF.

The first two are right on target, but the third is not the only answer.
If the merged data and text are now in an XML format, you can use XSL[T]
to transform them to PDF by one of two methods:

XSLT --> XSL-FO --> PDF using any FO processor
XSLT --> LaTeX --> PDF using LaTeX
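As a rough illustration, both pipelines are just two command-line steps run in sequence. The sketch below assumes `xsltproc`, Apache FOP's `fop` script and `pdflatex` are installed and on the PATH; the stylesheet names (`word2fo.xsl`, `word2tex.xsl`) are hypothetical placeholders for stylesheets you would have to write yourself.

```python
import subprocess
from pathlib import Path

# Hypothetical driver for the two pipelines. Assumes xsltproc, fop and
# pdflatex are installed; the XSLT stylesheet is one you write yourself.

def build_commands(method: str, xml: str, xsl: str, pdf: str) -> list[list[str]]:
    """Return the command lines for one pipeline, without running them."""
    stem = Path(pdf).with_suffix("")
    if method == "fo":
        fo = f"{stem}.fo"
        return [
            ["xsltproc", "-o", fo, xsl, xml],      # XML -> XSL-FO
            ["fop", "-fo", fo, "-pdf", pdf],       # XSL-FO -> PDF
        ]
    if method == "latex":
        tex = f"{stem}.tex"
        return [
            ["xsltproc", "-o", tex, xsl, xml],     # XML -> LaTeX source
            # pdflatex names the PDF after the .tex file, so write it
            # next to the requested output path.
            ["pdflatex", "-interaction=nonstopmode",
             f"-output-directory={Path(pdf).parent}", tex],
        ]
    raise ValueError(f"unknown method: {method!r}")

def run_pipeline(method: str, xml: str, xsl: str, pdf: str) -> None:
    """Run each step in turn, stopping at the first one that fails."""
    for cmd in build_commands(method, xml, xsl, pdf):
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline("fo", "merged.xml", "word2fo.xsl", "merged.pdf")` would run the XSLT step and then FOP. Keeping command construction separate from execution makes it easy to log or dry-run the steps on the Unix box first.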

Both work fine. LaTeX has better typography, but you have to learn it,
and it's not written in Java (some process pipelines demand end-to-end
Java). With XSL-FO you have to reinvent the wheel every time, and the
only free processor (Apache FOP) is incomplete.
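To give a sense of the "reinvent the wheel" cost: even the smallest XSL-FO document must declare its own page geometry before any text can flow onto it. The sketch below builds such a document with Python's standard `xml.etree.ElementTree`; the A4 size, margins and `space-after` value are arbitrary assumptions, and a real stylesheet would also need headers, fonts, lists, tables and so on.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO)  # serialise with the usual fo: prefix

def minimal_fo(paragraphs: list[str]) -> str:
    """Build a bare-bones XSL-FO document: one A4 page master, one flow."""
    root = ET.Element(f"{{{FO}}}root")
    masters = ET.SubElement(root, f"{{{FO}}}layout-master-set")
    spm = ET.SubElement(masters, f"{{{FO}}}simple-page-master", {
        "master-name": "A4", "page-width": "210mm", "page-height": "297mm",
        "margin-top": "20mm", "margin-bottom": "20mm",
        "margin-left": "25mm", "margin-right": "25mm",
    })
    ET.SubElement(spm, f"{{{FO}}}region-body")  # the area text flows into
    seq = ET.SubElement(root, f"{{{FO}}}page-sequence", {"master-reference": "A4"})
    flow = ET.SubElement(seq, f"{{{FO}}}flow", {"flow-name": "xsl-region-body"})
    for text in paragraphs:
        block = ET.SubElement(flow, f"{{{FO}}}block", {"space-after": "6pt"})
        block.text = text
    return ET.tostring(root, encoding="unicode")
```

The resulting string could be written to a `.fo` file and handed to an FO processor; everything a word processor normally gives you for free has to be spelled out this way.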

Either way it's going to be tedious, because Word does not identify the
important parts of your document in a form that a computer can
recognise; it records only their appearance, in a form that human eyes
and brain can understand, unless your authors have used specifically
designed styles in a template. If you allowed your authors to put
anything anywhere, in any format they wanted, you will now have to cope
with the result, which can be painful.
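The difference styles make can be seen in a small sketch. It is written in Python rather than XSLT purely for brevity, and assumes the Word 2007 WordprocessingML namespace and a template whose style IDs are `Heading1`, `Heading2` and `Title` (style IDs vary by template and by localised Word versions). A paragraph tagged with a style maps cleanly to a LaTeX command; a "heading" made by directly applying bold has nothing a program can reliably pick up on.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by Word 2007 documents.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical mapping from template style IDs to LaTeX markup.
STYLE_MAP = {
    "Heading1": r"\section{%s}",
    "Heading2": r"\subsection{%s}",
    "Title": r"\title{%s}",
}

def paragraphs_to_latex(document_xml: str) -> list[str]:
    """Turn each w:p into one line of LaTeX, using only its paragraph style."""
    root = ET.fromstring(document_xml)
    out = []
    for p in root.iter(f"{W}p"):
        # A paragraph's text is split across runs (w:r/w:t); join it back up.
        text = "".join(t.text or "" for t in p.iter(f"{W}t"))
        style = p.find(f"{W}pPr/{W}pStyle")
        style_id = style.get(f"{W}val") if style is not None else None
        template = STYLE_MAP.get(style_id)
        # Unstyled paragraphs fall through as body text: a bold, 16pt
        # "heading" made by direct formatting is just another paragraph here.
        out.append(template % text if template else text)
    return out
```

A real conversion would also need to escape LaTeX special characters (`%`, `&`, `_` and so on) and handle tables, lists and inline formatting, but the principle is the same in XSLT: structure you can match on must already be in the markup.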

///Peter
-- 
XML FAQ: http://xml.silmaril.ie/
