Altova Mailing List Archives: comp.text.xml

Re: HTML to XML call
To: NULL
Date: 5/3/2007 10:46:00 AM

joeraymond@g... <joeraymond@g...> wrote in
<1178117559.981093.272090@p...>:
> I am trying to figure out if it would be possible to write
> an app that could visit certain website, obtain the
> relevant data from those pages and convert it into a
> readable XML file.

What's the problem? Retrieve the document using your http
library of choice, parse it using your HTML parser of choice,
feed the DOM document to your XSLT processor of choice,
serialize the resulting DOM document (using, well, your
serializer of choice).

Better yet, stuff the resulting DOM document into your XML
database of choice. Heck, make it RDF/XML, stuff it into some
SPARQL-aware storage and venture capitalists will be swarming
all over you. I would, if I had a few million bucks I didn't
need. Providing third-party semantics for all the
meatbag-parsable content on the web will be quite an industry
some ten years from now I think.

Okay, scratch that, I'm just daydreaming. But the basic process
for your task would be what I described indeed (the real heart
of it would be an XSLT transformation that would suck the
relevant data out of your source document, naturally).

> All I would like to know is how would I go about starting
> this task, and any advice comments anyone has on it.

My professional opinion is that if you don't even know where to
start, you're not really up for it. I hope you don't need it
for anything serious--but it certainly would make an excellent
exercise in many modestly exciting areas.

-- 
Pavel Lepin
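
For illustration only, the retrieve / parse / extract / serialize process described in the reply could be sketched roughly as follows in Python, using nothing but the standard library. This is a sketch under assumptions, not the method from the post: the standard library has no XSLT processor, so a hand-written extraction step stands in for the XSLT transformation. The names fetch, LinkCollector and html_to_xml, the choice of links as the "relevant data", and the <links>/<link> output vocabulary are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen
import xml.etree.ElementTree as ET


def fetch(url):
    """Retrieve a page and decode it (hypothetical helper; falls back to UTF-8)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def html_to_xml(html_text):
    """Stand-in for the XSLT step: pull out the links, serialize them as XML."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    root = ET.Element("links")  # assumed output vocabulary
    for href, text in collector.links:
        item = ET.SubElement(root, "link", {"href": href})
        item.text = text
    return ET.tostring(root, encoding="unicode")
```

Calling html_to_xml(fetch(url)) on a page of interest would then give an XML string that can be written to a file or loaded into a database. With a real XSLT processor available, html_to_xml is the piece that would be replaced by a stylesheet applied to the parsed document.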
