Re: Aggregation of RSS Feeds
Date: 3/3/2006 11:06:00 PM

On 3 Mar 2006 09:45:41 -0800, jamesjacobyu@g... wrote:

>I'm programming an aggregator that keeps track of a large number of
>feeds (basically an rss reader). The problem is, I want an automatic
>way to know when sites have updated,

There are several ways.

Register with an update service (look at "clouds" in the Winer specs, RSS 0.92/0.94 and RSS 2.0).

Ask for the RSS document by HTTP and look at the headers received. This often doesn't work, because badly coded servers set the "Last-Modified" date to the time the document was served. You might also be able to use an HTTP HEAD request rather than a GET, so you don't have to download the whole document (rarely implemented though).

Download an RSS 2.0 document, or an RSS 1.0 document that uses the Syndication module, and look at the suggested time to revisit after (the <ttl> element in RSS 2.0, or sy:updatePeriod and sy:updateFrequency in the Syndication module).

Download the document and hash it to a signature (SHA1 or MD5 is easy to find code for, but you might want to normalise the XML first). When the signature changes, assume it's a changed document.

Develop your own "revisit after" estimation, based on how often the document actually changes. Randomly vary the time your server revisits, so as to track update frequencies that vary over time (many blogs are quite unpredictable).

Some combination of the last two techniques.

Just download the document anyway.
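As a rough illustration of combining the hashing and "revisit after" ideas above, here is a minimal Python sketch using only the standard library. All the names (feed_signature, RevisitEstimator, check_feed), the halve-on-change / grow-by-half-on-no-change rule, the interval bounds and the 25% jitter are my own assumptions for the example, not anything from a spec. The normalisation is also deliberately crude: it only drops whitespace-only text between elements, so a feed that embeds a "generated at" timestamp will still look changed on every fetch.

```python
import hashlib
import random
import xml.etree.ElementTree as ET


def feed_signature(xml_text: str) -> str:
    """Return a SHA-1 hex digest of a lightly normalised feed document."""
    root = ET.fromstring(xml_text)
    # Assumed normalisation: drop whitespace-only text and tails so that
    # re-indenting the document does not change the signature.
    for elem in root.iter():
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    canonical = ET.tostring(root, encoding="utf-8")
    return hashlib.sha1(canonical).hexdigest()


class RevisitEstimator:
    """Hypothetical per-feed estimate of how long to wait between fetches."""

    def __init__(self, interval=3600.0, min_interval=300.0, max_interval=86400.0):
        self.interval = interval          # current estimate, in seconds
        self.min_interval = min_interval  # never poll more often than this
        self.max_interval = max_interval  # never wait longer than this

    def record(self, changed: bool) -> None:
        # Assumed rule: poll twice as often after a change, and back off
        # by 50% after a fetch that found nothing new.
        if changed:
            self.interval = max(self.min_interval, self.interval / 2)
        else:
            self.interval = min(self.max_interval, self.interval * 1.5)

    def next_delay(self) -> float:
        # Random variation of +/-25% so visits do not lock onto a fixed
        # schedule and can notice update patterns that drift over time.
        return self.interval * random.uniform(0.75, 1.25)


def check_feed(xml_text, last_signature, estimator):
    """Hash a downloaded document and update the estimator.

    Returns (new_signature, changed). A last_signature of None (first
    visit) is treated as a change.
    """
    signature = feed_signature(xml_text)
    changed = signature != last_signature
    estimator.record(changed)
    return signature, changed
```

In use, the aggregator would keep one signature and one RevisitEstimator per feed, call check_feed() on each downloaded document, and schedule the next fetch next_delay() seconds later. The download itself is left out so the sketch does not depend on any particular HTTP library.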
