This is the blog of Adam Kalsey. Unusual depth and complexity. Rich, full body with a hint of nutty earthiness.
Freshness Warning
This blog post is over 20 years old. It's possible that the information you read below isn't current and the links no longer work.
17 Jan 2003
I read the contents of several hundred sites each day through my news aggregator. I’ve switched aggregators a few times over the last few years, starting with a home-grown web-based aggregator, followed by Radio, Amphetadesk, then Radio again, back to Amphetadesk, and currently Aggie. If the site doesn’t have a news feed, I either create one through a small scraping tool I built, or don’t bother to read it.
All the aggregators I’ve tried have a serious flaw. They let feed developers break the aggregator display. A single unclosed <em> tag causes all the posts from all the sites from that point on to be italicized. That’s something that I can generally live with, but it’s possible for those unclosed tags to compound (or would that be to aggregate?) and make a mess of my display.
Today, one feed forgot to close an <em>, another forgot to close a <b>, and another forgot to close a <small>. What I ended up with was tiny, bold, italicized text that’s next to impossible to read.
The problem could have been worse. If someone forgot to add the > to the end of a tag, the rest of the text on the page might be ignored, at least until a closing angle bracket occurred.
But much worse, it’s possible for a feed developer to inject malicious code into their feed, and most aggregators would happily render the HTML to the browser. A bit of clever JavaScript in a feed that exploits a browser’s vulnerability to cross-site scripting attacks could do quite a bit of damage.
I suggest that the various aggregator developers take steps to eliminate these problems. Check for unclosed tags and close them. Strip things like script tags from the feed before rendering them.
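Here's a rough sketch of what that cleanup might look like, in Python with only the standard library's html.parser module. The names FeedSanitizer and sanitize_entry, the allow-list of tags, and the rule that only http and https links survive are all my own assumptions for illustration, not taken from any existing aggregator. It drops script and style blocks along with their contents, discards tags and attributes that aren't on the allow-list, and closes anything a feed left open.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list; a real aggregator would tune these.
ALLOWED_TAGS = {"a", "b", "blockquote", "br", "code", "em", "i", "li",
                "ol", "p", "pre", "small", "strong", "ul"}
VOID_TAGS = {"br"}                    # tags that never take a closing tag
DROP_CONTENT = {"script", "style"}    # remove the tag and everything inside it
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://")


class FeedSanitizer(HTMLParser):
    """Rebuilds an entry's HTML, keeping only allowed, balanced tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []          # pieces of sanitized output
        self.open_tags = []    # stack of allowed tags still waiting to be closed
        self.skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            # Keep only allow-listed attributes with a safe URL scheme.
            if (name in ALLOWED_ATTRS.get(tag, ()) and value
                    and value.lower().startswith(SAFE_SCHEMES)):
                kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # stray closing tag with no matching open tag
        # Close every tag opened after this one, then the tag itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so stray < and & can't turn into markup.
            self.out.append(escape(data, quote=False))

    def close(self):
        super().close()
        # Close whatever the feed author forgot to close.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")


def sanitize_entry(html):
    """Return a cleaned copy of one feed entry's HTML."""
    parser = FeedSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize_entry("Some <em>italic text"))
    print(sanitize_entry('Hi <script>alert("x")</script>there'))
```

Running each entry through something like sanitize_entry on its own, before the entries are stitched into one page, is what keeps one feed's unclosed <em> from leaking into the next feed's posts. An allow-list seems safer to me than stripping known-bad tags, since anything the list doesn't mention is dropped by default.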
It was built in ASP. All it really did was grab the XML files, parse them, and cache the result. The admin interface was quite primitive. You can see a cached copy of the aggregator output on the Wayback Machine at http://web.archive.org/web/20010803122838/http://kalsey.com/news/
While it doesn't serve the immediate need, feeds listed on Syndic8 do get marked as bad when they produce invalid XML. You could have an aggregator double-check with Syndic8 (via XML-RPC) as to whether a feed is known to be working or not. Beyond that, you can always run the feed through the validator and send that URL to the feed author.
Meredith
January 18, 2003 3:10 AM
I'd love to hear more about the homegrown web-based aggregator you built. After getting frustrated with those written by others (including some you mentioned above), I'm trying to write my own using MT and mt-rssfeed.