Broken aggregator

I read the contents of several hundred sites each day through my news aggregator. I’ve switched aggregators a few times over the last few years, starting with a home-grown web-based aggregator, followed by Radio, Amphetadesk, then Radio again, back to Amphetadesk, and currently Aggie. If the site doesn’t have a news feed, I either create one through a small scraping tool I built, or don’t bother to read it.

All the aggregators I’ve tried have a serious flaw. They let feed developers break the aggregator display. A single unclosed <em> tag causes all the posts from all the sites from that point on to be italicized. That’s something that I can generally live with, but it’s possible for those unclosed tags to compound (or would that be to aggregate?) and make a mess of my display.

Today, one feed forgot to close an <em>, another forgot to close a <b> and another forgot to close a <small>. What I ended up with was tiny, bold, italicized text that’s next to impossible to read.

The problem could have been worse. If someone forgot to add the > to the end of a tag, the rest of the text on the page might be ignored, at least until a closing angle bracket occurred.
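An aggregator could at least flag this case before rendering. The check below is a rough heuristic and only a sketch. It looks for a `<` that starts a tag name but reaches another `<` or the end of the item without ever finding a `>`. The function name `find_unterminated_tags` is hypothetical. The check will also flag a bare comparison like `x<y` in plain text.

```python
import re

# A '<' followed by a tag-name character (or '/') that runs into another '<'
# or the end of the text before any '>' closes it.
UNTERMINATED_TAG = re.compile(r"<[A-Za-z/][^<>]*(?=<|$)")

def find_unterminated_tags(fragment: str) -> list[str]:
    """Return the text of each tag that appears to be missing its '>'."""
    return UNTERMINATED_TAG.findall(fragment)

print(find_unterminated_tags('Read <a href="x" this and <b>that</b>'))
```

An aggregator could escape or drop an item that trips this check, instead of letting the browser swallow everything up to the next `>`.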

But much worse, it’s possible for a feed developer to inject malicious code into their feed, and most aggregators would happily render the HTML to the browser. A bit of clever JavaScript in a feed that exploits a browser’s vulnerability to cross-site scripting attacks could do quite a bit of damage.

I suggest that the various aggregator developers take steps to eliminate these problems. Check for unclosed tags and close them. Strip things like script tags from each item before rendering it.
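Here is a minimal sketch of both suggestions. It is written in Python rather than in whatever any real aggregator uses. It assumes each item's HTML is cleaned on its own, before the items are joined into one page. `ALLOWED_TAGS`, `FeedItemSanitizer` and `sanitize_item` are hypothetical names chosen for illustration.

The sketch uses a whitelist rather than a blacklist. Anything not explicitly allowed is dropped, including `<script>` blocks, event-handler attributes such as `onclick`, and `javascript:` links. Closing tags with no matching opener are ignored, and any tags still open at the end of the item are closed.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist of formatting tags an aggregator might let through.
ALLOWED_TAGS = {"a", "b", "i", "em", "strong", "small", "p", "br",
                "blockquote", "code", "pre", "ul", "ol", "li"}
VOID_TAGS = {"br"}                 # never take a closing tag
DROP_CONTENT = {"script", "style"}  # drop these tags *and* their contents

class FeedItemSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        safe_attrs = ""
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Keep only plain web links; this rejects javascript: URLs.
            if href.lower().startswith(("http://", "https://")):
                safe_attrs = f' href="{escape(href, quote=True)}"'
        self.out.append(f"<{tag}{safe_attrs}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # stray or disallowed closing tag: ignore it
        # Close everything opened after the matching tag, then the tag itself.
        while self.open_tags:
            t = self.open_tags.pop()
            self.out.append(f"</{t}>")
            if t == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    def result(self):
        self.close()
        # Close whatever the feed author forgot to close.
        closers = "".join(f"</{t}>" for t in reversed(self.open_tags))
        self.open_tags = []
        return "".join(self.out) + closers

def sanitize_item(fragment):
    parser = FeedItemSanitizer()
    parser.feed(fragment)
    return parser.result()

items = ["<em>one", "<b>two", "<small>three"]
print("".join(sanitize_item(i) for i in items))
```

Run on items like today's, the `<em>`, `<b>`, and `<small>` tags are each closed inside their own item, so nothing bleeds into the next site's posts. A whitelist is the safer default because a list of known-dangerous tags is easy to get wrong. A production aggregator would more likely lean on a maintained sanitizing library than on a hand-rolled parser like this one.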

Meredith
January 18, 2003 3:10 AM

I’d love to hear more about the homegrown web-based aggregator you built. After getting frustrated with those written by others (including some you mentioned above), I’m trying to write my own using MT and mt-rssfeed.

Adam Kalsey
January 18, 2003 8:59 AM

It was built in ASP. All it really did was grab the XML files, parse them, and cache the result. The admin interface was quite primitive.
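The grab, parse, and cache loop described above is small enough to sketch. The original was ASP; what follows is a hypothetical Python version, not a port of that code. It assumes plain, un-namespaced RSS 0.9x/2.0 feeds; RSS 1.0 items live in a namespace and would not be found. It also assumes a fixed cache lifetime. `parse_rss`, `FeedCache`, and `max_age` are illustrative names. The fetch function is passed in so the cache can be exercised without a network.

```python
import time
import xml.etree.ElementTree as ET
from urllib.request import urlopen

def parse_rss(xml_bytes):
    """Return (title, link) pairs from an un-namespaced RSS 0.9x/2.0 document."""
    root = ET.fromstring(xml_bytes)
    return [(item.findtext("title", default=""), item.findtext("link", default=""))
            for item in root.iter("item")]

def fetch_url(url):
    """Grab the raw XML for a feed."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()

class FeedCache:
    """Keep parsed items per feed URL; refetch only after max_age seconds."""

    def __init__(self, fetch=fetch_url, max_age=3600):
        self.fetch = fetch
        self.max_age = max_age
        self._entries = {}  # url -> (fetched_at, items)

    def get(self, url):
        now = time.time()
        cached = self._entries.get(url)
        if cached and now - cached[0] < self.max_age:
            return cached[1]  # still fresh: serve the cached parse
        items = parse_rss(self.fetch(url))
        self._entries[url] = (now, items)
        return items
```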

You can see a cached copy of the aggregator output on the Wayback Machine at http://web.archive.org/web/20010803122838/http://kalsey.com/news/

Bill Kearney
January 21, 2003 2:57 PM

While it doesn’t serve the immediate need, feeds listed on Syndic8 do get marked as bad when they produce invalid XML. You could have an aggregator double-check with Syndic8 (via XML-RPC) whether a feed is known to be working. You can also run it through the validator and send that URL to the feed author.
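The Syndic8 lookup depends on that service's XML-RPC interface, which is not shown here. A local stand-in is to confirm the feed is at least well-formed XML before displaying it. This is a minimal sketch, and `is_well_formed` is a hypothetical name. It checks well-formedness only, not validity against any RSS specification.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_bytes):
    """Return (ok, error_message) for a raw feed document."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""

print(is_well_formed(b"<rss><channel></rss>"))
```

This check would not catch the unclosed `<em>` problem from the post. HTML inside an item description is usually entity-escaped (`&lt;em&gt;`), so the XML stays well-formed even when the HTML it carries is broken.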

©1999-2010 Adam Kalsey.