Broken aggregator

I read the contents of several hundred sites each day through my news aggregator. I’ve switched aggregators a few times over the last few years, starting with a home-grown web-based aggregator, followed by Radio, Amphetadesk, then Radio again, back to Amphetadesk, and currently Aggie. If the site doesn’t have a news feed, I either create one through a small scraping tool I built, or don’t bother to read it.

All the aggregators I’ve tried have a serious flaw. They let feed developers break the aggregator display. A single unclosed <em> tag causes all the posts from all the sites from that point on to be italicized. That’s something that I can generally live with, but it’s possible for those unclosed tags to compound (or would that be to aggregate?) and make a mess of my display.

Today, one feed forgot to close an <em>, another forgot to close a <b> and another forgot to close a <small>. What I ended up with was tiny, bold, italicized text that’s next to impossible to read.
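To see how the damage stacks up, here is a minimal Python sketch that uses the standard library's html.parser to list the tags each feed item leaves open. The class and function names are hypothetical, not taken from any aggregator. The three sample items are made up, and each leaves one of those tags open. The sketch reports em, b and small still open at the end, which is the tiny, bold, italic combination described above.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they can't be "left open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class OpenTagTracker(HTMLParser):
    """Records the tags a fragment opens but never closes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one;
        # a closing tag with no matching opener is simply ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def unclosed_tags(fragment):
    tracker = OpenTagTracker()
    tracker.feed(fragment)
    tracker.close()
    return tracker.stack


# Three made-up items, each from a different feed, each leaving one tag open.
items = [
    "Read <em>this</em> and <em>that",
    "A <b>bold</b> claim and another <b>one",
    "<small>The fine print",
]
still_open = []
for item in items:
    still_open.extend(unclosed_tags(item))
print(still_open)  # ['em', 'b', 'small']
```

Since the browser applies every one of those open tags at once, everything rendered after the third item inherits all three styles.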

The problem could have been worse. If someone forgot to add the > to the end of a tag, the rest of the text on the page might be ignored, at least until the next closing angle bracket occurred.

Much worse, a feed developer could inject malicious code into their feed, and most aggregators would happily pass that HTML straight to the browser. A bit of clever JavaScript in a feed that exploits a browser’s vulnerability to cross-site scripting attacks could do quite a bit of damage.

I suggest that aggregator developers take steps to eliminate these problems. Check for unclosed tags and close them. Strip things like script tags from feed content before rendering it.
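Here is a minimal Python sketch of both suggestions. It assumes the standard library's html.parser is good enough for feed fragments. The sketch is deliberately stricter than "strip script tags": a blocklist that removes only <script> still lets through onclick-style event attributes and javascript: links. This version therefore keeps only a hypothetical allowlist of tags. It drops every attribute except http(s) links on <a>, discards the contents of script and style elements, ignores stray closing tags, and closes anything the item left open. The allowlist and all names here are illustrative assumptions, not any aggregator's actual code.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist of tags an aggregator might keep from feed content.
ALLOWED_TAGS = {"a", "b", "strong", "i", "em", "small", "p", "br",
                "ul", "ol", "li", "blockquote", "code", "pre"}
# Allowed tags that never take a closing tag.
VOID_TAGS = {"br"}
# Elements whose contents are thrown away along with the tags.
DROP_CONTENT_TAGS = {"script", "style"}


class FeedSanitizer(HTMLParser):
    """Rebuilds a feed fragment from allowed tags and escaped text only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []
        self.skip_depth = 0  # greater than 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # disallowed tag: drop the tag itself, keep its text
        attr_text = ""
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Keep only plain web links. Every other attribute, including
            # onclick-style handlers, is discarded, as are javascript: URLs.
            if href.lower().startswith(("http://", "https://")):
                attr_text = f' href="{escape(href)}"'
        self.out.append(f"<{tag}{attr_text}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # stray or disallowed closing tag: ignore it
        # Close anything still open inside this tag, then the tag itself.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    def finish(self):
        self.close()
        # Close whatever the item left open, innermost first.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def sanitize(fragment):
    """Return a version of one feed item that is safer to render."""
    parser = FeedSanitizer()
    parser.feed(fragment)
    return parser.finish()


print(sanitize('<em>Unclosed <b>text<script>alert("x")</script>'))
# prints: <em>Unclosed <b>text</b></em>
```

Because each item is cleaned on its own, an unclosed tag in one feed can no longer leak into the posts of the next. A production sanitizer has to cover far more edge cases than this sketch does.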

Meredith
January 18, 2003 3:10 AM

I’d love to hear more about the homegrown web-based aggregator you built. After getting frustrated with those written by others (including some you mentioned above), I’m trying to write my own using MT and mt-rssfeed.

Adam Kalsey
January 18, 2003 8:59 AM

It was built in ASP. All it really did was grab the XML files, parse them and cache the result. The admin interface was quite primitive.

You can see a cached copy of the aggregator output on the Wayback Machine at http://web.archive.org/web/20010803122838/http://kalsey.com/news/
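The grab, parse and cache loop described in this comment was written in ASP. As a rough illustration only, here is the same idea sketched in Python. It assumes RSS 0.9x/2.0 feeds, which use plain <item> elements with no XML namespace. RSS 1.0 (RDF) feeds would need namespace-aware lookups. The FeedCache class, its one-hour default lifetime and the injectable fetch function are all hypothetical.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET


def fetch_url(url):
    """Download the raw bytes of a feed."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def parse_rss(xml_bytes):
    """Return (title, link) pairs for every <item> in an RSS 0.9x/2.0 feed."""
    root = ET.fromstring(xml_bytes)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items


class FeedCache:
    """Keeps parsed feeds in memory and refetches one only after ttl seconds."""

    def __init__(self, ttl=3600, fetch=fetch_url, clock=time.monotonic):
        self.ttl = ttl
        self.fetch = fetch
        self.clock = clock
        self._entries = {}  # url -> (time fetched, parsed items)

    def get(self, url):
        now = self.clock()
        cached = self._entries.get(url)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # still fresh: skip the network entirely
        items = parse_rss(self.fetch(url))
        self._entries[url] = (now, items)
        return items


# Usage with a stand-in fetch function, so no network access is needed.
SAMPLE = b"""<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title><link>http://example.com/1</link></item>
</channel></rss>"""

calls = []


def fake_fetch(url):
    calls.append(url)
    return SAMPLE


cache = FeedCache(ttl=3600, fetch=fake_fetch)
print(cache.get("http://example.com/rss.xml"))  # [('First post', 'http://example.com/1')]
cache.get("http://example.com/rss.xml")  # second call is served from the cache
print(len(calls))  # 1
```

Passing the fetch function and clock in as parameters keeps the cache testable without a network connection or real waiting.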

Bill Kearney
January 21, 2003 2:57 PM

While it doesn’t serve the immediate need, feeds listed on Syndic8 do get marked as bad when they produce invalid XML. You could have an aggregator double-check with Syndic8 (via XML-RPC) whether a feed is known to be working. Beyond that, you can always run the feed through the validator and send the resulting URL to the feed author.
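This comment doesn't name Syndic8's XML-RPC methods, so this Python sketch doesn't guess at them. Instead it covers the part an aggregator can do on its own. It refuses to render a feed that isn't well-formed XML, and it records the line and column where the parser gave up so they can be passed along to the feed's author. The check_well_formed name is hypothetical. Well-formedness is a lower bar than validity. HTML that a feed entity-escapes or wraps in CDATA inside its description, like the unclosed <em> case in the post, passes this check. So this complements cleaning up tags rather than replacing it.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_bytes):
    """Return None if the feed parses as XML, else (line, column, message)."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return line, column, str(err)
    return None


problem = check_well_formed(b"<rss><channel><title>Oops</rss>")
if problem is not None:
    line, column, message = problem
    print(f"Skipping feed: not well-formed at line {line}, column {column}")
```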


©1999-2012 Adam Kalsey.