Is an API really needed?

Drew McClellan suggests that perhaps your website could be your API, using microformats, semantic HTML, and a little screen scraping.

Almost three years ago, Jon Udell suggested something similar and I wrote a rebuttal. Much of it applies to McClellan’s suggestions, too. Here’s a reposting of my original piece.


Jon Udell waxes nostalgic about the good old days of screen scraping HTML in order to build the first generation of Web services. That’s great and I’ve built my share of screen scraping applications as well. But then Udell goes on to propose that companies should abandon modern Web services technologies in favor of screen scrapes helped along by well-formed XHTML.

Udell’s reasoning is that Web services built on SOAP are too complicated. “But if I’d had to register for an API key and locate WSDL documentation for each of the three services whose results I compared, I probably wouldn’t have bothered,” he says. His entire argument is based on his experiences with the Google API and its specific SOAP implementation.

Google requires that anyone using their API register for and use an API key — an identifying token that lets Google track the usage of their API down to a specific user or application. Google requires it, but the SOAP protocol does not. Most SOAP services don’t have any sort of key and if you were building a tool for an intranet, you probably wouldn’t need or want such a scheme. Not only does Udell miss that point, but he also forgets that SOAP isn’t the only Web services technology.

Udell says that a primary threat to your intranet is disuse. If people find it too difficult to create and use information on the intranet, they won’t bother. That’s true; if you create onerous processes that content creators must follow, they’ll find ways around them, publishing their information in ways that you don’t expect. But Udell’s assertion that building data access through Web services will make it too difficult for people to use your data is preposterous. Screen scraping is more difficult and more apt to fail than using stable, published APIs. And with REST, the APIs are just as easy to access as any other Web document.

As an example, let’s use product data for my new camera. Which is easier: scraping the product page, or getting the data in XML format from Amazon’s REST interface? With either method I request a unique URL to get the data, and neither involves any complicated steps. But the HTML version, even if it were well-formed XHTML, would be significantly harder to retrieve meaningful data from. And changes to the display of the information would often mean changes to the structure of the HTML, necessitating further changes to my screen scraping application. Amazon does require a developer’s token (an API key, essentially), but again, that’s only so they can control usage. There’s no reason at all that a REST system like this couldn’t be built without it.
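To make the comparison concrete, here is a minimal sketch in Python. The XML document and both page snippets are made up for illustration; they are not Amazon's actual response format or page markup, and the function names are my own. The point is only that the XML reader depends on the data's structure, while the scraper depends on the page's presentation.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XML that a product REST endpoint might return
# (illustrative only; not any real service's schema).
PRODUCT_XML = """<product>
  <name>Example Camera</name>
  <price currency="USD">299.99</price>
</product>"""

# Hypothetical product page markup, before and after a visual redesign.
PAGE_V1 = '<td class="price"><b>$299.99</b></td>'
PAGE_V2 = '<div class="price-box"><span>$299.99</span></div>'


def price_from_xml(doc):
    """Read the price from the XML document by element name."""
    return float(ET.fromstring(doc).findtext("price"))


def price_from_html(page):
    """Scrape the price by matching the exact markup around it.

    Returns None when the markup no longer matches the pattern.
    """
    match = re.search(r'<td class="price"><b>\$([\d.]+)</b></td>', page)
    return float(match.group(1)) if match else None


print(price_from_xml(PRODUCT_XML))  # price read from the XML
print(price_from_html(PAGE_V1))     # scraper works on the old layout
print(price_from_html(PAGE_V2))     # scraper finds nothing after the redesign
```

The same price is on both versions of the page, but the scraper only finds it in the layout it was written for. The XML reader never has to care what the page looks like.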

But doesn’t creating a REST interface mean more work for the content producers? Probably not. Presumably your corporate intranet is using some sort of content management system. Otherwise there’d be no way to enforce this XHTML-only rule. Furthermore, that content management system probably stores the content in a database somewhere separate from the presentation of said content. All you need to do is build one REST interface that retrieves the required content from that database and presents it as a pre-determined XML document instead of an HTML document. The content producers could go along creating content as they always have, blissfully unaware that they are also populating a Web service.
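A rough sketch of that idea follows, assuming the content lives in a SQL table. The `articles` table, its columns, and the function names are all hypothetical; a real content management system will have its own schema. An in-memory SQLite database stands in for the CMS database so the example runs on its own.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html import escape


def make_demo_db():
    """Create an in-memory stand-in for the CMS database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.execute(
        "INSERT INTO articles (id, title, body) VALUES (?, ?, ?)",
        (1, "Q3 Travel Policy", "Book flights 14 days ahead."),
    )
    return conn


def fetch_article(conn, article_id):
    """Load one article row; returns None if the id does not exist."""
    return conn.execute(
        "SELECT id, title, body FROM articles WHERE id = ?", (article_id,)
    ).fetchone()


def as_html(row):
    """The presentation people already see in their browsers."""
    return "<h1>%s</h1><p>%s</p>" % (escape(row[1]), escape(row[2]))


def as_xml(row):
    """The same row presented as a pre-determined XML document."""
    root = ET.Element("article", id=str(row[0]))
    ET.SubElement(root, "title").text = row[1]
    ET.SubElement(root, "body").text = row[2]
    return ET.tostring(root, encoding="unicode")


conn = make_demo_db()
row = fetch_article(conn, 1)
print(as_html(row))
print(as_xml(row))
```

Both outputs come from the same stored row, so the content producers have nothing new to do. In practice `as_xml` would sit behind a URL of its own, but the HTTP plumbing is left out here.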

Udell’s XHTML scraping suggestion has significant risks as well. Remember that making the process of content creation difficult will ensure that people find other ways to create content — ways that you don’t control. But in advocating screen scraping, Udell says, “it’s true that creating XHTML pages requires more discipline than hacking out HTML, and it may incur some retraining costs.” Not only are you going to make it difficult for people to build systems that automatically consume information, but you also propose making it more difficult to create it?

People will flock to things that are easy. RSS took off because it was easy to create and easy to consume. Sure, it would be possible to create screen scraping applications that would take any well-formed XHTML content source and pull that content into a newsreader. But it’s much easier for everyone concerned to create a simple, easy-to-understand format that contains all of the information in logical chunks and just run with it.
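As an illustration of how little a consumer has to do, here is a small sketch that reads the items out of an RSS 2.0 feed with Python's standard library. The feed content and the `read_items` name are invented for the example; only the `rss`/`channel`/`item` layout comes from the RSS 2.0 format itself.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed, inlined so the example needs no network access.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>An illustrative feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return (title, link) pairs for every item in the feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


for title, link in read_items(SAMPLE_FEED):
    print(title, link)
```

Every feed that follows the format can be read by the same few lines, which is exactly what a scraper written against one site's page layout cannot offer.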

Drew McLellan
October 23, 2006 1:26 PM

Hi Adam. My presentation wasn’t about screen scraping at all - I didn’t even mention it. Not sure how you got that impression, but a podcast is available, so do take a listen if you get the chance.

(btw, it’s McLellan with one C, but no sweat).


©1999-2008 Adam Kalsey.