Referral Abuse

Freshness Warning
This article is over 15 years old. It's possible that the information you read below isn't current.

It appears that each time an RSS file from my site is loaded by one of these applications, a referer is deposited in the log file. Each time I load a page in Internet Explorer, I don’t leave a referer for www.microsoft.com/ie in the log files of the site whose page I loaded, so why should any of the RSS readers be different?

RSS readers misusing the referer field? (kottke.org)

Amen. I’ve always found it irritating that news aggregators insert their URL into the referrer field. Some aggregators have taken things a step further by allowing the user to use any arbitrary URL as the referrer. So I get 48 "referrals" each day from www.hardhathosting.com even though there’s not a single link from their site to mine. It’s not that I mind knowing where my readers are coming from. That’s kind of nice, and I’m glad the person behind Hardhat Hosting finds me interesting enough to grab the feeds for this blog and Simplelinks once an hour. It just makes it difficult to distinguish a real referral from a bogus one. If Hardhat Hosting were to put a real link on their site to mine and people followed it, I probably wouldn’t notice. I’d just think the referral logs were crying wolf.

As a temporary measure, you can use a custom URL for the referrals instead of your site’s home page or the home page for the aggregator. L.M. Orchard does this with AmphetaDesk, setting the referrer to a thanks page on 0xDECAFBAD, and I do this with Aggie. Sites I visit see a link to http://kalsey.com/blog/thanks/ in their referral logs.

It would be nice if there was some sort of browser header the aggregator could send to identify itself instead of using the referrer field. Oh, that’s right, there is. It’s called User-Agent.

The user agent field is designed for browsers, robots, and other user agents to identify themselves to the Web server. You can even add additional information, like a contact URL or email address. I’d like to see aggregators start using it.
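As a rough sketch of what that could look like, here is a feed request built with Python's standard library that identifies the aggregator in User-Agent and sends no Referer at all. The aggregator name, version, and contact URL are made up for illustration, and the "+URL in parentheses" style is just one common convention, not anything the aggregators discussed here have agreed on.

```python
import urllib.request

# Hypothetical aggregator name, version, and contact URL, for illustration only.
AGENT_NAME = "ExampleAggregator"
AGENT_VERSION = "1.0"
CONTACT_URL = "http://example.com/thanks/"


def build_feed_request(feed_url: str) -> urllib.request.Request:
    """Build a feed request that identifies the aggregator via User-Agent.

    No Referer header is set, because no link was followed to reach the feed.
    """
    user_agent = f"{AGENT_NAME}/{AGENT_VERSION} (+{CONTACT_URL})"
    return urllib.request.Request(feed_url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # Inspect the headers without actually fetching anything.
    request = build_feed_request("http://example.com/index.xml")
    print(request.get_header("User-agent"))
    print(request.has_header("Referer"))
```

The request is only constructed here, not sent; passing it to `urllib.request.urlopen` would perform the fetch with those headers.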

Chris
January 30, 2003 10:42 PM

Thanks for the suggestion: http://www.hardhathosting.com/thanks.html

l.m.orchard
January 31, 2003 9:36 AM

Hallo there. I just changed my AmphetaDesk to stuff my thank-you page into the User-Agent: http://www.decafbad.com/news_archives/000405.phtml Seems like a simple enough thing to adopt everywhere, and if everyone agrees on the User-Agent format for aggregators, we might just be able to do something useful with it.

Adam Kalsey
January 31, 2003 2:26 PM

Good idea. Many of the popular news aggregators are open source, so let's just patch them ourselves. I've patched Aggie: http://kalsey.com/2003/01/patching_aggies_referrer/

Mark Paschal
January 31, 2003 4:31 PM

Pretty easy to do in Radio UserLand, if you only want to disable the spurious "Referer" header: http://markpasc.org/weblog/2003/01/31_end_referrer_abuse_in_radio.html

Ingve
January 31, 2003 4:45 PM

I think your summary ("Aggregators are misusing the http referrer header to identify themselves.") is overly harsh. This is not some evil spamming operation by aggregator makers to promote themselves or their software. The idea everybody was enthusiastic about was an attempt to create a more two-way web and give readers a way to "leave a trail", and the referer header was chosen to be able to piggyback on all the referer-reporting infrastructure that already existed in weblog publishing/hosting solutions.

Adam Kalsey
January 31, 2003 4:52 PM

My top five referrers for today are:

radio.userland.com/newsAggregator
ranchero.com/netnewswire/
ranchero.com/software/netnewswire/
frontier.userland.com/xmlAggravator
www.disobey.com/amphetadesk/

Farther down the list I see:

www.rassoc.com/newsgator
www.syndirella.net/

None of those are being used to tell me who's reading my feed. The only trail that's being left is by the aggregator. It seems to me that these are there as a form of advertising by the aggregators.

Adam Kalsey
January 31, 2003 5:11 PM

And I understand that this is being done on purpose. Some readers (like Aggie, AmphetaDesk, and Radio) can display a pointer to my site in someone's referral logs. The idea is to allow people to see who was reading their sites. But that's not how it's often implemented. Instead, you get an ad for Aggie. I understand that many people don't have access to their user-agent logs, but they do have referral information. But that doesn't make it right. As Kottke pointed out, it's like IE sticking its own URL in the referrer header for every page it visits. If Microsoft did that, we'd all scream bloody murder, and rightfully so. But doing it as an upswell from the open source and blogging communities is okay? I like Mark Nottingham's summary of the whole thing: "If I understand the reasoning behind this, it's that logs make referer available more often than user-agent, and some people were interested in seeing how much their particular agent is used. Unfortunately, misusing the headers for purposes of what frankly amounts to vanity screws the people who want to use them properly." http://groups.yahoo.com/group/syndication/message/3364

Ingve
January 31, 2003 5:13 PM

If the user doesn't configure a specific weblog url then you get the default. Aggregators should probably have a setup/configuration wizard. Since you're one of the lucky few people with access to server logs, what's the situation on Aggie referrals?

Ingve
January 31, 2003 5:51 PM

Mark's conclusion is right even though his reasoning is flawed (the point was to enable producers of syndicated content (webloggers) to see how often and by how many readers their content was fetched, not for aggregator writers to see how much their aggregator was used. Aggie had a traditional (non-user-customizable) user-agent header before it got the overloaded referer header.) I still think that using the address of a web-accessible subscriptions file would be a "legal" referer header value, but I'm sure someone will tell me if I'm wrong. :-)

Adam Kalsey
January 31, 2003 6:29 PM

I'm not sure what you mean by "the situation on Aggie referrals." Userland stuffed their addresses in the referrer header long before they started adding the user's weblog URL to the field, according to http://radio.userland.com/moreVisibleInRefererLogs

Like I said, I understand the reasons for doing so, but I still think it's a bad thing.

Ingve
January 31, 2003 7:06 PM

I also think it is a bad idea, but the intentions behind it were good. Most of the people who "discover" this horrible abuse tend to sound like it's being done for evil purposes and that aggregator writers must be stupid since they're not using the user-agent header. The Aggie referrals question was more or less how many "generic" Aggie referrals you get vs. how many Aggie users take advantage of the opportunity to provide you with potentially useful "hey, this is me and I'm reading you" information... Userland stuffing their address in the referer header provided a benefit for many users even before the user's weblog url inclusion by providing the *count* for how many times a resource was accessed. Noise to you and the server logs crowd, helpful to the many people with blogs on Manila sites etc. :-)

Adam Kalsey
January 31, 2003 10:47 PM

Most Aggie users don't take advantage of this. You do and Anders Jacobsen does, but there are several people who don't. That might be on purpose (maybe they don't want me to know who they are), or it might be by accident. I also just noticed, while looking through the Aggie source, that you were the one that provided the patch to add the "I'm reading you" referrer advertising into Aggie. You might want to take a stab at my patch and see if you can improve things a bit. This was the first time I'd ever looked at C# code, so I'm sure there's a better way to do what I did.

Ingve
February 1, 2003 12:04 AM

Your patch has a few minor issues (a newline in constant error, and I can't really see where the aggieBase_ string is used now, probably just cut and paste problems) but I'm not sure that abusing the User-Agent header is a huge improvement. If nobody is using the ability to provide information about their feed reading habits then maybe we should just declare this experiment a failure and move on.

Jacques Distler
February 2, 2003 10:34 PM

Well, assuming that when the Request_URI is your RSS feed the Referer is most likely bogus, one can simply not log it. Easy to set up with SetEnvIf and CustomLog.
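The same idea can also be applied after the fact, when tallying referers from an existing log. The sketch below is not the SetEnvIf/CustomLog configuration itself but a Python analogue of it: it assumes Apache's "combined" log format and a hypothetical set of feed paths, and it skips the referer on any request for one of those paths.

```python
import re
from collections import Counter

# Hypothetical feed paths; substitute whatever paths your site serves feeds from.
FEED_PATHS = {"/index.xml", "/index.rdf"}

# Picks the request line, status, size, and referer out of a "combined" log line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)


def count_real_referers(lines):
    """Tally referers, skipping any request whose path is a feed."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not a combined-format line
        path = match.group("path").split("?", 1)[0]  # drop any query string
        referer = match.group("referer")
        if path in FEED_PATHS or referer in ("", "-"):
            continue  # feed fetch, or no referer was sent
        counts[referer] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines: one feed fetch by an aggregator, one real page visit.
    sample = [
        '1.2.3.4 - - [31/Jan/2003:10:00:00 -0800] "GET /index.xml HTTP/1.0" '
        '200 5120 "http://www.example.org/aggregator" "ExampleAggregator/1.0"',
        '5.6.7.8 - - [31/Jan/2003:10:05:00 -0800] "GET /blog/ HTTP/1.0" '
        '200 2048 "http://example.net/links.html" "Mozilla/4.0"',
    ]
    for referer, hits in count_real_referers(sample).items():
        print(hits, referer)
```

Filtering at report time leaves the raw log untouched, so the aggregator hits can still be counted separately if they turn out to be interesting.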

Rod
March 9, 2006 11:18 PM

Hard Hat Hosting got exactly what they wanted: a link from your site to theirs...



©1999-2019 Adam Kalsey.