Distributed comment spam prevention

Earlier I mentioned some ideas for preventing comment spam. Thanks to a TrackBack ping, I found out that Simon Willison had been discussing the same thing yesterday. I need to read Simon more often. This is the second time that I’ve been working on something only to find out that he’s doing something similar.

Simon’s offering a blacklist of domains that show up in the spam he receives, and that gave me an idea: combine a distributed blacklist with my distributed anti-spam concept. Sites could participate by sending the IP address, URL, and a digest of the comment body (an MD5 hash would work) to a central server or a cloud of servers. If the server saw the same comment being posted in multiple places within a short time period, it would send a ping to all participating sites. The ping would contain the IP address and URL of the spammer. The sites would then use this information to ban further comments from that site and IP. Ideally the ban would be temporary to minimize the impact of false positives, but that would be up to the site’s software.
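Here is a minimal sketch of what the central server's bookkeeping might look like. Everything named here is an assumption made for illustration: the `SpamReportAggregator` class, the `comment_digest` helper, the ten-minute window, and the three-site threshold are not part of any existing service. It only shows the "same digest reported by several sites within a short period" check and returns the IP/URL pairs that would go out in the ping.

```python
import hashlib
from collections import defaultdict, deque
import time


def comment_digest(body):
    """MD5 digest of the comment body, as suggested in the post."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


class SpamReportAggregator:
    """Hypothetical central server state (all names here are illustrative)."""

    def __init__(self, window_seconds=600, site_threshold=3):
        self.window_seconds = window_seconds  # assumed "short time period"
        self.site_threshold = site_threshold  # assumed "multiple places"
        # digest -> deque of (timestamp, reporting_site, ip, url)
        self._reports = defaultdict(deque)

    def report(self, site, ip, url, digest, now=None):
        """Record one report. Return the (ip, url) pairs to ping out, or []."""
        now = time.time() if now is None else now
        reports = self._reports[digest]
        reports.append((now, site, ip, url))
        # Drop reports that have fallen out of the time window.
        while reports and now - reports[0][0] > self.window_seconds:
            reports.popleft()
        reporting_sites = {r[1] for r in reports}
        if len(reporting_sites) >= self.site_threshold:
            return sorted({(r[2], r[3]) for r in reports})
        return []
```

A participating site would call `report()` for each new comment and, if the result is not empty, the server would ping every participant with those pairs. How long each site keeps the ban in place is left to the site, as described above.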

Essentially, this would create an organic system that responds to wholesale comment spamming in real time. This wouldn’t solve the problem of someone posting an individual comment on a single site, but that’s not really the way spammers work. For spam to be effective, it needs enormous volume. And the only way to have that sort of posting volume is to automate it.

Adam Kalsey
October 10, 2003 10:12 AM

Bots would simply set the checkbox and submit it. There are all sorts of things you could do with JavaScript, if you want to require that the user has JS enabled before submitting a comment.

For instance, you could have a checkbox that alters the value of a hidden field through JavaScript. Ignore or moderate any postings that don’t have the correct value in the hidden field.
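The server side of that check could be as small as the sketch below. The field name `js_check` and the value `human` are made up for this example; the page's JavaScript would be responsible for writing that value into the hidden field when the checkbox is ticked.

```python
# Hypothetical names: the hidden field is called "js_check" and the page's
# JavaScript is assumed to set it to "human" when the checkbox is ticked.
HIDDEN_FIELD = "js_check"
EXPECTED_VALUE = "human"


def classify_comment(form):
    """Return 'accept' when the hidden field is correct, else 'moderate'."""
    if form.get(HIDDEN_FIELD, "") == EXPECTED_VALUE:
        return "accept"
    # Missing or wrong value: the client probably did not run the script.
    return "moderate"
```

Sending failures to moderation rather than dropping them keeps comments from readers who have JavaScript turned off.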

Trackback from random ruminations
October 11, 2003 9:21 AM

Comment Spam

Excerpt: I've been struck with comment spam three times in the last week. I don't know if this means that, suddenly, my blog has hit the radar screens of whatever search engine spammers use, or if I'm just lucky. Regardless, the first time it was mild, the seco...

Rick
November 7, 2003 9:01 AM

I’ve noticed a trick to get rid of comment noise when filtering. The random characters that spammers insert still have to leave the message readable (otherwise the spam would have no impact), so they usually insert non-alphanumeric characters in the comment subject. Here’s a small formula that I’d like to see tried out in your anti-spam blocker.

1) Perform an anti-l337 filter. A simple translation table will do the job. (The result must always be lowercase.)

2) Strip spaces and non-alphabetic characters.

3) Replace 2-character sequences with their phonetic equivalent (i.e. ph -> f). Simple translation tables also work.

4) There you go. The message has been filtered and is ready for digest.

Example:

Phr’33 v149r4 ’ ph; 0r .U

Step 1 - Anti 1337 filter:

phr’ee viagra ’ ph; or .u

Step 2 - Strip spaces and non-alphabetic chars:

phreeviagraphoru

Step 3 - 2-char Phonetic replacement:

freeviagraforu

We could run a massive test and then perhaps, with some “scientific” research, build a database, who knows. Still, the content has been filtered and is ready for a keyword search. The keywords that can be found by any simple search routine are “free”, “viagra”, and “for”.
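The three steps and the keyword search can be sketched in a few lines. The translation tables below are assumptions that cover only the characters in the example above (a real l337 table would be larger and more ambiguous, since "1" can stand for "i" or "l"), and the function names are made up for illustration.

```python
import re

# Assumed tables: only the substitutions needed for the example above.
LEET_TABLE = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "9": "g"})
PHONETIC_PAIRS = {"ph": "f"}
KEYWORDS = ("free", "viagra", "for")


def normalize(text):
    """Apply the three filtering steps and return the cleaned string."""
    # Step 1: anti-l337 filter, result always lowercase.
    text = text.lower().translate(LEET_TABLE)
    # Step 2: strip spaces and non-alphabetic characters.
    text = re.sub(r"[^a-z]", "", text)
    # Step 3: replace 2-character sequences with their phonetic equivalent.
    for pair, replacement in PHONETIC_PAIRS.items():
        text = text.replace(pair, replacement)
    return text


def find_keywords(text, keywords=KEYWORDS):
    """Return the keywords that appear in the normalized text."""
    normalized = normalize(text)
    return [word for word in keywords if word in normalized]


print(normalize("Phr\u201933 v149r4 \u2019 ph; 0r .U"))  # freeviagraforu
```

Because the search is a plain substring match on text with the spaces removed, it will also hit keywords that span word boundaries in legitimate comments, so a short word like "for" is probably too weak a signal on its own.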

The trick here is that spammers are cheapskates. They won’t write artificial intelligence programs to fool spam filters. Instead, they will use simple translation tables. Therefore, simple translation tables can also be used to decode their subject fields.

About input forms, I find this one easy.

1) Use session cookies.

2) For each mail submit form, include a delay of 3 seconds before processing the submission.

3) Include a hidden random field. That will ensure that the same mail form will only be processed once.

This will ensure that the spammer has to wait at least 3 seconds between mail submissions, which narrows the spammer’s “damage zone”: instead of the 20 form submissions that could be performed in three seconds, you only get one.
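One way to read those three points together is sketched below. Note the swap: instead of sleeping for 3 seconds before processing, this version rejects any submission that arrives less than 3 seconds after the last accepted one from the same session, which gives the same "one submission per 3 seconds" effect without tying up the server. The `FormGuard` class and its method names are hypothetical, and the session ID is assumed to come from the session cookie.

```python
import secrets
import time


class FormGuard:
    """Hypothetical guard: one-time form tokens plus a per-session interval."""

    def __init__(self, min_interval=3.0):
        self.min_interval = min_interval
        self._tokens = {}         # token -> session_id it was issued to
        self._last_accepted = {}  # session_id -> time of last accepted submit

    def issue_token(self, session_id):
        """Create the random value to embed in the form's hidden field."""
        token = secrets.token_hex(16)
        self._tokens[token] = session_id
        return token

    def accept(self, session_id, token, now=None):
        """Return True if this submission should be processed."""
        now = time.monotonic() if now is None else now
        # pop() makes every token single-use, even if rejected below.
        owner = self._tokens.pop(token, None)
        if owner is None or owner != session_id:
            return False
        last = self._last_accepted.get(session_id)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_accepted[session_id] = now
        return True
```

A bot that throws away its cookies gets a fresh session each time, so in practice the interval would probably also need to be tracked per IP address.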

Replies welcome :)

Trackback from floating atoll
November 14, 2003 8:00 PM

A thousand monkeys filtering advertising

Excerpt: A common thread between the most effective forms of online advertising is the introduction of a hyperlink to a targeted user. In this respect, there is no difference between Google text ads, Orbitz pop-ups, and DoubleClick banner ads: for the advertise...

Kevin
October 16, 2005 2:21 PM

I don’t think an MD5 of the body would be useful. Even a tiny variation in the message would generate a different hash.
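A quick way to see this, using two made-up comment bodies that differ by a single trailing character:

```python
import hashlib


def md5_digest(body):
    """Hex MD5 digest of a comment body."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


# Two illustrative bodies; the second just adds an exclamation mark.
first = md5_digest("Buy cheap pills at example.com")
second = md5_digest("Buy cheap pills at example.com!")

print(first)
print(second)
print(first == second)  # False
```

The two digests share nothing recognizable, so a server matching on exact digests would treat these as unrelated comments.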


This discussion has been closed.

©1999-2009 Adam Kalsey.