Distributed comment spam prevention

Earlier I mentioned some ideas for preventing comment spam. Thanks to a TrackBack ping, I found out that Simon Willison had been discussing the same thing yesterday. I need to read Simon more often. This is the second time that I’ve been working on something only to find out that he’s doing something similar.

Simon’s offering a blacklist of domains that are used in his spam, and that gave me an idea: combine a distributed blacklist with my distributed anti-spam concept. Sites could participate by sending the IP address, URL, and a digest of the comment body (an MD5 hash would work) to a central server or a cloud of servers. If the server saw that the same comment was being posted in multiple places within a short time period, it would send a ping to all participating sites. The ping would contain the IP address and URL of the spammer. The sites would then use this information to ban further comments from that URL and IP. Ideally the ban would be temporary to minimize the impact of false positives, but that would be up to the site’s software.
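To make the idea concrete, here is a minimal sketch of the server side in Python. It is only an illustration under assumptions of mine: the names (comment_digest, SpamDetector, report), the ten-minute window, and the three-site threshold are all hypothetical, the state lives in memory, and the actual ping to participating sites is left out.

```python
import hashlib
import time
from collections import defaultdict


def comment_digest(body):
    """Digest a participating site would send instead of the comment text."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


class SpamDetector:
    """Hypothetical central server: tracks which sites reported each digest."""

    def __init__(self, window_seconds=600, site_threshold=3, clock=time.time):
        self.window_seconds = window_seconds  # assumed "short time period"
        self.site_threshold = site_threshold  # assumed number of distinct sites
        self.clock = clock
        self.reports = defaultdict(list)  # digest -> [(time, site, ip, url)]

    def report(self, site, ip, url, digest):
        """Record one report; return the (ip, url) pairs to ping out, if any."""
        now = self.clock()
        # Drop reports that have aged out of the window, then add this one.
        recent = [r for r in self.reports[digest]
                  if now - r[0] <= self.window_seconds]
        recent.append((now, site, ip, url))
        self.reports[digest] = recent
        # Same digest seen on enough distinct sites: treat it as wholesale spam.
        sites = {r[1] for r in recent}
        if len(sites) >= self.site_threshold:
            return sorted({(r[2], r[3]) for r in recent})
        return []
```

A site would call report() for each new comment, and a non-empty result is what the server would broadcast as the ping. How long a site honors the resulting ban is still up to that site.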

Essentially, this would create an organic system that responds to wholesale comment spamming in real time. This wouldn’t solve the problem of someone posting an individual comment on a single site, but that’s not really the way spammers work. For spam to be effective, it needs enormous volume. And the only way to have that sort of posting volume is to automate it.

Adam Kalsey
October 10, 2003 10:12 AM

Bots would simply set the checkbox and submit it. There are all sorts of things you could do with JavaScript, if you want to require that the user has JS before submitting a comment.

For instance, you could have a checkbox that alters the value of a hidden field through JavaScript. Ignore or moderate any postings that don’t have the correct value in the hidden field.

Trackback from random ruminations
October 11, 2003 9:21 AM

Comment Spam

Excerpt: I've been struck with comment spam three times in the last week. I don't know if this means that, suddenly, my blog has hit the radar screens of whatever search engine spammers use, or if I'm just lucky. Regardless, the first time is was mild, the seco...

Rick
November 7, 2003 9:01 AM

I’ve noticed a trick for getting rid of comment noise when filtering. The random characters in spam still have to leave the message readable (otherwise the spam would have no impact), so spammers usually insert non-alphanumeric characters in the comment subject. Here’s a small formula that I’d like to try out in your anti-spam blocker.

1) Perform an anti-l337 filter. A simple translation table will do the job. (The result must always be lowercase.)

2) Strip spaces and non-alphabetic characters.

3) Change 2-character sequences to their phonetic equivalent (e.g. ph -> f). Simple translation tables also work.

4) There you go. The message has been filtered and is ready for digest.

Example:

Phr’33 v149r4 ’ ph; 0r .U

Step 1 - Anti 1337 filter:

phr’ee viagra ’ ph; or .u

Step 2 - Strip spaces and non-alphabetic chars:

phreeviagraphoru

Step 3 - 2-char Phonetic replacement:

freeviagraforu

We could run a large-scale test and then perhaps, with some “scientific” research, build a database, who knows. Either way, the content has been filtered and is ready for a keyword search. The keywords that any simple search routine can find are “free”, “viagra”, and “for”.
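The three steps above can be sketched in a few lines of Python. This is just an illustration: the translation tables only cover the characters used in the example, and the names (LEET_TABLE, PHONETIC_TABLE, normalize, find_keywords) are made up for the sketch. A real filter would need much larger tables.

```python
import re

# Step 1 table: digits commonly used as letter stand-ins (assumed, not complete).
LEET_TABLE = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "9": "g"})

# Step 3 table: 2-character sequences and their phonetic equivalents.
PHONETIC_TABLE = {"ph": "f"}


def normalize(text):
    """Apply the anti-l337, strip, and phonetic steps in order."""
    text = text.lower().translate(LEET_TABLE)   # step 1, result is lowercase
    text = re.sub(r"[^a-z]", "", text)          # step 2, keep letters only
    for sequence, replacement in PHONETIC_TABLE.items():
        text = text.replace(sequence, replacement)  # step 3
    return text


def find_keywords(text, keywords):
    """Return the keywords that appear in the normalized text."""
    normalized = normalize(text)
    return [word for word in keywords if word in normalized]


print(normalize("Phr'33 v149r4 ' ph; 0r .U"))  # -> freeviagraforu
print(find_keywords("Phr'33 v149r4 ' ph; 0r .U", ["free", "viagra", "for"]))
```

Because the spaces are stripped, the keyword search is a plain substring match, so short keywords such as “for” will also match inside longer words.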

The trick here is that spammers are cheapskates. They won’t write artificial intelligence programs to fool spam filters. They will use simple translation tables instead. Therefore, simple translation tables can also be used to decrypt their subject fields.

About input forms, I find this one easy.

1) Use session cookies.

2) For each mail submit form, include a delay of 3 seconds before processing the submission.

3) Include a hidden random field. That will ensure that the same mail form will only be processed once.

This will ensure that the spammer has to wait at least 3 seconds between mail submissions, which narrows the spammer’s “damage zone”. For example, instead of the 20 form submissions that could otherwise be performed in three seconds, you only get one.
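Here is a rough Python sketch of how those three pieces could fit together on the server. It is one possible reading, not a finished implementation: instead of sleeping for 3 seconds, it rejects a submission that arrives less than 3 seconds after the previous accepted one from the same session. The FormGuard class and its method names are hypothetical, and the session id is assumed to come from the session cookie.

```python
import secrets
import time


class FormGuard:
    """Hypothetical guard: one-time form tokens plus a per-session interval."""

    def __init__(self, min_interval=3.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between accepted submissions
        self.clock = clock
        self.pending = {}        # token -> session id it was issued to
        self.last_accepted = {}  # session id -> time of last accepted submit

    def issue_token(self, session_id):
        """Value for the hidden random field when the form is rendered."""
        token = secrets.token_hex(16)
        self.pending[token] = session_id
        return token

    def accept(self, session_id, token):
        """Return True if this submission should be processed."""
        if self.pending.get(token) != session_id:
            return False  # unknown, reused, or issued to another session
        del self.pending[token]  # each form can only be submitted once
        now = self.clock()
        last = self.last_accepted.get(session_id)
        if last is not None and now - last < self.min_interval:
            return False  # too soon after the previous submission
        self.last_accepted[session_id] = now
        return True
```

Note that a token is used up even when the submission is rejected for being too fast, so the sender has to load the form again. That is a design choice in this sketch, not something the steps above require.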

Replies welcome :)

Trackback from floating atoll
November 14, 2003 8:00 PM

A thousand monkeys filtering advertising

Excerpt: A common thread between the most effective forms of online advertising is the introduction of a hyperlink to a targeted user. In this respect, there is no difference between Google text ads, Orbitz pop-ups, and DoubleClick banner ads: for the advertise...

Kevin
October 16, 2005 2:21 PM

I don’t think an MD5 of the body would be useful. Even a tiny variation in the message would generate a different hash.
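A quick way to see this for yourself, using Python’s hashlib. The two comment bodies are made-up examples and body_digest is a hypothetical helper name.

```python
import hashlib


def body_digest(body):
    """MD5 hex digest of a comment body."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


original = "Buy cheap pills at example.com"
variant = "Buy cheap pills at example.com!"  # one extra character

print(body_digest(original))
print(body_digest(variant))
print(body_digest(original) == body_digest(variant))  # -> False
```

The two digests have nothing visibly in common, so a server that matches on the raw digest would treat these as two unrelated comments.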


©1999-2010 Adam Kalsey.