Comments for Distributed comment spam prevention

Excerpt: An organic system that responds to wholesale comment spamming in real time. Read the whole article…

September 4, 2003 5:55 AM

Wrap a web services engine around that and you make it very easy for others to use the list. Presto!

Matthew Walker
September 4, 2003 6:43 AM

I don't think you want to go the MD5 route; you're probably better off leveraging existing software such as Razor, Pyzor, or DCC, which have already built this for email. They incorporate techniques like fuzzy checksumming.

Simon Willison
September 4, 2003 7:19 AM

The problem with using an MD5 hash of the comment body is that spammers can get around it by adding a couple of random characters to each comment, causing almost identical comments to generate completely different hashes. They do this in emails already (the weird random characters you get in the subject lines of some spams). There's also a trust issue with a single server or crowd of servers - what if one of them starts maliciously flagging legitimate comments as spam? A real time distributed system is a very interesting idea but there are quite a few kinks to iron out.
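To make the point about random characters concrete, here is a minimal Python sketch using the standard `hashlib` module. The sample comment text and the `md5_hex` helper are made up for illustration; nothing here comes from an actual implementation discussed in the thread.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of a comment body as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical spam comment and the same comment with a few junk characters.
original = "Buy cheap pills at example.com"
padded = original + " xq"

digest_a = md5_hex(original)
digest_b = md5_hex(padded)

# Count how many of the 32 hex positions differ between the two digests.
differing = sum(1 for a, b in zip(digest_a, digest_b) if a != b)

print(digest_a)
print(digest_b)
print("hex positions that differ:", differing)
```

The two digests share no useful structure, so an exact-match lookup on the hash treats the padded comment as brand new.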

Adam Kalsey
September 4, 2003 9:18 AM

I agree that a message hash isn't perfect. The addition of a single character to a message would throw the hash off completely. The basic problem is that we need to identify comments from multiple sites that are identical. Not identical to a computer, but identical to a person. People recognize that the following two lines mean the same thing, but computers don't:

One, Two, Three, Four
0ne, t wo, three, F`o`u`r

I'll look into the fuzzy checksumming of some of the anti-spam systems, but I don't think that using such a system directly is the way to go. What we're looking for here isn't content that someone has flagged as spam (as Razor and Cloudmark do), but content that is repeated across multiple sites.

The problem of innocent people being blacklisted due to false positives or maliciousness is mitigated by the fact that the blacklist is temporary. Sites implementing the blacklist would be expected to expire all bannings within a few hours. Malicious blacklisting could be reduced even further by making sure that the system is maintained by a trusted group, much as the RBL or ORBS are for email.
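A rough Python sketch of the two ideas in this comment: spotting text that is "identical to a person" across several sites, and letting every listing expire after a few hours. It swaps in `difflib.SequenceMatcher` as a stand-in for real fuzzy checksumming, and the class name, the 0.85 threshold, the three-site minimum, and the three-hour lifetime are all assumptions chosen for illustration.

```python
import time
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and drop everything except letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def looks_same(a, b, threshold=0.85):
    """Treat two comments as the same if their normalized forms are close."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


class TemporaryBlacklist:
    """Flags text reported by several distinct sites; sightings expire."""

    def __init__(self, min_sites=3, ttl_seconds=3 * 3600):
        self.min_sites = min_sites
        self.ttl_seconds = ttl_seconds
        self.sightings = []  # list of (timestamp, site, text)

    def report(self, site, text, now=None):
        """Record a sighting; return True if enough sites saw similar text."""
        now = time.time() if now is None else now
        # Drop sightings older than the lifetime, so bans are temporary.
        self.sightings = [
            s for s in self.sightings if now - s[0] < self.ttl_seconds
        ]
        self.sightings.append((now, site, text))
        sites = {s[1] for s in self.sightings if looks_same(s[2], text)}
        return len(sites) >= self.min_sites


blacklist = TemporaryBlacklist(min_sites=2, ttl_seconds=3 * 3600)
print(blacklist.report("a.example", "One, Two, Three, Four", now=0))
print(blacklist.report("b.example", "0ne, t wo, three, F`o`u`r", now=60))
```

Comparing every new comment against every stored sighting would not scale to a busy service; the sketch only shows the shape of the check.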

September 4, 2003 10:19 AM

It would be better to prevent robots from entering comments at all, so my suggestion is a human validator like the ones PayPal, vB, and eBay use: "Enter the digits from the box on the right", where the box is an obscured graphic. Random text can be added to a URL or comment to make it unique and so avoid a blacklist-type system. It's easy to get hold of a daily list of 200-400 open HTTP proxies, or to use ISP dialups with DHCP to spam with. ... Just thoughts. Anything to stop the spammers would slow them down ... (cynic) and move them off to wikis and trackback... *sigh*

Adam Kalsey
September 4, 2003 11:20 AM

Those random text images are an accessibility nightmare. If you make it hard for machines to interact with your site, you are making it hard for screen readers as well.

David Beckemeyer
September 8, 2003 10:23 PM

Here's an idea I've implemented on my blog: It is a simple CAPTCHA Turing Test for posters. It doesn't stop all spam, but it prevents spam robots from posting.

Nick Altmann
September 14, 2003 6:02 PM

Would this problem be moot if comments were kept in the poster's blog instead of on the commented page? Then a feature like Google's "backward links" could find related comments. The display could end up being the same (with the user agent pulling the comments into a single page), but it would shift the filtering burden (or privilege) to the user instead of the publisher.

Adam Kalsey
September 15, 2003 9:00 AM

That assumes that everyone who would like to comment has a blog. It also assumes that they want their blog to become a list of comments on other blogs.

Wolfgang Flamme
September 19, 2003 12:47 PM

Adam, why go for text content? We should
a) aim at posted URLs
b) monitor poster's IP activity

(a) will prevent backlink spam activity targeting search engines
(b) will prevent any excessive or automatic comment activity from a spammer (someone leaving 50 comments per day probably doesn't have that much to say)

Wolfgang
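The two checks proposed here could be sketched in Python roughly as follows. The `ActivityMonitor` class, the URL regular expression, and the way a "day" is passed in as a string are assumptions made for the example; the limit of 50 comments per day is the figure from the comment.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Deliberately simple pattern: http(s) links up to the next space or quote.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def posted_hosts(comment):
    """Return the host names of all links found in a comment body."""
    hosts = []
    for url in URL_RE.findall(comment):
        # Trim punctuation that often trails a link in running text.
        host = urlsplit(url.rstrip(".,;:!?)")).hostname
        if host:
            hosts.append(host)
    return hosts


class ActivityMonitor:
    """Counts linked hosts and comments per IP address per day."""

    def __init__(self, daily_limit=50):
        self.daily_limit = daily_limit
        self.comments_by_ip_day = Counter()
        self.host_counts = Counter()

    def record(self, ip, day, comment):
        """Record a comment; return True if this IP is over the daily limit."""
        self.comments_by_ip_day[(ip, day)] += 1
        # Count each host once per comment, however often it is repeated.
        self.host_counts.update(set(posted_hosts(comment)))
        return self.comments_by_ip_day[(ip, day)] > self.daily_limit


monitor = ActivityMonitor(daily_limit=50)
over = monitor.record("192.0.2.1", "2003-09-19", "See http://pills.example/buy")
print(over, monitor.host_counts.most_common(1))
```

Hosts with unusually high counts in `host_counts` are candidates for the URL-based check, while the return value of `record` covers the per-IP activity check.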

Mean Dean
October 6, 2003 2:45 AM

You've been blogged in a post of mine about how I was able to discern a pattern used by a particular comment spammer who afflicted my site 2x today. Perhaps we could combine technologies to thwart this putz? See the hyperlink associated with my name.

Mike Steinbaugh
October 10, 2003 10:00 AM

Adam, I think I have a solution. Include a checkbox in the comment form that says something like, "Are you human? (prevents comment spam)". Then once the user checks the box, the comment will go through. This can be used as a short-term fix until Movable Type allows users to change the names of the form elements, which I think is the easiest fix. I totally agree that the random digits approach is an accessibility nightmare and should be avoided. I think the delay time idea is good, but very hard to implement since it would involve JavaScript for the time being until Ben and Mena can make it server side in MT. Just some thoughts... I'd love to get this fixed though. My blog is starting to get lots of comment spam.

Adam Kalsey
October 10, 2003 10:12 AM

Bots would simply set the checkbox and submit it. There are all sorts of things you could do with JavaScript, if you're willing to require that the user has JS before submitting a comment. For instance, you could have a checkbox that alters the value of a hidden field through JavaScript. Ignore or moderate any postings that don't have the correct value in the hidden field.
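The server-side half of the hidden-field idea might look something like this Python sketch. The field names `form_id` and `human_check`, the secret, and the use of an HMAC to derive the expected value are all assumptions for illustration; the comment above only says that postings without the correct hidden value should be ignored or moderated.

```python
import hashlib
import hmac

# Placeholder secret; a real site would keep its own value private.
SECRET = b"replace-with-a-real-secret"


def expected_value(form_id):
    """Value the page's JavaScript would copy into the hidden field."""
    return hmac.new(SECRET, form_id.encode("utf-8"), hashlib.sha256).hexdigest()


def triage(form):
    """Return 'publish' if the hidden field is correct, else 'moderate'."""
    form_id = form.get("form_id", "")
    supplied = form.get("human_check", "")
    correct = expected_value(form_id)
    # compare_digest avoids leaking how many leading characters matched.
    if form_id and hmac.compare_digest(
        supplied.encode("utf-8"), correct.encode("utf-8")
    ):
        return "publish"
    return "moderate"


submission = {"form_id": "entry-42", "human_check": ""}
print(triage(submission))
```

A bot that posts the form without running the page's script leaves `human_check` empty and ends up in moderation. A bot that does run the script, or scrapes the value out of it, would still get through.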

Trackback from random ruminations
October 11, 2003 9:21 AM

Comment Spam

Excerpt: I've been struck with comment spam three times in the last week. I don't know if this means that, suddenly, my blog has hit the radar screens of whatever search engine spammers use, or if I'm just lucky. Regardless, the first time it was mild, the seco...

November 7, 2003 9:01 AM

I've noticed a trick for getting rid of comment noise when filtering. Random characters inserted into spam still have to leave the message readable (otherwise the spam would have no impact), so spammers usually insert non-alphanumeric characters in the comment subject. Here's a small formula that I'd like to try out in your anti-spam blocker.

1) Perform an anti-l337 filter. A simple translation table will do the job. (The result must always be lowercase.)
2) Strip spaces and non-alphabetic characters.
3) Replace 2-character sequences with their phonetic equivalent (e.g. ph -> f). Simple translation tables also work.
4) There you go. The message has been filtered and is ready for digest.

Example: Phr'33 v149r4 ' ph; 0r .U
Step 1 - Anti-1337 filter: > phr'ee viagra ' ph; or .u
Step 2 - Strip non-alphanum chars: > phreeviagraphoru
Step 3 - 2-char phonetic replacement: > freeviagraforu

We could run a massive test and then perhaps, with some "scientific" research, build a database, who knows. Still, the content has been filtered and is ready for a keyword search. The keywords that any simple search routine can find are "free", "viagra", "for".

The trick here is that spammers are cheapskates. They won't write artificial intelligence programs to fool spam filters. They will use simple translation tables instead. Therefore, simple translation tables can also be used to decrypt their subject fields.

About input forms, I find this one easy.

1) Use session cookies.
2) For each mail submit form, include a delay of 3 seconds before processing the submission.
3) Include the hidden random field. That will ensure that the same mail form will only be processed once.

This will ensure that the spammer has to wait at least 3 seconds between mail submissions, which narrows the spammer's "damage zone": of the 20 form submissions that could otherwise be performed in three seconds, you only get one.

Replies welcome :)
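The three filtering steps in this comment translate almost directly into Python. The sketch below is one possible reading of them: the translation tables hold only the substitutions needed for the comment's own example, step 2 is taken literally as "keep lowercase letters only", and the function names are invented for illustration.

```python
# Step 1 table: digits commonly used as stand-ins for letters (assumed set).
LEET_TABLE = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "9": "g"})

# Step 3 table: 2-character sequences and their phonetic equivalents.
PHONETIC_TABLE = {"ph": "f"}


def undo_leet(text):
    """Step 1: lowercase the text and translate leet digits to letters."""
    return text.lower().translate(LEET_TABLE)


def strip_non_alpha(text):
    """Step 2: drop spaces and anything that is not a lowercase letter."""
    return "".join(ch for ch in text if "a" <= ch <= "z")


def phonetic(text):
    """Step 3: replace 2-character sequences with phonetic equivalents."""
    for source, replacement in PHONETIC_TABLE.items():
        text = text.replace(source, replacement)
    return text


def normalize(text):
    """Run all three steps in order."""
    return phonetic(strip_non_alpha(undo_leet(text)))


def find_keywords(normalized, keywords):
    """Return the keywords that occur in the normalized text, in given order."""
    return [word for word in keywords if word in normalized]


subject = "Phr'33 v149r4 ' ph; 0r .U"
cleaned = normalize(subject)
print(cleaned)  # freeviagraforu
print(find_keywords(cleaned, ("free", "viagra", "for")))
```

Because spaces are stripped, the keyword search is a plain substring match, so short keywords such as "for" will also match inside longer words in legitimate comments.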

Trackback from floating atoll
November 14, 2003 8:00 PM

A thousand monkeys filtering advertising

Excerpt: A common thread between the most effective forms of online advertising is the introduction of a hyperlink to a targeted user. In this respect, there is no difference between Google text ads, Orbitz pop-ups, and DoubleClick banner ads: for the advertise...

October 16, 2005 2:21 PM

I don't think an MD5 of the body would be useful. Even a tiny variation in the message would generate a different hash.

This discussion has been closed.




©1999-2019 Adam Kalsey.