Ounce of prevention

Freshness Warning
This blog post is over 20 years old. It's possible that the information you read below isn't current and the links no longer work.

At the risk of this starting to look like a blog about comment spam, I have some additional thoughts on the matter.

I’ve made some changes to my comment forms here. The first is that the CGI script that comments get posted to is no longer the default mt-comments.cgi. I’ve created a clone of the comments script and renamed it fbda07e9fd3bb656bbf62c5b0ed6480e.cgi. That should stop bots that search for copies of mt-comments.cgi.

The next thing I’ve done is include a hidden field in each comment form that contains an MD5 hash of the entry ID and a secret word. Then I modified MT to check for that field. The comments script now creates a hash of the entry ID and secret word and compares it to the one submitted with the comment. If that field isn’t submitted or it doesn’t match, the comment is rejected and the user is shown an error message.
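
As a rough illustration of that check, here is a minimal Python sketch. It is not the actual Movable Type modification (MT is written in Perl). The function names, the secret value, and the exact way the entry ID and secret word are combined are all assumptions made for the example.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical secret word; in practice it would live in server-side config.
SECRET_WORD = "example-secret-word"


def comment_token(entry_id: int, secret: str = SECRET_WORD) -> str:
    """Return the MD5 hex digest of the entry ID joined with the secret word."""
    return hashlib.md5(f"{entry_id}{secret}".encode("utf-8")).hexdigest()


def is_valid_token(entry_id: int, submitted: Optional[str],
                   secret: str = SECRET_WORD) -> bool:
    """Recompute the hash for this entry and compare it to the submitted field."""
    if not submitted:
        return False  # hidden field missing or empty: reject
    expected = comment_token(entry_id, secret)
    # Constant-time comparison on bytes, so odd input cannot raise.
    return hmac.compare_digest(expected.encode("utf-8"),
                               submitted.encode("utf-8"))
```

The form template would emit `comment_token(entry_id)` in a hidden field, and the comment script would call `is_valid_token` before saving anything.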

But I wonder if these steps are useful at all. What I question is how spam bots are finding entries on which to comment. The entries that get the most spam comments here are those that have a large number of incoming links. The SimpleComments page is one of the hardest hit. That seems to suggest that bots are crawling from blog to blog, following links and posting comments.

This means that in order to post a comment, the bots must be parsing the HTML to find out whether there’s a comment form on the page. They apparently aren’t searching Google for common comment scripts; otherwise the top search results would have the most spam comments.

Since the bots are parsing the HTML, adding hidden form fields probably won’t deter them. If the authors of the bots have any brains whatsoever, they’re submitting all the hidden fields along with the forms. My hidden hash will be submitted by a bot just like it would be by a person. What will probably help the most is the thing that was easiest to do: changing the comment script name.

What else would be effective is changing the names of all the form fields. Making them short random strings would make it impossible for a bot to recognize the comment form using only the field names. People would be able to understand the form because of the labels, but bots would have to implement a large amount of fuzzy logic in order to recognize that “Name,” “Your Name:,” and other forms are really the same thing.
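
A minimal sketch of the random field name idea, in Python for illustration. The logical field names and both helper functions are hypothetical, and the sketch assumes the server can remember the mapping between rendering the form and receiving the post (in a session, for example).

```python
import secrets

# Hypothetical logical names the comment script works with internally.
LOGICAL_FIELDS = ("author", "email", "url", "text")


def make_field_map(fields=LOGICAL_FIELDS):
    """Map each logical field to a short random name, e.g. 'author' -> 'f3a91c0b2'."""
    return {name: "f" + secrets.token_hex(4) for name in fields}


def decode_submission(field_map, posted):
    """Translate posted random names back to logical names, dropping unknown keys."""
    reverse = {random_name: logical for logical, random_name in field_map.items()}
    return {reverse[key]: value for key, value in posted.items() if key in reverse}
```

The form is rendered with the random names next to human-readable labels, and the posted data is passed through `decode_submission` before the usual comment handling runs.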

MojoMark
September 18, 2003 2:18 PM

I wonder if you could place a random number in a cookie that is set by the comment entry form. When the comment is submitted, it would only be accepted if the browser provides a valid number back. If the spammers have the smarts to capture and provide the cookie during comment submission, you could then measure the time between generating the random number and when it comes back in a comment. Conceivably, the time delta would be unusually small (I know I can't create a comment in under 30 seconds) if done by a spammer, and could be filtered. Defeatable, sure, but will the spammers take the time to find out where it's failing, decode why it's failing, and put a stall in to handle it? Seems kinda unlikely to me if they lust for speed and coverage.
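
A rough Python sketch of this timing check. It swaps in a signed timestamp for the stored random number so the server does not have to remember anything; that substitution, the key, the function names, and the 30-second threshold as a constant are assumptions made for illustration.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"example-secret-key"  # hypothetical server-side key
MIN_SECONDS = 30                    # minimum plausible time to write a comment


def issue_token(now=None):
    """Create the cookie value when the form is served: 'timestamp:signature'."""
    issued = int(time.time() if now is None else now)
    sig = hmac.new(SECRET_KEY, str(issued).encode(), hashlib.sha256).hexdigest()
    return f"{issued}:{sig}"


def accept_comment(token, now=None):
    """Accept only if the token is authentic and at least MIN_SECONDS old."""
    now = time.time() if now is None else now
    try:
        issued_str, sig = token.split(":", 1)
        issued = int(issued_str)
    except (AttributeError, ValueError):
        return False  # cookie missing or malformed
    expected = hmac.new(SECRET_KEY, issued_str.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), sig.encode()):
        return False  # cookie was forged or altered
    return now - issued >= MIN_SECONDS
```

A bot that simply waits before posting still gets through, which is the weakness raised in the replies.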

Frederic
September 18, 2003 2:33 PM

"Back to the drawing board." Unfortunately you are right. Yes, brute force would be a way around what I suggested, and random field names would kill autofill. Adding steps that can be automated to the comment posting procedure will not stop spammers. I think that when you put up a system that is open to interaction, such as a comment system, an email address, or something else, you can only lose the game. Your line of defense is broken because in order to work, the comment system has to accept comments, and if regular users can use it, so can spammers. If you make it 1 time harder for spammers, you will make it 2 times harder for regular users. I think that the only thing we can do is either allow comments only from a closed group of people that we trust, or let the system be open and clean up the spam after it appears. Now, if we can't avoid spam, we can make it easier to clean up. What about a link such as "Report comment spam" that would send you an email with the comment and a link to delete it? Remember... we used to put an email address on our web pages, and when we received spam we switched to web-form-to-email systems, and now these systems are broken by spammers, as are comment forms. I think that all of these have the same weakness. Find a way to stop spam for email and you will find a solution that can be applied to the other problems.

Frederic
September 18, 2003 2:44 PM

MojoMark, timing checks are easy to break. Forum scripts usually forbid users from posting more than X posts in X minutes. I guess that if I were a comment spammer, it's something that I would try: "This comment script is made for real users who take time to type a comment, so emulate user interaction and put a delay where needed." You know, with multi-threading, while I wait 30 seconds on one website before submitting my comment spam, I can move on to the next blog and keep going... You know, it's easy to reverse engineer systems that are open. Look at the Google toolbar: it uses a checksum algorithm that runs against the URL you are viewing so the Google backend knows whether the request comes from their toolbar or from software built to run hundreds of queries to get the PageRank of your competitors. Even this kind of thing is easy to break, so believe me, professional spammers will think about that delay thing and they will take the time to find a workaround. When you lose time keeping your comment system free of spam, you lose money; when spammers take the time to analyse your line of defense, it is an investment. They will make more money later.

Frederic
September 18, 2003 2:55 PM

Here is a thread on this subject in Movable Type support forum: http://www.movabletype.org/support/index.php?act=ST&f=10&t=26946&hl=comment+spam

Trackback from Noch'n Blogg.
September 19, 2003 1:41 AM

Effective measures against comment spam

Excerpt: More and more people are complaining about comment spam, and some measures have been taken to counter this method. So far I had not yet...

Saintjude
September 29, 2003 6:42 PM

Not that I know anything of course... Perhaps you could try the system that Yahoo uses to prevent automated registrations. Namely that the poster has to enter a codeword that is presented as a distressed image on the page. I'm sure that anyone who wants to post won't mind a few extra characters - and it can be fun. My most recent word was "death".

Adam Kalsey
September 29, 2003 7:07 PM

The problem with those is that blind users won't be able to comment.

anthony
September 30, 2003 11:22 AM

For your "honeypot" form idea: Use standard HTML comments to block it off.

Trackback from cce blog
October 6, 2003 2:54 PM

quick-n-dirty comment spam fix

Excerpt: i started getting a LOT of comment spam ... so i just renamed mt-comments.cgi to mt-c0mments.cgi to keep the robots away. i haven't received any comment spam since then, and i used to get several every day, so i suppose it must be working. publicizin...

JK
October 26, 2003 4:25 AM

The best response I have seen addresses not how the spammers do this, but rather denying them the payoff they seek. If you make all of your comments' URL links go to an intermediate page which has no inbound links (and hence no pagerank value) then that page can give the user's own URL, which is clickable, but the spammer's purpose will have been defeated. Some sort of blog software upgrade broadly implementing this type of fix appears to be the best medium term way out of this mess.

JK
October 26, 2003 4:27 AM

Maybe it would be enough if the intermediate page had a robots meta tag, or was excluded via robots.txt. Google wouldn't index the link.
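
The two suggestions above could be combined roughly as follows. This is only a sketch in Python: the `/out` redirect path and both function names are hypothetical, and it assumes a `noindex,nofollow` robots meta tag is what tells crawlers to ignore the intermediate page.

```python
import html
from urllib.parse import quote, urlparse

REDIRECT_PATH = "/out"  # hypothetical path of the intermediate page


def indirect_link(commenter_url):
    """Build the link shown beside a comment; it points at the intermediate page."""
    return f"{REDIRECT_PATH}?url={quote(commenter_url, safe='')}"


def interstitial_html(commenter_url):
    """Render the intermediate page: a clickable URL that crawlers are told to skip."""
    if urlparse(commenter_url).scheme not in ("http", "https"):
        raise ValueError("only http and https links are allowed")
    safe = html.escape(commenter_url, quote=True)
    return (
        '<html><head><meta name="robots" content="noindex,nofollow"></head>'
        f'<body><a href="{safe}">{safe}</a></body></html>'
    )
```

Readers can still click through to the commenter's site, but the comment page itself never links to it directly.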

Trackback from Spam-Block Specialists
November 10, 2003 10:24 AM

SPEWS works for --YOU-- to eradicate SPAM

Excerpt: SPEWS-- the spam reduction specialists!

Paul Makepeace
September 30, 2004 4:56 PM

I fully agree, and really despise this solution. Especially with MT Blacklist it is essentially redundant anyway. Are you aware of any patches or ways of turning it off?

David
October 29, 2004 7:35 PM

Ok, so I have a question: Did this end up working sufficiently for you?

Wil
February 19, 2006 8:41 AM

I've been purging our forum membership page of spurious spambot-placed addies, but many of them have some sort of cloaking device that prevents me from identifying, and hence deleting, them. Short of turning our forum into a closed, enter-by-invitation-only site, is there a simple way to attack these listings? I am a simple poet and not very conversant with techno skills.

These are the last 15 comments. Read all 24 comments here.

This discussion has been closed.

© 1999-2024 Adam Kalsey.