Comments for Ounce of prevention

Excerpt: Steps I’ve taken to deter comment spam and how to keep bots from finding your comment forms. Read the whole article…

September 18, 2003 12:51 AM

One day, I tried to install FormMail.cgi so I wouldn't have to put my email address on my website. To my surprise, when I tested the script, instead of getting my test message back I received an error message telling me that all variations of FormMail were forbidden on this host. So I renamed the script to a random string of 8 characters. My host's security check was happy, and so was I, until a few days ago, when I received spam sent through my renamed FormMail script. So I'm sure that spammers crawl the site and use the script's parameters to spot FormMail and similar scripts. Maybe, by making the script generate the form used to send data back to itself, we could generate parameter names that are encrypted and valid only for the next 15 minutes. The script could say: I want the variable "name" in parameter 63PZV, "email" in parameter 27KWA, ... Once the data is posted, the CGI would use a mapping table or something else (a simple crypto algorithm based on time) to know which data is in which field. No need for fuzzy logic, just self-changing parameter names. That way, even if a spammer inspects the form output, when he tries again later the crypto key has changed and the parameters he stored in his database for his spambot are useless. All you have to do is change your secret key from time to time and keep it secret. F.
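A rough sketch of the time-limited parameter names described above, in Python. Everything here is an assumption for illustration: the secret key, the field list, the function names, and the use of a keyed hash (HMAC) in place of the "simple crypto algorithm" mentioned in the comment. The form generator would print the aliases, and the script receiving the post would map them back.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me-and-keep-private"  # hypothetical secret
WINDOW_SECONDS = 15 * 60                     # aliases rotate every 15 minutes
FIELDS = ("name", "email", "url", "comment")  # assumed form fields


def field_alias(field, window, key=SECRET_KEY):
    """Derive an opaque parameter name for `field` in one time window."""
    msg = f"{field}:{window}".encode()
    return "f" + hmac.new(key, msg, hashlib.sha256).hexdigest()[:10]


def aliases_for(now, key=SECRET_KEY):
    """Map real field names to the aliases to print in a form at time `now`."""
    window = int(now) // WINDOW_SECONDS
    return {f: field_alias(f, window, key) for f in FIELDS}


def decode_post(posted, now, key=SECRET_KEY):
    """Translate posted aliases back to real field names.

    Tries the current window and the previous one, so a form rendered
    shortly before a rollover still works. Returns None when the posted
    parameter names match neither window.
    """
    current = int(now) // WINDOW_SECONDS
    for window in (current, current - 1):
        mapping = {field_alias(f, window, key): f for f in FIELDS}
        if set(posted) == set(mapping):
            return {mapping[k]: v for k, v in posted.items()}
    return None
```

Because the previous window is also accepted, a form stays valid for somewhere between 15 and 30 minutes. Changing the secret key invalidates every alias a bot may have stored.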

September 18, 2003 4:32 AM

When you made these changes, did you do them bit by bit, or all at once? I'd be interested to see the effects that each individual modification had on your spam counts. My feeling is still that wacky dynamic stuff like your MD5 trick, while smart, is misplaced effort; as Frederic points out, the difficulty for a robot in finding and using this hidden field is identical whether the field value is hardcoded or dynamically generated, and all the evidence I've seen from those who are being seriously hit with spam (namely, you and Shelley) is that the robots are finding those fields. A couple of extra changes that might be worthwhile are tips 3 and 5 in my anti-spam entry. 3 (multiple form elements) makes parsing a little harder, though it's still quite beatable (especially if the robot gives up on trying to find the correct one and just blats all of them). 5 (separate "preview" and "post" scripts), though, would put a serious spanner in the works. In addition, there are plenty of tricks one can do with Javascript, though I've tended to stay away from those since forcing users into Javascript is not great from an accessibility POV. -- Yoz

Daniel Von Fange
September 18, 2003 5:21 AM

Sadly, if you randomized form element names, visitors' browsers would no longer be able to autocomplete. For instance, I just typed "Dan" into the name field on the comments, and Safari completed my name and filled in the email and website text boxes.

Trackback from Gadgetopia
September 18, 2003 6:49 AM

Preventing Comment Spam

Excerpt: Ounce of prevention: Adam Kalsey is in a war against comment spam, and he shares some great ideas here: "What will probably be the biggest help is the thing that was easiest to do: changing the comment script name. What...

September 18, 2003 9:19 AM

Adam, do you think checking the HTTP_REFERER in the script that processes comments to verify that the comment was indeed submitted from a page on your site would help? I guess a smart bot could set that header to your site. I've been hit with a few spam comments myself, and I'm using my own homegrown publishing tool. That tells me they're parsing HTML.
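For what it's worth, the referrer check asked about above might look something like this in Python. The host names and the function name are made up for the example, and as noted, a bot that sets the header to your own site passes straight through.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"example.com", "www.example.com"}  # hypothetical site hosts


def referer_is_local(environ, allowed_hosts=ALLOWED_HOSTS):
    """Return True if the CGI HTTP_REFERER points at one of our own hosts."""
    referer = environ.get("HTTP_REFERER", "")
    host = urlsplit(referer).hostname  # None when the header is empty or malformed
    return host is not None and host in allowed_hosts
```

In a CGI script the `environ` argument would typically be `os.environ`. A missing header is treated as a failure here, which would also reject real visitors whose browsers or proxies strip it.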

Adam Kalsey
September 18, 2003 9:37 AM

I don't think that decoy forms will have much effect. Since the bot is already parsing the HTML, my suspicion is that they'd submit multiple comment forms if they were present. One thing that might work is a honeypot. Have decoy forms in your code that post to a spam trap. Anyone posting to that spam trap and your real comment form is automatically banned. You'd need a way of making sure that real people don't submit the form, however. Simply using CSS to hide a section of the page wouldn't do it because not everyone is using a CSS-capable browser. Clearly labeling the form as a spam trap (in addition to hiding with CSS) would work, but would look a bit hokey.
Removing the ability to post in one step would probably be effective in the short term, but I'm worried about the usability problems this might create. And I bet that once a large enough number of people do that it creates an incentive for spammers to work around that block. Unfortunately, doing so would be easy, since they already have the code to find and submit the form field. Adding an extra step that searches the page after a form post for another comment form to submit would be trivial.
Daniel: I hadn't even considered the autocomplete implications of randomized field names. Making it hard for machines to automatically fill in forms will make it hard for all machines, not just malicious ones. In my case, I think that it would be worth it. The slight benefit that autocomplete provides to my readers is outweighed by the enormous benefit that it provides to spammers.
Frederic: In order to do what you propose with time-based encryption, the form needs to be dynamically generated. I'm not ready to go that far yet. I also don't think that most people would ever need to change the secret key. The power in random field names is that because every form is different, spammers can't parse them automatically. They aren't going to analyze the contents of every form they come across in order to map the field names.
The problem with random field names is that they are easier to defeat than I first thought. When I described the concept, I figured that bots would have to implement an enormous amount of intelligence in order to decipher the form. Now that I think about it, they could still submit spam using what they do best -- brute force. All a bot would need to do is submit each form 9 times, putting their name, email, and URL in all possible combinations of fields. Back to the drawing board.
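The honeypot idea from earlier in this comment could be sketched like this in Python. It is only an illustration: the class and method names are invented, and keying the ban on the client's IP address is an assumption, since the comment doesn't say how a poster would be identified.

```python
class HoneypotFilter:
    """Track clients that post to a decoy form and refuse their real comments."""

    def __init__(self):
        self.trapped = set()  # client addresses seen at the decoy endpoint

    def record_trap_post(self, client_ip):
        """Call from the decoy form's handler (the spam trap)."""
        self.trapped.add(client_ip)

    def allow_comment(self, client_ip):
        """Call from the real comment handler before saving a comment."""
        return client_ip not in self.trapped
```

A bot that hits the real form first and the decoy second gets one comment through before the ban applies, so a real setup would probably also want to remove comments already posted from a trapped address.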

Phillip Harrington
September 18, 2003 9:38 AM

I don't have comments anymore. Not due to this reason, but this is one benefit of not having a comments form. People who *really* have something to say email me.

Adam Kalsey
September 18, 2003 9:47 AM

Gina: Since the referrer is so easy to spoof, as soon as any significant number of people started checking for it, spam bots would start spoofing it. That's the problem with any of these systems. In order to be truly effective, they need to be limited in use. Just like email spam, once enough people started filtering on certain words and phrases, spammers changed their words and phrases. I think the only good anti-spam system will be one that turns the nature of spam against itself. In order to be effective, spam must be sent on a massive scale. There's no way around that. Detecting the fact that multiple similar messages are being sent to multiple places and then warning others should be effective. It's how Vipul's Razor and Cloudmark operate on email spam.
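To make the "same message in many places" idea concrete, here is a toy Python sketch. It is not how Vipul's Razor or Cloudmark actually work; it only catches comments that are identical after simple normalization, and the registry class, the site names, and the threshold of three sites are all assumptions.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash a comment after lowercasing and collapsing punctuation and spaces."""
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return hashlib.sha1(normalized.encode()).hexdigest()


class SharedSpamRegistry:
    """Count how many distinct sites have reported the same comment text."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.sites_by_print = defaultdict(set)

    def report(self, site, text):
        """Record that `site` received a comment with this text."""
        self.sites_by_print[fingerprint(text)].add(site)

    def looks_like_spam(self, text):
        """True once at least `threshold` different sites reported this text."""
        sites = self.sites_by_print.get(fingerprint(text), ())
        return len(sites) >= self.threshold
```

Repeated reports from one site count once, so a single blog cannot flag a message on its own. A spammer who varies the text slightly per site would slip past an exact match like this, which is why real systems use fuzzier signatures.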

September 18, 2003 9:49 AM

Last time I got comment spam, it was immediately preceded by a referral from a Google search for data that would yield a current comment form, i.e. "blog 2003 august Name: Email Address: URL: Comments:". The comment spam appeared to come from a real person, not a bot, based on the timings reported in the access log... YMMV.

September 18, 2003 2:18 PM

I wonder if you could place a random number in a cookie that is placed by the comment entry form. When the comment is submitted, it would only be accepted if the browser provides a valid number back. If the spammers have the smarts to capture and provide the cookie during comment submission, you could then measure the time between generating the random number, and when it comes back in a comment. Conceivably, the time delta would be unusually small (I know I can't create a comment in under 30 seconds) if done by a spammer, and could be filtered. Defeatable, sure, but will the spammers take the time to find out where it's failing, decode why it's failing, and put a stall in to handle it? Seems kinda unlikely to me if they lust for speed and coverage.
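One way the cookie-and-timing check might be sketched in Python. Note the substitution: instead of a stored random number, this uses a timestamp signed with a server-side key, so the script can verify the cookie without keeping a table of issued numbers. The key, the function names, and the 30-second minimum are assumptions.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me-and-keep-private"  # hypothetical secret
MIN_SECONDS = 30  # assumed minimum time a person needs to write a comment


def issue_token(now, key=SECRET_KEY):
    """Build the cookie value to set when the comment form is served."""
    issued = str(int(now))
    sig = hmac.new(key, issued.encode(), hashlib.sha256).hexdigest()
    return f"{issued}.{sig}"


def check_token(token, now, key=SECRET_KEY):
    """Accept only a genuine cookie that is at least MIN_SECONDS old."""
    issued, _, sig = token.partition(".")
    expected = hmac.new(key, issued.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False  # missing, forged, or tampered cookie
    return int(now) - int(issued) >= MIN_SECONDS
```

The form page would call `issue_token(time.time())` and the comment script would call `check_token(cookie_value, time.time())`. A bot that simply waits 30 seconds before posting still gets through.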

September 18, 2003 2:33 PM

"Back to the drawing board." Unfortunately, you are right. Yes, brute force would be a way around what I suggested, and random field names would kill autofill. Adding steps that can be automated to the comment-posting procedure will not stop spammers. I think that once you put up a system that is open to interaction, such as a comment system, an email address, or something else, you can only lose the game. Your line of defense is broken because, in order to work, the comment system has to accept comments, and if regular users can use it, so can spammers. If you want to make it 1 time harder for spammers, you will make it 2 times harder for regular users. I think the only thing we can do is either allow comments only from a closed group of people we trust, or leave the system open and clean up the spam after it appears. Now, if we can't avoid spam, we can make it easier to clean up. What about a link such as "Report comment spam" that would send you an email with the comment and a link to delete it? Remember ... we used to put an email address on our web pages, and when we received spam we switched to web-form-to-email systems, and now these systems are broken by spammers as well as comment forms. I think that all of these have the same weakness. Find a way to stop spam for email and you will find a solution that can be applied to the other problems.

September 18, 2003 2:44 PM

MojoMark, timing checks are easy to break. Forum scripts usually forbid users from posting more than X posts in X minutes. I guess that if I were a comment-script spammer, it's something I would try: "This comment script is made for real users who take time to type a comment; emulate user interaction and put a delay where needed." You know, with multi-threading, while I wait 30 seconds on one website before submitting my comment spam, I can move on to the next blog and keep going ... You know, it's easy to reverse-engineer systems that are open. Look at the Google toolbar: it uses a checksum algorithm that runs against the URL you are viewing, so that Google's backend knows whether the request comes from their toolbar or from software built to run hundreds of queries to get the PageRank of your competitors. Even this kind of thing is easy, so believe me, professional spammers will think about that delay thing and they will take the time to find a workaround. When you lose time keeping your comment system free of spam, you lose money; when spammers take the time to analyse your line of defense, it is an investment. They will make more money later.

September 18, 2003 2:55 PM

Here is a thread on this subject in Movable Type support forum:

Trackback from Noch'n Blogg.
September 19, 2003 1:41 AM

Effective Measures Against Comment Spam

Excerpt: More and more people are complaining about comment spam, and some measures have been taken to counter this method. Until now I had still...

September 29, 2003 6:42 PM

Not that I know anything of course... Perhaps you could try the system that Yahoo uses to prevent automated registrations. Namely that the poster has to enter a codeword that is presented as a distressed image on the page. I'm sure that anyone who wants to post won't mind a few extra characters - and it can be fun. My most recent word was "death".

Adam Kalsey
September 29, 2003 7:07 PM

The problem with those is that blind users won't be able to comment.

September 30, 2003 11:22 AM

For your "honeypot" form idea: Use standard HTML comments to block it off.

Trackback from cce blog
October 6, 2003 2:54 PM

quick-n-dirty comment spam fix

Excerpt: i started getting a LOT of comment spam ... so i just renamed mt-comments.cgi to mt-c0mments.cgi to keep the robots away. i haven't received any comment spam since then, and i used to get several every day, so i suppose it must be working. publicizin...

October 26, 2003 4:25 AM

The best response I have seen addresses not how the spammers do this, but rather denying them the payoff they seek. If you make all of your comments' URL links go to an intermediate page which has no inbound links (and hence no pagerank value) then that page can give the user's own URL, which is clickable, but the spammer's purpose will have been defeated. Some sort of blog software upgrade broadly implementing this type of fix appears to be the best medium term way out of this mess.
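A small Python sketch of the link-rewriting half of this idea: instead of printing the commenter's URL directly, the blog prints a link to an intermediate page and passes the real address as a query parameter. The `/go` path, the `u` parameter, and the function name are all hypothetical.

```python
from urllib.parse import quote, urlsplit


def intermediate_link(commenter_url, go_path="/go"):
    """Return a local link that carries the commenter's URL as a parameter.

    Only http and https addresses are rewritten; anything else (for example
    a javascript: URL) yields None so the template can drop the link.
    """
    scheme = urlsplit(commenter_url).scheme.lower()
    if scheme not in ("http", "https"):
        return None
    return f"{go_path}?u={quote(commenter_url, safe='')}"
```

The comment template would then link to `intermediate_link(url)` rather than to `url` itself, and the page at `/go` would display the decoded address as a clickable link.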

October 26, 2003 4:27 AM

Maybe it would be enough if the intermediate page had a robots meta tag (or were excluded in robots.txt). Google wouldn't index the link.
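One reading of this suggestion is a robots meta tag on the intermediate page, telling crawlers neither to index the page nor to follow its links. A minimal Python sketch of rendering such a page follows; the function name and page wording are invented for the example.

```python
from html import escape


def render_intermediate_page(target_url):
    """Render a bare page that shows the commenter's link to humans only."""
    safe = escape(target_url, quote=True)  # avoid breaking out of the attribute
    return (
        "<html><head>"
        '<meta name="robots" content="noindex, nofollow">'
        "<title>Leaving this site</title></head>"
        f'<body><p>Commenter link: <a href="{safe}">{safe}</a></p></body></html>'
    )
```

Whether this removes the spammer's payoff depends on crawlers honoring the tag, which is up to each search engine.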

Trackback from Spam-Block Specialists
November 10, 2003 10:24 AM

SPEWS works for --YOU-- to eradicate SPAM

Excerpt: SPEWS-- the spam reduction specialists!

Paul Makepeace
September 30, 2004 4:56 PM

I fully agree, and really despise this solution. Especially with MT Blacklist it is essentially redundant anyway. Are you aware of any patches or ways of turning it off?

October 29, 2004 7:35 PM

Ok, so I have a question: Did this end up working sufficiently for you?

February 19, 2006 8:41 AM

I've been purging our forum membership page of spurious spambot-placed addies, but many of them have some sort of cloaking device that prevents me from identifying, and hence deleting, them. Short of turning our forum into a closed, enter-by-invitation-only site, is there a simple way to attack these listings? I am a simple poet and not very conversant with techno skills.

This discussion has been closed.


© 1999-2022 Adam Kalsey.