The other day, I had a very interesting conversation with a friend about what comment spam is and how it works. I thought it would make a good blog topic as well.
What is comment spam?
Comment spam is the weird comments full of links and garbage that you might have seen on some blogs. Unlike junk mail, which fills up your mailbox, comment spam targets search engine robots, not humans. The technique is called spamdexing.
It’s a form of SEO to help spammers drive traffic to commercial websites. Spammers use the huge web of blogs as a diffusion platform to cheat search engines. Driving and controlling traffic is what brings value to spammers.
How do they find my blog?
However, if you post a comment on a popular blog, you can be sure your own blog will be next.
How do they manage all these blogs?
Spammers run a test campaign on the blogs they have collected before they actually start spamming. You know you are a potential target when you start getting fake comments such as “nice blog” or “keep up the good work”.
Spammers use this technique to gather information about your blogging platform and check whether it’s open, moderated or filtered. If you leave these comments up long enough, your blog will be marked as eligible for spamming.
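To make the idea of spotting these probe comments concrete, here is a minimal sketch in Python. The function name, the word limit and the phrase list are all assumptions made for illustration (the list is seeded with the two examples above); it is not how any particular blogging platform or plugin does it.

```python
import re

# Hypothetical phrase list, seeded with the two examples from this post.
GENERIC_PHRASES = ("nice blog", "keep up the good work")


def looks_like_probe(text: str, max_words: int = 8) -> bool:
    """Return True for a short comment built around a generic compliment.

    max_words is an arbitrary cut-off chosen for this sketch.
    """
    # Lowercase and drop everything except letters and whitespace.
    normalized = re.sub(r"[^a-z\s]", "", text.lower())
    words = normalized.split()
    if len(words) > max_words:
        # Long comments are left alone, even if they contain a compliment.
        return False
    joined = " ".join(words)
    return any(phrase in joined for phrase in GENERIC_PHRASES)
```

A comment flagged this way is only a candidate: a real reader can write “nice blog” too, so it is probably safer to hold such comments for review than to delete them outright.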
How do they actually spam?
Spammers are smart programmers, and everything is automated.
As soon as your blog is marked as open for spamming, it will be flooded with link-filled comments. Spammers relay these comments through open proxies (proxies left misconfigured through the negligence of system administrators) available across the Internet, which helps them hide where the comments really come from.
Spammers also tend to comment on older posts, so that the spam doesn’t show up on the main page of your blog but is still reachable by search engines.
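Since older posts are a favourite target, one simple counter-measure is to hold comments on old posts for moderation. The snippet below is only a sketch of that rule; the function name and the 30-day default are assumptions chosen for illustration.

```python
from datetime import date, timedelta


def needs_moderation(post_published: date, comment_posted: date,
                     max_age_days: int = 30) -> bool:
    """Return True when a comment arrives on a post older than max_age_days.

    The 30-day default is an arbitrary value picked for this sketch.
    """
    return comment_posted - post_published > timedelta(days=max_age_days)
```

For example, a comment posted in June on a post from January would be held, while a comment on last week’s post would go straight through.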
What are the solutions?
There is no perfect solution to this problem. In practice, two approaches are effective: automatic filters and captchas.
Automatic filters such as Akismet or Defensio, available as plugins for most blogging platforms, check all your comment traffic and use complex algorithms to score each comment. This generally works quite well, but there is a small chance that good comments will be marked as spam (false positives) and removed.
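To give a feel for what “scoring a comment” means, here is a toy scorer in Python. It is emphatically not how Akismet or Defensio work (their algorithms are far more complex and are not described here); the keyword list, weights and thresholds are all made up for this sketch.

```python
import re

LINK_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)
# Hypothetical keyword list, for illustration only.
SPAM_KEYWORDS = ("casino", "cheap pills", "payday loan")


def spam_score(text: str) -> int:
    """Add up a few crude signals; a higher score means more spam-like."""
    links = LINK_PATTERN.findall(text)
    score = 2 * len(links)  # every link counts a little
    lowered = text.lower()
    score += sum(3 for keyword in SPAM_KEYWORDS if keyword in lowered)
    words = text.split()
    if words and len(links) / len(words) > 0.3:
        score += 5  # the comment is mostly links
    return score


def classify(text: str, hold_at: int = 4, spam_at: int = 8) -> str:
    """Map a score to 'ok', 'hold' (for moderation) or 'spam'."""
    score = spam_score(text)
    if score >= spam_at:
        return "spam"
    if score >= hold_at:
        return "hold"
    return "ok"
```

The middle “hold” band is a design choice aimed at false positives: a borderline comment waits for a human to look at it instead of being removed automatically.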
A captcha is a scrambled picture that is hard for programs to read but still readable by humans. Users are asked to type the word shown in the picture before they are allowed to post their comment. That’s generally very effective, but it’s quite bad from the user’s perspective and might discourage people from posting comments.
Spamming is a plague
80% of the email traffic on the Internet is actually junk, and I guess the volume of comment spam is also huge. Sadly, even if we manage to filter spam out of our blogs and mailboxes, that traffic will still drain bandwidth.