Since my last post on the subject, I have managed to reduce comment spam on this blog by a significant margin. I applied a number of methods and will explain them here, though not in so much detail that the explanation could be used to circumvent these measures.
First of all, I disabled links back to the commenter’s website, unless it is a registered site that I have already approved. A number of spammers and abusers were using this feature to point back to their own sites, which were spam sites, phishing sites or pages containing malicious code.
Secondly, I removed the motivation for posting comment spam by stripping the link code from all comments (except those by registered users). This can easily be done in Python through either the sgmllib module or the re module’s “sub” method. Once the link code is removed, all that gets posted is the URL itself, without being an active link. Since an inactive URL is not what comment spammers are after, this can act as an effective deterrent.
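A minimal sketch of the re-based approach, assuming Python 3 and comments that arrive as HTML strings. The function name `strip_link_code` and both patterns are made up for illustration, and regular expressions will not cope with every form of malformed markup.

```python
import re

# Hypothetical pattern: a whole <a href="URL">text</a> element, capturing the URL.
ANCHOR = re.compile(
    r"""<a\b[^>]*?\bhref\s*=\s*["']?([^"'\s>]+)["']?[^>]*>.*?</a\s*>""",
    re.IGNORECASE | re.DOTALL,
)
# Hypothetical pattern: any leftover opening or closing anchor tag.
STRAY_TAG = re.compile(r"</?a\b[^>]*>", re.IGNORECASE)


def strip_link_code(comment):
    """Replace each anchor element with its bare URL, then drop stray anchor tags."""
    text = ANCHOR.sub(r"\1", comment)
    return STRAY_TAG.sub("", text)


print(strip_link_code('Buy <a href="http://example.com/x">cheap stuff</a> now'))
```

The URL survives as plain text, but it is no longer clickable.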
Another method that I implemented was to count the number of links in a comment and discard the whole comment if it contains a high proportion of links. This check runs before the link code is removed, as described above. I still get a notification of these comments, but so far there has not been a single false positive. This has made the spam much simpler to handle: I can just discard all comments marked as comment spam.
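The link-proportion check could look something like the sketch below. It assumes Python 3, counts `http://` and `https://` occurrences against the number of whitespace-separated words, and uses an arbitrary threshold of 0.25. The names `link_ratio` and `looks_like_spam` and the threshold are illustrative assumptions, not the values used on this blog.

```python
import re

# Counts the start of every http:// or https:// URL in the comment.
URL = re.compile(r"https?://", re.IGNORECASE)


def link_ratio(comment):
    """Return the number of URLs divided by the number of words (0.0 if empty)."""
    words = comment.split()
    if not words:
        return 0.0
    return len(URL.findall(comment)) / len(words)


def looks_like_spam(comment, max_ratio=0.25):
    """Flag a comment whose link proportion exceeds max_ratio (assumed threshold)."""
    return link_ratio(comment) > max_ratio


print(looks_like_spam("Great post, thanks for sharing this!"))
print(looks_like_spam("http://a.test http://b.test http://c.test buy now"))
```

Flagged comments can then be held for notification rather than silently dropped, which keeps false positives recoverable.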
Something that would be relatively difficult to implement with most other blogging software is changing the form input “names” on the comment form on a periodic basis. This stops spammers from “learning” how your comment system works and using automated tools to directly POST comments. This method helped me cut down on a huge amount of comment spam the last time I changed the values.
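One possible way to rotate the field names automatically is to derive them from a server-side secret and the current time period, as in the sketch below. This is an assumption about how it could be done, not a description of this blog's code: `SECRET_KEY`, `ROTATION_SECONDS`, `field_name` and `read_field` are all hypothetical.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"          # hypothetical server-side secret
ROTATION_SECONDS = 7 * 24 * 3600   # hypothetical choice: rotate weekly


def field_name(logical_name, now=None, period_offset=0):
    """Derive the obscured form field name for a logical name and time period."""
    if now is None:
        now = time.time()
    period = int(now // ROTATION_SECONDS) + period_offset
    message = f"{logical_name}:{period}".encode("utf-8")
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return "f_" + digest[:12]


def read_field(form, logical_name, now=None):
    """Fetch a submitted value, accepting the current or previous period's name."""
    for offset in (0, -1):
        name = field_name(logical_name, now, offset)
        if name in form:
            return form[name]
    return None


# Render the form with field_name("comment"); read it back with read_field.
submitted = {field_name("comment"): "Nice article!"}
print(read_field(submitted, "comment"))
```

Accepting the previous period's names as well means a visitor who loaded the form just before a rotation can still submit it.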
I must admit that all this hasn’t stopped spam completely, but it has helped a lot. Other methods that could be used include picture verification codes, Bayesian algorithms to identify spam, and an approval system (using Mailman maybe?). Anyway, I’ll leave those for another day, when I have enough time to work on them.