Spam and antispam

Analysis of spam, both e-mail and web-based, and of technology that attempts to defeat it.

January 8, 2008

A very fast splogger

The first post ever on Strategic Messaging went up at 2:49 am. Within four hours, I had my first splog trackbacks, all from the same site. The domain itself had just repropagated through DNS hours earlier, and had no incoming links other than Whois and the like.

Pretty impressive spamming. Not that it did him any good, of course, except insofar as he was stealing a bit of my content …

January 2, 2008

Restoring security and function to my mail and websites

OK. Here’s the story as I now know it.

  1. I was hit by a massive mail-bomb on Christmas Eve. My email and websites went down for a while as a consequence. What’s more, with a flooded mail queue, there were further mail problems through at least 12/28. Some mail bounced, and other mail that appeared to go through was lost forever. If you’ve mailed me since 12/24 and I haven’t answered, please send again.
  2. The mail-bomb paved the way for an injection of some malware. I started noticing possible trojans on 12/31. Melissa Bradshaw, my stellar web designer, noticed Javascript that she hadn’t written, both on and So far as we could tell, standard anti-malware client protections were sufficient to keep any trojans from being successfully downloaded to clients.
  3. My very attentive web hosting company, Dimension Servers, is rebuilding its Linux kernel accordingly. Scheduled downtime for all my sites and mail is midnight to 3:00 am Eastern tonight, but that’s obviously just a rough estimate. Company president Jonathan MacAllister telephoned me to tell me this personally, notwithstanding that his wife delivered a baby by emergency C-section today. (Wife and baby are OK!)
  4. Jonathan also told me that after the attack, he bought a Cisco appliance. Every web hosting company needs to do that, as appliances are much more efficient at dealing with overloading attacks than the servers themselves. Cisco was a brand choice pretty much dictated by his remote data center.
  5. David Ferris and Richi Jennings have convinced me to move email to Google’s free mail hosting service. This is what they’re doing with mail and all of Richi’s domains as well. NO analysts are more reliable on email than David and Richi. And hosting is surely no exception, as David and I did a research project together some years ago uncovering the Critical Path sham.
  6. The net effect of that move will be that and have their email managed quite separately, so if you can’t get me at one, please try the other. Generally, if you don’t know me you should write to, and I’ll probably write back from
  7. I’ll post about all this again after things seem to have worked out, possibly over on the Monash Report.

Happy New Year,


December 31, 2007

I’m getting mailbombed again

Shortly after my first reference to Shoemoney’s DMOZ issues (who did you think I meant with “shoe in his mouth”?), I got mailbombed big time. Things calmed down after a month or so, although I did change web hosting companies in the fallout.

Starting Christmas Eve — which coincidentally was shortly after a forum mention of various Shoemoney flaps, and of the first attack — I got hit again. And there was another wave right after Christmas. A fair amount of email was lost forever, possibly both professional and personal. My blogs also were down for a while, as were other sites on the same server. (And if you sent me any email over that time period, please resend it.)

It seems that I should move my email/MX record to a different service than the one that hosts my websites, perhaps one that has invested in technology to efficiently deflect DDoS attacks. (Or perhaps I should move one domain with it, if a traditional hosting deal seems best.) Does anybody have any recommendations of such services?

November 29, 2007

An Occurrence at Owl Creek Bridge and other SEO spam explained

I average upwards of 100 spam comments per day per blog, very little of which actually gets through (although that very little is obviously enough to be quite annoying!). Recent research from Sunbelt explains part of what’s going on. (More here in Computerworld.) What’s going on is this:

1. Aggressive black-hat SEO is being done for all kinds of long-tail terms and phrases, by posting comment spam filled with little except links on those phrases. For example, one of the first spams I checked for this post consists simply of 10 links to the same .cn, with anchor text and subdomain name being the same keyphrase. Keyphrases included “an occurrence at owl creek bridge”, “allegheny assessment county tax”, and “am been hate i ive who who.” As this kind of spam came by, I’d been wondering why people bothered, since it didn’t seem terribly easy to monetize.
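The spam pattern described above (a comment that is almost nothing but links, all to subdomains of one parent domain, each with anchor text equal to the subdomain keyphrase) lends itself to a simple heuristic check. What follows is only a minimal sketch under stated assumptions, not how Akismet or any other real filter works. The function name `looks_like_keyphrase_spam`, the thresholds, and the `example.cn` domain used in the usage note are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs and count characters outside links."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.other_text_chars = 0
        self._href = None      # href of the <a> currently open, if any
        self._anchor = []      # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._anchor = []

    def handle_data(self, data):
        if self._href is not None:
            self._anchor.append(data)
        else:
            self.other_text_chars += len(data.strip())

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._anchor).strip()))
            self._href = None


def normalize(phrase):
    """Lowercase and drop everything except letters and digits."""
    return "".join(ch for ch in phrase.lower() if ch.isalnum())


def looks_like_keyphrase_spam(comment_html, min_links=5, max_other_chars=40):
    """Hypothetical heuristic: many links, little other text, one parent
    domain, and anchor text that matches each link's subdomain."""
    parser = LinkCollector()
    parser.feed(comment_html)
    parser.close()
    if len(parser.links) < min_links:
        return False
    if parser.other_text_chars > max_other_chars:
        return False

    parents = Counter()
    matches = 0
    for href, anchor in parser.links:
        host = urlparse(href).hostname or ""
        # Treat the first label as the subdomain and the rest as the parent.
        subdomain, _, parent = host.partition(".")
        parents[parent] += 1
        if normalize(anchor) and normalize(anchor) == normalize(subdomain):
            matches += 1

    same_parent = parents.most_common(1)[0][1] == len(parser.links)
    return same_parent and matches >= 0.8 * len(parser.links)
```

Fed a comment made of ten links such as `<a href="http://an-occurrence-at-owl-creek-bridge.example.cn/">an occurrence at owl creek bridge</a>`, the check returns `True`; an ordinary comment with one link and a sentence of text returns `False`. Real spam varies far more than this, so a rule like this would at best be one signal among many.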

July 22, 2007

Text analytics marketplace trends

It was tough to judge user demand at the recent Text Analytics Summit because, well, very few users showed up. And frankly, I wasn’t as aggressive at pumping vendors for trends as I am some other times. That said, I have talked with most text analytics vendors recently,* and here are my impressions of what’s going on. Any contrary – or confirming! — opinions would be most welcome.

*Factiva is the most significant exception. Hint, hint.

If you think about it, text analytics is a “secret ingredient” in search, antispam, and data cleaning,* and this dominates all other uses of the technology. A significant minority of the research effort at companies that do any kind of text filtering is – duh — text analytics. Cold comfort for specialist text analytics vendors, to be sure, but that’s the way it is.

*I.e., part of the “T” in “ETL” (Extract/Transform/Load).

Text-analytics-enhanced custom publishing will surely at some point become a must-have for business and technical publishers. However, it appears that we’re not quite there yet, as large publishers make do with simple-minded search and the like. In what I suspect is a telling market commentary, there’s no headlong rush among vendors to dump text mining for custom publishing, notwithstanding the examples of nStein and (sort of) ClearForest. I don’t want to be overly negative – either my friends at Mark Logic are doing just fine or else they’re putting up a mighty brave front – but I don’t think the nonspecialist publishing market is there yet.

June 6, 2007

I’ve decided to trust Akismet/Bad Behavior

Akismet recently upgraded so that you can see all the spam it’s holding, not just the last 150 messages. This made me a lot happier — but ironically I quickly gave up, and decided to trust Akismet without checking. Why? Well, Akismet sequesters 15 days of spam, and I currently have the following numbers of messages stashed away in it:

That’s over 800 spam per day across four blogs. And when I did check, I almost never found a false positive, except occasionally a trackback of my own.

More problematic is my e-mail. Eudora flags pretty much everything that isn’t from an established sender as spam. So along with my 300+ true spam, I get a number of false positives per day, some of which have turned into paying customer relationships. So THAT spam directory I do check carefully …
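The rule described above, where anything not from an established sender is treated as spam, is easy to sketch, and the sketch shows why false positives are unavoidable. This is only an illustration of the rule as described, not Eudora's actual implementation; the function name `classify` and the addresses in the usage note are hypothetical.

```python
from email.utils import parseaddr


def classify(from_header, established_senders):
    """Return "inbox" for known senders and "junk" for everyone else.

    `established_senders` is assumed to be a set of lowercase addresses.
    """
    _display_name, address = parseaddr(from_header)
    return "inbox" if address.lower() in established_senders else "junk"
```

With `known = {"pat@example.com"}`, a message from `Pat Example <pat@example.com>` lands in the inbox, while a first message from `New Prospect <prospect@example.org>` lands in junk no matter what it says. That second case is exactly the prospective-customer false positive, which is why the junk folder still needs checking by hand.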

March 26, 2007

So THAT’S why Andrew Orlowski still has a job (Part 2)

Andrew Orlowski is an over-the-top jerk, and a pretty sloppy reporter and analyst to boot. But he occasionally makes a good point even so. In the most recent instance, he confronted Tim Berners-Lee. As the article makes clear, Berners-Lee reacted badly to Orlowski, reflecting an attitude that is probably shared by 99% of the people who encounter the guy, and in the future will probably be adopted by sentient computers as well. Even so, Orlowski’s underlying point is valid: If the Semantic Web is going to be any more spam-free than the current Web, nobody has adequately explained why.

February 7, 2007

Is DMOZ the cure to Wikipedia’s spam problem?

Joost de Valk makes an interesting suggestion, namely that Wikipedia should drop all external links other than to DMOZ, and rely on DMOZ as the outside link directory. As division of labor, it makes perfect sense. However, it’s a total non-starter until at least two problems are solved.

February 3, 2007

Please switch to my back-up e-mail address

At least for the moment. e-mail has been turned off by my hosting company, due to what they claim is a still on-going attack. My backup address, however —, where domain = dbms2 — is working fine. And my e-mail client traditionally checks them at the same time. So I suggest switching, at least for the moment.

Both are through the same hosting company (Hostgator, which I aspire to replace in the immediate future, given that I also lost admin access to the blogs on two separate occasions this week, and given that support claims over half my e-mails are unreadably empty and hence suitable for being ignored, despite me never having that problem elsewhere). Thus, for other kinds of problems there might be a single point of failure. But in this case, the dbms2 address is a working alternative to the standard one.

January 30, 2007

A great new (to me) phrase – “Adversarial Information Retrieval”

I’ve just discovered a great new phrase – adversarial information retrieval. It’s not really new, since papers are now being accepted for what will be the third annual conference on the subject. But it seems to have gained currency over the past few months.

Edit: The term seems to have been coined in 2000.

I think this area is really where the bulk of the research into public search engine algorithms goes. And that’s another way of saying that web and enterprise search are very different things.
