Spam and antispam

Analysis of spam, both e-mail and web-based, and of technology that attempts to defeat it.

June 19, 2008

3 specialized markets for text analytics

In the previous post, I offered a list of eight linguistics-based market segments, and a slide deck surveying them. And I promised a series of follow-up posts based on the slides.

June 19, 2008

The Text Analytics Marketplace: Competitive landscape and trends

As I see it, there are eight distinct market areas that each depend heavily on linguistic technology. Five are offshoots of what used to be called “information retrieval”:

1. Web search

2. Public-facing site search

3. Enterprise search and knowledge management

4. Custom publishing

5. Text mining and extraction

Three are more standalone:

6. Spam filtering

7. Voice recognition

8. Machine translation

May 8, 2008

Google seems to have rehabilitated us

As previously noted, we were de-indexed by Google after someone injected a whole lot of spammy hidden links into our blogs. We’re back now, after about two weeks, even on the blog (this one) where there was no official de-indexing notice and hence no way to apply for reconsideration. And thus we once again have high rankings for search terms such as Netezza, DATAllegro, Clarabridge, and Attivio.

We’re designing a new blog theme — the current one is just an emergency stopgap — that will (among myriad more important virtues) be more SEO-friendly. I’ll be curious to see whether that makes much actual difference from a search ranking standpoint.

April 25, 2008

Drive-by Google de-listing

As previously noted, we got hit with some hidden text, probably via SQL injection, and that led to a Google de-listing. Of the three blogs affected by the attack, I got a de-indexing notice for one (DBMS2); another was de-listed without a notice (Text Technologies); and a third seems to have waltzed through still indexed (Software Memories). I also received a de-indexing notice for another site I have nothing to do with and indeed had never heard of before. Go figure …

We’ve now upgraded to WordPress 2.5, which should close the vulnerability. (Thank you, Melissa Bradshaw!) Fearing our old, buggy theme would degrade further, we also switched to a new one, Biru, designed by Bob. There are some teething-pain stability issues, but if they don’t force a reversion in the next day, I’ll apply to Google for re-inclusion. (Uh, does anybody have a rough sense of how long that’s likely to take?)

All these hours of aggravation because some criminal wanted a bit of SEO advantage …

March 4, 2008

Over 80 percent of blog posts are probably spam

Doug Caverly highlights a Matt Mullenweg quote indicating that about 1/4 of all the blogs ever created on WordPress.com were spam (aka splogs). Now, that’s probably a higher fraction than for the blogoverse overall, because:

But there’s one more factor. Splogs post much more frequently than real blogs do. 10-20+ posts per day is not uncommon, and 50-100+ is not unheard of. That’s 5-10X the posting frequency of even the more active human-written blogs. So let’s assume:

In that case, over 80% (and indeed probably over 90%) of all blog posts are made by machines rather than by human beings.
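To make the arithmetic behind that kind of estimate concrete, here’s a minimal Python sketch. It rests on a simplifying assumption of my own: every splog posts at the same multiple of a typical human blog’s rate. The function name and the sample inputs are illustrative only, not the specific assumptions from the list above.

```python
"""Back-of-the-envelope estimate of what share of blog posts come from splogs.

Simplifying assumption (for illustration only): every splog posts at the same
multiple of a typical human blog's posting rate.
"""


def machine_post_share(splog_fraction: float, rate_multiplier: float) -> float:
    """Return the fraction of all blog posts written by splogs.

    splog_fraction:  share of blogs that are splogs, between 0 and 1.
    rate_multiplier: how many times more often a splog posts than a human blog.
    """
    if not 0.0 <= splog_fraction <= 1.0:
        raise ValueError("splog_fraction must be between 0 and 1")
    if rate_multiplier <= 0:
        raise ValueError("rate_multiplier must be positive")
    # Posting volume measured in "typical human blog" units.
    splog_posts = splog_fraction * rate_multiplier
    human_posts = 1.0 - splog_fraction
    return splog_posts / (splog_posts + human_posts)


if __name__ == "__main__":
    # Hypothetical inputs: splog shares of 1/4, 1/3 and 1/2, at 5X and 10X rates.
    for fraction in (0.25, 1 / 3, 0.5):
        for multiplier in (5, 10):
            share = machine_post_share(fraction, multiplier)
            print(f"splogs={fraction:.0%} of blogs, {multiplier}X rate -> {share:.0%} of posts")
```

Plugging in numbers: with 1/4 of blogs being splogs and a 10X posting rate, splogs would write about 77% of all posts; at 1/3 and 10X it’s about 83%, and at 1/2 and 10X about 91%. So the conclusion is quite sensitive to the overall splog fraction, not just to posting frequency.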

February 3, 2008

19 Microsoft/Yahoo synergies that could revolutionize the Internet

Many (perhaps most) commentators on Microsoft’s bid for Yahoo are thoroughly missing the point. The most interesting part of Microsoft’s bid for Yahoo isn’t the horse-race retrospective “How did they screw up so much as to need each other?” It’s not the incipient bidding war for Yahoo. And it’s certainly not the antitrust implications.

The Microsoft/Yahoo combination could revolutionize the Internet. I’m serious. The opportunities for huge synergies might just be enough to blast the merged companies out of their current uncreative, Innovator’s Dilemma funks. Search is open for radical transformation in user interface, universal search relevancy, Web/enterprise integration, and just about everything to do with advertising and monetization. Email stands to be utterly reinvented. Portals and business intelligence have only scratched the surface of their potential. And social networking is of course in its infancy.

Here’s an overview of where some synergies and opportunities for a combined Microsoft/Yahoo lie.

January 26, 2008

Anatomy of spam blogs

A post that gives you a clear sense of how gobbledygook is automatically generated (from another knowledgeable black-hat SEO who can’t be bothered to make his permalink structure sensible ;) )

January 16, 2008

Automation secrets of black hat SEO

XMCP writes one of the better black hat SEO blogs. In a post last November, he laid out a ton of advice about automating black hat SEO. Personally, I don’t approve of doing black hat SEO. Still, it’s an intellectually interesting subject. What’s more, black hat SEOs create a large fraction of all websites, and certainly of all blog comments, links, and so on. So it’s interesting to track them.

Most interesting to me, and probably to most readers here, is the part that shows where black hat SEOs get their content.

January 8, 2008

A very fast splogger

The first post ever on Strategic Messaging went up at 2:49 am. Within four hours, I had my first splog trackbacks, all from the same site. The strategicmessaging.com domain itself had just repropagated through DNS hours earlier, and had no incoming links other than from Whois sites and the like.

Pretty impressive spamming. Not that it did him any good, of course, except insofar as he was stealing a bit of my content …

January 2, 2008

Restoring security and function to my mail and websites

OK. Here’s the story as I now know it.

  1. monash.com was hit by a massive mail-bomb on Christmas Eve. My email and websites went down for a while as a consequence. What’s more, with a flooded mail queue, there were further mail problems through at least 12/28. Some mail bounced, and other mail that appeared to go through was lost forever. If you’ve mailed me since 12/24 and I haven’t answered, please send again.
  2. The mail-bomb paved the way for an injection of some malware. I started noticing possible trojans on monash.com on 12/31. Melissa Bradshaw, my stellar web designer, noticed JavaScript that she hadn’t written, both on monash.com and dbms2.com. So far as we could tell, standard client-side anti-malware protection was sufficient to keep any trojans from being successfully downloaded to visitors’ machines.
  3. My very attentive web hosting company, Dimension Servers, is rebuilding its Linux kernel accordingly. Scheduled downtime for all my sites and mail is midnight to 3:00 am Eastern tonight, but that’s obviously just a rough estimate. Company president Jonathan MacAllister telephoned me to tell me this personally, notwithstanding that his wife delivered a baby by emergency C-section today. (Wife and baby are OK!)
  4. Jonathan also told me that after the attack, he bought a Cisco appliance. Every web hosting company needs to do that, as appliances are much more efficient at dealing with overloading attacks than the servers themselves. Cisco was a brand choice pretty much dictated by his remote data center.
  5. David Ferris and Richi Jennings have convinced me to move monash.com email to Google’s free mail hosting service. This is what they’re doing with ferris.com mail and all of Richi’s domains as well. NO analysts are more reliable on email than David and Richi. That surely extends to email hosting: David and I did a research project together some years ago that uncovered the Critical Path sham.
  6. The net effect of that move will be that monash.com and dbms2.com have their email managed quite separately, so if you can’t get me at one, please try the other. Generally, if you don’t know me you should write to monash.com, and I’ll probably write back from dbms2.com.
  7. I’ll post about all this again after things seem to have worked out, possibly over on the Monash Report.

Happy New Year,

CAM
