March 15, 2008

MuseGlobal – ETL for text, sort of

Lynda Moulton introduced me to MuseGlobal, and specifically CEO Kate Noerr, last month. MuseGlobal sort of does ETL (Extract/Transform/Load) for text, although they prefer to call it Gather/Transform/Deliver. In any case, each of the three parts of the process is rather different for text than it is for traditional data warehousing. To wit: Read more

March 5, 2008

Google could dominate single-site search

Google has begun to introduce a feature whereby, if your search obviously leads you to a single site (e.g., you searched on a company name), you get a second search box to search only within that site. More details at Google and Search Engine Land. Basically, this is Google Site Search made a lot easier to use.

I think this could be a really big deal. Read more

March 4, 2008

Over 80 percent of blog posts are probably spam

Doug Caverly highlights a Matt Mullenweg quote indicating that about 1/4 of all the blogs ever on WordPress.com were spam (aka splogs). Now, that’s probably a higher fraction than for the blogoverse overall, because:

But there’s one more factor. Splogs post far more frequently than real blogs do. 10-20+ posts per day is not uncommon, and 50-100+ is not unheard of. That’s 5-10X the post frequency of even the more active human-written blogs. So let’s assume:

In that case, over 80% (and indeed probably over 90%) of all blog posts are made by machines rather than by human beings.
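The arithmetic behind that kind of estimate can be sketched in a few lines. This is only an illustration: the function name and the inputs below are hypothetical, not the figures assumed in the post. It takes the fraction of blogs that are splogs and the ratio of the average splog's posting rate to the average human blog's posting rate, and returns the share of all posts that come from splogs.

```python
def spam_post_share(spam_blog_fraction: float, posting_ratio: float) -> float:
    """Return the fraction of all blog posts that come from splogs.

    spam_blog_fraction: share of blogs that are splogs, between 0 and 1.
    posting_ratio: average splog posting rate divided by the average
        human-blog posting rate (a hypothetical input, not a measured one).
    """
    if not 0.0 <= spam_blog_fraction <= 1.0:
        raise ValueError("spam_blog_fraction must be between 0 and 1")
    if posting_ratio < 0:
        raise ValueError("posting_ratio must be non-negative")

    # Normalize the average human blog to 1 post per unit of time.
    spam_posts = spam_blog_fraction * posting_ratio
    human_posts = (1.0 - spam_blog_fraction) * 1.0
    total_posts = spam_posts + human_posts
    if total_posts == 0:
        raise ValueError("no posts at all under these inputs")
    return spam_posts / total_posts


if __name__ == "__main__":
    # Illustrative ratios only; 0.25 is the "about 1/4" figure quoted above.
    for ratio in (5, 10, 12, 27, 50):
        share = spam_post_share(0.25, ratio)
        print(f"posting ratio {ratio:>2}x -> spam share of posts {share:.0%}")
```

If a quarter of blogs are splogs, the average splog has to out-post the average human blog by about 12 to 1 for spam to reach 80% of posts, and by about 27 to 1 to reach 90%. The ratio that matters is against the average human blog, not the most active ones.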
