March 15th, 2008 Curt Monash
Lynda Moulton introduced me to MuseGlobal, and specifically CEO Kate Noerr, last month. MuseGlobal sort of does ETL (Extract/Transform/Load) for text, although they prefer to call it Gather/Transform/Deliver. In any case, each of the three parts of the process is rather different for text than it is for traditional data warehousing. To wit: Read the rest of this entry »
Posted in MuseGlobal | No Comments »
March 5th, 2008 Curt Monash
Google has begun to introduce a feature whereby, if your search obviously leads you to a single site (e.g., you searched on a company name), you get a second search box to search only within that site. More details at Google and Search Engine Land. Basically, this is Google Site Search made a lot easier to use.
I think this could be a really big deal. Read the rest of this entry »
Posted in Enterprise search, Google, Search and text storage, Specialized search engines | 4 Comments »
March 4th, 2008 Curt Monash
Doug Caverly highlights a Matt Mullenweg quote indicating that about 1/4 of all the blogs ever on WordPress.com were spam (aka splogs). Now, that’s probably a higher fraction than for the blogoverse overall, because:
- WordPress.com provides free hosting; using your own domain costs money.
- Besides being free, WordPress.com hosting may provide a little “Google juice”, which is the whole SEO point of spam blogging.
But there’s one more factor. Splogs post far more frequently than real blogs do. 10-20+ posts per day is not uncommon, and 50-100+ is not unheard of. That’s 5-10X the post frequency of even the more active human-written blogs. So let’s assume:
- 10% of all blogs are spam.
- 10% of all blogs are actively written by humans.
- 80% of all blogs belong to humans, but are updated very infrequently if at all.
In that case, since the rarely updated 80% contribute almost no posts, over 80% (and indeed probably over 90%) of all blog posts are made by machines rather than by human beings.
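For anybody who wants to check the arithmetic, here is a rough sketch in Python. It takes the three assumed blog shares above, treats the rarely updated blogs as contributing no posts at all (a simplifying assumption), and normalizes an active human blog to 1 post per unit of time, so that a splog posts 5 or 10. The function name and parameters are made up for illustration.

```python
def spam_post_share(spam_blogs, active_blogs, dormant_blogs,
                    spam_rate, active_rate, dormant_rate=0.0):
    """Fraction of all posts that come from spam blogs.

    The *_blogs arguments are shares of all blogs; the *_rate arguments
    are posts per blog per unit of time (only their ratios matter).
    """
    spam_posts = spam_blogs * spam_rate
    human_posts = active_blogs * active_rate + dormant_blogs * dormant_rate
    return spam_posts / (spam_posts + human_posts)


# Assumed mix from the post: 10% spam, 10% active human, 80% dormant.
# Splogs post 5-10X as often as active human blogs (normalized to 1.0).
for multiplier in (5, 10):
    share = spam_post_share(0.10, 0.10, 0.80,
                            spam_rate=multiplier, active_rate=1.0)
    print(f"{multiplier}X posting rate: {share:.1%} of posts are spam")
```

At 5X the spam share comes out to about 83%, and at 10X about 91%, which is where the "over 80%" and "over 90%" figures come from. Giving the dormant blogs a small nonzero `dormant_rate` pulls those numbers down somewhat.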
Technorati Tags: Splog, Matt Mullenweg, Wordpress
Posted in Blogosphere, Search engine optimization (SEO), Social software and media, Spam and antispam | No Comments »