Analysis of Google and its search offerings, both on the Web and for enterprises. Related subjects include:

January 30, 2007

The Chinese censorship threat continues to ratchet up

Ted Samsen of InfoWorld is worried that the Chinese are attempting to ratchet up internet censorship yet further. Welcome to the club, buddy. This problem is a big one, and I don’t think it’s going to be addressed without vigorous action. In particular, I suspect that what is needed may be some major efforts in white-hat spamming. Lance Cottrell of Anonymizer has clever ideas along those lines for fighting censorship in the short term, but I think a bigger effort is needed as well.

Google, by the way, is caught in a tough spot and knows it.

January 23, 2007

But Google trumps most site search

Popular on Digg, for obvious reasons, is a post showing that Google is better for searching Digg than Digg’s own search engine. No shock there. If I want to search Wikipedia for information on astrowidgets, I’ll just google on the phrase wikipedia astrowidgets. That works much better than Wikipedia’s own search.

Speaking of which — if you want to search for my writing, I’m using Google web search technology too. It works like a charm.

January 22, 2007

41 differences between web and enterprise search

Based on a patent application, SEOmoz has discerned 65 aspects of the Google ranking algorithm.* I counted only 24 that really had much at all to do with enterprise search. This leaves 41 or so focused on spam/SEO-fighting and/or on-page linking issues that have no enterprise parallel. And for more depth, here’s a long article from another SEO site, on a specific phrase-concurrence spam-fighting technique that has no apparent applicability to trusted corpuses.
*I highly recommend this link. It is by far the best single-page overview of web search algorithmic issues I’ve ever seen.

I’ve said it before, but it bears repeating — web search and enterprise search (or search of a constrained corpus) are very different technical problems.

November 11, 2006

Text mining and search, joined at the hip

Most people in the text analytics market realize that text mining and search are somewhat related. But I don’t think they often stop to contemplate just how close the relationship is, could be, or someday probably will become. Here’s part of what I mean:

  1. Text mining powers search. The biggest text mining outfits in the world, possibly excepting the US intelligence community, are surely Google, Yahoo, and perhaps Microsoft.
  2. Search powers text mining. Restricting the corpus of documents to mine, even via a keyword search, makes tons of sense. That’s one of the good ideas in Attensity 4.
  3. Text mining and search are powered by the same underlying technologies. For starters, there’s all the tokenization, extraction, etc. that vendors in both areas license from Inxight and its competitors. Beyond that, I think there’s a future play in integrated taxonomy management that will rearrange the text analytics market landscape.
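To make points 2 and 3 concrete, here is a minimal Python sketch. It is not any vendor's actual pipeline. One hypothetical tokenizer feeds both a toy inverted index and a toy mining step, and a keyword search restricts the corpus before mining runs on it. The function names and sample documents are illustrative assumptions.

```python
import re
from collections import Counter, defaultdict

# Hypothetical shared tokenizer: the same function feeds both search and mining.
def tokenize(text):
    """Lowercase, then split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_index(docs):
    """Toy inverted index: token -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def keyword_search(index, query):
    """Boolean AND search: ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

def mine_term_counts(docs, doc_ids, top_n=3):
    """Toy 'mining' step: most frequent terms across a subset of documents."""
    counts = Counter()
    for doc_id in sorted(doc_ids):  # sorted for a deterministic tie order
        counts.update(tokenize(docs[doc_id]))
    return counts.most_common(top_n)

# Illustrative documents, made up for this sketch.
docs = {
    1: "Customer complains the printer jams on every job.",
    2: "Printer jams again; customer wants a refund.",
    3: "Great service at the branch office.",
}
index = build_index(docs)
subset = keyword_search(index, "printer jams")  # search restricts the corpus...
top_terms = mine_term_counts(docs, subset)      # ...before mining runs on it
print(sorted(subset))  # [1, 2]
print(top_terms)
```

Real text mining products do far more (entity extraction, taxonomy mapping), but the sketch shows why sharing one tokenizer matters: a query term and a mined term are guaranteed to mean the same string.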


October 22, 2006

Enterprise-specific web search: High-end web search/mining appliances?

OK. I have a vision of one way search could evolve, which I think deserves consideration on at least a “concept-car” basis. This is all speculative; I haven’t discussed it at length with the vendors who’d need to make it happen, nor checked the technical assumptions carefully myself. So I could well be wrong. Indeed, I’ve at least half-changed my mind multiple times this weekend, just in the drafting of this post. Oh yeah, I’m also mixing several subjects together here. All in all, this is not my crispest post …

Anyhow, the core idea is that large enterprises spider and index a subset of the Web, and use that for most of their employees’ web search needs. Key benefits would include:


October 3, 2006

Two own-dogfood text-based bug-tracking applications

Last July I wrote about Google’s text-based project management system. Dave Kellogg of Mark Logic offers links to discussion of a related Google project, and adds news of his own — Mark Logic built a text-based bug tracking system in its own MarkLogic technology.

September 1, 2006

Why the BI vendors are integrating with Google OneBox

I’m hearing the same thing from multiple BI vendors, with SAS being the most recent and freshest in my mind — customers want them to “integrate” with Google OneBox. Why Google rather than a better enterprise search technology, such as FAST’s? So far as I’ve figured out, these are the reasons, in no particular order:

The last point, I think, is the most interesting. Lots of people think text search is and/or should be the dominant UI of the future. Now, I’ve been a big fan of natural language command line interfaces ever since the days of Intellect and Lotus HAL. But judging by the market success of those products — or for that matter of voice command/control — I was in a very small minority. Maybe the even simpler search interface — words jumbled together without grammatical structure — will win out instead.

Who knows? Progress is a funny thing. Maybe the ultimate UI will be one that responds well to grunts, hand gestures, and stick-figure drawings. We could call it NeanderHAL, but that would be wrong …

August 2, 2006

Introduction to FAST

FAST, aka Fast Search & Transfer, is a pretty interesting and important company. They have 3500 enterprise customers, a rapidly growing $100 million revenue run rate, and a quarter billion dollars in the bank. Their core business is of course enterprise search, where they boast great scalability based on a Google-like grid architecture, which they fondly think is actually more efficient than Google’s. Beyond that, they’ve verticalized search, exploiting the modularity of their product line to better serve a variety of niche markets. And they’re active in elementary fact/entity extraction as well. Oh yes – they also have forms of guided navigation, taxonomy-awareness, and probably everything else one might think of as a checkmark item for a search or search-like product.
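The post doesn’t describe how FAST’s grid actually works. As a generic illustration only, here is a Python sketch of document partitioning with scatter-gather, one common way a grid search engine spreads an index across machines. Each hypothetical `Shard` indexes its own slice of the documents, every shard answers the query locally, and a coordinator merges the partial top-k lists. The class, the term-frequency scoring, and the sample corpus are all assumptions.

```python
import heapq
from collections import defaultdict

class Shard:
    """Hypothetical grid node holding an inverted index over its own documents."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, terms, k):
        """Score local docs by summed term frequency; return this shard's top k."""
        scores = defaultdict(int)
        for term in terms:
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf
        return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

def scatter_gather(shards, query, k=3):
    """Send the query to every shard, then merge the partial top-k lists."""
    terms = query.lower().split()
    partial = []
    for shard in shards:  # a real grid would query the shards in parallel
        partial.extend(shard.search(terms, k))
    return heapq.nlargest(k, partial, key=lambda item: item[1])

# Illustrative corpus, partitioned across three shards by doc_id.
corpus = {
    0: "enterprise search scales out",
    1: "grid search grid search",
    2: "web search",
    3: "text mining",
    4: "search appliances",
}
shards = [Shard() for _ in range(3)]
for doc_id, text in corpus.items():
    shards[doc_id % len(shards)].add(doc_id, text)

results = scatter_gather(shards, "grid search", k=3)
print(results[0])  # (1, 4): doc 1 has "grid" twice and "search" twice
```

The merge is exact here because each document lives on exactly one shard and its score depends only on that document. Scoring that uses corpus-wide statistics such as IDF would need those statistics gathered from every shard first.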


July 11, 2006

Google’s internal text-based project/knowledge management

Slashdot turned up an amazing article in Baseline on Google’s infrastructure. There’s lots of gee-whiz stuff in there about server farms, petabytes of disk packed into a standard shipping container so as to allow the setup of more server farms around the globe, and so on. But even more interesting to me was another point, about Google’s internal use of its own technology. In at least one case – a hybrid of project and knowledge management – Google really seems to be doing what other firms only dream about as futures. Here’s the relevant excerpt:

