Analysis of enterprise search vendor FAST (Fast Search & Transfer) and its products.

January 26, 2007

FAST said to be pursuing BI

Dave Kellogg thinks FAST will be ineffective and defocused because of its efforts in business intelligence. I can’t comment on whether that analysis is brilliant, self-serving, or both, because anything I’ve been told on the subject is under embargo.

Embargos were a crucial PR tactic when Regis McKenna exploited them for the original rollout of the Macintosh in 1984. But I suspect that in many cases they’ve quite outlived their usefulness. If I wait between the time I’m briefed and the time the embargo is up to write something, my thoughts about it get fuzzy. If I write something at the time and put it on ice, it may be obsolete because of what other people write in the meantime.

More and more, if something is embargoed, I wind up not writing about it at all.

EDIT: Point #4 of my post on the mismatch between relational databases and text search is pretty relevant here.

November 11, 2006

Text mining and search, joined at the hip

Most people in the text analytics market realize that text mining and search are somewhat related. But I don’t think they often stop to contemplate just how close the relationship is, could be, or someday probably will become. Here’s part of what I mean:

  1. Text mining powers search. The biggest text mining outfits in the world, possibly excepting the US intelligence community, are surely Google, Yahoo, and perhaps Microsoft.
  2. Search powers text mining. Restricting the corpus of documents to mine, even via a keyword search, makes tons of sense. That’s one of the good ideas in Attensity 4.
  3. Text mining and search are powered by the same underlying technologies. For starters, there’s all the tokenization, extraction, etc. that vendors in both areas license from Inxight and its competitors. Beyond that, I think there’s a future play in integrated taxonomy management that will rearrange the text analytics market landscape.
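Point 2 can be illustrated with a toy sketch: run a keyword search first, then mine only the documents that matched. Everything here is hypothetical (the sample documents, the function names, and simple term counting standing in for real text mining); it is not a description of how Attensity or any other vendor actually does it.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def search(docs, keyword):
    """Keyword search: keep only documents containing the keyword as a token."""
    kw = keyword.lower()
    return [d for d in docs if kw in tokenize(d)]

def mine_terms(docs, top_n=3):
    """Toy stand-in for text mining: the most frequent terms in the given documents."""
    counts = Counter(tok for d in docs for tok in tokenize(d))
    return counts.most_common(top_n)

# Hypothetical corpus of customer comments
docs = [
    "Engine stalled twice; dealer replaced engine",
    "Great mileage, quiet engine",
    "Stereo is too loud",
]

subset = search(docs, "engine")   # restrict the corpus first
top_terms = mine_terms(subset)    # then mine only what matched
print(len(subset), top_terms[0])
```

The mining step never sees the stereo complaint, which is the whole point: the search narrows the corpus before the more expensive analysis runs.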

Read more

October 22, 2006

Enterprise-specific web search: High-end web search/mining appliances?

OK. I have a vision of one way search could evolve, which I think deserves consideration on at least a “concept-car” basis. This is all speculative; I haven’t discussed it at length with the vendors who’d need to make it happen, nor checked the technical assumptions carefully myself. So I could well be wrong. Indeed, I’ve at least half-changed my mind multiple times this weekend, just in the drafting of this post. Oh yeah, I’m also mixing several subjects together here. All in all, this is not my crispest post …

Anyhow, the core idea is that large enterprises spider and index a subset of the Web, and use that for most of their employees’ web search needs. Key benefits would include:

Read more

September 1, 2006

Why the BI vendors are integrating with Google OneBox

I’m hearing the same thing from multiple BI vendors, with SAS being the most recent and freshest in my mind — customers want them to “integrate” with Google OneBox. Why Google rather than a better enterprise search technology, such as FAST’s? So far as I’ve figured out, these are the reasons, in no particular order:

The last point, I think, is the most interesting. Lots of people think text search is and/or should be the dominant UI of the future. Now, I’ve been a big fan of natural language command line interfaces ever since the days of Intellect and Lotus HAL. But judging by the market success of those products — or for that matter of voice command/control — I was in a very small minority. Maybe the even simpler search interface — words jumbled together without grammatical structure — will win out instead.

Who knows? Progress is a funny thing. Maybe the ultimate UI will be one that responds well to grunts, hand gestures, and stick-figure drawings. We could call it NeanderHAL, but that would be wrong …

August 2, 2006

Introduction to FAST

FAST, aka Fast Search & Transfer, is a pretty interesting and important company. They have 3500 enterprise customers, a rapidly growing $100 million revenue run rate, and a quarter billion dollars in the bank. Their core business is of course enterprise search, where they boast great scalability, based on a Google-like grid architecture, which they fondly think is actually more efficient than Google’s. Beyond that, they’ve verticalized search, exploiting the modularity of their product line to better serve a variety of niche markets. And they’re active in elementary fact/entity extraction as well. Oh yes – they also have forms of guided navigation, taxonomy-awareness, and probably everything else one might think of as a checkmark item for a search or search-like product.

Read more

August 2, 2006

Petabyte-scale search scalability

I’ve had a couple of good talks with Andrew McKay of FAST recently. When discussing FAST’s scalability, he likes to use the word “petabytes.” I haven’t probed yet as to exactly which corpus(es) he’s referring to, but here’s a thought for comparison:

Google, if I recall correctly, caches a little over 100 KB/page (assuming, of course, that the page has at least that much text, which is not necessarily the case at all). And they were up into Carl Sagan range – i.e., “billions and billions” – before they stopped giving counts of how many pages they’d indexed.

10 billion times 100 KB is, indeed, a petabyte. So, in the roughest of approximations, the Web is a petabyte-range corpus.
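For anyone who wants to check the arithmetic, here is a quick sketch. Both inputs are the rough guesses from above, not measured figures, and it assumes decimal units (1 KB = 1,000 bytes, 1 PB = 10^15 bytes).

```python
# Back-of-envelope check of the estimate above.
# Both inputs are rough assumptions taken from the post, not measurements.
PAGES = 10 * 10**9             # "billions and billions": assume 10 billion pages
BYTES_PER_PAGE = 100 * 10**3   # roughly 100 KB cached per page (decimal kilobytes)
PETABYTE = 10**15              # decimal petabyte, in bytes

def corpus_size_pb(pages, bytes_per_page):
    """Total corpus size in petabytes for a given page count and per-page size."""
    return pages * bytes_per_page / PETABYTE

size_pb = corpus_size_pb(PAGES, BYTES_PER_PAGE)
print(size_pb)
```

Halving or doubling either guess still leaves the result within a small factor of one petabyte, which is all that “petabyte-range” claims.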

EDIT: Hah. I bet eBay and its 2-petabyte database is one of the examples Andrew is referring to …

July 29, 2006

Analyst reports about enterprise search

Gartner and Forrester have high opinions of FAST. Not coincidentally, you can download both those firms’ recent search industry survey reports from almost any page of FAST’s web site. Of the two, Forrester’s is both better and more recent.

Summarizing brutally, the big firms’ consensus seems to be:

Forrester is particularly harsh on Convera. Presumably this has much to do with the fact that Convera did not cooperate well with the survey process. I shall not speculate as to which way the causality runs there – but I should note that Convera was quite cooperative with my research last week.

July 29, 2006

Web search and enterprise search are coming together

Web search and enterprise search are in many ways fundamentally different problems. The biggest problem in web search is screening out pages that deliberately pretend to be relevant to a search. The second biggest problem is picking out the crème de la crème from a long list of essentially good hits. In enterprise search, on the other hand, the biggest problem is finding a single document, or single fact, that is lonely at best, and if you’re unlucky doesn’t exist in the corpus at all. Document structures are also completely different, as are linking structures and almost every other input to the ranking algorithms except the raw words themselves.

Even so, the businesses and technologies of web and enterprise search are beginning to combine. Read more
