Comprehensive or exhaustive extraction

Analysis of exhaustive or comprehensive extraction, an approach to text mining that entails extracting a broad range assertions from text and dumping them into a (usually relational) database for further analysis. Related subjects include:

October 24, 2008

Maybe text mining SHOULD be playing a bigger role in data warehousing

When I chatted last week with David Bean of Attensity, I commented to him on a paradox:

Many people think text information is important to analyze, but even so data warehouses don’t seem to wind up holding very much of it.

Read more

June 10, 2008

5 ideas for how to pick between Attensity and Clarabridge

Jim D. of UPS asked in the comment thread to the recent Attensity update post how one should decide between Attensity and Clarabridge. I wrote an answer, and then decided to just split it out in a separate post. Here are five ideas about how to pick between Attensity and Clarabridge for the kind of Voice of the Customer/Market application both companies are focusing on.

1. Attensity is the older company than Clarabridge, and is good at more things. Is Clarabridge really good at everything you want them to be?

2. In particular, Attensity has more overall sophistication at linguistic extraction. Do any of the differences matter to you?

3. Both companies are working hard on ease of use, for multiple kinds of user (business user tweaking linguistic rules, IT user, etc.). Whose approach and feature set do you like better?

4. Usually, buying one of these products involves some professional services. Whose organization do you like better?

5. Attensity’s default database schema for its exhaustive extraction is pretty flat and normalized, as befits a happy Teradata partner. Clarabridge’s is more of a star schema, as befits a bunch of ex-Microstrategy guys. Either can be straightforwardly translated into the other, so you may not care — but do you?

November 14, 2007

Clarabridge does SaaS, sees Inxight

I just had a quick chat with text mining vendor Clarabridge’s CEO Sid Banerjee. Naturally, I asked the standard “So who are you seeing in the marketplace the most?” question. Attensity is unsurprisingly #1. What’s new, however, is that Inxight – heretofore not a text mining presence vs. commercially-focused Clarabridge – has begun to show up a bit this quarter, via the Business Objects sales force. Sid was of course dismissive of their current level of technological readiness and integration – but at least BOBJ/Inxight is showing up now.

The most interesting point was text mining SaaS (Software as a Service). When Clarabridge first put out its “We offer SaaS now!” announcement, I yawned. But Sid tells me that about half of Clarabridge’s deals now are actually SaaS. The way the SaaS technology works is pretty simple. The customer gathers together text into a staging database – typically daily or weekly – and it gets sucked into a Clarabridge-managed Clarabridge installation in some high-end SaaS data center. If there’s a desire to join the results of the text analysis with some tabular data from the client’s data warehouse, the needed columns get sent over as well. And then Clarabridge does its thing. Read more

October 6, 2007

The Clarabridge approach to text mining

And for my sixth text mining post this weekend, here are some highlights of the Clarabridge technology story. (Sorry if it sounds clipped, but I’m a bit burned out …)

October 5, 2007

When to use exhaustive extraction

I’ve been emailing and/or talking with both Clarabridge and Attensity this week. Since they’re the two big proponents of exhaustive extraction, I naturally asked whether there are any cases exhaustive extraction should not be used. In Clarabridge’s case, it turns out exhaustive extraction is the default, and no customer has ever turned this default off. However, their current high end is several million documents* per year. They suspect that in some current projects with much higher volumes the default may finally be turned off. Read more

October 5, 2007

David Bean of Attensity explains sentiment and other qualifiers

David Bean of Attensity is rightly one of the most popular explainers of text mining, for his clarity and personality alike. I shot a question to him about how Attensity’s exhaustive extraction strategy handled sentiment and so on. He responded with an email that contains the best overall explanation of sentiment analysis in text mining I’ve seen anywhere. Naturally, this is rolled into an Attensity-specific worldview and sales pitch — but so what? Read more

March 26, 2007

Clarabridge takes on Attensity

Text mining newbie Clarabridge gave me the all-too-customary “Please let us brief you, but then don’t write about it for a while” routine. Now that it’s OK to post, what I’m up for offering is a few salient points in bullet form.

Frankly, if somebody wants an alternative to the Attensity/Teradata/Business Objects partnership they could do worse than talk with Clarabridge.

June 24, 2006

Attensity, extractive exhaustion, and the FRN

Two of the clearest and most charismatic speakers in the text mining business are Attensity cofounders Todd Wakefield and David Bean. Last year, Todd’s Text Mining Summit speech gave an excellent overview of the various application areas in which text mining was being adopted; vestiges of that material may be found in a blog post I made at the time, and on Attensity’s web site. This time, David’s Text Analytics Summit speech was basically a pitch for Attensity’s latest product release – and it was a pitch well worth hearing.
Read more

Feed including blog about text analytics, text mining, and text search Subscribe to the Monash Research feed via RSS or email:


Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.