Comprehensive or exhaustive extraction

Analysis of exhaustive or comprehensive extraction, an approach to text mining that entails extracting a broad range of assertions from text and dumping them into a (usually relational) database for further analysis. Related subjects include:

October 24, 2008

Maybe text mining SHOULD be playing a bigger role in data warehousing

When I chatted last week with David Bean of Attensity, I commented to him on a paradox:

Many people think text information is important to analyze, but even so data warehouses don’t seem to wind up holding very much of it.

Categories: Attensity, Comprehensive or exhaustive extraction, Sentiment analysis, Text mining

5 Comments

June 10, 2008

5 ideas for how to pick between Attensity and Clarabridge

Jim D. of UPS asked in the comment thread to the recent Attensity update post how one should decide between Attensity and Clarabridge. I wrote an answer, and then decided to just split it out in a separate post. Here are five ideas about how to pick between Attensity and Clarabridge for the kind of Voice of the Customer/Market application both companies are focusing on.

1. Attensity is an older company than Clarabridge, and is good at more things. Is Clarabridge really good at everything you want them to be?

2. In particular, Attensity has more overall sophistication at linguistic extraction. Do any of the differences matter to you?

3. Both companies are working hard on ease of use, for multiple kinds of user (business user tweaking linguistic rules, IT user, etc.). Whose approach and feature set do you like better?

4. Usually, buying one of these products involves some professional services. Whose organization do you like better?

5. Attensity’s default database schema for its exhaustive extraction is pretty flat and normalized, as befits a happy Teradata partner. Clarabridge’s is more of a star schema, as befits a bunch of ex-Microstrategy guys. Either can be straightforwardly translated into the other, so you may not care — but do you?
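To make point 5 a little more concrete, here is a minimal sketch of why a flat layout and a star-style layout can be converted back and forth. It is an illustration only: the `FlatRow` fields, the two dimensions, and the function names are all hypothetical, and neither vendor's real schema is shown here.

```python
from typing import NamedTuple


class FlatRow(NamedTuple):
    """One extracted assertion in a flat layout (hypothetical columns)."""
    doc_id: int
    entity: str      # e.g. a product or feature mentioned in the text
    sentiment: str   # e.g. "positive" or "negative"


def flat_to_star(rows):
    """Split flat rows into two dimension lookups plus a fact table of keys."""
    entity_dim = {}     # entity value -> surrogate key
    sentiment_dim = {}  # sentiment value -> surrogate key
    facts = []          # (doc_id, entity_key, sentiment_key)
    for r in rows:
        e_key = entity_dim.setdefault(r.entity, len(entity_dim) + 1)
        s_key = sentiment_dim.setdefault(r.sentiment, len(sentiment_dim) + 1)
        facts.append((r.doc_id, e_key, s_key))
    return entity_dim, sentiment_dim, facts


def star_to_flat(entity_dim, sentiment_dim, facts):
    """Rebuild the flat rows by resolving surrogate keys back to values."""
    entity_by_key = {k: v for v, k in entity_dim.items()}
    sentiment_by_key = {k: v for v, k in sentiment_dim.items()}
    return [FlatRow(d, entity_by_key[e], sentiment_by_key[s]) for d, e, s in facts]


rows = [
    FlatRow(1, "battery", "negative"),
    FlatRow(1, "screen", "positive"),
    FlatRow(2, "battery", "negative"),
]
entity_dim, sentiment_dim, facts = flat_to_star(rows)
print(facts)
print(star_to_flat(entity_dim, sentiment_dim, facts) == rows)
```

The round trip loses nothing, which is the sense in which the schema choice may matter less than it first appears. What differs is which queries are convenient to write against each shape.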

Categories: Attensity, Clarabridge, Competitive intelligence, Comprehensive or exhaustive extraction, Text mining, Voice of the Customer

4 Comments

November 14, 2007

Clarabridge does SaaS, sees Inxight

I just had a quick chat with text mining vendor Clarabridge’s CEO Sid Banerjee. Naturally, I asked the standard “So who are you seeing in the marketplace the most?” question. Attensity is unsurprisingly #1. What’s new, however, is that Inxight – heretofore not a text mining presence vs. commercially-focused Clarabridge – has begun to show up a bit this quarter, via the Business Objects sales force. Sid was of course dismissive of their current level of technological readiness and integration – but at least BOBJ/Inxight is showing up now.

The most interesting point was text mining SaaS (Software as a Service). When Clarabridge first put out its “We offer SaaS now!” announcement, I yawned. But Sid tells me that about half of Clarabridge’s deals now are actually SaaS. The way the SaaS technology works is pretty simple. The customer gathers together text into a staging database – typically daily or weekly – and it gets sucked into a Clarabridge-managed Clarabridge installation in some high-end SaaS data center. If there’s a desire to join the results of the text analysis with some tabular data from the client’s data warehouse, the needed columns get sent over as well. And then Clarabridge does its thing. Read more

Categories: BI integration, Clarabridge, Comprehensive or exhaustive extraction, IBM and UIMA, Software as a Service (SaaS), Text mining, Text mining SaaS

1 Comment

October 6, 2007

The Clarabridge approach to text mining

And for my sixth text mining post this weekend, here are some highlights of the Clarabridge technology story. (Sorry if it sounds clipped, but I’m a bit burned out …)

Like Attensity, Clarabridge practices exhaustive extraction.* That is, they do linguistics against documents, extract all sorts of entities and relationships among the entities from each document, and dump the results into a relational database.
Unlike Attensity, which uses a simple normalized relational schema, Clarabridge dumps the extracted data into a star schema. (The Clarabridge folks are from Microstrategy, which – surely not coincidentally – also favors star schemas.) Read more

Categories: BI integration, Clarabridge, Comprehensive or exhaustive extraction, Ontologies, Text mining

2 Comments

October 5, 2007

When to use exhaustive extraction

I’ve been emailing and/or talking with both Clarabridge and Attensity this week. Since they’re the two big proponents of exhaustive extraction, I naturally asked whether there are any cases in which exhaustive extraction should not be used. In Clarabridge’s case, it turns out exhaustive extraction is the default, and no customer has ever turned this default off. However, their current high end is several million documents* per year. They suspect that in some current projects with much higher volumes the default may finally be turned off. Read more

Categories: Attensity, Clarabridge, Comprehensive or exhaustive extraction, Text mining

1 Comment

October 5, 2007

David Bean of Attensity explains sentiment and other qualifiers

David Bean of Attensity is rightly one of the most popular explainers of text mining, for his clarity and personality alike. I shot a question to him about how Attensity’s exhaustive extraction strategy handled sentiment and so on. He responded with an email that contains the best overall explanation of sentiment analysis in text mining I’ve seen anywhere. Naturally, this is rolled into an Attensity-specific worldview and sales pitch — but so what? Read more

Categories: Attensity, Comprehensive or exhaustive extraction, Sentiment analysis, Text mining, Voice of the Customer

1 Comment

March 26, 2007

Clarabridge takes on Attensity

Text mining newbie Clarabridge gave me the all-too-customary “Please let us brief you, but then don’t write about it for a while” routine. Now that it’s OK to post, what I’m up for offering is a few salient points in bullet form.

The closest analogy to what Clarabridge does is Attensity’s new(ish) strategy – extract “facts” from documents and dump them into a relational database management system. In particular, Clarabridge and Attensity alike make the case “Our categorization is more flexible because it’s applied only after the extraction happens.”
Clarabridge’s sweet spot is extracting user opinions from short documents. E.g., the customer use cases they talk about are customer feedback forms, public blog postings, etc. about A. hotels and B. consumer software products.
Clarabridge has a strong business intelligence mentality, describing the product as “ETL for unstructured data.” But then, it’s spun out of a BI consultancy that itself was founded by Microstrategy veterans.
Clarabridge uses a different database schema than Attensity. Attensity’s fact-relationship network (FRN) is basically just two thin, long tables. Clarabridge, however, uses a Microstrategy-like star schema, in which different kinds of things that you can tokenize correspond to different dimensions.
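For readers who want to picture the "extract first, categorize later" idea from the bullets above, here is a rough sketch using two thin, long tables in SQLite. Everything in it is an assumption made for illustration: the table names (`facts`, `relationships`, `categories`), the columns, and the hand-written rows standing in for real linguistic extraction. It is not Attensity's actual FRN layout, which is described here only as two thin, long tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two thin, long tables: one row per extracted item, one row per link between items.
conn.executescript("""
    CREATE TABLE facts (
        fact_id INTEGER PRIMARY KEY,
        doc_id  INTEGER NOT NULL,
        kind    TEXT NOT NULL,   -- hypothetical: 'entity' or 'opinion'
        value   TEXT NOT NULL
    );
    CREATE TABLE relationships (
        subject_fact_id INTEGER NOT NULL REFERENCES facts(fact_id),
        relation        TEXT NOT NULL,
        object_fact_id  INTEGER NOT NULL REFERENCES facts(fact_id)
    );
""")

# Hand-written stand-ins for what an extraction pass might emit.
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", [
    (1, 101, "entity", "room service"),
    (2, 101, "opinion", "slow"),
    (3, 102, "entity", "installer"),
    (4, 102, "opinion", "crashed"),
    (5, 103, "entity", "room service"),
    (6, 103, "opinion", "friendly"),
])
conn.executemany("INSERT INTO relationships VALUES (?, ?, ?)", [
    (1, "described_as", 2),
    (3, "described_as", 4),
    (5, "described_as", 6),
])

# Categorization is defined after extraction, as an ordinary lookup table.
conn.execute("CREATE TABLE categories (opinion TEXT PRIMARY KEY, category TEXT)")
conn.executemany("INSERT INTO categories VALUES (?, ?)", [
    ("slow", "complaint"),
    ("crashed", "complaint"),
    ("friendly", "praise"),
])

query = """
    SELECT s.value, c.category, COUNT(*)
    FROM relationships r
    JOIN facts s ON s.fact_id = r.subject_fact_id
    JOIN facts o ON o.fact_id = r.object_fact_id
    JOIN categories c ON c.opinion = o.value
    GROUP BY s.value, c.category
    ORDER BY s.value, c.category
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

Because the categories live in their own table, they can be redefined and the query rerun without touching the extracted rows, which is roughly the flexibility argument both vendors make.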

Frankly, if somebody wants an alternative to the Attensity/Teradata/Business Objects partnership they could do worse than talk with Clarabridge.

Categories: Attensity, BI integration, Clarabridge, Comprehensive or exhaustive extraction, Text mining

Attensity, extractive exhaustion, and the FRN

Two of the clearest and most charismatic speakers in the text mining business are Attensity cofounders Todd Wakefield and David Bean. Last year, Todd’s Text Mining Summit speech gave an excellent overview of the various application areas in which text mining was being adopted; vestiges of that material may be found in a blog post I made at the time, and on Attensity’s web site. This time, David’s Text Analytics Summit speech was basically a pitch for Attensity’s latest product release – and it was a pitch well worth hearing.
Read more

Categories: Attensity, BI integration, Comprehensive or exhaustive extraction, Text Analytics Summit, Text mining

10 Comments

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.
