Comprehensive or exhaustive extraction
Analysis of exhaustive or comprehensive extraction, an approach to text mining that entails extracting a broad range assertions from text and dumping them into a (usually relational) database for further analysis. Related subjects include:
Many people think text information is important to analyze, but even so data warehouses don’t seem to wind up holding very much of it.
|Categories: Attensity, Comprehensive or exhaustive extraction, Sentiment analysis, Text mining||5 Comments|
Jim D. of UPS asked in the comment thread to the recent Attensity update post how one should decide between Attensity and Clarabridge. I wrote an answer, and then decided to just split it out in a separate post. Here are five ideas about how to pick between Attensity and Clarabridge for the kind of Voice of the Customer/Market application both companies are focusing on.
1. Attensity is the older company than Clarabridge, and is good at more things. Is Clarabridge really good at everything you want them to be?
2. In particular, Attensity has more overall sophistication at linguistic extraction. Do any of the differences matter to you?
3. Both companies are working hard on ease of use, for multiple kinds of user (business user tweaking linguistic rules, IT user, etc.). Whose approach and feature set do you like better?
4. Usually, buying one of these products involves some professional services. Whose organization do you like better?
5. Attensity’s default database schema for its exhaustive extraction is pretty flat and normalized, as befits a happy Teradata partner. Clarabridge’s is more of a star schema, as befits a bunch of ex-Microstrategy guys. Either can be straightforwardly translated into the other, so you may not care — but do you?
|Categories: Attensity, Clarabridge, Competitive intelligence, Comprehensive or exhaustive extraction, Text mining, Voice of the Customer||4 Comments|
I just had a quick chat with text mining vendor Clarabridge’s CEO Sid Banerjee. Naturally, I asked the standard “So who are you seeing in the marketplace the most?” question. Attensity is unsurprisingly #1. What’s new, however, is that Inxight – heretofore not a text mining presence vs. commercially-focused Clarabridge – has begun to show up a bit this quarter, via the Business Objects sales force. Sid was of course dismissive of their current level of technological readiness and integration – but at least BOBJ/Inxight is showing up now.
The most interesting point was text mining SaaS (Software as a Service). When Clarabridge first put out its “We offer SaaS now!” announcement, I yawned. But Sid tells me that about half of Clarabridge’s deals now are actually SaaS. The way the SaaS technology works is pretty simple. The customer gathers together text into a staging database – typically daily or weekly – and it gets sucked into a Clarabridge-managed Clarabridge installation in some high-end SaaS data center. If there’s a desire to join the results of the text analysis with some tabular data from the client’s data warehouse, the needed columns get sent over as well. And then Clarabridge does its thing. Read more
|Categories: BI integration, Clarabridge, Comprehensive or exhaustive extraction, IBM and UIMA, Software as a Service (SaaS), Text mining, Text mining SaaS||1 Comment|
And for my sixth text mining post this weekend, here are some highlights of the Clarabridge technology story. (Sorry if it sounds clipped, but I’m a bit burned out …)
- Like Attensity, Clarabridge practices exhaustive extraction.* That is, they do linguistics against documents, extract all sorts of entities and relationships among the entities from each document, and dump the results into a relational database.
- Unlike Attensity, which uses a simple normalized relational schema, Clarabridge dumps the extracted data into a star schema. (The Clarabridge folks are from Microstrategy, which – surely not coincidentally – also favors star schemas.) Read more
|Categories: BI integration, Clarabridge, Comprehensive or exhaustive extraction, Ontologies, Text mining||2 Comments|
I’ve been emailing and/or talking with both Clarabridge and Attensity this week. Since they’re the two big proponents of exhaustive extraction, I naturally asked whether there are any cases exhaustive extraction should not be used. In Clarabridge’s case, it turns out exhaustive extraction is the default, and no customer has ever turned this default off. However, their current high end is several million documents* per year. They suspect that in some current projects with much higher volumes the default may finally be turned off. Read more
David Bean of Attensity is rightly one of the most popular explainers of text mining, for his clarity and personality alike. I shot a question to him about how Attensity’s exhaustive extraction strategy handled sentiment and so on. He responded with an email that contains the best overall explanation of sentiment analysis in text mining I’ve seen anywhere. Naturally, this is rolled into an Attensity-specific worldview and sales pitch — but so what? Read more
|Categories: Attensity, Comprehensive or exhaustive extraction, Sentiment analysis, Text mining, Voice of the Customer||1 Comment|
Text mining newbie Clarabridge gave me the all-too-customary “Please let us brief you, but then don’t write about it for a while” routine. Now that it’s OK to post, what I’m up for offering is a few salient points in bullet form.
- The closest analogy to what Clarabridge does is Attensity’s new(ish) strategy – extract “facts” from documents and dump them into a relational database management system. In particular, Clarabridge and Attensity alike make the case “Our categorization is more flexible because it’s applied only after the extraction happens.”
- Clarabridge’s sweet spot is extracting user opinions from short documents. E.g., the customer uses cases they talk about are customer feedback forms, public blog postings, etc. about A. hotels and B. consumer software products.
- Clarabridge has a strong business intelligence mentality, describing the product as “ETL for unstructured data.” But then, it’s spun out of a BI consultancy that itself was founded by Microstrategy veterans.
- Clarabridge uses a different database schema than Attensity. Attensity’s fact-relationship network (FRN) is basically just two thin, long tables. Clarabridge, however, uses a Microstrategy-like star schema, in which different kinds of things that you can tokenize correspond to different dimensions.
Frankly, if somebody wants an alternative to the Attensity/Teradata/Business Objects partnership they could do worse than talk with Clarabridge.
|Categories: Attensity, BI integration, Clarabridge, Comprehensive or exhaustive extraction, Text mining||Leave a Comment|
Two of the clearest and most charismatic speakers in the text mining business are Attensity cofounders Todd Wakefield and David Bean. Last year, Todd’s Text Mining Summit speech gave an excellent overview of the various application areas in which text mining was being adopted; vestiges of that material may be found in a blog post I made at the time, and on Attensity’s web site. This time, David’s Text Analytics Summit speech was basically a pitch for Attensity’s latest product release – and it was a pitch well worth hearing.