November 14th, 2007 Curt Monash
I just had a quick chat with text mining vendor Clarabridge’s CEO Sid Banerjee. Naturally, I asked the standard “So who are you seeing in the marketplace the most?” question. Attensity is unsurprisingly #1. What’s new, however, is that Inxight – heretofore not a text mining competitor to the commercially focused Clarabridge – has begun to show up a bit this quarter, via the Business Objects sales force. Sid was of course dismissive of their current level of technological readiness and integration – but at least BOBJ/Inxight is showing up now.
The most interesting point was text mining SaaS (Software as a Service). When Clarabridge first put out its “We offer SaaS now!” announcement, I yawned. But Sid tells me that about half of Clarabridge’s deals now are actually SaaS. The way the SaaS technology works is pretty simple. The customer gathers together text into a staging database – typically daily or weekly – and it gets sucked into a Clarabridge-managed Clarabridge installation in some high-end SaaS data center. If there’s a desire to join the results of the text analysis with some tabular data from the client’s data warehouse, the needed columns get sent over as well. And then Clarabridge does its thing.
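The flow described above (stage the text, send along a few warehouse columns, analyze, then join) can be sketched in a few lines. This is only an illustration under stated assumptions, not Clarabridge's actual pipeline: the table names, the column names, and the keyword-based `analyze` step are all hypothetical stand-ins, and an in-memory SQLite database plays the role of both the staging database and the hosted installation.

```python
import sqlite3


def build_staging(conn):
    """Customer side: stage raw text plus the warehouse columns needed for joins.

    Table and column names are hypothetical.
    """
    conn.executescript("""
        CREATE TABLE staged_text (
            doc_id INTEGER PRIMARY KEY, customer_id INTEGER, body TEXT);
        CREATE TABLE staged_customers (
            customer_id INTEGER PRIMARY KEY, region TEXT);
    """)
    conn.executemany("INSERT INTO staged_text VALUES (?, ?, ?)", [
        (1, 10, "The room was great"),
        (2, 11, "Check-in was terrible"),
        (3, 10, "Great breakfast"),
    ])
    conn.executemany("INSERT INTO staged_customers VALUES (?, ?)",
                     [(10, "East"), (11, "West")])


def analyze(conn):
    """Vendor side: a toy stand-in for text analysis (keyword matching only)."""
    conn.execute(
        "CREATE TABLE doc_sentiment (doc_id INTEGER PRIMARY KEY, sentiment TEXT)")
    rows = conn.execute("SELECT doc_id, body FROM staged_text").fetchall()
    for doc_id, body in rows:
        text = body.lower()
        if "great" in text:
            sentiment = "positive"
        elif "terrible" in text:
            sentiment = "negative"
        else:
            sentiment = "neutral"
        conn.execute("INSERT INTO doc_sentiment VALUES (?, ?)",
                     (doc_id, sentiment))


def sentiment_by_region(conn):
    """Join the text-analysis results with the tabular columns sent over."""
    return conn.execute("""
        SELECT c.region, s.sentiment, COUNT(*)
        FROM doc_sentiment s
        JOIN staged_text t ON t.doc_id = s.doc_id
        JOIN staged_customers c ON c.customer_id = t.customer_id
        GROUP BY c.region, s.sentiment
        ORDER BY c.region, s.sentiment
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_staging(conn)
    analyze(conn)
    for row in sentiment_by_region(conn):
        print(row)
```

The point of the sketch is the shape of the hand-off: only the text and the handful of columns needed for the join leave the customer's side.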
Posted in BI integration, Clarabridge, Comprehensive or exhaustive extraction, IBM and UIMA, Text mining | 1 Comment »
October 6th, 2007 Curt Monash
And for my sixth text mining post this weekend, here are some highlights of the Clarabridge technology story. (Sorry if it sounds clipped, but I’m a bit burned out …)
- Like Attensity, Clarabridge practices exhaustive extraction.* That is, they do linguistics against documents, extract all sorts of entities and relationships among the entities from each document, and dump the results into a relational database.
- Unlike Attensity, which uses a simple normalized relational schema, Clarabridge dumps the extracted data into a star schema. (The Clarabridge folks are from Microstrategy, which – surely not coincidentally – also favors star schemas.)
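To make the star-schema idea concrete, here is a minimal sketch in which extracted items land in a central fact table surrounded by dimension tables. Everything in it is an assumption made for illustration: the table names (`dim_document`, `dim_entity`, `dim_sentiment`, `fact_mention`), the columns, and the sample rows are hypothetical and are not taken from Clarabridge's actual schema.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to three dimensions.
SCHEMA = """
CREATE TABLE dim_document (document_key INTEGER PRIMARY KEY, source TEXT);
CREATE TABLE dim_entity (
    entity_key INTEGER PRIMARY KEY, entity_type TEXT, name TEXT);
CREATE TABLE dim_sentiment (sentiment_key INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE fact_mention (
    document_key INTEGER REFERENCES dim_document(document_key),
    entity_key INTEGER REFERENCES dim_entity(entity_key),
    sentiment_key INTEGER REFERENCES dim_sentiment(sentiment_key),
    mention_count INTEGER
);
"""


def load_example(conn):
    """Create the schema and load a few made-up extraction results."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_document VALUES (?, ?)",
                     [(1, "survey"), (2, "blog")])
    conn.executemany("INSERT INTO dim_entity VALUES (?, ?, ?)",
                     [(1, "topic", "breakfast"), (2, "topic", "check-in")])
    conn.executemany("INSERT INTO dim_sentiment VALUES (?, ?)",
                     [(1, "positive"), (2, "negative")])
    conn.executemany("INSERT INTO fact_mention VALUES (?, ?, ?, ?)", [
        (1, 1, 1, 2),  # survey: breakfast mentioned positively twice
        (1, 2, 2, 1),  # survey: check-in mentioned negatively once
        (2, 1, 1, 1),  # blog: breakfast mentioned positively once
        (2, 2, 2, 1),  # blog: check-in mentioned negatively once
    ])


def mentions_by_entity_and_sentiment(conn):
    """A typical BI-style rollup: aggregate the facts along two dimensions."""
    return conn.execute("""
        SELECT e.name, s.label, SUM(f.mention_count)
        FROM fact_mention f
        JOIN dim_entity e ON e.entity_key = f.entity_key
        JOIN dim_sentiment s ON s.sentiment_key = f.sentiment_key
        GROUP BY e.name, s.label
        ORDER BY e.name, s.label
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_example(conn)
    for row in mentions_by_entity_and_sentiment(conn):
        print(row)
```

A layout like this is what lets ordinary BI tools slice the extracted data by document source, entity, or sentiment without knowing anything about linguistics.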
Posted in BI integration, Clarabridge, Comprehensive or exhaustive extraction, Ontologies and context identification, Text mining | 1 Comment »
October 5th, 2007 Curt Monash
I’ve been emailing and/or talking with both Clarabridge and Attensity this week. Since they’re the two big proponents of exhaustive extraction, I naturally asked whether there are any cases in which exhaustive extraction should not be used. In Clarabridge’s case, it turns out exhaustive extraction is the default, and no customer has ever turned this default off. However, their current high end is several million documents* per year. They suspect that in some current projects with much higher volumes, the default may finally be turned off.
Posted in Attensity, Clarabridge, Comprehensive or exhaustive extraction, Text mining | 1 Comment »
October 5th, 2007 Curt Monash
David Bean of Attensity is rightly one of the most popular explainers of text mining, for his clarity and personality alike. I shot a question to him about how Attensity’s exhaustive extraction strategy handled sentiment and so on. He responded with an email that contains the best overall explanation of sentiment analysis in text mining I’ve seen anywhere. Naturally, this is rolled into an Attensity-specific worldview and sales pitch — but so what?
Our exhaustive extraction approach doesn’t compromise detection of qualifiers* because we recognize the qualifications while we have access to the complete linguistic information of the input. Much of that information is later stripped away, since it’s way more information than a user would want. We make sure we project qualifications like you mention in the final representations. In fact, we’ve put a lot of effort into recognizing “voicing,” i.e. distinguishing among negations, conditional statements, and variations in the degree of sentiment.
Examples will help here:
Posted in Attensity, Comprehensive or exhaustive extraction, Text mining | No Comments »
March 26th, 2007 Curt Monash
Text mining newbie Clarabridge gave me the all-too-customary “Please let us brief you, but then don’t write about it for a while” routine. Now that it’s OK to post, what I’m up for offering is a few salient points in bullet form.
- The closest analogy to what Clarabridge does is Attensity’s new(ish) strategy – extract “facts” from documents and dump them into a relational database management system. In particular, Clarabridge and Attensity alike make the case “Our categorization is more flexible because it’s applied only after the extraction happens.”
- Clarabridge’s sweet spot is extracting user opinions from short documents. E.g., the customer use cases they talk about are customer feedback forms, public blog postings, etc. about A. hotels and B. consumer software products.
- Clarabridge has a strong business intelligence mentality, describing the product as “ETL for unstructured data.” But then, it’s spun out of a BI consultancy that itself was founded by Microstrategy veterans.
- Clarabridge uses a different database schema than Attensity. Attensity’s fact-relationship network (FRN) is basically just two thin, long tables. Clarabridge, however, uses a Microstrategy-like star schema, in which different kinds of things that you can tokenize correspond to different dimensions.
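For contrast with a star schema, here is one way a "two thin, long tables" layout might look. This is a guess made for illustration only: the source says nothing about the FRN beyond that description, so the `facts` and `relations` tables, their columns, and the sample rows below are all hypothetical.

```python
import sqlite3

# Hypothetical thin-table layout: every extracted item is a row in `facts`,
# and every link between two items is a row in `relations`.
SCHEMA = """
CREATE TABLE facts (
    fact_id INTEGER PRIMARY KEY, doc_id INTEGER, kind TEXT, value TEXT);
CREATE TABLE relations (
    subject_id INTEGER REFERENCES facts(fact_id),
    predicate TEXT,
    object_id INTEGER REFERENCES facts(fact_id)
);
"""


def load_example(conn):
    """Create the two tables and load a few made-up extraction results."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", [
        (1, 1, "topic", "breakfast"),
        (2, 1, "opinion", "great"),
        (3, 2, "topic", "check-in"),
        (4, 2, "opinion", "slow"),
    ])
    conn.executemany("INSERT INTO relations VALUES (?, ?, ?)", [
        (1, "described_as", 2),
        (3, "described_as", 4),
    ])


def opinions_about(conn, topic):
    """Find opinions linked to a topic by joining `facts` to itself."""
    rows = conn.execute("""
        SELECT o.value
        FROM relations r
        JOIN facts s ON s.fact_id = r.subject_id
        JOIN facts o ON o.fact_id = r.object_id
        WHERE s.kind = 'topic' AND s.value = ? AND o.kind = 'opinion'
        ORDER BY o.value
    """, (topic,)).fetchall()
    return [value for (value,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_example(conn)
    print(opinions_about(conn, "breakfast"))
```

In a layout like this, a new kind of extracted item is just a new value in the `kind` column, whereas in a star schema it would more likely call for a new dimension. The price is that queries lean on self-joins rather than a fixed set of dimension tables.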
Frankly, if somebody wants an alternative to the Attensity/Teradata/Business Objects partnership, they could do worse than talk with Clarabridge.
Posted in Attensity, BI integration, Clarabridge, Comprehensive or exhaustive extraction, Text mining | No Comments »
June 24th, 2006 Curt Monash
Two of the clearest and most charismatic speakers in the text mining business are Attensity cofounders Todd Wakefield and David Bean. Last year, Todd’s Text Mining Summit speech gave an excellent overview of the various application areas in which text mining was being adopted; vestiges of that material may be found in a blog post I made at the time, and on Attensity’s web site. This time, David’s Text Analytics Summit speech was basically a pitch for Attensity’s latest product release – and it was a pitch well worth hearing.
Posted in Attensity, BI integration, Comprehensive or exhaustive extraction, Text Analytics Summit, Text mining | 7 Comments »