Text mining

Analysis of text mining companies, technology, and trends.

October 8, 2007

SAP is acquiring Inxight

More precisely, SAP is acquiring Business Objects, and of course Business Objects already acquired Inxight.

This could be interesting …

October 6, 2007

The Clarabridge approach to text mining

And for my sixth text mining post this weekend, here are some highlights of the Clarabridge technology story. (Sorry if it sounds clipped, but I’m a bit burned out …)

October 5, 2007

Text mining applications as per Attensity and Clarabridge

Besides asking them technical questions, I surveyed Attensity and Clarabridge last week about text mining application trends, getting generously detailed answers from Michelle De Haaff of Attensity and Justin Langseth of Clarabridge. Perhaps the most important point to emerge was that it’s not just about particular apps. Enterprises are doing text mining POCs (Proofs of Concept) around specific apps, commonly in the CRM area, but immediately structuring the buying process in anticipation of a rollout across multiple departments in the enterprise.

Other highlights of what they said included:

October 5, 2007

Nice new phrase — Voice of the Market

Michelle DeHaaff, Attensity’s VP of Marketing, just introduced me to a nice phrase — Voice of the Market, obviously related to Voice of the Customer. As Michelle put it:

We’ve also expanded into what we call Voice of the Market data – providing a combination of analysis on external and internal data – this is how we’ve heard our customers put it:

*Customer feedback comes in many forms……when customers don’t know you are listening (blogs, public web forums) it is important to hear what they say.

*When customers purposely tell you something (via emails, in surveys, captured in customer service notes) it is not only important, but expected….

The first of those would be Voice of the Market, while the second would be Voice of the Customer.

October 5, 2007

When to use exhaustive extraction

I’ve been emailing and/or talking with both Clarabridge and Attensity this week. Since they’re the two big proponents of exhaustive extraction, I naturally asked whether there are any cases in which exhaustive extraction should not be used. In Clarabridge’s case, it turns out exhaustive extraction is the default, and no customer has ever turned this default off. However, their current high end is several million documents* per year. They suspect that in some current projects with much higher volumes, the default may finally be turned off.

October 5, 2007

David Bean of Attensity explains sentiment and other qualifiers

David Bean of Attensity is rightly one of the most popular explainers of text mining, for his clarity and personality alike. I shot him a question about how Attensity’s exhaustive extraction strategy handles sentiment and so on. He responded with an email that contains the best overall explanation of sentiment analysis in text mining I’ve seen anywhere. Naturally, this is rolled into an Attensity-specific worldview and sales pitch, but so what?

September 18, 2007

Predictive analytics vendors’ text mining sophistication

Steve Gallant of KXEN contacted me over the summer to show me KXEN’s new text mining capability. It was pretty basic bag-of-words stuff, which is still a lot better than nothing, and actually fits pretty well with KXEN’s general simplicity-centric strategy.

This inspired me to check whether there had been any big changes in text mining capabilities at SAS or SPSS. It turned out there hadn’t. SAS, too, is still at the bag-of-words level. SPSS, however, does do sentiment analysis (pretty obvious, considering their focus on surveys and the like) and negation handling.
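For readers who haven’t met the term, “bag-of-words” means a document is reduced to unordered word counts, with grammar and word order thrown away. Here is a minimal Python sketch of the idea; it is not KXEN’s or SAS’s implementation, and the tokenizing regex and sample sentences are illustrative assumptions. It also shows why the approach counts as basic: two sentences that say opposite things end up with identical bags.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Reduce a document to unordered word counts (assumed tokenizer:
    lowercase runs of letters and apostrophes; no stemming or stop words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Illustrative sentences with opposite meanings.
praise_food = "The food was good, the service was not."
praise_service = "The service was good, the food was not."

bag_a = bag_of_words(praise_food)
bag_b = bag_of_words(praise_service)

# Word order is discarded, so the two bags are identical.
print(bag_a == bag_b)              # True
print(bag_a["the"], bag_a["was"])  # 2 2
```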

Thanks go out to Mary Crissey and Olivier Jouve for getting back to me when I asked, along with apologies for taking a while to post what they told me.
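Negation matters because a plain word-counting scorer treats “not rude” as negative. Below is a minimal lexicon-based sketch of sentiment scoring with negation handling; the word lists, the three-token negation window, and the additive scoring rule are all illustrative assumptions, not a description of how SPSS (or any other vendor) actually does it.

```python
import re

# Hypothetical toy lexicons; real systems use far larger, domain-tuned ones.
POLARITY = {"good": 1, "great": 1, "helpful": 1,
            "bad": -1, "slow": -1, "rude": -1}
NEGATORS = {"not", "no", "never", "isn't", "wasn't", "don't"}


def sentiment(text: str, window: int = 3) -> int:
    """Sum word polarities, flipping the sign of any sentiment word that
    falls within `window` tokens after a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate_until = -1  # last token index still inside a negation scope
    for i, tok in enumerate(tokens):
        if tok in NEGATORS:
            negate_until = i + window
            continue
        polarity = POLARITY.get(tok, 0)
        if i <= negate_until:
            polarity = -polarity
        score += polarity
    return score


print(sentiment("The agent was helpful"))                          # 1
print(sentiment("The agent was not helpful"))                      # -1
print(sentiment("The line was slow but the staff were not rude"))  # 0
```

A fixed token window is the crudest way to scope a negation; it is used here only because it is easy to follow.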

August 3, 2007

More on text processing in CEP

StreamBase isn’t the only complex event/stream processing (CEP) vendor doing text processing; Progress Apama is as well. Stemming, fuzzy matching, and the like seem to be used all the time. But there’s also at least one case where they flat-out do sentiment analysis. Edit: I presume this is in the investment market, as that’s where most of Progress Apama’s business is.
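As a rough illustration of fuzzy matching applied to a text stream, the Python sketch below checks each token of each arriving message against a watch list using Levenshtein edit distance, so a misspelling such as “aquisition” still matches. It is plain Python, not StreamBase or Apama code; the watch terms and the two-edit threshold are hypothetical.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_hits(messages: Iterable[str], terms: List[str],
               max_dist: int = 2) -> Iterator[Tuple[str, str, str]]:
    """Yield (message, token, term) for each token within max_dist edits
    of a watched term, checking messages one at a time as they arrive."""
    for msg in messages:
        for tok in re.findall(r"[a-z]+", msg.lower()):
            for term in terms:
                if levenshtein(tok, term) <= max_dist:
                    yield msg, tok, term


WATCH_TERMS = ["acquisition", "bankruptcy"]  # hypothetical watch list
stream = ["Rumors of an aquisition at Acme", "Quarterly results due Tuesday"]
for hit in fuzzy_hits(stream, WATCH_TERMS):
    print(hit)
```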

July 26, 2007

Event stream processors active in text filtering

OK. I’ve secured permission to quote the details of something I’d previously only hinted at: stream processing for text messages. Traditionally, that’s been the province of enterprise search companies. A decade ago, Verity had a kernel group of 6-7 engineers under Phil Nelson. They managed to produce not only a decent search engine, but also a search engine “turned on its side.” I.e., instead of running one query against a corpus, they could run many queries against each document as it arrived, one document at a time. Subsequently, the same idea has been implemented by most enterprise search providers, at least those that are serious about the intelligence market.

Well, the event-processing guys are active in that market too. At least StreamBase is.
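The Verity-style search engine “turned on its side” can be sketched as follows: index the standing queries rather than the documents, then match each arriving document against all of them. This is a minimal Python illustration under assumed simplifications (each query is a hypothetical named AND of lowercase terms; no ranking, phrases, or stemming), not a reconstruction of Verity’s or StreamBase’s actual engines.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple


class StandingQueryMatcher:
    """Match each incoming document against many pre-registered queries."""

    def __init__(self, queries: Dict[str, Set[str]]) -> None:
        # Each query is a set of lowercase terms that must ALL appear (AND).
        self.queries = queries
        # Inverted index over the queries: term -> names of queries using it.
        self.by_term: Dict[str, Set[str]] = defaultdict(set)
        for name, terms in queries.items():
            for term in terms:
                self.by_term[term].add(name)

    def match(self, document: str) -> List[str]:
        doc_terms = set(re.findall(r"[a-z]+", document.lower()))
        # Candidate queries share at least one term with the document.
        candidates: Set[str] = set()
        for term in doc_terms:
            candidates |= self.by_term.get(term, set())
        # Keep candidates whose every term occurs in the document.
        return sorted(n for n in candidates if self.queries[n] <= doc_terms)

    def stream(self, documents: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
        # Process documents one at a time, as they arrive.
        for doc in documents:
            yield doc, self.match(doc)


# Hypothetical standing queries.
matcher = StandingQueryMatcher({
    "port_cargo": {"port", "cargo"},
    "wire_transfer": {"wire", "transfer"},
})

incoming = [
    "Cargo inspection delayed at the port",
    "Wire transfer flagged for review",
    "The port is quiet today",
]
for doc, hits in matcher.stream(incoming):
    print(hits, "<-", doc)
```

The point of indexing the queries by term is that each document is checked only against queries sharing at least one of its words, rather than against every registered query.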

July 22, 2007

Text analytics marketplace trends

It was tough to judge user demand at the recent Text Analytics Summit because, well, very few users showed up. And frankly, I wasn’t as aggressive about pumping vendors for trends as I sometimes am. That said, I have talked with most text analytics vendors recently,* and here are my impressions of what’s going on. Any contrary (or confirming!) opinions would be most welcome.

*Factiva is the most significant exception. Hint, hint.

If you think about it, text analytics is a “secret ingredient” in search, antispam, and data cleaning,* and those uses dominate all others. A significant minority of the research effort at companies that do any kind of text filtering is (duh) text analytics. Cold comfort for specialist text analytics vendors, to be sure, but that’s the way it is.

*I.e., part of the “T” in “ETL” (Extract/Transform/Load).

Text-analytics-enhanced custom publishing will surely at some point become a must-have for business and technical publishers. However, it appears that we’re not quite there yet, as large publishers make do with simple-minded search and the like. In what I suspect is a telling market commentary, there’s no headlong rush among vendors to dump text mining for custom publishing, notwithstanding the examples of nStein and (sort of) ClearForest. I don’t want to be overly negative – either my friends at Mark Logic are doing just fine or else they’re putting up a mighty brave front – but I don’t think the nonspecialist publishing market is there yet.
