Analysis of data mining/predictive analytics vendor SAS Institute’s efforts in text mining.
Steve Gallant of KXEN contacted me over the summer to show me KXEN’s new text mining capability. It was pretty basic bag-of-words stuff, which is still a lot better than nothing, and actually fits pretty well with KXEN’s general simplicity-centric strategy.
This inspired me to check whether there had been any big changes in text mining capabilities at SAS or SPSS. It turned out there hadn’t. SAS is also still at the bag-of-words level. SPSS, however, does sentiment analysis (pretty obvious, considering their focus on surveys and the like) and handles negation.
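For readers who want the bag-of-words vs. negation distinction made concrete, here is a toy Python sketch. A pure bag-of-words representation throws away word order, so “not helpful” and “helpful” look nearly identical to a lexicon-based sentiment score. Everything here is an illustrative assumption: the tiny lexicon, the negator list, and the rule that a negation’s scope ends at the next punctuation mark. It is not how SPSS, SAS, or KXEN actually implement anything.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {"good": 1, "great": 1, "helpful": 1, "bad": -1, "slow": -1, "rude": -1}
NEGATORS = {"not", "no", "never", "n't"}
PUNCT = set(".,;:!?")


def tokenize(text: str) -> list[str]:
    """Lowercase, split off "n't" (wasn't -> was n't), keep punctuation tokens."""
    text = text.lower().replace("n't", " n't")
    return re.findall(r"n't|[a-z]+|[.,;:!?]", text)


def bag_of_words(text: str) -> Counter:
    """Word counts only: word order, and thus negation scope, is discarded."""
    return Counter(t for t in tokenize(text) if t not in PUNCT)


def bow_score(text: str) -> int:
    """Sum lexicon weights over the bag; negators themselves carry no weight."""
    return sum(LEXICON.get(w, 0) * n for w, n in bag_of_words(text).items())


def negation_aware_score(text: str) -> int:
    """Flip the polarity of sentiment words that follow a negator,
    until the next punctuation mark closes the negation scope."""
    score, negated = 0, False
    for tok in tokenize(text):
        if tok in PUNCT:
            negated = False
        elif tok in NEGATORS:
            negated = True
        else:
            weight = LEXICON.get(tok, 0)
            score += -weight if negated else weight
    return score


if __name__ == "__main__":
    for s in ["The agent was helpful.",
              "The agent was not helpful.",
              "The wait wasn't bad, but the agent was rude."]:
        print(f"{s!r}: bag-of-words={bow_score(s)}, negation-aware={negation_aware_score(s)}")
```

With this sketch, “The agent was helpful.” and “The agent was not helpful.” both get a bag-of-words score of 1, while the negation-aware score for the second drops to -1. Real products use far richer scope detection than a punctuation cutoff.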
Thanks go out to Mary Crissey and Olivier Jouve for getting back to me when I asked, along with apologies for taking a while to post what they told me.
It was tough to judge user demand at the recent Text Analytics Summit because, well, very few users showed up. And frankly, I wasn’t as aggressive at pumping vendors for trends as I sometimes am. That said, I have talked with most text analytics vendors recently,* and here are my impressions of what’s going on. Any contrary (or confirming!) opinions would be most welcome.
*Factiva is the most significant exception. Hint, hint.
If you think about it, text analytics is a “secret ingredient” in search, antispam, and data cleaning,* and those uses dominate all others. A significant minority of the research effort at companies that do any kind of text filtering is, duh, text analytics. Cold comfort for specialist text analytics vendors, to be sure, but that’s the way it is.
*I.e., part of the “T” in “ETL” (Extract/Transform/Load).
Text-analytics-enhanced custom publishing will surely at some point become a must-have for business and technical publishers. However, it appears that we’re not quite there yet, as large publishers make do with simple-minded search and the like. In what I suspect is a telling market commentary, vendors are not rushing headlong to refocus from text mining to custom publishing, notwithstanding the examples of nStein and (sort of) ClearForest. I don’t want to be overly negative (either my friends at Mark Logic are doing just fine or else they’re putting up a mighty brave front), but I don’t think the nonspecialist publishing market is there yet.
I missed what seems to have been an uninformative press conference, but I caught up with the Business Objects folks by phone afterward. I say “probably uninformative” because, in our short call, they pointed out that they really weren’t at liberty to say much anyway. Even so, here are a few tidbits I picked up.
- Business Objects’ text mining partnerships have so far produced more demos and sales-cycle activity than actual sales. That said, they have a few deals each with Attensity and Inxight (but not with ClearForest, which pulled in its horns before being acquired by Reuters). I still think Business Objects is the leading BI vendor in integrating with text mining, SAS perhaps aside (SAS, if nothing else, has a lot of fun using text mining for data cleaning). The working Inxight partnership, by the way, was all about the specific application of email compliance, with the demo based on the publicly available Enron corpus.
- Inxight’s visualization technology is in the form of an SDK anyway, so integrating it into BOBJ’s product line should be straightforward. Note: through the Xcelsius acquisition, BOBJ has been trying to gain competitive advantage in the cool-visualization area.
- Inxight’s “federation” capability for search is pretty primitive (my term and opinion of course, not theirs). It takes in search result sets from various sources, then clusters and/or refilters them. What it does NOT do is the much harder task of taking actual relevancy rankings from various engines and somehow arbitrating between them. Nor, I’m guessing, does it even assign higher or lower weights to various corpuses or anything like that. Thus, it does not sound terribly competitive with the distributed search capabilities built into any state-of-the-art enterprise search engine.
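To make that contrast concrete, here is a minimal Python sketch of the harder kind of federation: putting each engine’s relevance scores on a common scale, weighting each source, and fusing the results into one ranking. The technique shown is weighted CombSUM over min-max-normalized scores, one textbook approach among several. The `Hit` type, the `federate` function, and the source names are hypothetical, and this is not a description of what Inxight or any enterprise search vendor actually ships.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float  # raw relevance score, on the originating engine's own scale


def min_max(hits: list[Hit]) -> dict[str, float]:
    """Rescale one engine's scores to [0, 1] so different engines become comparable."""
    if not hits:
        return {}
    lo = min(h.score for h in hits)
    hi = max(h.score for h in hits)
    span = hi - lo
    # If every hit has the same score, treat them all as equally (fully) relevant.
    return {h.doc_id: (h.score - lo) / span if span else 1.0 for h in hits}


def federate(result_sets: dict[str, list[Hit]], weights: dict[str, float]) -> list[str]:
    """Weighted CombSUM: sum each document's normalized score times its source weight."""
    fused: dict[str, float] = {}
    for source, hits in result_sets.items():
        w = weights.get(source, 1.0)  # unweighted sources default to 1.0
        for doc_id, s in min_max(hits).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + w * s
    # Highest fused score first; ties keep first-seen order (sorted is stable).
    return sorted(fused, key=lambda d: fused[d], reverse=True)


if __name__ == "__main__":
    results = {
        "intranet": [Hit("a", 12.0), Hit("b", 8.0), Hit("c", 4.0)],
        "crm": [Hit("c", 0.9), Hit("d", 0.5)],
    }
    print(federate(results, {"intranet": 1.0, "crm": 2.0}))  # ['c', 'a', 'b', 'd']
```

In the example, document “c” ends up first because the double-weighted source ranks it at the top, even though the other source ranks it last. Merely clustering or refiltering the raw result sets never performs that kind of cross-engine arbitration.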
We’ve now solidified the membership of the Text Analytics Summit marketing panel. It is:
- Curt Monash, President, Monash Information Services
- Dave Kellogg, CEO, Mark Logic Corporation
- Michelle De Haaff, VP Marketing, Attensity Corporation
- Michel Lemay, VP Marketing, nstein Technologies
- Mary Crissey, SAS Analytics Marketing Manager, SAS Institute
Michelle, Michel, and Mary are all obvious choices, responsible for marketing at leading text mining vendors. In addition, Mary has excelled on the same panel in the past, Michel sent me e-mail with some brilliant thoughts on the panel subject, and Attensity has one of the most interesting strategies in the text analytics market.
As for Dave, he’s simply one of the most astute marketing theorists working in software today. And he runs a very interesting text technology company. And he used to be the most senior marketing guy in all of business intelligence, when he was SVP at Business Objects. In his copious free time, he writes a really cool blog.
I’m hearing the same thing from multiple BI vendors, with SAS being the most recent and freshest in my mind: customers want them to “integrate” with Google OneBox. Why Google rather than a better enterprise search technology, such as FAST’s? So far as I’ve figured out, these are the reasons, in no particular order:
- Ease of installation (real or imagined).
- The familiar Google brand name.
- The familiar Google UI.
- Google OneBox’s ability to search relational records, reports, etc. along with more traditional record types.
The last point, I think, is the most interesting. Lots of people think text search is and/or should be the dominant UI of the future. Now, I’ve been a big fan of natural language command line interfaces ever since the days of Intellect and Lotus HAL. But judging by the market success of those products (or, for that matter, of voice command/control), I was in a very small minority. Maybe the even simpler search interface (words jumbled together without grammatical structure) will win out instead.
Who knows? Progress is a funny thing. Maybe the ultimate UI will be one that responds well to grunts, hand gestures, and stick-figure drawings. We could call it NeanderHAL, but that would be wrong …
One thing that didn’t go so well at the Text Analytics Summit was the marketing panel. Indeed, when we wracked our brains afterward, Mary Crissey (who was on the panel) and I could only think of a single observation that was actually made about marketing. Namely, she referred to a core truth of marketing: Just selling features doesn’t work (nobody cares). Just selling benefits doesn’t work (you’re not differentiated). What you have to do is sell the connection between your features and desirable benefits.
So I’m going to try to gather some useful observations on marketing here, filling the gap that the panel left. Key questions I’d love input on include:
1. Which feature-benefit connections do you see customers easily accepting?
2. Which feature-benefit connections is it harder to get them to believe?
3. How are customers defining text analytics market segments?
4. What do they see as the key issues in each segment?
5. Which application areas are showing growth even beyond that of the market overall?
I’m particularly interested in comments from the larger vendors that are selling into multiple parts of the text mining and text analytics market. But everybody else’s input would be warmly appreciated too.
The comment thread to this post is open for business!