SAS

Analysis of data mining/predictive analytics vendor SAS Institute’s efforts in text mining.


September 18, 2007

Predictive analytics vendors’ text mining sophistication

Steve Gallant of KXEN contacted me over the summer to show me KXEN’s new text mining capability. It was pretty basic bag-of-words stuff, which is still a lot better than nothing, and actually fits pretty well with KXEN’s general simplicity-centric strategy.

This inspired me to check whether there had been any big changes in text mining capabilities at SAS or SPSS. It turned out there hadn’t. SAS is also still on the bag-of-words level. SPSS, however, does do sentiment analysis (pretty obvious, considering their focus on surveys and the like) and negation.
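To make the bag-of-words versus negation distinction concrete, here is a minimal Python sketch. It is purely illustrative and says nothing about how KXEN, SAS, or SPSS actually implement their products; the function names, the tiny negator list, and the `NOT_` tagging convention are all assumptions made up for this example.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count them.
    Word order is thrown away, which is the whole point of the model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def bag_with_negation(text, negators=("not", "no", "never")):
    """Hypothetical refinement: the token right after a negator is
    tagged with NOT_, so 'not bad' and 'bad' are counted separately."""
    bag = Counter()
    negate = False
    for tok in re.findall(r"[a-z']+", text.lower()):
        if tok in negators:
            negate = True
            continue
        bag["NOT_" + tok if negate else tok] += 1
        negate = False
    return bag

a = "the service was good, not bad"
b = "the service was bad, not good"

# Plain bag-of-words cannot tell these opposite opinions apart.
same_plain = bag_of_words(a) == bag_of_words(b)
# The negation-aware version can.
same_negated = bag_with_negation(a) == bag_with_negation(b)
print(same_plain, same_negated)
```

The two sentences contain exactly the same words, so a pure bag-of-words model treats them as identical; only something that looks at negation (or, more generally, at word order) can separate them.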

Thanks go out to Mary Crissey and Olivier Jouve for getting back to me when I asked, along with apologies for taking a while to post what they told me.

Categories: SAS, Sentiment analysis, SPSS, Text mining

Text analytics marketplace trends

It was tough to judge user demand at the recent Text Analytics Summit because, well, very few users showed up. And frankly, I wasn’t as aggressive at pumping vendors for trends as I am some other times. That said, I have talked with most text analytics vendors recently,* and here are my impressions of what’s going on. Any contrary – or confirming! – opinions would be most welcome.

*Factiva is the most significant exception. Hint, hint.

If you think about it, text analytics is a “secret ingredient” in search, antispam, and data cleaning,* and this dominates all other uses of the technology. A significant minority of the research effort at companies that do any kind of text filtering is – duh — text analytics. Cold comfort for specialist text analytics vendors, to be sure, but that’s the way it is.

*I.e., part of the “T” in “ETL” (Extract/Transform/Load).

Text-analytics-enhanced custom publishing will surely at some point become a must-have for business and technical publishers. However, it appears that we’re not quite there yet, as large publishers make do with simple-minded search and the like. In what I suspect is a telling market commentary, there’s no headlong rush among vendors to dump text mining for custom publishing, notwithstanding the examples of nStein and (sort of) ClearForest. I don’t want to be overly negative – either my friends at Mark Logic are doing just fine or else they’re putting up a mighty brave front – but I don’t think the nonspecialist publishing market is there yet.

Categories: Application areas, ClearForest/Reuters, Custom publishing, Factiva/Dow Jones, Mark Logic, nStein, SAS, Search engines, Spam and antispam, Text Analytics Summit, Text mining, Voice of the Customer


May 23, 2007

(A little) more on Business Objects/Inxight

After missing what seems to have been an uninformative press conference anyway, I hooked up later with the Business Objects folks on the phone. I say that it was probably uninformative because in the short call, it was pointed out to me that they really weren’t at liberty to say much anyway. Here are a couple of tidbits I picked up even so.

Business Objects’ text mining partnerships have been more demo/sales-cycle than actual sales up until now. That said, they have a few deals each with Attensity and Inxight (but not with ClearForest, which pulled in its horns prior to being acquired by Reuters). I still think they’re the leading BI vendor in integrating with text mining, SAS perhaps aside (who if nothing else have a lot of fun using text mining for data cleaning). The working Inxight partnership, by the way, was all about the specific app of email compliance, with the demo being based on the publicly available Enron corpus.
Inxight’s visualization technology is in the form of an SDK anyway. So integrating it into BOBJ’s product line should be straightforward. Note: Through the Xcelsius acquisition, BOBJ has been trying to gain competitive advantage in the cool-visualization area.
Inxight’s “federation” capability for search is pretty primitive (my term and opinion of course, not theirs). It takes in search result sets from various sources, then clusters and/or refilters them. What it does NOT do is the much harder task of taking actual relevancy rankings from various engines and somehow arbitrating between them. Nor, I’m guessing, does it even assign higher or lower weights to various corpuses or anything like that. Thus, it does not sound terribly competitive with the distributed search capabilities built into any state-of-the-art enterprise search engine.
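
The difference between the two kinds of federation described in the last item can be sketched in a few lines of Python. This is an assumption-laden illustration, not Inxight's or any search vendor's actual method: the function names, the round-robin merge, the min-max normalization, and the per-source weights are all invented for the example.

```python
def merge_result_sets(result_sets):
    """Simple federation: round-robin interleave of each engine's ranked
    list, dropping duplicates. The engines' scores are ignored entirely."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_sets), default=0)
    for i in range(longest):
        for results in result_sets:
            if i < len(results) and results[i][0] not in seen:
                seen.add(results[i][0])
                merged.append(results[i][0])
    return merged

def arbitrate(result_sets, weights):
    """Harder federation: rescale each engine's scores to [0, 1], apply a
    per-source weight, and keep each document's best weighted score."""
    best = {}
    for results, weight in zip(result_sets, weights):
        if not results:
            continue
        scores = [score for _, score in results]
        low, high = min(scores), max(scores)
        span = (high - low) or 1.0  # avoid dividing by zero
        for doc, score in results:
            weighted = weight * (score - low) / span
            best[doc] = max(best.get(doc, 0.0), weighted)
    # Highest score first; ties broken by document id for determinism.
    return sorted(best, key=lambda d: (-best[d], d))

# Two hypothetical engines with incompatible score scales.
engine_a = [("doc1", 12.0), ("doc2", 11.0), ("doc3", 2.0)]
engine_b = [("doc4", 90.0), ("doc2", 80.0), ("doc5", 10.0)]

simple = merge_result_sets([engine_a, engine_b])
ranked = arbitrate([engine_a, engine_b], weights=[1.0, 0.8])
print(simple)
print(ranked)
```

In this toy data the two approaches order doc2 and doc4 differently, because only the second one pays attention to how strongly each engine rated its hits and how much each source is trusted. Real relevancy arbitration is much harder than min-max rescaling, which is rather the point.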

Categories: Attensity, Business Objects and Inxight, ClearForest/Reuters, Enterprise search, SAS, Search engines, Text mining


March 21, 2007

Text Analytics Summit marketing panel: Membership firmed up

We’ve now solidified the membership of the Text Analytics Summit marketing panel. It is:

Curt Monash, President, Monash Information Services
Dave Kellogg, CEO, Mark Logic Corporation
Michelle De Haaff, VP Marketing, Attensity Corporation
Michel Lemay, VP Marketing, nstein Technologies
Mary Crissey, SAS Analytics Marketing Manager, SAS Institute

Michelle, Michel, and Mary are all obvious choices, responsible for marketing at leading text mining vendors. In addition, Mary has excelled on the same panel in the past, Michel sent me e-mail with some brilliant thoughts on the panel subject, and Attensity has one of the most interesting strategies in the text analytics market.

As for Dave – he’s simply one of the most astute marketing theorists working in software today. And he runs a very interesting text technology company. And he used to be the most senior marketing guy in all of business intelligence, when he was SVP at Business Objects. In his copious free time, he writes a really cool blog.

Categories: Attensity, Mark Logic, nStein, SAS, Text Analytics Summit, Text mining


September 1, 2006

Why the BI vendors are integrating with Google OneBox

I’m hearing the same thing from multiple BI vendors, with SAS being the most recent and freshest in my mind — customers want them to “integrate” with Google OneBox. Why Google rather than a better enterprise search technology, such as FAST’s? So far as I’ve figured out, these are the reasons, in no particular order:

Price.
Ease of installation (real or imagined).
The familiar Google brand name.
The familiar Google UI.
Google OneBox’s ability to search relational records, reports, etc. along with more traditional record types.

The last point, I think, is the most interesting. Lots of people think text search is and/or should be the dominant UI of the future. Now, I’ve been a big fan of natural language command line interfaces ever since the days of Intellect and Lotus HAL. But judging by the market success of those products — or for that matter of voice command/control — I was in a very small minority. Maybe the even simpler search interface — words jumbled together without grammatical structure — will win out instead.

Who knows? Progress is a funny thing. Maybe the ultimate UI will be one that responds well to grunts, hand gestures, and stick-figure drawings. We could call it NeanderHAL, but that would be wrong …

Categories: BI integration, Enterprise search, FAST, Google, Natural language processing (NLP), SAS, Search engines


June 23, 2006

The current state of text mining/analytics marketing?

One thing that didn’t go so well at the Text Analytics Summit was the marketing panel. Indeed, when we wracked our brains afterward, Mary Crissey (who was on the panel) and I could only think of a single observation that was actually made about marketing. Namely, she referred to a core truth of marketing: Just selling features doesn’t work (nobody cares). Just selling benefits doesn’t work (you’re not differentiated). What you have to do is sell the connection between your features and desirable benefits.

So I’m going to try to gather some useful observations on marketing here, filling the gap that the panel left. Key questions I’d love input on include:

1. Which feature-benefit connections do you see customers easily accepting?

2. Which feature-benefit connections is it harder to get them to believe?

3. How are customers defining text analytics market segments?

4. What do they see as the key issues in each segment?

5. Which application areas are showing growth even beyond that of the market overall?

I’m particularly interested in comments from the larger vendors that are selling into multiple parts of the text mining and text analytics market. But everybody else’s input would be warmly appreciated too.

The comment thread to this post is open for business!

Categories: About this blog, SAS, SPSS, Text Analytics Summit, Text mining

