Analysis of data mining/predictive analytics vendor SPSS’s efforts in text mining.
Text analytics application areas typically fall into one or more of three broad, often overlapping domains:
- Understanding the opinions of customers, prospects, or other groups. This can be based on any combination of documents the user organization controls (email, surveys, warranty reports, call center logs, etc.) or public-domain documents such as blogs, forum posts, and tweets. The former is usually called Voice of the Customer (VotC), while the latter is Voice of the Market (VotM).
- Detecting and identifying problems. This can happen across many domains — VotC, VotM, diagnosing equipment malfunctions, identifying bad guys (from terrorists to fraudsters), or even getting early warnings of infectious disease outbreaks.
- Aiding text search, custom publishing, and other electronic document-shuffling use cases, often via document augmentation.
For several years, I’ve been distressed at the lack of progress in text analytics or, as it used to be called, text mining. Yes, the rise of sentiment analysis has been impressive, and higher volumes of text data are being processed than were before. But otherwise, there’s been a lot of the same old, same old. Most actual deployed applications of text analytics or text mining go something like this:
- A bunch of documents are analyzed to ascertain the ideas expressed in them.
- A count is made as to how many times each idea turns up.
- The application user notices any surprisingly large numbers and, as a result, pays attention to the corresponding ideas.
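To make that pattern concrete, here is a minimal Python sketch of the count-and-notice loop. The concept dictionary, the sample documents, and the doubling threshold are all hypothetical. Real products extract ideas with far more sophisticated linguistics than a keyword lookup.

```python
from collections import Counter

# Hypothetical concept dictionary mapping surface terms to the "idea" they express.
# Real text analytics products use much richer extraction; this is illustration only.
CONCEPTS = {
    "refund": "billing",
    "charge": "billing",
    "crash": "reliability",
    "freeze": "reliability",
    "slow": "performance",
}

def extract_concepts(doc: str) -> set[str]:
    """Step 1: ascertain which ideas a document expresses (naive keyword match)."""
    return {CONCEPTS[w] for w in doc.lower().split() if w in CONCEPTS}

def count_concepts(docs: list[str]) -> Counter:
    """Step 2: count how many documents mention each idea."""
    counts = Counter()
    for doc in docs:
        counts.update(extract_concepts(doc))
    return counts

def surprising(current: Counter, baseline: Counter, ratio: float = 2.0) -> list[str]:
    """Step 3: flag ideas whose count is at least `ratio` times the baseline count
    (a baseline of zero is treated as one, so brand-new ideas need `ratio` mentions)."""
    return sorted(
        concept
        for concept, n in current.items()
        if n >= ratio * max(baseline.get(concept, 0), 1)
    )

baseline = count_concepts(["app is slow", "refund please", "app is slow today"])
current = count_concepts(["app crash again", "crash on login", "refund please", "app is slow"])
print(surprising(current, baseline))
```

Note that step 3 is exactly where the human judgment in the last bullet comes in: the threshold only decides which numbers look "surprising," not whether they matter.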
Often, it seems desirable to integrate text analytics with business intelligence and/or predictive analytics tools that operate on tabular data. Even so, such integration is most commonly weak or nonexistent. Apart from the usual reasons for silos of automation, I blame this lack largely on a mismatch in precision. A 500% increase in mentions of a subject could be simple coincidence, or the result of a single identifiable press article. In comparison, a 5% increase in a conventional business metric might be much more important.
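To put rough numbers on the coincidence half of that precision problem, here is a sketch that assumes, purely for illustration, that counts follow a Poisson distribution. It estimates how often a rarely mentioned concept jumps from its usual one mention to six (+500%) by chance alone when many concepts are monitored, and how many standard deviations a 5% rise on a large base represents. The rates, the 10,000-concept figure, and the 100,000-event metric are made-up parameters; real business metrics have seasonality and other structure that a Poisson model ignores, and the "single press article" case is not modeled at all.

```python
import math

def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

def z_score(observed: float, expected: float) -> float:
    """Standardized deviation under a Poisson model, where variance equals the mean."""
    return (observed - expected) / math.sqrt(expected)

# A concept normally mentioned about once per week shows up 6 times (+500%).
p_spike = poisson_tail(6, 1.0)
# Expected number of such chance spikes if 10,000 similar concepts are monitored.
expected_false_alarms = 10_000 * p_spike

# A metric that normally counts about 100,000 events rises 5% to 105,000.
z_metric = z_score(105_000, 100_000)

print(f"P(>=6 | mean 1) = {p_spike:.5f}")
print(f"chance spikes among 10,000 concepts = {expected_false_alarms:.1f}")
print(f"z-score of a 5% rise on 100,000 = {z_metric:.1f}")
```

Under these assumptions, any single +500% spike is rare, but across thousands of low-frequency concepts a handful are expected every period, whereas the 5% rise on the large base sits roughly 16 standard deviations out.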
But in fairness, the text analytics innovation picture hasn’t been quite as bleak as what I’ve been painting so far.
Categories: Attensity, BI integration, Investment research and trading, SPSS, Text mining, Voice of the Customer
I emailed a bit with Olivier Jouve last week, and chatted with him at the Text Analytics Summit yesterday. He cited a figure of 2400 SPSS text mining users (unique user organizations). The majority of these are for a low-cost, desktop-based surveys product. But when I pressed him, he eventually gave a 500-1000 figure for actual Text Mining For Clementine users.
Steve Gallant of KXEN contacted me over the summer to show me KXEN’s new text mining capability. It was pretty basic bag-of-words stuff, which is still a lot better than nothing, and actually fits pretty well with KXEN’s general simplicity-centric strategy.
This inspired me to check whether there had been any big changes in text mining capabilities at SAS or SPSS. It turned out there hadn’t. SAS is also still on the bag-of-words level. SPSS, however, does do sentiment analysis (pretty obvious, considering their focus on surveys and the like) and negation.
Thanks go out to Mary Crissey and Olivier Jouve for getting back to me when I asked, along with apologies for taking a while to post what they told me.
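The difference between plain bag-of-words scoring and negation handling is easy to see in a toy example. The sketch below is not how SPSS (or any other vendor) actually implements it; it uses a hypothetical six-word lexicon and a common heuristic that flips the polarity of sentiment words appearing within a few tokens after a negator. The window size of 3 is an arbitrary assumption.

```python
# Hypothetical toy lexicon and negator list; real products use far larger resources.
LEXICON = {"good": 1, "great": 1, "helpful": 1, "bad": -1, "slow": -1, "rude": -1}
NEGATORS = {"not", "never", "no", "isn't", "wasn't", "don't"}

def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip trailing punctuation."""
    return [t.strip(".,!?") for t in text.lower().split()]

def bag_of_words_score(text: str) -> int:
    """Sum lexicon scores, ignoring word order entirely."""
    return sum(LEXICON.get(t, 0) for t in tokenize(text))

def negation_aware_score(text: str, window: int = 3) -> int:
    """Flip the polarity of sentiment words within `window` tokens after a negator."""
    score = 0
    since_negator = None  # tokens seen since the last negator, or None if none yet
    for tok in tokenize(text):
        if tok in NEGATORS:
            since_negator = 0
            continue
        value = LEXICON.get(tok, 0)
        if since_negator is not None and since_negator < window:
            value = -value
        score += value
        if since_negator is not None:
            since_negator += 1
    return score

for review in ["The agent was not helpful.", "Not bad at all!"]:
    print(review, bag_of_words_score(review), negation_aware_score(review))
```

Bag-of-words scores "not helpful" as positive and "not bad" as negative; the negation-aware version gets both the right way around, at the cost of a crude, tunable scope rule.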
If there was one theme to this year’s Text Analytics Summit, it’s “Voice of the Customer.” Attensity’s pre-conference press release was about a Voice of the Customer offering. Clarabridge’s sponsored user talk was about a Voice of the Customer app. SPSS’s marketing materials emphasized Voice of the Customer. Sentiment analysis and Web/blog scraping were frequently mentioned, in contexts such as “customer care,” “reputation management,” and/or “competitive intelligence.”
But above all, it was “Voice of the Customer.” I know it’s still June, but I think we have our text analytics industry buzzphrase of the year.
Categories: Attensity, Clarabridge, SPSS, Text Analytics Summit, Text mining, Voice of the Customer
I’m a huge fan of the idea that companies should deliberately capture as much information as possible for analysis. In the case of text, since I personally hate structured survey forms, I believe that free-form surveys have the potential to capture a lot more information than the traditional Procrustean abominations do. SPSS indicated that there’s indeed some activity in this regard.
I found another example.
One thing that didn’t go so well at the Text Analytics Summit was the marketing panel. Indeed, when we wracked our brains afterward, Mary Crissey (who was on the panel) and I could only think of a single observation that was actually made about marketing. Namely, she referred to a core truth of marketing: Just selling features doesn’t work (nobody cares). Just selling benefits doesn’t work (you’re not differentiated). What you have to do is sell the connection between your features and desirable benefits.
So I’m going to try to gather some useful observations on marketing here, filling the gap that the panel left. Key questions I’d love input on include:
1. Which feature-benefit connections do you see customers easily accepting?
2. Which feature-benefit connections is it harder to get them to believe?
3. How are customers defining text analytics market segments?
4. What do they see as the key issues in each segment?
5. Which application areas are showing growth even beyond that of the market overall?
I’m particularly interested in comments from the larger vendors that are selling into multiple parts of the text mining and text analytics market. But everybody else’s input would be warmly appreciated too.
The comment thread to this post is open for business!
One of the major factors driving successful use of advanced analytic tools is direct initiatives to procure more data. The single best example I can think of is the gaming industry’s use of otherwise-contrived loyalty cards; improved marketing based on that data at chains like Harrah’s seems to produce upwards of 100% of total profits.
So can we apply the same approach to text mining? One place would be surveys. Rather than those annoying, contrived forms demanding we fill in a lot of choices as if we were taking the SATs all over again, maybe users would be more revealing if they could just write whatever they wanted? The obvious firm to ask is SPSS, which is big both in surveys and text mining, not to mention the intersection of the two markets. So I emailed Olivier Jouve, and he shot back an answer from an airport.