It was tough to judge user demand at the recent Text Analytics Summit because, well, very few users showed up. And frankly, I wasn’t as aggressive at pumping vendors for trends as I sometimes am. That said, I have talked with most text analytics vendors recently,* and here are my impressions of what’s going on. Any contrary – or confirming! – opinions would be most welcome.
*Factiva is the most significant exception. Hint, hint.
If you think about it, text analytics is a “secret ingredient” in search, antispam, and data cleaning,* and this dominates all other uses of the technology. A significant minority of the research effort at companies that do any kind of text filtering is – duh – text analytics. Cold comfort for specialist text analytics vendors, to be sure, but that’s the way it is.
*I.e., part of the “T” in “ETL” (Extract/Transform/Load).
Text-analytics-enhanced custom publishing will surely at some point become a must-have for business and technical publishers. However, it appears that we’re not quite there yet, as large publishers make do with simple-minded search and the like. In what I suspect is a telling market commentary, there’s no headlong rush among vendors to dump text mining for custom publishing, notwithstanding the examples of nStein and (sort of) ClearForest. I don’t want to be overly negative – either my friends at Mark Logic are doing just fine or else they’re putting up a mighty brave front – but I don’t think the nonspecialist publishing market is there yet.
Two business publishers that have made major investments in owning text analytics technology are Dow Jones (now sole owner of Factiva) and Reuters (recent purchaser of ClearForest). Beyond that, however, I don’t yet see a lot of activity in the investor/trading market, although ClearForest reported some activity last year and StreamBase reports that one customer is using it for text filtering, presumably alongside the ticker-munching that traders usually use StreamBase for.
Obviously, the intelligence market is what fueled the start of the text analytics business, and still provides the majority of revenue at multiple companies. Certainly it’s still going strong. But it’s tough to gauge the growth potential from here, especially since the details of usage are typically classified.
Similar things could be said about pharmaceutical research. Text analytics is totally accepted in that market, but what’s the growth potential from here? And “here” isn’t actually very big (much smaller than intelligence). The related category of patient records analysis looks very promising, but is basically still at the research-project stage. (In general, an explosion in biological IT can be expected when research methods are adapted for clinical use.)
The warranty analysis market, so promising early on, is not showing a lot of growth and depth. The same thing has happened many times before with innovative technologies sold to manufacturing companies’ engineers. It seems to be happening again now.
Voice of the customer* is pretty much the same thing, but for service industries. And the text analytics market for VotC is evidently stronger right now than that for warranty analysis. This makes sense, because the obvious alternative to text analytics – multiple-choice coded forms – is less appealing, due to two application differences:
VotC looks for opinion as well as fact.
VotC looks for input from people under no obligation to share it, and who hence can’t be compelled to play along with a structured form – let alone trained to fill it in accurately.
*Definitional note: Voice of the customer is when customers or prospects communicate with you directly, e.g. via a survey form or an angry email. Reputation management is when you web-scrape and find out what they’re saying to everybody else. At least, I think marketers are still using the terms that way pretty consistently.
Reputation management is surely becoming a standard application for the biggest consumer brands. How deep that market turns out to be, however, remains to be seen.
Text analytics for fraud discovery seems poised to sweep the insurance industry, and then the rest of financial services. Current activity, however, while decent, still seems to consist of more poising than sweeping.
Compliance is a minimum-acceptable-efforts kind of activity in most markets. Accordingly, search/clustering seems to be the preferred text-checking approach. Where that’s not the case, the market seems to have gone to specialized products like Assentor (stock brokerage).
Human resources is a good area to sell follow-on applications, at least to enterprises with so many employees that they want to automate the reading of employee feedback. I’m not aware of it being the first-sale app to very many enterprises, however.
SAS used to speak glowingly of text mining used directly for ETL. However, nobody else has talked about this, and even from SAS I get the sense that some of the glow has worn off. As noted above, text analytics is an important ingredient in the transformation part of ETL, but I think it would rarely be the best option for doing the transformations directly.