Text mining is science-project artificial intelligence. Fiction. Text mining is proven in many practical applications.
To implement text mining, you need computational linguists. Fact. Monash’s Second Law of Commercial Semantics states “Where there are ontologies, there is consulting.” And it’s linguists, or reasonable facsimiles of same, who do the consulting.
To use text mining, you need computational linguists. Fiction. When last I counted, the number of known computational linguists working for end-user organizations, worldwide, was precisely 1, at Procter & Gamble. (Intelligence agencies excepted, of course.) I’d guess it’s higher now, but I probably could still count them all without taking my socks off.
CRM applications are driving the growth of text mining. Fact. Most current growth in text mining seems to come from Voice of the Customer and Voice of the Market/competitive intelligence applications. And a couple of years ago, when SAS and SPSS had a joint boom in text mining, a lot of that was coming from CRM.
Text mining products are useful mainly for large enterprises. More fact than fiction. Text mining makes the most sense when you have too much text for humans to read and summarize.
Text mining doesn’t fit well with relational databases. Fiction. The fastest-growing text mining companies seem to be Attensity and Clarabridge, both of which consistently extract textual information into relational databases.
Text mining imposes structure on unstructured* data. More fact than fiction. Most text mining applications involve examining free-text documents and creating entries in relational or XML databases. Most people would call that a transition from unstructured to structured form.
*I still don’t like the “structured/unstructured” distinction, but with repetition I’m getting somewhat inured to it.
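To make the "free text in, relational rows out" idea concrete, here is a minimal sketch in Python. It is not how Attensity, Clarabridge, or any other vendor actually works. The product list, the sentiment cue words, the `mentions` table, and the sample comments are all made up for illustration, and real products use far richer linguistic analysis than keyword matching.

```python
import re
import sqlite3

# Hypothetical vocabularies, invented for this sketch.
PRODUCTS = {"router", "modem", "app"}
SENTIMENT = {"love": "positive", "great": "positive",
             "broken": "negative", "slow": "negative"}

# Made-up customer comments keyed by document id.
COMMENTS = {
    1: "I love the new router. The app is slow though.",
    2: "My modem arrived broken!",
    3: "Great app. The router is slow.",
}


def extract_facts(doc_id, text):
    """Yield (doc_id, product, sentiment) for each sentence naming a product and a cue word."""
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = set(re.findall(r"[a-z]+", sentence))
        for product in sorted(words & PRODUCTS):
            for cue in sorted(words & set(SENTIMENT)):
                yield (doc_id, product, SENTIMENT[cue])


def load(comments):
    """Create an in-memory SQLite table and fill it with the extracted rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mentions (doc_id INTEGER, product TEXT, sentiment TEXT)"
    )
    for doc_id, text in comments.items():
        conn.executemany(
            "INSERT INTO mentions VALUES (?, ?, ?)", extract_facts(doc_id, text)
        )
    conn.commit()
    return conn


def sentiment_counts(conn):
    """Ordinary SQL over what started life as free text."""
    return conn.execute(
        "SELECT product, sentiment, COUNT(*) FROM mentions "
        "GROUP BY product, sentiment ORDER BY product, sentiment"
    ).fetchall()


if __name__ == "__main__":
    for row in sentiment_counts(load(COMMENTS)):
        print(row)
```

Once the rows are in a table, the rest is plain SQL and standard BI tooling, which is a large part of why the relational approach sells.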
Enterprise search is an alternative to text mining. Fact. You can use a high-end search engine to cluster documents and look for trends and insight. It’s not the real McCoy, but in some cases it gives you 80% of the benefit.
Text mining is an ingredient, not a product category. Part fact, part fiction. The biggest text mining efforts in the world are probably at Google, Yahoo, Microsoft search, and Dow Jones/Factiva. Antispam vendors also invest a lot in text mining. Two of the top five independent text mining vendors were acquired this year (ClearForest and Inxight). And of the many dozens of small text mining independents, most are focused on specific niches.
Even so, Attensity, Clarabridge, and Temis show that, at least for now, text mining remains a legitimate product category.
The text mining industry is in trouble. Part fact, part fiction. As I recently ranted, even the leading text mining vendors are letting many opportunities pass them by. And like many software sectors, text mining seems poised to be absorbed via large-company acquisition. SAP has already secured a text mining business via BOBJ/Inxight, and Oracle, Microsoft (despite the in-house expertise from its search arm), and IBM (despite or even in connection with UIMA) could each easily buy a vendor of their own.
But in the meantime, a few small text mining vendors are still showing rapid growth.
Previous “fact and fiction” post: Data warehouse appliances.