December 23, 2007

Text mining – fact and fiction

Text mining is science-project artificial intelligence. Fiction. Text mining is proven in many practical applications.

To implement text mining, you need computational linguists. Fact. Monash’s Second Law of Commercial Semantics states “Where there are ontologies, there is consulting.” And it’s linguists, or reasonable facsimiles of same, who do the consulting.

To use text mining, you need computational linguists. Fiction. When last I counted, the number of known computational linguists working for end-user organizations, worldwide, was precisely 1, at Procter & Gamble. (Intelligence agencies excepted, of course.) I’d guess it’s higher now, but I probably could still count them all without taking my socks off.

CRM applications are driving the growth of text mining. Fact. Most current growth in text mining seems to come from Voice of the Customer and Voice of the Market/competitive intelligence applications. And a couple of years ago, when SAS and SPSS had a joint boom in text mining, a lot of that was coming from CRM.

Text mining products are useful mainly for large enterprises. More fact than fiction. Text mining makes the most sense when you have too much text for humans to read and summarize.

Text mining doesn’t fit well with relational databases. Fiction. The fastest-growing text mining companies seem to be Attensity and Clarabridge, both of which consistently extract textual information into relational databases.

Text mining imposes structure on unstructured* data. More fact than fiction. Most text mining applications involve examining free-text documents and creating entries in relational or XML databases. Most people would call that a transition from unstructured to structured form.

*I still don’t like the “structured/unstructured” distinction, but with repetition I’m getting somewhat inured to it.
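For readers who want to see what that unstructured-to-structured transition looks like in miniature, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the sample comments, the tiny keyword lists, the `facts` table, and the function names are all invented, and simple keyword matching stands in for the real linguistic extraction a commercial product would do. The only point is the shape of the pipeline: free text goes in, relational rows come out, and ordinary SQL takes over from there.

```python
import re
import sqlite3

# Hypothetical customer comments, invented for this sketch.
COMMENTS = [
    (1, "The battery died after two days. Support was helpful though."),
    (2, "Great screen, but the battery is terrible."),
    (3, "Shipping was slow and the screen arrived cracked."),
]

# Toy keyword lists standing in for a real linguistic extraction engine.
TOPICS = ("battery", "screen", "shipping", "support")
NEGATIVE = {"died", "terrible", "slow", "cracked"}
POSITIVE = {"great", "helpful"}


def extract_facts(doc_id, text):
    """Yield (doc_id, topic, sentiment) rows, one per topic mention per clause."""
    # Split into rough clauses so a sentiment word attaches to a nearby topic.
    for clause in re.split(r"[.,;!?]| but | and ", text.lower()):
        words = set(re.findall(r"[a-z]+", clause))
        for topic in TOPICS:
            if topic in words:
                if words & NEGATIVE:
                    sentiment = "negative"
                elif words & POSITIVE:
                    sentiment = "positive"
                else:
                    sentiment = "neutral"
                yield (doc_id, topic, sentiment)


def build_fact_table(comments):
    """Load the extracted rows into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (doc_id INTEGER, topic TEXT, sentiment TEXT)")
    for doc_id, text in comments:
        conn.executemany(
            "INSERT INTO facts VALUES (?, ?, ?)", extract_facts(doc_id, text)
        )
    return conn


def negative_counts(conn):
    """Count negative mentions per topic with plain SQL."""
    return conn.execute(
        "SELECT topic, COUNT(*) FROM facts WHERE sentiment = 'negative' "
        "GROUP BY topic ORDER BY COUNT(*) DESC, topic"
    ).fetchall()


if __name__ == "__main__":
    for topic, count in negative_counts(build_fact_table(COMMENTS)):
        print(topic, count)
```

Once the rows exist, nothing downstream needs to know they came from text, which is why this style of extraction sits comfortably alongside conventional reporting and BI tools.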

Enterprise search is an alternative to text mining. Fact. You can use a high-end search engine to cluster documents and look for trends and insight. It’s not the real McCoy, but in some cases it gives you 80% of the benefit of the real thing.

Text mining is an ingredient, not a product category. Part fact, part fiction. The biggest text mining efforts in the world are probably at Google, Yahoo, Microsoft search, and Dow Jones/Factiva. Antispam vendors also invest a lot in text mining. Two of the top five independent text mining vendors were acquired this year (ClearForest and Inxight). And of the many dozens of small text mining independents, most are focused on specific niches.

Even so, Attensity, Clarabridge, and Temis show that, at least for now, text mining remains a legitimate product category.

The text mining industry is in trouble. Part fact, part fiction. As I recently ranted, even the leading text mining vendors are letting many opportunities pass them by. And like many software sectors, text mining seems poised to be absorbed via large-company acquisition. SAP has already secured a text mining business via BOBJ/Inxight, and Oracle, Microsoft (despite the in-house expertise from its search arm), and IBM (despite or even in connection with UIMA) could each easily buy at least one vendor.

But in the meantime, a few small text mining vendors are still showing rapid growth.

Previous “fact and fiction” post: Data warehouse appliances.

5 Responses to “Text mining – fact and fiction”

  1. Bill Burke on December 24th, 2007 7:21 pm

    text-mining isn’t really in trouble, it’s just evolving!

    Bill Burke

  2. DBMS2 — DataBase Management System Services » Blog Archive » Data warehouse appliances – fact and fiction on April 25th, 2008 12:09 am

    […] If you liked this post, you might also like one on text mining fact and fiction. […]

  3. Infology.Ru » Blog Archive » Data warehouse appliances – fact and fiction on August 19th, 2008 2:36 pm

    […] If you liked this post, you might also like the post on text mining fact and fiction. […]

  4. Three broad categories of data | DBMS2 -- DataBase Management System Services on January 17th, 2010 11:32 am

    […] gray area lies in text that gets linguistically processed – i.e. via text-mining tools – with the output placed into a relational database. Well, let’s just say no taxonomy […]

  5. Monash’s First Law of Commercial Semantics explained | Strategic Messaging on April 9th, 2011 5:17 am

    […] way, Monash’s Second Law of Commercial Semantics is much more technologically oriented:   Where there are ontologies, there is consulting. I first said that at the Text Mining Summit, and it seemed to win immediate, widespread […]
