June 26, 2006

Scoping the text mining market

Another Text Analytics/Mining Summit, another occasion to discuss text mining market numbers. Except — it’s really hard to get any specifics. Before writing this post, I decided to web search on text mining market to see if anybody had posted anything about its size or growth. The first and pretty much only relevant hit I could find was my own blog post of a year ago, reproduced below. Oh dear.

Susan Feldman of IDC probably has better numbers than I do, but she correctly points out how unreliable they are. Thus, all she’ll say is that the vastly bigger market of text/content-related stuff is growing at a very healthy 35% clip. However, that doesn’t say much about the small text mining segment. Also confounding the issue:

One data point that arose at the Text Analytics Summit was that a typical leading text mining company gets only 25% or so of its revenue from professional services, down from 50%+ three or so years ago. However, there’s no assurance that professional services revenues have been growing much, and hence this doesn’t tell us much about license fee growth besides the obvious point that it’s probably 20%+.
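To see how little the services-share figure pins down, here is a rough Python sketch of the arithmetic. It assumes revenue splits cleanly into professional services and license fees, and it takes the 50% and 25% shares and the three-year span at face value. The services growth rate is a hypothetical input, since nobody quoted one, and the function name is mine.

```python
def implied_license_growth(old_services_share, new_services_share, years,
                           services_growth=0.0):
    """Annual license-fee growth implied by a falling services share.

    Assumes total revenue = services + license fees, and that services
    revenue grows at `services_growth` per year (a hypothetical input;
    the post gives no figure for it).
    """
    # Normalize starting services revenue to 1.0.
    old_license = (1 - old_services_share) / old_services_share
    new_services = (1 + services_growth) ** years
    new_license = new_services * (1 - new_services_share) / new_services_share
    # Convert the multi-year license ratio into a compound annual rate.
    return (new_license / old_license) ** (1 / years) - 1


# Services share falling from 50% to 25% over three years,
# under a few guessed services growth rates.
for g in (-0.15, 0.0, 0.10):
    rate = implied_license_growth(0.50, 0.25, 3, services_growth=g)
    print(f"services growth {g:+.0%} -> license growth {rate:.0%}")
```

With flat services revenue, license fees would have tripled in three years, which works out to roughly 44% a year. Every point of services growth or shrinkage moves the answer with it, which is why the share figure alone says so little.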

Bottom line: The text mining market has roughly $50-100 million annual product revenue, and is growing at roughly 40-60% annually. If those numbers aren’t accurate, they’re close enough for most purposes for which you’d need market statistics. And please don’t ask me to show the work on which those numbers are based.

And here’s what I said last year:

I vigorously resist estimating market sizes, due to multiple levels of definitional problems: what products are in the market, which revenue dollars should be associated with which product, etc. But I’ve been talking so much about text mining recently, in the aftermath of an excellent text mining conference, that questions on the subject keep getting posed to me. So here are a few thoughts and data points.

  • I estimate that SPSS and SAS have several hundred customers each for text data mining, narrowly construed.
  • In addition, SPSS has many hundreds more customers for text mining as specifically applied to opinion surveys, and a bunch more text mining customers that don’t fit neatly into either of the first two groups I cited. Based on this, they have a compelling claim to be the text mining market leader.
  • As a wild guess, I estimate that Oracle has dozens of text mining customers in total, not counting text mining done by other vendors against data in Oracle databases.
  • The leaders among the specialist text mining vendors seem to have a few dozen customers each. Inxight is a special case because they OEM technology to lots of other search and text mining vendors, including SAS.
  • As noted in the post linked above, medical-discovery text mining is around a $10 million market, which isn’t a lot given the large amount of smart and important work being done in the area.

I think these numbers will get a lot bigger soon. Text mining is a very hot area.


2 Responses to “Scoping the text mining market”

  1. mary grace crissey on July 4th, 2006 5:22 pm

    One of the reasons we find so few market sizing figures for text mining, or text analytic technologies, is that it’s hard to “draw the line” around this field.

    I’ve seen text analytical tools lumped in with
    • the “content management – Information Management” software technologies
    • others see Text analytics as one form of analysis to add to predictive analytics and data mining suites
    • others focus on the linguistics and semantics
    • Others toss it in as a BI enhancement and insist on reporting TM in with the entire BI area as one metric.
    • Perhaps we can call TM a form of Artificial Intelligence – you can see evidence of this as text applications are showing up in ACM conferences and research institutions around the world

    So before we lament the lack of revenue numbers for this emerging field, let’s begin by building awareness, especially in the IT communities, of what Text Mining is, and start to highlight what is and is not text mining.

    Setting boundaries is especially tricky when you try to determine the fair share of revenue coming from “turbo charged” solutions. By this I mean those specific industry-focused implementations, such as warranty, that are growing by tremendous leaps and bounds now that text analytics has been added to the “engine.”

  2. Curt Monash on July 6th, 2006 10:58 pm

    Good points all, Mary. Although the definitional wars for “text mining” may be more trouble than they’re worth.

    Maybe we should collect some proof points, like “so-and-so many customers for flavor X of app, and so-and-so many for flavor Y.” Actually, I suspect there are a number of subcategories in which SAS is the actual leader, exceeding even SPSS.

    Curt Monash
