Text analytics vendors participate in the same trends as other software and technology vendors. For example, relational business intelligence and data warehousing products are increasingly being sold to departmental buyers. Those buyers place particularly high value on ease of installation. And golly gee whiz, both parts of that are also true in text mining.
But beyond such general trends, I’ve identified six developments that I think could radically transform the text analytics market landscape. Indeed, they could invalidate the neat little eight-bucket categorization I laid out in the prior post. Each is highly likely to occur, although in some cases the timing remains greatly in doubt.
These six market-transforming trends are:
- Web/enterprise/messaging integration
- BI integration
- Universal message retention
- Portable personal profiles
- Electronic health records
- Voice command & control
I’ll explain briefly.
1. Google and Microsoft are two of the three leaders in web search. Now that Microsoft has bought FAST, they are also two of the leaders in enterprise search. They are also two of the leaders in hosted email. Ditto instant messaging. So there’s a good chance these various disciplines will converge.
2. There are a number of ways text analytics and traditional analytics can be and are being integrated:
- Enterprise search and business intelligence are akin; both involve digging information out of the data you already have.
- Text mining is naturally integrated with business intelligence and/or data mining.
- There’s a trend toward using text search to dig up business intelligence documents such as specific reports, spreadsheets, etc.
To date, the last of these has focused on reports that already exist, rather than queries that could be run on the fly, but I hope and trust the technology will be extended over time. Natural language queries have merit anyway; I’d like to see the search box extended into a true data-retrieval command line.
3. One of the big purchase drivers for storage, search, and clustering technology is the mandate to preserve information and make it available to auditors, regulators, and/or people who want to sue you. Email in particular is changing from being ephemeral to becoming part of the permanent record. Well, if the information is being retained anyway, then maybe it’s time to see how to get useful insight from it.
Right now, a company’s overall text archives aren’t being leveraged in the same way data warehouses are. That will change.
4. For over a decade, online companies have fought to exploit the fact that users were registered with their sites or services, but not with others. Huge amounts of investment money were wasted in the dot-com bubble because people thought “registered users” was a significant metric, or that ISP subscribers could be directed to proprietary content. Enormous valuations are being assigned to Facebook and LinkedIn on similar theories today.
But as site owners and other marketers get ever more aggressive about exploiting user-specific information, users will get ever more sophisticated about controlling it. The obvious solution is for each internet user to control a sophisticated database of their contact information, presence information, actions, preferences, and writings, and to be very selective about which online services are allowed to see which portions of the data. I think that will come about some day, but I don’t know when. When it does, text analytics will be affected in a variety of interesting ways.
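To make the idea a little more concrete, here is a minimal sketch of a user-controlled profile with per-service visibility. It is purely illustrative: the `PersonalProfile` class, its method names, and the section and service names are all my own assumptions, not any existing standard or product.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class PersonalProfile:
    """Hypothetical user-owned profile; section names are illustrative."""
    sections: Dict[str, Any]
    # Maps a service name to the set of sections it is allowed to read.
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, service: str, *names: str) -> None:
        """Allow a service to see the named sections."""
        self.grants.setdefault(service, set()).update(names)

    def revoke(self, service: str, *names: str) -> None:
        """Withdraw access to the named sections from a service."""
        self.grants.get(service, set()).difference_update(names)

    def view_for(self, service: str) -> Dict[str, Any]:
        """Return only the sections this service has been granted."""
        allowed = self.grants.get(service, set())
        return {k: v for k, v in self.sections.items() if k in allowed}


profile = PersonalProfile(sections={
    "contact": {"email": "user@example.com"},
    "preferences": {"language": "en"},
    "writings": ["blog post draft"],
})
profile.grant("social-site", "contact", "preferences")
profile.grant("bookstore", "preferences")
profile.revoke("social-site", "contact")
```

The point of the design is that the default is to show nothing: a service that has never been granted anything gets an empty view, and the user can narrow access later without the service's cooperation.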
5. Electronic health records are almost unique in IT. What other enterprise app can you think of for which relational DBMS aren’t the default underpinning? (InterSystems’ object-oriented DBMS Caché has huge share in the clinical records market.) Normal tabular data, text, images, sensor output streams – health records have it all. What’s more, the health records area is coming upon some very interesting times in the area of data sharing, at least in the US.
Just as retailing went from being an IT backwater (through the mid-1980s), to a sophisticated user of database technology (1990s), to the leader of the internet revolution (rise of e-commerce), I think health care is due to take a leadership role in IT advances. And when it does, search, text mining, and voice recognition will all play important roles.
6. Most people reading this far have probably watched Star Trek. Well, what is keeping us from being able to command computers in a Star Trek fashion? Not really that much. Sure, there are some big missing pieces. We need a mapping from commands to the specific applications that would carry them out. We also need a more structured kind of analytic middle tier so that there’s something to map questions to. But those are solvable problems. And by the way – when everybody wears headphones, voice commands emanating from the next cubicle are no longer the big annoyance they would be today. Mobile/small devices only add to the business case for voice recognition advances.
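The "mapping from commands to applications" piece can be sketched very roughly as a routing table. What follows is a toy illustration under heavy assumptions: it starts from already-transcribed text, it uses simple regular expressions rather than real language understanding, and the handler names and the `analytics.query` / `mail.compose` strings are hypothetical stand-ins for whatever applications would actually carry out the command.

```python
import re

ROUTES = []  # list of (compiled pattern, handler) pairs


def command(pattern):
    """Register a handler for utterances matching the given pattern."""
    def register(handler):
        ROUTES.append((re.compile(pattern, re.IGNORECASE), handler))
        return handler
    return register


@command(r"show (?P<metric>\w+) for (?P<region>\w+)")
def show_metric(metric, region):
    # Hypothetical hand-off to an analytic middle tier.
    return f"analytics.query(metric={metric!r}, region={region!r})"


@command(r"email (?P<person>\w+)")
def start_email(person):
    # Hypothetical hand-off to a mail application.
    return f"mail.compose(to={person!r})"


def dispatch(utterance):
    """Route a transcribed utterance to the first matching handler."""
    text = utterance.strip()
    for pattern, handler in ROUTES:
        match = pattern.fullmatch(text)
        if match:
            return handler(**match.groupdict())
    return None  # no application claims this command


result = dispatch("Show revenue for Europe")
```

Anything that matches no registered pattern simply returns `None`, which is where a real system would need the more structured middle tier to take over.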
When voice becomes a primary mode of human/device communication, “text” analytics will be affected in any number of ways.
- The introductory post in this series
- 19 possible Microsoft/Yahoo synergies, many of them related to text technology convergence, e.g. between web search and enterprise search
- The compelling case for letting Google handle your enterprise email
- An old post on why BI vendors flocked to integrate with Google OneBox
- A proposal to refactor social networks
- An old post in which I outlined some of the criteria for Profiles 2.0
- Why text technologies are going to recombine (in A World of Bytes)