Discussion of efforts to integrate text analytics with business intelligence and other analytic technologies.
Text analytics application areas typically fall into one or more of three broad, often overlapping domains:
- Understanding the opinions of customers, prospects, or other groups. This can be based on any combination of documents the user organization controls (email, surveys, warranty reports, call center logs, etc.) or public-domain documents such as blogs, forum posts, and tweets. The former is usually called Voice of the Customer (VotC), while the latter is Voice of the Market (VotM).
- Detecting and identifying problems. This can happen across many domains — VotC, VotM, diagnosing equipment malfunctions, identifying bad guys (from terrorists to fraudsters), or even getting early warnings of infectious disease outbreaks.
- Aiding text search, custom publishing, and other electronic document-shuffling use cases, often via document augmentation.
For several years, I’ve been distressed at the lack of progress in text analytics or, as it used to be called, text mining. Yes, the rise of sentiment analysis has been impressive, and higher volumes of text data are being processed than were before. But otherwise, there’s been a lot of the same old, same old. Most actual deployed applications of text analytics or text mining go something like this:
- A bunch of documents are analyzed to ascertain the ideas expressed in them.
- A count is made as to how many times each idea turns up.
- The application user notices any surprisingly large numbers, and as a result pays attention to the corresponding ideas.
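The three steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the keyword matching stands in for whatever linguistic analysis a real product performs, and the names `count_ideas`, `surprising_ideas`, `IDEA_TERMS`, the sample documents, and the `ratio` and `min_count` thresholds are all hypothetical.

```python
from collections import Counter


def count_ideas(documents, idea_terms):
    """Count how many documents mention each idea (naive keyword match)."""
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        for idea, terms in idea_terms.items():
            # A document counts once per idea, however many terms it matches.
            if any(term in lowered for term in terms):
                counts[idea] += 1
    return counts


def surprising_ideas(current, baseline, ratio=3.0, min_count=3):
    """Return ideas whose current count is at least `ratio` times the baseline."""
    flagged = []
    for idea, n in current.items():
        previous = baseline.get(idea, 0)
        # max(previous, 1) avoids treating every brand-new idea as an infinite jump.
        if n >= min_count and n >= ratio * max(previous, 1):
            flagged.append(idea)
    return sorted(flagged)


# Hypothetical idea dictionary and documents, for illustration only.
IDEA_TERMS = {
    "battery": ["battery", "charge"],
    "shipping": ["shipping", "delivery"],
}

last_week = ["Delivery was late.", "Battery is fine."]
this_week = [
    "The battery died after a day.",
    "Battery will not hold a charge.",
    "Shipping was quick.",
    "Another battery failure.",
]

baseline = count_ideas(last_week, IDEA_TERMS)
current = count_ideas(this_week, IDEA_TERMS)
flagged = surprising_ideas(current, baseline)
print(flagged)
```

Even in this toy form, the output is just a list of ideas to go look at; nothing ties the counts to any conventional business metric.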
Often, it seems desirable to integrate text analytics with business intelligence and/or predictive analytics tools that operate on tabular data. Even so, such integration is most commonly weak or nonexistent. Apart from the usual reasons for silos of automation, I blame this lack on a mismatch in precision, among other reasons. A 500% increase in mentions of a subject could be simple coincidence, or the result of a single identifiable press article. In comparison, a 5% increase in a conventional business metric might be much more important.
But in fairness, the text analytics innovation picture hasn’t been quite as bleak as what I’ve been painting so far.
Categories: Attensity, BI integration, Investment research and trading, SPSS, Text mining, Voice of the Customer
The newsletter/column excerpted below was originally published in 1998. Some of the specific references are obviously very dated. But the general points about the requirements for successful natural language computer interfaces still hold true. Less progress has been made in the intervening decade-plus than I would have hoped, but some recent efforts — especially in the area of search-over-business-intelligence — are at least mildly encouraging. Emphasis added.
Natural language computer interfaces were introduced commercially about 15 years ago*. They failed miserably.
*I.e., the early 1980s
For example, Artificial Intelligence Corporation’s Intellect was a natural language DBMS query/reporting/charting tool. It was actually a pretty good product. But it’s infamous among industry insiders as the product for which IBM, in one of its first software licensing deals, got about 1700 trial installations — and less than a 1% sales close rate. Even its successor, Linguistic Technologies’ English Wizard*, doesn’t seem to be attracting many customers, despite consistently good product reviews.
*These days (i.e., in 2009) it’s owned by Progress and called EasyAsk. It still doesn’t seem to be selling well.
Another example was HAL, the natural language command interface to 1-2-3. HAL is the product that first made Bill Gross (subsequently the founder of Knowledge Adventure and idealab!) and his brother Larry famous. However, it achieved no success*, and was quickly dropped from Lotus’ product line.
*I loved the product personally. But I was sadly alone.
In retrospect, it’s obvious why natural language interfaces failed. First of all, they offered little advantage over the forms-and-menus paradigm that dominated enterprise computing in both the online-character-based and client-server-GUI eras. If you couldn’t meet an application need with forms and menus, you couldn’t meet it with natural language either.
Categories: BI integration, IBM and UIMA, Language recognition, Natural language processing (NLP), Progress and EasyAsk, Search engines, Speech recognition
Late last year, there was a little flap about who invented the phrase business intelligence. Credit turns out to go to an IBM researcher named H. P. Luhn, as per this 1958 paper. Well, I finally took a look at the paper, after Jeff Jones of IBM sent over another copy. And guess what? It’s all about text analytics. Specifically, it’s about what we might now call a combination of classification and knowledge management.
Half a century later, the industry is finally poised to deliver on that vision.
Text analytics vendors participate in the same trends as other software and technology vendors. For example, relational business intelligence and data warehousing products are increasingly being sold to departmental buyers. Those buyers place particularly high value on ease of installation. And golly gee whiz, both parts of that are also true in text mining.
But beyond such general trends, I’ve identified six developments that I think could radically transform the text analytics market landscape. Indeed, they could invalidate the neat little eight-bucket categorization I laid out in the prior post. Each is highly likely to occur, although in some cases the timing remains greatly in doubt.
These six market-transforming trends are:
- Web/enterprise/messaging integration
- BI integration
- Universal message retention
- Portable personal profiles
- Electronic health records
- Voice command & control
Categories: BI integration, Enterprise search, Google, Microsoft, Search engines, Social software and online media, Text mining
As I see it, there are eight distinct market areas that each depend heavily on linguistic technology. Five are off-shoots of what used to be called “information retrieval”:
1. Web search
2. Public-facing site search
3. Enterprise search and knowledge management
4. Custom publishing
5. Text mining and extraction
Three are more standalone:
6. Spam filtering
7. Voice recognition
8. Machine translation
Attivio is having a house party and product rollout in the latter part of January, and details are scarce in the meantime. But here are some highlights.
- Attivio was founded in August. It has 21 people and 1 VC. The VC has invested >$6 million and committed >$12 million total.
- Attivio has ambitious plans for a fully integrated data management/real-time BI stack. It’s currently called the “Active Intelligence Engine.”
Categories: Attivio, BI integration, Investment research and trading, Lucene, Open source text analytics
I just had a quick chat with text mining vendor Clarabridge’s CEO Sid Banerjee. Naturally, I asked the standard “So who are you seeing in the marketplace the most?” question. Attensity is unsurprisingly #1. What’s new, however, is that Inxight – heretofore not a text mining presence vs. commercially-focused Clarabridge – has begun to show up a bit this quarter, via the Business Objects sales force. Sid was of course dismissive of their current level of technological readiness and integration – but at least BOBJ/Inxight is showing up now.
The most interesting point was text mining SaaS (Software as a Service). When Clarabridge first put out its “We offer SaaS now!” announcement, I yawned. But Sid tells me that about half of Clarabridge’s deals now are actually SaaS. The way the SaaS technology works is pretty simple. The customer gathers together text into a staging database – typically daily or weekly – and it gets sucked into a Clarabridge-managed Clarabridge installation in some high-end SaaS data center. If there’s a desire to join the results of the text analysis with some tabular data from the client’s data warehouse, the needed columns get sent over as well. And then Clarabridge does its thing.
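To make the "join the results of the text analysis with some tabular data" step concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not Clarabridge's implementation or schema. The table names `text_findings` and `customers`, their columns, and the sample rows are all assumptions made up for illustration.

```python
import sqlite3

# In-memory database standing in for the hosted installation (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical output of text analysis over the staged documents.
    CREATE TABLE text_findings (customer_id INTEGER, topic TEXT, sentiment TEXT);
    -- Hypothetical columns sent over from the client's data warehouse.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT);
""")

conn.executemany(
    "INSERT INTO text_findings VALUES (?, ?, ?)",
    [
        (1, "billing", "negative"),
        (2, "billing", "negative"),
        (3, "support", "negative"),
        (1, "support", "positive"),
    ],
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "enterprise"), (2, "enterprise"), (3, "smb")],
)

# Join text-derived findings to tabular attributes: negative mentions per segment and topic.
rows = conn.execute("""
    SELECT c.segment, f.topic, COUNT(*) AS mentions
    FROM text_findings AS f
    JOIN customers AS c ON c.customer_id = f.customer_id
    WHERE f.sentiment = 'negative'
    GROUP BY c.segment, f.topic
    ORDER BY mentions DESC, c.segment, f.topic
""").fetchall()

for segment, topic, mentions in rows:
    print(segment, topic, mentions)
```

The point of the sketch is only that once text findings carry a shared key such as a customer ID, only the handful of warehouse columns needed for the join has to travel with the text.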
Categories: BI integration, Clarabridge, Comprehensive or exhaustive extraction, IBM and UIMA, Software as a Service (SaaS), Text mining, Text mining SaaS
Today’s big news is IBM’s $5 billion acquisition of Cognos. Part of the analyst conference call was two customer examples of how the companies had worked together in the past — and one of those two had a lot of “integration of structured and unstructured data.” The application sounded more like a 360-degree customer view, retrieving text documents alongside relational records, than it did like hardcore text analytics. Even so, it illustrates a trend that I was seeing even before BOBJ’s buy of Inxight, namely an increasing focus in the business intelligence world on at least the trappings of text analytics.
I’m at the Business Objects annual user conference, and had a couple of chances to talk with Inxight/text analytics folks. When I asked about areas of commercial application traction, answers were similar to those I got from Attensity and Clarabridge, but not quite the same. Specifically:
- Voice of the Customer is definitely tops.
- Some of the other applications Attensity and Clarabridge mentioned appear as well (e.g., antifraud).
- Business Objects also has a couple of customers looking at text mining as an aid to medical records, e.g. by helping to catch errors in tabular-field coding.
- There are some projects in actual investment research/analysis/trading, e.g. in correlating news announcements and stock price movements.
The Business Objects/Inxight folks also made a couple of interesting general technical points.
Categories: Application areas, BI integration, Business Objects and Inxight, Investment research and trading, Voice of the Customer
More precisely, SAP is acquiring Business Objects, and of course Business Objects already acquired Inxight.
This could be interesting …