Text mining

Analysis of text mining companies, technology, and trends.

December 1, 2010

The state of the art in text analytics applications

Text analytics application areas typically fall into one or more of three broad, often overlapping domains:

Understanding the opinions of customers, prospects, or other groups. This can be based on any combination of documents the user organization controls (email, surveys, warranty reports, call center logs, etc.) or public-domain documents such as blogs, forum posts, and tweets. The former is usually called Voice of the Customer (VotC), while the latter is Voice of the Market (VotM).
Detecting and identifying problems. This can happen across many domains — VotC, VotM, diagnosing equipment malfunctions, identifying bad guys (from terrorists to fraudsters), or even getting early warnings of infectious disease outbreaks.
Aiding text search, custom publishing, and other electronic document-shuffling use cases, often via document augmentation.

For several years, I’ve been distressed at the lack of progress in text analytics or, as it used to be called, text mining. Yes, the rise of sentiment analysis has been impressive, and higher volumes of text data are being processed than were before. But otherwise, there’s been a lot of the same old, same old. Most actual deployed applications of text analytics or text mining go something like this:

A bunch of documents are analyzed to ascertain the ideas expressed in them.
A count is made as to how many times each idea turns up.
The application user notices any surprisingly large numbers, and as a result pays attention to the corresponding ideas.
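
To make that three-step pattern concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the keyword lists stand in for a real concept extractor, and the names `CONCEPT_KEYWORDS`, `extract_concepts`, `count_concepts`, and `surprising`, along with the thresholds, are hypothetical rather than taken from any vendor's product.

```python
from collections import Counter

# Hypothetical keyword lists standing in for a real concept extractor.
CONCEPT_KEYWORDS = {
    "battery": ("battery", "charge"),
    "shipping": ("shipping", "delivery"),
    "price": ("price", "cost"),
}


def extract_concepts(document):
    """Step 1: return the set of concepts whose keywords appear in a document."""
    text = document.lower()
    return {
        concept
        for concept, keywords in CONCEPT_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }


def count_concepts(documents):
    """Step 2: count how many documents mention each concept."""
    counts = Counter()
    for document in documents:
        counts.update(extract_concepts(document))
    return counts


def surprising(current, baseline, ratio=3.0, min_count=3):
    """Step 3: flag concepts whose count is at least `ratio` times the baseline.

    The ratio and min_count thresholds are arbitrary illustrative choices.
    """
    flagged = []
    for concept, n in current.items():
        base = max(baseline.get(concept, 0), 1)  # avoid dividing by zero
        if n >= min_count and n / base >= ratio:
            flagged.append(concept)
    return sorted(flagged)


# Made-up example documents for two periods.
last_week = [
    "The price is too high",
    "Shipping was slow",
    "Battery died after a day",
]
this_week = [
    "Battery will not charge",
    "Battery drains fast",
    "battery swelled up",
    "Delivery was late",
    "The battery is dead",
    "Cost is fine",
]

print(surprising(count_concepts(this_week), count_concepts(last_week)))
```

In a real deployment the first step would be linguistic extraction rather than substring matching, but the counting and the eyeballing of outliers look much the same.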

Often, it seems desirable to integrate text analytics with business intelligence and/or predictive analytics tools that operate on tabular data. Even so, such integration is most commonly weak or nonexistent. Apart from the usual reasons for silos of automation, I blame this lack on a mismatch in precision, among other reasons. A 500% increase in mentions of a subject could be simple coincidence, or the result of a single identifiable press article. In comparison, a 5% increase in a conventional business metric might be much more important.

But in fairness, the text analytics innovation picture hasn’t been quite as bleak as what I’ve been painting so far.

Categories: Attensity, BI integration, Investment research and trading, SPSS, Text mining, Voice of the Customer

October 24, 2010

Notes, links, and comments, October 24, 2010

Time for a notes/links/comments post just for Text Technologies.

Categories: Blogosphere, Online media, Sentiment analysis, Social software and online media, Text mining

Maybe text mining SHOULD be playing a bigger role in data warehousing

When I chatted last week with David Bean of Attensity, I commented to him on a paradox:

Many people think text information is important to analyze, but even so data warehouses don’t seem to wind up holding very much of it.

Categories: Attensity, Comprehensive or exhaustive extraction, Sentiment analysis, Text mining

October 24, 2008

Attensity update

I had a brief chat with the Attensity guys at their Teradata Partners Conference booth – mainly CTO David Bean, although he did buck one question to sales chief Jeff Johnson. The business trends story remained the same as it was in June: The sweet spot for new sales remains Voice of the Customer/Voice of the Market, while on-premise/SaaS new-name accounts are split around 50-50 (by number, not revenue).

David’s thoughts as to why the SaaS share isn’t even higher – as it seems to be for Clarabridge* – centered on the point that some customers want to blend internal and external data, and may not want to ship the internal part out to a SaaS provider. Besides, if it’s tabular data, I suspect Attensity isn’t the right place to ship it anyway.

*Speaking of Clarabridge, CEO Sid Banerjee recently posted a thoughtful company update in this comment thread.

When I challenged him on ease of use, David said that Attensity is readying a MicroStrategy-based offering, which is obviously meant to compete with Clarabridge and any of its perceived advantages head-on.

Categories: Application areas, Attensity, Clarabridge, Competitive intelligence, Software as a Service (SaaS), Text mining, Text mining SaaS, Voice of the Customer

September 19, 2008

Low-latency text mining in the investment market

I’m not at Gartner’s Event Processing conference, but there seem to be some interesting posts and articles coming out of it. Seth Grimes has one on Reuters’ integration of text mining and event processing, including sentiment analysis. Well worth reading. Lots more detail than I’ve ever posted on similar applications.

Categories: ClearForest/Reuters, Investment research and trading, Sentiment analysis, Text mining

September 8, 2008

The layered messaging marketing model as applied to Attensity

My general layered messaging theory survived its first test against an IT vendor example – Netezza. Let’s try another, in this case a company that’s not a Monash Research client.

Categories: Attensity, Competitive intelligence, Text mining, Voice of the Customer

August 7, 2008

Lexalytics has merged with part of Infonic

As reported on the Lexalytics blog, sentiment analysis specialist Lexalytics has merged with the text analytics division of Infonic to form Lexalytics Limited. The deal seems to have a screwy financial structure — which Seth Grimes made a valiant effort to decipher (I think from vacation, poor guy) — as is common when companies much too small to be public wind up trading publicly anyway.

Categories: Lexalytics, Sentiment analysis

If you think sentiment analysis technology can detect idiom, I have a bridge I’d like to sell you

Text mining tools are just WONDERFUL at detecting idiom, sarcasm, and figurative speech … Yeah, right. I asked Lexalytics CEO Jeff Catlin whether his tool could do that kind of thing, and he looked at me like I’d just grown a third ear.

Actually, he didn’t. But just like every other sentiment analysis vendor I encountered at the Text Analytics Summit or spoke to beforehand, he made it clear that his tool could only handle straightforward, literal expressions of opinion. Idiom, irony, sarcasm, metaphor, et al. are beyond the current reach of the technology.

Aren’t you just thrilled that I shared that earth-shattering news with you?

Categories: Lexalytics, Sentiment analysis, Text mining

June 19, 2008

6 trends that could shake up the text analytics market

My last two posts were based on the introductory slide to my talk The Text Analytics Marketplace: Competitive landscape and trends. I’ll now jump straight ahead to the talk’s conclusion.

Text analytics vendors participate in the same trends as other software and technology vendors. For example, relational business intelligence and data warehousing products are increasingly being sold to departmental buyers. Those buyers place particularly high value on ease of installation. And golly gee whiz, both parts of that are also true in text mining.

But beyond such general trends, I’ve identified six developments that I think could radically transform the text analytics market landscape. Indeed, they could invalidate the neat little eight-bucket categorization I laid out in the prior post. Each is highly likely to occur, although in some cases the timing remains greatly in doubt.

These six market-transforming trends are:

Web/enterprise/messaging integration
BI integration
Universal message retention
Portable personal profiles
Electronic health records
Voice command & control

Categories: BI integration, Enterprise search, Google, Microsoft, Search engines, Social software and online media, Text mining

June 19, 2008

The Text Analytics Marketplace: Competitive landscape and trends

As I see it, there are eight distinct market areas that each depend heavily on linguistic technology. Five are off-shoots of what used to be called “information retrieval”:

1. Web search

2. Public-facing site search

3. Enterprise search and knowledge management

4. Custom publishing

5. Text mining and extraction

Three are more standalone:

6. Spam filtering

7. Voice recognition

8. Machine translation

Categories: Audio and video search, BI integration, Custom publishing, Enterprise search, Google, Natural language processing (NLP), Nuance, Progress and EasyAsk, Search engines, Social software and online media, Spam and antispam, Speech recognition, Structured search, Text Analytics Summit, Text mining
