Sergei Ananyan’s claim that analytic business processes involving text are still very primitive is absolutely correct. Indeed, analytic business processes have a lot of maturing to do overall. Still, there’s one area where the industry has devoted a lot of thought over the past few years, and some notion of process has emerged. That area is the detection of warning signs.
Note: Hat tip to Attensity for focusing on this in our talk today, and even more in the part of the slide deck we didn’t actually go over. That said, they’re far from the only vendor thinking along these lines.
If we look at the major application areas for text mining, most of them fit more or less neatly into the “warning signs” bucket. In particular, that’s true of:
- Vehicle safety
- Other manufacturing/warranty analysis apps
- Reputation management (for the most part)
- Other customer sentiment apps (some, perhaps most)
- Sarbanes-Oxley compliance
- Stopping money laundering
- Clinical applications (some)
- Early insurance risk management apps
- Early experimental hedge fund apps
And you can probably think of more examples yet.
So what are some processes used to deal with these apps?
1. In some cases, one has ongoing trouble and is trying to diagnose it so as to prevent further occurrences. Sometimes there are even regular write-ups of known bad situations, such as warranty claims (technician or customer reports), insurance claims, Suspicious Activity Reports (for money laundering), etc. Then one can mine those write-ups to extract any facts that seem to be prevalent in those situations. This kicks off a standard data mining process: get and test some hunches, test some more, build an appropriate rule set, and get the model into operational production. Depending on the case, production means either real-time (or real-enough-time) decisioning or a place of honor on dashboards and other performance monitors.
2. When the write-ups aren’t so regular, one can do the same thing anyway. An example might be correspondence from customers who later canceled their accounts.
3. In other cases, one is looking for trouble even before one has found some. Compliance often falls into this category, as does web-crawling reputation management. One process, favored by Autonomy, is simply to monitor document flow for all important themes, and hope that the trouble signs jump out at you. Alternatively, one can monitor documents for known bad event flags – vehicle malfunctions, drug side effects, angry customers, whatever. If there are only a few documents with such flags, one can read them directly. If there are too many for humans to just read and digest in a timely manner – well, then you’ve transitioned into Case 1 or Case 2!
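To make the hand-off between these cases concrete, here is a minimal Python sketch. It is purely illustrative and not any vendor's method: the flag terms, the `READ_LIMIT` cutoff, and the function names are all assumptions, and simple keyword matching stands in for real text extraction. Documents are checked for known bad event flags; if only a few are flagged they are returned for direct reading, and otherwise the flagged set is mined for words that are more prevalent there than in the rest of the flow.

```python
import re
from collections import Counter

# Hypothetical flag terms; a real system would use a curated lexicon
# or a text mining product's extraction rules instead.
FLAG_TERMS = {"malfunction", "stalled", "side effect", "refund", "furious"}
READ_LIMIT = 3  # assumed cutoff for "few enough to just read"


def flagged(doc):
    """True if the document mentions any known bad event flag."""
    text = doc.lower()
    return any(term in text for term in FLAG_TERMS)


def tokens(doc):
    """Distinct lowercase words in a document."""
    return set(re.findall(r"[a-z]+", doc.lower()))


def prevalent_terms(bad_docs, other_docs, top=5):
    """Rank words by how much more often they occur in bad documents."""
    bad = Counter(w for d in bad_docs for w in tokens(d))
    other = Counter(w for d in other_docs for w in tokens(d))
    n_bad, n_other = len(bad_docs), max(len(other_docs), 1)
    # Difference in document frequency between the two groups.
    lift = {w: bad[w] / n_bad - other[w] / n_other for w in bad}
    return sorted(lift, key=lambda w: (-lift[w], w))[:top]


def triage(docs):
    """Return ("read", flagged docs) or ("mine", candidate terms)."""
    hits = [d for d in docs if flagged(d)]
    rest = [d for d in docs if not flagged(d)]
    if len(hits) <= READ_LIMIT:
        return ("read", hits)
    return ("mine", prevalent_terms(hits, rest))
```

The terms returned in the "mine" branch are only hunches. In the process described above, they would still need to be tested and refined before being turned into a rule set and put into production.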