Text mining for compliance and legal discovery
One theme that keeps recurring in my talks with text mining and other text analytics/text technology companies is compliance. Ditto legal discovery, which is closely related. Most of the focus seems to be on three kinds of data:
- Vehicle defect evidence. The TREAD Act is of course the big driver here (no pun intended).
- Drug side effect evidence. The FDA is pushing that one.
- Email/correspondence archives. Text search/filtering/clustering/mining/whatever is now a standard part of legal discovery.
Categories: Enterprise search, Search engines, Text mining
Autonomy on text mining
I asked Mike Lynch (Autonomy CEO) about text mining. He responded with an example:
A very well-known company “mines” its incoming emails for signs of trouble, not via any linguistics-driven approach, but just by clustering them. If a cluster changes size anomalously over time, it bears close investigation.
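The monitoring half of that idea is simple enough to sketch. What follows is a minimal illustration, not the company's actual method (which wasn't described). It assumes the emails have already been clustered and counted per period, and it uses a plain z-score against each cluster's own history as the test for "changes size anomalously." The cluster names, counts, and the `threshold` parameter are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalous_clusters(history, threshold=3.0):
    """Return IDs of clusters whose latest size is unusual for that cluster.

    history: dict mapping cluster ID -> list of per-period email counts,
             oldest first; the last entry is the period being checked.
    threshold: how many standard deviations from the historical mean
               counts as anomalous (an assumed cutoff, not a known one).
    """
    flagged = []
    for cluster_id, counts in history.items():
        past, latest = counts[:-1], counts[-1]
        if len(past) < 2:
            continue  # too little history to estimate normal variation
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            # A perfectly flat history: any change at all is unusual.
            if latest != mu:
                flagged.append(cluster_id)
            continue
        z = (latest - mu) / sigma
        if abs(z) > threshold:  # catches shrinking clusters as well as growing ones
            flagged.append(cluster_id)
    return flagged


# Hypothetical weekly counts of incoming emails per cluster.
weekly_counts = {
    "billing": [120, 115, 130, 118, 125],
    "shipping-delays": [40, 42, 38, 41, 95],
    "product-praise": [60, 58, 61, 59, 62],
}
print(flag_anomalous_clusters(weekly_counts))  # ['shipping-delays']
```

Note that nothing here looks at the words in the emails. The signal is purely the cluster's size over time, which is the point of the example.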
Categories: Autonomy, Search engines, Text mining
Update: Autonomy/Verity merger
I had a couple of very interesting calls with Autonomy last week. One message I got was that they do not want to be pigeonholed in search, which they think on the whole is a primitive way of dealing with unstructured information. Nonetheless, my first post based on those calls will indeed focus on text indexing and search. You see, I wrote quite skeptically about the Autonomy/Verity merger when it was announced, and I’d like to amend that with an updated opinion. Autonomy’s claims can be summarized in part by the following:
Categories: Autonomy, Enterprise search, Search engines
Lead UIMA architect Dave Ferrucci speaks about adoption
Dave Ferrucci, lead architect for UIMA, shared some detailed views with me about UIMA adoption. With his permission, they are reproduced below. UIMA is still not getting a lot of attention from commercial text analytics vendors, but ultimately I think it will prevail, if only because nobody cares enough to start a war of dueling alternative standards.* So it’s something you should educate yourself about as it progresses.
*And IBM plans to convince me ASAP that even that assessment is too negative, which it well may be. Stay tuned.
So to sum up:
1. We seem to have a fair amount of traction with the UIMA framework in communities that are very interested in plug-n-play with components from other providers. This includes the government, life sciences and research communities.
2. Developing the UIMA standard (as opposed to the specific Java framework implementation) under an SDO will broaden the opportunity and strengthen the case for adopting UIMA as a standard for text and multi-modal analytics, one that allows interoperability across different frameworks and applications. The Java UIMA framework would, of course, comply with the standard.
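To make "plug-n-play with components from other providers" concrete, here is a toy sketch of the general idea: independent components agree only on a shared document-plus-annotations structure and a single `process` method, so one vendor's component can consume another's output without knowing its internals. This is loosely inspired by UIMA's notion of annotators writing to a common analysis structure (the CAS), but it is emphatically not the UIMA API; every class, method, and vendor name below is hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Annotation:
    kind: str    # annotation type, e.g. "Token" or "Product"
    begin: int   # start character offset into the document text
    end: int     # end offset (exclusive)
    source: str  # which component produced this annotation


@dataclass
class AnalysisDoc:
    """Shared structure that every component reads from and writes to."""
    text: str
    annotations: list[Annotation] = field(default_factory=list)

    def covered(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]

    def of_kind(self, kind: str) -> list[Annotation]:
        return [a for a in self.annotations if a.kind == kind]


class Annotator(Protocol):
    """The only contract a component must honor to be swappable."""
    def process(self, doc: AnalysisDoc) -> None: ...


class RegexTokenizer:
    """Hypothetical component from 'vendor A': adds Token annotations."""
    def process(self, doc: AnalysisDoc) -> None:
        for m in re.finditer(r"\w+", doc.text):
            doc.annotations.append(Annotation("Token", m.start(), m.end(), "vendorA"))


class ProductSpotter:
    """Hypothetical component from 'vendor B': consumes Tokens, adds Products.
    It depends only on the shared structure, not on vendor A's code."""
    def __init__(self, names: set[str]):
        self.names = names

    def process(self, doc: AnalysisDoc) -> None:
        # of_kind returns a new list, so appending while looping is safe.
        for tok in doc.of_kind("Token"):
            if doc.covered(tok) in self.names:
                doc.annotations.append(Annotation("Product", tok.begin, tok.end, "vendorB"))


def run_pipeline(doc: AnalysisDoc, components: list[Annotator]) -> AnalysisDoc:
    for component in components:
        component.process(doc)
    return doc


doc = run_pipeline(AnalysisDoc("The WidgetPro shipped late"),
                   [RegexTokenizer(), ProductSpotter({"WidgetPro"})])
print([doc.covered(a) for a in doc.of_kind("Product")])  # ['WidgetPro']
```

Swapping in a different tokenizer requires no change to `ProductSpotter`, as long as the replacement emits the same kind of annotation. That shared contract is what a standard, rather than any one framework implementation, would pin down.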
The complete email follows.
Categories: IBM and UIMA, Open source text analytics
Should ontology management be open sourced?
I’ve argued previously that enterprises need serious ontologies, and that the lack of them is holding back growth in multiple areas of text technology – search, text mining and knowledge extraction, various forms of speech recognition, and so on. The core point was:
The ideal ontology would consist mainly of four aspects:
1. A conceptual part that’s language-independent.
2. A general language-dependent part.
3. A sensitivity to different kinds of text – language is used differently when spoken, for instance, than it is in edited newspaper articles.
4. An enterprise-specific part. For example, a company has product names, it has competitors with product names, those names have abbreviations, and so on.
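One way to picture those four aspects as a data structure is sketched below. It is an illustrative assumption, not a description of any existing ontology product: the field names, concept IDs, the "speech" text kind, and the product names are all made up. The language-independent concept ID anchors everything; general vocabulary is keyed by language, text-kind vocabulary by (language, kind of text), and enterprise terms sit alongside.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    # 1. Language-independent part: a stable identifier for the concept itself.
    concept_id: str
    # 2. General language-dependent part: language code -> ordinary terms.
    labels: dict[str, list[str]] = field(default_factory=dict)
    # 3. Sensitivity to kinds of text: (language, text kind) -> extra terms used there.
    register_terms: dict[tuple[str, str], list[str]] = field(default_factory=dict)
    # 4. Enterprise-specific part: product names, abbreviations, and so on.
    enterprise_terms: list[str] = field(default_factory=list)

    def surface_forms(self, lang: str, text_kind: Optional[str] = None) -> list[str]:
        forms = list(self.labels.get(lang, []))
        if text_kind is not None:
            forms += self.register_terms.get((lang, text_kind), [])
        forms += self.enterprise_terms  # company names apply regardless of language
        return forms


def build_lookup(concepts, lang, text_kind=None):
    """Map each lowercased surface form to the set of concept IDs it may denote."""
    lookup: dict[str, set[str]] = {}
    for concept in concepts:
        for form in concept.surface_forms(lang, text_kind):
            lookup.setdefault(form.lower(), set()).add(concept.concept_id)
    return lookup


laptop = Concept(
    "C:portable_computer",
    labels={"en": ["laptop", "notebook computer"], "de": ["Laptop", "Notebook"]},
    register_terms={("en", "speech"): ["lappy"]},
    enterprise_terms=["WidgetPro 15", "WP15"],  # hypothetical product and abbreviation
)
rival = Concept("C:competitor_product", enterprise_terms=["RivalBook", "RB"])

lookup = build_lookup([laptop, rival], "en", "speech")
print(lookup["wp15"])  # {'C:portable_computer'}
```

The lookup returns a set because, in a real enterprise vocabulary, a single abbreviation can easily denote more than one concept.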
Categories: Ontologies, Open source text analytics
Towards an enterprise text architecture
My column this month for Computerworld is on enterprise text technology architecture. A sequel is promised for next month.
This month’s column focuses mainly on reciting application needs. Did I leave any important ones out?
Next time I’ll focus more on how to meet those needs. I need to write it in 2 1/2 weeks or so, and I plan to talk with a lot of industry players between now and then.
Categories: Ontologies, Search engines, Text mining
Google’s internal text-based project/knowledge management
Slashdot turned up an amazing article in Baseline on Google’s infrastructure. There’s lots of gee-whiz stuff in there about server farms, petabytes of disk packed into a standard shipping container so as to allow the setup of more server farms around the globe, and so on. But even more interesting to me was another point, about Google’s internal use of its own technology. In at least one case – a hybrid of project and knowledge management – Google really seems to be doing what other firms only dream about as futures. Here’s the relevant excerpt:
Categories: Enterprise search, Google, Search engines, Specialized search
