July 23, 2006

Update: Autonomy/Verity merger

I had a couple of very interesting calls with Autonomy last week. One message I got was that they do not want to be pigeonholed in search, which they think on the whole is a primitive way of dealing with “unstructured information.” Nonetheless, my first post based on those calls will indeed focus on text indexing and search. You see, I wrote quite skeptically about the Autonomy/Verity merger when it was announced, and I’d like to amend that with an updated opinion. Autonomy’s claims can be summarized in part by the following:

Read more

July 19, 2006

Lead UIMA architect Dave Ferrucci speaks about adoption

Dave Ferrucci, lead architect for UIMA, shared some detailed views with me about UIMA adoption. With his permission, they are reproduced below. UIMA is still not getting a lot of attention from commercial text analytics vendors, but ultimately I think it will prevail, if just because nobody cares enough to start a war of dueling alternative standards.* So it’s something you should educate yourself about as it progresses.

*And IBM plans to convince me ASAP that even that assessment is too negative, which it well may be. Stay tuned.

So to sum up — 1. We seem to have fair amount of traction with the UIMA framework by communities that are very interested in plug-n-play with components from other providers. This includes the government, life sciences and research communities. 2. The UIMA standard, as opposed to the specific Java Framework implementation, developed under an SDO will broaden the opportunity and strengthen the case of adoption of UIMA as a standard for text and multi-modal analytics that allows interoperability across different frameworks and applications. It would of course be the case that the Java UIMA Framework would comply to the standard.

The complete email follows.
Read more

July 17, 2006

Should ontology management be open sourced?

I’ve argued previously that enterprises need serious ontologies, and that this lack is holding back growth in multiple areas of text technology – search, text mining and knowledge extraction, various forms of speech recognition, and so on. The core point was:

The ideal ontology would consist mainly of four aspects:

1. A conceptual part that’s language-independent.
2. A general language-dependent part.
3. A sensitivity to different kinds of text – language is used differently when spoken, for instance, than it is in edited newspaper articles.
4. An enterprise-specific part. For example, a company has product names, it has competitors with product names, those names have abbreviations, and so on.

Read more

July 11, 2006

Towards an enterprise text architecture

My column this month for Computerworld is on enterprise text technology architecture. A sequel is promised for next month.

This month’s column focuses mainly on reciting application needs. Did I leave any important ones out?

Next time I’ll focus more on how to meet those needs. I need to write it in 2 1/2 weeks or so. I plan to talk with a lot of industry players between now and then.

July 11, 2006

Google’s internal text-based project/knowledge management

Slashdot turned up an amazing article in Baseline on Google’s infrastructure. There’s lots of gee-whiz stuff in there about server farms, petabytes of disk packed into a standard shipping container so as to allow the setup of more server farms around the globe, and so on. But even more interesting to me was another point, about Google’s internal use of its own technology. In at least one case – a hybrid of project and knowledge management – Google really seems to be doing what other firms only dream about as futures. Here’s the relevant excerpt:

Read more

June 26, 2006

Scoping the text mining market

Another Text Analytics/Mining Summit, another occasion to discuss text mining market numbers. Except — it’s really hard to get any specifics. Before writing this post, I decided to web search on text mining market to see if anybody had posted anything about its size or growth. The first and pretty much only relevant hit I could find was my own blog post of a year ago, reproduced below. Oh dear.

Read more

June 25, 2006

Relationship analytics — turbocharge for text mining?

While at the Text Analytics Summit, I came increasingly to suspect that two technologies, both of which I’ve put considerable research into recently, are very synergistic with each other:

Read more

June 24, 2006

The French love their language

One noteworthy aspect of the Text Analytics Summit is the French presence. France is generally inept in the software industry, but the text mining business is a clear exception. Temis is a French company. SPSS’s text mining operation (which was Lexiquest) is part French, part English, and run by a Frenchman. Teragram was founded by French guys. For variety, clustering company Semio was founded by a French semiotics professor, and nStein’s managers are a bunch of Quebecois.

Read more

June 24, 2006

Procter & Gamble on text mining projects

Terry McFadden of Procter & Gamble made a number of interesting points in his Text Analytics Summit talk, in the area of how to build and “amass” (his word) lexicons. Above all, I’m thrilled that he recognized the necessity of amassing lexicography that can be reused from one app to the next. Beyond that, specific comments and tips included:

Read more

June 23, 2006

The current state of text mining/analytics marketing?

One thing that didn’t go so well at the Text Analytics Summit was the marketing panel. Indeed, when we racked our brains afterward, Mary Crissey (who was on the panel) and I could only think of a single observation that was actually made about marketing. Namely, she referred to a core truth of marketing: Just selling features doesn’t work (nobody cares). Just selling benefits doesn’t work (you’re not differentiated). What you have to do is sell the connection between your features and desirable benefits.

So I’m going to try to gather some useful observations on marketing here, filling the gap that the panel left. Key questions I’d love input on include:

1. Which feature-benefit connections do you see customers easily accepting?

2. Which feature-benefit connections is it harder to get them to believe?

3. How are customers defining text analytics market segments?

4. What do they see as the key issues in each segment?

5. Which application areas are showing growth even beyond that of the market overall?

I’m particularly interested in comments from the larger vendors that are selling into multiple parts of the text mining and text analytics market. But everybody else’s input would be warmly appreciated too.

The comment thread to this post is open for business!
