October 10, 2008

More on Languageware

Marie Wallace of IBM wrote back in response to my post on Languageware. In particular, it seems I got the Languageware/UIMA relationship wrong. Marie’s email was long and thoughtful enough that, rather than just pointing her at the comment thread, I asked for permission to repost it. Here goes:

Thanks for your mention of LanguageWare on your blog, albeit a skeptical one :-) I totally understand your scepticism as there is so much talk about text analytics these days and everyone believes they have solved the problem. I guess I can only hope that our approach will indeed prove to be different and offer some new and interesting perspectives.

The key differentiation in our approach is that we have completely decoupled the language model from the code that runs the analysis. This has been generalized to a set of data-driven algorithms that apply across many languages so that you can have an approach that makes the solution hugely and rapidly customizable (without having to change code). It is this flexibility that we believe is core to realizing multi-lingual and multi-domain text analysis applications in a real-world scenario. This customization environment is available for download from Alphaworks, http://www.alphaworks.ibm.com/tech/lrw, and we would love to get feedback from your community.

On your point about performance, we actually consider UIMA one of our greatest performance optimizations and core to our design. The point about one-pass is that we never go back over the same piece of text twice at the same “level” and take a very careful approach when defining our UIMA Annotators. Certain layers of language processing just don’t make sense to split up due to their interconnectedness and therefore we create our UIMA annotators according to where they sit in the overall processing layers. That’s the key point.

Anyway those are my thoughts, and thanks again for the mention. It’s really great to see these topics being discussed in an open and challenging forum.

Comments

3 Responses to “More on Languageware”

  1. Bob Carpenter on October 22nd, 2008 5:49 pm

    We typically tell our potential customers that not only don’t we have any magic pixie dust, no one does. It’s best to cut out the hype up front!

    I didn’t understand Marie’s point about how their key differentiator is that they’ve “completely decoupled the language model from the code that runs the analysis”. Doesn’t everyone do this?

    Our product, LingPipe, has general high-level interfaces that are uniform across applications for everything from spelling correction to classification to tokenization, sentence detection, part-of-speech tagging and entity extraction.

    Uniformity and portability I understand (up to the problem of adapting tag sets and tokenization standards), but how could UIMA help with speed? To code reusable components in UIMA requires translation into (and often out of) the Common Analysis Structure (CAS) that handles “data exchange” among modules. For third parties, this is prohibitive, because I need to translate our tokens into UIMA tokens and back again before I send them to a tagger. Of course, I can just wrap tokenization, tagging, sentence extraction and entity extraction in a single UIMA module, but that defeats plug-and-play portability.

    UIMA isn’t unusual with respect to streaming. We can do named entity annotation on multi-GB XML docs with very low memory overhead using the SAX parser and our generic entity extraction interface, with a single model shared across threads.

    Is anyone seeing demand for UIMA outside of government-funded research? We’re still debating whether it’s worth our time to write more general and complete UIMA wrappers for LingPipe than have been contributed by third parties.

  2. Curt Monash on October 22nd, 2008 8:41 pm

    Temis had a deal with, I think, Europol that specified UIMA.

    Otherwise, I haven’t heard of a lot of demand.

  3. Kas Thomas on October 23rd, 2008 9:00 am

    I don’t mean to be flip, but the mere fact that you have to ask the question points to the answer, I think. I don’t know anyone who is seeing significant demand for UIMA. The folks at Nstein might have some insight into this.
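For readers unfamiliar with the streaming approach mentioned in the first comment, here is a minimal sketch of SAX-style entity annotation in Python. It is not LingPipe's API or anyone's actual implementation: the gazetteer, the `extract_entities` function, the `EntityHandler` class, and the `p` element name are all hypothetical stand-ins. The point it illustrates is only that a SAX handler holds the text of one element at a time, so memory use depends on element size rather than document size.

```python
import io
import xml.sax

# Hypothetical gazetteer standing in for a trained entity model.
GAZETTEER = {"IBM": "ORG", "Europol": "ORG", "LingPipe": "PRODUCT"}


def extract_entities(text):
    """Toy extractor: return (surface, type) for each gazetteer hit, in order."""
    hits = []
    for token in text.split():
        word = token.strip(".,;:!?()\"'")
        if word in GAZETTEER:
            hits.append((word, GAZETTEER[word]))
    return hits


class EntityHandler(xml.sax.ContentHandler):
    """Buffers the text of one target element, annotates it, then discards it."""

    def __init__(self, element_name, sink):
        super().__init__()
        self.element_name = element_name
        self.sink = sink      # callable that receives each entity as it is found
        self.buffer = None    # None means we are outside the target element

    def startElement(self, name, attrs):
        if name == self.element_name:
            self.buffer = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate them.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == self.element_name and self.buffer is not None:
            for entity in extract_entities("".join(self.buffer)):
                self.sink(entity)
            self.buffer = None  # release the element's text


def annotate_stream(stream, element_name="p"):
    """Parse an XML byte stream and collect entities found in target elements."""
    found = []
    xml.sax.parse(stream, EntityHandler(element_name, found.append))
    return found


sample = io.BytesIO(
    b"<doc><p>Temis worked with Europol.</p>"
    b"<p>IBM ships LanguageWare; LingPipe is ours.</p></doc>"
)
entities = annotate_stream(sample)
```

In a real pipeline the sink would more likely write annotations out as they arrive instead of collecting them in a list, which is what keeps the overhead low on very large documents.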
